Subject: [PATCH v2] mm: SLAB freelist randomization
From: Thomas Garnier @ 2016-04-18 17:14 UTC
  To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Kees Cook
  Cc: gthelen, labbott, kernel-hardening, linux-kernel, linux-mm,
	Thomas Garnier

Provides an optional config (CONFIG_FREELIST_RANDOM) to randomize the
SLAB freelist. The list is randomized during initialization of a new set
of pages. The order for each freelist size is pre-computed at boot for
performance. This security feature reduces the predictability of the
kernel SLAB allocator against heap overflows, rendering attacks much less
stable.
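As a rough userspace illustration of the boot-time pre-computation described above, the sketch below builds one shuffled "master list" with a Fisher-Yates shuffle, as freelist_random_init() does for each size. It assumes freelist_idx_t is a single byte (modeled as uint8_t). It replaces the kernel's prandom_u32_state() with a hypothetical xorshift32() helper so that it compiles outside the kernel. build_master_list() and is_permutation() are illustrative names and are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the kernel's prandom_u32_state(): Marsaglia's
 * 32-bit xorshift. Not what the patch uses; it only keeps the sketch
 * self-contained. The state must be non-zero.
 */
static uint32_t xorshift32(uint32_t *state)
{
	uint32_t x = *state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	*state = x;
	return x;
}

/*
 * Fill list[0..count-1] with a random permutation of 0..count-1 using a
 * Fisher-Yates shuffle. freelist_idx_t is modeled as uint8_t, so
 * count <= 256.
 */
static void build_master_list(uint8_t *list, size_t count, uint32_t *rng)
{
	size_t i;

	assert(count > 0 && count <= 256 && *rng != 0);

	for (i = 0; i < count; i++)
		list[i] = (uint8_t)i;

	for (i = count - 1; i > 0; i--) {
		/* Pick j in [0, i]; the modulo adds a slight bias, as in the patch. */
		size_t j = xorshift32(rng) % (i + 1);
		uint8_t tmp = list[i];

		list[i] = list[j];
		list[j] = tmp;
	}
}

/* Return 1 if list holds each value 0..count-1 exactly once. */
static int is_permutation(const uint8_t *list, size_t count)
{
	unsigned char seen[256] = { 0 };
	size_t i;

	for (i = 0; i < count; i++) {
		if (list[i] >= count || seen[list[i]])
			return 0;
		seen[list[i]] = 1;
	}
	return 1;
}
```

A single seed drives every list. In the sketch, the constant passed as the initial rng state stands in for the one get_random_bytes_arch() value. After boot, creating a slab only needs to read indices from an already-built permutation, which is why the per-slab cost stays small.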

For example, this attack against SLUB (also applicable to SLAB) would be
affected:
https://jon.oberheide.org/blog/2010/09/10/linux-kernel-can-slub-overflow/

Also, since v4.6 the freelist has been moved to the end of the SLAB. This
means a controllable heap is open to new attacks not yet publicly
discussed: a kernel heap overflow can be transformed into multiple
use-after-free conditions. This feature makes this type of attack harder
too.

To generate entropy, we use get_random_bytes_arch because 0 bits of
entropy are available at that boot stage. In the worst case, this
function falls back to the get_random_bytes sub API.

The config option name is not specific to SLAB because this approach will
be extended to other allocators like SLUB.

Performance results highlighted no major changes:

Netperf average on 10 runs:

threads,base,change
16,576943.10,585905.90 (101.55%)
32,564082.00,569741.20 (101.00%)
48,558334.30,561851.20 (100.63%)
64,552025.20,556448.30 (100.80%)
80,552294.40,551743.10 (99.90%)
96,552435.30,547529.20 (99.11%)
112,551320.60,550183.20 (99.79%)
128,549138.30,550542.70 (100.26%)
144,549344.50,544529.10 (99.12%)
160,550360.80,539929.30 (98.10%)
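The percentage in parentheses reads as the patched throughput relative to the base (change / base * 100). Below is a small C check of that reading, using only the numbers from the table above. struct netperf_row and the helper names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* One row of the netperf table: thread count, base, patched, printed %. */
struct netperf_row {
	int threads;
	double base;
	double change;
	double reported_pct;
};

static const struct netperf_row rows[] = {
	{  16, 576943.10, 585905.90, 101.55 },
	{  32, 564082.00, 569741.20, 101.00 },
	{  48, 558334.30, 561851.20, 100.63 },
	{  64, 552025.20, 556448.30, 100.80 },
	{  80, 552294.40, 551743.10,  99.90 },
	{  96, 552435.30, 547529.20,  99.11 },
	{ 112, 551320.60, 550183.20,  99.79 },
	{ 128, 549138.30, 550542.70, 100.26 },
	{ 144, 549344.50, 544529.10,  99.12 },
	{ 160, 550360.80, 539929.30,  98.10 },
};

/* Patched result as a percentage of the baseline. */
static double pct_of_base(const struct netperf_row *r)
{
	return r->change / r->base * 100.0;
}

/* 1 if every printed percentage matches the recomputed one to 2 decimals. */
static int all_rows_consistent(void)
{
	size_t i;

	for (i = 0; i < sizeof(rows) / sizeof(rows[0]); i++) {
		double d = pct_of_base(&rows[i]) - rows[i].reported_pct;

		if (d > 0.005 || d < -0.005)
			return 0;
	}
	return 1;
}
```

Every row agrees with that formula after rounding to two decimals. This puts the run-to-run spread at roughly -1.9% to +1.6%.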

slab_test, 1 run at boot. After is faster or roughly equal for most sizes;
the one large outlier is kmalloc(2048) in the first test (478 -> 1429
cycles).

Before:

Single thread testing
=====================
1. Kmalloc: Repeatedly allocate then free test
10000 times kmalloc(8) -> 137 cycles kfree -> 126 cycles
10000 times kmalloc(16) -> 118 cycles kfree -> 119 cycles
10000 times kmalloc(32) -> 112 cycles kfree -> 119 cycles
10000 times kmalloc(64) -> 126 cycles kfree -> 123 cycles
10000 times kmalloc(128) -> 135 cycles kfree -> 131 cycles
10000 times kmalloc(256) -> 165 cycles kfree -> 104 cycles
10000 times kmalloc(512) -> 174 cycles kfree -> 126 cycles
10000 times kmalloc(1024) -> 242 cycles kfree -> 160 cycles
10000 times kmalloc(2048) -> 478 cycles kfree -> 239 cycles
10000 times kmalloc(4096) -> 747 cycles kfree -> 364 cycles
10000 times kmalloc(8192) -> 774 cycles kfree -> 404 cycles
10000 times kmalloc(16384) -> 849 cycles kfree -> 430 cycles
2. Kmalloc: alloc/free test
10000 times kmalloc(8)/kfree -> 118 cycles
10000 times kmalloc(16)/kfree -> 118 cycles
10000 times kmalloc(32)/kfree -> 118 cycles
10000 times kmalloc(64)/kfree -> 121 cycles
10000 times kmalloc(128)/kfree -> 118 cycles
10000 times kmalloc(256)/kfree -> 115 cycles
10000 times kmalloc(512)/kfree -> 115 cycles
10000 times kmalloc(1024)/kfree -> 115 cycles
10000 times kmalloc(2048)/kfree -> 115 cycles
10000 times kmalloc(4096)/kfree -> 115 cycles
10000 times kmalloc(8192)/kfree -> 115 cycles
10000 times kmalloc(16384)/kfree -> 115 cycles

After:

Single thread testing
=====================
1. Kmalloc: Repeatedly allocate then free test
10000 times kmalloc(8) -> 99 cycles kfree -> 84 cycles
10000 times kmalloc(16) -> 88 cycles kfree -> 83 cycles
10000 times kmalloc(32) -> 90 cycles kfree -> 81 cycles
10000 times kmalloc(64) -> 107 cycles kfree -> 97 cycles
10000 times kmalloc(128) -> 134 cycles kfree -> 89 cycles
10000 times kmalloc(256) -> 145 cycles kfree -> 97 cycles
10000 times kmalloc(512) -> 177 cycles kfree -> 116 cycles
10000 times kmalloc(1024) -> 223 cycles kfree -> 151 cycles
10000 times kmalloc(2048) -> 1429 cycles kfree -> 221 cycles
10000 times kmalloc(4096) -> 720 cycles kfree -> 348 cycles
10000 times kmalloc(8192) -> 788 cycles kfree -> 393 cycles
10000 times kmalloc(16384) -> 867 cycles kfree -> 433 cycles
2. Kmalloc: alloc/free test
10000 times kmalloc(8)/kfree -> 115 cycles
10000 times kmalloc(16)/kfree -> 115 cycles
10000 times kmalloc(32)/kfree -> 115 cycles
10000 times kmalloc(64)/kfree -> 120 cycles
10000 times kmalloc(128)/kfree -> 127 cycles
10000 times kmalloc(256)/kfree -> 119 cycles
10000 times kmalloc(512)/kfree -> 112 cycles
10000 times kmalloc(1024)/kfree -> 112 cycles
10000 times kmalloc(2048)/kfree -> 112 cycles
10000 times kmalloc(4096)/kfree -> 112 cycles
10000 times kmalloc(8192)/kfree -> 112 cycles
10000 times kmalloc(16384)/kfree -> 112 cycles
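Below is a quick tally of the kmalloc() column from test 1 ("Repeatedly allocate then free"). It uses only the cycle counts listed above, and the array and function names are illustrative. Three sizes (512, 8192, 16384) come out about 2% slower. kmalloc(2048) is the only large outlier, at about 3x.

```c
#include <assert.h>
#include <stddef.h>

/* kmalloc() cycle counts from test 1, before and after the patch. */
static const unsigned int sizes[]  = { 8, 16, 32, 64, 128, 256, 512, 1024,
				       2048, 4096, 8192, 16384 };
static const unsigned int before[] = { 137, 118, 112, 126, 135, 165, 174,
				       242, 478, 747, 774, 849 };
static const unsigned int after[]  = { 99, 88, 90, 107, 134, 145, 177,
				       223, 1429, 720, 788, 867 };

#define NSIZES (sizeof(sizes) / sizeof(sizes[0]))

/* Number of sizes where the patched kernel took more cycles. */
static size_t count_slower(void)
{
	size_t i, n = 0;

	for (i = 0; i < NSIZES; i++)
		if (after[i] > before[i])
			n++;
	return n;
}

/* Size with the largest after/before ratio. */
static unsigned int worst_size(void)
{
	size_t i, worst = 0;

	for (i = 1; i < NSIZES; i++)
		/* after[i]/before[i] > after[worst]/before[worst], in integers */
		if ((unsigned long)after[i] * before[worst] >
		    (unsigned long)after[worst] * before[i])
			worst = i;
	return sizes[worst];
}
```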

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
Based on next-20160418
---
 init/Kconfig |   9 ++++
 mm/slab.c    | 166 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 174 insertions(+), 1 deletion(-)

diff --git a/init/Kconfig b/init/Kconfig
index 0dfd09d..ee35418 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1742,6 +1742,15 @@ config SLOB
 
 endchoice
 
+config FREELIST_RANDOM
+	default n
+	depends on SLAB
+	bool "SLAB freelist randomization"
+	help
+	  Randomizes the freelist order used when creating new SLABs. This
+	  security feature reduces the predictability of the kernel slab
+	  allocator against heap overflows.
+
 config SLUB_CPU_PARTIAL
 	default y
 	depends on SLUB && SMP
diff --git a/mm/slab.c b/mm/slab.c
index b70aabf..8371d80 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -116,6 +116,7 @@
 #include	<linux/kmemcheck.h>
 #include	<linux/memory.h>
 #include	<linux/prefetch.h>
+#include	<linux/log2.h>
 
 #include	<net/sock.h>
 
@@ -1229,6 +1230,62 @@ static void __init set_up_node(struct kmem_cache *cachep, int index)
 	}
 }
 
+#ifdef CONFIG_FREELIST_RANDOM
+/*
+ * Master lists are pre-computed random lists.
+ * Lists of different sizes are used to optimize performance on SLABs with
+ * different object counts.
+ */
+static freelist_idx_t master_list_2[2];
+static freelist_idx_t master_list_4[4];
+static freelist_idx_t master_list_8[8];
+static freelist_idx_t master_list_16[16];
+static freelist_idx_t master_list_32[32];
+static freelist_idx_t master_list_64[64];
+static freelist_idx_t master_list_128[128];
+static freelist_idx_t master_list_256[256];
+static const struct m_list {
+	size_t count;
+	freelist_idx_t *list;
+} master_lists[] = {
+	{ ARRAY_SIZE(master_list_2), master_list_2 },
+	{ ARRAY_SIZE(master_list_4), master_list_4 },
+	{ ARRAY_SIZE(master_list_8), master_list_8 },
+	{ ARRAY_SIZE(master_list_16), master_list_16 },
+	{ ARRAY_SIZE(master_list_32), master_list_32 },
+	{ ARRAY_SIZE(master_list_64), master_list_64 },
+	{ ARRAY_SIZE(master_list_128), master_list_128 },
+	{ ARRAY_SIZE(master_list_256), master_list_256 },
+};
+
+/* Pre-compute the Freelist master lists at boot */
+static void __init freelist_random_init(void)
+{
+	unsigned int seed;
+	size_t z, i, rand;
+	struct rnd_state slab_rand;
+
+	get_random_bytes_arch(&seed, sizeof(seed));
+	prandom_seed_state(&slab_rand, seed);
+
+	for (z = 0; z < ARRAY_SIZE(master_lists); z++) {
+		for (i = 0; i < master_lists[z].count; i++)
+			master_lists[z].list[i] = i;
+
+		/* Fisher-Yates shuffle */
+		for (i = master_lists[z].count - 1; i > 0; i--) {
+			rand = prandom_u32_state(&slab_rand);
+			rand %= (i + 1);
+			swap(master_lists[z].list[i],
+				master_lists[z].list[rand]);
+		}
+	}
+}
+#else
+static inline void __init freelist_random_init(void) { }
+#endif /* CONFIG_FREELIST_RANDOM */
+
+
 /*
  * Initialisation.  Called after the page allocator have been initialised and
  * before smp_init().
@@ -1255,6 +1312,8 @@ void __init kmem_cache_init(void)
 	if (!slab_max_order_set && totalram_pages > (32 << 20) >> PAGE_SHIFT)
 		slab_max_order = SLAB_MAX_ORDER_HI;
 
+	freelist_random_init();
+
 	/* Bootstrap is tricky, because several objects are allocated
 	 * from caches that do not exist yet:
 	 * 1) initialize the kmem_cache cache: it contains the struct
@@ -2442,6 +2501,107 @@ static void cache_init_objs_debug(struct kmem_cache *cachep, struct page *page)
 #endif
 }
 
+#ifdef CONFIG_FREELIST_RANDOM
+/* Identify if the target freelist matches the pre-computed list */
+enum master_type {
+	match,
+	less,
+	more
+};
+
+/* Hold information during a freelist initialization */
+struct freelist_init_state {
+	unsigned int padding;
+	unsigned int pos;
+	unsigned int count;
+	struct m_list master_list;
+	unsigned int master_count;
+	enum master_type type;
+};
+
+/* Select the right pre-computed master list and initialize state */
+static void freelist_state_initialize(struct freelist_init_state *state,
+				      unsigned int count)
+{
+	unsigned int idx;
+	const unsigned int last_idx = ARRAY_SIZE(master_lists) - 1;
+
+	memset(state, 0, sizeof(*state));
+	state->count = count;
+	state->pos = 0;
+	/* count is always >= 2 */
+	idx = ilog2(count) - 1;
+	if (idx >= last_idx)
+		idx = last_idx;
+	else if (!is_power_of_2(count))
+		idx++;
+	state->master_list = master_lists[idx];
+	if (state->master_list.count == state->count)
+		state->type = match;
+	else if (state->master_list.count > state->count)
+		state->type = more;
+	else
+		state->type = less;
+}
+
+/* Get the next entry on the master list depending on the target list size */
+static freelist_idx_t get_next_entry(struct freelist_init_state *state)
+{
+	if (state->type == less && state->pos == state->master_list.count) {
+		state->padding += state->pos;
+		state->pos = 0;
+	}
+	BUG_ON(state->pos >= state->master_list.count);
+	return state->master_list.list[state->pos++];
+}
+
+static freelist_idx_t next_random_slot(struct freelist_init_state *state)
+{
+	freelist_idx_t cur, entry;
+
+	entry = get_next_entry(state);
+
+	if (state->type != match) {
+		while ((entry + state->padding) >= state->count)
+			entry = get_next_entry(state);
+		cur = entry + state->padding;
+		BUG_ON(cur >= state->count);
+	} else {
+		cur = entry;
+	}
+
+	return cur;
+}
+
+/* Shuffle the freelist initialization state based on pre-computed lists */
+static void shuffle_freelist(struct kmem_cache *cachep, struct page *page,
+			     unsigned int count)
+{
+	unsigned int i;
+	struct freelist_init_state state;
+
+	if (count < 2) {
+		for (i = 0; i < count; i++)
+			set_free_obj(page, i, i);
+		return;
+	}
+
+	/* Last chunk is used already in this case */
+	if (OBJFREELIST_SLAB(cachep))
+		count--;
+
+	freelist_state_initialize(&state, count);
+	for (i = 0; i < count; i++)
+		set_free_obj(page, i, next_random_slot(&state));
+
+	if (OBJFREELIST_SLAB(cachep))
+		set_free_obj(page, i, i);
+}
+#else
+static inline void shuffle_freelist(struct kmem_cache *cachep,
+				    struct page *page, unsigned int count) { }
+#endif /* CONFIG_FREELIST_RANDOM */
+
 static void cache_init_objs(struct kmem_cache *cachep,
 			    struct page *page)
 {
@@ -2464,8 +2624,12 @@ static void cache_init_objs(struct kmem_cache *cachep,
 			kasan_poison_object_data(cachep, objp);
 		}
 
-		set_free_obj(page, i, i);
+		/* If enabled, initialization is done in shuffle_freelist */
+		if (!config_enabled(CONFIG_FREELIST_RANDOM))
+			set_free_obj(page, i, i);
 	}
+
+	shuffle_freelist(cachep, page, cachep->num);
 }
 
 static void kmem_flagcheck(struct kmem_cache *cachep, gfp_t flags)
-- 
2.8.0.rc3.226.g39d4020

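The C model below restates the list-selection and slot-mapping rules of freelist_state_initialize(), get_next_entry() and next_random_slot() in standalone form, for easier review. master_index() and map_freelist() are hypothetical names. master_index() picks the smallest power-of-two master list that can hold count entries, clamped to the 256-entry list, which is the apparent intent of the index adjustment. map_freelist() walks a master permutation and skips out-of-range entries. When the slab has more objects than the list (the "less" case), it reuses the list in chunks shifted by a growing padding.

```c
#include <assert.h>

/* The patch pre-computes master lists of 2, 4, ..., 256 entries. */
#define NR_MASTER_LISTS 8u

/*
 * Index of the smallest master list (size 2 << idx) holding at least
 * count entries, clamped to the last (256-entry) list.
 */
static unsigned int master_index(unsigned int count)
{
	unsigned int idx = 0;

	while (idx < NR_MASTER_LISTS - 1 && (2u << idx) < count)
		idx++;
	return idx;
}

/*
 * Write a freelist order for count objects into out[], reading the
 * master permutation master[0..mcount-1]:
 *  - mcount >= count ("match"/"more"): out-of-range entries are skipped;
 *  - mcount <  count ("less"): the master list is reused in chunks, each
 *    offset ("padding") by a further mcount, skipping out-of-range slots.
 */
static void map_freelist(const unsigned char *master, unsigned int mcount,
			 unsigned int count, unsigned int *out)
{
	unsigned int pos = 0, padding = 0, i;

	assert(mcount > 0);

	for (i = 0; i < count; i++) {
		unsigned int slot;

		do {
			/* Only reachable in the "less" case: start the next chunk. */
			if (pos == mcount) {
				padding += mcount;
				pos = 0;
			}
			slot = master[pos++] + padding;
		} while (slot >= count);
		out[i] = slot;
	}
}
```

Because master is a permutation, the "more" case finds exactly count in-range entries before it runs out. In the "less" case, every full chunk contributes all of its slots and the final chunk contributes only the in-range ones, so either way out[] ends up as a permutation of 0..count-1.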
 			    struct page *page)
 {
@@ -2464,8 +2624,12 @@ static void cache_init_objs(struct kmem_cache *cachep,
 			kasan_poison_object_data(cachep, objp);
 		}
 
-		set_free_obj(page, i, i);
+		/* If enabled, initialization is done in shuffle_freelist */
+		if (!config_enabled(CONFIG_FREELIST_RANDOM))
+			set_free_obj(page, i, i);
 	}
+
+	shuffle_freelist(cachep, page, cachep->num);
 }
 
 static void kmem_flagcheck(struct kmem_cache *cachep, gfp_t flags)
-- 
2.8.0.rc3.226.g39d4020

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH v2] mm: SLAB freelist randomization
  2016-04-18 17:14 ` Thomas Garnier
  (?)
@ 2016-04-19  7:15   ` Joonsoo Kim
  -1 siblings, 0 replies; 35+ messages in thread
From: Joonsoo Kim @ 2016-04-19  7:15 UTC (permalink / raw)
  To: Thomas Garnier
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Andrew Morton,
	Kees Cook, gthelen, labbott, kernel-hardening, linux-kernel,
	linux-mm

On Mon, Apr 18, 2016 at 10:14:39AM -0700, Thomas Garnier wrote:
> Provides an optional config (CONFIG_FREELIST_RANDOM) to randomize the
> SLAB freelist. The list is randomized during initialization of a new set
> of pages. The order on different freelist sizes is pre-computed at boot
> for performance. This security feature reduces the predictability of the
> kernel SLAB allocator against heap overflows rendering attacks much less
> stable.

I'm not familiar with security, but this doesn't look much more secure than
before. Is there any other way to generate a different freelist sequence
for each new set of pages? The current approach, using a pre-computed array,
will generate the same freelist sequence for every new set of pages in the
same size class. Is that sufficient?
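A minimal userspace sketch of this concern (an illustration, not kernel code). The boot-time Fisher-Yates shuffle is modeled with a hypothetical xorshift32() generator standing in for prandom_u32_state(), and a fixed seed stands in for get_random_bytes_arch(). init_page_freelist() is a hypothetical simplification of shuffle_freelist() for the `match` case with pos starting at 0.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for prandom_u32_state(): 32-bit xorshift.
 * Not cryptographically secure; the state must be non-zero. */
static uint32_t xorshift32(uint32_t *state)
{
	uint32_t x = *state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	*state = x;
	return x;
}

/* Boot-time master list: Fisher-Yates shuffle of 0..count-1,
 * following the structure of freelist_random_init(). */
static void fisher_yates(unsigned int *list, unsigned int count,
			 uint32_t *rng)
{
	if (count == 0)
		return;
	for (unsigned int i = 0; i < count; i++)
		list[i] = i;
	for (unsigned int i = count - 1; i > 0; i--) {
		unsigned int j = xorshift32(rng) % (i + 1);
		unsigned int tmp = list[i];

		list[i] = list[j];
		list[j] = tmp;
	}
}

/* Hypothetical simplification of shuffle_freelist() when the master
 * list size equals the object count and pos always starts at 0:
 * every new page simply copies the master list. */
static void init_page_freelist(unsigned int *freelist,
			       const unsigned int *master,
			       unsigned int count)
{
	for (unsigned int pos = 0; pos < count; pos++)
		freelist[pos] = master[pos];
}
```

Under these assumptions, two pages filled from the same master list receive identical freelist orders; the boot-time seed only changes the order from one boot to the next.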

> For example this attack against SLUB (also applicable against SLAB)
> would be affected:
> https://jon.oberheide.org/blog/2010/09/10/linux-kernel-can-slub-overflow/
> 
> Also, since v4.6 the freelist was moved at the end of the SLAB. It means
> a controllable heap is opened to new attacks not yet publicly discussed.
> A kernel heap overflow can be transformed to multiple use-after-free.
> This feature makes this type of attack harder too.
> 
> To generate entropy, we use get_random_bytes_arch because 0 bits of
> entropy is available at that boot stage. In the worse case this function
> will fallback to the get_random_bytes sub API.
> 
> The config option name is not specific to the SLAB as this approach will
> be extended to other allocators like SLUB.

If this feature will also be applied to SLUB, it would be better to put the
common code in mm/slab_common.c.

> 
> Performance results highlighted no major changes:
> 
> Netperf average on 10 runs:
> 
> threads,base,change
> 16,576943.10,585905.90 (101.55%)
> 32,564082.00,569741.20 (101.00%)
> 48,558334.30,561851.20 (100.63%)
> 64,552025.20,556448.30 (100.80%)
> 80,552294.40,551743.10 (99.90%)
> 96,552435.30,547529.20 (99.11%)
> 112,551320.60,550183.20 (99.79%)
> 128,549138.30,550542.70 (100.26%)
> 144,549344.50,544529.10 (99.12%)
> 160,550360.80,539929.30 (98.10%)
> 
> slab_test 1 run on boot. After is faster except for odd result on size
> 2048.

Hmm... That's an odd result. The patch adds more logic, so it should
decrease performance. I guess it is experimental error, but do you have
any analysis of this result?

> 
> Before:
> 
> Single thread testing
> =====================
> 1. Kmalloc: Repeatedly allocate then free test
> 10000 times kmalloc(8) -> 137 cycles kfree -> 126 cycles
> 10000 times kmalloc(16) -> 118 cycles kfree -> 119 cycles
> 10000 times kmalloc(32) -> 112 cycles kfree -> 119 cycles
> 10000 times kmalloc(64) -> 126 cycles kfree -> 123 cycles
> 10000 times kmalloc(128) -> 135 cycles kfree -> 131 cycles
> 10000 times kmalloc(256) -> 165 cycles kfree -> 104 cycles
> 10000 times kmalloc(512) -> 174 cycles kfree -> 126 cycles
> 10000 times kmalloc(1024) -> 242 cycles kfree -> 160 cycles
> 10000 times kmalloc(2048) -> 478 cycles kfree -> 239 cycles
> 10000 times kmalloc(4096) -> 747 cycles kfree -> 364 cycles
> 10000 times kmalloc(8192) -> 774 cycles kfree -> 404 cycles
> 10000 times kmalloc(16384) -> 849 cycles kfree -> 430 cycles
> 2. Kmalloc: alloc/free test
> 10000 times kmalloc(8)/kfree -> 118 cycles
> 10000 times kmalloc(16)/kfree -> 118 cycles
> 10000 times kmalloc(32)/kfree -> 118 cycles
> 10000 times kmalloc(64)/kfree -> 121 cycles
> 10000 times kmalloc(128)/kfree -> 118 cycles
> 10000 times kmalloc(256)/kfree -> 115 cycles
> 10000 times kmalloc(512)/kfree -> 115 cycles
> 10000 times kmalloc(1024)/kfree -> 115 cycles
> 10000 times kmalloc(2048)/kfree -> 115 cycles
> 10000 times kmalloc(4096)/kfree -> 115 cycles
> 10000 times kmalloc(8192)/kfree -> 115 cycles
> 10000 times kmalloc(16384)/kfree -> 115 cycles
> 
> After:
> 
> Single thread testing
> =====================
> 1. Kmalloc: Repeatedly allocate then free test
> 10000 times kmalloc(8) -> 99 cycles kfree -> 84 cycles
> 10000 times kmalloc(16) -> 88 cycles kfree -> 83 cycles
> 10000 times kmalloc(32) -> 90 cycles kfree -> 81 cycles
> 10000 times kmalloc(64) -> 107 cycles kfree -> 97 cycles
> 10000 times kmalloc(128) -> 134 cycles kfree -> 89 cycles
> 10000 times kmalloc(256) -> 145 cycles kfree -> 97 cycles
> 10000 times kmalloc(512) -> 177 cycles kfree -> 116 cycles
> 10000 times kmalloc(1024) -> 223 cycles kfree -> 151 cycles
> 10000 times kmalloc(2048) -> 1429 cycles kfree -> 221 cycles
> 10000 times kmalloc(4096) -> 720 cycles kfree -> 348 cycles
> 10000 times kmalloc(8192) -> 788 cycles kfree -> 393 cycles
> 10000 times kmalloc(16384) -> 867 cycles kfree -> 433 cycles
> 2. Kmalloc: alloc/free test
> 10000 times kmalloc(8)/kfree -> 115 cycles
> 10000 times kmalloc(16)/kfree -> 115 cycles
> 10000 times kmalloc(32)/kfree -> 115 cycles
> 10000 times kmalloc(64)/kfree -> 120 cycles
> 10000 times kmalloc(128)/kfree -> 127 cycles
> 10000 times kmalloc(256)/kfree -> 119 cycles
> 10000 times kmalloc(512)/kfree -> 112 cycles
> 10000 times kmalloc(1024)/kfree -> 112 cycles
> 10000 times kmalloc(2048)/kfree -> 112 cycles
> 10000 times kmalloc(4096)/kfree -> 112 cycles
> 10000 times kmalloc(8192)/kfree -> 112 cycles
> 10000 times kmalloc(16384)/kfree -> 112 cycles
> 
> Signed-off-by: Thomas Garnier <thgarnie@google.com>
> ---
> Based on next-20160418
> ---
>  init/Kconfig |   9 ++++
>  mm/slab.c    | 166 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 174 insertions(+), 1 deletion(-)
> 
> diff --git a/init/Kconfig b/init/Kconfig
> index 0dfd09d..ee35418 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1742,6 +1742,15 @@ config SLOB
>  
>  endchoice
>  
> +config FREELIST_RANDOM
> +	default n
> +	depends on SLAB
> +	bool "SLAB freelist randomization"
> +	help
> +	  Randomizes the freelist order used on creating new SLABs. This
> +	  security feature reduces the predictability of the kernel slab
> +	  allocator against heap overflows.
> +
>  config SLUB_CPU_PARTIAL
>  	default y
>  	depends on SLUB && SMP
> diff --git a/mm/slab.c b/mm/slab.c
> index b70aabf..8371d80 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -116,6 +116,7 @@
>  #include	<linux/kmemcheck.h>
>  #include	<linux/memory.h>
>  #include	<linux/prefetch.h>
> +#include	<linux/log2.h>
>  
>  #include	<net/sock.h>
>  
> @@ -1229,6 +1230,62 @@ static void __init set_up_node(struct kmem_cache *cachep, int index)
>  	}
>  }
>  
> +#ifdef CONFIG_FREELIST_RANDOM
> +/*
> + * Master lists are pre-computed random lists
> + * Lists of different sizes are used to optimize performance on SLABS with
> + * different object counts.
> + */

If this is for optimization, one option would be to have a separate
random list for each kmem_cache. It would consume more memory, but only
marginally. It would also provide more unpredictability, and it could
give better performance because we would no longer need state->type
(more, less) and the special handling related to it.
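A rough userspace sketch of the per-cache alternative, assuming a hypothetical struct cache_model that owns a random_seq array sized to exactly num objects (cachep->num in SLAB terms). Because the sequence length always equals the object count, filling a page is a straight copy, with no padding and no skipping of out-of-range entries. Entries are unsigned int here for simplicity, and xorshift32() is again a hypothetical stand-in for prandom_u32_state().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-cache state: one random sequence whose length is
 * exactly the number of objects per slab. */
struct cache_model {
	unsigned int num;		/* objects per slab, like cachep->num */
	unsigned int *random_seq;	/* per-cache permutation of 0..num-1 */
};

/* Hypothetical PRNG standing in for prandom_u32_state(); state != 0. */
static uint32_t xorshift32(uint32_t *state)
{
	uint32_t x = *state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	*state = x;
	return x;
}

/* Fill c->random_seq with a Fisher-Yates permutation of 0..num-1.
 * Returns 0 on success, -1 if num is 0 or allocation fails. */
static int cache_random_seq_init(struct cache_model *c, unsigned int num,
				 uint32_t *rng)
{
	c->num = num;
	c->random_seq = NULL;
	if (num == 0)
		return -1;
	c->random_seq = malloc(num * sizeof(*c->random_seq));
	if (!c->random_seq)
		return -1;
	for (unsigned int i = 0; i < num; i++)
		c->random_seq[i] = i;
	for (unsigned int i = num - 1; i > 0; i--) {
		unsigned int j = xorshift32(rng) % (i + 1);
		unsigned int tmp = c->random_seq[i];

		c->random_seq[i] = c->random_seq[j];
		c->random_seq[j] = tmp;
	}
	return 0;
}

/* Filling a page is a straight copy: no padding, no skipped entries. */
static void cache_fill_freelist(const struct cache_model *c,
				unsigned int *freelist)
{
	for (unsigned int i = 0; i < c->num; i++)
		freelist[i] = c->random_seq[i];
}

static void cache_random_seq_free(struct cache_model *c)
{
	free(c->random_seq);
	c->random_seq = NULL;
}
```

With num = 30, the patch as posted would select the 32-entry master list and skip entries 30 and 31; in this sketch the 30-entry sequence is used directly. The trade-off is one array per cache instead of eight shared lists.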

> +static freelist_idx_t master_list_2[2];
> +static freelist_idx_t master_list_4[4];
> +static freelist_idx_t master_list_8[8];
> +static freelist_idx_t master_list_16[16];
> +static freelist_idx_t master_list_32[32];
> +static freelist_idx_t master_list_64[64];
> +static freelist_idx_t master_list_128[128];
> +static freelist_idx_t master_list_256[256];
> +const static struct m_list {
> +	size_t count;
> +	freelist_idx_t *list;
> +} master_lists[] = {
> +	{ ARRAY_SIZE(master_list_2), master_list_2 },
> +	{ ARRAY_SIZE(master_list_4), master_list_4 },
> +	{ ARRAY_SIZE(master_list_8), master_list_8 },
> +	{ ARRAY_SIZE(master_list_16), master_list_16 },
> +	{ ARRAY_SIZE(master_list_32), master_list_32 },
> +	{ ARRAY_SIZE(master_list_64), master_list_64 },
> +	{ ARRAY_SIZE(master_list_128), master_list_128 },
> +	{ ARRAY_SIZE(master_list_256), master_list_256 },
> +};
> +
> +/* Pre-compute the Freelist master lists at boot */
> +static void __init freelist_random_init(void)
> +{
> +	unsigned int seed;
> +	size_t z, i, rand;
> +	struct rnd_state slab_rand;
> +
> +	get_random_bytes_arch(&seed, sizeof(seed));
> +	prandom_seed_state(&slab_rand, seed);
> +
> +	for (z = 0; z < ARRAY_SIZE(master_lists); z++) {
> +		for (i = 0; i < master_lists[z].count; i++)
> +			master_lists[z].list[i] = i;
> +
> +		/* Fisher-Yates shuffle */
> +		for (i = master_lists[z].count - 1; i > 0; i--) {
> +			rand = prandom_u32_state(&slab_rand);
> +			rand %= (i + 1);
> +			swap(master_lists[z].list[i],
> +				master_lists[z].list[rand]);
> +		}
> +	}
> +}
> +#else
> +static inline void __init freelist_random_init(void) { }
> +#endif /* CONFIG_FREELIST_RANDOM */
> +
> +
>  /*
>   * Initialisation.  Called after the page allocator have been initialised and
>   * before smp_init().
> @@ -1255,6 +1312,8 @@ void __init kmem_cache_init(void)
>  	if (!slab_max_order_set && totalram_pages > (32 << 20) >> PAGE_SHIFT)
>  		slab_max_order = SLAB_MAX_ORDER_HI;
>  
> +	freelist_random_init();
> +
>  	/* Bootstrap is tricky, because several objects are allocated
>  	 * from caches that do not exist yet:
>  	 * 1) initialize the kmem_cache cache: it contains the struct
> @@ -2442,6 +2501,107 @@ static void cache_init_objs_debug(struct kmem_cache *cachep, struct page *page)
>  #endif
>  }
>  
> +#ifdef CONFIG_FREELIST_RANDOM
> +/* Identify if the target freelist matches the pre-computed list */
> +enum master_type {
> +	match,
> +	less,
> +	more
> +};
> +
> +/* Hold information during a freelist initialization */
> +struct freelist_init_state {
> +	unsigned int padding;
> +	unsigned int pos;
> +	unsigned int count;
> +	struct m_list master_list;
> +	unsigned int master_count;
> +	enum master_type type;
> +};
> +
> +/* Select the right pre-computed master list and initialize state */
> +static void freelist_state_initialize(struct freelist_init_state *state,
> +				      unsigned int count)
> +{
> +	unsigned int idx;
> +	const unsigned int last_idx = ARRAY_SIZE(master_lists) - 1;
> +
> +	memset(state, 0, sizeof(*state));
> +	state->count = count;
> +	state->pos = 0;

Using pos = 0 here doesn't look good in terms of security. With it,
every new page in the same size class gets the same freelist sequence
from boot onward.

How about setting pos to a random value? It provides some more randomness
with minimal overhead.
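One way to read the random pos suggestion, sketched in userspace under the assumption that the master list has exactly count entries (the `match` case): each new page starts walking the shared permutation at its own offset and wraps around. The start argument stands in for a hypothetical per-page random value.

```c
#include <assert.h>

/* Fill freelist[0..count-1] from a shared master permutation of
 * 0..count-1, starting at offset `start` and wrapping around.
 * `start` stands in for a hypothetical per-page random value;
 * count must be non-zero. */
static void fill_from_offset(unsigned int *freelist,
			     const unsigned int *master,
			     unsigned int count, unsigned int start)
{
	start %= count;
	for (unsigned int i = 0; i < count; i++)
		freelist[i] = master[(start + i) % count];
}
```

A rotation changes which object is handed out first but preserves the cyclic order of the list, so a single master list yields at most count distinct sequences. It is much cheaper than a fresh shuffle per page, at the cost of weaker unpredictability.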

> +	/* count is always >= 2 */
> +	idx = ilog2(count) - 1;
> +	if (idx >= last_idx)
> +		idx = last_idx;
> +	else if (roundup_pow_of_two(idx + 1) != count)
> +		idx++;
> +	state->master_list = master_lists[idx];
> +	if (state->master_list.count == state->count)
> +		state->type = match;
> +	else if (state->master_list.count > state->count)
> +		state->type = more;
> +	else
> +		state->type = less;
> +}
> +
> +/* Get the next entry on the master list depending on the target list size */
> +static freelist_idx_t get_next_entry(struct freelist_init_state *state)
> +{
> +	if (state->type == less && state->pos == state->master_list.count) {
> +		state->padding += state->pos;
> +		state->pos = 0;
> +	}
> +	BUG_ON(state->pos >= state->master_list.count);
> +	return state->master_list.list[state->pos++];
> +}
> +
> +static freelist_idx_t next_random_slot(struct freelist_init_state *state)
> +{
> +	freelist_idx_t cur, entry;
> +
> +	entry = get_next_entry(state);
> +
> +	if (state->type != match) {
> +		while ((entry + state->padding) >= state->count)
> +			entry = get_next_entry(state);
> +		cur = entry + state->padding;
> +		BUG_ON(cur >= state->count);
> +	} else {
> +		cur = entry;
> +	}
> +
> +	return cur;
> +}
> +
> +/* Shuffle the freelist initialization state based on pre-computed lists */
> +static void shuffle_freelist(struct kmem_cache *cachep, struct page *page,
> +			     unsigned int count)
> +{
> +	unsigned int i;
> +	struct freelist_init_state state;
> +
> +	if (count < 2) {
> +		for (i = 0; i < count; i++)
> +			set_free_obj(page, i, i);
> +		return;
> +	}
> +
> +	/* Last chunk is used already in this case */
> +	if (OBJFREELIST_SLAB(cachep))
> +		count--;
> +
> +	freelist_state_initialize(&state, count);
> +	for (i = 0; i < count; i++)
> +		set_free_obj(page, i, next_random_slot(&state));
> +
> +	if (OBJFREELIST_SLAB(cachep))
> +		set_free_obj(page, i, i);

Please consider the last object of an OBJFREELIST_SLAB cache, too.

freelist_state_initialize()
last_obj = next_random_slot()
page->freelist = XXX
for (i = 0; i < count - 1; i++)
        set_free_obj()
set_free_obj(last_obj);
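A userspace model of the reordering suggested above, assuming a pre-shuffled permutation is already available. struct seq_state and next_slot() are hypothetical stand-ins for freelist_init_state and next_random_slot(). The object that holds the freelist array is drawn from the random sequence first, rather than always being object num - 1, and its index is stored in the last freelist entry so that it is handed out last.

```c
#include <assert.h>

/* Hypothetical cursor over a pre-shuffled permutation of 0..num-1. */
struct seq_state {
	const unsigned int *seq;
	unsigned int pos;
};

/* Hypothetical stand-in for next_random_slot() in the match case. */
static unsigned int next_slot(struct seq_state *s)
{
	return s->seq[s->pos++];
}

/* Draw the freelist-holding object first (this is where the kernel
 * would set page->freelist), fill the first num - 1 freelist entries
 * from the rest of the sequence, then put that object in the last
 * entry. num must be non-zero. Returns the freelist object's index. */
static unsigned int shuffle_with_objfreelist(unsigned int *freelist,
					     unsigned int num,
					     struct seq_state *s)
{
	unsigned int objfreelist = next_slot(s);

	for (unsigned int i = 0; i < num - 1; i++)
		freelist[i] = next_slot(s);
	freelist[num - 1] = objfreelist;
	return objfreelist;
}
```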

Thanks.

> +}
> +#else
> +static inline void shuffle_freelist(struct kmem_cache *cachep,
> +				    struct page *page, unsigned int count) { }
> +#endif /* CONFIG_FREELIST_RANDOM */
> +
>  static void cache_init_objs(struct kmem_cache *cachep,
>  			    struct page *page)
>  {
> @@ -2464,8 +2624,12 @@ static void cache_init_objs(struct kmem_cache *cachep,
>  			kasan_poison_object_data(cachep, objp);
>  		}
>  
> -		set_free_obj(page, i, i);
> +		/* If enabled, initialization is done in shuffle_freelist */
> +		if (!config_enabled(CONFIG_FREELIST_RANDOM))
> +			set_free_obj(page, i, i);
>  	}
> +
> +	shuffle_freelist(cachep, page, cachep->num);
>  }
>  
>  static void kmem_flagcheck(struct kmem_cache *cachep, gfp_t flags)
> -- 
> 2.8.0.rc3.226.g39d4020
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [kernel-hardening] Re: [PATCH v2] mm: SLAB freelist randomization
@ 2016-04-19  7:15   ` Joonsoo Kim
  0 siblings, 0 replies; 35+ messages in thread
From: Joonsoo Kim @ 2016-04-19  7:15 UTC (permalink / raw)
  To: Thomas Garnier
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Andrew Morton,
	Kees Cook, gthelen, labbott, kernel-hardening, linux-kernel,
	linux-mm

On Mon, Apr 18, 2016 at 10:14:39AM -0700, Thomas Garnier wrote:
> Provides an optional config (CONFIG_FREELIST_RANDOM) to randomize the
> SLAB freelist. The list is randomized during initialization of a new set
> of pages. The order on different freelist sizes is pre-computed at boot
> for performance. This security feature reduces the predictability of the
> kernel SLAB allocator against heap overflows rendering attacks much less
> stable.

I'm not familiar with security, but this doesn't look much more secure than
before. Is there any other way to generate a different freelist sequence
for each new set of pages? The current approach, using a pre-computed array,
will generate the same freelist sequence for every new set of pages in the
same size class. Is that sufficient?

> For example this attack against SLUB (also applicable against SLAB)
> would be affected:
> https://jon.oberheide.org/blog/2010/09/10/linux-kernel-can-slub-overflow/
> 
> Also, since v4.6 the freelist was moved at the end of the SLAB. It means
> a controllable heap is opened to new attacks not yet publicly discussed.
> A kernel heap overflow can be transformed to multiple use-after-free.
> This feature makes this type of attack harder too.
> 
> To generate entropy, we use get_random_bytes_arch because 0 bits of
> entropy are available at that boot stage. In the worst case this function
> will fall back to the get_random_bytes sub API.
> 
> The config option name is not specific to the SLAB as this approach will
> be extended to other allocators like SLUB.

If this feature will also be applied to SLUB, it would be better to put the
common code in mm/slab_common.c.

> 
> Performance results highlighted no major changes:
> 
> Netperf average on 10 runs:
> 
> threads,base,change
> 16,576943.10,585905.90 (101.55%)
> 32,564082.00,569741.20 (101.00%)
> 48,558334.30,561851.20 (100.63%)
> 64,552025.20,556448.30 (100.80%)
> 80,552294.40,551743.10 (99.90%)
> 96,552435.30,547529.20 (99.11%)
> 112,551320.60,550183.20 (99.79%)
> 128,549138.30,550542.70 (100.26%)
> 144,549344.50,544529.10 (99.12%)
> 160,550360.80,539929.30 (98.10%)
> 
> slab_test 1 run on boot. After is faster except for odd result on size
> 2048.

Hmm... That's an odd result. The patch adds more logic, so it should
decrease performance. I guess it is experimental error, but do you have
any analysis of this result?

> 
> Before:
> 
> Single thread testing
> =====================
> 1. Kmalloc: Repeatedly allocate then free test
> 10000 times kmalloc(8) -> 137 cycles kfree -> 126 cycles
> 10000 times kmalloc(16) -> 118 cycles kfree -> 119 cycles
> 10000 times kmalloc(32) -> 112 cycles kfree -> 119 cycles
> 10000 times kmalloc(64) -> 126 cycles kfree -> 123 cycles
> 10000 times kmalloc(128) -> 135 cycles kfree -> 131 cycles
> 10000 times kmalloc(256) -> 165 cycles kfree -> 104 cycles
> 10000 times kmalloc(512) -> 174 cycles kfree -> 126 cycles
> 10000 times kmalloc(1024) -> 242 cycles kfree -> 160 cycles
> 10000 times kmalloc(2048) -> 478 cycles kfree -> 239 cycles
> 10000 times kmalloc(4096) -> 747 cycles kfree -> 364 cycles
> 10000 times kmalloc(8192) -> 774 cycles kfree -> 404 cycles
> 10000 times kmalloc(16384) -> 849 cycles kfree -> 430 cycles
> 2. Kmalloc: alloc/free test
> 10000 times kmalloc(8)/kfree -> 118 cycles
> 10000 times kmalloc(16)/kfree -> 118 cycles
> 10000 times kmalloc(32)/kfree -> 118 cycles
> 10000 times kmalloc(64)/kfree -> 121 cycles
> 10000 times kmalloc(128)/kfree -> 118 cycles
> 10000 times kmalloc(256)/kfree -> 115 cycles
> 10000 times kmalloc(512)/kfree -> 115 cycles
> 10000 times kmalloc(1024)/kfree -> 115 cycles
> 10000 times kmalloc(2048)/kfree -> 115 cycles
> 10000 times kmalloc(4096)/kfree -> 115 cycles
> 10000 times kmalloc(8192)/kfree -> 115 cycles
> 10000 times kmalloc(16384)/kfree -> 115 cycles
> 
> After:
> 
> Single thread testing
> =====================
> 1. Kmalloc: Repeatedly allocate then free test
> 10000 times kmalloc(8) -> 99 cycles kfree -> 84 cycles
> 10000 times kmalloc(16) -> 88 cycles kfree -> 83 cycles
> 10000 times kmalloc(32) -> 90 cycles kfree -> 81 cycles
> 10000 times kmalloc(64) -> 107 cycles kfree -> 97 cycles
> 10000 times kmalloc(128) -> 134 cycles kfree -> 89 cycles
> 10000 times kmalloc(256) -> 145 cycles kfree -> 97 cycles
> 10000 times kmalloc(512) -> 177 cycles kfree -> 116 cycles
> 10000 times kmalloc(1024) -> 223 cycles kfree -> 151 cycles
> 10000 times kmalloc(2048) -> 1429 cycles kfree -> 221 cycles
> 10000 times kmalloc(4096) -> 720 cycles kfree -> 348 cycles
> 10000 times kmalloc(8192) -> 788 cycles kfree -> 393 cycles
> 10000 times kmalloc(16384) -> 867 cycles kfree -> 433 cycles
> 2. Kmalloc: alloc/free test
> 10000 times kmalloc(8)/kfree -> 115 cycles
> 10000 times kmalloc(16)/kfree -> 115 cycles
> 10000 times kmalloc(32)/kfree -> 115 cycles
> 10000 times kmalloc(64)/kfree -> 120 cycles
> 10000 times kmalloc(128)/kfree -> 127 cycles
> 10000 times kmalloc(256)/kfree -> 119 cycles
> 10000 times kmalloc(512)/kfree -> 112 cycles
> 10000 times kmalloc(1024)/kfree -> 112 cycles
> 10000 times kmalloc(2048)/kfree -> 112 cycles
> 10000 times kmalloc(4096)/kfree -> 112 cycles
> 10000 times kmalloc(8192)/kfree -> 112 cycles
> 10000 times kmalloc(16384)/kfree -> 112 cycles
> 
> Signed-off-by: Thomas Garnier <thgarnie@google.com>
> ---
> Based on next-20160418
> ---
>  init/Kconfig |   9 ++++
>  mm/slab.c    | 166 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 174 insertions(+), 1 deletion(-)
> 
> diff --git a/init/Kconfig b/init/Kconfig
> index 0dfd09d..ee35418 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1742,6 +1742,15 @@ config SLOB
>  
>  endchoice
>  
> +config FREELIST_RANDOM
> +	default n
> +	depends on SLAB
> +	bool "SLAB freelist randomization"
> +	help
> +	  Randomizes the freelist order used on creating new SLABs. This
> +	  security feature reduces the predictability of the kernel slab
> +	  allocator against heap overflows.
> +
>  config SLUB_CPU_PARTIAL
>  	default y
>  	depends on SLUB && SMP
> diff --git a/mm/slab.c b/mm/slab.c
> index b70aabf..8371d80 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -116,6 +116,7 @@
>  #include	<linux/kmemcheck.h>
>  #include	<linux/memory.h>
>  #include	<linux/prefetch.h>
> +#include	<linux/log2.h>
>  
>  #include	<net/sock.h>
>  
> @@ -1229,6 +1230,62 @@ static void __init set_up_node(struct kmem_cache *cachep, int index)
>  	}
>  }
>  
> +#ifdef CONFIG_FREELIST_RANDOM
> +/*
> + * Master lists are pre-computed random lists
> + * Lists of different sizes are used to optimize performance on SLABS with
> + * different object counts.
> + */

If it is for optimization, one option would be to have a separate
random list for each kmem_cache. It would consume more memory, but the
cost would be marginal. It would also provide more unpredictability, and
it could give better performance because we wouldn't need state->type
(more, less) and the special handling related to it.
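A minimal userspace sketch of the per-cache alternative, assuming each kmem_cache would own a random permutation sized exactly to its object count. The helper name cache_random_seq_create() and the xorshift32() generator (standing in for prandom_u32_state()) are hypothetical, not existing kernel APIs. Because the sequence has exactly one entry per object, no less/more handling is needed when a slab is initialised.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for prandom_u32_state(): xorshift32, state must be non-zero. */
static uint32_t xorshift32(uint32_t *state)
{
	uint32_t x = *state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	*state = x;
	return x;
}

/*
 * Hypothetical per-cache helper: return a random permutation of exactly
 * `count` object indices (Fisher-Yates shuffle), or NULL if count is 0 or
 * allocation fails. The caller frees the result. Like the patch, it reduces
 * with `%`, which has a small modulo bias.
 */
static unsigned int *cache_random_seq_create(unsigned int count, uint32_t seed)
{
	unsigned int *seq;
	uint32_t state = seed ? seed : 1;	/* xorshift32 must not start at 0 */
	unsigned int i;

	if (count == 0)
		return NULL;
	seq = malloc(count * sizeof(*seq));
	if (!seq)
		return NULL;
	for (i = 0; i < count; i++)
		seq[i] = i;
	/* Walk down from the end, swapping each slot with a random earlier one. */
	for (i = count - 1; i > 0; i--) {
		unsigned int j = xorshift32(&state) % (i + 1);
		unsigned int tmp = seq[i];

		seq[i] = seq[j];
		seq[j] = tmp;
	}
	return seq;
}
```

With a non-power-of-two count such as 30, the result is still a full permutation of 0..29, so no entries have to be skipped or padded.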

> +static freelist_idx_t master_list_2[2];
> +static freelist_idx_t master_list_4[4];
> +static freelist_idx_t master_list_8[8];
> +static freelist_idx_t master_list_16[16];
> +static freelist_idx_t master_list_32[32];
> +static freelist_idx_t master_list_64[64];
> +static freelist_idx_t master_list_128[128];
> +static freelist_idx_t master_list_256[256];
> +const static struct m_list {
> +	size_t count;
> +	freelist_idx_t *list;
> +} master_lists[] = {
> +	{ ARRAY_SIZE(master_list_2), master_list_2 },
> +	{ ARRAY_SIZE(master_list_4), master_list_4 },
> +	{ ARRAY_SIZE(master_list_8), master_list_8 },
> +	{ ARRAY_SIZE(master_list_16), master_list_16 },
> +	{ ARRAY_SIZE(master_list_32), master_list_32 },
> +	{ ARRAY_SIZE(master_list_64), master_list_64 },
> +	{ ARRAY_SIZE(master_list_128), master_list_128 },
> +	{ ARRAY_SIZE(master_list_256), master_list_256 },
> +};
> +
> +/* Pre-compute the Freelist master lists at boot */
> +static void __init freelist_random_init(void)
> +{
> +	unsigned int seed;
> +	size_t z, i, rand;
> +	struct rnd_state slab_rand;
> +
> +	get_random_bytes_arch(&seed, sizeof(seed));
> +	prandom_seed_state(&slab_rand, seed);
> +
> +	for (z = 0; z < ARRAY_SIZE(master_lists); z++) {
> +		for (i = 0; i < master_lists[z].count; i++)
> +			master_lists[z].list[i] = i;
> +
> +		/* Fisher-Yates shuffle */
> +		for (i = master_lists[z].count - 1; i > 0; i--) {
> +			rand = prandom_u32_state(&slab_rand);
> +			rand %= (i + 1);
> +			swap(master_lists[z].list[i],
> +				master_lists[z].list[rand]);
> +		}
> +	}
> +}
> +#else
> +static inline void __init freelist_random_init(void) { }
> +#endif /* CONFIG_FREELIST_RANDOM */
> +
> +
>  /*
>   * Initialisation.  Called after the page allocator have been initialised and
>   * before smp_init().
> @@ -1255,6 +1312,8 @@ void __init kmem_cache_init(void)
>  	if (!slab_max_order_set && totalram_pages > (32 << 20) >> PAGE_SHIFT)
>  		slab_max_order = SLAB_MAX_ORDER_HI;
>  
> +	freelist_random_init();
> +
>  	/* Bootstrap is tricky, because several objects are allocated
>  	 * from caches that do not exist yet:
>  	 * 1) initialize the kmem_cache cache: it contains the struct
> @@ -2442,6 +2501,107 @@ static void cache_init_objs_debug(struct kmem_cache *cachep, struct page *page)
>  #endif
>  }
>  
> +#ifdef CONFIG_FREELIST_RANDOM
> +/* Identify if the target freelist matches the pre-computed list */
> +enum master_type {
> +	match,
> +	less,
> +	more
> +};
> +
> +/* Hold information during a freelist initialization */
> +struct freelist_init_state {
> +	unsigned int padding;
> +	unsigned int pos;
> +	unsigned int count;
> +	struct m_list master_list;
> +	unsigned int master_count;
> +	enum master_type type;
> +};
> +
> +/* Select the right pre-computed master list and initialize state */
> +static void freelist_state_initialize(struct freelist_init_state *state,
> +				      unsigned int count)
> +{
> +	unsigned int idx;
> +	const unsigned int last_idx = ARRAY_SIZE(master_lists) - 1;
> +
> +	memset(state, 0, sizeof(*state));
> +	state->count = count;
> +	state->pos = 0;

Using pos = 0 here doesn't look good in terms of security. In this case,
every new page in the same size class gets the same freelist sequence
since boot.

How about using a random value to set pos? It provides some more randomness
with minimal overhead.
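A minimal sketch of the random starting position, assuming pos would be drawn once per new slab and the walk would wrap around the pre-computed list. fill_from_offset() is a hypothetical name, and the sketch assumes the list length equals the object count (the match case). A rotation of a permutation is still a permutation, so every object is handed out exactly once; the limitation is that only `count` distinct orders are possible per list.

```c
#include <assert.h>

/*
 * Hypothetical helper: copy the pre-computed permutation `list` (length
 * `count`, which must be non-zero) into `out`, starting at index `start`
 * and wrapping around. In the kernel, `start` would come from a PRNG each
 * time a new slab is initialised.
 */
static void fill_from_offset(const unsigned int *list, unsigned int count,
			     unsigned int start, unsigned int *out)
{
	unsigned int i, pos;

	assert(count > 0);
	pos = start % count;
	for (i = 0; i < count; i++) {
		out[i] = list[pos];
		if (++pos == count)	/* wrap around to the start of the list */
			pos = 0;
	}
}
```

For example, with list {3, 0, 4, 1, 2} and start 2, the order handed out is 4, 1, 2, 3, 0.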

> +	/* count is always >= 2 */
> +	idx = ilog2(count) - 1;
> +	if (idx >= last_idx)
> +		idx = last_idx;
> +	else if (roundup_pow_of_two(idx + 1) != count)
> +		idx++;
> +	state->master_list = master_lists[idx];
> +	if (state->master_list.count == state->count)
> +		state->type = match;
> +	else if (state->master_list.count > state->count)
> +		state->type = more;
> +	else
> +		state->type = less;
> +}
> +
> +/* Get the next entry on the master list depending on the target list size */
> +static freelist_idx_t get_next_entry(struct freelist_init_state *state)
> +{
> +	if (state->type == less && state->pos == state->master_list.count) {
> +		state->padding += state->pos;
> +		state->pos = 0;
> +	}
> +	BUG_ON(state->pos >= state->master_list.count);
> +	return state->master_list.list[state->pos++];
> +}
> +
> +static freelist_idx_t next_random_slot(struct freelist_init_state *state)
> +{
> +	freelist_idx_t cur, entry;
> +
> +	entry = get_next_entry(state);
> +
> +	if (state->type != match) {
> +		while ((entry + state->padding) >= state->count)
> +			entry = get_next_entry(state);
> +		cur = entry + state->padding;
> +		BUG_ON(cur >= state->count);
> +	} else {
> +		cur = entry;
> +	}
> +
> +	return cur;
> +}
> +
> +/* Shuffle the freelist initialization state based on pre-computed lists */
> +static void shuffle_freelist(struct kmem_cache *cachep, struct page *page,
> +			     unsigned int count)
> +{
> +	unsigned int i;
> +	struct freelist_init_state state;
> +
> +	if (count < 2) {
> +		for (i = 0; i < count; i++)
> +			set_free_obj(page, i, i);
> +		return;
> +	}
> +
> +	/* Last chunk is used already in this case */
> +	if (OBJFREELIST_SLAB(cachep))
> +		count--;
> +
> +	freelist_state_initialize(&state, count);
> +	for (i = 0; i < count; i++)
> +		set_free_obj(page, i, next_random_slot(&state));
> +
> +	if (OBJFREELIST_SLAB(cachep))
> +		set_free_obj(page, i, i);

Please consider the last object of an OBJFREELIST_SLAB cache, too.

freelist_state_initialize(&state, count);
last_obj = next_random_slot(&state);
page->freelist = XXX;
for (i = 0; i < count - 1; i++)
        set_free_obj(page, i, next_random_slot(&state));
set_free_obj(page, i, last_obj);
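A userspace model of the suggested ordering, assuming seq[] holds the successive results of next_random_slot() and free_idx[] stands in for the indices written by set_free_obj(). init_objfreelist() is a hypothetical name, and the page->freelist assignment (XXX above) is not modelled; the function only returns the index of the object that would hold the freelist.

```c
#include <assert.h>

/*
 * Hypothetical model for an OBJFREELIST_SLAB slab with `num` objects
 * (num >= 1). seq[0..num-1] stands in for successive next_random_slot()
 * results. The first random slot is reserved for the object that will hold
 * the freelist; the other slots fill free_idx[0..num-2], and the reserved
 * object is written last, as in set_free_obj(page, i, last_obj).
 * Returns the index of the object holding the freelist.
 */
static unsigned int init_objfreelist(const unsigned int *seq, unsigned int num,
				     unsigned int *free_idx)
{
	unsigned int last_obj, i;

	assert(num >= 1);
	last_obj = seq[0];
	for (i = 0; i < num - 1; i++)
		free_idx[i] = seq[i + 1];
	free_idx[i] = last_obj;	/* i == num - 1 here */
	return last_obj;
}
```

With seq {2, 0, 3, 1}, object 2 would hold the freelist instead of always the last chunk, and free_idx becomes {0, 3, 1, 2}.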

Thanks.

> +}
> +#else
> +static inline void shuffle_freelist(struct kmem_cache *cachep,
> +				    struct page *page, unsigned int count) { }
> +#endif /* CONFIG_FREELIST_RANDOM */
> +
>  static void cache_init_objs(struct kmem_cache *cachep,
>  			    struct page *page)
>  {
> @@ -2464,8 +2624,12 @@ static void cache_init_objs(struct kmem_cache *cachep,
>  			kasan_poison_object_data(cachep, objp);
>  		}
>  
> -		set_free_obj(page, i, i);
> +		/* If enabled, initialization is done in shuffle_freelist */
> +		if (!config_enabled(CONFIG_FREELIST_RANDOM))
> +			set_free_obj(page, i, i);
>  	}
> +
> +	shuffle_freelist(cachep, page, cachep->num);
>  }
>  
>  static void kmem_flagcheck(struct kmem_cache *cachep, gfp_t flags)
> -- 
> 2.8.0.rc3.226.g39d4020


* Re: [PATCH v2] mm: SLAB freelist randomization
  2016-04-19  7:15   ` Joonsoo Kim
  (?)
@ 2016-04-19 16:44     ` Thomas Garnier
  -1 siblings, 0 replies; 35+ messages in thread
From: Thomas Garnier @ 2016-04-19 16:44 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Andrew Morton,
	Kees Cook, Greg Thelen, Laura Abbott, kernel-hardening, LKML,
	Linux-MM

On Tue, Apr 19, 2016 at 12:15 AM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> On Mon, Apr 18, 2016 at 10:14:39AM -0700, Thomas Garnier wrote:
>> Provides an optional config (CONFIG_FREELIST_RANDOM) to randomize the
>> SLAB freelist. The list is randomized during initialization of a new set
>> of pages. The order on different freelist sizes is pre-computed at boot
>> for performance. This security feature reduces the predictability of the
>> kernel SLAB allocator against heap overflows, rendering attacks much less
>> stable.
>
> I'm not familiar with security, but this doesn't look much more secure than
> before. Is there any other way to generate a different freelist sequence
> for each new set of pages? The current approach, using a pre-computed array,
> will generate the same freelist sequence for every new set of pages in the
> same size class. Is that sufficient?
>

I think it is sufficient; there is a performance tradeoff. We could pick
a random object from the freelist every time (in slab_get_obj), but I
think that would have a significant impact (at least 3%).
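For comparison, a hypothetical userspace model of the per-allocation alternative: a random free object is chosen on every allocation and the hole is filled with the last free entry (swap-remove). alloc_random_obj() and xorshift32() are illustrative stand-ins, not the slab_get_obj() code, and frees are not modelled. Each allocation stays O(1) but pays for one PRNG call, which is the cost being weighed against a once-per-slab shuffle.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for prandom_u32_state(): xorshift32, state must be non-zero. */
static uint32_t xorshift32(uint32_t *state)
{
	uint32_t x = *state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	*state = x;
	return x;
}

/*
 * Hypothetical model: take a random object from free_objs[0..*nfree-1]
 * and fill its slot with the last free entry (swap-remove). *nfree must
 * be non-zero. O(1) per call, but one PRNG call per allocation.
 */
static unsigned int alloc_random_obj(unsigned int *free_objs,
				     unsigned int *nfree, uint32_t *rng)
{
	unsigned int j, obj;

	assert(*nfree > 0);
	j = xorshift32(rng) % *nfree;
	obj = free_objs[j];
	*nfree -= 1;
	free_objs[j] = free_objs[*nfree];	/* move the last free entry into the hole */
	return obj;
}
```

Draining a slab of 8 objects this way hands out each index exactly once, in an order that changes with every PRNG state.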

>> For example this attack against SLUB (also applicable against SLAB)
>> would be affected:
>> https://jon.oberheide.org/blog/2010/09/10/linux-kernel-can-slub-overflow/
>>
>> Also, since v4.6 the freelist was moved to the end of the SLAB. This means
>> a controllable heap is open to new attacks not yet publicly discussed.
>> A kernel heap overflow can be transformed to multiple use-after-free.
>> This feature makes this type of attack harder too.
>>
>> To generate entropy, we use get_random_bytes_arch because 0 bits of
>> entropy are available at that boot stage. In the worst case this function
>> will fall back to the get_random_bytes sub API.
>>
>> The config option name is not specific to the SLAB as this approach will
>> be extended to other allocators like SLUB.
>
> If this feature will also be applied to SLUB, it would be better to put the
> common code in mm/slab_common.c.
>

I think it might be moved there once we implement the SLUB counterpart,
but it is too early to define which parts will be common.

>>
>> Performance results highlighted no major changes:
>>
>> Netperf average on 10 runs:
>>
>> threads,base,change
>> 16,576943.10,585905.90 (101.55%)
>> 32,564082.00,569741.20 (101.00%)
>> 48,558334.30,561851.20 (100.63%)
>> 64,552025.20,556448.30 (100.80%)
>> 80,552294.40,551743.10 (99.90%)
>> 96,552435.30,547529.20 (99.11%)
>> 112,551320.60,550183.20 (99.79%)
>> 128,549138.30,550542.70 (100.26%)
>> 144,549344.50,544529.10 (99.12%)
>> 160,550360.80,539929.30 (98.10%)
>>
>> slab_test 1 run on boot. After is faster except for odd result on size
>> 2048.
>
> Hmm... That's an odd result. The patch adds more logic, so it should
> decrease performance. I guess it is experimental error, but do you have
> any analysis of this result?
>

I don't. I am glad to redo the test. I found that slab_test gives very
different results depending on the heap state at the time of the test. If I
run the test multiple times, I get widely varying results with or without
the mitigation (on dedicated hardware).
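The outlier in question can be quantified from the posted single-run numbers. The sketch below computes the after/before ratio for the kmalloc() column of the "Repeatedly allocate then free" test: kmalloc(2048) goes from 478 to 1429 cycles (about 3x), while every other size stays between roughly 0.72x and 1.02x. count_outliers() is a hypothetical helper, and the 1.5x threshold is an arbitrary choice for illustration.

```c
#include <assert.h>

/* kmalloc() cycles from the single-run "Repeatedly allocate then free" test. */
static const unsigned int sizes[] = { 8, 16, 32, 64, 128, 256, 512, 1024,
				      2048, 4096, 8192, 16384 };
static const unsigned int before[] = { 137, 118, 112, 126, 135, 165, 174, 242,
				       478, 747, 774, 849 };
static const unsigned int after[] = { 99, 88, 90, 107, 134, 145, 177, 223,
				      1429, 720, 788, 867 };

#define NSIZES (sizeof(sizes) / sizeof(sizes[0]))

/*
 * Hypothetical helper: count the sizes whose after/before ratio exceeds
 * `limit`, storing the first such size in *first (untouched if none).
 */
static unsigned int count_outliers(double limit, unsigned int *first)
{
	unsigned int i, n = 0;

	for (i = 0; i < NSIZES; i++) {
		double ratio = (double)after[i] / before[i];

		if (ratio > limit) {
			if (n == 0)
				*first = sizes[i];
			n++;
		}
	}
	return n;
}
```

With a 1.5x threshold only kmalloc(2048) is flagged, which is consistent with a one-off disturbance in a single run rather than a systematic cost; repeated runs would be needed to tell the two apart.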

>>
>> Before:
>>
>> Single thread testing
>> =====================
>> 1. Kmalloc: Repeatedly allocate then free test
>> 10000 times kmalloc(8) -> 137 cycles kfree -> 126 cycles
>> 10000 times kmalloc(16) -> 118 cycles kfree -> 119 cycles
>> 10000 times kmalloc(32) -> 112 cycles kfree -> 119 cycles
>> 10000 times kmalloc(64) -> 126 cycles kfree -> 123 cycles
>> 10000 times kmalloc(128) -> 135 cycles kfree -> 131 cycles
>> 10000 times kmalloc(256) -> 165 cycles kfree -> 104 cycles
>> 10000 times kmalloc(512) -> 174 cycles kfree -> 126 cycles
>> 10000 times kmalloc(1024) -> 242 cycles kfree -> 160 cycles
>> 10000 times kmalloc(2048) -> 478 cycles kfree -> 239 cycles
>> 10000 times kmalloc(4096) -> 747 cycles kfree -> 364 cycles
>> 10000 times kmalloc(8192) -> 774 cycles kfree -> 404 cycles
>> 10000 times kmalloc(16384) -> 849 cycles kfree -> 430 cycles
>> 2. Kmalloc: alloc/free test
>> 10000 times kmalloc(8)/kfree -> 118 cycles
>> 10000 times kmalloc(16)/kfree -> 118 cycles
>> 10000 times kmalloc(32)/kfree -> 118 cycles
>> 10000 times kmalloc(64)/kfree -> 121 cycles
>> 10000 times kmalloc(128)/kfree -> 118 cycles
>> 10000 times kmalloc(256)/kfree -> 115 cycles
>> 10000 times kmalloc(512)/kfree -> 115 cycles
>> 10000 times kmalloc(1024)/kfree -> 115 cycles
>> 10000 times kmalloc(2048)/kfree -> 115 cycles
>> 10000 times kmalloc(4096)/kfree -> 115 cycles
>> 10000 times kmalloc(8192)/kfree -> 115 cycles
>> 10000 times kmalloc(16384)/kfree -> 115 cycles
>>
>> After:
>>
>> Single thread testing
>> =====================
>> 1. Kmalloc: Repeatedly allocate then free test
>> 10000 times kmalloc(8) -> 99 cycles kfree -> 84 cycles
>> 10000 times kmalloc(16) -> 88 cycles kfree -> 83 cycles
>> 10000 times kmalloc(32) -> 90 cycles kfree -> 81 cycles
>> 10000 times kmalloc(64) -> 107 cycles kfree -> 97 cycles
>> 10000 times kmalloc(128) -> 134 cycles kfree -> 89 cycles
>> 10000 times kmalloc(256) -> 145 cycles kfree -> 97 cycles
>> 10000 times kmalloc(512) -> 177 cycles kfree -> 116 cycles
>> 10000 times kmalloc(1024) -> 223 cycles kfree -> 151 cycles
>> 10000 times kmalloc(2048) -> 1429 cycles kfree -> 221 cycles
>> 10000 times kmalloc(4096) -> 720 cycles kfree -> 348 cycles
>> 10000 times kmalloc(8192) -> 788 cycles kfree -> 393 cycles
>> 10000 times kmalloc(16384) -> 867 cycles kfree -> 433 cycles
>> 2. Kmalloc: alloc/free test
>> 10000 times kmalloc(8)/kfree -> 115 cycles
>> 10000 times kmalloc(16)/kfree -> 115 cycles
>> 10000 times kmalloc(32)/kfree -> 115 cycles
>> 10000 times kmalloc(64)/kfree -> 120 cycles
>> 10000 times kmalloc(128)/kfree -> 127 cycles
>> 10000 times kmalloc(256)/kfree -> 119 cycles
>> 10000 times kmalloc(512)/kfree -> 112 cycles
>> 10000 times kmalloc(1024)/kfree -> 112 cycles
>> 10000 times kmalloc(2048)/kfree -> 112 cycles
>> 10000 times kmalloc(4096)/kfree -> 112 cycles
>> 10000 times kmalloc(8192)/kfree -> 112 cycles
>> 10000 times kmalloc(16384)/kfree -> 112 cycles
>>
>> Signed-off-by: Thomas Garnier <thgarnie@google.com>
>> ---
>> Based on next-20160418
>> ---
>>  init/Kconfig |   9 ++++
>>  mm/slab.c    | 166 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>>  2 files changed, 174 insertions(+), 1 deletion(-)
>>
>> diff --git a/init/Kconfig b/init/Kconfig
>> index 0dfd09d..ee35418 100644
>> --- a/init/Kconfig
>> +++ b/init/Kconfig
>> @@ -1742,6 +1742,15 @@ config SLOB
>>
>>  endchoice
>>
>> +config FREELIST_RANDOM
>> +     default n
>> +     depends on SLAB
>> +     bool "SLAB freelist randomization"
>> +     help
>> +       Randomizes the freelist order used on creating new SLABs. This
>> +       security feature reduces the predictability of the kernel slab
>> +       allocator against heap overflows.
>> +
>>  config SLUB_CPU_PARTIAL
>>       default y
>>       depends on SLUB && SMP
>> diff --git a/mm/slab.c b/mm/slab.c
>> index b70aabf..8371d80 100644
>> --- a/mm/slab.c
>> +++ b/mm/slab.c
>> @@ -116,6 +116,7 @@
>>  #include     <linux/kmemcheck.h>
>>  #include     <linux/memory.h>
>>  #include     <linux/prefetch.h>
>> +#include     <linux/log2.h>
>>
>>  #include     <net/sock.h>
>>
>> @@ -1229,6 +1230,62 @@ static void __init set_up_node(struct kmem_cache *cachep, int index)
>>       }
>>  }
>>
>> +#ifdef CONFIG_FREELIST_RANDOM
>> +/*
>> + * Master lists are pre-computed random lists
>> + * Lists of different sizes are used to optimize performance on SLABS with
>> + * different object counts.
>> + */
>
> If it is for optimization, one option would be to have a separate
> random list for each kmem_cache. It would consume more memory, but the
> cost would be marginal. It would also provide more unpredictability, and
> it could give better performance because we wouldn't need state->type
> (more, less) and the special handling related to it.
>

I am not sure, because the major caches are created early at boot time. We
would still have the same entropy problem, and we would waste a bit more
memory. It would be faster in use, though I am not sure the gain would be
significant.

>> +static freelist_idx_t master_list_2[2];
>> +static freelist_idx_t master_list_4[4];
>> +static freelist_idx_t master_list_8[8];
>> +static freelist_idx_t master_list_16[16];
>> +static freelist_idx_t master_list_32[32];
>> +static freelist_idx_t master_list_64[64];
>> +static freelist_idx_t master_list_128[128];
>> +static freelist_idx_t master_list_256[256];
>> +const static struct m_list {
>> +     size_t count;
>> +     freelist_idx_t *list;
>> +} master_lists[] = {
>> +     { ARRAY_SIZE(master_list_2), master_list_2 },
>> +     { ARRAY_SIZE(master_list_4), master_list_4 },
>> +     { ARRAY_SIZE(master_list_8), master_list_8 },
>> +     { ARRAY_SIZE(master_list_16), master_list_16 },
>> +     { ARRAY_SIZE(master_list_32), master_list_32 },
>> +     { ARRAY_SIZE(master_list_64), master_list_64 },
>> +     { ARRAY_SIZE(master_list_128), master_list_128 },
>> +     { ARRAY_SIZE(master_list_256), master_list_256 },
>> +};
>> +
>> +/* Pre-compute the Freelist master lists at boot */
>> +static void __init freelist_random_init(void)
>> +{
>> +     unsigned int seed;
>> +     size_t z, i, rand;
>> +     struct rnd_state slab_rand;
>> +
>> +     get_random_bytes_arch(&seed, sizeof(seed));
>> +     prandom_seed_state(&slab_rand, seed);
>> +
>> +     for (z = 0; z < ARRAY_SIZE(master_lists); z++) {
>> +             for (i = 0; i < master_lists[z].count; i++)
>> +                     master_lists[z].list[i] = i;
>> +
>> +             /* Fisher-Yates shuffle */
>> +             for (i = master_lists[z].count - 1; i > 0; i--) {
>> +                     rand = prandom_u32_state(&slab_rand);
>> +                     rand %= (i + 1);
>> +                     swap(master_lists[z].list[i],
>> +                             master_lists[z].list[rand]);
>> +             }
>> +     }
>> +}
>> +#else
>> +static inline void __init freelist_random_init(void) { }
>> +#endif /* CONFIG_FREELIST_RANDOM */
>> +
>> +
>>  /*
>>   * Initialisation.  Called after the page allocator have been initialised and
>>   * before smp_init().
>> @@ -1255,6 +1312,8 @@ void __init kmem_cache_init(void)
>>       if (!slab_max_order_set && totalram_pages > (32 << 20) >> PAGE_SHIFT)
>>               slab_max_order = SLAB_MAX_ORDER_HI;
>>
>> +     freelist_random_init();
>> +
>>       /* Bootstrap is tricky, because several objects are allocated
>>        * from caches that do not exist yet:
>>        * 1) initialize the kmem_cache cache: it contains the struct
>> @@ -2442,6 +2501,107 @@ static void cache_init_objs_debug(struct kmem_cache *cachep, struct page *page)
>>  #endif
>>  }
>>
>> +#ifdef CONFIG_FREELIST_RANDOM
>> +/* Identify if the target freelist matches the pre-computed list */
>> +enum master_type {
>> +     match,
>> +     less,
>> +     more
>> +};
>> +
>> +/* Hold information during a freelist initialization */
>> +struct freelist_init_state {
>> +     unsigned int padding;
>> +     unsigned int pos;
>> +     unsigned int count;
>> +     struct m_list master_list;
>> +     unsigned int master_count;
>> +     enum master_type type;
>> +};
>> +
>> +/* Select the right pre-computed master list and initialize state */
>> +static void freelist_state_initialize(struct freelist_init_state *state,
>> +                                   unsigned int count)
>> +{
>> +     unsigned int idx;
>> +     const unsigned int last_idx = ARRAY_SIZE(master_lists) - 1;
>> +
>> +     memset(state, 0, sizeof(*state));
>> +     state->count = count;
>> +     state->pos = 0;
>
> Using pos = 0 here doesn't look good in terms of security. In this case,
> every new page in the same size class gets the same freelist sequence
> since boot.
>
> How about using a random value to set pos? It provides some more randomness
> with minimal overhead.
>

I think it is a good idea. I will add that for the next iteration.

>> +     /* count is always >= 2 */
>> +     idx = ilog2(count) - 1;
>> +     if (idx >= last_idx)
>> +             idx = last_idx;
>> +     else if (roundup_pow_of_two(idx + 1) != count)
>> +             idx++;
>> +     state->master_list = master_lists[idx];
>> +     if (state->master_list.count == state->count)
>> +             state->type = match;
>> +     else if (state->master_list.count > state->count)
>> +             state->type = more;
>> +     else
>> +             state->type = less;
>> +}
>> +
>> +/* Get the next entry on the master list depending on the target list size */
>> +static freelist_idx_t get_next_entry(struct freelist_init_state *state)
>> +{
>> +     if (state->type == less && state->pos == state->master_list.count) {
>> +             state->padding += state->pos;
>> +             state->pos = 0;
>> +     }
>> +     BUG_ON(state->pos >= state->master_list.count);
>> +     return state->master_list.list[state->pos++];
>> +}
>> +
>> +static freelist_idx_t next_random_slot(struct freelist_init_state *state)
>> +{
>> +     freelist_idx_t cur, entry;
>> +
>> +     entry = get_next_entry(state);
>> +
>> +     if (state->type != match) {
>> +             while ((entry + state->padding) >= state->count)
>> +                     entry = get_next_entry(state);
>> +             cur = entry + state->padding;
>> +             BUG_ON(cur >= state->count);
>> +     } else {
>> +             cur = entry;
>> +     }
>> +
>> +     return cur;
>> +}
>> +
>> +/* Shuffle the freelist initialization state based on pre-computed lists */
>> +static void shuffle_freelist(struct kmem_cache *cachep, struct page *page,
>> +                          unsigned int count)
>> +{
>> +     unsigned int i;
>> +     struct freelist_init_state state;
>> +
>> +     if (count < 2) {
>> +             for (i = 0; i < count; i++)
>> +                     set_free_obj(page, i, i);
>> +             return;
>> +     }
>> +
>> +     /* Last chunk is used already in this case */
>> +     if (OBJFREELIST_SLAB(cachep))
>> +             count--;
>> +
>> +     freelist_state_initialize(&state, count);
>> +     for (i = 0; i < count; i++)
>> +             set_free_obj(page, i, next_random_slot(&state));
>> +
>> +     if (OBJFREELIST_SLAB(cachep))
>> +             set_free_obj(page, i, i);
>
> Please consider the last object of an OBJFREELIST_SLAB cache, too.
>
> freelist_state_initialize(&state, count);
> last_obj = next_random_slot(&state);
> page->freelist = XXX;
> for (i = 0; i < count - 1; i++)
>         set_free_obj(page, i, next_random_slot(&state));
> set_free_obj(page, i, last_obj);
>
> Thanks.
>

The current implementation takes the last chunk by default before the
freelist is initialized. Do you want it to be randomized as well?

>> +}
>> +#else
>> +static inline void shuffle_freelist(struct kmem_cache *cachep,
>> +                                 struct page *page, unsigned int count) { }
>> +#endif /* CONFIG_FREELIST_RANDOM */
>> +
>>  static void cache_init_objs(struct kmem_cache *cachep,
>>                           struct page *page)
>>  {
>> @@ -2464,8 +2624,12 @@ static void cache_init_objs(struct kmem_cache *cachep,
>>                       kasan_poison_object_data(cachep, objp);
>>               }
>>
>> -             set_free_obj(page, i, i);
>> +             /* If enabled, initialization is done in shuffle_freelist */
>> +             if (!config_enabled(CONFIG_FREELIST_RANDOM))
>> +                     set_free_obj(page, i, i);
>>       }
>> +
>> +     shuffle_freelist(cachep, page, cachep->num);
>>  }
>>
>>  static void kmem_flagcheck(struct kmem_cache *cachep, gfp_t flags)
>> --
>> 2.8.0.rc3.226.g39d4020


* Re: [PATCH v2] mm: SLAB freelist randomization
@ 2016-04-19 16:44     ` Thomas Garnier
  0 siblings, 0 replies; 35+ messages in thread
From: Thomas Garnier @ 2016-04-19 16:44 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Andrew Morton,
	Kees Cook, Greg Thelen, Laura Abbott, kernel-hardening, LKML,
	Linux-MM

On Tue, Apr 19, 2016 at 12:15 AM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> On Mon, Apr 18, 2016 at 10:14:39AM -0700, Thomas Garnier wrote:
>> Provides an optional config (CONFIG_FREELIST_RANDOM) to randomize the
>> SLAB freelist. The list is randomized during initialization of a new set
>> of pages. The order on different freelist sizes is pre-computed at boot
>> for performance. This security feature reduces the predictability of the
>> kernel SLAB allocator against heap overflows, rendering attacks much less
>> stable.
>
> I'm not familiar with security, but this doesn't look much more secure than
> before. Is there any other way to generate a different freelist sequence
> for each new set of pages? The current approach, using a pre-computed array,
> will generate the same freelist sequence for every new set of pages in the
> same size class. Is that sufficient?
>

I think it is sufficient; there is a performance tradeoff. We could pick
a random object from the freelist every time (in slab_get_obj), but I
think that would have a significant impact (at least 3%).

>> For example this attack against SLUB (also applicable against SLAB)
>> would be affected:
>> https://jon.oberheide.org/blog/2010/09/10/linux-kernel-can-slub-overflow/
>>
>> Also, since v4.6 the freelist was moved to the end of the SLAB. This means
>> a controllable heap is open to new attacks not yet publicly discussed.
>> A kernel heap overflow can be transformed to multiple use-after-free.
>> This feature makes this type of attack harder too.
>>
>> To generate entropy, we use get_random_bytes_arch because 0 bits of
>> entropy are available at that boot stage. In the worst case this function
>> will fall back to the get_random_bytes sub API.
>>
>> The config option name is not specific to the SLAB as this approach will
>> be extended to other allocators like SLUB.
>
> If this feature will also be applied to SLUB, it would be better to put the
> common code in mm/slab_common.c.
>

I think it might be moved there once we implement the SLUB counterpart,
but it is too early to define which parts will be common.

>>
>> Performance results highlighted no major changes:
>>
>> Netperf average on 10 runs:
>>
>> threads,base,change
>> 16,576943.10,585905.90 (101.55%)
>> 32,564082.00,569741.20 (101.00%)
>> 48,558334.30,561851.20 (100.63%)
>> 64,552025.20,556448.30 (100.80%)
>> 80,552294.40,551743.10 (99.90%)
>> 96,552435.30,547529.20 (99.11%)
>> 112,551320.60,550183.20 (99.79%)
>> 128,549138.30,550542.70 (100.26%)
>> 144,549344.50,544529.10 (99.12%)
>> 160,550360.80,539929.30 (98.10%)
>>
>> slab_test 1 run on boot. After is faster except for odd result on size
>> 2048.
>
> Hmm... That's an odd result. The patch adds more logic, so it should
> decrease performance. I guess it is experimental error, but do you have
> any analysis of this result?
>

I don't. I am glad to redo the test. I found that slab_test gives very
different results depending on the heap state at the time of the test. If I
run the test multiple times, I get widely varying results with or without
the mitigation (on dedicated hardware).

>>
>> Before:
>>
>> Single thread testing
>> =====================
>> 1. Kmalloc: Repeatedly allocate then free test
>> 10000 times kmalloc(8) -> 137 cycles kfree -> 126 cycles
>> 10000 times kmalloc(16) -> 118 cycles kfree -> 119 cycles
>> 10000 times kmalloc(32) -> 112 cycles kfree -> 119 cycles
>> 10000 times kmalloc(64) -> 126 cycles kfree -> 123 cycles
>> 10000 times kmalloc(128) -> 135 cycles kfree -> 131 cycles
>> 10000 times kmalloc(256) -> 165 cycles kfree -> 104 cycles
>> 10000 times kmalloc(512) -> 174 cycles kfree -> 126 cycles
>> 10000 times kmalloc(1024) -> 242 cycles kfree -> 160 cycles
>> 10000 times kmalloc(2048) -> 478 cycles kfree -> 239 cycles
>> 10000 times kmalloc(4096) -> 747 cycles kfree -> 364 cycles
>> 10000 times kmalloc(8192) -> 774 cycles kfree -> 404 cycles
>> 10000 times kmalloc(16384) -> 849 cycles kfree -> 430 cycles
>> 2. Kmalloc: alloc/free test
>> 10000 times kmalloc(8)/kfree -> 118 cycles
>> 10000 times kmalloc(16)/kfree -> 118 cycles
>> 10000 times kmalloc(32)/kfree -> 118 cycles
>> 10000 times kmalloc(64)/kfree -> 121 cycles
>> 10000 times kmalloc(128)/kfree -> 118 cycles
>> 10000 times kmalloc(256)/kfree -> 115 cycles
>> 10000 times kmalloc(512)/kfree -> 115 cycles
>> 10000 times kmalloc(1024)/kfree -> 115 cycles
>> 10000 times kmalloc(2048)/kfree -> 115 cycles
>> 10000 times kmalloc(4096)/kfree -> 115 cycles
>> 10000 times kmalloc(8192)/kfree -> 115 cycles
>> 10000 times kmalloc(16384)/kfree -> 115 cycles
>>
>> After:
>>
>> Single thread testing
>> =====================
>> 1. Kmalloc: Repeatedly allocate then free test
>> 10000 times kmalloc(8) -> 99 cycles kfree -> 84 cycles
>> 10000 times kmalloc(16) -> 88 cycles kfree -> 83 cycles
>> 10000 times kmalloc(32) -> 90 cycles kfree -> 81 cycles
>> 10000 times kmalloc(64) -> 107 cycles kfree -> 97 cycles
>> 10000 times kmalloc(128) -> 134 cycles kfree -> 89 cycles
>> 10000 times kmalloc(256) -> 145 cycles kfree -> 97 cycles
>> 10000 times kmalloc(512) -> 177 cycles kfree -> 116 cycles
>> 10000 times kmalloc(1024) -> 223 cycles kfree -> 151 cycles
>> 10000 times kmalloc(2048) -> 1429 cycles kfree -> 221 cycles
>> 10000 times kmalloc(4096) -> 720 cycles kfree -> 348 cycles
>> 10000 times kmalloc(8192) -> 788 cycles kfree -> 393 cycles
>> 10000 times kmalloc(16384) -> 867 cycles kfree -> 433 cycles
>> 2. Kmalloc: alloc/free test
>> 10000 times kmalloc(8)/kfree -> 115 cycles
>> 10000 times kmalloc(16)/kfree -> 115 cycles
>> 10000 times kmalloc(32)/kfree -> 115 cycles
>> 10000 times kmalloc(64)/kfree -> 120 cycles
>> 10000 times kmalloc(128)/kfree -> 127 cycles
>> 10000 times kmalloc(256)/kfree -> 119 cycles
>> 10000 times kmalloc(512)/kfree -> 112 cycles
>> 10000 times kmalloc(1024)/kfree -> 112 cycles
>> 10000 times kmalloc(2048)/kfree -> 112 cycles
>> 10000 times kmalloc(4096)/kfree -> 112 cycles
>> 10000 times kmalloc(8192)/kfree -> 112 cycles
>> 10000 times kmalloc(16384)/kfree -> 112 cycles
>>
>> Signed-off-by: Thomas Garnier <thgarnie@google.com>
>> ---
>> Based on next-20160418
>> ---
>>  init/Kconfig |   9 ++++
>>  mm/slab.c    | 166 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>>  2 files changed, 174 insertions(+), 1 deletion(-)
>>
>> diff --git a/init/Kconfig b/init/Kconfig
>> index 0dfd09d..ee35418 100644
>> --- a/init/Kconfig
>> +++ b/init/Kconfig
>> @@ -1742,6 +1742,15 @@ config SLOB
>>
>>  endchoice
>>
>> +config FREELIST_RANDOM
>> +     default n
>> +     depends on SLAB
>> +     bool "SLAB freelist randomization"
>> +     help
>> +       Randomizes the freelist order used on creating new SLABs. This
>> +       security feature reduces the predictability of the kernel slab
>> +       allocator against heap overflows.
>> +
>>  config SLUB_CPU_PARTIAL
>>       default y
>>       depends on SLUB && SMP
>> diff --git a/mm/slab.c b/mm/slab.c
>> index b70aabf..8371d80 100644
>> --- a/mm/slab.c
>> +++ b/mm/slab.c
>> @@ -116,6 +116,7 @@
>>  #include     <linux/kmemcheck.h>
>>  #include     <linux/memory.h>
>>  #include     <linux/prefetch.h>
>> +#include     <linux/log2.h>
>>
>>  #include     <net/sock.h>
>>
>> @@ -1229,6 +1230,62 @@ static void __init set_up_node(struct kmem_cache *cachep, int index)
>>       }
>>  }
>>
>> +#ifdef CONFIG_FREELIST_RANDOM
>> +/*
>> + * Master lists are pre-computed random lists
>> + * Lists of different sizes are used to optimize performance on SLABS with
>> + * different object counts.
>> + */
>
> If this is for optimization, one option would be to have a separate
> random list for each kmem_cache. It would consume more memory, but only
> marginally. It would also provide more unpredictability and could give
> better performance, because we would not need state->type (more, less)
> and the special handling related to it.
>

I am not sure, because the major caches are created early at boot time. We would
still have the same entropy problem, and we would waste a bit more memory. It would
be faster in use, though I am not sure the gain would be significant.

>> +static freelist_idx_t master_list_2[2];
>> +static freelist_idx_t master_list_4[4];
>> +static freelist_idx_t master_list_8[8];
>> +static freelist_idx_t master_list_16[16];
>> +static freelist_idx_t master_list_32[32];
>> +static freelist_idx_t master_list_64[64];
>> +static freelist_idx_t master_list_128[128];
>> +static freelist_idx_t master_list_256[256];
>> +const static struct m_list {
>> +     size_t count;
>> +     freelist_idx_t *list;
>> +} master_lists[] = {
>> +     { ARRAY_SIZE(master_list_2), master_list_2 },
>> +     { ARRAY_SIZE(master_list_4), master_list_4 },
>> +     { ARRAY_SIZE(master_list_8), master_list_8 },
>> +     { ARRAY_SIZE(master_list_16), master_list_16 },
>> +     { ARRAY_SIZE(master_list_32), master_list_32 },
>> +     { ARRAY_SIZE(master_list_64), master_list_64 },
>> +     { ARRAY_SIZE(master_list_128), master_list_128 },
>> +     { ARRAY_SIZE(master_list_256), master_list_256 },
>> +};
>> +
>> +/* Pre-compute the Freelist master lists at boot */
>> +static void __init freelist_random_init(void)
>> +{
>> +     unsigned int seed;
>> +     size_t z, i, rand;
>> +     struct rnd_state slab_rand;
>> +
>> +     get_random_bytes_arch(&seed, sizeof(seed));
>> +     prandom_seed_state(&slab_rand, seed);
>> +
>> +     for (z = 0; z < ARRAY_SIZE(master_lists); z++) {
>> +             for (i = 0; i < master_lists[z].count; i++)
>> +                     master_lists[z].list[i] = i;
>> +
>> +             /* Fisher-Yates shuffle */
>> +             for (i = master_lists[z].count - 1; i > 0; i--) {
>> +                     rand = prandom_u32_state(&slab_rand);
>> +                     rand %= (i + 1);
>> +                     swap(master_lists[z].list[i],
>> +                             master_lists[z].list[rand]);
>> +             }
>> +     }
>> +}
>> +#else
>> +static inline void __init freelist_random_init(void) { }
>> +#endif /* CONFIG_FREELIST_RANDOM */
>> +
>> +
>>  /*
>>   * Initialisation.  Called after the page allocator have been initialised and
>>   * before smp_init().
>> @@ -1255,6 +1312,8 @@ void __init kmem_cache_init(void)
>>       if (!slab_max_order_set && totalram_pages > (32 << 20) >> PAGE_SHIFT)
>>               slab_max_order = SLAB_MAX_ORDER_HI;
>>
>> +     freelist_random_init();
>> +
>>       /* Bootstrap is tricky, because several objects are allocated
>>        * from caches that do not exist yet:
>>        * 1) initialize the kmem_cache cache: it contains the struct
>> @@ -2442,6 +2501,107 @@ static void cache_init_objs_debug(struct kmem_cache *cachep, struct page *page)
>>  #endif
>>  }
>>
>> +#ifdef CONFIG_FREELIST_RANDOM
>> +/* Identify if the target freelist matches the pre-computed list */
>> +enum master_type {
>> +     match,
>> +     less,
>> +     more
>> +};
>> +
>> +/* Hold information during a freelist initialization */
>> +struct freelist_init_state {
>> +     unsigned int padding;
>> +     unsigned int pos;
>> +     unsigned int count;
>> +     struct m_list master_list;
>> +     unsigned int master_count;
>> +     enum master_type type;
>> +};
>> +
>> +/* Select the right pre-computed master list and initialize state */
>> +static void freelist_state_initialize(struct freelist_init_state *state,
>> +                                   unsigned int count)
>> +{
>> +     unsigned int idx;
>> +     const unsigned int last_idx = ARRAY_SIZE(master_lists) - 1;
>> +
>> +     memset(state, 0, sizeof(*state));
>> +     state->count = count;
>> +     state->pos = 0;
>
> Using pos = 0 here does not look good in terms of security. In this case,
> every new page of the same size class gets the same freelist sequence since boot.
>
> How about using a random value to set pos? It would provide some more randomness
> with minimal overhead.
>

I think it is a good idea. I will add that for the next iteration.
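The random-pos idea can be pictured as starting the walk over the master list at a random offset and wrapping around. The sketch below covers only the case where the master list has exactly as many entries as the slab; fill_from_offset() is a hypothetical helper, and the random offset is passed in rather than generated. Each offset yields a different rotation of the same permutation, so pages in one size class no longer share a single sequence, although there are only `count` distinct rotations.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: write the freelist order for one new slab by reading
 * a pre-shuffled master list of the same length, starting at start_pos
 * (a per-slab random value chosen by the caller) and wrapping at the end.
 */
static void fill_from_offset(const unsigned char *master, size_t count,
			     size_t start_pos, unsigned char *out)
{
	size_t i;

	assert(count > 0 && start_pos < count);
	for (i = 0; i < count; i++)
		out[i] = master[(start_pos + i) % count];
}
```

For example, with master = {2, 0, 3, 1} and start_pos = 1, the output order is {0, 3, 1, 2}.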

>> +     /* count is always >= 2 */
>> +     idx = ilog2(count) - 1;
>> +     if (idx >= last_idx)
>> +             idx = last_idx;
>> +     else if (roundup_pow_of_two(idx + 1) != count)
>> +             idx++;
>> +     state->master_list = master_lists[idx];
>> +     if (state->master_list.count == state->count)
>> +             state->type = match;
>> +     else if (state->master_list.count > state->count)
>> +             state->type = more;
>> +     else
>> +             state->type = less;
>> +}
>> +
>> +/* Get the next entry on the master list depending on the target list size */
>> +static freelist_idx_t get_next_entry(struct freelist_init_state *state)
>> +{
>> +     if (state->type == less && state->pos == state->master_list.count) {
>> +             state->padding += state->pos;
>> +             state->pos = 0;
>> +     }
>> +     BUG_ON(state->pos >= state->master_list.count);
>> +     return state->master_list.list[state->pos++];
>> +}
>> +
>> +static freelist_idx_t next_random_slot(struct freelist_init_state *state)
>> +{
>> +     freelist_idx_t cur, entry;
>> +
>> +     entry = get_next_entry(state);
>> +
>> +     if (state->type != match) {
>> +             while ((entry + state->padding) >= state->count)
>> +                     entry = get_next_entry(state);
>> +             cur = entry + state->padding;
>> +             BUG_ON(cur >= state->count);
>> +     } else {
>> +             cur = entry;
>> +     }
>> +
>> +     return cur;
>> +}
>> +
>> +/* Shuffle the freelist initialization state based on pre-computed lists */
>> +static void shuffle_freelist(struct kmem_cache *cachep, struct page *page,
>> +                          unsigned int count)
>> +{
>> +     unsigned int i;
>> +     struct freelist_init_state state;
>> +
>> +     if (count < 2) {
>> +             for (i = 0; i < count; i++)
>> +                     set_free_obj(page, i, i);
>> +             return;
>> +     }
>> +
>> +     /* Last chunk is used already in this case */
>> +     if (OBJFREELIST_SLAB(cachep))
>> +             count--;
>> +
>> +     freelist_state_initialize(&state, count);
>> +     for (i = 0; i < count; i++)
>> +             set_free_obj(page, i, next_random_slot(&state));
>> +
>> +     if (OBJFREELIST_SLAB(cachep))
>> +             set_free_obj(page, i, i);
>
> Please consider the last object of an OBJFREELIST_SLAB cache, too.
>
> freelist_state_initialize()
> last_obj = next_random_slot()
> page->freelist = XXX
> for (i = 0; i < count - 1; i++)
>         set_free_obj()
> set_free_obj(last_obj);
>
> Thanks.
>

The current implementation takes the last chunk by default before the
freelist is initialized. Do you want it to be randomized as well?

>> +}
>> +#else
>> +static inline void shuffle_freelist(struct kmem_cache *cachep,
>> +                                 struct page *page, unsigned int count) { }
>> +#endif /* CONFIG_FREELIST_RANDOM */
>> +
>>  static void cache_init_objs(struct kmem_cache *cachep,
>>                           struct page *page)
>>  {
>> @@ -2464,8 +2624,12 @@ static void cache_init_objs(struct kmem_cache *cachep,
>>                       kasan_poison_object_data(cachep, objp);
>>               }
>>
>> -             set_free_obj(page, i, i);
>> +             /* If enabled, initialization is done in shuffle_freelist */
>> +             if (!config_enabled(CONFIG_FREELIST_RANDOM))
>> +                     set_free_obj(page, i, i);
>>       }
>> +
>> +     shuffle_freelist(cachep, page, cachep->num);
>>  }
>>
>>  static void kmem_flagcheck(struct kmem_cache *cachep, gfp_t flags)
>> --
>> 2.8.0.rc3.226.g39d4020
>>
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to majordomo@kvack.org.  For more info on Linux MM,
>> see: http://www.linux-mm.org/ .
>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* [kernel-hardening] Re: [PATCH v2] mm: SLAB freelist randomization
@ 2016-04-19 16:44     ` Thomas Garnier
  0 siblings, 0 replies; 35+ messages in thread
From: Thomas Garnier @ 2016-04-19 16:44 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Andrew Morton,
	Kees Cook, Greg Thelen, Laura Abbott, kernel-hardening, LKML,
	Linux-MM

On Tue, Apr 19, 2016 at 12:15 AM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> On Mon, Apr 18, 2016 at 10:14:39AM -0700, Thomas Garnier wrote:
>> Provides an optional config (CONFIG_FREELIST_RANDOM) to randomize the
>> SLAB freelist. The list is randomized during initialization of a new set
>> of pages. The order on different freelist sizes is pre-computed at boot
>> for performance. This security feature reduces the predictability of the
>> kernel SLAB allocator against heap overflows rendering attacks much less
>> stable.
>
> I'm not familiar with security, but this doesn't look much more secure than
> before. Is there any other way to generate a different freelist sequence
> for each new set of pages? The current approach, using a pre-computed array,
> will generate the same freelist sequence for every new set of pages of the
> same size class. Is that sufficient?
>

I think it is sufficient. There is a performance tradeoff. We could pick a random
object from the freelist every time (in slab_get_obj), but I think that would
have a significant impact (at least 3%).
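For comparison, the per-allocation alternative mentioned above could look roughly like the following userspace sketch. It is hypothetical and not proposed code: take_random_free() keeps the free indices in an array and removes a random one on each allocation. xorshift32 stands in for the kernel PRNG. The point is that every allocation pays for a PRNG call and a modulo on the fast path, whereas the patch pays once per new slab.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel PRNG: xorshift32 (state must be non-zero). */
static uint32_t xorshift32(uint32_t *state)
{
	uint32_t x = *state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	*state = x;
	return x;
}

/*
 * Hypothetical per-allocation randomization: free_idx[0..*nfree-1] holds the
 * indices of free objects. Pick one at random, move the last entry into its
 * place, shrink the array, and return the picked object index.
 */
static unsigned int take_random_free(unsigned char *free_idx,
				     unsigned int *nfree, uint32_t *rng)
{
	unsigned int j, obj;

	assert(*nfree > 0);
	j = xorshift32(rng) % *nfree;
	obj = free_idx[j];
	free_idx[j] = free_idx[*nfree - 1];
	(*nfree)--;
	return obj;
}
```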

>> For example this attack against SLUB (also applicable against SLAB)
>> would be affected:
>> https://jon.oberheide.org/blog/2010/09/10/linux-kernel-can-slub-overflow/
>>
>> Also, since v4.6, the freelist has been moved to the end of the SLAB. It means
>> a controllable heap is open to new attacks not yet publicly discussed.
>> A kernel heap overflow can be transformed into multiple use-after-frees.
>> This feature makes this type of attack harder too.
>>
>> To generate entropy, we use get_random_bytes_arch because 0 bits of
>> entropy are available at that boot stage. In the worst case, this function
>> will fall back to the get_random_bytes sub-API.
>>
>> The config option name is not specific to the SLAB as this approach will
>> be extended to other allocators like SLUB.
>
> If this feature will also be applied to SLUB, it would be better to put the
> common code in mm/slab_common.c.
>

I think it might be moved there once we implement the SLUB counterpart,
but it is too early to define which parts will be common.

>>
>> Performance results highlighted no major changes:
>>
>> Netperf average on 10 runs:
>>
>> threads,base,change
>> 16,576943.10,585905.90 (101.55%)
>> 32,564082.00,569741.20 (101.00%)
>> 48,558334.30,561851.20 (100.63%)
>> 64,552025.20,556448.30 (100.80%)
>> 80,552294.40,551743.10 (99.90%)
>> 96,552435.30,547529.20 (99.11%)
>> 112,551320.60,550183.20 (99.79%)
>> 128,549138.30,550542.70 (100.26%)
>> 144,549344.50,544529.10 (99.12%)
>> 160,550360.80,539929.30 (98.10%)
>>
>> slab_test, 1 run at boot. After is faster except for an odd result at
>> size 2048.
>
> Hmm... That's an odd result. The patch adds more logic, so it should
> decrease performance. I guess it is an experimental error, but do you
> have any analysis of this result?
>

I don't. I am glad to redo the test. I found that slab_test gives very different
results depending on the heap state at the time of the test. If I run the test
multiple times, I get widely varying results with or without the mitigation (on
dedicated hardware).

>>
>> Before:
>>
>> Single thread testing
>> =====================
>> 1. Kmalloc: Repeatedly allocate then free test
>> 10000 times kmalloc(8) -> 137 cycles kfree -> 126 cycles
>> 10000 times kmalloc(16) -> 118 cycles kfree -> 119 cycles
>> 10000 times kmalloc(32) -> 112 cycles kfree -> 119 cycles
>> 10000 times kmalloc(64) -> 126 cycles kfree -> 123 cycles
>> 10000 times kmalloc(128) -> 135 cycles kfree -> 131 cycles
>> 10000 times kmalloc(256) -> 165 cycles kfree -> 104 cycles
>> 10000 times kmalloc(512) -> 174 cycles kfree -> 126 cycles
>> 10000 times kmalloc(1024) -> 242 cycles kfree -> 160 cycles
>> 10000 times kmalloc(2048) -> 478 cycles kfree -> 239 cycles
>> 10000 times kmalloc(4096) -> 747 cycles kfree -> 364 cycles
>> 10000 times kmalloc(8192) -> 774 cycles kfree -> 404 cycles
>> 10000 times kmalloc(16384) -> 849 cycles kfree -> 430 cycles
>> 2. Kmalloc: alloc/free test
>> 10000 times kmalloc(8)/kfree -> 118 cycles
>> 10000 times kmalloc(16)/kfree -> 118 cycles
>> 10000 times kmalloc(32)/kfree -> 118 cycles
>> 10000 times kmalloc(64)/kfree -> 121 cycles
>> 10000 times kmalloc(128)/kfree -> 118 cycles
>> 10000 times kmalloc(256)/kfree -> 115 cycles
>> 10000 times kmalloc(512)/kfree -> 115 cycles
>> 10000 times kmalloc(1024)/kfree -> 115 cycles
>> 10000 times kmalloc(2048)/kfree -> 115 cycles
>> 10000 times kmalloc(4096)/kfree -> 115 cycles
>> 10000 times kmalloc(8192)/kfree -> 115 cycles
>> 10000 times kmalloc(16384)/kfree -> 115 cycles
>>
>> After:
>>
>> Single thread testing
>> =====================
>> 1. Kmalloc: Repeatedly allocate then free test
>> 10000 times kmalloc(8) -> 99 cycles kfree -> 84 cycles
>> 10000 times kmalloc(16) -> 88 cycles kfree -> 83 cycles
>> 10000 times kmalloc(32) -> 90 cycles kfree -> 81 cycles
>> 10000 times kmalloc(64) -> 107 cycles kfree -> 97 cycles
>> 10000 times kmalloc(128) -> 134 cycles kfree -> 89 cycles
>> 10000 times kmalloc(256) -> 145 cycles kfree -> 97 cycles
>> 10000 times kmalloc(512) -> 177 cycles kfree -> 116 cycles
>> 10000 times kmalloc(1024) -> 223 cycles kfree -> 151 cycles
>> 10000 times kmalloc(2048) -> 1429 cycles kfree -> 221 cycles
>> 10000 times kmalloc(4096) -> 720 cycles kfree -> 348 cycles
>> 10000 times kmalloc(8192) -> 788 cycles kfree -> 393 cycles
>> 10000 times kmalloc(16384) -> 867 cycles kfree -> 433 cycles
>> 2. Kmalloc: alloc/free test
>> 10000 times kmalloc(8)/kfree -> 115 cycles
>> 10000 times kmalloc(16)/kfree -> 115 cycles
>> 10000 times kmalloc(32)/kfree -> 115 cycles
>> 10000 times kmalloc(64)/kfree -> 120 cycles
>> 10000 times kmalloc(128)/kfree -> 127 cycles
>> 10000 times kmalloc(256)/kfree -> 119 cycles
>> 10000 times kmalloc(512)/kfree -> 112 cycles
>> 10000 times kmalloc(1024)/kfree -> 112 cycles
>> 10000 times kmalloc(2048)/kfree -> 112 cycles
>> 10000 times kmalloc(4096)/kfree -> 112 cycles
>> 10000 times kmalloc(8192)/kfree -> 112 cycles
>> 10000 times kmalloc(16384)/kfree -> 112 cycles
>>
>> Signed-off-by: Thomas Garnier <thgarnie@google.com>
>> ---
>> Based on next-20160418
>> ---
>>  init/Kconfig |   9 ++++
>>  mm/slab.c    | 166 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>>  2 files changed, 174 insertions(+), 1 deletion(-)
>>
>> diff --git a/init/Kconfig b/init/Kconfig
>> index 0dfd09d..ee35418 100644
>> --- a/init/Kconfig
>> +++ b/init/Kconfig
>> @@ -1742,6 +1742,15 @@ config SLOB
>>
>>  endchoice
>>
>> +config FREELIST_RANDOM
>> +     default n
>> +     depends on SLAB
>> +     bool "SLAB freelist randomization"
>> +     help
>> +       Randomizes the freelist order used on creating new SLABs. This
>> +       security feature reduces the predictability of the kernel slab
>> +       allocator against heap overflows.
>> +
>>  config SLUB_CPU_PARTIAL
>>       default y
>>       depends on SLUB && SMP
>> diff --git a/mm/slab.c b/mm/slab.c
>> index b70aabf..8371d80 100644
>> --- a/mm/slab.c
>> +++ b/mm/slab.c
>> @@ -116,6 +116,7 @@
>>  #include     <linux/kmemcheck.h>
>>  #include     <linux/memory.h>
>>  #include     <linux/prefetch.h>
>> +#include     <linux/log2.h>
>>
>>  #include     <net/sock.h>
>>
>> @@ -1229,6 +1230,62 @@ static void __init set_up_node(struct kmem_cache *cachep, int index)
>>       }
>>  }
>>
>> +#ifdef CONFIG_FREELIST_RANDOM
>> +/*
>> + * Master lists are pre-computed random lists
>> + * Lists of different sizes are used to optimize performance on SLABS with
>> + * different object counts.
>> + */
>
> If this is for optimization, one option would be to have a separate
> random list for each kmem_cache. It would consume more memory, but only
> marginally. It would also provide more unpredictability and could give
> better performance, because we would not need state->type (more, less)
> and the special handling related to it.
>

I am not sure, because the major caches are created early at boot time. We would
still have the same entropy problem, and we would waste a bit more memory. It would
be faster in use, though I am not sure the gain would be significant.
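For scale, the eight shared master lists hold 2 + 4 + ... + 256 = 510 entries. That is 510 bytes with a one-byte freelist_idx_t or 1020 bytes with a two-byte one; the index width is an assumption here. Per-cache lists would instead cost one entry per object slot (cachep->num) in every cache. A sketch of that arithmetic, using hypothetical function names:

```c
#include <assert.h>
#include <stddef.h>

/* Total entries across the shared master lists of sizes 2, 4, ..., 256. */
static size_t master_list_entries(void)
{
	size_t size, total = 0;

	for (size = 2; size <= 256; size <<= 1)
		total += size;
	return total;
}

/* Bytes used by the shared lists for an assumed index width of 1 or 2 bytes. */
static size_t shared_bytes(size_t idx_size)
{
	assert(idx_size == 1 || idx_size == 2);
	return master_list_entries() * idx_size;
}

/*
 * Bytes used by hypothetical per-cache lists, given each cache's object
 * count per slab (nums[i] plays the role of cachep->num).
 */
static size_t per_cache_bytes(const unsigned int *nums, size_t ncaches,
			      size_t idx_size)
{
	size_t i, total = 0;

	assert(idx_size == 1 || idx_size == 2);
	for (i = 0; i < ncaches; i++)
		total += nums[i] * idx_size;
	return total;
}
```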

>> +static freelist_idx_t master_list_2[2];
>> +static freelist_idx_t master_list_4[4];
>> +static freelist_idx_t master_list_8[8];
>> +static freelist_idx_t master_list_16[16];
>> +static freelist_idx_t master_list_32[32];
>> +static freelist_idx_t master_list_64[64];
>> +static freelist_idx_t master_list_128[128];
>> +static freelist_idx_t master_list_256[256];
>> +const static struct m_list {
>> +     size_t count;
>> +     freelist_idx_t *list;
>> +} master_lists[] = {
>> +     { ARRAY_SIZE(master_list_2), master_list_2 },
>> +     { ARRAY_SIZE(master_list_4), master_list_4 },
>> +     { ARRAY_SIZE(master_list_8), master_list_8 },
>> +     { ARRAY_SIZE(master_list_16), master_list_16 },
>> +     { ARRAY_SIZE(master_list_32), master_list_32 },
>> +     { ARRAY_SIZE(master_list_64), master_list_64 },
>> +     { ARRAY_SIZE(master_list_128), master_list_128 },
>> +     { ARRAY_SIZE(master_list_256), master_list_256 },
>> +};
>> +
>> +/* Pre-compute the Freelist master lists at boot */
>> +static void __init freelist_random_init(void)
>> +{
>> +     unsigned int seed;
>> +     size_t z, i, rand;
>> +     struct rnd_state slab_rand;
>> +
>> +     get_random_bytes_arch(&seed, sizeof(seed));
>> +     prandom_seed_state(&slab_rand, seed);
>> +
>> +     for (z = 0; z < ARRAY_SIZE(master_lists); z++) {
>> +             for (i = 0; i < master_lists[z].count; i++)
>> +                     master_lists[z].list[i] = i;
>> +
>> +             /* Fisher-Yates shuffle */
>> +             for (i = master_lists[z].count - 1; i > 0; i--) {
>> +                     rand = prandom_u32_state(&slab_rand);
>> +                     rand %= (i + 1);
>> +                     swap(master_lists[z].list[i],
>> +                             master_lists[z].list[rand]);
>> +             }
>> +     }
>> +}
>> +#else
>> +static inline void __init freelist_random_init(void) { }
>> +#endif /* CONFIG_FREELIST_RANDOM */
>> +
>> +
>>  /*
>>   * Initialisation.  Called after the page allocator have been initialised and
>>   * before smp_init().
>> @@ -1255,6 +1312,8 @@ void __init kmem_cache_init(void)
>>       if (!slab_max_order_set && totalram_pages > (32 << 20) >> PAGE_SHIFT)
>>               slab_max_order = SLAB_MAX_ORDER_HI;
>>
>> +     freelist_random_init();
>> +
>>       /* Bootstrap is tricky, because several objects are allocated
>>        * from caches that do not exist yet:
>>        * 1) initialize the kmem_cache cache: it contains the struct
>> @@ -2442,6 +2501,107 @@ static void cache_init_objs_debug(struct kmem_cache *cachep, struct page *page)
>>  #endif
>>  }
>>
>> +#ifdef CONFIG_FREELIST_RANDOM
>> +/* Identify if the target freelist matches the pre-computed list */
>> +enum master_type {
>> +     match,
>> +     less,
>> +     more
>> +};
>> +
>> +/* Hold information during a freelist initialization */
>> +struct freelist_init_state {
>> +     unsigned int padding;
>> +     unsigned int pos;
>> +     unsigned int count;
>> +     struct m_list master_list;
>> +     unsigned int master_count;
>> +     enum master_type type;
>> +};
>> +
>> +/* Select the right pre-computed master list and initialize state */
>> +static void freelist_state_initialize(struct freelist_init_state *state,
>> +                                   unsigned int count)
>> +{
>> +     unsigned int idx;
>> +     const unsigned int last_idx = ARRAY_SIZE(master_lists) - 1;
>> +
>> +     memset(state, 0, sizeof(*state));
>> +     state->count = count;
>> +     state->pos = 0;
>
> Using pos = 0 here does not look good in terms of security. In this case,
> every new page of the same size class gets the same freelist sequence since boot.
>
> How about using a random value to set pos? It would provide some more randomness
> with minimal overhead.
>

I think it is a good idea. I will add that for the next iteration.
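The list selection and padding walk in freelist_state_initialize(), get_next_entry() and next_random_slot() can be modelled in userspace as below. This is a sketch, not the patch code. The struct and function names are hypothetical, and pick_list_size() assumes the intended rule is "smallest power-of-two list not below count, capped at 256". When the list is longer than the slab ("more"), out-of-range entries are simply skipped. When it is shorter ("less"), the list is replayed with its entries shifted by padding until every slot has been emitted.

```c
#include <assert.h>

struct walk_state {
	const unsigned char *list;	/* chosen pre-shuffled master list */
	unsigned int list_count;	/* its length */
	unsigned int count;		/* objects in the target slab */
	unsigned int pos;		/* next position in list */
	unsigned int padding;		/* offset added on each replay */
};

/* Assumed rule: smallest power-of-two list size >= count, capped at 256. */
static unsigned int pick_list_size(unsigned int count)
{
	unsigned int size = 2;

	while (size < count && size < 256)
		size <<= 1;
	return size;
}

static unsigned int next_entry(struct walk_state *s)
{
	/* List shorter than the slab: replay it, shifted by what was used. */
	if (s->pos == s->list_count && s->list_count < s->count) {
		s->padding += s->pos;
		s->pos = 0;
	}
	assert(s->pos < s->list_count);
	return s->list[s->pos++];
}

/* Skip entries that land outside the slab. */
static unsigned int next_slot(struct walk_state *s)
{
	unsigned int entry, cur;

	do {
		entry = next_entry(s);		/* may update padding */
		cur = entry + s->padding;	/* so read padding afterwards */
	} while (cur >= s->count);
	return cur;
}
```

For example, replaying the 4-entry list {3, 1, 0, 2} over a 6-object slab yields 3, 1, 0, 2, then 5 and 4 from the shifted second pass (7 is skipped).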

>> +     /* count is always >= 2 */
>> +     idx = ilog2(count) - 1;
>> +     if (idx >= last_idx)
>> +             idx = last_idx;
>> +     else if (roundup_pow_of_two(idx + 1) != count)
>> +             idx++;
>> +     state->master_list = master_lists[idx];
>> +     if (state->master_list.count == state->count)
>> +             state->type = match;
>> +     else if (state->master_list.count > state->count)
>> +             state->type = more;
>> +     else
>> +             state->type = less;
>> +}
>> +
>> +/* Get the next entry on the master list depending on the target list size */
>> +static freelist_idx_t get_next_entry(struct freelist_init_state *state)
>> +{
>> +     if (state->type == less && state->pos == state->master_list.count) {
>> +             state->padding += state->pos;
>> +             state->pos = 0;
>> +     }
>> +     BUG_ON(state->pos >= state->master_list.count);
>> +     return state->master_list.list[state->pos++];
>> +}
>> +
>> +static freelist_idx_t next_random_slot(struct freelist_init_state *state)
>> +{
>> +     freelist_idx_t cur, entry;
>> +
>> +     entry = get_next_entry(state);
>> +
>> +     if (state->type != match) {
>> +             while ((entry + state->padding) >= state->count)
>> +                     entry = get_next_entry(state);
>> +             cur = entry + state->padding;
>> +             BUG_ON(cur >= state->count);
>> +     } else {
>> +             cur = entry;
>> +     }
>> +
>> +     return cur;
>> +}
>> +
>> +/* Shuffle the freelist initialization state based on pre-computed lists */
>> +static void shuffle_freelist(struct kmem_cache *cachep, struct page *page,
>> +                          unsigned int count)
>> +{
>> +     unsigned int i;
>> +     struct freelist_init_state state;
>> +
>> +     if (count < 2) {
>> +             for (i = 0; i < count; i++)
>> +                     set_free_obj(page, i, i);
>> +             return;
>> +     }
>> +
>> +     /* Last chunk is used already in this case */
>> +     if (OBJFREELIST_SLAB(cachep))
>> +             count--;
>> +
>> +     freelist_state_initialize(&state, count);
>> +     for (i = 0; i < count; i++)
>> +             set_free_obj(page, i, next_random_slot(&state));
>> +
>> +     if (OBJFREELIST_SLAB(cachep))
>> +             set_free_obj(page, i, i);
>
> Please consider the last object of an OBJFREELIST_SLAB cache, too.
>
> freelist_state_initialize()
> last_obj = next_random_slot()
> page->freelist = XXX
> for (i = 0; i < count - 1; i++)
>         set_free_obj()
> set_free_obj(last_obj);
>
> Thanks.
>

The current implementation takes the last chunk by default before the
freelist is initialized. Do you want it to be randomized as well?

>> +}
>> +#else
>> +static inline void shuffle_freelist(struct kmem_cache *cachep,
>> +                                 struct page *page, unsigned int count) { }
>> +#endif /* CONFIG_FREELIST_RANDOM */
>> +
>>  static void cache_init_objs(struct kmem_cache *cachep,
>>                           struct page *page)
>>  {
>> @@ -2464,8 +2624,12 @@ static void cache_init_objs(struct kmem_cache *cachep,
>>                       kasan_poison_object_data(cachep, objp);
>>               }
>>
>> -             set_free_obj(page, i, i);
>> +             /* If enabled, initialization is done in shuffle_freelist */
>> +             if (!config_enabled(CONFIG_FREELIST_RANDOM))
>> +                     set_free_obj(page, i, i);
>>       }
>> +
>> +     shuffle_freelist(cachep, page, cachep->num);
>>  }
>>
>>  static void kmem_flagcheck(struct kmem_cache *cachep, gfp_t flags)
>> --
>> 2.8.0.rc3.226.g39d4020

* Re: [PATCH v2] mm: SLAB freelist randomization
  2016-04-19 16:44     ` Thomas Garnier
  (?)
@ 2016-04-20  8:08       ` Joonsoo Kim
  -1 siblings, 0 replies; 35+ messages in thread
From: Joonsoo Kim @ 2016-04-20  8:08 UTC (permalink / raw)
  To: Thomas Garnier
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Andrew Morton,
	Kees Cook, Greg Thelen, Laura Abbott, kernel-hardening, LKML,
	Linux-MM

On Tue, Apr 19, 2016 at 09:44:54AM -0700, Thomas Garnier wrote:
> On Tue, Apr 19, 2016 at 12:15 AM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> > On Mon, Apr 18, 2016 at 10:14:39AM -0700, Thomas Garnier wrote:
> >> Provides an optional config (CONFIG_FREELIST_RANDOM) to randomize the
> >> SLAB freelist. The list is randomized during initialization of a new set
> >> of pages. The order on different freelist sizes is pre-computed at boot
> >> for performance. This security feature reduces the predictability of the
> >> kernel SLAB allocator against heap overflows rendering attacks much less
> >> stable.
> >
> > I'm not familiar with security, but this doesn't look much more secure than
> > before. Is there any other way to generate a different freelist sequence
> > for each new set of pages? The current approach, using a pre-computed array,
> > will generate the same freelist sequence for every new set of pages of the
> > same size class. Is that sufficient?
> >
> 
> I think it is sufficient. There is a performance tradeoff. We could pick a random
> object from the freelist every time (in slab_get_obj), but I think that would
> have a significant impact (at least 3%).
> 
> >> For example this attack against SLUB (also applicable against SLAB)
> >> would be affected:
> >> https://jon.oberheide.org/blog/2010/09/10/linux-kernel-can-slub-overflow/
> >>
> >> Also, since v4.6, the freelist has been moved to the end of the SLAB. It means
> >> a controllable heap is open to new attacks not yet publicly discussed.
> >> A kernel heap overflow can be transformed into multiple use-after-frees.
> >> This feature makes this type of attack harder too.
> >>
> >> To generate entropy, we use get_random_bytes_arch because 0 bits of
> >> entropy are available at that boot stage. In the worst case, this function
> >> will fall back to the get_random_bytes sub-API.
> >>
> >> The config option name is not specific to the SLAB as this approach will
> >> be extended to other allocators like SLUB.
> >
> > If this feature will also be applied to SLUB, it would be better to put the
> > common code in mm/slab_common.c.
> >
> 
> I think it might be moved there once we implement the SLUB counterpart,
> but it is too early to define which parts will be common.
> 
> >>
> >> Performance results highlighted no major changes:
> >>
> >> Netperf average on 10 runs:
> >>
> >> threads,base,change
> >> 16,576943.10,585905.90 (101.55%)
> >> 32,564082.00,569741.20 (101.00%)
> >> 48,558334.30,561851.20 (100.63%)
> >> 64,552025.20,556448.30 (100.80%)
> >> 80,552294.40,551743.10 (99.90%)
> >> 96,552435.30,547529.20 (99.11%)
> >> 112,551320.60,550183.20 (99.79%)
> >> 128,549138.30,550542.70 (100.26%)
> >> 144,549344.50,544529.10 (99.12%)
> >> 160,550360.80,539929.30 (98.10%)
> >>
> >> slab_test, 1 run at boot. After is faster except for an odd result at
> >> size 2048.
> >
> > Hmm... That's an odd result. The patch adds more logic, so it should
> > decrease performance. I guess it is an experimental error, but do you
> > have any analysis of this result?
> >
> 
> I don't. I am glad to redo the test. I found that slab_test gives very different
> results depending on the heap state at the time of the test. If I run the test
> multiple times, I get widely varying results both with and without the
> mitigation (on dedicated hardware).
> 
> >>
> >> Before:
> >>
> >> Single thread testing
> >> =====================
> >> 1. Kmalloc: Repeatedly allocate then free test
> >> 10000 times kmalloc(8) -> 137 cycles kfree -> 126 cycles
> >> 10000 times kmalloc(16) -> 118 cycles kfree -> 119 cycles
> >> 10000 times kmalloc(32) -> 112 cycles kfree -> 119 cycles
> >> 10000 times kmalloc(64) -> 126 cycles kfree -> 123 cycles
> >> 10000 times kmalloc(128) -> 135 cycles kfree -> 131 cycles
> >> 10000 times kmalloc(256) -> 165 cycles kfree -> 104 cycles
> >> 10000 times kmalloc(512) -> 174 cycles kfree -> 126 cycles
> >> 10000 times kmalloc(1024) -> 242 cycles kfree -> 160 cycles
> >> 10000 times kmalloc(2048) -> 478 cycles kfree -> 239 cycles
> >> 10000 times kmalloc(4096) -> 747 cycles kfree -> 364 cycles
> >> 10000 times kmalloc(8192) -> 774 cycles kfree -> 404 cycles
> >> 10000 times kmalloc(16384) -> 849 cycles kfree -> 430 cycles
> >> 2. Kmalloc: alloc/free test
> >> 10000 times kmalloc(8)/kfree -> 118 cycles
> >> 10000 times kmalloc(16)/kfree -> 118 cycles
> >> 10000 times kmalloc(32)/kfree -> 118 cycles
> >> 10000 times kmalloc(64)/kfree -> 121 cycles
> >> 10000 times kmalloc(128)/kfree -> 118 cycles
> >> 10000 times kmalloc(256)/kfree -> 115 cycles
> >> 10000 times kmalloc(512)/kfree -> 115 cycles
> >> 10000 times kmalloc(1024)/kfree -> 115 cycles
> >> 10000 times kmalloc(2048)/kfree -> 115 cycles
> >> 10000 times kmalloc(4096)/kfree -> 115 cycles
> >> 10000 times kmalloc(8192)/kfree -> 115 cycles
> >> 10000 times kmalloc(16384)/kfree -> 115 cycles
> >>
> >> After:
> >>
> >> Single thread testing
> >> =====================
> >> 1. Kmalloc: Repeatedly allocate then free test
> >> 10000 times kmalloc(8) -> 99 cycles kfree -> 84 cycles
> >> 10000 times kmalloc(16) -> 88 cycles kfree -> 83 cycles
> >> 10000 times kmalloc(32) -> 90 cycles kfree -> 81 cycles
> >> 10000 times kmalloc(64) -> 107 cycles kfree -> 97 cycles
> >> 10000 times kmalloc(128) -> 134 cycles kfree -> 89 cycles
> >> 10000 times kmalloc(256) -> 145 cycles kfree -> 97 cycles
> >> 10000 times kmalloc(512) -> 177 cycles kfree -> 116 cycles
> >> 10000 times kmalloc(1024) -> 223 cycles kfree -> 151 cycles
> >> 10000 times kmalloc(2048) -> 1429 cycles kfree -> 221 cycles
> >> 10000 times kmalloc(4096) -> 720 cycles kfree -> 348 cycles
> >> 10000 times kmalloc(8192) -> 788 cycles kfree -> 393 cycles
> >> 10000 times kmalloc(16384) -> 867 cycles kfree -> 433 cycles
> >> 2. Kmalloc: alloc/free test
> >> 10000 times kmalloc(8)/kfree -> 115 cycles
> >> 10000 times kmalloc(16)/kfree -> 115 cycles
> >> 10000 times kmalloc(32)/kfree -> 115 cycles
> >> 10000 times kmalloc(64)/kfree -> 120 cycles
> >> 10000 times kmalloc(128)/kfree -> 127 cycles
> >> 10000 times kmalloc(256)/kfree -> 119 cycles
> >> 10000 times kmalloc(512)/kfree -> 112 cycles
> >> 10000 times kmalloc(1024)/kfree -> 112 cycles
> >> 10000 times kmalloc(2048)/kfree -> 112 cycles
> >> 10000 times kmalloc(4096)/kfree -> 112 cycles
> >> 10000 times kmalloc(8192)/kfree -> 112 cycles
> >> 10000 times kmalloc(16384)/kfree -> 112 cycles
> >>
> >> Signed-off-by: Thomas Garnier <thgarnie@google.com>
> >> ---
> >> Based on next-20160418
> >> ---
> >>  init/Kconfig |   9 ++++
> >>  mm/slab.c    | 166 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
> >>  2 files changed, 174 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/init/Kconfig b/init/Kconfig
> >> index 0dfd09d..ee35418 100644
> >> --- a/init/Kconfig
> >> +++ b/init/Kconfig
> >> @@ -1742,6 +1742,15 @@ config SLOB
> >>
> >>  endchoice
> >>
> >> +config FREELIST_RANDOM
> >> +     default n
> >> +     depends on SLAB
> >> +     bool "SLAB freelist randomization"
> >> +     help
> >> +       Randomizes the freelist order used on creating new SLABs. This
> >> +       security feature reduces the predictability of the kernel slab
> >> +       allocator against heap overflows.
> >> +
> >>  config SLUB_CPU_PARTIAL
> >>       default y
> >>       depends on SLUB && SMP
> >> diff --git a/mm/slab.c b/mm/slab.c
> >> index b70aabf..8371d80 100644
> >> --- a/mm/slab.c
> >> +++ b/mm/slab.c
> >> @@ -116,6 +116,7 @@
> >>  #include     <linux/kmemcheck.h>
> >>  #include     <linux/memory.h>
> >>  #include     <linux/prefetch.h>
> >> +#include     <linux/log2.h>
> >>
> >>  #include     <net/sock.h>
> >>
> >> @@ -1229,6 +1230,62 @@ static void __init set_up_node(struct kmem_cache *cachep, int index)
> >>       }
> >>  }
> >>
> >> +#ifdef CONFIG_FREELIST_RANDOM
> >> +/*
> >> + * Master lists are pre-computed random lists
> >> + * Lists of different sizes are used to optimize performance on SLABS with
> >> + * different object counts.
> >> + */
> >
> > If it is for optimization, one option would be to have a separate
> > random list for each kmem_cache. It would consume more memory, but only
> > marginally. It also provides more unpredictability and could give better
> > performance, because we would not need state->type (more, less) and the
> > special handling related to it.
> >
> 
> I am not sure, because the major caches are created early at boot time. We
> still have the same entropy problem, and we waste a bit more memory. It will be faster

I think the entropy problem is a separate issue and should be considered
separately. If it is solved, a pre-computed array for each kmem_cache will
provide more unpredictability. If someone who succeeded in exploiting one
kmem_cache with 128 objects per slab wants to exploit another kmem_cache with
128 objects per slab, a separate pre-computed array will help.

> on usage, though I am not sure it will be significant.

I also think it's not significant. But, besides the performance effect, the
code doesn't look very attractive or extensible. In the case of SLUB, there
is the setup_slub_max_order option, and the number of objects per slab could
be larger than 256. To deal with that, we would need to add many more static
definitions, which does not look good to me. Please use dynamically allocated
memory instead of static array definitions.
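A minimal sketch of the per-cache, dynamically allocated alternative suggested above, assuming one shuffled sequence per cache sized exactly to its objects per slab. `struct toy_cache` and its helpers are hypothetical names, calloc() stands in for a kernel allocator, and xorshift32 stands in for the kernel PRNG. Because the sequence length equals the cache's own object count, no match/more/less handling is needed, and counts above 256 need no extra static arrays.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's prandom_u32_state(). */
static uint32_t xorshift32(uint32_t *state)
{
	uint32_t x = *state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	*state = x;
	return x;
}

/* Hypothetical per-cache state: a shuffled order sized to this cache only. */
struct toy_cache {
	unsigned int num;		/* objects per slab */
	unsigned int *random_seq;	/* permutation of 0 .. num - 1 */
};

/* Allocate and shuffle the cache's own sequence; returns 0, or -1 on ENOMEM. */
static int toy_cache_init_random_seq(struct toy_cache *c, unsigned int num,
				     uint32_t *rng)
{
	assert(num > 0);
	c->random_seq = calloc(num, sizeof(*c->random_seq));
	if (!c->random_seq)
		return -1;
	c->num = num;

	for (unsigned int i = 0; i < num; i++)
		c->random_seq[i] = i;
	/* Fisher-Yates shuffle, as in the quoted freelist_random_init(). */
	for (unsigned int i = num - 1; i > 0; i--) {
		unsigned int j = xorshift32(rng) % (i + 1);
		unsigned int tmp = c->random_seq[i];

		c->random_seq[i] = c->random_seq[j];
		c->random_seq[j] = tmp;
	}
	return 0;
}

static void toy_cache_destroy_random_seq(struct toy_cache *c)
{
	free(c->random_seq);
	c->random_seq = NULL;
	c->num = 0;
}
```

Two caches with the same object count draw independent shuffles from the generator, which is the property argued for above.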

> 
> >> +static freelist_idx_t master_list_2[2];
> >> +static freelist_idx_t master_list_4[4];
> >> +static freelist_idx_t master_list_8[8];
> >> +static freelist_idx_t master_list_16[16];
> >> +static freelist_idx_t master_list_32[32];
> >> +static freelist_idx_t master_list_64[64];
> >> +static freelist_idx_t master_list_128[128];
> >> +static freelist_idx_t master_list_256[256];
> >> +const static struct m_list {
> >> +     size_t count;
> >> +     freelist_idx_t *list;
> >> +} master_lists[] = {
> >> +     { ARRAY_SIZE(master_list_2), master_list_2 },
> >> +     { ARRAY_SIZE(master_list_4), master_list_4 },
> >> +     { ARRAY_SIZE(master_list_8), master_list_8 },
> >> +     { ARRAY_SIZE(master_list_16), master_list_16 },
> >> +     { ARRAY_SIZE(master_list_32), master_list_32 },
> >> +     { ARRAY_SIZE(master_list_64), master_list_64 },
> >> +     { ARRAY_SIZE(master_list_128), master_list_128 },
> >> +     { ARRAY_SIZE(master_list_256), master_list_256 },
> >> +};
> >> +
> >> +/* Pre-compute the Freelist master lists at boot */
> >> +static void __init freelist_random_init(void)
> >> +{
> >> +     unsigned int seed;
> >> +     size_t z, i, rand;
> >> +     struct rnd_state slab_rand;
> >> +
> >> +     get_random_bytes_arch(&seed, sizeof(seed));
> >> +     prandom_seed_state(&slab_rand, seed);
> >> +
> >> +     for (z = 0; z < ARRAY_SIZE(master_lists); z++) {
> >> +             for (i = 0; i < master_lists[z].count; i++)
> >> +                     master_lists[z].list[i] = i;
> >> +
> >> +             /* Fisher-Yates shuffle */
> >> +             for (i = master_lists[z].count - 1; i > 0; i--) {
> >> +                     rand = prandom_u32_state(&slab_rand);
> >> +                     rand %= (i + 1);
> >> +                     swap(master_lists[z].list[i],
> >> +                             master_lists[z].list[rand]);
> >> +             }
> >> +     }
> >> +}
> >> +#else
> >> +static inline void __init freelist_random_init(void) { }
> >> +#endif /* CONFIG_FREELIST_RANDOM */
> >> +
> >> +
> >>  /*
> >>   * Initialisation.  Called after the page allocator have been initialised and
> >>   * before smp_init().
> >> @@ -1255,6 +1312,8 @@ void __init kmem_cache_init(void)
> >>       if (!slab_max_order_set && totalram_pages > (32 << 20) >> PAGE_SHIFT)
> >>               slab_max_order = SLAB_MAX_ORDER_HI;
> >>
> >> +     freelist_random_init();
> >> +
> >>       /* Bootstrap is tricky, because several objects are allocated
> >>        * from caches that do not exist yet:
> >>        * 1) initialize the kmem_cache cache: it contains the struct
> >> @@ -2442,6 +2501,107 @@ static void cache_init_objs_debug(struct kmem_cache *cachep, struct page *page)
> >>  #endif
> >>  }
> >>
> >> +#ifdef CONFIG_FREELIST_RANDOM
> >> +/* Identify if the target freelist matches the pre-computed list */
> >> +enum master_type {
> >> +     match,
> >> +     less,
> >> +     more
> >> +};
> >> +
> >> +/* Hold information during a freelist initialization */
> >> +struct freelist_init_state {
> >> +     unsigned int padding;
> >> +     unsigned int pos;
> >> +     unsigned int count;
> >> +     struct m_list master_list;
> >> +     unsigned int master_count;
> >> +     enum master_type type;
> >> +};
> >> +
> >> +/* Select the right pre-computed master list and initialize state */
> >> +static void freelist_state_initialize(struct freelist_init_state *state,
> >> +                                   unsigned int count)
> >> +{
> >> +     unsigned int idx;
> >> +     const unsigned int last_idx = ARRAY_SIZE(master_lists) - 1;
> >> +
> >> +     memset(state, 0, sizeof(*state));
> >> +     state->count = count;
> >> +     state->pos = 0;
> >
> > Using pos = 0 here does not look good in terms of security. In this case,
> > every new page of the same size class gets the same freelist sequence since boot.
> >
> > How about using a random value to set pos? It provides some more randomness
> > with minimal overhead.
> >
> 
> I think it is a good idea. I will add that for the next iteration.
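The random starting position can be sketched like this, assuming a shuffled sequence of length `count` is already available and using a hypothetical xorshift32 generator in place of the kernel PRNG; `fill_freelist_from_offset()` is a hypothetical name. Each new page walks the same sequence but starts at a random offset and wraps around.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's prandom_u32_state(). */
static uint32_t xorshift32(uint32_t *state)
{
	uint32_t x = *state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	*state = x;
	return x;
}

/* Fill freelist[0 .. count - 1] from seq, starting at a random pos and wrapping. */
static void fill_freelist_from_offset(unsigned int *freelist,
				      const unsigned int *seq,
				      unsigned int count, uint32_t *rng)
{
	unsigned int pos;

	assert(count > 0);
	pos = xorshift32(rng) % count;	/* random start instead of pos = 0 */
	for (unsigned int i = 0; i < count; i++) {
		freelist[i] = seq[pos];
		if (++pos == count)
			pos = 0;
	}
}
```

A rotation of one fixed sequence yields only `count` distinct orders, so this adds roughly log2(count) bits per page; it complements, rather than replaces, separate sequences per cache.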
> 
> >> +     /* count is always >= 2 */
> >> +     idx = ilog2(count) - 1;
> >> +     if (idx >= last_idx)
> >> +             idx = last_idx;
> >> +     else if (roundup_pow_of_two(idx + 1) != count)
> >> +             idx++;
> >> +     state->master_list = master_lists[idx];
> >> +     if (state->master_list.count == state->count)
> >> +             state->type = match;
> >> +     else if (state->master_list.count > state->count)
> >> +             state->type = more;
> >> +     else
> >> +             state->type = less;
> >> +}
> >> +
> >> +/* Get the next entry on the master list depending on the target list size */
> >> +static freelist_idx_t get_next_entry(struct freelist_init_state *state)
> >> +{
> >> +     if (state->type == less && state->pos == state->master_list.count) {
> >> +             state->padding += state->pos;
> >> +             state->pos = 0;
> >> +     }
> >> +     BUG_ON(state->pos >= state->master_list.count);
> >> +     return state->master_list.list[state->pos++];
> >> +}
> >> +
> >> +static freelist_idx_t next_random_slot(struct freelist_init_state *state)
> >> +{
> >> +     freelist_idx_t cur, entry;
> >> +
> >> +     entry = get_next_entry(state);
> >> +
> >> +     if (state->type != match) {
> >> +             while ((entry + state->padding) >= state->count)
> >> +                     entry = get_next_entry(state);
> >> +             cur = entry + state->padding;
> >> +             BUG_ON(cur >= state->count);
> >> +     } else {
> >> +             cur = entry;
> >> +     }
> >> +
> >> +     return cur;
> >> +}
> >> +
> >> +/* Shuffle the freelist initialization state based on pre-computed lists */
> >> +static void shuffle_freelist(struct kmem_cache *cachep, struct page *page,
> >> +                          unsigned int count)
> >> +{
> >> +     unsigned int i;
> >> +     struct freelist_init_state state;
> >> +
> >> +     if (count < 2) {
> >> +             for (i = 0; i < count; i++)
> >> +                     set_free_obj(page, i, i);
> >> +             return;
> >> +     }
> >> +
> >> +     /* Last chunk is used already in this case */
> >> +     if (OBJFREELIST_SLAB(cachep))
> >> +             count--;
> >> +
> >> +     freelist_state_initialize(&state, count);
> >> +     for (i = 0; i < count; i++)
> >> +             set_free_obj(page, i, next_random_slot(&state));
> >> +
> >> +     if (OBJFREELIST_SLAB(cachep))
> >> +             set_free_obj(page, i, i);
> >
> > Please consider the last object of an OBJFREELIST_SLAB cache, too.
> >
> > freelist_state_init()
> > last_obj = next_random_slot()
> > page->freelist = XXX
> > for (i = 0; i < count - 1; i++)
> >         set_free_obj()
> > set_free_obj(last_obj);
> >
> > Thanks.
> >
> 
> The current implementation takes the last chunk by default, before the
> freelist is initialized. Do you want it to be randomized as well?

Yes.

Thanks.
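The outline above can be turned into a small runnable sketch. It assumes `order` is already a random permutation of the object indices (for example, as produced by repeated next_random_slot() calls); `shuffle_objfreelist()` is a hypothetical name, and plain arrays stand in for the page and set_free_obj(). The first random slot picks the object that hosts the freelist array, the remaining slots become the allocation order, and the host object is placed in the final freelist position so it is handed out last.

```c
#include <assert.h>

/*
 * freelist: receives the allocation order (count entries).
 * order:    a random permutation of 0 .. count - 1.
 * Returns the index of the object that hosts the freelist array
 * (what "page->freelist = XXX" would point into).
 */
static unsigned int shuffle_objfreelist(unsigned int *freelist,
					const unsigned int *order,
					unsigned int count)
{
	unsigned int last_obj;

	assert(count >= 1);
	/* The first random slot becomes the freelist-hosting object. */
	last_obj = order[0];
	/* The remaining random slots form the allocation order. */
	for (unsigned int i = 0; i < count - 1; i++)
		freelist[i] = order[i + 1];
	/* The host object goes in the final position. */
	freelist[count - 1] = last_obj;
	return last_obj;
}
```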

> 
> >> +}
> >> +#else
> >> +static inline void shuffle_freelist(struct kmem_cache *cachep,
> >> +                                 struct page *page, unsigned int count) { }
> >> +#endif /* CONFIG_FREELIST_RANDOM */
> >> +
> >>  static void cache_init_objs(struct kmem_cache *cachep,
> >>                           struct page *page)
> >>  {
> >> @@ -2464,8 +2624,12 @@ static void cache_init_objs(struct kmem_cache *cachep,
> >>                       kasan_poison_object_data(cachep, objp);
> >>               }
> >>
> >> -             set_free_obj(page, i, i);
> >> +             /* If enabled, initialization is done in shuffle_freelist */
> >> +             if (!config_enabled(CONFIG_FREELIST_RANDOM))
> >> +                     set_free_obj(page, i, i);
> >>       }
> >> +
> >> +     shuffle_freelist(cachep, page, cachep->num);
> >>  }
> >>
> >>  static void kmem_flagcheck(struct kmem_cache *cachep, gfp_t flags)
> >> --
> >> 2.8.0.rc3.226.g39d4020
> >>
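The size-class selection and the "more" path of the quoted freelist_state_initialize()/next_random_slot() can be sketched as follows. Note that the quoted exact-power-of-two check, `roundup_pow_of_two(idx + 1) != count`, appears to compare against the list index rather than `count`, so an exact power of two such as count == 4 would select the 8-entry list and take the "more" path instead of "match". This sketch uses a plain loop instead of ilog2()/roundup_pow_of_two(); the function names are hypothetical, and only counts up to 256 (the "match" and "more" cases) are modelled.

```c
#include <assert.h>

#define NR_MASTER_LISTS 8	/* master list sizes 2, 4, ..., 256 */

/* Index of the smallest master list (size 2 << idx) with >= count entries, capped. */
static unsigned int master_list_index(unsigned int count)
{
	unsigned int idx = 0;

	assert(count >= 2);
	while (idx < NR_MASTER_LISTS - 1 && (2u << idx) < count)
		idx++;
	return idx;
}

/*
 * "more" case: the master list (a permutation of 0 .. master_count - 1) is at
 * least as large as count, so entries >= count are skipped. Writes count slots
 * and returns how many master entries were consumed.
 */
static unsigned int fill_from_larger_list(unsigned int *freelist,
					  unsigned int count,
					  const unsigned int *master,
					  unsigned int master_count)
{
	unsigned int pos = 0;

	assert(master_count >= count);
	for (unsigned int i = 0; i < count; i++) {
		while (pos < master_count && master[pos] >= count)
			pos++;
		assert(pos < master_count);
		freelist[i] = master[pos++];
	}
	return pos;
}
```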
> >> --
> >> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> >> the body to majordomo@kvack.org.  For more info on Linux MM,
> >> see: http://www.linux-mm.org/ .
> >> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2] mm: SLAB freelist randomization
@ 2016-04-20  8:08       ` Joonsoo Kim
  0 siblings, 0 replies; 35+ messages in thread
From: Joonsoo Kim @ 2016-04-20  8:08 UTC (permalink / raw)
  To: Thomas Garnier
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Andrew Morton,
	Kees Cook, Greg Thelen, Laura Abbott, kernel-hardening, LKML,
	Linux-MM

On Tue, Apr 19, 2016 at 09:44:54AM -0700, Thomas Garnier wrote:
> On Tue, Apr 19, 2016 at 12:15 AM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> > On Mon, Apr 18, 2016 at 10:14:39AM -0700, Thomas Garnier wrote:
> >> Provides an optional config (CONFIG_FREELIST_RANDOM) to randomize the
> >> SLAB freelist. The list is randomized during initialization of a new set
> >> of pages. The order on different freelist sizes is pre-computed at boot
> >> for performance. This security feature reduces the predictability of the
> >> kernel SLAB allocator against heap overflows rendering attacks much less
> >> stable.
> >
> > I'm not familiar on security but it doesn't look much secure than
> > before. Is there any other way to generate different sequence of freelist
> > for each new set of pages? Current approach using pre-computed array will
> > generate same sequence of freelist for all new set of pages having same size
> > class. Is it sufficient?
> >
> 
> I think it is sufficient. There is a tradeoff for performance. We could randomly
> pick an object from the freelist every time (on slab_get_obj) but I
> think it will
> have significant impact (at least 3%).
> 
> >> For example this attack against SLUB (also applicable against SLAB)
> >> would be affected:
> >> https://jon.oberheide.org/blog/2010/09/10/linux-kernel-can-slub-overflow/
> >>
> >> Also, since v4.6 the freelist was moved at the end of the SLAB. It means
> >> a controllable heap is opened to new attacks not yet publicly discussed.
> >> A kernel heap overflow can be transformed to multiple use-after-free.
> >> This feature makes this type of attack harder too.
> >>
> >> To generate entropy, we use get_random_bytes_arch because 0 bits of
> >> entropy is available at that boot stage. In the worse case this function
> >> will fallback to the get_random_bytes sub API.
> >>
> >> The config option name is not specific to the SLAB as this approach will
> >> be extended to other allocators like SLUB.
> >
> > If this feature will be applied to the SLUB, it's better to put common
> > code to mm/slab_common.c.
> >
> 
> I think it might be moved there once we implement the SLUB counterpart
> but it is too early to define which part will be common.
> 
> >>
> >> Performance results highlighted no major changes:
> >>
> >> Netperf average on 10 runs:
> >>
> >> threads,base,change
> >> 16,576943.10,585905.90 (101.55%)
> >> 32,564082.00,569741.20 (101.00%)
> >> 48,558334.30,561851.20 (100.63%)
> >> 64,552025.20,556448.30 (100.80%)
> >> 80,552294.40,551743.10 (99.90%)
> >> 96,552435.30,547529.20 (99.11%)
> >> 112,551320.60,550183.20 (99.79%)
> >> 128,549138.30,550542.70 (100.26%)
> >> 144,549344.50,544529.10 (99.12%)
> >> 160,550360.80,539929.30 (98.10%)
> >>
> >> slab_test 1 run on boot. After is faster except for odd result on size
> >> 2048.
> >
> > Hmm... It's odd result. It adds more logic and it should
> > decrease performance. I guess it would be experimental error but
> > do you have any analysis about this result?
> >
> 
> I don't. I am glad to redo the test. I found that slab_test has very different
> result based on the heap state at the time of the test. If I run the
> test multiple
> times, I have really various results on with or without the mitigation (on
> dedicated hardware).
> 
> >>
> >> Before:
> >>
> >> Single thread testing
> >> =====================
> >> 1. Kmalloc: Repeatedly allocate then free test
> >> 10000 times kmalloc(8) -> 137 cycles kfree -> 126 cycles
> >> 10000 times kmalloc(16) -> 118 cycles kfree -> 119 cycles
> >> 10000 times kmalloc(32) -> 112 cycles kfree -> 119 cycles
> >> 10000 times kmalloc(64) -> 126 cycles kfree -> 123 cycles
> >> 10000 times kmalloc(128) -> 135 cycles kfree -> 131 cycles
> >> 10000 times kmalloc(256) -> 165 cycles kfree -> 104 cycles
> >> 10000 times kmalloc(512) -> 174 cycles kfree -> 126 cycles
> >> 10000 times kmalloc(1024) -> 242 cycles kfree -> 160 cycles
> >> 10000 times kmalloc(2048) -> 478 cycles kfree -> 239 cycles
> >> 10000 times kmalloc(4096) -> 747 cycles kfree -> 364 cycles
> >> 10000 times kmalloc(8192) -> 774 cycles kfree -> 404 cycles
> >> 10000 times kmalloc(16384) -> 849 cycles kfree -> 430 cycles
> >> 2. Kmalloc: alloc/free test
> >> 10000 times kmalloc(8)/kfree -> 118 cycles
> >> 10000 times kmalloc(16)/kfree -> 118 cycles
> >> 10000 times kmalloc(32)/kfree -> 118 cycles
> >> 10000 times kmalloc(64)/kfree -> 121 cycles
> >> 10000 times kmalloc(128)/kfree -> 118 cycles
> >> 10000 times kmalloc(256)/kfree -> 115 cycles
> >> 10000 times kmalloc(512)/kfree -> 115 cycles
> >> 10000 times kmalloc(1024)/kfree -> 115 cycles
> >> 10000 times kmalloc(2048)/kfree -> 115 cycles
> >> 10000 times kmalloc(4096)/kfree -> 115 cycles
> >> 10000 times kmalloc(8192)/kfree -> 115 cycles
> >> 10000 times kmalloc(16384)/kfree -> 115 cycles
> >>
> >> After:
> >>
> >> Single thread testing
> >> =====================
> >> 1. Kmalloc: Repeatedly allocate then free test
> >> 10000 times kmalloc(8) -> 99 cycles kfree -> 84 cycles
> >> 10000 times kmalloc(16) -> 88 cycles kfree -> 83 cycles
> >> 10000 times kmalloc(32) -> 90 cycles kfree -> 81 cycles
> >> 10000 times kmalloc(64) -> 107 cycles kfree -> 97 cycles
> >> 10000 times kmalloc(128) -> 134 cycles kfree -> 89 cycles
> >> 10000 times kmalloc(256) -> 145 cycles kfree -> 97 cycles
> >> 10000 times kmalloc(512) -> 177 cycles kfree -> 116 cycles
> >> 10000 times kmalloc(1024) -> 223 cycles kfree -> 151 cycles
> >> 10000 times kmalloc(2048) -> 1429 cycles kfree -> 221 cycles
> >> 10000 times kmalloc(4096) -> 720 cycles kfree -> 348 cycles
> >> 10000 times kmalloc(8192) -> 788 cycles kfree -> 393 cycles
> >> 10000 times kmalloc(16384) -> 867 cycles kfree -> 433 cycles
> >> 2. Kmalloc: alloc/free test
> >> 10000 times kmalloc(8)/kfree -> 115 cycles
> >> 10000 times kmalloc(16)/kfree -> 115 cycles
> >> 10000 times kmalloc(32)/kfree -> 115 cycles
> >> 10000 times kmalloc(64)/kfree -> 120 cycles
> >> 10000 times kmalloc(128)/kfree -> 127 cycles
> >> 10000 times kmalloc(256)/kfree -> 119 cycles
> >> 10000 times kmalloc(512)/kfree -> 112 cycles
> >> 10000 times kmalloc(1024)/kfree -> 112 cycles
> >> 10000 times kmalloc(2048)/kfree -> 112 cycles
> >> 10000 times kmalloc(4096)/kfree -> 112 cycles
> >> 10000 times kmalloc(8192)/kfree -> 112 cycles
> >> 10000 times kmalloc(16384)/kfree -> 112 cycles
> >>
> >> Signed-off-by: Thomas Garnier <thgarnie@google.com>
> >> ---
> >> Based on next-20160418
> >> ---
> >>  init/Kconfig |   9 ++++
> >>  mm/slab.c    | 166 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
> >>  2 files changed, 174 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/init/Kconfig b/init/Kconfig
> >> index 0dfd09d..ee35418 100644
> >> --- a/init/Kconfig
> >> +++ b/init/Kconfig
> >> @@ -1742,6 +1742,15 @@ config SLOB
> >>
> >>  endchoice
> >>
> >> +config FREELIST_RANDOM
> >> +     default n
> >> +     depends on SLAB
> >> +     bool "SLAB freelist randomization"
> >> +     help
> >> +       Randomizes the freelist order used on creating new SLABs. This
> >> +       security feature reduces the predictability of the kernel slab
> >> +       allocator against heap overflows.
> >> +
> >>  config SLUB_CPU_PARTIAL
> >>       default y
> >>       depends on SLUB && SMP
> >> diff --git a/mm/slab.c b/mm/slab.c
> >> index b70aabf..8371d80 100644
> >> --- a/mm/slab.c
> >> +++ b/mm/slab.c
> >> @@ -116,6 +116,7 @@
> >>  #include     <linux/kmemcheck.h>
> >>  #include     <linux/memory.h>
> >>  #include     <linux/prefetch.h>
> >> +#include     <linux/log2.h>
> >>
> >>  #include     <net/sock.h>
> >>
> >> @@ -1229,6 +1230,62 @@ static void __init set_up_node(struct kmem_cache *cachep, int index)
> >>       }
> >>  }
> >>
> >> +#ifdef CONFIG_FREELIST_RANDOM
> >> +/*
> >> + * Master lists are pre-computed random lists
> >> + * Lists of different sizes are used to optimize performance on SLABS with
> >> + * different object counts.
> >> + */
> >
> > If it is for optimization, it would be one option to have separate
> > random list for each kmem_cache. It would consume more memory but it
> > would be marginal. And, it provides more un-predictability and it can
> > give better performance because we don't need state->type (more, less)
> > and special handling related for it.
> >
> 
> I am not sur because major caches are created early at boot time. We still have
> the same entropy problem and we are wasting a bit more memory. It will be faster

I think that entropy problem is another issue. It should be considered
separately. If it is solved, making per-computed array for each
kmem_cache will provide more un-predictability. If someone who succeed to
exploit some kmem_cache with 128 object per slab want to exploit
another kmem_cache with 128 object per slab, this separate pre-computed array
will be helpful.

> on usage though but not sure it will be significant.

I also think it's not significant. But, besides performance effect,
code doesn't look very attractive and extendable. In case of SLUB,
there is setup_slub_max_order option and object per slab could be larger
than 256. To deal with it, we need to add many more static definition
and it looks not good to me. Please use dynamic allocated memory
instead of static array definition.

> 
> >> +static freelist_idx_t master_list_2[2];
> >> +static freelist_idx_t master_list_4[4];
> >> +static freelist_idx_t master_list_8[8];
> >> +static freelist_idx_t master_list_16[16];
> >> +static freelist_idx_t master_list_32[32];
> >> +static freelist_idx_t master_list_64[64];
> >> +static freelist_idx_t master_list_128[128];
> >> +static freelist_idx_t master_list_256[256];
> >> +const static struct m_list {
> >> +     size_t count;
> >> +     freelist_idx_t *list;
> >> +} master_lists[] = {
> >> +     { ARRAY_SIZE(master_list_2), master_list_2 },
> >> +     { ARRAY_SIZE(master_list_4), master_list_4 },
> >> +     { ARRAY_SIZE(master_list_8), master_list_8 },
> >> +     { ARRAY_SIZE(master_list_16), master_list_16 },
> >> +     { ARRAY_SIZE(master_list_32), master_list_32 },
> >> +     { ARRAY_SIZE(master_list_64), master_list_64 },
> >> +     { ARRAY_SIZE(master_list_128), master_list_128 },
> >> +     { ARRAY_SIZE(master_list_256), master_list_256 },
> >> +};
> >> +
> >> +/* Pre-compute the Freelist master lists at boot */
> >> +static void __init freelist_random_init(void)
> >> +{
> >> +     unsigned int seed;
> >> +     size_t z, i, rand;
> >> +     struct rnd_state slab_rand;
> >> +
> >> +     get_random_bytes_arch(&seed, sizeof(seed));
> >> +     prandom_seed_state(&slab_rand, seed);
> >> +
> >> +     for (z = 0; z < ARRAY_SIZE(master_lists); z++) {
> >> +             for (i = 0; i < master_lists[z].count; i++)
> >> +                     master_lists[z].list[i] = i;
> >> +
> >> +             /* Fisher-Yates shuffle */
> >> +             for (i = master_lists[z].count - 1; i > 0; i--) {
> >> +                     rand = prandom_u32_state(&slab_rand);
> >> +                     rand %= (i + 1);
> >> +                     swap(master_lists[z].list[i],
> >> +                             master_lists[z].list[rand]);
> >> +             }
> >> +     }
> >> +}
> >> +#else
> >> +static inline void __init freelist_random_init(void) { }
> >> +#endif /* CONFIG_FREELIST_RANDOM */
> >> +
> >> +
> >>  /*
> >>   * Initialisation.  Called after the page allocator have been initialised and
> >>   * before smp_init().
> >> @@ -1255,6 +1312,8 @@ void __init kmem_cache_init(void)
> >>       if (!slab_max_order_set && totalram_pages > (32 << 20) >> PAGE_SHIFT)
> >>               slab_max_order = SLAB_MAX_ORDER_HI;
> >>
> >> +     freelist_random_init();
> >> +
> >>       /* Bootstrap is tricky, because several objects are allocated
> >>        * from caches that do not exist yet:
> >>        * 1) initialize the kmem_cache cache: it contains the struct
> >> @@ -2442,6 +2501,107 @@ static void cache_init_objs_debug(struct kmem_cache *cachep, struct page *page)
> >>  #endif
> >>  }
> >>
> >> +#ifdef CONFIG_FREELIST_RANDOM
> >> +/* Identify if the target freelist matches the pre-computed list */
> >> +enum master_type {
> >> +     match,
> >> +     less,
> >> +     more
> >> +};
> >> +
> >> +/* Hold information during a freelist initialization */
> >> +struct freelist_init_state {
> >> +     unsigned int padding;
> >> +     unsigned int pos;
> >> +     unsigned int count;
> >> +     struct m_list master_list;
> >> +     unsigned int master_count;
> >> +     enum master_type type;
> >> +};
> >> +
> >> +/* Select the right pre-computed master list and initialize state */
> >> +static void freelist_state_initialize(struct freelist_init_state *state,
> >> +                                   unsigned int count)
> >> +{
> >> +     unsigned int idx;
> >> +     const unsigned int last_idx = ARRAY_SIZE(master_lists) - 1;
> >> +
> >> +     memset(state, 0, sizeof(*state));
> >> +     state->count = count;
> >> +     state->pos = 0;
> >
> > Using pos = 0 here looks not good in terms of security. In this case,
> > every new page having same size class have same sequence of freelist since boot.
> >
> > How about using random value to set pos? It provides some more randomness
> > with minimal overhead.
> >
> 
> I think it is a good idea. I will add that for the next iteration.
> 
> >> +     /* count is always >= 2 */
> >> +     idx = ilog2(count) - 1;
> >> +     if (idx >= last_idx)
> >> +             idx = last_idx;
> >> +     else if (roundup_pow_of_two(idx + 1) != count)
> >> +             idx++;
> >> +     state->master_list = master_lists[idx];
> >> +     if (state->master_list.count == state->count)
> >> +             state->type = match;
> >> +     else if (state->master_list.count > state->count)
> >> +             state->type = more;
> >> +     else
> >> +             state->type = less;
> >> +}
> >> +
> >> +/* Get the next entry on the master list depending on the target list size */
> >> +static freelist_idx_t get_next_entry(struct freelist_init_state *state)
> >> +{
> >> +     if (state->type == less && state->pos == state->master_list.count) {
> >> +             state->padding += state->pos;
> >> +             state->pos = 0;
> >> +     }
> >> +     BUG_ON(state->pos >= state->master_list.count);
> >> +     return state->master_list.list[state->pos++];
> >> +}
> >> +
> >> +static freelist_idx_t next_random_slot(struct freelist_init_state *state)
> >> +{
> >> +     freelist_idx_t cur, entry;
> >> +
> >> +     entry = get_next_entry(state);
> >> +
> >> +     if (state->type != match) {
> >> +             while ((entry + state->padding) >= state->count)
> >> +                     entry = get_next_entry(state);
> >> +             cur = entry + state->padding;
> >> +             BUG_ON(cur >= state->count);
> >> +     } else {
> >> +             cur = entry;
> >> +     }
> >> +
> >> +     return cur;
> >> +}
> >> +
> >> +/* Shuffle the freelist initialization state based on pre-computed lists */
> >> +static void shuffle_freelist(struct kmem_cache *cachep, struct page *page,
> >> +                          unsigned int count)
> >> +{
> >> +     unsigned int i;
> >> +     struct freelist_init_state state;
> >> +
> >> +     if (count < 2) {
> >> +             for (i = 0; i < count; i++)
> >> +                     set_free_obj(page, i, i);
> >> +             return;
> >> +     }
> >> +
> >> +     /* Last chunk is used already in this case */
> >> +     if (OBJFREELIST_SLAB(cachep))
> >> +             count--;
> >> +
> >> +     freelist_state_initialize(&state, count);
> >> +     for (i = 0; i < count; i++)
> >> +             set_free_obj(page, i, next_random_slot(&state));
> >> +
> >> +     if (OBJFREELIST_SLAB(cachep))
> >> +             set_free_obj(page, i, i);
> >
> > Please consider last object of OBJFREELIST_SLAB cache, too.
> >
> > freelist_state_init()
> > last_obj = next_randome_slot()
> > page->freelist = XXX
> > for (i = 0; i < count - 1; i++)
> >         set_free_obj()
> > set_free_obj(last_obj);
> >
> > Thanks.
> >
> 
> The current implementation take the last chunk by default before the
> freelist is initialized. Do you want it to be randomized as well?

Yes.

Thanks.

> 
> >> +}
> >> +#else
> >> +static inline void shuffle_freelist(struct kmem_cache *cachep,
> >> +                                 struct page *page, unsigned int count) { }
> >> +#endif /* CONFIG_FREELIST_RANDOM */
> >> +
> >>  static void cache_init_objs(struct kmem_cache *cachep,
> >>                           struct page *page)
> >>  {
> >> @@ -2464,8 +2624,12 @@ static void cache_init_objs(struct kmem_cache *cachep,
> >>                       kasan_poison_object_data(cachep, objp);
> >>               }
> >>
> >> -             set_free_obj(page, i, i);
> >> +             /* If enabled, initialization is done in shuffle_freelist */
> >> +             if (!config_enabled(CONFIG_FREELIST_RANDOM))
> >> +                     set_free_obj(page, i, i);
> >>       }
> >> +
> >> +     shuffle_freelist(cachep, page, cachep->num);
> >>  }
> >>
> >>  static void kmem_flagcheck(struct kmem_cache *cachep, gfp_t flags)
> >> --
> >> 2.8.0.rc3.226.g39d4020
> >>
> >> --
> >> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> >> the body to majordomo@kvack.org.  For more info on Linux MM,
> >> see: http://www.linux-mm.org/ .
> >> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* [kernel-hardening] Re: [PATCH v2] mm: SLAB freelist randomization
@ 2016-04-20  8:08       ` Joonsoo Kim
  0 siblings, 0 replies; 35+ messages in thread
From: Joonsoo Kim @ 2016-04-20  8:08 UTC (permalink / raw)
  To: Thomas Garnier
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Andrew Morton,
	Kees Cook, Greg Thelen, Laura Abbott, kernel-hardening, LKML,
	Linux-MM

On Tue, Apr 19, 2016 at 09:44:54AM -0700, Thomas Garnier wrote:
> On Tue, Apr 19, 2016 at 12:15 AM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> > On Mon, Apr 18, 2016 at 10:14:39AM -0700, Thomas Garnier wrote:
> >> Provides an optional config (CONFIG_FREELIST_RANDOM) to randomize the
> >> SLAB freelist. The list is randomized during initialization of a new set
> >> of pages. The order on different freelist sizes is pre-computed at boot
> >> for performance. This security feature reduces the predictability of the
> >> kernel SLAB allocator against heap overflows rendering attacks much less
> >> stable.
> >
> > I'm not familiar with security, but it doesn't look much more secure than
> > before. Is there any other way to generate a different freelist sequence
> > for each new set of pages? The current approach, using a pre-computed array,
> > generates the same freelist sequence for every new set of pages in the same
> > size class. Is that sufficient?
> >
> 
> I think it is sufficient; there is a performance tradeoff. We could randomly
> pick an object from the freelist on every allocation (in slab_get_obj), but I
> think that would have a significant impact (at least 3%).
> 
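For comparison, here is a rough user-space sketch of the alternative mentioned above: picking a random free object on every allocation instead of shuffling once per slab. All names here (`rand_pick_slab`, `rp_get_obj`, the xorshift32 PRNG) are hypothetical. This is not how slab_get_obj is implemented; it only shows where the per-allocation cost comes from.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_OBJS 256	/* hypothetical cap for this sketch */

/* Hypothetical PRNG stand-in (xorshift32); state must be non-zero */
static uint32_t xorshift32(uint32_t *state)
{
	uint32_t x = *state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	*state = x;
	return x;
}

/* Free object indices kept in an unordered array */
struct rand_pick_slab {
	unsigned int free_idx[MAX_OBJS];
	unsigned int nr_free;
	uint32_t rng;
};

static void rp_init(struct rand_pick_slab *s, unsigned int num, uint32_t seed)
{
	unsigned int i;

	assert(num <= MAX_OBJS);
	for (i = 0; i < num; i++)
		s->free_idx[i] = i;
	s->nr_free = num;
	s->rng = seed ? seed : 1;	/* xorshift32 must not start at 0 */
}

/*
 * Allocate a random free object. This costs one PRNG call plus a
 * swap-remove on every allocation, which is the hot-path work that a
 * one-time per-slab shuffle avoids. Returns -1 when no object is free.
 */
static int rp_get_obj(struct rand_pick_slab *s)
{
	unsigned int j, obj;

	if (s->nr_free == 0)
		return -1;
	j = xorshift32(&s->rng) % s->nr_free;
	obj = s->free_idx[j];
	/* Move the last free index into the hole left by the chosen one */
	s->free_idx[j] = s->free_idx[--s->nr_free];
	return (int)obj;
}
```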
> >> For example this attack against SLUB (also applicable against SLAB)
> >> would be affected:
> >> https://jon.oberheide.org/blog/2010/09/10/linux-kernel-can-slub-overflow/
> >>
> >> Also, since v4.6 the freelist has been moved to the end of the SLAB. This
> >> means a controllable heap is exposed to new attacks not yet publicly
> >> discussed: a kernel heap overflow can be turned into multiple
> >> use-after-free conditions. This feature makes this type of attack harder too.
> >>
> >> To generate entropy, we use get_random_bytes_arch because 0 bits of
> >> entropy are available at that boot stage. In the worst case this function
> >> will fall back to the get_random_bytes sub API.
> >>
> >> The config option name is not specific to the SLAB as this approach will
> >> be extended to other allocators like SLUB.
> >
> > If this feature will also be applied to SLUB, it's better to put the common
> > code in mm/slab_common.c.
> >
> 
> I think it might be moved there once we implement the SLUB counterpart
> but it is too early to define which part will be common.
> 
> >>
> >> Performance results highlighted no major changes:
> >>
> >> Netperf average on 10 runs:
> >>
> >> threads,base,change
> >> 16,576943.10,585905.90 (101.55%)
> >> 32,564082.00,569741.20 (101.00%)
> >> 48,558334.30,561851.20 (100.63%)
> >> 64,552025.20,556448.30 (100.80%)
> >> 80,552294.40,551743.10 (99.90%)
> >> 96,552435.30,547529.20 (99.11%)
> >> 112,551320.60,550183.20 (99.79%)
> >> 128,549138.30,550542.70 (100.26%)
> >> 144,549344.50,544529.10 (99.12%)
> >> 160,550360.80,539929.30 (98.10%)
> >>
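Reading the table above, each percentage appears to be change / base * 100. For example, at 16 threads, 585905.90 / 576943.10 is about 1.0155, which is printed as 101.55%. The trivial helper below shows that computation, assuming this interpretation of the columns:

```c
#include <assert.h>

/* Relative result of the patched kernel, in percent of the base run */
static double relative_pct(double base, double change)
{
	return change / base * 100.0;
}
```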
> >> slab_test, 1 run on boot. After is faster except for an odd result on size
> >> 2048.
> >
> > Hmm... That's an odd result. It adds more logic, so it should
> > decrease performance. I guess it is experimental error, but
> > do you have any analysis of this result?
> >
> 
> I don't. I am glad to redo the test. I found that slab_test gives very different
> results depending on the heap state at the time of the test. If I run the
> test multiple times, I get widely varying results both with and without the
> mitigation (on dedicated hardware).
> 
> >>
> >> Before:
> >>
> >> Single thread testing
> >> =====================
> >> 1. Kmalloc: Repeatedly allocate then free test
> >> 10000 times kmalloc(8) -> 137 cycles kfree -> 126 cycles
> >> 10000 times kmalloc(16) -> 118 cycles kfree -> 119 cycles
> >> 10000 times kmalloc(32) -> 112 cycles kfree -> 119 cycles
> >> 10000 times kmalloc(64) -> 126 cycles kfree -> 123 cycles
> >> 10000 times kmalloc(128) -> 135 cycles kfree -> 131 cycles
> >> 10000 times kmalloc(256) -> 165 cycles kfree -> 104 cycles
> >> 10000 times kmalloc(512) -> 174 cycles kfree -> 126 cycles
> >> 10000 times kmalloc(1024) -> 242 cycles kfree -> 160 cycles
> >> 10000 times kmalloc(2048) -> 478 cycles kfree -> 239 cycles
> >> 10000 times kmalloc(4096) -> 747 cycles kfree -> 364 cycles
> >> 10000 times kmalloc(8192) -> 774 cycles kfree -> 404 cycles
> >> 10000 times kmalloc(16384) -> 849 cycles kfree -> 430 cycles
> >> 2. Kmalloc: alloc/free test
> >> 10000 times kmalloc(8)/kfree -> 118 cycles
> >> 10000 times kmalloc(16)/kfree -> 118 cycles
> >> 10000 times kmalloc(32)/kfree -> 118 cycles
> >> 10000 times kmalloc(64)/kfree -> 121 cycles
> >> 10000 times kmalloc(128)/kfree -> 118 cycles
> >> 10000 times kmalloc(256)/kfree -> 115 cycles
> >> 10000 times kmalloc(512)/kfree -> 115 cycles
> >> 10000 times kmalloc(1024)/kfree -> 115 cycles
> >> 10000 times kmalloc(2048)/kfree -> 115 cycles
> >> 10000 times kmalloc(4096)/kfree -> 115 cycles
> >> 10000 times kmalloc(8192)/kfree -> 115 cycles
> >> 10000 times kmalloc(16384)/kfree -> 115 cycles
> >>
> >> After:
> >>
> >> Single thread testing
> >> =====================
> >> 1. Kmalloc: Repeatedly allocate then free test
> >> 10000 times kmalloc(8) -> 99 cycles kfree -> 84 cycles
> >> 10000 times kmalloc(16) -> 88 cycles kfree -> 83 cycles
> >> 10000 times kmalloc(32) -> 90 cycles kfree -> 81 cycles
> >> 10000 times kmalloc(64) -> 107 cycles kfree -> 97 cycles
> >> 10000 times kmalloc(128) -> 134 cycles kfree -> 89 cycles
> >> 10000 times kmalloc(256) -> 145 cycles kfree -> 97 cycles
> >> 10000 times kmalloc(512) -> 177 cycles kfree -> 116 cycles
> >> 10000 times kmalloc(1024) -> 223 cycles kfree -> 151 cycles
> >> 10000 times kmalloc(2048) -> 1429 cycles kfree -> 221 cycles
> >> 10000 times kmalloc(4096) -> 720 cycles kfree -> 348 cycles
> >> 10000 times kmalloc(8192) -> 788 cycles kfree -> 393 cycles
> >> 10000 times kmalloc(16384) -> 867 cycles kfree -> 433 cycles
> >> 2. Kmalloc: alloc/free test
> >> 10000 times kmalloc(8)/kfree -> 115 cycles
> >> 10000 times kmalloc(16)/kfree -> 115 cycles
> >> 10000 times kmalloc(32)/kfree -> 115 cycles
> >> 10000 times kmalloc(64)/kfree -> 120 cycles
> >> 10000 times kmalloc(128)/kfree -> 127 cycles
> >> 10000 times kmalloc(256)/kfree -> 119 cycles
> >> 10000 times kmalloc(512)/kfree -> 112 cycles
> >> 10000 times kmalloc(1024)/kfree -> 112 cycles
> >> 10000 times kmalloc(2048)/kfree -> 112 cycles
> >> 10000 times kmalloc(4096)/kfree -> 112 cycles
> >> 10000 times kmalloc(8192)/kfree -> 112 cycles
> >> 10000 times kmalloc(16384)/kfree -> 112 cycles
> >>
> >> Signed-off-by: Thomas Garnier <thgarnie@google.com>
> >> ---
> >> Based on next-20160418
> >> ---
> >>  init/Kconfig |   9 ++++
> >>  mm/slab.c    | 166 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
> >>  2 files changed, 174 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/init/Kconfig b/init/Kconfig
> >> index 0dfd09d..ee35418 100644
> >> --- a/init/Kconfig
> >> +++ b/init/Kconfig
> >> @@ -1742,6 +1742,15 @@ config SLOB
> >>
> >>  endchoice
> >>
> >> +config FREELIST_RANDOM
> >> +     default n
> >> +     depends on SLAB
> >> +     bool "SLAB freelist randomization"
> >> +     help
> >> +       Randomizes the freelist order used on creating new SLABs. This
> >> +       security feature reduces the predictability of the kernel slab
> >> +       allocator against heap overflows.
> >> +
> >>  config SLUB_CPU_PARTIAL
> >>       default y
> >>       depends on SLUB && SMP
> >> diff --git a/mm/slab.c b/mm/slab.c
> >> index b70aabf..8371d80 100644
> >> --- a/mm/slab.c
> >> +++ b/mm/slab.c
> >> @@ -116,6 +116,7 @@
> >>  #include     <linux/kmemcheck.h>
> >>  #include     <linux/memory.h>
> >>  #include     <linux/prefetch.h>
> >> +#include     <linux/log2.h>
> >>
> >>  #include     <net/sock.h>
> >>
> >> @@ -1229,6 +1230,62 @@ static void __init set_up_node(struct kmem_cache *cachep, int index)
> >>       }
> >>  }
> >>
> >> +#ifdef CONFIG_FREELIST_RANDOM
> >> +/*
> >> + * Master lists are pre-computed random lists
> >> + * Lists of different sizes are used to optimize performance on SLABS with
> >> + * different object counts.
> >> + */
> >
> > If it is for optimization, one option would be to have a separate
> > random list for each kmem_cache. It would consume more memory, but the
> > increase would be marginal. It also provides more unpredictability and can
> > give better performance, because we wouldn't need state->type (more, less)
> > and the special handling related to it.
> >
> 
> I am not sure, because the major caches are created early at boot time. We still have
> the same entropy problem and we are wasting a bit more memory. It will be faster

I think the entropy problem is a separate issue and should be considered
separately. If it is solved, making a pre-computed array for each
kmem_cache will provide more unpredictability. If someone who succeeded in
exploiting one kmem_cache with 128 objects per slab wants to exploit
another kmem_cache with 128 objects per slab, a separate pre-computed array
will help.

> on usage, though I am not sure it will be significant.

I also don't think it's significant. But, besides the performance effect,
the code doesn't look very attractive or extensible. In the case of SLUB,
there is the setup_slub_max_order option, and the number of objects per slab
could be larger than 256. To deal with that, we would need to add many more
static definitions, which doesn't look good to me. Please use dynamically
allocated memory instead of static array definitions.

> 
> >> +static freelist_idx_t master_list_2[2];
> >> +static freelist_idx_t master_list_4[4];
> >> +static freelist_idx_t master_list_8[8];
> >> +static freelist_idx_t master_list_16[16];
> >> +static freelist_idx_t master_list_32[32];
> >> +static freelist_idx_t master_list_64[64];
> >> +static freelist_idx_t master_list_128[128];
> >> +static freelist_idx_t master_list_256[256];
> >> +const static struct m_list {
> >> +     size_t count;
> >> +     freelist_idx_t *list;
> >> +} master_lists[] = {
> >> +     { ARRAY_SIZE(master_list_2), master_list_2 },
> >> +     { ARRAY_SIZE(master_list_4), master_list_4 },
> >> +     { ARRAY_SIZE(master_list_8), master_list_8 },
> >> +     { ARRAY_SIZE(master_list_16), master_list_16 },
> >> +     { ARRAY_SIZE(master_list_32), master_list_32 },
> >> +     { ARRAY_SIZE(master_list_64), master_list_64 },
> >> +     { ARRAY_SIZE(master_list_128), master_list_128 },
> >> +     { ARRAY_SIZE(master_list_256), master_list_256 },
> >> +};
> >> +
> >> +/* Pre-compute the Freelist master lists at boot */
> >> +static void __init freelist_random_init(void)
> >> +{
> >> +     unsigned int seed;
> >> +     size_t z, i, rand;
> >> +     struct rnd_state slab_rand;
> >> +
> >> +     get_random_bytes_arch(&seed, sizeof(seed));
> >> +     prandom_seed_state(&slab_rand, seed);
> >> +
> >> +     for (z = 0; z < ARRAY_SIZE(master_lists); z++) {
> >> +             for (i = 0; i < master_lists[z].count; i++)
> >> +                     master_lists[z].list[i] = i;
> >> +
> >> +             /* Fisher-Yates shuffle */
> >> +             for (i = master_lists[z].count - 1; i > 0; i--) {
> >> +                     rand = prandom_u32_state(&slab_rand);
> >> +                     rand %= (i + 1);
> >> +                     swap(master_lists[z].list[i],
> >> +                             master_lists[z].list[rand]);
> >> +             }
> >> +     }
> >> +}
> >> +#else
> >> +static inline void __init freelist_random_init(void) { }
> >> +#endif /* CONFIG_FREELIST_RANDOM */
> >> +
> >> +
> >>  /*
> >>   * Initialisation.  Called after the page allocator have been initialised and
> >>   * before smp_init().
> >> @@ -1255,6 +1312,8 @@ void __init kmem_cache_init(void)
> >>       if (!slab_max_order_set && totalram_pages > (32 << 20) >> PAGE_SHIFT)
> >>               slab_max_order = SLAB_MAX_ORDER_HI;
> >>
> >> +     freelist_random_init();
> >> +
> >>       /* Bootstrap is tricky, because several objects are allocated
> >>        * from caches that do not exist yet:
> >>        * 1) initialize the kmem_cache cache: it contains the struct
> >> @@ -2442,6 +2501,107 @@ static void cache_init_objs_debug(struct kmem_cache *cachep, struct page *page)
> >>  #endif
> >>  }
> >>
> >> +#ifdef CONFIG_FREELIST_RANDOM
> >> +/* Identify if the target freelist matches the pre-computed list */
> >> +enum master_type {
> >> +     match,
> >> +     less,
> >> +     more
> >> +};
> >> +
> >> +/* Hold information during a freelist initialization */
> >> +struct freelist_init_state {
> >> +     unsigned int padding;
> >> +     unsigned int pos;
> >> +     unsigned int count;
> >> +     struct m_list master_list;
> >> +     unsigned int master_count;
> >> +     enum master_type type;
> >> +};
> >> +
> >> +/* Select the right pre-computed master list and initialize state */
> >> +static void freelist_state_initialize(struct freelist_init_state *state,
> >> +                                   unsigned int count)
> >> +{
> >> +     unsigned int idx;
> >> +     const unsigned int last_idx = ARRAY_SIZE(master_lists) - 1;
> >> +
> >> +     memset(state, 0, sizeof(*state));
> >> +     state->count = count;
> >> +     state->pos = 0;
> >
> > Using pos = 0 here doesn't look good in terms of security. In this case,
> > every new page in the same size class has the same freelist sequence since boot.
> >
> > How about using a random value to set pos? It provides some more randomness
> > with minimal overhead.
> >
> 
> I think it is a good idea. I will add that for the next iteration.
> 
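The random-start suggestion can be sketched in user-space C as follows. It assumes one pre-computed permutation `seq` of 0..count-1 and a fresh random value `rnd` per new slab page. `fill_freelist_from` is a hypothetical name, and `rnd` stands in for whatever kernel PRNG would supply it.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Reuse one pre-computed permutation `seq` of 0..count-1, but start each
 * new slab page at a random offset and wrap around.
 */
static void fill_freelist_from(const unsigned int *seq, unsigned int count,
			       uint32_t rnd, unsigned int *freelist)
{
	unsigned int pos;
	unsigned int i;

	assert(count > 0);
	pos = rnd % count;	/* random start instead of pos = 0 */

	for (i = 0; i < count; i++) {
		freelist[i] = seq[pos];
		if (++pos == count)	/* wrap around to the beginning */
			pos = 0;
	}
}
```

This yields at most `count` distinct orders per pre-computed list (one per start offset). That is more than a single fixed order, but still far fewer than a fresh shuffle per page.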
> >> +     /* count is always >= 2 */
> >> +     idx = ilog2(count) - 1;
> >> +     if (idx >= last_idx)
> >> +             idx = last_idx;
> >> +     else if (roundup_pow_of_two(idx + 1) != count)
> >> +             idx++;
> >> +     state->master_list = master_lists[idx];
> >> +     if (state->master_list.count == state->count)
> >> +             state->type = match;
> >> +     else if (state->master_list.count > state->count)
> >> +             state->type = more;
> >> +     else
> >> +             state->type = less;
> >> +}
> >> +
> >> +/* Get the next entry on the master list depending on the target list size */
> >> +static freelist_idx_t get_next_entry(struct freelist_init_state *state)
> >> +{
> >> +     if (state->type == less && state->pos == state->master_list.count) {
> >> +             state->padding += state->pos;
> >> +             state->pos = 0;
> >> +     }
> >> +     BUG_ON(state->pos >= state->master_list.count);
> >> +     return state->master_list.list[state->pos++];
> >> +}
> >> +
> >> +static freelist_idx_t next_random_slot(struct freelist_init_state *state)
> >> +{
> >> +     freelist_idx_t cur, entry;
> >> +
> >> +     entry = get_next_entry(state);
> >> +
> >> +     if (state->type != match) {
> >> +             while ((entry + state->padding) >= state->count)
> >> +                     entry = get_next_entry(state);
> >> +             cur = entry + state->padding;
> >> +             BUG_ON(cur >= state->count);
> >> +     } else {
> >> +             cur = entry;
> >> +     }
> >> +
> >> +     return cur;
> >> +}
> >> +
> >> +/* Shuffle the freelist initialization state based on pre-computed lists */
> >> +static void shuffle_freelist(struct kmem_cache *cachep, struct page *page,
> >> +                          unsigned int count)
> >> +{
> >> +     unsigned int i;
> >> +     struct freelist_init_state state;
> >> +
> >> +     if (count < 2) {
> >> +             for (i = 0; i < count; i++)
> >> +                     set_free_obj(page, i, i);
> >> +             return;
> >> +     }
> >> +
> >> +     /* Last chunk is used already in this case */
> >> +     if (OBJFREELIST_SLAB(cachep))
> >> +             count--;
> >> +
> >> +     freelist_state_initialize(&state, count);
> >> +     for (i = 0; i < count; i++)
> >> +             set_free_obj(page, i, next_random_slot(&state));
> >> +
> >> +     if (OBJFREELIST_SLAB(cachep))
> >> +             set_free_obj(page, i, i);
> >
> > Please consider last object of OBJFREELIST_SLAB cache, too.
> >
> > freelist_state_init()
> > last_obj = next_random_slot()
> > page->freelist = XXX
> > for (i = 0; i < count - 1; i++)
> >         set_free_obj()
> > set_free_obj(last_obj);
> >
> > Thanks.
> >
> 
> The current implementation takes the last chunk by default before the
> freelist is initialized. Do you want it to be randomized as well?

Yes.

Thanks.

> 
> >> +}
> >> +#else
> >> +static inline void shuffle_freelist(struct kmem_cache *cachep,
> >> +                                 struct page *page, unsigned int count) { }
> >> +#endif /* CONFIG_FREELIST_RANDOM */
> >> +
> >>  static void cache_init_objs(struct kmem_cache *cachep,
> >>                           struct page *page)
> >>  {
> >> @@ -2464,8 +2624,12 @@ static void cache_init_objs(struct kmem_cache *cachep,
> >>                       kasan_poison_object_data(cachep, objp);
> >>               }
> >>
> >> -             set_free_obj(page, i, i);
> >> +             /* If enabled, initialization is done in shuffle_freelist */
> >> +             if (!config_enabled(CONFIG_FREELIST_RANDOM))
> >> +                     set_free_obj(page, i, i);
> >>       }
> >> +
> >> +     shuffle_freelist(cachep, page, cachep->num);
> >>  }
> >>
> >>  static void kmem_flagcheck(struct kmem_cache *cachep, gfp_t flags)
> >> --
> >> 2.8.0.rc3.226.g39d4020
> >>

* Re: [PATCH v2] mm: SLAB freelist randomization
  2016-04-20  8:08       ` Joonsoo Kim
  (?)
@ 2016-04-20 14:47         ` Thomas Garnier
  -1 siblings, 0 replies; 35+ messages in thread
From: Thomas Garnier @ 2016-04-20 14:47 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Andrew Morton,
	Kees Cook, Greg Thelen, Laura Abbott, kernel-hardening, LKML,
	Linux-MM

On Wed, Apr 20, 2016 at 1:08 AM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> On Tue, Apr 19, 2016 at 09:44:54AM -0700, Thomas Garnier wrote:
>> On Tue, Apr 19, 2016 at 12:15 AM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
>> > On Mon, Apr 18, 2016 at 10:14:39AM -0700, Thomas Garnier wrote:
>> >> Provides an optional config (CONFIG_FREELIST_RANDOM) to randomize the
>> >> SLAB freelist. The list is randomized during initialization of a new set
>> >> of pages. The order on different freelist sizes is pre-computed at boot
>> >> for performance. This security feature reduces the predictability of the
>> >> kernel SLAB allocator against heap overflows rendering attacks much less
>> >> stable.
>> >
>> > I'm not familiar with security, but it doesn't look much more secure than
>> > before. Is there any other way to generate a different freelist sequence
>> > for each new set of pages? The current approach, using a pre-computed array,
>> > generates the same freelist sequence for every new set of pages in the same
>> > size class. Is that sufficient?
>> >
>>
>> I think it is sufficient; there is a performance tradeoff. We could randomly
>> pick an object from the freelist on every allocation (in slab_get_obj), but I
>> think that would have a significant impact (at least 3%).
>>
>> >> For example this attack against SLUB (also applicable against SLAB)
>> >> would be affected:
>> >> https://jon.oberheide.org/blog/2010/09/10/linux-kernel-can-slub-overflow/
>> >>
>> >> Also, since v4.6 the freelist has been moved to the end of the SLAB. This
>> >> means a controllable heap is exposed to new attacks not yet publicly
>> >> discussed: a kernel heap overflow can be turned into multiple
>> >> use-after-free conditions. This feature makes this type of attack harder too.
>> >>
>> >> To generate entropy, we use get_random_bytes_arch because 0 bits of
>> >> entropy are available at that boot stage. In the worst case this function
>> >> will fall back to the get_random_bytes sub API.
>> >>
>> >> The config option name is not specific to the SLAB as this approach will
>> >> be extended to other allocators like SLUB.
>> >
>> > If this feature will also be applied to SLUB, it's better to put the common
>> > code in mm/slab_common.c.
>> >
>>
>> I think it might be moved there once we implement the SLUB counterpart
>> but it is too early to define which part will be common.
>>
>> >>
>> >> Performance results highlighted no major changes:
>> >>
>> >> Netperf average on 10 runs:
>> >>
>> >> threads,base,change
>> >> 16,576943.10,585905.90 (101.55%)
>> >> 32,564082.00,569741.20 (101.00%)
>> >> 48,558334.30,561851.20 (100.63%)
>> >> 64,552025.20,556448.30 (100.80%)
>> >> 80,552294.40,551743.10 (99.90%)
>> >> 96,552435.30,547529.20 (99.11%)
>> >> 112,551320.60,550183.20 (99.79%)
>> >> 128,549138.30,550542.70 (100.26%)
>> >> 144,549344.50,544529.10 (99.12%)
>> >> 160,550360.80,539929.30 (98.10%)
>> >>
>> >> slab_test, 1 run on boot. After is faster except for an odd result on size
>> >> 2048.
>> >
>> > Hmm... That's an odd result. It adds more logic, so it should
>> > decrease performance. I guess it is experimental error, but
>> > do you have any analysis of this result?
>> >
>>
>> I don't. I am glad to redo the test. I found that slab_test gives very different
>> results depending on the heap state at the time of the test. If I run the
>> test multiple times, I get widely varying results both with and without the
>> mitigation (on dedicated hardware).
>>
>> >>
>> >> Before:
>> >>
>> >> Single thread testing
>> >> =====================
>> >> 1. Kmalloc: Repeatedly allocate then free test
>> >> 10000 times kmalloc(8) -> 137 cycles kfree -> 126 cycles
>> >> 10000 times kmalloc(16) -> 118 cycles kfree -> 119 cycles
>> >> 10000 times kmalloc(32) -> 112 cycles kfree -> 119 cycles
>> >> 10000 times kmalloc(64) -> 126 cycles kfree -> 123 cycles
>> >> 10000 times kmalloc(128) -> 135 cycles kfree -> 131 cycles
>> >> 10000 times kmalloc(256) -> 165 cycles kfree -> 104 cycles
>> >> 10000 times kmalloc(512) -> 174 cycles kfree -> 126 cycles
>> >> 10000 times kmalloc(1024) -> 242 cycles kfree -> 160 cycles
>> >> 10000 times kmalloc(2048) -> 478 cycles kfree -> 239 cycles
>> >> 10000 times kmalloc(4096) -> 747 cycles kfree -> 364 cycles
>> >> 10000 times kmalloc(8192) -> 774 cycles kfree -> 404 cycles
>> >> 10000 times kmalloc(16384) -> 849 cycles kfree -> 430 cycles
>> >> 2. Kmalloc: alloc/free test
>> >> 10000 times kmalloc(8)/kfree -> 118 cycles
>> >> 10000 times kmalloc(16)/kfree -> 118 cycles
>> >> 10000 times kmalloc(32)/kfree -> 118 cycles
>> >> 10000 times kmalloc(64)/kfree -> 121 cycles
>> >> 10000 times kmalloc(128)/kfree -> 118 cycles
>> >> 10000 times kmalloc(256)/kfree -> 115 cycles
>> >> 10000 times kmalloc(512)/kfree -> 115 cycles
>> >> 10000 times kmalloc(1024)/kfree -> 115 cycles
>> >> 10000 times kmalloc(2048)/kfree -> 115 cycles
>> >> 10000 times kmalloc(4096)/kfree -> 115 cycles
>> >> 10000 times kmalloc(8192)/kfree -> 115 cycles
>> >> 10000 times kmalloc(16384)/kfree -> 115 cycles
>> >>
>> >> After:
>> >>
>> >> Single thread testing
>> >> =====================
>> >> 1. Kmalloc: Repeatedly allocate then free test
>> >> 10000 times kmalloc(8) -> 99 cycles kfree -> 84 cycles
>> >> 10000 times kmalloc(16) -> 88 cycles kfree -> 83 cycles
>> >> 10000 times kmalloc(32) -> 90 cycles kfree -> 81 cycles
>> >> 10000 times kmalloc(64) -> 107 cycles kfree -> 97 cycles
>> >> 10000 times kmalloc(128) -> 134 cycles kfree -> 89 cycles
>> >> 10000 times kmalloc(256) -> 145 cycles kfree -> 97 cycles
>> >> 10000 times kmalloc(512) -> 177 cycles kfree -> 116 cycles
>> >> 10000 times kmalloc(1024) -> 223 cycles kfree -> 151 cycles
>> >> 10000 times kmalloc(2048) -> 1429 cycles kfree -> 221 cycles
>> >> 10000 times kmalloc(4096) -> 720 cycles kfree -> 348 cycles
>> >> 10000 times kmalloc(8192) -> 788 cycles kfree -> 393 cycles
>> >> 10000 times kmalloc(16384) -> 867 cycles kfree -> 433 cycles
>> >> 2. Kmalloc: alloc/free test
>> >> 10000 times kmalloc(8)/kfree -> 115 cycles
>> >> 10000 times kmalloc(16)/kfree -> 115 cycles
>> >> 10000 times kmalloc(32)/kfree -> 115 cycles
>> >> 10000 times kmalloc(64)/kfree -> 120 cycles
>> >> 10000 times kmalloc(128)/kfree -> 127 cycles
>> >> 10000 times kmalloc(256)/kfree -> 119 cycles
>> >> 10000 times kmalloc(512)/kfree -> 112 cycles
>> >> 10000 times kmalloc(1024)/kfree -> 112 cycles
>> >> 10000 times kmalloc(2048)/kfree -> 112 cycles
>> >> 10000 times kmalloc(4096)/kfree -> 112 cycles
>> >> 10000 times kmalloc(8192)/kfree -> 112 cycles
>> >> 10000 times kmalloc(16384)/kfree -> 112 cycles
>> >>
>> >> Signed-off-by: Thomas Garnier <thgarnie@google.com>
>> >> ---
>> >> Based on next-20160418
>> >> ---
>> >>  init/Kconfig |   9 ++++
>> >>  mm/slab.c    | 166 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>> >>  2 files changed, 174 insertions(+), 1 deletion(-)
>> >>
>> >> diff --git a/init/Kconfig b/init/Kconfig
>> >> index 0dfd09d..ee35418 100644
>> >> --- a/init/Kconfig
>> >> +++ b/init/Kconfig
>> >> @@ -1742,6 +1742,15 @@ config SLOB
>> >>
>> >>  endchoice
>> >>
>> >> +config FREELIST_RANDOM
>> >> +     default n
>> >> +     depends on SLAB
>> >> +     bool "SLAB freelist randomization"
>> >> +     help
>> >> +       Randomizes the freelist order used on creating new SLABs. This
>> >> +       security feature reduces the predictability of the kernel slab
>> >> +       allocator against heap overflows.
>> >> +
>> >>  config SLUB_CPU_PARTIAL
>> >>       default y
>> >>       depends on SLUB && SMP
>> >> diff --git a/mm/slab.c b/mm/slab.c
>> >> index b70aabf..8371d80 100644
>> >> --- a/mm/slab.c
>> >> +++ b/mm/slab.c
>> >> @@ -116,6 +116,7 @@
>> >>  #include     <linux/kmemcheck.h>
>> >>  #include     <linux/memory.h>
>> >>  #include     <linux/prefetch.h>
>> >> +#include     <linux/log2.h>
>> >>
>> >>  #include     <net/sock.h>
>> >>
>> >> @@ -1229,6 +1230,62 @@ static void __init set_up_node(struct kmem_cache *cachep, int index)
>> >>       }
>> >>  }
>> >>
>> >> +#ifdef CONFIG_FREELIST_RANDOM
>> >> +/*
>> >> + * Master lists are pre-computed random lists
>> >> + * Lists of different sizes are used to optimize performance on SLABS with
>> >> + * different object counts.
>> >> + */
>> >
>> > If it is for optimization, one option would be to have a separate
>> > random list for each kmem_cache. It would consume more memory, but the
>> > increase would be marginal. It also provides more unpredictability and can
>> > give better performance, because we wouldn't need state->type (more, less)
>> > and the special handling related to it.
>> >
>>
>> I am not sure, because the major caches are created early at boot time. We still have
>> the same entropy problem and we are wasting a bit more memory. It will be faster
>
> I think the entropy problem is a separate issue and should be considered
> separately. If it is solved, making a pre-computed array for each
> kmem_cache will provide more unpredictability. If someone who succeeded in
> exploiting one kmem_cache with 128 objects per slab wants to exploit
> another kmem_cache with 128 objects per slab, a separate pre-computed array
> will help.
>
>> on usage, though I am not sure it will be significant.
>
> I also don't think it's significant. But, besides the performance effect,
> the code doesn't look very attractive or extensible. In the case of SLUB,
> there is the setup_slub_max_order option, and the number of objects per slab
> could be larger than 256. To deal with that, we would need to add many more
> static definitions, which doesn't look good to me. Please use dynamically
> allocated memory instead of static array definitions.
>

You don't need to. We wrap around the list in use (if you look at
get_next_entry, we reset pos to 0 when we reach the end of the list).
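To illustrate the wrapping described above, here is a user-space model of the patch's get_next_entry()/next_random_slot() logic. The `model_` and `walk_` names are hypothetical, and assert() stands in for BUG_ON(). Consider a 4-entry list and a 6-object slab (the `less` case). The first pass yields the list itself. The second pass, with padding = 4, keeps only the entries that map below 6. In the `more` case, entries at or above the object count are simply skipped.

```c
#include <assert.h>

enum master_type { MATCH, LESS, MORE };

struct walk_state {
	const unsigned int *list;	/* permutation of 0..list_count-1 */
	unsigned int list_count;
	unsigned int count;		/* objects in the slab */
	unsigned int pos;
	unsigned int padding;
	enum master_type type;
};

static void walk_init(struct walk_state *s, const unsigned int *list,
		      unsigned int list_count, unsigned int count)
{
	s->list = list;
	s->list_count = list_count;
	s->count = count;
	s->pos = 0;
	s->padding = 0;
	if (list_count == count)
		s->type = MATCH;
	else if (list_count > count)
		s->type = MORE;
	else
		s->type = LESS;
}

static unsigned int model_get_next_entry(struct walk_state *s)
{
	/* List shorter than the slab: wrap and shift into the next window */
	if (s->type == LESS && s->pos == s->list_count) {
		s->padding += s->pos;
		s->pos = 0;
	}
	assert(s->pos < s->list_count);
	return s->list[s->pos++];
}

static unsigned int model_next_random_slot(struct walk_state *s)
{
	unsigned int entry = model_get_next_entry(s);

	if (s->type == MATCH)
		return entry;
	/* Skip entries that would land past the last object */
	while (entry + s->padding >= s->count)
		entry = model_get_next_entry(s);
	return entry + s->padding;
}
```

In this model, objects beyond the list size are visited in the same relative order as the matching list entries. Each later window therefore reuses part of the first pass's pattern.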

I do think that the design will be better with a dedicated list per cache. Given
you seem fine with the memory differences, performance can only get better...

I will refactor for that on the next iteration.
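The per-cache direction agreed above can be sketched roughly in user-space C. The idea is to allocate a sequence sized exactly to the cache's objects per slab and shuffle it once. In the test below that size is 300, beyond the 256-entry static table. `cache_random_seq_create` and the xorshift32 generator are hypothetical stand-ins: the patch itself seeds prandom_u32_state() via get_random_bytes_arch(). malloc() likewise stands in for whatever kernel allocator the next iteration would use.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a seeded PRNG such as prandom_u32_state() */
static uint32_t xorshift32(uint32_t *state)
{
	uint32_t x = *state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	*state = x;
	return x;
}

/*
 * Allocate and shuffle a random sequence sized exactly to `num` objects,
 * so no fixed 2..256 table or less/more handling is needed.
 * Returns NULL for num == 0 or on allocation failure.
 */
static unsigned int *cache_random_seq_create(unsigned int num, uint32_t seed)
{
	unsigned int *seq;
	uint32_t state = seed ? seed : 1;	/* xorshift32 must not start at 0 */
	unsigned int i;

	if (num == 0)
		return NULL;
	seq = malloc(num * sizeof(*seq));
	if (!seq)
		return NULL;
	for (i = 0; i < num; i++)
		seq[i] = i;
	/* Fisher-Yates shuffle, as in freelist_random_init() */
	for (i = num - 1; i > 0; i--) {
		unsigned int j = xorshift32(&state) % (i + 1);
		unsigned int tmp = seq[i];

		seq[i] = seq[j];
		seq[j] = tmp;
	}
	return seq;
}
```

Like the patch's `rand %= (i + 1)`, the modulo step has a slight bias for large `i`. That bias is negligible at these sizes.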

>>
>> >> +static freelist_idx_t master_list_2[2];
>> >> +static freelist_idx_t master_list_4[4];
>> >> +static freelist_idx_t master_list_8[8];
>> >> +static freelist_idx_t master_list_16[16];
>> >> +static freelist_idx_t master_list_32[32];
>> >> +static freelist_idx_t master_list_64[64];
>> >> +static freelist_idx_t master_list_128[128];
>> >> +static freelist_idx_t master_list_256[256];
>> >> +const static struct m_list {
>> >> +     size_t count;
>> >> +     freelist_idx_t *list;
>> >> +} master_lists[] = {
>> >> +     { ARRAY_SIZE(master_list_2), master_list_2 },
>> >> +     { ARRAY_SIZE(master_list_4), master_list_4 },
>> >> +     { ARRAY_SIZE(master_list_8), master_list_8 },
>> >> +     { ARRAY_SIZE(master_list_16), master_list_16 },
>> >> +     { ARRAY_SIZE(master_list_32), master_list_32 },
>> >> +     { ARRAY_SIZE(master_list_64), master_list_64 },
>> >> +     { ARRAY_SIZE(master_list_128), master_list_128 },
>> >> +     { ARRAY_SIZE(master_list_256), master_list_256 },
>> >> +};
>> >> +
>> >> +/* Pre-compute the Freelist master lists at boot */
>> >> +static void __init freelist_random_init(void)
>> >> +{
>> >> +     unsigned int seed;
>> >> +     size_t z, i, rand;
>> >> +     struct rnd_state slab_rand;
>> >> +
>> >> +     get_random_bytes_arch(&seed, sizeof(seed));
>> >> +     prandom_seed_state(&slab_rand, seed);
>> >> +
>> >> +     for (z = 0; z < ARRAY_SIZE(master_lists); z++) {
>> >> +             for (i = 0; i < master_lists[z].count; i++)
>> >> +                     master_lists[z].list[i] = i;
>> >> +
>> >> +             /* Fisher-Yates shuffle */
>> >> +             for (i = master_lists[z].count - 1; i > 0; i--) {
>> >> +                     rand = prandom_u32_state(&slab_rand);
>> >> +                     rand %= (i + 1);
>> >> +                     swap(master_lists[z].list[i],
>> >> +                             master_lists[z].list[rand]);
>> >> +             }
>> >> +     }
>> >> +}
>> >> +#else
>> >> +static inline void __init freelist_random_init(void) { }
>> >> +#endif /* CONFIG_FREELIST_RANDOM */
>> >> +
>> >> +
>> >>  /*
>> >>   * Initialisation.  Called after the page allocator have been initialised and
>> >>   * before smp_init().
>> >> @@ -1255,6 +1312,8 @@ void __init kmem_cache_init(void)
>> >>       if (!slab_max_order_set && totalram_pages > (32 << 20) >> PAGE_SHIFT)
>> >>               slab_max_order = SLAB_MAX_ORDER_HI;
>> >>
>> >> +     freelist_random_init();
>> >> +
>> >>       /* Bootstrap is tricky, because several objects are allocated
>> >>        * from caches that do not exist yet:
>> >>        * 1) initialize the kmem_cache cache: it contains the struct
>> >> @@ -2442,6 +2501,107 @@ static void cache_init_objs_debug(struct kmem_cache *cachep, struct page *page)
>> >>  #endif
>> >>  }
>> >>
>> >> +#ifdef CONFIG_FREELIST_RANDOM
>> >> +/* Identify if the target freelist matches the pre-computed list */
>> >> +enum master_type {
>> >> +     match,
>> >> +     less,
>> >> +     more
>> >> +};
>> >> +
>> >> +/* Hold information during a freelist initialization */
>> >> +struct freelist_init_state {
>> >> +     unsigned int padding;
>> >> +     unsigned int pos;
>> >> +     unsigned int count;
>> >> +     struct m_list master_list;
>> >> +     unsigned int master_count;
>> >> +     enum master_type type;
>> >> +};
>> >> +
>> >> +/* Select the right pre-computed master list and initialize state */
>> >> +static void freelist_state_initialize(struct freelist_init_state *state,
>> >> +                                   unsigned int count)
>> >> +{
>> >> +     unsigned int idx;
>> >> +     const unsigned int last_idx = ARRAY_SIZE(master_lists) - 1;
>> >> +
>> >> +     memset(state, 0, sizeof(*state));
>> >> +     state->count = count;
>> >> +     state->pos = 0;
>> >
>> > Using pos = 0 here doesn't look good in terms of security. In this case,
>> > every new page of the same size class has the same freelist sequence since boot.
>> >
>> > How about using a random value to set pos? It provides some more randomness
>> > with minimal overhead.
>> >
>>
>> I think it is a good idea. I will add that for the next iteration.
>>
>> >> +     /* count is always >= 2 */
>> >> +     idx = ilog2(count) - 1;
>> >> +     if (idx >= last_idx)
>> >> +             idx = last_idx;
>> >> +     else if (roundup_pow_of_two(idx + 1) != count)
>> >> +             idx++;
>> >> +     state->master_list = master_lists[idx];
>> >> +     if (state->master_list.count == state->count)
>> >> +             state->type = match;
>> >> +     else if (state->master_list.count > state->count)
>> >> +             state->type = more;
>> >> +     else
>> >> +             state->type = less;
>> >> +}
>> >> +
>> >> +/* Get the next entry on the master list depending on the target list size */
>> >> +static freelist_idx_t get_next_entry(struct freelist_init_state *state)
>> >> +{
>> >> +     if (state->type == less && state->pos == state->master_list.count) {
>> >> +             state->padding += state->pos;
>> >> +             state->pos = 0;
>> >> +     }
>> >> +     BUG_ON(state->pos >= state->master_list.count);
>> >> +     return state->master_list.list[state->pos++];
>> >> +}
>> >> +
>> >> +static freelist_idx_t next_random_slot(struct freelist_init_state *state)
>> >> +{
>> >> +     freelist_idx_t cur, entry;
>> >> +
>> >> +     entry = get_next_entry(state);
>> >> +
>> >> +     if (state->type != match) {
>> >> +             while ((entry + state->padding) >= state->count)
>> >> +                     entry = get_next_entry(state);
>> >> +             cur = entry + state->padding;
>> >> +             BUG_ON(cur >= state->count);
>> >> +     } else {
>> >> +             cur = entry;
>> >> +     }
>> >> +
>> >> +     return cur;
>> >> +}
>> >> +
>> >> +/* Shuffle the freelist initialization state based on pre-computed lists */
>> >> +static void shuffle_freelist(struct kmem_cache *cachep, struct page *page,
>> >> +                          unsigned int count)
>> >> +{
>> >> +     unsigned int i;
>> >> +     struct freelist_init_state state;
>> >> +
>> >> +     if (count < 2) {
>> >> +             for (i = 0; i < count; i++)
>> >> +                     set_free_obj(page, i, i);
>> >> +             return;
>> >> +     }
>> >> +
>> >> +     /* Last chunk is used already in this case */
>> >> +     if (OBJFREELIST_SLAB(cachep))
>> >> +             count--;
>> >> +
>> >> +     freelist_state_initialize(&state, count);
>> >> +     for (i = 0; i < count; i++)
>> >> +             set_free_obj(page, i, next_random_slot(&state));
>> >> +
>> >> +     if (OBJFREELIST_SLAB(cachep))
>> >> +             set_free_obj(page, i, i);
>> >
>> > Please consider last object of OBJFREELIST_SLAB cache, too.
>> >
>> > freelist_state_init()
>> > last_obj = next_random_slot()
>> > page->freelist = XXX
>> > for (i = 0; i < count - 1; i++)
>> >         set_free_obj()
>> > set_free_obj(last_obj);
>> >
>> > Thanks.
>> >
>>
>> The current implementation takes the last chunk by default before the
>> freelist is initialized. Do you want it to be randomized as well?
>
> Yes.
>
> Thanks.
>
>>
>> >> +}
>> >> +#else
>> >> +static inline void shuffle_freelist(struct kmem_cache *cachep,
>> >> +                                 struct page *page, unsigned int count) { }
>> >> +#endif /* CONFIG_FREELIST_RANDOM */
>> >> +
>> >>  static void cache_init_objs(struct kmem_cache *cachep,
>> >>                           struct page *page)
>> >>  {
>> >> @@ -2464,8 +2624,12 @@ static void cache_init_objs(struct kmem_cache *cachep,
>> >>                       kasan_poison_object_data(cachep, objp);
>> >>               }
>> >>
>> >> -             set_free_obj(page, i, i);
>> >> +             /* If enabled, initialization is done in shuffle_freelist */
>> >> +             if (!config_enabled(CONFIG_FREELIST_RANDOM))
>> >> +                     set_free_obj(page, i, i);
>> >>       }
>> >> +
>> >> +     shuffle_freelist(cachep, page, cachep->num);
>> >>  }
>> >>
>> >>  static void kmem_flagcheck(struct kmem_cache *cachep, gfp_t flags)
>> >> --
>> >> 2.8.0.rc3.226.g39d4020
>> >>
>> >> --
>> >> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> >> the body to majordomo@kvack.org.  For more info on Linux MM,
>> >> see: http://www.linux-mm.org/ .
>> >> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>>

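A userspace sketch of the OBJFREELIST_SLAB change requested in the review above, assuming the slab walks a pre-shuffled permutation of exactly `count` indices: the object that stores the freelist is drawn from the random sequence first (so its position is random too), the remaining `count - 1` entries are filled in random order, and the holder is put in the last freelist slot so it is handed out last. `struct seq_state`, `next_slot()` and `shuffle_objfreelist()` are hypothetical names; nothing here touches the real `page->freelist` or `set_free_obj()`.

```c
#include <assert.h>

/* Hypothetical cursor over a pre-shuffled permutation of 0..count-1. */
struct seq_state {
	const unsigned int *list;
	unsigned int count;
	unsigned int pos;
};

/* Return the next index, wrapping to the start of the list at the end. */
static unsigned int next_slot(struct seq_state *s)
{
	if (s->pos >= s->count)
		s->pos = 0;
	return s->list[s->pos++];
}

/*
 * Build a freelist of s->count entries for an OBJFREELIST-style slab.
 * *holder receives the object chosen to store the freelist array. It is
 * drawn first and placed in the last slot, so it is only allocated once
 * the freelist is no longer needed.
 */
static void shuffle_objfreelist(struct seq_state *s, unsigned int *freelist,
				unsigned int *holder)
{
	unsigned int i, count = s->count;

	assert(count >= 2);
	*holder = next_slot(s);
	for (i = 0; i < count - 1; i++)
		freelist[i] = next_slot(s);
	freelist[count - 1] = *holder;
}
```

With `list = {3, 0, 4, 1, 2}` and a starting `pos` of 2, the holder is object 4 and the freelist becomes `{1, 2, 3, 0, 4}`.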

* Re: [PATCH v2] mm: SLAB freelist randomization
@ 2016-04-20 14:47         ` Thomas Garnier
  0 siblings, 0 replies; 35+ messages in thread
From: Thomas Garnier @ 2016-04-20 14:47 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Andrew Morton,
	Kees Cook, Greg Thelen, Laura Abbott, kernel-hardening, LKML,
	Linux-MM

On Wed, Apr 20, 2016 at 1:08 AM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> On Tue, Apr 19, 2016 at 09:44:54AM -0700, Thomas Garnier wrote:
>> On Tue, Apr 19, 2016 at 12:15 AM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
>> > On Mon, Apr 18, 2016 at 10:14:39AM -0700, Thomas Garnier wrote:
>> >> Provides an optional config (CONFIG_FREELIST_RANDOM) to randomize the
>> >> SLAB freelist. The list is randomized during initialization of a new set
>> >> of pages. The order on different freelist sizes is pre-computed at boot
>> >> for performance. This security feature reduces the predictability of the
>> >> kernel SLAB allocator against heap overflows, rendering attacks much less
>> >> stable.
>> >
>> > I'm not familiar with security, but it doesn't look much more secure than
>> > before. Is there any other way to generate a different freelist sequence
>> > for each new set of pages? The current approach, using a pre-computed array,
>> > generates the same freelist sequence for every new set of pages in the same
>> > size class. Is that sufficient?
>> >
>>
>> I think it is sufficient. There is a performance tradeoff. We could pick a
>> random object from the freelist every time (in slab_get_obj), but I think it
>> would have a significant impact (at least 3%).
>>
>> >> For example this attack against SLUB (also applicable against SLAB)
>> >> would be affected:
>> >> https://jon.oberheide.org/blog/2010/09/10/linux-kernel-can-slub-overflow/
>> >>
>> >> Also, since v4.6 the freelist has been moved to the end of the SLAB. This means
>> >> a controllable heap is open to new attacks not yet publicly discussed.
>> >> A kernel heap overflow can be transformed into multiple use-after-free.
>> >> This feature makes this type of attack harder too.
>> >>
>> >> To generate entropy, we use get_random_bytes_arch because 0 bits of
>> >> entropy are available at that boot stage. In the worst case, this function
>> >> will fall back to the get_random_bytes sub API.
>> >>
>> >> The config option name is not specific to the SLAB as this approach will
>> >> be extended to other allocators like SLUB.
>> >
>> > If this feature is going to be applied to SLUB, it's better to put the common
>> > code in mm/slab_common.c.
>> >
>>
>> I think it might be moved there once we implement the SLUB counterpart
>> but it is too early to define which part will be common.
>>
>> >>
>> >> Performance results highlighted no major changes:
>> >>
>> >> Netperf average on 10 runs:
>> >>
>> >> threads,base,change
>> >> 16,576943.10,585905.90 (101.55%)
>> >> 32,564082.00,569741.20 (101.00%)
>> >> 48,558334.30,561851.20 (100.63%)
>> >> 64,552025.20,556448.30 (100.80%)
>> >> 80,552294.40,551743.10 (99.90%)
>> >> 96,552435.30,547529.20 (99.11%)
>> >> 112,551320.60,550183.20 (99.79%)
>> >> 128,549138.30,550542.70 (100.26%)
>> >> 144,549344.50,544529.10 (99.12%)
>> >> 160,550360.80,539929.30 (98.10%)
>> >>
>> >> slab_test, 1 run on boot. "After" is faster except for an odd result on size
>> >> 2048.
>> >
>> > Hmm... That's an odd result. The patch adds more logic, so it should
>> > decrease performance. I guess it is experimental error, but
>> > do you have any analysis of this result?
>> >
>>
>> I don't. I am happy to redo the test. I found that slab_test gives very different
>> results depending on the heap state at the time of the test. If I run the test
>> multiple times, the results vary a lot, with or without the mitigation (on
>> dedicated hardware).
>>
>> >>
>> >> Before:
>> >>
>> >> Single thread testing
>> >> =====================
>> >> 1. Kmalloc: Repeatedly allocate then free test
>> >> 10000 times kmalloc(8) -> 137 cycles kfree -> 126 cycles
>> >> 10000 times kmalloc(16) -> 118 cycles kfree -> 119 cycles
>> >> 10000 times kmalloc(32) -> 112 cycles kfree -> 119 cycles
>> >> 10000 times kmalloc(64) -> 126 cycles kfree -> 123 cycles
>> >> 10000 times kmalloc(128) -> 135 cycles kfree -> 131 cycles
>> >> 10000 times kmalloc(256) -> 165 cycles kfree -> 104 cycles
>> >> 10000 times kmalloc(512) -> 174 cycles kfree -> 126 cycles
>> >> 10000 times kmalloc(1024) -> 242 cycles kfree -> 160 cycles
>> >> 10000 times kmalloc(2048) -> 478 cycles kfree -> 239 cycles
>> >> 10000 times kmalloc(4096) -> 747 cycles kfree -> 364 cycles
>> >> 10000 times kmalloc(8192) -> 774 cycles kfree -> 404 cycles
>> >> 10000 times kmalloc(16384) -> 849 cycles kfree -> 430 cycles
>> >> 2. Kmalloc: alloc/free test
>> >> 10000 times kmalloc(8)/kfree -> 118 cycles
>> >> 10000 times kmalloc(16)/kfree -> 118 cycles
>> >> 10000 times kmalloc(32)/kfree -> 118 cycles
>> >> 10000 times kmalloc(64)/kfree -> 121 cycles
>> >> 10000 times kmalloc(128)/kfree -> 118 cycles
>> >> 10000 times kmalloc(256)/kfree -> 115 cycles
>> >> 10000 times kmalloc(512)/kfree -> 115 cycles
>> >> 10000 times kmalloc(1024)/kfree -> 115 cycles
>> >> 10000 times kmalloc(2048)/kfree -> 115 cycles
>> >> 10000 times kmalloc(4096)/kfree -> 115 cycles
>> >> 10000 times kmalloc(8192)/kfree -> 115 cycles
>> >> 10000 times kmalloc(16384)/kfree -> 115 cycles
>> >>
>> >> After:
>> >>
>> >> Single thread testing
>> >> =====================
>> >> 1. Kmalloc: Repeatedly allocate then free test
>> >> 10000 times kmalloc(8) -> 99 cycles kfree -> 84 cycles
>> >> 10000 times kmalloc(16) -> 88 cycles kfree -> 83 cycles
>> >> 10000 times kmalloc(32) -> 90 cycles kfree -> 81 cycles
>> >> 10000 times kmalloc(64) -> 107 cycles kfree -> 97 cycles
>> >> 10000 times kmalloc(128) -> 134 cycles kfree -> 89 cycles
>> >> 10000 times kmalloc(256) -> 145 cycles kfree -> 97 cycles
>> >> 10000 times kmalloc(512) -> 177 cycles kfree -> 116 cycles
>> >> 10000 times kmalloc(1024) -> 223 cycles kfree -> 151 cycles
>> >> 10000 times kmalloc(2048) -> 1429 cycles kfree -> 221 cycles
>> >> 10000 times kmalloc(4096) -> 720 cycles kfree -> 348 cycles
>> >> 10000 times kmalloc(8192) -> 788 cycles kfree -> 393 cycles
>> >> 10000 times kmalloc(16384) -> 867 cycles kfree -> 433 cycles
>> >> 2. Kmalloc: alloc/free test
>> >> 10000 times kmalloc(8)/kfree -> 115 cycles
>> >> 10000 times kmalloc(16)/kfree -> 115 cycles
>> >> 10000 times kmalloc(32)/kfree -> 115 cycles
>> >> 10000 times kmalloc(64)/kfree -> 120 cycles
>> >> 10000 times kmalloc(128)/kfree -> 127 cycles
>> >> 10000 times kmalloc(256)/kfree -> 119 cycles
>> >> 10000 times kmalloc(512)/kfree -> 112 cycles
>> >> 10000 times kmalloc(1024)/kfree -> 112 cycles
>> >> 10000 times kmalloc(2048)/kfree -> 112 cycles
>> >> 10000 times kmalloc(4096)/kfree -> 112 cycles
>> >> 10000 times kmalloc(8192)/kfree -> 112 cycles
>> >> 10000 times kmalloc(16384)/kfree -> 112 cycles
>> >>
>> >> Signed-off-by: Thomas Garnier <thgarnie@google.com>
>> >> ---
>> >> Based on next-20160418
>> >> ---
>> >>  init/Kconfig |   9 ++++
>> >>  mm/slab.c    | 166 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>> >>  2 files changed, 174 insertions(+), 1 deletion(-)
>> >>
>> >> diff --git a/init/Kconfig b/init/Kconfig
>> >> index 0dfd09d..ee35418 100644
>> >> --- a/init/Kconfig
>> >> +++ b/init/Kconfig
>> >> @@ -1742,6 +1742,15 @@ config SLOB
>> >>
>> >>  endchoice
>> >>
>> >> +config FREELIST_RANDOM
>> >> +     default n
>> >> +     depends on SLAB
>> >> +     bool "SLAB freelist randomization"
>> >> +     help
>> >> +       Randomizes the freelist order used on creating new SLABs. This
>> >> +       security feature reduces the predictability of the kernel slab
>> >> +       allocator against heap overflows.
>> >> +
>> >>  config SLUB_CPU_PARTIAL
>> >>       default y
>> >>       depends on SLUB && SMP
>> >> diff --git a/mm/slab.c b/mm/slab.c
>> >> index b70aabf..8371d80 100644
>> >> --- a/mm/slab.c
>> >> +++ b/mm/slab.c
>> >> @@ -116,6 +116,7 @@
>> >>  #include     <linux/kmemcheck.h>
>> >>  #include     <linux/memory.h>
>> >>  #include     <linux/prefetch.h>
>> >> +#include     <linux/log2.h>
>> >>
>> >>  #include     <net/sock.h>
>> >>
>> >> @@ -1229,6 +1230,62 @@ static void __init set_up_node(struct kmem_cache *cachep, int index)
>> >>       }
>> >>  }
>> >>
>> >> +#ifdef CONFIG_FREELIST_RANDOM
>> >> +/*
>> >> + * Master lists are pre-computed random lists
>> >> + * Lists of different sizes are used to optimize performance on SLABS with
>> >> + * different object counts.
>> >> + */
>> >
>> > If it is for optimization, one option would be to have a separate
>> > random list for each kmem_cache. It would consume more memory, but the overhead
>> > would be marginal. It also provides more unpredictability and can
>> > give better performance, because we would not need state->type (more, less)
>> > and the related special handling.
>> >
>>
>> I am not sure, because the major caches are created early at boot time. We would still
>> have the same entropy problem, and we would waste a bit more memory. It will be faster
>
> I think the entropy problem is a different issue and should be considered
> separately. If it is solved, making a pre-computed array for each
> kmem_cache will provide more unpredictability. If someone who succeeded in
> exploiting one kmem_cache with 128 objects per slab wants to exploit
> another kmem_cache with 128 objects per slab, a separate pre-computed array
> will help.
>
>> in use, though I am not sure it will be significant.
>
> I also think it's not significant. But, besides the performance effect, the
> code doesn't look very attractive or extensible. In the case of SLUB,
> there is the setup_slub_max_order option, and the number of objects per slab
> could be larger than 256. To deal with that, we would need many more static
> definitions, which doesn't look good to me. Please use dynamically allocated
> memory instead of static array definitions.
>

You don't need to. We wrap around the list in use (if you look at get_next_entry,
we reset pos to 0 when we reach the end of the list).

I do think that the design will be better with a dedicated list per cache. Given
that you seem fine with the memory overhead, performance can only get better...

I will refactor for that on the next iteration.
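The direction agreed here (one dedicated random list per cache, combined with the random starting `pos` suggested in the review) can be sketched in userspace C as follows. The list is built once per cache with a Fisher-Yates shuffle, like `freelist_random_init()` in the patch, and every new slab walks it from a random offset, wrapping at the end as `get_next_entry()` does. `xorshift32()` is a hypothetical stand-in for `prandom_u32_state()`, `malloc()` stands in for a kernel allocation, and `cache_seq_create()`/`fill_freelist()` are illustrative names only.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical xorshift32 PRNG standing in for prandom_u32_state(). */
static uint32_t xorshift32(uint32_t *state)
{
	uint32_t x = *state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	*state = x;
	return x;
}

/* Build a per-cache permutation of 0..count-1 (NULL if count is 0). */
static unsigned int *cache_seq_create(unsigned int count, uint32_t seed)
{
	unsigned int *seq;
	unsigned int i, j, tmp;
	uint32_t st = seed ? seed : 1;	/* xorshift32 must not start at 0 */

	if (count == 0)
		return NULL;
	seq = malloc(count * sizeof(*seq));
	if (!seq)
		return NULL;
	for (i = 0; i < count; i++)
		seq[i] = i;
	/* Fisher-Yates; "% (i + 1)" is slightly biased, as in the patch. */
	for (i = count - 1; i > 0; i--) {
		j = xorshift32(&st) % (i + 1);
		tmp = seq[i];
		seq[i] = seq[j];
		seq[j] = tmp;
	}
	return seq;
}

/* Fill a new slab's freelist from the shared sequence, random start. */
static void fill_freelist(const unsigned int *seq, unsigned int count,
			  unsigned int start, unsigned int *freelist)
{
	unsigned int i, pos;

	assert(count > 0);
	pos = start % count;
	for (i = 0; i < count; i++) {
		freelist[i] = seq[pos++];
		if (pos == count)
			pos = 0;	/* wrap around to the beginning */
	}
}
```

Because `start` would come from a fresh random value per slab, two slabs of the same cache no longer begin with the same object, although they still share one cyclic order.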

>>
>> >> +static freelist_idx_t master_list_2[2];
>> >> +static freelist_idx_t master_list_4[4];
>> >> +static freelist_idx_t master_list_8[8];
>> >> +static freelist_idx_t master_list_16[16];
>> >> +static freelist_idx_t master_list_32[32];
>> >> +static freelist_idx_t master_list_64[64];
>> >> +static freelist_idx_t master_list_128[128];
>> >> +static freelist_idx_t master_list_256[256];
>> >> +const static struct m_list {
>> >> +     size_t count;
>> >> +     freelist_idx_t *list;
>> >> +} master_lists[] = {
>> >> +     { ARRAY_SIZE(master_list_2), master_list_2 },
>> >> +     { ARRAY_SIZE(master_list_4), master_list_4 },
>> >> +     { ARRAY_SIZE(master_list_8), master_list_8 },
>> >> +     { ARRAY_SIZE(master_list_16), master_list_16 },
>> >> +     { ARRAY_SIZE(master_list_32), master_list_32 },
>> >> +     { ARRAY_SIZE(master_list_64), master_list_64 },
>> >> +     { ARRAY_SIZE(master_list_128), master_list_128 },
>> >> +     { ARRAY_SIZE(master_list_256), master_list_256 },
>> >> +};
>> >> +
>> >> +/* Pre-compute the Freelist master lists at boot */
>> >> +static void __init freelist_random_init(void)
>> >> +{
>> >> +     unsigned int seed;
>> >> +     size_t z, i, rand;
>> >> +     struct rnd_state slab_rand;
>> >> +
>> >> +     get_random_bytes_arch(&seed, sizeof(seed));
>> >> +     prandom_seed_state(&slab_rand, seed);
>> >> +
>> >> +     for (z = 0; z < ARRAY_SIZE(master_lists); z++) {
>> >> +             for (i = 0; i < master_lists[z].count; i++)
>> >> +                     master_lists[z].list[i] = i;
>> >> +
>> >> +             /* Fisher-Yates shuffle */
>> >> +             for (i = master_lists[z].count - 1; i > 0; i--) {
>> >> +                     rand = prandom_u32_state(&slab_rand);
>> >> +                     rand %= (i + 1);
>> >> +                     swap(master_lists[z].list[i],
>> >> +                             master_lists[z].list[rand]);
>> >> +             }
>> >> +     }
>> >> +}
>> >> +#else
>> >> +static inline void __init freelist_random_init(void) { }
>> >> +#endif /* CONFIG_FREELIST_RANDOM */
>> >> +
>> >> +
>> >>  /*
>> >>   * Initialisation.  Called after the page allocator have been initialised and
>> >>   * before smp_init().
>> >> @@ -1255,6 +1312,8 @@ void __init kmem_cache_init(void)
>> >>       if (!slab_max_order_set && totalram_pages > (32 << 20) >> PAGE_SHIFT)
>> >>               slab_max_order = SLAB_MAX_ORDER_HI;
>> >>
>> >> +     freelist_random_init();
>> >> +
>> >>       /* Bootstrap is tricky, because several objects are allocated
>> >>        * from caches that do not exist yet:
>> >>        * 1) initialize the kmem_cache cache: it contains the struct
>> >> @@ -2442,6 +2501,107 @@ static void cache_init_objs_debug(struct kmem_cache *cachep, struct page *page)
>> >>  #endif
>> >>  }
>> >>
>> >> +#ifdef CONFIG_FREELIST_RANDOM
>> >> +/* Identify if the target freelist matches the pre-computed list */
>> >> +enum master_type {
>> >> +     match,
>> >> +     less,
>> >> +     more
>> >> +};
>> >> +
>> >> +/* Hold information during a freelist initialization */
>> >> +struct freelist_init_state {
>> >> +     unsigned int padding;
>> >> +     unsigned int pos;
>> >> +     unsigned int count;
>> >> +     struct m_list master_list;
>> >> +     unsigned int master_count;
>> >> +     enum master_type type;
>> >> +};
>> >> +
>> >> +/* Select the right pre-computed master list and initialize state */
>> >> +static void freelist_state_initialize(struct freelist_init_state *state,
>> >> +                                   unsigned int count)
>> >> +{
>> >> +     unsigned int idx;
>> >> +     const unsigned int last_idx = ARRAY_SIZE(master_lists) - 1;
>> >> +
>> >> +     memset(state, 0, sizeof(*state));
>> >> +     state->count = count;
>> >> +     state->pos = 0;
>> >
>> > Using pos = 0 here doesn't look good in terms of security. In this case,
>> > every new page of the same size class has the same freelist sequence since boot.
>> >
>> > How about using a random value to set pos? It provides some more randomness
>> > with minimal overhead.
>> >
>>
>> I think it is a good idea. I will add that for the next iteration.
>>
>> >> +     /* count is always >= 2 */
>> >> +     idx = ilog2(count) - 1;
>> >> +     if (idx >= last_idx)
>> >> +             idx = last_idx;
>> >> +     else if (roundup_pow_of_two(idx + 1) != count)
>> >> +             idx++;
>> >> +     state->master_list = master_lists[idx];
>> >> +     if (state->master_list.count == state->count)
>> >> +             state->type = match;
>> >> +     else if (state->master_list.count > state->count)
>> >> +             state->type = more;
>> >> +     else
>> >> +             state->type = less;
>> >> +}
>> >> +
>> >> +/* Get the next entry on the master list depending on the target list size */
>> >> +static freelist_idx_t get_next_entry(struct freelist_init_state *state)
>> >> +{
>> >> +     if (state->type == less && state->pos == state->master_list.count) {
>> >> +             state->padding += state->pos;
>> >> +             state->pos = 0;
>> >> +     }
>> >> +     BUG_ON(state->pos >= state->master_list.count);
>> >> +     return state->master_list.list[state->pos++];
>> >> +}
>> >> +
>> >> +static freelist_idx_t next_random_slot(struct freelist_init_state *state)
>> >> +{
>> >> +     freelist_idx_t cur, entry;
>> >> +
>> >> +     entry = get_next_entry(state);
>> >> +
>> >> +     if (state->type != match) {
>> >> +             while ((entry + state->padding) >= state->count)
>> >> +                     entry = get_next_entry(state);
>> >> +             cur = entry + state->padding;
>> >> +             BUG_ON(cur >= state->count);
>> >> +     } else {
>> >> +             cur = entry;
>> >> +     }
>> >> +
>> >> +     return cur;
>> >> +}
>> >> +
>> >> +/* Shuffle the freelist initialization state based on pre-computed lists */
>> >> +static void shuffle_freelist(struct kmem_cache *cachep, struct page *page,
>> >> +                          unsigned int count)
>> >> +{
>> >> +     unsigned int i;
>> >> +     struct freelist_init_state state;
>> >> +
>> >> +     if (count < 2) {
>> >> +             for (i = 0; i < count; i++)
>> >> +                     set_free_obj(page, i, i);
>> >> +             return;
>> >> +     }
>> >> +
>> >> +     /* Last chunk is used already in this case */
>> >> +     if (OBJFREELIST_SLAB(cachep))
>> >> +             count--;
>> >> +
>> >> +     freelist_state_initialize(&state, count);
>> >> +     for (i = 0; i < count; i++)
>> >> +             set_free_obj(page, i, next_random_slot(&state));
>> >> +
>> >> +     if (OBJFREELIST_SLAB(cachep))
>> >> +             set_free_obj(page, i, i);
>> >
>> > Please consider last object of OBJFREELIST_SLAB cache, too.
>> >
>> > freelist_state_init()
>> > last_obj = next_random_slot()
>> > page->freelist = XXX
>> > for (i = 0; i < count - 1; i++)
>> >         set_free_obj()
>> > set_free_obj(last_obj);
>> >
>> > Thanks.
>> >
>>
>> The current implementation takes the last chunk by default before the
>> freelist is initialized. Do you want it to be randomized as well?
>
> Yes.
>
> Thanks.
>
>>
>> >> +}
>> >> +#else
>> >> +static inline void shuffle_freelist(struct kmem_cache *cachep,
>> >> +                                 struct page *page, unsigned int count) { }
>> >> +#endif /* CONFIG_FREELIST_RANDOM */
>> >> +
>> >>  static void cache_init_objs(struct kmem_cache *cachep,
>> >>                           struct page *page)
>> >>  {
>> >> @@ -2464,8 +2624,12 @@ static void cache_init_objs(struct kmem_cache *cachep,
>> >>                       kasan_poison_object_data(cachep, objp);
>> >>               }
>> >>
>> >> -             set_free_obj(page, i, i);
>> >> +             /* If enabled, initialization is done in shuffle_freelist */
>> >> +             if (!config_enabled(CONFIG_FREELIST_RANDOM))
>> >> +                     set_free_obj(page, i, i);
>> >>       }
>> >> +
>> >> +     shuffle_freelist(cachep, page, cachep->num);
>> >>  }
>> >>
>> >>  static void kmem_flagcheck(struct kmem_cache *cachep, gfp_t flags)
>> >> --
>> >> 2.8.0.rc3.226.g39d4020
>> >>

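The list selection in `freelist_state_initialize()` and the skip/wrap logic in `get_next_entry()`/`next_random_slot()` can be exercised in userspace with the sketch below. One hedged observation: for `count` below 256, `roundup_pow_of_two(idx + 1)` with `idx + 1 == ilog2(count)` never equals `count`, so as written a power-of-two `count` appears to get the next larger list (type `more`). The output is still a permutation, only with more skipped entries. `pick_master_idx()` below instead picks the smallest list whose size is at least `count`, which is an assumption about the intended behaviour. `struct map_state`, `next_entry()` and `next_slot()` are hypothetical simplifications that drop the `match`/`less`/`more` enum and replace `BUG_ON()` with `assert()`.

```c
#include <assert.h>

#define NR_MASTER_LISTS 8	/* list sizes 2, 4, ..., 256 as in the patch */

/* Index of the smallest list with 2^(idx + 1) >= count, capped at 256. */
static unsigned int pick_master_idx(unsigned int count)
{
	unsigned int idx = 0;

	assert(count >= 2);
	while (idx < NR_MASTER_LISTS - 1 && (2u << idx) < count)
		idx++;
	return idx;
}

struct map_state {
	const unsigned int *list;	/* shuffled permutation of 0..size-1 */
	unsigned int size;		/* master list length (power of two) */
	unsigned int count;		/* objects in the target slab */
	unsigned int pos;
	unsigned int padding;		/* offset added on each wrap */
};

/* Next raw entry; wraps and bumps padding when the list is too short. */
static unsigned int next_entry(struct map_state *s)
{
	if (s->pos == s->size) {
		s->padding += s->size;
		s->pos = 0;
		assert(s->padding < s->count);	/* called more than count times */
	}
	return s->list[s->pos++];
}

/* Next object index in [0, count), skipping entries out of range. */
static unsigned int next_slot(struct map_state *s)
{
	unsigned int e = next_entry(s);

	while (e + s->padding >= s->count)
		e = next_entry(s);
	return e + s->padding;
}
```

With the list `{5, 2, 7, 0, 3, 6, 1, 4}` and `count = 5`, the entries 5, 7 and 6 are skipped and the slab gets `{2, 0, 3, 1, 4}`. With a 4-entry list `{2, 0, 3, 1}` and `count = 6`, the first pass yields `{2, 0, 3, 1}` and the second pass, offset by 4, yields `{4, 5}`.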

* [kernel-hardening] Re: [PATCH v2] mm: SLAB freelist randomization
@ 2016-04-20 14:47         ` Thomas Garnier
  0 siblings, 0 replies; 35+ messages in thread
From: Thomas Garnier @ 2016-04-20 14:47 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Andrew Morton,
	Kees Cook, Greg Thelen, Laura Abbott, kernel-hardening, LKML,
	Linux-MM

On Wed, Apr 20, 2016 at 1:08 AM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> On Tue, Apr 19, 2016 at 09:44:54AM -0700, Thomas Garnier wrote:
>> On Tue, Apr 19, 2016 at 12:15 AM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
>> > On Mon, Apr 18, 2016 at 10:14:39AM -0700, Thomas Garnier wrote:
>> >> Provides an optional config (CONFIG_FREELIST_RANDOM) to randomize the
>> >> SLAB freelist. The list is randomized during initialization of a new set
>> >> of pages. The order on different freelist sizes is pre-computed at boot
>> >> for performance. This security feature reduces the predictability of the
>> >> kernel SLAB allocator against heap overflows, rendering attacks much less
>> >> stable.
>> >
>> > I'm not familiar with security, but it doesn't look much more secure than
>> > before. Is there any other way to generate a different freelist sequence
>> > for each new set of pages? The current approach, using a pre-computed array,
>> > generates the same freelist sequence for every new set of pages in the same
>> > size class. Is that sufficient?
>> >
>>
>> I think it is sufficient. There is a performance tradeoff. We could pick a
>> random object from the freelist every time (in slab_get_obj), but I think it
>> would have a significant impact (at least 3%).
>>
>> >> For example this attack against SLUB (also applicable against SLAB)
>> >> would be affected:
>> >> https://jon.oberheide.org/blog/2010/09/10/linux-kernel-can-slub-overflow/
>> >>
>> >> Also, since v4.6 the freelist has been moved to the end of the SLAB. This means
>> >> a controllable heap is open to new attacks not yet publicly discussed.
>> >> A kernel heap overflow can be transformed into multiple use-after-free.
>> >> This feature makes this type of attack harder too.
>> >>
>> >> To generate entropy, we use get_random_bytes_arch because 0 bits of
>> >> entropy are available at that boot stage. In the worst case, this function
>> >> will fall back to the get_random_bytes sub API.
>> >>
>> >> The config option name is not specific to the SLAB as this approach will
>> >> be extended to other allocators like SLUB.
>> >
>> > If this feature is going to be applied to SLUB, it's better to put the common
>> > code in mm/slab_common.c.
>> >
>>
>> I think it might be moved there once we implement the SLUB counterpart
>> but it is too early to define which part will be common.
>>
>> >>
>> >> Performance results highlighted no major changes:
>> >>
>> >> Netperf average on 10 runs:
>> >>
>> >> threads,base,change
>> >> 16,576943.10,585905.90 (101.55%)
>> >> 32,564082.00,569741.20 (101.00%)
>> >> 48,558334.30,561851.20 (100.63%)
>> >> 64,552025.20,556448.30 (100.80%)
>> >> 80,552294.40,551743.10 (99.90%)
>> >> 96,552435.30,547529.20 (99.11%)
>> >> 112,551320.60,550183.20 (99.79%)
>> >> 128,549138.30,550542.70 (100.26%)
>> >> 144,549344.50,544529.10 (99.12%)
>> >> 160,550360.80,539929.30 (98.10%)
>> >>
>> >> slab_test, 1 run on boot. "After" is faster except for an odd result on size
>> >> 2048.
>> >
>> > Hmm... That's an odd result. The patch adds more logic, so it should
>> > decrease performance. I guess it is experimental error, but
>> > do you have any analysis of this result?
>> >
>>
>> I don't. I am happy to redo the test. I found that slab_test gives very different
>> results depending on the heap state at the time of the test. If I run the test
>> multiple times, the results vary a lot, with or without the mitigation (on
>> dedicated hardware).
>>
>> >>
>> >> Before:
>> >>
>> >> Single thread testing
>> >> =====================
>> >> 1. Kmalloc: Repeatedly allocate then free test
>> >> 10000 times kmalloc(8) -> 137 cycles kfree -> 126 cycles
>> >> 10000 times kmalloc(16) -> 118 cycles kfree -> 119 cycles
>> >> 10000 times kmalloc(32) -> 112 cycles kfree -> 119 cycles
>> >> 10000 times kmalloc(64) -> 126 cycles kfree -> 123 cycles
>> >> 10000 times kmalloc(128) -> 135 cycles kfree -> 131 cycles
>> >> 10000 times kmalloc(256) -> 165 cycles kfree -> 104 cycles
>> >> 10000 times kmalloc(512) -> 174 cycles kfree -> 126 cycles
>> >> 10000 times kmalloc(1024) -> 242 cycles kfree -> 160 cycles
>> >> 10000 times kmalloc(2048) -> 478 cycles kfree -> 239 cycles
>> >> 10000 times kmalloc(4096) -> 747 cycles kfree -> 364 cycles
>> >> 10000 times kmalloc(8192) -> 774 cycles kfree -> 404 cycles
>> >> 10000 times kmalloc(16384) -> 849 cycles kfree -> 430 cycles
>> >> 2. Kmalloc: alloc/free test
>> >> 10000 times kmalloc(8)/kfree -> 118 cycles
>> >> 10000 times kmalloc(16)/kfree -> 118 cycles
>> >> 10000 times kmalloc(32)/kfree -> 118 cycles
>> >> 10000 times kmalloc(64)/kfree -> 121 cycles
>> >> 10000 times kmalloc(128)/kfree -> 118 cycles
>> >> 10000 times kmalloc(256)/kfree -> 115 cycles
>> >> 10000 times kmalloc(512)/kfree -> 115 cycles
>> >> 10000 times kmalloc(1024)/kfree -> 115 cycles
>> >> 10000 times kmalloc(2048)/kfree -> 115 cycles
>> >> 10000 times kmalloc(4096)/kfree -> 115 cycles
>> >> 10000 times kmalloc(8192)/kfree -> 115 cycles
>> >> 10000 times kmalloc(16384)/kfree -> 115 cycles
>> >>
>> >> After:
>> >>
>> >> Single thread testing
>> >> =====================
>> >> 1. Kmalloc: Repeatedly allocate then free test
>> >> 10000 times kmalloc(8) -> 99 cycles kfree -> 84 cycles
>> >> 10000 times kmalloc(16) -> 88 cycles kfree -> 83 cycles
>> >> 10000 times kmalloc(32) -> 90 cycles kfree -> 81 cycles
>> >> 10000 times kmalloc(64) -> 107 cycles kfree -> 97 cycles
>> >> 10000 times kmalloc(128) -> 134 cycles kfree -> 89 cycles
>> >> 10000 times kmalloc(256) -> 145 cycles kfree -> 97 cycles
>> >> 10000 times kmalloc(512) -> 177 cycles kfree -> 116 cycles
>> >> 10000 times kmalloc(1024) -> 223 cycles kfree -> 151 cycles
>> >> 10000 times kmalloc(2048) -> 1429 cycles kfree -> 221 cycles
>> >> 10000 times kmalloc(4096) -> 720 cycles kfree -> 348 cycles
>> >> 10000 times kmalloc(8192) -> 788 cycles kfree -> 393 cycles
>> >> 10000 times kmalloc(16384) -> 867 cycles kfree -> 433 cycles
>> >> 2. Kmalloc: alloc/free test
>> >> 10000 times kmalloc(8)/kfree -> 115 cycles
>> >> 10000 times kmalloc(16)/kfree -> 115 cycles
>> >> 10000 times kmalloc(32)/kfree -> 115 cycles
>> >> 10000 times kmalloc(64)/kfree -> 120 cycles
>> >> 10000 times kmalloc(128)/kfree -> 127 cycles
>> >> 10000 times kmalloc(256)/kfree -> 119 cycles
>> >> 10000 times kmalloc(512)/kfree -> 112 cycles
>> >> 10000 times kmalloc(1024)/kfree -> 112 cycles
>> >> 10000 times kmalloc(2048)/kfree -> 112 cycles
>> >> 10000 times kmalloc(4096)/kfree -> 112 cycles
>> >> 10000 times kmalloc(8192)/kfree -> 112 cycles
>> >> 10000 times kmalloc(16384)/kfree -> 112 cycles
>> >>
>> >> Signed-off-by: Thomas Garnier <thgarnie@google.com>
>> >> ---
>> >> Based on next-20160418
>> >> ---
>> >>  init/Kconfig |   9 ++++
>> >>  mm/slab.c    | 166 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>> >>  2 files changed, 174 insertions(+), 1 deletion(-)
>> >>
>> >> diff --git a/init/Kconfig b/init/Kconfig
>> >> index 0dfd09d..ee35418 100644
>> >> --- a/init/Kconfig
>> >> +++ b/init/Kconfig
>> >> @@ -1742,6 +1742,15 @@ config SLOB
>> >>
>> >>  endchoice
>> >>
>> >> +config FREELIST_RANDOM
>> >> +     default n
>> >> +     depends on SLAB
>> >> +     bool "SLAB freelist randomization"
>> >> +     help
>> >> +       Randomizes the freelist order used on creating new SLABs. This
>> >> +       security feature reduces the predictability of the kernel slab
>> >> +       allocator against heap overflows.
>> >> +
>> >>  config SLUB_CPU_PARTIAL
>> >>       default y
>> >>       depends on SLUB && SMP
>> >> diff --git a/mm/slab.c b/mm/slab.c
>> >> index b70aabf..8371d80 100644
>> >> --- a/mm/slab.c
>> >> +++ b/mm/slab.c
>> >> @@ -116,6 +116,7 @@
>> >>  #include     <linux/kmemcheck.h>
>> >>  #include     <linux/memory.h>
>> >>  #include     <linux/prefetch.h>
>> >> +#include     <linux/log2.h>
>> >>
>> >>  #include     <net/sock.h>
>> >>
>> >> @@ -1229,6 +1230,62 @@ static void __init set_up_node(struct kmem_cache *cachep, int index)
>> >>       }
>> >>  }
>> >>
>> >> +#ifdef CONFIG_FREELIST_RANDOM
>> >> +/*
>> >> + * Master lists are pre-computed random lists
>> >> + * Lists of different sizes are used to optimize performance on SLABS with
>> >> + * different object counts.
>> >> + */
>> >
>> > If it is for optimization, one option would be to have a separate
>> > random list for each kmem_cache. It would consume more memory, but only
>> > marginally. It also provides more unpredictability and can give better
>> > performance, because we don't need state->type (more, less) and the
>> > special handling related to it.
>> >
>>
>> I am not sure, because major caches are created early at boot time. We still have
>> the same entropy problem and we are wasting a bit more memory. It will be faster
>
> I think the entropy problem is a separate issue and should be considered
> on its own. Once it is solved, a pre-computed array for each kmem_cache
> will provide more unpredictability. If someone who succeeded in exploiting
> one kmem_cache with 128 objects per slab wants to exploit another
> kmem_cache with 128 objects per slab, a separate pre-computed array per
> cache will help, since the second cache's order differs from the first.
>
>> on usage though, but I am not sure it will be significant.
>
> I also think it's not significant. But, performance aside, the code
> doesn't look very attractive or extensible. In the case of SLUB, there is
> the setup_slub_max_order option, and the number of objects per slab could
> exceed 256. To deal with that, we would need to add many more static
> definitions, which doesn't look good to me. Please use dynamically
> allocated memory instead of static array definitions.
>

You don't need to. We wrap around the list in use (in get_next_entry, pos
is reset to 0 when it reaches the list size).

I do think that the design will be better with a dedicated list per cache. Given
you seem fine with the memory differences, performance can only get better...

I will refactor for that on the next iteration.
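A minimal userspace sketch of the wrap-around described above, assuming a master list that is a permutation of 0..m-1 and a slab of n objects. When the walk reaches the end of the master list it restarts at position 0 and adds m to every entry (the padding field in the patch); entries that land at or beyond n are skipped. The names wrap_state, wrap_next() and wrap_fill() are hypothetical and only model get_next_entry()/next_random_slot(); this is not the patch's code.

```c
#include <assert.h>

/* Hypothetical userspace model of the v2 freelist_init_state walk. */
struct wrap_state {
	const unsigned int *list; /* master permutation of 0..m-1 */
	unsigned int m;           /* master list length */
	unsigned int n;           /* objects in the slab */
	unsigned int pos;         /* next position in the master list */
	unsigned int padding;     /* offset added after each wrap */
};

/* Return the next slot index in [0, n), mirroring next_random_slot(). */
static unsigned int wrap_next(struct wrap_state *s)
{
	unsigned int entry;

	do {
		if (s->pos == s->m) {      /* end of master list: wrap around */
			s->padding += s->m;
			s->pos = 0;
		}
		entry = s->list[s->pos++] + s->padding;
	} while (entry >= s->n);           /* skip indices past the slab */

	return entry;
}

/* Fill out[0..n-1] with a freelist order derived from the master list. */
static void wrap_fill(const unsigned int *list, unsigned int m,
		      unsigned int n, unsigned int *out)
{
	struct wrap_state s = { list, m, n, 0, 0 };
	unsigned int i;

	assert(m > 0);
	for (i = 0; i < n; i++)
		out[i] = wrap_next(&s);
}
```

In the final pass only master entries below n - padding survive, so the tail of a slab larger than the master list reuses the relative order of the low master entries, which is arguably another point in favor of exactly sized per-cache lists.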

>>
>> >> +static freelist_idx_t master_list_2[2];
>> >> +static freelist_idx_t master_list_4[4];
>> >> +static freelist_idx_t master_list_8[8];
>> >> +static freelist_idx_t master_list_16[16];
>> >> +static freelist_idx_t master_list_32[32];
>> >> +static freelist_idx_t master_list_64[64];
>> >> +static freelist_idx_t master_list_128[128];
>> >> +static freelist_idx_t master_list_256[256];
>> >> +const static struct m_list {
>> >> +     size_t count;
>> >> +     freelist_idx_t *list;
>> >> +} master_lists[] = {
>> >> +     { ARRAY_SIZE(master_list_2), master_list_2 },
>> >> +     { ARRAY_SIZE(master_list_4), master_list_4 },
>> >> +     { ARRAY_SIZE(master_list_8), master_list_8 },
>> >> +     { ARRAY_SIZE(master_list_16), master_list_16 },
>> >> +     { ARRAY_SIZE(master_list_32), master_list_32 },
>> >> +     { ARRAY_SIZE(master_list_64), master_list_64 },
>> >> +     { ARRAY_SIZE(master_list_128), master_list_128 },
>> >> +     { ARRAY_SIZE(master_list_256), master_list_256 },
>> >> +};
>> >> +
>> >> +/* Pre-compute the Freelist master lists at boot */
>> >> +static void __init freelist_random_init(void)
>> >> +{
>> >> +     unsigned int seed;
>> >> +     size_t z, i, rand;
>> >> +     struct rnd_state slab_rand;
>> >> +
>> >> +     get_random_bytes_arch(&seed, sizeof(seed));
>> >> +     prandom_seed_state(&slab_rand, seed);
>> >> +
>> >> +     for (z = 0; z < ARRAY_SIZE(master_lists); z++) {
>> >> +             for (i = 0; i < master_lists[z].count; i++)
>> >> +                     master_lists[z].list[i] = i;
>> >> +
>> >> +             /* Fisher-Yates shuffle */
>> >> +             for (i = master_lists[z].count - 1; i > 0; i--) {
>> >> +                     rand = prandom_u32_state(&slab_rand);
>> >> +                     rand %= (i + 1);
>> >> +                     swap(master_lists[z].list[i],
>> >> +                             master_lists[z].list[rand]);
>> >> +             }
>> >> +     }
>> >> +}
>> >> +#else
>> >> +static inline void __init freelist_random_init(void) { }
>> >> +#endif /* CONFIG_FREELIST_RANDOM */
>> >> +
>> >> +
>> >>  /*
>> >>   * Initialisation.  Called after the page allocator have been initialised and
>> >>   * before smp_init().
>> >> @@ -1255,6 +1312,8 @@ void __init kmem_cache_init(void)
>> >>       if (!slab_max_order_set && totalram_pages > (32 << 20) >> PAGE_SHIFT)
>> >>               slab_max_order = SLAB_MAX_ORDER_HI;
>> >>
>> >> +     freelist_random_init();
>> >> +
>> >>       /* Bootstrap is tricky, because several objects are allocated
>> >>        * from caches that do not exist yet:
>> >>        * 1) initialize the kmem_cache cache: it contains the struct
>> >> @@ -2442,6 +2501,107 @@ static void cache_init_objs_debug(struct kmem_cache *cachep, struct page *page)
>> >>  #endif
>> >>  }
>> >>
>> >> +#ifdef CONFIG_FREELIST_RANDOM
>> >> +/* Identify if the target freelist matches the pre-computed list */
>> >> +enum master_type {
>> >> +     match,
>> >> +     less,
>> >> +     more
>> >> +};
>> >> +
>> >> +/* Hold information during a freelist initialization */
>> >> +struct freelist_init_state {
>> >> +     unsigned int padding;
>> >> +     unsigned int pos;
>> >> +     unsigned int count;
>> >> +     struct m_list master_list;
>> >> +     unsigned int master_count;
>> >> +     enum master_type type;
>> >> +};
>> >> +
>> >> +/* Select the right pre-computed master list and initialize state */
>> >> +static void freelist_state_initialize(struct freelist_init_state *state,
>> >> +                                   unsigned int count)
>> >> +{
>> >> +     unsigned int idx;
>> >> +     const unsigned int last_idx = ARRAY_SIZE(master_lists) - 1;
>> >> +
>> >> +     memset(state, 0, sizeof(*state));
>> >> +     state->count = count;
>> >> +     state->pos = 0;
>> >
>> > Using pos = 0 here does not look good in terms of security. In this case,
>> > every new page of the same size class gets the same freelist sequence since boot.
>> >
>> > How about using a random value to set pos? It provides some more randomness
>> > with minimal overhead.
>> >
>>
>> I think it is a good idea. I will add that for the next iteration.
>>
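The random starting position suggested above can be sketched as a rotation of the pre-computed sequence: every new page reads the same permutation, but starts at a per-page offset and wraps around. rotated_order() is a hypothetical helper; in the kernel the offset would come from a random value drawn per page, which is assumed here rather than shown.

```c
#include <assert.h>

/*
 * Hypothetical sketch: emit a pre-computed permutation seq of 0..n-1
 * starting at position start and wrapping around, so each page can use
 * a different rotation of the same sequence.
 */
static void rotated_order(const unsigned int *seq, unsigned int n,
			  unsigned int start, unsigned int *out)
{
	unsigned int i;

	assert(n > 0);
	start %= n;                 /* any random value maps to a valid start */
	for (i = 0; i < n; i++)
		out[i] = seq[(start + i) % n];
}
```

A rotation only yields n distinct orders for a sequence of n objects, so it adds some per-page unpredictability rather than a fresh permutation for every page.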
>> >> +     /* count is always >= 2 */
>> >> +     idx = ilog2(count) - 1;
>> >> +     if (idx >= last_idx)
>> >> +             idx = last_idx;
>> >> +     else if (roundup_pow_of_two(idx + 1) != count)
>> >> +             idx++;
>> >> +     state->master_list = master_lists[idx];
>> >> +     if (state->master_list.count == state->count)
>> >> +             state->type = match;
>> >> +     else if (state->master_list.count > state->count)
>> >> +             state->type = more;
>> >> +     else
>> >> +             state->type = less;
>> >> +}
>> >> +
>> >> +/* Get the next entry on the master list depending on the target list size */
>> >> +static freelist_idx_t get_next_entry(struct freelist_init_state *state)
>> >> +{
>> >> +     if (state->type == less && state->pos == state->master_list.count) {
>> >> +             state->padding += state->pos;
>> >> +             state->pos = 0;
>> >> +     }
>> >> +     BUG_ON(state->pos >= state->master_list.count);
>> >> +     return state->master_list.list[state->pos++];
>> >> +}
>> >> +
>> >> +static freelist_idx_t next_random_slot(struct freelist_init_state *state)
>> >> +{
>> >> +     freelist_idx_t cur, entry;
>> >> +
>> >> +     entry = get_next_entry(state);
>> >> +
>> >> +     if (state->type != match) {
>> >> +             while ((entry + state->padding) >= state->count)
>> >> +                     entry = get_next_entry(state);
>> >> +             cur = entry + state->padding;
>> >> +             BUG_ON(cur >= state->count);
>> >> +     } else {
>> >> +             cur = entry;
>> >> +     }
>> >> +
>> >> +     return cur;
>> >> +}
>> >> +
>> >> +/* Shuffle the freelist initialization state based on pre-computed lists */
>> >> +static void shuffle_freelist(struct kmem_cache *cachep, struct page *page,
>> >> +                          unsigned int count)
>> >> +{
>> >> +     unsigned int i;
>> >> +     struct freelist_init_state state;
>> >> +
>> >> +     if (count < 2) {
>> >> +             for (i = 0; i < count; i++)
>> >> +                     set_free_obj(page, i, i);
>> >> +             return;
>> >> +     }
>> >> +
>> >> +     /* Last chunk is used already in this case */
>> >> +     if (OBJFREELIST_SLAB(cachep))
>> >> +             count--;
>> >> +
>> >> +     freelist_state_initialize(&state, count);
>> >> +     for (i = 0; i < count; i++)
>> >> +             set_free_obj(page, i, next_random_slot(&state));
>> >> +
>> >> +     if (OBJFREELIST_SLAB(cachep))
>> >> +             set_free_obj(page, i, i);
>> >
>> > Please consider the last object of OBJFREELIST_SLAB caches, too.
>> >
>> > freelist_state_init()
>> > last_obj = next_random_slot()
>> > page->freelist = XXX
>> > for (i = 0; i < count - 1; i++)
>> >         set_free_obj()
>> > set_free_obj(last_obj);
>> >
>> > Thanks.
>> >
>>
>> The current implementation takes the last chunk by default before the
>> freelist is initialized. Do you want it to be randomized as well?
>
> Yes.
>
> Thanks.
>
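The change requested above for OBJFREELIST_SLAB caches can be modelled as follows: the first random slot becomes the object that stores the freelist array, the remaining count - 1 entries are filled from the same random walk, and the reserved object goes into the last entry. Here the random walk is simply a pre-computed permutation order, and fill_objfreelist() is a hypothetical name used only for illustration.

```c
#include <assert.h>

/*
 * Hypothetical model of the requested OBJFREELIST_SLAB handling.
 * order:    random permutation of 0..n-1 (n >= 2), standing in for
 *           successive next_random_slot() results
 * freelist: output array of n entries
 * Returns the object index reserved to hold the freelist itself.
 */
static unsigned int fill_objfreelist(const unsigned int *order, unsigned int n,
				     unsigned int *freelist)
{
	unsigned int objfreelist, i;

	assert(n >= 2);
	objfreelist = order[0];              /* first random slot stores the freelist */
	for (i = 0; i < n - 1; i++)
		freelist[i] = order[i + 1];  /* remaining objects in random order */
	freelist[n - 1] = objfreelist;       /* reserved object placed in the last entry */
	return objfreelist;
}
```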
>>
>> >> +}
>> >> +#else
>> >> +static inline void shuffle_freelist(struct kmem_cache *cachep,
>> >> +                                 struct page *page, unsigned int count) { }
>> >> +#endif /* CONFIG_FREELIST_RANDOM */
>> >> +
>> >>  static void cache_init_objs(struct kmem_cache *cachep,
>> >>                           struct page *page)
>> >>  {
>> >> @@ -2464,8 +2624,12 @@ static void cache_init_objs(struct kmem_cache *cachep,
>> >>                       kasan_poison_object_data(cachep, objp);
>> >>               }
>> >>
>> >> -             set_free_obj(page, i, i);
>> >> +             /* If enabled, initialization is done in shuffle_freelist */
>> >> +             if (!config_enabled(CONFIG_FREELIST_RANDOM))
>> >> +                     set_free_obj(page, i, i);
>> >>       }
>> >> +
>> >> +     shuffle_freelist(cachep, page, cachep->num);
>> >>  }
>> >>
>> >>  static void kmem_flagcheck(struct kmem_cache *cachep, gfp_t flags)
>> >> --
>> >> 2.8.0.rc3.226.g39d4020
>> >>
>> >> --
>> >> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> >> the body to majordomo@kvack.org.  For more info on Linux MM,
>> >> see: http://www.linux-mm.org/ .
>> >> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2] mm: SLAB freelist randomization
  2016-04-25 20:39 ` Thomas Garnier
@ 2016-04-26 14:19   ` Christoph Lameter
  -1 siblings, 0 replies; 35+ messages in thread
From: Christoph Lameter @ 2016-04-26 14:19 UTC (permalink / raw)
  To: Thomas Garnier
  Cc: Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton,
	Kees Cook, gthelen, labbott, kernel-hardening, linux-kernel,
	linux-mm

On Mon, 25 Apr 2016, Thomas Garnier wrote:

> To generate entropy, we use get_random_bytes_arch because no entropy is
> available yet at this boot stage. In the worst case this function will
> fall back to the get_random_bytes sub API. We also generate a random
> shift that is applied to the pre-computed freelist for each new set of pages.
>
> The config option name is not specific to the SLAB as this approach will
> be extended to other allocators like SLUB.
>
> Performance results highlighted no major changes:

Ok. The alloc/free tests are not affected, since they exercise the per-cpu
objects. The other tests are barely affected either, since most of the
overhead occurs on slab page initialization.
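To illustrate the per-page work referred to above, this sketch models the shift described in the quoted changelog: one random value per new page, with each entry e of the cached sequence mapped to (e + shift) % n. Adding a constant modulo n is a bijection on 0..n-1, so the result is still a valid freelist order. The sketch reduces the shift modulo n before adding; with a plain 32-bit sum such as (ret + state->rand) % count in the v4 patch quoted later in this thread, a shift close to UINT_MAX can wrap, and the mapping then appears to stop being a bijection (for n = 3 and shift = UINT_MAX, entries 0 and 1 both map to 0). shifted_order() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch of the per-page shift applied to a cached sequence. */
static void shifted_order(const unsigned int *seq, unsigned int n,
			  uint32_t shift, unsigned int *out)
{
	unsigned int i, s;

	assert(n > 0);
	s = shift % n;                  /* reduce first so the sum cannot wrap */
	for (i = 0; i < n; i++) {
		assert(seq[i] < n);
		out[i] = (seq[i] + s) % n;  /* constant shift mod n permutes 0..n-1 */
	}
}
```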

> Before:
> 10000 times kmalloc(1024) -> 393 cycles kfree -> 251 cycles
> 10000 times kmalloc(2048) -> 649 cycles kfree -> 228 cycles
> 10000 times kmalloc(4096) -> 806 cycles kfree -> 370 cycles
> 10000 times kmalloc(8192) -> 814 cycles kfree -> 411 cycles
> 10000 times kmalloc(16384) -> 892 cycles kfree -> 455 cycles
>
> After:
> 10000 times kmalloc(1024) -> 342 cycles kfree -> 157 cycles
> 10000 times kmalloc(2048) -> 701 cycles kfree -> 238 cycles
> 10000 times kmalloc(4096) -> 803 cycles kfree -> 364 cycles
> 10000 times kmalloc(8192) -> 835 cycles kfree -> 404 cycles
> 10000 times kmalloc(16384) -> 896 cycles kfree -> 441 cycles

And there is some slight regression with the larger objects. Not sure if
we are really hitting the slab page initialization too much there either.
Pretty minimal in synthetic tests. Can you run something like hackbench
too?
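The regression mentioned above can be quantified by expressing each "After" value as a percentage of "Before", in the same style as the netperf table in the patch description. This sketch only does arithmetic on the five kmalloc rows quoted above; it adds no new measurements, and the struct and function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* kmalloc cycle counts copied from the quoted Before/After slab_test output. */
struct slab_test_row {
	unsigned int size;
	unsigned int before;
	unsigned int after;
};

static const struct slab_test_row kmalloc_rows[] = {
	{ 1024, 393, 342 },
	{ 2048, 649, 701 },
	{ 4096, 806, 803 },
	{ 8192, 814, 835 },
	{ 16384, 892, 896 },
};

/* "After" as a percentage of "Before" (above 100 means slower). */
static double after_pct(const struct slab_test_row *r)
{
	assert(r->before > 0);
	return 100.0 * r->after / r->before;
}

/* Print rows as size,before,after (pct), like the netperf table. */
static void print_kmalloc_ratios(void)
{
	size_t i;

	for (i = 0; i < sizeof(kmalloc_rows) / sizeof(kmalloc_rows[0]); i++)
		printf("%u,%u,%u (%.2f%%)\n", kmalloc_rows[i].size,
		       kmalloc_rows[i].before, kmalloc_rows[i].after,
		       after_pct(&kmalloc_rows[i]));
}
```

This gives 87.02%, 108.01%, 99.63%, 102.58% and 100.45% for the five sizes, so only the 2048 and 8192 rows are noticeably slower; with a single run per configuration, those differences may well be within run-to-run variance.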

Otherwise this looks ok.

Acked-by: Christoph Lameter <cl@linux.com>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2] mm: SLAB freelist randomization
  2016-04-26  0:40   ` Joonsoo Kim
@ 2016-04-26  1:58     ` Thomas Garnier
  -1 siblings, 0 replies; 35+ messages in thread
From: Thomas Garnier @ 2016-04-26  1:58 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Andrew Morton,
	Kees Cook, Greg Thelen, Laura Abbott, kernel-hardening, LKML,
	Linux-MM

Makes sense. I think it is still valuable to randomize the pages allocated
early at boot. I will adapt the code, test it, and send patch v4.

Thanks for the quick feedback,
Thomas

On Mon, Apr 25, 2016 at 5:40 PM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> On Mon, Apr 25, 2016 at 01:39:23PM -0700, Thomas Garnier wrote:
>> Provides an optional config (CONFIG_FREELIST_RANDOM) to randomize the
>> SLAB freelist. The list is randomized during initialization of a new set
>> of pages. The order on different freelist sizes is pre-computed at boot
>> for performance. Each kmem_cache has its own randomized freelist except
>> early in boot, where global lists are used. This security feature reduces
>> the predictability of the kernel SLAB allocator against heap overflows,
>> rendering attacks much less stable.
>>
>> For example this attack against SLUB (also applicable against SLAB)
>> would be affected:
>> https://jon.oberheide.org/blog/2010/09/10/linux-kernel-can-slub-overflow/
>>
>> Also, since v4.6 the freelist has been moved to the end of the SLAB. This means
>> a controllable heap is opened to new attacks not yet publicly discussed.
>> A kernel heap overflow can be transformed to multiple use-after-free.
>> This feature makes this type of attack harder too.
>>
>> To generate entropy, we use get_random_bytes_arch because no entropy is
>> available yet at this boot stage. In the worst case this function will
>> fall back to the get_random_bytes sub API. We also generate a random
>> shift that is applied to the pre-computed freelist for each new set of pages.
>>
>> The config option name is not specific to the SLAB as this approach will
>> be extended to other allocators like SLUB.
>>
>> Performance results highlighted no major changes:
>>
>> slab_test, single run at boot. A difference is only seen on the 2048 size
>> test, which is the worst-case scenario covered by freelist randomization:
>> new slab pages are constantly being created during the 10000 allocations.
>> Variance should be mainly due to getting new pages every few
>> allocations.
>>
>> Before:
>>
>> Single thread testing
>> =====================
>> 1. Kmalloc: Repeatedly allocate then free test
>> 10000 times kmalloc(8) -> 99 cycles kfree -> 112 cycles
>> 10000 times kmalloc(16) -> 109 cycles kfree -> 140 cycles
>> 10000 times kmalloc(32) -> 129 cycles kfree -> 137 cycles
>> 10000 times kmalloc(64) -> 141 cycles kfree -> 141 cycles
>> 10000 times kmalloc(128) -> 152 cycles kfree -> 148 cycles
>> 10000 times kmalloc(256) -> 195 cycles kfree -> 167 cycles
>> 10000 times kmalloc(512) -> 257 cycles kfree -> 199 cycles
>> 10000 times kmalloc(1024) -> 393 cycles kfree -> 251 cycles
>> 10000 times kmalloc(2048) -> 649 cycles kfree -> 228 cycles
>> 10000 times kmalloc(4096) -> 806 cycles kfree -> 370 cycles
>> 10000 times kmalloc(8192) -> 814 cycles kfree -> 411 cycles
>> 10000 times kmalloc(16384) -> 892 cycles kfree -> 455 cycles
>> 2. Kmalloc: alloc/free test
>> 10000 times kmalloc(8)/kfree -> 121 cycles
>> 10000 times kmalloc(16)/kfree -> 121 cycles
>> 10000 times kmalloc(32)/kfree -> 121 cycles
>> 10000 times kmalloc(64)/kfree -> 121 cycles
>> 10000 times kmalloc(128)/kfree -> 121 cycles
>> 10000 times kmalloc(256)/kfree -> 119 cycles
>> 10000 times kmalloc(512)/kfree -> 119 cycles
>> 10000 times kmalloc(1024)/kfree -> 119 cycles
>> 10000 times kmalloc(2048)/kfree -> 119 cycles
>> 10000 times kmalloc(4096)/kfree -> 121 cycles
>> 10000 times kmalloc(8192)/kfree -> 119 cycles
>> 10000 times kmalloc(16384)/kfree -> 119 cycles
>>
>> After:
>>
>> Single thread testing
>> =====================
>> 1. Kmalloc: Repeatedly allocate then free test
>> 10000 times kmalloc(8) -> 130 cycles kfree -> 86 cycles
>> 10000 times kmalloc(16) -> 118 cycles kfree -> 86 cycles
>> 10000 times kmalloc(32) -> 121 cycles kfree -> 85 cycles
>> 10000 times kmalloc(64) -> 176 cycles kfree -> 102 cycles
>> 10000 times kmalloc(128) -> 178 cycles kfree -> 100 cycles
>> 10000 times kmalloc(256) -> 205 cycles kfree -> 109 cycles
>> 10000 times kmalloc(512) -> 262 cycles kfree -> 136 cycles
>> 10000 times kmalloc(1024) -> 342 cycles kfree -> 157 cycles
>> 10000 times kmalloc(2048) -> 701 cycles kfree -> 238 cycles
>> 10000 times kmalloc(4096) -> 803 cycles kfree -> 364 cycles
>> 10000 times kmalloc(8192) -> 835 cycles kfree -> 404 cycles
>> 10000 times kmalloc(16384) -> 896 cycles kfree -> 441 cycles
>> 2. Kmalloc: alloc/free test
>> 10000 times kmalloc(8)/kfree -> 121 cycles
>> 10000 times kmalloc(16)/kfree -> 121 cycles
>> 10000 times kmalloc(32)/kfree -> 123 cycles
>> 10000 times kmalloc(64)/kfree -> 142 cycles
>> 10000 times kmalloc(128)/kfree -> 121 cycles
>> 10000 times kmalloc(256)/kfree -> 119 cycles
>> 10000 times kmalloc(512)/kfree -> 119 cycles
>> 10000 times kmalloc(1024)/kfree -> 119 cycles
>> 10000 times kmalloc(2048)/kfree -> 119 cycles
>> 10000 times kmalloc(4096)/kfree -> 119 cycles
>> 10000 times kmalloc(8192)/kfree -> 119 cycles
>> 10000 times kmalloc(16384)/kfree -> 119 cycles
>>
>> Signed-off-by: Thomas Garnier <thgarnie@google.com>
>> ---
>> Based on next-20160422
>> ---
>>  include/linux/slab_def.h |   4 +
>>  init/Kconfig             |   9 ++
>>  mm/slab.c                | 213 ++++++++++++++++++++++++++++++++++++++++++++++-
>>  3 files changed, 224 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/linux/slab_def.h b/include/linux/slab_def.h
>> index 9edbbf3..182ec26 100644
>> --- a/include/linux/slab_def.h
>> +++ b/include/linux/slab_def.h
>> @@ -80,6 +80,10 @@ struct kmem_cache {
>>       struct kasan_cache kasan_info;
>>  #endif
>>
>> +#ifdef CONFIG_FREELIST_RANDOM
>> +     void *random_seq;
>> +#endif
>> +
>>       struct kmem_cache_node *node[MAX_NUMNODES];
>>  };
>>
>> diff --git a/init/Kconfig b/init/Kconfig
>> index 0c66640..73453d0 100644
>> --- a/init/Kconfig
>> +++ b/init/Kconfig
>> @@ -1742,6 +1742,15 @@ config SLOB
>>
>>  endchoice
>>
>> +config FREELIST_RANDOM
>> +     default n
>> +     depends on SLAB
>> +     bool "SLAB freelist randomization"
>> +     help
>> +       Randomizes the freelist order used on creating new SLABs. This
>> +       security feature reduces the predictability of the kernel slab
>> +       allocator against heap overflows.
>> +
>>  config SLUB_CPU_PARTIAL
>>       default y
>>       depends on SLUB && SMP
>> diff --git a/mm/slab.c b/mm/slab.c
>> index b82ee6b..89eb617 100644
>> --- a/mm/slab.c
>> +++ b/mm/slab.c
>> @@ -116,6 +116,7 @@
>>  #include     <linux/kmemcheck.h>
>>  #include     <linux/memory.h>
>>  #include     <linux/prefetch.h>
>> +#include     <linux/log2.h>
>>
>>  #include     <net/sock.h>
>>
>> @@ -1230,6 +1231,100 @@ static void __init set_up_node(struct kmem_cache *cachep, int index)
>>       }
>>  }
>>
>> +#ifdef CONFIG_FREELIST_RANDOM
>> +static void freelist_randomize(struct rnd_state *state, freelist_idx_t *list,
>> +                     size_t count)
>> +{
>> +     size_t i;
>> +     unsigned int rand;
>> +
>> +     for (i = 0; i < count; i++)
>> +             list[i] = i;
>> +
>> +     /* Fisher-Yates shuffle */
>> +     for (i = count - 1; i > 0; i--) {
>> +             rand = prandom_u32_state(state);
>> +             rand %= (i + 1);
>> +             swap(list[i], list[rand]);
>> +     }
>> +}
>> +
>> +/* Create a random sequence per cache */
>> +static void cache_random_seq_create(struct kmem_cache *cachep)
>> +{
>> +     unsigned int seed, count = cachep->num;
>> +     struct rnd_state state;
>> +
>> +     if (count < 2)
>> +             return;
>> +
>> +     cachep->random_seq = kcalloc(count, sizeof(freelist_idx_t), GFP_KERNEL);
>> +     BUG_ON(cachep->random_seq == NULL);
>
> Hello,
>
> Please make the function return int and propagate the error to the cache creator.
>
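The request above can be sketched in userspace as follows: the per-cache sequence is allocated with calloc() (standing in for kcalloc(..., GFP_KERNEL)), and allocation failure is returned as -ENOMEM instead of hitting BUG_ON(), so a caller such as enable_cpucache(), which already returns int, could pass it up to the cache creator. struct demo_cache, xorshift32() (a stand-in for prandom_u32_state()) and the explicit seed parameter are hypothetical; the Fisher-Yates loop mirrors freelist_randomize() in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

typedef unsigned short freelist_idx_t;   /* stand-in for the kernel type */

/* Hypothetical minimal cache descriptor: only the fields the sketch needs. */
struct demo_cache {
	unsigned int num;              /* objects per slab */
	freelist_idx_t *random_seq;    /* per-cache permutation, or NULL */
};

/* Hypothetical PRNG stand-in (Marsaglia xorshift32); state must be non-zero. */
static uint32_t xorshift32(uint32_t *state)
{
	uint32_t x = *state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	*state = x;
	return x;
}

/* Fisher-Yates shuffle of 0..count-1. */
static void freelist_randomize(uint32_t *state, freelist_idx_t *list,
			       size_t count)
{
	size_t i;

	for (i = 0; i < count; i++)
		list[i] = (freelist_idx_t)i;
	if (count < 2)
		return;
	for (i = count - 1; i > 0; i--) {
		size_t j = xorshift32(state) % (i + 1);
		freelist_idx_t tmp = list[i];

		list[i] = list[j];
		list[j] = tmp;
	}
}

/* Return 0 on success or -ENOMEM so the caller can fail cache creation. */
static int cache_random_seq_create(struct demo_cache *c, uint32_t seed)
{
	uint32_t state = seed ? seed : 0x9e3779b9u;

	assert(c->random_seq == NULL);   /* avoid leaking an earlier sequence */
	if (c->num < 2)
		return 0;                /* nothing to randomize */

	c->random_seq = calloc(c->num, sizeof(*c->random_seq));
	if (!c->random_seq)
		return -ENOMEM;

	freelist_randomize(&state, c->random_seq, c->num);
	return 0;
}
```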
>> +
>> +     /* Get best entropy at this stage */
>> +     get_random_bytes_arch(&seed, sizeof(seed));
>> +     prandom_seed_state(&state, seed);
>> +
>> +     freelist_randomize(&state, cachep->random_seq, count);
>> +}
>> +
>> +/* Destroy the per-cache random freelist sequence */
>> +static void cache_random_seq_destroy(struct kmem_cache *cachep)
>> +{
>> +     kfree(cachep->random_seq);
>> +     cachep->random_seq = NULL;
>> +}
>> +
>> +/*
>> + * Global static list are used when pre-computed cache list are not yet
>> + * available. Lists of different sizes are created to optimize performance on
>> + * SLABS with different object counts.
>> + */
>> +static freelist_idx_t freelist_random_seq_2[2];
>> +static freelist_idx_t freelist_random_seq_4[4];
>> +static freelist_idx_t freelist_random_seq_8[8];
>> +static freelist_idx_t freelist_random_seq_16[16];
>> +static freelist_idx_t freelist_random_seq_32[32];
>> +static freelist_idx_t freelist_random_seq_64[64];
>> +static freelist_idx_t freelist_random_seq_128[128];
>> +static freelist_idx_t freelist_random_seq_256[256];
>> +const static struct m_list {
>> +     size_t count;
>> +     freelist_idx_t *list;
>> +} freelist_random_seqs[] = {
>> +     { ARRAY_SIZE(freelist_random_seq_2), freelist_random_seq_2 },
>> +     { ARRAY_SIZE(freelist_random_seq_4), freelist_random_seq_4 },
>> +     { ARRAY_SIZE(freelist_random_seq_8), freelist_random_seq_8 },
>> +     { ARRAY_SIZE(freelist_random_seq_16), freelist_random_seq_16 },
>> +     { ARRAY_SIZE(freelist_random_seq_32), freelist_random_seq_32 },
>> +     { ARRAY_SIZE(freelist_random_seq_64), freelist_random_seq_64 },
>> +     { ARRAY_SIZE(freelist_random_seq_128), freelist_random_seq_128 },
>> +     { ARRAY_SIZE(freelist_random_seq_256), freelist_random_seq_256 },
>> +};
>
> I'd like to remove this global static list even if we can't get a random
> sequence in the early boot-up process. At this stage the kernel is not
> yet initialized and a malicious user cannot do anything, so a random
> sequence doesn't give any more security. After kernel initialization, we
> will use the per-cache random sequence, so the attack surface is really
> small. If you want to randomize the freelist sequence even in this case,
> you can manually permute the sequence by calling prandom_u32_state(). But
> I don't think it is necessary.
>
> Thanks.
>
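The alternative mentioned above, permuting on the fly instead of keeping global lists, could look roughly like this sketch: for each new page, fill the freelist with 0..n-1 and run a Fisher-Yates shuffle seeded with a per-page value. xorshift32() is a hypothetical stand-in for prandom_u32_state(), shuffle_page_freelist() is likewise illustrative, and the seed source is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical PRNG stand-in (Marsaglia xorshift32); state must be non-zero. */
static uint32_t xorshift32(uint32_t *state)
{
	uint32_t x = *state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	*state = x;
	return x;
}

/*
 * Hypothetical early-boot fallback: build the freelist order for one page
 * directly, without any pre-computed sequence.
 */
static void shuffle_page_freelist(unsigned int *freelist, unsigned int n,
				  uint32_t seed)
{
	uint32_t state = seed ? seed : 0x9e3779b9u; /* xorshift must not start at 0 */
	unsigned int i, j, tmp;

	for (i = 0; i < n; i++)
		freelist[i] = i;
	if (n < 2)
		return;
	for (i = n - 1; i > 0; i--) {
		j = xorshift32(&state) % (i + 1);   /* slight modulo bias, as in the patch */
		assert(j <= i);
		tmp = freelist[i];
		freelist[i] = freelist[j];
		freelist[j] = tmp;
	}
}
```

This costs n - 1 PRNG calls per new page instead of n table lookups, which is the trade-off against keeping pre-computed sequences.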
>> +
>> +/* Pre-compute the global pre-computed lists early at boot */
>> +static void __init freelist_random_init(void)
>> +{
>> +     unsigned int seed;
>> +     size_t i;
>> +     struct rnd_state state;
>> +
>> +     /* Get best entropy available at this stage */
>> +     get_random_bytes_arch(&seed, sizeof(seed));
>> +     prandom_seed_state(&state, seed);
>> +
>> +     for (i = 0; i < ARRAY_SIZE(freelist_random_seqs); i++) {
>> +             freelist_randomize(&state, freelist_random_seqs[i].list,
>> +                             freelist_random_seqs[i].count);
>> +     }
>> +}
>> +#else
>> +static inline void __init freelist_random_init(void) { }
>> +static inline void cache_random_seq_create(struct kmem_cache *cachep) { }
>> +static inline void cache_random_seq_destroy(struct kmem_cache *cachep) { }
>> +#endif /* CONFIG_FREELIST_RANDOM */
>> +
>> +
>>  /*
>>   * Initialisation.  Called after the page allocator have been initialised and
>>   * before smp_init().
>> @@ -1256,6 +1351,8 @@ void __init kmem_cache_init(void)
>>       if (!slab_max_order_set && totalram_pages > (32 << 20) >> PAGE_SHIFT)
>>               slab_max_order = SLAB_MAX_ORDER_HI;
>>
>> +     freelist_random_init();
>> +
>>       /* Bootstrap is tricky, because several objects are allocated
>>        * from caches that do not exist yet:
>>        * 1) initialize the kmem_cache cache: it contains the struct
>> @@ -2337,6 +2434,8 @@ void __kmem_cache_release(struct kmem_cache *cachep)
>>       int i;
>>       struct kmem_cache_node *n;
>>
>> +     cache_random_seq_destroy(cachep);
>> +
>>       free_percpu(cachep->cpu_cache);
>>
>>       /* NUMA: free the node structures */
>> @@ -2443,15 +2542,122 @@ static void cache_init_objs_debug(struct kmem_cache *cachep, struct page *page)
>>  #endif
>>  }
>>
>> +#ifdef CONFIG_FREELIST_RANDOM
>> +/* Hold information during a freelist initialization */
>> +struct freelist_init_state {
>> +     unsigned int padding;
>> +     unsigned int pos;
>> +     unsigned int count;
>> +     unsigned int rand;
>> +     struct m_list freelist_random_seq;
>> +};
>> +
>> +/* Select the right pre-computed list and initialize state */
>> +static void freelist_state_initialize(struct freelist_init_state *state,
>> +                             struct kmem_cache *cachep,
>> +                             unsigned int count)
>> +{
>> +     unsigned int idx;
>> +     const unsigned int last_idx = ARRAY_SIZE(freelist_random_seqs) - 1;
>> +
>> +     memset(state, 0, sizeof(*state));
>> +     state->count = count;
>> +     state->pos = 0;
>> +
>> +     /* Use best entropy available to define a random shift */
>> +     get_random_bytes_arch(&state->rand, sizeof(state->rand));
>> +
>> +     if (cachep->random_seq) {
>> +             state->freelist_random_seq.list = cachep->random_seq;
>> +             state->freelist_random_seq.count = count;
>> +     } else {
>> +             /* count is always >= 2 */
>> +             idx = ilog2(count) - 1;
>> +             if (idx >= last_idx)
>> +                     idx = last_idx;
>> +             else if (roundup_pow_of_two(idx + 1) != count)
>> +                     idx++;
>> +             state->freelist_random_seq = freelist_random_seqs[idx];
>> +     }
>> +}
>> +
>> +/* Get the next entry on the list depending on the target list size */
>> +static freelist_idx_t get_next_entry(struct freelist_init_state *state)
>> +{
>> +     freelist_idx_t ret;
>> +
>> +     if (state->pos == state->freelist_random_seq.count) {
>> +             state->padding += state->pos;
>> +             state->pos = 0;
>> +     }
>> +
>> +     /* Randomize the entry using the random shift */
>> +     ret = state->freelist_random_seq.list[state->pos++];
>> +     ret = (ret + state->rand) % state->freelist_random_seq.count;
>> +     return ret;
>> +}
>> +
>> +static freelist_idx_t next_random_slot(struct freelist_init_state *state)
>> +{
>> +     freelist_idx_t entry;
>> +
>> +     do {
>> +             entry = get_next_entry(state);
>> +     } while ((entry + state->padding) >= state->count);
>> +
>> +     return entry + state->padding;
>> +}
>> +
>> +/*
>> + * Shuffle the freelist initialization state based on pre-computed lists.
>> + * return true if the list was successfully shuffled, false otherwise.
>> + */
>> +static bool shuffle_freelist(struct kmem_cache *cachep, struct page *page)
>> +{
>> +     unsigned int objfreelist, i, count = cachep->num;
>> +     struct freelist_init_state state;
>> +
>> +     if (count < 2)
>> +             return false;
>> +
>> +     objfreelist = 0;
>> +     freelist_state_initialize(&state, cachep, count);
>> +
>> +     /* Take the first random entry as the objfreelist */
>> +     if (OBJFREELIST_SLAB(cachep)) {
>> +             objfreelist = next_random_slot(&state);
>> +             page->freelist = index_to_obj(cachep, page, objfreelist) +
>> +                                             obj_offset(cachep);
>> +             count--;
>> +     }
>> +     for (i = 0; i < count; i++)
>> +             set_free_obj(page, i, next_random_slot(&state));
>> +
>> +     if (OBJFREELIST_SLAB(cachep))
>> +             set_free_obj(page, i, objfreelist);
>> +     return true;
>> +}
>> +#else
>> +static inline bool shuffle_freelist(struct kmem_cache *cachep,
>> +                             struct page *page)
>> +{
>> +     return false;
>> +}
>> +#endif /* CONFIG_FREELIST_RANDOM */
>> +
>>  static void cache_init_objs(struct kmem_cache *cachep,
>>                           struct page *page)
>>  {
>>       int i;
>>       void *objp;
>> +     bool shuffled;
>>
>>       cache_init_objs_debug(cachep, page);
>>
>> -     if (OBJFREELIST_SLAB(cachep)) {
>> +     /* Try to randomize the freelist if enabled */
>> +     shuffled = shuffle_freelist(cachep, page);
>> +
>> +     if (!shuffled && OBJFREELIST_SLAB(cachep)) {
>>               page->freelist = index_to_obj(cachep, page, cachep->num - 1) +
>>                                               obj_offset(cachep);
>>       }
>> @@ -2465,7 +2671,8 @@ static void cache_init_objs(struct kmem_cache *cachep,
>>                       kasan_poison_object_data(cachep, objp);
>>               }
>>
>> -             set_free_obj(page, i, i);
>> +             if (!shuffled)
>> +                     set_free_obj(page, i, i);
>>       }
>>  }
>>
>> @@ -3815,6 +4022,8 @@ static int enable_cpucache(struct kmem_cache *cachep, gfp_t gfp)
>>       int shared = 0;
>>       int batchcount = 0;
>>
>> +     cache_random_seq_create(cachep);
>> +
>>       if (!is_root_cache(cachep)) {
>>               struct kmem_cache *root = memcg_root_cache(cachep);
>>               limit = root->limit;
>> --
>> 2.8.0.rc3.226.g39d4020
>>
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to majordomo@kvack.org.  For more info on Linux MM,
>> see: http://www.linux-mm.org/ .
>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
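The quoted get_next_entry(), next_random_slot() and shuffle_freelist() build a slab's freelist in four steps. They walk a pre-computed permutation, add a random shift, skip entries that fall outside the slab, and bump `padding` when the list wraps, which covers slabs with more objects than the list holds. The following is a minimal userspace sketch of that walk, not the patch itself. `struct walk_state` and the `sketch_*` names are hypothetical, and a plain array stands in for set_free_obj()/page->freelist. The OBJFREELIST_SLAB case is modelled by reserving the first random slot and storing it last. One deliberate difference: the shift is reduced modulo the list length before the addition, so the unsigned sum cannot wrap. The list must be a permutation of 0..list_count-1, otherwise the skip loop may not terminate.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct freelist_init_state. */
struct walk_state {
	unsigned int padding;      /* offset added after each full pass */
	unsigned int pos;          /* next position in list[] */
	unsigned int count;        /* objects in the slab */
	unsigned int rand;         /* random shift for this slab */
	const unsigned int *list;  /* pre-computed permutation */
	unsigned int list_count;   /* length of list[] */
};

/*
 * Like get_next_entry(), except that the shift is reduced modulo
 * list_count before the addition so the unsigned sum cannot wrap.
 */
static unsigned int sketch_next_entry(struct walk_state *s)
{
	unsigned int ret;

	if (s->pos == s->list_count) {
		/* List exhausted: move on to the next block of slots. */
		s->padding += s->pos;
		s->pos = 0;
	}
	ret = s->list[s->pos++];
	return (ret + s->rand % s->list_count) % s->list_count;
}

/* Like next_random_slot(): skip entries past the end of the slab. */
static unsigned int sketch_next_slot(struct walk_state *s)
{
	unsigned int entry;

	do {
		entry = sketch_next_entry(s);
	} while (entry + s->padding >= s->count);
	return entry + s->padding;
}

/*
 * Fill freelist[0..count-1] with a shuffled order. With objfreelist set,
 * the first random slot is reserved and written to the last position,
 * as shuffle_freelist() does for OBJFREELIST_SLAB caches.
 */
static void sketch_shuffle(unsigned int *freelist, unsigned int count,
			   const unsigned int *list, unsigned int list_count,
			   unsigned int rand, bool objfreelist)
{
	struct walk_state s = {
		.count = count, .rand = rand,
		.list = list, .list_count = list_count,
	};
	unsigned int i, reserved = 0, n = count;

	/* shuffle_freelist() returns false below 2 objects. */
	assert(count >= 2 && list_count >= 1);
	if (objfreelist) {
		reserved = sketch_next_slot(&s);
		n--;
	}
	for (i = 0; i < n; i++)
		freelist[i] = sketch_next_slot(&s);
	if (objfreelist)
		freelist[i] = reserved;
}
```

For illustration, a 4-entry list {2, 0, 3, 1} is used with a 6-object slab so that the padding step is exercised. The patch itself would normally pick a list at least as long as the slab. With rand = 7 and objfreelist set, the result is 3 2 0 5 4 1. The first pass covers slots 0-3, the wrap sets padding to 4 for slots 4-5, and the reserved slot 1 ends up last.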

* Re: [PATCH v2] mm: SLAB freelist randomization
@ 2016-04-26  1:58     ` Thomas Garnier
  0 siblings, 0 replies; 35+ messages in thread
From: Thomas Garnier @ 2016-04-26  1:58 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Andrew Morton,
	Kees Cook, Greg Thelen, Laura Abbott, kernel-hardening, LKML,
	Linux-MM

Makes sense. I still think it is valuable to randomize the earlier pages. I
will adapt the code, test it and send patch v4.

Thanks for the quick feedback,
Thomas

On Mon, Apr 25, 2016 at 5:40 PM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> On Mon, Apr 25, 2016 at 01:39:23PM -0700, Thomas Garnier wrote:
>> Provides an optional config (CONFIG_FREELIST_RANDOM) to randomize the
>> SLAB freelist. The list is randomized during initialization of a new set
>> of pages. The order for different freelist sizes is pre-computed at boot
>> for performance. Each kmem_cache has its own randomized freelist, except
>> early on boot where global lists are used. This security feature reduces
>> the predictability of the kernel SLAB allocator against heap overflows,
>> rendering attacks much less stable.
>>
>> For example this attack against SLUB (also applicable against SLAB)
>> would be affected:
>> https://jon.oberheide.org/blog/2010/09/10/linux-kernel-can-slub-overflow/
>>
>> Also, since v4.6 the freelist has been moved to the end of the SLAB. This
>> means a controllable heap is open to new attacks not yet publicly discussed.
>> A kernel heap overflow can be transformed to multiple use-after-free.
>> This feature makes this type of attack harder too.
>>
>> To generate entropy, we use get_random_bytes_arch because 0 bits of
>> entropy are available at this boot stage. In the worst case, this function
>> will fall back to the get_random_bytes sub API. We also generate a random
>> shift value to shift the pre-computed freelist for each new set of pages.
>>
>> The config option name is not specific to the SLAB as this approach will
>> be extended to other allocators like SLUB.
>>
>> Performance results highlighted no major changes:
>>
>> slab_test, 1 run on boot. A difference is only seen on the 2048 size
>> test, which is the worst-case scenario for freelist randomization: new
>> slab pages are constantly being created during the 10000 allocations.
>> Variance should be mainly due to getting new pages every few
>> allocations.
>>
>> Before:
>>
>> Single thread testing
>> =====================
>> 1. Kmalloc: Repeatedly allocate then free test
>> 10000 times kmalloc(8) -> 99 cycles kfree -> 112 cycles
>> 10000 times kmalloc(16) -> 109 cycles kfree -> 140 cycles
>> 10000 times kmalloc(32) -> 129 cycles kfree -> 137 cycles
>> 10000 times kmalloc(64) -> 141 cycles kfree -> 141 cycles
>> 10000 times kmalloc(128) -> 152 cycles kfree -> 148 cycles
>> 10000 times kmalloc(256) -> 195 cycles kfree -> 167 cycles
>> 10000 times kmalloc(512) -> 257 cycles kfree -> 199 cycles
>> 10000 times kmalloc(1024) -> 393 cycles kfree -> 251 cycles
>> 10000 times kmalloc(2048) -> 649 cycles kfree -> 228 cycles
>> 10000 times kmalloc(4096) -> 806 cycles kfree -> 370 cycles
>> 10000 times kmalloc(8192) -> 814 cycles kfree -> 411 cycles
>> 10000 times kmalloc(16384) -> 892 cycles kfree -> 455 cycles
>> 2. Kmalloc: alloc/free test
>> 10000 times kmalloc(8)/kfree -> 121 cycles
>> 10000 times kmalloc(16)/kfree -> 121 cycles
>> 10000 times kmalloc(32)/kfree -> 121 cycles
>> 10000 times kmalloc(64)/kfree -> 121 cycles
>> 10000 times kmalloc(128)/kfree -> 121 cycles
>> 10000 times kmalloc(256)/kfree -> 119 cycles
>> 10000 times kmalloc(512)/kfree -> 119 cycles
>> 10000 times kmalloc(1024)/kfree -> 119 cycles
>> 10000 times kmalloc(2048)/kfree -> 119 cycles
>> 10000 times kmalloc(4096)/kfree -> 121 cycles
>> 10000 times kmalloc(8192)/kfree -> 119 cycles
>> 10000 times kmalloc(16384)/kfree -> 119 cycles
>>
>> After:
>>
>> Single thread testing
>> =====================
>> 1. Kmalloc: Repeatedly allocate then free test
>> 10000 times kmalloc(8) -> 130 cycles kfree -> 86 cycles
>> 10000 times kmalloc(16) -> 118 cycles kfree -> 86 cycles
>> 10000 times kmalloc(32) -> 121 cycles kfree -> 85 cycles
>> 10000 times kmalloc(64) -> 176 cycles kfree -> 102 cycles
>> 10000 times kmalloc(128) -> 178 cycles kfree -> 100 cycles
>> 10000 times kmalloc(256) -> 205 cycles kfree -> 109 cycles
>> 10000 times kmalloc(512) -> 262 cycles kfree -> 136 cycles
>> 10000 times kmalloc(1024) -> 342 cycles kfree -> 157 cycles
>> 10000 times kmalloc(2048) -> 701 cycles kfree -> 238 cycles
>> 10000 times kmalloc(4096) -> 803 cycles kfree -> 364 cycles
>> 10000 times kmalloc(8192) -> 835 cycles kfree -> 404 cycles
>> 10000 times kmalloc(16384) -> 896 cycles kfree -> 441 cycles
>> 2. Kmalloc: alloc/free test
>> 10000 times kmalloc(8)/kfree -> 121 cycles
>> 10000 times kmalloc(16)/kfree -> 121 cycles
>> 10000 times kmalloc(32)/kfree -> 123 cycles
>> 10000 times kmalloc(64)/kfree -> 142 cycles
>> 10000 times kmalloc(128)/kfree -> 121 cycles
>> 10000 times kmalloc(256)/kfree -> 119 cycles
>> 10000 times kmalloc(512)/kfree -> 119 cycles
>> 10000 times kmalloc(1024)/kfree -> 119 cycles
>> 10000 times kmalloc(2048)/kfree -> 119 cycles
>> 10000 times kmalloc(4096)/kfree -> 119 cycles
>> 10000 times kmalloc(8192)/kfree -> 119 cycles
>> 10000 times kmalloc(16384)/kfree -> 119 cycles
>>
>> Signed-off-by: Thomas Garnier <thgarnie@google.com>
>> ---
>> Based on next-20160422
>> ---
>>  include/linux/slab_def.h |   4 +
>>  init/Kconfig             |   9 ++
>>  mm/slab.c                | 213 ++++++++++++++++++++++++++++++++++++++++++++++-
>>  3 files changed, 224 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/linux/slab_def.h b/include/linux/slab_def.h
>> index 9edbbf3..182ec26 100644
>> --- a/include/linux/slab_def.h
>> +++ b/include/linux/slab_def.h
>> @@ -80,6 +80,10 @@ struct kmem_cache {
>>       struct kasan_cache kasan_info;
>>  #endif
>>
>> +#ifdef CONFIG_FREELIST_RANDOM
>> +     void *random_seq;
>> +#endif
>> +
>>       struct kmem_cache_node *node[MAX_NUMNODES];
>>  };
>>
>> diff --git a/init/Kconfig b/init/Kconfig
>> index 0c66640..73453d0 100644
>> --- a/init/Kconfig
>> +++ b/init/Kconfig
>> @@ -1742,6 +1742,15 @@ config SLOB
>>
>>  endchoice
>>
>> +config FREELIST_RANDOM
>> +     default n
>> +     depends on SLAB
>> +     bool "SLAB freelist randomization"
>> +     help
>> +       Randomizes the freelist order used on creating new SLABs. This
>> +       security feature reduces the predictability of the kernel slab
>> +       allocator against heap overflows.
>> +
>>  config SLUB_CPU_PARTIAL
>>       default y
>>       depends on SLUB && SMP
>> diff --git a/mm/slab.c b/mm/slab.c
>> index b82ee6b..89eb617 100644
>> --- a/mm/slab.c
>> +++ b/mm/slab.c
>> @@ -116,6 +116,7 @@
>>  #include     <linux/kmemcheck.h>
>>  #include     <linux/memory.h>
>>  #include     <linux/prefetch.h>
>> +#include     <linux/log2.h>
>>
>>  #include     <net/sock.h>
>>
>> @@ -1230,6 +1231,100 @@ static void __init set_up_node(struct kmem_cache *cachep, int index)
>>       }
>>  }
>>
>> +#ifdef CONFIG_FREELIST_RANDOM
>> +static void freelist_randomize(struct rnd_state *state, freelist_idx_t *list,
>> +                     size_t count)
>> +{
>> +     size_t i;
>> +     unsigned int rand;
>> +
>> +     for (i = 0; i < count; i++)
>> +             list[i] = i;
>> +
>> +     /* Fisher-Yates shuffle */
>> +     for (i = count - 1; i > 0; i--) {
>> +             rand = prandom_u32_state(state);
>> +             rand %= (i + 1);
>> +             swap(list[i], list[rand]);
>> +     }
>> +}
>> +
>> +/* Create a random sequence per cache */
>> +static void cache_random_seq_create(struct kmem_cache *cachep)
>> +{
>> +     unsigned int seed, count = cachep->num;
>> +     struct rnd_state state;
>> +
>> +     if (count < 2)
>> +             return;
>> +
>> +     cachep->random_seq = kcalloc(count, sizeof(freelist_idx_t), GFP_KERNEL);
>> +     BUG_ON(cachep->random_seq == NULL);
>
> Hello,
>
> Please make the function return int and propagate the error to the cache
> creator.
>
>> +
>> +     /* Get best entropy at this stage */
>> +     get_random_bytes_arch(&seed, sizeof(seed));
>> +     prandom_seed_state(&state, seed);
>> +
>> +     freelist_randomize(&state, cachep->random_seq, count);
>> +}
>> +
>> +/* Destroy the per-cache random freelist sequence */
>> +static void cache_random_seq_destroy(struct kmem_cache *cachep)
>> +{
>> +     kfree(cachep->random_seq);
>> +     cachep->random_seq = NULL;
>> +}
>> +
>> +/*
>> + * Global static lists are used when pre-computed cache lists are not yet
>> + * available. Lists of different sizes are created to optimize performance on
>> + * SLABs with different object counts.
>> + */
>> +static freelist_idx_t freelist_random_seq_2[2];
>> +static freelist_idx_t freelist_random_seq_4[4];
>> +static freelist_idx_t freelist_random_seq_8[8];
>> +static freelist_idx_t freelist_random_seq_16[16];
>> +static freelist_idx_t freelist_random_seq_32[32];
>> +static freelist_idx_t freelist_random_seq_64[64];
>> +static freelist_idx_t freelist_random_seq_128[128];
>> +static freelist_idx_t freelist_random_seq_256[256];
>> +const static struct m_list {
>> +     size_t count;
>> +     freelist_idx_t *list;
>> +} freelist_random_seqs[] = {
>> +     { ARRAY_SIZE(freelist_random_seq_2), freelist_random_seq_2 },
>> +     { ARRAY_SIZE(freelist_random_seq_4), freelist_random_seq_4 },
>> +     { ARRAY_SIZE(freelist_random_seq_8), freelist_random_seq_8 },
>> +     { ARRAY_SIZE(freelist_random_seq_16), freelist_random_seq_16 },
>> +     { ARRAY_SIZE(freelist_random_seq_32), freelist_random_seq_32 },
>> +     { ARRAY_SIZE(freelist_random_seq_64), freelist_random_seq_64 },
>> +     { ARRAY_SIZE(freelist_random_seq_128), freelist_random_seq_128 },
>> +     { ARRAY_SIZE(freelist_random_seq_256), freelist_random_seq_256 },
>> +};
>
> I'd like to remove this global static list even if we can't get a random
> sequence in the early boot-up process. At this stage the kernel is not
> yet initialized, so a malicious user cannot do anything and a random
> sequence doesn't give any more security. After kernel initialization, we
> will use the per-cache random sequence, so the problem surface is really
> small. If you want to randomize the freelist sequence even in this case,
> you can manually permute the sequence by calling prandom_u32_state().
> But I don't think it is necessary.
>
> Thanks.
>
>> +
>> +/* Pre-compute the global pre-computed lists early at boot */
>> +static void __init freelist_random_init(void)
>> +{
>> +     unsigned int seed;
>> +     size_t i;
>> +     struct rnd_state state;
>> +
>> +     /* Get best entropy available at this stage */
>> +     get_random_bytes_arch(&seed, sizeof(seed));
>> +     prandom_seed_state(&state, seed);
>> +
>> +     for (i = 0; i < ARRAY_SIZE(freelist_random_seqs); i++) {
>> +             freelist_randomize(&state, freelist_random_seqs[i].list,
>> +                             freelist_random_seqs[i].count);
>> +     }
>> +}
>> +#else
>> +static inline void __init freelist_random_init(void) { }
>> +static inline void cache_random_seq_create(struct kmem_cache *cachep) { }
>> +static inline void cache_random_seq_destroy(struct kmem_cache *cachep) { }
>> +#endif /* CONFIG_FREELIST_RANDOM */
>> +
>> +
>>  /*
>>   * Initialisation.  Called after the page allocator have been initialised and
>>   * before smp_init().
>> @@ -1256,6 +1351,8 @@ void __init kmem_cache_init(void)
>>       if (!slab_max_order_set && totalram_pages > (32 << 20) >> PAGE_SHIFT)
>>               slab_max_order = SLAB_MAX_ORDER_HI;
>>
>> +     freelist_random_init();
>> +
>>       /* Bootstrap is tricky, because several objects are allocated
>>        * from caches that do not exist yet:
>>        * 1) initialize the kmem_cache cache: it contains the struct
>> @@ -2337,6 +2434,8 @@ void __kmem_cache_release(struct kmem_cache *cachep)
>>       int i;
>>       struct kmem_cache_node *n;
>>
>> +     cache_random_seq_destroy(cachep);
>> +
>>       free_percpu(cachep->cpu_cache);
>>
>>       /* NUMA: free the node structures */
>> @@ -2443,15 +2542,122 @@ static void cache_init_objs_debug(struct kmem_cache *cachep, struct page *page)
>>  #endif
>>  }
>>
>> +#ifdef CONFIG_FREELIST_RANDOM
>> +/* Hold information during a freelist initialization */
>> +struct freelist_init_state {
>> +     unsigned int padding;
>> +     unsigned int pos;
>> +     unsigned int count;
>> +     unsigned int rand;
>> +     struct m_list freelist_random_seq;
>> +};
>> +
>> +/* Select the right pre-computed list and initialize state */
>> +static void freelist_state_initialize(struct freelist_init_state *state,
>> +                             struct kmem_cache *cachep,
>> +                             unsigned int count)
>> +{
>> +     unsigned int idx;
>> +     const unsigned int last_idx = ARRAY_SIZE(freelist_random_seqs) - 1;
>> +
>> +     memset(state, 0, sizeof(*state));
>> +     state->count = count;
>> +     state->pos = 0;
>> +
>> +     /* Use best entropy available to define a random shift */
>> +     get_random_bytes_arch(&state->rand, sizeof(state->rand));
>> +
>> +     if (cachep->random_seq) {
>> +             state->freelist_random_seq.list = cachep->random_seq;
>> +             state->freelist_random_seq.count = count;
>> +     } else {
>> +             /* count is always >= 2 */
>> +             idx = ilog2(count) - 1;
>> +             if (idx >= last_idx)
>> +                     idx = last_idx;
>> +             else if (roundup_pow_of_two(idx + 1) != count)
>> +                     idx++;
>> +             state->freelist_random_seq = freelist_random_seqs[idx];
>> +     }
>> +}
>> +
>> +/* Get the next entry on the list depending on the target list size */
>> +static freelist_idx_t get_next_entry(struct freelist_init_state *state)
>> +{
>> +     freelist_idx_t ret;
>> +
>> +     if (state->pos == state->freelist_random_seq.count) {
>> +             state->padding += state->pos;
>> +             state->pos = 0;
>> +     }
>> +
>> +     /* Randomize the entry using the random shift */
>> +     ret = state->freelist_random_seq.list[state->pos++];
>> +     ret = (ret + state->rand) % state->freelist_random_seq.count;
>> +     return ret;
>> +}
>> +
>> +static freelist_idx_t next_random_slot(struct freelist_init_state *state)
>> +{
>> +     freelist_idx_t entry;
>> +
>> +     do {
>> +             entry = get_next_entry(state);
>> +     } while ((entry + state->padding) >= state->count);
>> +
>> +     return entry + state->padding;
>> +}
>> +
>> +/*
>> + * Shuffle the freelist initialization state based on pre-computed lists.
>> + * return true if the list was successfully shuffled, false otherwise.
>> + */
>> +static bool shuffle_freelist(struct kmem_cache *cachep, struct page *page)
>> +{
>> +     unsigned int objfreelist, i, count = cachep->num;
>> +     struct freelist_init_state state;
>> +
>> +     if (count < 2)
>> +             return false;
>> +
>> +     objfreelist = 0;
>> +     freelist_state_initialize(&state, cachep, count);
>> +
>> +     /* Take the first random entry as the objfreelist */
>> +     if (OBJFREELIST_SLAB(cachep)) {
>> +             objfreelist = next_random_slot(&state);
>> +             page->freelist = index_to_obj(cachep, page, objfreelist) +
>> +                                             obj_offset(cachep);
>> +             count--;
>> +     }
>> +     for (i = 0; i < count; i++)
>> +             set_free_obj(page, i, next_random_slot(&state));
>> +
>> +     if (OBJFREELIST_SLAB(cachep))
>> +             set_free_obj(page, i, objfreelist);
>> +     return true;
>> +}
>> +#else
>> +static inline bool shuffle_freelist(struct kmem_cache *cachep,
>> +                             struct page *page)
>> +{
>> +     return false;
>> +}
>> +#endif /* CONFIG_FREELIST_RANDOM */
>> +
>>  static void cache_init_objs(struct kmem_cache *cachep,
>>                           struct page *page)
>>  {
>>       int i;
>>       void *objp;
>> +     bool shuffled;
>>
>>       cache_init_objs_debug(cachep, page);
>>
>> -     if (OBJFREELIST_SLAB(cachep)) {
>> +     /* Try to randomize the freelist if enabled */
>> +     shuffled = shuffle_freelist(cachep, page);
>> +
>> +     if (!shuffled && OBJFREELIST_SLAB(cachep)) {
>>               page->freelist = index_to_obj(cachep, page, cachep->num - 1) +
>>                                               obj_offset(cachep);
>>       }
>> @@ -2465,7 +2671,8 @@ static void cache_init_objs(struct kmem_cache *cachep,
>>                       kasan_poison_object_data(cachep, objp);
>>               }
>>
>> -             set_free_obj(page, i, i);
>> +             if (!shuffled)
>> +                     set_free_obj(page, i, i);
>>       }
>>  }
>>
>> @@ -3815,6 +4022,8 @@ static int enable_cpucache(struct kmem_cache *cachep, gfp_t gfp)
>>       int shared = 0;
>>       int batchcount = 0;
>>
>> +     cache_random_seq_create(cachep);
>> +
>>       if (!is_root_cache(cachep)) {
>>               struct kmem_cache *root = memcg_root_cache(cachep);
>>               limit = root->limit;
>> --
>> 2.8.0.rc3.226.g39d4020
>>
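freelist_state_initialize() in the patch quoted above picks one of the static lists (2, 4, ..., 256 entries) when a cache has no per-cache sequence yet. Its comment suggests the intent is the smallest list holding at least `count` entries, capped at 256. The following is a minimal sketch of that mapping, assuming that intent. `sketch_ilog2()`, `sketch_select_idx()` and `sketch_list_len()` are hypothetical helpers, and the power-of-two test is done on `count` itself. The quoted check compares `roundup_pow_of_two(idx + 1)` with `count` instead, so an exact power of two below 256 (for example `count == 4`) selects the next larger list. next_random_slot() still produces a valid order in that case, because out-of-range entries are skipped. The thread does not say whether the larger list is intended.

```c
#include <assert.h>

#define SKETCH_NR_LISTS 8u  /* static lists of 2, 4, ..., 256 entries */

/* Floor of log2(x) for x >= 1, like the kernel's ilog2(). */
static unsigned int sketch_ilog2(unsigned int x)
{
	unsigned int r = 0;

	while (x >>= 1)
		r++;
	return r;
}

/* Index of the smallest list with >= count entries, capped at the last. */
static unsigned int sketch_select_idx(unsigned int count)
{
	const unsigned int last_idx = SKETCH_NR_LISTS - 1;
	unsigned int idx;

	assert(count >= 2);           /* shuffle_freelist() bails out below 2 */
	idx = sketch_ilog2(count) - 1;
	if (idx >= last_idx)
		return last_idx;      /* larger slabs reuse the 256 list + padding */
	if (count & (count - 1))      /* not a power of two: round up */
		idx++;
	return idx;
}

/* Length of the list at a given index: 2^(idx + 1). */
static unsigned int sketch_list_len(unsigned int idx)
{
	return 1u << (idx + 1);
}
```

With this mapping, a 5-object slab uses the 8-entry list, a 4-object slab the 4-entry list, and anything above 256 objects the 256-entry list with padding.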

* Re: [PATCH v2] mm: SLAB freelist randomization
  2016-04-25 20:39 ` Thomas Garnier
@ 2016-04-26  0:40   ` Joonsoo Kim
  -1 siblings, 0 replies; 35+ messages in thread
From: Joonsoo Kim @ 2016-04-26  0:40 UTC (permalink / raw)
  To: Thomas Garnier
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Andrew Morton,
	Kees Cook, gthelen, labbott, kernel-hardening, linux-kernel,
	linux-mm

On Mon, Apr 25, 2016 at 01:39:23PM -0700, Thomas Garnier wrote:
> Provides an optional config (CONFIG_FREELIST_RANDOM) to randomize the
> SLAB freelist. The list is randomized during initialization of a new set
> of pages. The order for different freelist sizes is pre-computed at boot
> for performance. Each kmem_cache has its own randomized freelist, except
> early on boot where global lists are used. This security feature reduces
> the predictability of the kernel SLAB allocator against heap overflows,
> rendering attacks much less stable.
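The pre-computed orders come from freelist_randomize() in the patch. It fills an identity array and then applies a Fisher-Yates shuffle driven by prandom_u32_state(). The following is a minimal userspace sketch of the same shuffle. It assumes a hypothetical xorshift32 generator as a stand-in for the kernel's prandom state; that generator is neither the kernel PRNG nor suitable for security use outside this illustration. Like the patch, it reduces the random value with `%`, which carries a small modulo bias. Unlike the quoted loop, it returns early for fewer than two entries, since `count - 1` on a `size_t` would otherwise wrap for `count == 0`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for prandom_u32_state(): Marsaglia xorshift32. */
static uint32_t sketch_xorshift32(uint32_t *state)
{
	uint32_t x = *state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	*state = x;
	return x;
}

/* Fill list[] with 0..count-1, then shuffle it in place (Fisher-Yates). */
static void sketch_freelist_randomize(uint32_t *state, unsigned int *list,
				      size_t count)
{
	size_t i, j;
	unsigned int tmp;

	/* xorshift32 maps 0 to 0, so the seed must be non-zero. */
	assert(*state != 0);
	for (i = 0; i < count; i++)
		list[i] = (unsigned int)i;
	if (count < 2)
		return;
	for (i = count - 1; i > 0; i--) {
		j = sketch_xorshift32(state) % (i + 1);  /* 0 <= j <= i */
		tmp = list[i];
		list[i] = list[j];
		list[j] = tmp;
	}
}
```

Seeding two states with the same value yields the same order, which is why a per-cache sequence only needs one seed at creation time.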
> 
> For example this attack against SLUB (also applicable against SLAB)
> would be affected:
> https://jon.oberheide.org/blog/2010/09/10/linux-kernel-can-slub-overflow/
> 
> Also, since v4.6 the freelist has been moved to the end of the SLAB. This
> means a controllable heap is open to new attacks not yet publicly discussed.
> A kernel heap overflow can be transformed to multiple use-after-free.
> This feature makes this type of attack harder too.
> 
> To generate entropy, we use get_random_bytes_arch because 0 bits of
> entropy are available at this boot stage. In the worst case, this function
> will fall back to the get_random_bytes sub API. We also generate a random
> shift value to shift the pre-computed freelist for each new set of pages.
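The random shift works because adding a constant modulo n to every element of a permutation of 0..n-1 yields another permutation (a rotation of the values). Each new slab therefore gets a different order from the same pre-computed list. The following is a minimal sketch with hypothetical `sketch_apply_shift_naive()` and `sketch_apply_shift()` helpers. The naive version mirrors the quoted expression `(ret + state->rand) % count`, where the sum is evaluated in 32-bit unsigned arithmetic. When that sum wraps and n does not divide 2^32, for example for a per-cache list whose length is not a power of two, the result is no longer a rotation and two positions can map to the same slot. Reducing the shift modulo n first avoids this.

```c
#include <assert.h>
#include <limits.h>  /* UINT_MAX, used in the usage example */

/* Mirrors the quoted expression (ret + rand) % n; the sum may wrap. */
static void sketch_apply_shift_naive(const unsigned int *list,
				     unsigned int *out, unsigned int n,
				     unsigned int rand)
{
	unsigned int i;

	assert(n > 0);
	for (i = 0; i < n; i++)
		out[i] = (list[i] + rand) % n;
}

/* Same rotation with the shift reduced first; for list values below n
 * the sum stays below 2n and cannot wrap. */
static void sketch_apply_shift(const unsigned int *list, unsigned int *out,
			       unsigned int n, unsigned int rand)
{
	unsigned int i, shift;

	assert(n > 0);
	shift = rand % n;
	for (i = 0; i < n; i++)
		out[i] = (list[i] + shift) % n;
}
```

With the list {2, 0, 3, 1} and rand = 7, both versions produce {1, 3, 2, 0}. With n = 3, the identity list and rand = UINT_MAX, the naive version maps both 0 and 1 to slot 0, while the reduced version keeps {0, 1, 2}.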
> 
> The config option name is not specific to the SLAB as this approach will
> be extended to other allocators like SLUB.
> 
> Performance results highlighted no major changes:
> 
> slab_test, 1 run on boot. A difference is only seen on the 2048 size
> test, which is the worst-case scenario for freelist randomization: new
> slab pages are constantly being created during the 10000 allocations.
> Variance should be mainly due to getting new pages every few
> allocations.
> 
> Before:
> 
> Single thread testing
> =====================
> 1. Kmalloc: Repeatedly allocate then free test
> 10000 times kmalloc(8) -> 99 cycles kfree -> 112 cycles
> 10000 times kmalloc(16) -> 109 cycles kfree -> 140 cycles
> 10000 times kmalloc(32) -> 129 cycles kfree -> 137 cycles
> 10000 times kmalloc(64) -> 141 cycles kfree -> 141 cycles
> 10000 times kmalloc(128) -> 152 cycles kfree -> 148 cycles
> 10000 times kmalloc(256) -> 195 cycles kfree -> 167 cycles
> 10000 times kmalloc(512) -> 257 cycles kfree -> 199 cycles
> 10000 times kmalloc(1024) -> 393 cycles kfree -> 251 cycles
> 10000 times kmalloc(2048) -> 649 cycles kfree -> 228 cycles
> 10000 times kmalloc(4096) -> 806 cycles kfree -> 370 cycles
> 10000 times kmalloc(8192) -> 814 cycles kfree -> 411 cycles
> 10000 times kmalloc(16384) -> 892 cycles kfree -> 455 cycles
> 2. Kmalloc: alloc/free test
> 10000 times kmalloc(8)/kfree -> 121 cycles
> 10000 times kmalloc(16)/kfree -> 121 cycles
> 10000 times kmalloc(32)/kfree -> 121 cycles
> 10000 times kmalloc(64)/kfree -> 121 cycles
> 10000 times kmalloc(128)/kfree -> 121 cycles
> 10000 times kmalloc(256)/kfree -> 119 cycles
> 10000 times kmalloc(512)/kfree -> 119 cycles
> 10000 times kmalloc(1024)/kfree -> 119 cycles
> 10000 times kmalloc(2048)/kfree -> 119 cycles
> 10000 times kmalloc(4096)/kfree -> 121 cycles
> 10000 times kmalloc(8192)/kfree -> 119 cycles
> 10000 times kmalloc(16384)/kfree -> 119 cycles
> 
> After:
> 
> Single thread testing
> =====================
> 1. Kmalloc: Repeatedly allocate then free test
> 10000 times kmalloc(8) -> 130 cycles kfree -> 86 cycles
> 10000 times kmalloc(16) -> 118 cycles kfree -> 86 cycles
> 10000 times kmalloc(32) -> 121 cycles kfree -> 85 cycles
> 10000 times kmalloc(64) -> 176 cycles kfree -> 102 cycles
> 10000 times kmalloc(128) -> 178 cycles kfree -> 100 cycles
> 10000 times kmalloc(256) -> 205 cycles kfree -> 109 cycles
> 10000 times kmalloc(512) -> 262 cycles kfree -> 136 cycles
> 10000 times kmalloc(1024) -> 342 cycles kfree -> 157 cycles
> 10000 times kmalloc(2048) -> 701 cycles kfree -> 238 cycles
> 10000 times kmalloc(4096) -> 803 cycles kfree -> 364 cycles
> 10000 times kmalloc(8192) -> 835 cycles kfree -> 404 cycles
> 10000 times kmalloc(16384) -> 896 cycles kfree -> 441 cycles
> 2. Kmalloc: alloc/free test
> 10000 times kmalloc(8)/kfree -> 121 cycles
> 10000 times kmalloc(16)/kfree -> 121 cycles
> 10000 times kmalloc(32)/kfree -> 123 cycles
> 10000 times kmalloc(64)/kfree -> 142 cycles
> 10000 times kmalloc(128)/kfree -> 121 cycles
> 10000 times kmalloc(256)/kfree -> 119 cycles
> 10000 times kmalloc(512)/kfree -> 119 cycles
> 10000 times kmalloc(1024)/kfree -> 119 cycles
> 10000 times kmalloc(2048)/kfree -> 119 cycles
> 10000 times kmalloc(4096)/kfree -> 119 cycles
> 10000 times kmalloc(8192)/kfree -> 119 cycles
> 10000 times kmalloc(16384)/kfree -> 119 cycles
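To read the two tables side by side, the relative change per size can be computed as (after - before) / before. The following small sketch does this in per mille with integer arithmetic for the kmalloc column of test 1. The arrays are copied from the Before/After tables above, and the helper names are illustrative only.

```c
#include <assert.h>
#include <stdio.h>

/* kmalloc cycles from test 1 ("Repeatedly allocate then free") above. */
static const unsigned int kmalloc_sizes[] = { 8, 16, 32, 64, 128, 256, 512,
					      1024, 2048, 4096, 8192, 16384 };
static const long kmalloc_before[] = { 99, 109, 129, 141, 152, 195, 257, 393,
				       649, 806, 814, 892 };
static const long kmalloc_after[] = { 130, 118, 121, 176, 178, 205, 262, 342,
				      701, 803, 835, 896 };

/* Change from base to now, in per mille, truncated toward zero. */
static long change_permille(long base, long now)
{
	assert(base != 0);
	return (now - base) * 1000 / base;
}

/* Print one line per size, e.g. "2048: 649 -> 701 (+80 per mille)". */
static void print_kmalloc_changes(void)
{
	size_t i;

	for (i = 0; i < sizeof(kmalloc_sizes) / sizeof(kmalloc_sizes[0]); i++)
		printf("%u: %ld -> %ld (%+ld per mille)\n", kmalloc_sizes[i],
		       kmalloc_before[i], kmalloc_after[i],
		       change_permille(kmalloc_before[i], kmalloc_after[i]));
}
```

For 2048 this gives +80 per mille (649 -> 701 cycles), and for 64 it gives +248 per mille (141 -> 176 cycles).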
> 
> Signed-off-by: Thomas Garnier <thgarnie@google.com>
> ---
> Based on next-20160422
> ---
>  include/linux/slab_def.h |   4 +
>  init/Kconfig             |   9 ++
>  mm/slab.c                | 213 ++++++++++++++++++++++++++++++++++++++++++++++-
>  3 files changed, 224 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/slab_def.h b/include/linux/slab_def.h
> index 9edbbf3..182ec26 100644
> --- a/include/linux/slab_def.h
> +++ b/include/linux/slab_def.h
> @@ -80,6 +80,10 @@ struct kmem_cache {
>  	struct kasan_cache kasan_info;
>  #endif
>  
> +#ifdef CONFIG_FREELIST_RANDOM
> +	void *random_seq;
> +#endif
> +
>  	struct kmem_cache_node *node[MAX_NUMNODES];
>  };
>  
> diff --git a/init/Kconfig b/init/Kconfig
> index 0c66640..73453d0 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1742,6 +1742,15 @@ config SLOB
>  
>  endchoice
>  
> +config FREELIST_RANDOM
> +	default n
> +	depends on SLAB
> +	bool "SLAB freelist randomization"
> +	help
> +	  Randomizes the freelist order used on creating new SLABs. This
> +	  security feature reduces the predictability of the kernel slab
> +	  allocator against heap overflows.
> +
>  config SLUB_CPU_PARTIAL
>  	default y
>  	depends on SLUB && SMP
> diff --git a/mm/slab.c b/mm/slab.c
> index b82ee6b..89eb617 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -116,6 +116,7 @@
>  #include	<linux/kmemcheck.h>
>  #include	<linux/memory.h>
>  #include	<linux/prefetch.h>
> +#include	<linux/log2.h>
>  
>  #include	<net/sock.h>
>  
> @@ -1230,6 +1231,100 @@ static void __init set_up_node(struct kmem_cache *cachep, int index)
>  	}
>  }
>  
> +#ifdef CONFIG_FREELIST_RANDOM
> +static void freelist_randomize(struct rnd_state *state, freelist_idx_t *list,
> +			size_t count)
> +{
> +	size_t i;
> +	unsigned int rand;
> +
> +	for (i = 0; i < count; i++)
> +		list[i] = i;
> +
> +	/* Fisher-Yates shuffle */
> +	for (i = count - 1; i > 0; i--) {
> +		rand = prandom_u32_state(state);
> +		rand %= (i + 1);
> +		swap(list[i], list[rand]);
> +	}
> +}
> +
> +/* Create a random sequence per cache */
> +static void cache_random_seq_create(struct kmem_cache *cachep)
> +{
> +	unsigned int seed, count = cachep->num;
> +	struct rnd_state state;
> +
> +	if (count < 2)
> +		return;
> +
> +	cachep->random_seq = kcalloc(count, sizeof(freelist_idx_t), GFP_KERNEL);
> +	BUG_ON(cachep->random_seq == NULL);

Hello,

Please make the function return int and propagate the error to the cache
creator.
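The request amounts to replacing the BUG_ON() with an error return that the caller (enable_cpucache() in the quoted hunk) passes back to whoever is creating the cache. The following is a minimal userspace sketch of that shape. It assumes a hypothetical `struct sketch_cache` and hypothetical `sketch_random_seq_create()` and `sketch_enable_cache()` functions. The allocator is passed in so the failure path can be exercised, and the shuffle itself is left out. The one-byte index type is also an assumption, modelled on the `unsigned char` variant of freelist_idx_t in mm/slab.c.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Assumption: 1-byte indices, as mm/slab.c uses when a slab's objects fit. */
typedef unsigned char freelist_idx_t;

/* Hypothetical, trimmed stand-in for struct kmem_cache. */
struct sketch_cache {
	unsigned int num;            /* objects per slab */
	freelist_idx_t *random_seq;  /* per-cache sequence, or NULL */
};

/* Allocator hook (calloc-compatible) so the failure path can be tested. */
typedef void *(*alloc_fn)(size_t nmemb, size_t size);

/* An allocator that always fails, to exercise the error path. */
static void *always_fail(size_t nmemb, size_t size)
{
	(void)nmemb;
	(void)size;
	return NULL;
}

/* 0 on success or when no sequence is needed, -ENOMEM on failure. */
static int sketch_random_seq_create(struct sketch_cache *c, alloc_fn alloc)
{
	unsigned int i;

	if (c->num < 2)
		return 0;
	assert(c->num <= 256);  /* indices must fit in freelist_idx_t */
	c->random_seq = alloc(c->num, sizeof(*c->random_seq));
	if (!c->random_seq)
		return -ENOMEM;  /* report instead of BUG_ON() */
	for (i = 0; i < c->num; i++)
		c->random_seq[i] = (freelist_idx_t)i;  /* shuffle omitted */
	return 0;
}

/* Hypothetical caller in the position of enable_cpucache(). */
static int sketch_enable_cache(struct sketch_cache *c, alloc_fn alloc)
{
	int err = sketch_random_seq_create(c, alloc);

	if (err)
		return err;  /* propagate to the cache creator */
	/* ... the rest of the per-CPU cache setup would follow ... */
	return 0;
}
```

With calloc the call returns 0 and leaves an 8-entry sequence for an 8-object cache. With always_fail it returns -ENOMEM and random_seq stays NULL, so the caller can unwind instead of crashing.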

> +
> +	/* Get best entropy at this stage */
> +	get_random_bytes_arch(&seed, sizeof(seed));
> +	prandom_seed_state(&state, seed);
> +
> +	freelist_randomize(&state, cachep->random_seq, count);
> +}
> +
> +/* Destroy the per-cache random freelist sequence */
> +static void cache_random_seq_destroy(struct kmem_cache *cachep)
> +{
> +	kfree(cachep->random_seq);
> +	cachep->random_seq = NULL;
> +}
> +
> +/*
> + * Global static lists are used when pre-computed cache lists are not yet
> + * available. Lists of different sizes are created to optimize performance on
> + * SLABs with different object counts.
> + */
> +static freelist_idx_t freelist_random_seq_2[2];
> +static freelist_idx_t freelist_random_seq_4[4];
> +static freelist_idx_t freelist_random_seq_8[8];
> +static freelist_idx_t freelist_random_seq_16[16];
> +static freelist_idx_t freelist_random_seq_32[32];
> +static freelist_idx_t freelist_random_seq_64[64];
> +static freelist_idx_t freelist_random_seq_128[128];
> +static freelist_idx_t freelist_random_seq_256[256];
> +const static struct m_list {
> +	size_t count;
> +	freelist_idx_t *list;
> +} freelist_random_seqs[] = {
> +	{ ARRAY_SIZE(freelist_random_seq_2), freelist_random_seq_2 },
> +	{ ARRAY_SIZE(freelist_random_seq_4), freelist_random_seq_4 },
> +	{ ARRAY_SIZE(freelist_random_seq_8), freelist_random_seq_8 },
> +	{ ARRAY_SIZE(freelist_random_seq_16), freelist_random_seq_16 },
> +	{ ARRAY_SIZE(freelist_random_seq_32), freelist_random_seq_32 },
> +	{ ARRAY_SIZE(freelist_random_seq_64), freelist_random_seq_64 },
> +	{ ARRAY_SIZE(freelist_random_seq_128), freelist_random_seq_128 },
> +	{ ARRAY_SIZE(freelist_random_seq_256), freelist_random_seq_256 },
> +};

I'd like to remove this global static list even if we can't get a random
sequence in the early boot-up process. At this stage the kernel is not
yet initialized, so a malicious user cannot do anything and a random
sequence doesn't give any more security. After kernel initialization, we
will use the per-cache random sequence, so the problem surface is really
small. If you want to randomize the freelist sequence even in this case,
you can manually permute the sequence by calling prandom_u32_state().
But I don't think it is necessary.

Thanks.

> +
> +/* Pre-compute the global pre-computed lists early at boot */
> +static void __init freelist_random_init(void)
> +{
> +	unsigned int seed;
> +	size_t i;
> +	struct rnd_state state;
> +
> +	/* Get best entropy available at this stage */
> +	get_random_bytes_arch(&seed, sizeof(seed));
> +	prandom_seed_state(&state, seed);
> +
> +	for (i = 0; i < ARRAY_SIZE(freelist_random_seqs); i++) {
> +		freelist_randomize(&state, freelist_random_seqs[i].list,
> +				freelist_random_seqs[i].count);
> +	}
> +}
> +#else
> +static inline void __init freelist_random_init(void) { }
> +static inline void cache_random_seq_create(struct kmem_cache *cachep) { }
> +static inline void cache_random_seq_destroy(struct kmem_cache *cachep) { }
> +#endif /* CONFIG_FREELIST_RANDOM */
> +
> +
>  /*
>   * Initialisation.  Called after the page allocator have been initialised and
>   * before smp_init().
> @@ -1256,6 +1351,8 @@ void __init kmem_cache_init(void)
>  	if (!slab_max_order_set && totalram_pages > (32 << 20) >> PAGE_SHIFT)
>  		slab_max_order = SLAB_MAX_ORDER_HI;
>  
> +	freelist_random_init();
> +
>  	/* Bootstrap is tricky, because several objects are allocated
>  	 * from caches that do not exist yet:
>  	 * 1) initialize the kmem_cache cache: it contains the struct
> @@ -2337,6 +2434,8 @@ void __kmem_cache_release(struct kmem_cache *cachep)
>  	int i;
>  	struct kmem_cache_node *n;
>  
> +	cache_random_seq_destroy(cachep);
> +
>  	free_percpu(cachep->cpu_cache);
>  
>  	/* NUMA: free the node structures */
> @@ -2443,15 +2542,122 @@ static void cache_init_objs_debug(struct kmem_cache *cachep, struct page *page)
>  #endif
>  }
>  
> +#ifdef CONFIG_FREELIST_RANDOM
> +/* Hold information during a freelist initialization */
> +struct freelist_init_state {
> +	unsigned int padding;
> +	unsigned int pos;
> +	unsigned int count;
> +	unsigned int rand;
> +	struct m_list freelist_random_seq;
> +};
> +
> +/* Select the right pre-computed list and initialize state */
> +static void freelist_state_initialize(struct freelist_init_state *state,
> +				struct kmem_cache *cachep,
> +				unsigned int count)
> +{
> +	unsigned int idx;
> +	const unsigned int last_idx = ARRAY_SIZE(freelist_random_seqs) - 1;
> +
> +	memset(state, 0, sizeof(*state));
> +	state->count = count;
> +	state->pos = 0;
> +
> +	/* Use best entropy available to define a random shift */
> +	get_random_bytes_arch(&state->rand, sizeof(state->rand));
> +
> +	if (cachep->random_seq) {
> +		state->freelist_random_seq.list = cachep->random_seq;
> +		state->freelist_random_seq.count = count;
> +	} else {
> +		/* count is always >= 2 */
> +		idx = ilog2(count) - 1;
> +		if (idx >= last_idx)
> +			idx = last_idx;
> +		else if (roundup_pow_of_two(idx + 1) != count)
> +			idx++;
> +		state->freelist_random_seq = freelist_random_seqs[idx];
> +	}
> +}
> +
> +/* Get the next entry on the list depending on the target list size */
> +static freelist_idx_t get_next_entry(struct freelist_init_state *state)
> +{
> +	freelist_idx_t ret;
> +
> +	if (state->pos == state->freelist_random_seq.count) {
> +		state->padding += state->pos;
> +		state->pos = 0;
> +	}
> +
> +	/* Randomize the entry using the random shift */
> +	ret = state->freelist_random_seq.list[state->pos++];
> +	ret = (ret + state->rand) % state->freelist_random_seq.count;
> +	return ret;
> +}
> +
> +static freelist_idx_t next_random_slot(struct freelist_init_state *state)
> +{
> +	freelist_idx_t entry;
> +
> +	do {
> +		entry = get_next_entry(state);
> +	} while ((entry + state->padding) >= state->count);
> +
> +	return entry + state->padding;
> +}
> +
> +/*
> + * Shuffle the freelist initialization state based on pre-computed lists.
> + * return true if the list was successfully shuffled, false otherwise.
> + */
> +static bool shuffle_freelist(struct kmem_cache *cachep, struct page *page)
> +{
> +	unsigned int objfreelist, i, count = cachep->num;
> +	struct freelist_init_state state;
> +
> +	if (count < 2)
> +		return false;
> +
> +	objfreelist = 0;
> +	freelist_state_initialize(&state, cachep, count);
> +
> +	/* Take the first random entry as the objfreelist */
> +	if (OBJFREELIST_SLAB(cachep)) {
> +		objfreelist = next_random_slot(&state);
> +		page->freelist = index_to_obj(cachep, page, objfreelist) +
> +						obj_offset(cachep);
> +		count--;
> +	}
> +	for (i = 0; i < count; i++)
> +		set_free_obj(page, i, next_random_slot(&state));
> +
> +	if (OBJFREELIST_SLAB(cachep))
> +		set_free_obj(page, i, objfreelist);
> +	return true;
> +}
> +#else
> +static inline bool shuffle_freelist(struct kmem_cache *cachep,
> +				struct page *page)
> +{
> +	return false;
> +}
> +#endif /* CONFIG_FREELIST_RANDOM */
> +
>  static void cache_init_objs(struct kmem_cache *cachep,
>  			    struct page *page)
>  {
>  	int i;
>  	void *objp;
> +	bool shuffled;
>  
>  	cache_init_objs_debug(cachep, page);
>  
> -	if (OBJFREELIST_SLAB(cachep)) {
> +	/* Try to randomize the freelist if enabled */
> +	shuffled = shuffle_freelist(cachep, page);
> +
> +	if (!shuffled && OBJFREELIST_SLAB(cachep)) {
>  		page->freelist = index_to_obj(cachep, page, cachep->num - 1) +
>  						obj_offset(cachep);
>  	}
> @@ -2465,7 +2671,8 @@ static void cache_init_objs(struct kmem_cache *cachep,
>  			kasan_poison_object_data(cachep, objp);
>  		}
>  
> -		set_free_obj(page, i, i);
> +		if (!shuffled)
> +			set_free_obj(page, i, i);
>  	}
>  }
>  
> @@ -3815,6 +4022,8 @@ static int enable_cpucache(struct kmem_cache *cachep, gfp_t gfp)
>  	int shared = 0;
>  	int batchcount = 0;
>  
> +	cache_random_seq_create(cachep);
> +
>  	if (!is_root_cache(cachep)) {
>  		struct kmem_cache *root = memcg_root_cache(cachep);
>  		limit = root->limit;
> -- 
> 2.8.0.rc3.226.g39d4020
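In freelist_state_initialize() above, the global-list branch appears to be meant to pick the smallest pre-computed list that covers `count` objects, capped at the 256-entry list. Below is a minimal userspace sketch of that selection. It assumes list sizes 2, 4, ..., 256 as in the patch. `select_list_idx()`, `list_size()` and `floor_log2()` are hypothetical helpers, not kernel APIs.

One assumption is worth flagging. The sketch tests whether `count` itself is a power of two. As quoted, `roundup_pow_of_two(idx + 1) != count` seems to hold for every count >= 2, so an exact power of two such as 8 would get the next larger list. That is still correct, because next_random_slot() skips out-of-range entries, but it is slightly wasteful.

```c
#include <assert.h>

/* Number of global lists in the patch: sizes 2, 4, 8, ..., 256. */
#define NR_SEQS 8

/* Floor of log2(n) for n >= 1; a userspace stand-in for ilog2(). */
static unsigned int floor_log2(unsigned int n)
{
	unsigned int r = 0;

	while (n >>= 1)
		r++;
	return r;
}

/*
 * Hypothetical helper: index (into the {2, 4, ..., 256} table) of the
 * smallest list holding at least @count entries, capped at the last
 * list. Assumes count >= 2, as the patch guarantees.
 */
static unsigned int select_list_idx(unsigned int count)
{
	const unsigned int last_idx = NR_SEQS - 1;
	unsigned int idx;

	assert(count >= 2);
	idx = floor_log2(count) - 1;	/* list size 2 << idx is <= count */
	if (idx >= last_idx)
		return last_idx;
	if (count & (count - 1))	/* not a power of two: one size up */
		idx++;
	return idx;
}

/* Number of entries in the list at table index @idx. */
static unsigned int list_size(unsigned int idx)
{
	return 2u << idx;
}
```

For example, a slab of 8 objects maps to the 8-entry list and a slab of 9 objects to the 16-entry list. Anything above 256 objects reuses the 256-entry list, and the padding logic covers the rest.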
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2] mm: SLAB freelist randomization
@ 2016-04-26  0:40   ` Joonsoo Kim
  0 siblings, 0 replies; 35+ messages in thread
From: Joonsoo Kim @ 2016-04-26  0:40 UTC (permalink / raw)
  To: Thomas Garnier
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Andrew Morton,
	Kees Cook, gthelen, labbott, kernel-hardening, linux-kernel,
	linux-mm

On Mon, Apr 25, 2016 at 01:39:23PM -0700, Thomas Garnier wrote:
> Provides an optional config (CONFIG_FREELIST_RANDOM) to randomize the
> SLAB freelist. The list is randomized during initialization of a new set
> of pages. The order for different freelist sizes is pre-computed at boot
> for performance. Each kmem_cache has its own randomized freelist, except
> early in boot, where global lists are used. This security feature reduces
> the predictability of the kernel SLAB allocator against heap overflows,
> rendering attacks much less stable.
> 
> For example this attack against SLUB (also applicable against SLAB)
> would be affected:
> https://jon.oberheide.org/blog/2010/09/10/linux-kernel-can-slub-overflow/
> 
> Also, since v4.6 the freelist has been moved to the end of the SLAB. This
> exposes a controllable heap to new attacks not yet publicly discussed:
> a kernel heap overflow can be transformed into multiple use-after-free
> conditions. This feature makes this type of attack harder too.
> 
> To generate entropy, we use get_random_bytes_arch because 0 bits of
> entropy are available at this boot stage. In the worst case this function
> will fall back to the get_random_bytes sub API. We also generate a random
> shift that is applied to the pre-computed freelist for each new set of pages.
> 
> The config option name is not specific to the SLAB as this approach will
> be extended to other allocators like SLUB.
> 
> Performance results highlighted no major changes:
> 
> slab_test, 1 run at boot. A difference is seen only on the 2048 size test,
> which is the worst-case scenario covered by freelist randomization: new
> slab pages are constantly being created during the 10000 allocations. The
> variance should be mainly due to getting new pages every few allocations.
> 
> Before:
> 
> Single thread testing
> =====================
> 1. Kmalloc: Repeatedly allocate then free test
> 10000 times kmalloc(8) -> 99 cycles kfree -> 112 cycles
> 10000 times kmalloc(16) -> 109 cycles kfree -> 140 cycles
> 10000 times kmalloc(32) -> 129 cycles kfree -> 137 cycles
> 10000 times kmalloc(64) -> 141 cycles kfree -> 141 cycles
> 10000 times kmalloc(128) -> 152 cycles kfree -> 148 cycles
> 10000 times kmalloc(256) -> 195 cycles kfree -> 167 cycles
> 10000 times kmalloc(512) -> 257 cycles kfree -> 199 cycles
> 10000 times kmalloc(1024) -> 393 cycles kfree -> 251 cycles
> 10000 times kmalloc(2048) -> 649 cycles kfree -> 228 cycles
> 10000 times kmalloc(4096) -> 806 cycles kfree -> 370 cycles
> 10000 times kmalloc(8192) -> 814 cycles kfree -> 411 cycles
> 10000 times kmalloc(16384) -> 892 cycles kfree -> 455 cycles
> 2. Kmalloc: alloc/free test
> 10000 times kmalloc(8)/kfree -> 121 cycles
> 10000 times kmalloc(16)/kfree -> 121 cycles
> 10000 times kmalloc(32)/kfree -> 121 cycles
> 10000 times kmalloc(64)/kfree -> 121 cycles
> 10000 times kmalloc(128)/kfree -> 121 cycles
> 10000 times kmalloc(256)/kfree -> 119 cycles
> 10000 times kmalloc(512)/kfree -> 119 cycles
> 10000 times kmalloc(1024)/kfree -> 119 cycles
> 10000 times kmalloc(2048)/kfree -> 119 cycles
> 10000 times kmalloc(4096)/kfree -> 121 cycles
> 10000 times kmalloc(8192)/kfree -> 119 cycles
> 10000 times kmalloc(16384)/kfree -> 119 cycles
> 
> After:
> 
> Single thread testing
> =====================
> 1. Kmalloc: Repeatedly allocate then free test
> 10000 times kmalloc(8) -> 130 cycles kfree -> 86 cycles
> 10000 times kmalloc(16) -> 118 cycles kfree -> 86 cycles
> 10000 times kmalloc(32) -> 121 cycles kfree -> 85 cycles
> 10000 times kmalloc(64) -> 176 cycles kfree -> 102 cycles
> 10000 times kmalloc(128) -> 178 cycles kfree -> 100 cycles
> 10000 times kmalloc(256) -> 205 cycles kfree -> 109 cycles
> 10000 times kmalloc(512) -> 262 cycles kfree -> 136 cycles
> 10000 times kmalloc(1024) -> 342 cycles kfree -> 157 cycles
> 10000 times kmalloc(2048) -> 701 cycles kfree -> 238 cycles
> 10000 times kmalloc(4096) -> 803 cycles kfree -> 364 cycles
> 10000 times kmalloc(8192) -> 835 cycles kfree -> 404 cycles
> 10000 times kmalloc(16384) -> 896 cycles kfree -> 441 cycles
> 2. Kmalloc: alloc/free test
> 10000 times kmalloc(8)/kfree -> 121 cycles
> 10000 times kmalloc(16)/kfree -> 121 cycles
> 10000 times kmalloc(32)/kfree -> 123 cycles
> 10000 times kmalloc(64)/kfree -> 142 cycles
> 10000 times kmalloc(128)/kfree -> 121 cycles
> 10000 times kmalloc(256)/kfree -> 119 cycles
> 10000 times kmalloc(512)/kfree -> 119 cycles
> 10000 times kmalloc(1024)/kfree -> 119 cycles
> 10000 times kmalloc(2048)/kfree -> 119 cycles
> 10000 times kmalloc(4096)/kfree -> 119 cycles
> 10000 times kmalloc(8192)/kfree -> 119 cycles
> 10000 times kmalloc(16384)/kfree -> 119 cycles
> 
> Signed-off-by: Thomas Garnier <thgarnie@google.com>
> ---
> Based on next-20160422
> ---
>  include/linux/slab_def.h |   4 +
>  init/Kconfig             |   9 ++
>  mm/slab.c                | 213 ++++++++++++++++++++++++++++++++++++++++++++++-
>  3 files changed, 224 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/slab_def.h b/include/linux/slab_def.h
> index 9edbbf3..182ec26 100644
> --- a/include/linux/slab_def.h
> +++ b/include/linux/slab_def.h
> @@ -80,6 +80,10 @@ struct kmem_cache {
>  	struct kasan_cache kasan_info;
>  #endif
>  
> +#ifdef CONFIG_FREELIST_RANDOM
> +	void *random_seq;
> +#endif
> +
>  	struct kmem_cache_node *node[MAX_NUMNODES];
>  };
>  
> diff --git a/init/Kconfig b/init/Kconfig
> index 0c66640..73453d0 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1742,6 +1742,15 @@ config SLOB
>  
>  endchoice
>  
> +config FREELIST_RANDOM
> +	default n
> +	depends on SLAB
> +	bool "SLAB freelist randomization"
> +	help
> +	  Randomizes the freelist order used on creating new SLABs. This
> +	  security feature reduces the predictability of the kernel slab
> +	  allocator against heap overflows.
> +
>  config SLUB_CPU_PARTIAL
>  	default y
>  	depends on SLUB && SMP
> diff --git a/mm/slab.c b/mm/slab.c
> index b82ee6b..89eb617 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -116,6 +116,7 @@
>  #include	<linux/kmemcheck.h>
>  #include	<linux/memory.h>
>  #include	<linux/prefetch.h>
> +#include	<linux/log2.h>
>  
>  #include	<net/sock.h>
>  
> @@ -1230,6 +1231,100 @@ static void __init set_up_node(struct kmem_cache *cachep, int index)
>  	}
>  }
>  
> +#ifdef CONFIG_FREELIST_RANDOM
> +static void freelist_randomize(struct rnd_state *state, freelist_idx_t *list,
> +			size_t count)
> +{
> +	size_t i;
> +	unsigned int rand;
> +
> +	for (i = 0; i < count; i++)
> +		list[i] = i;
> +
> +	/* Fisher-Yates shuffle */
> +	for (i = count - 1; i > 0; i--) {
> +		rand = prandom_u32_state(state);
> +		rand %= (i + 1);
> +		swap(list[i], list[rand]);
> +	}
> +}
> +
> +/* Create a random sequence per cache */
> +static void cache_random_seq_create(struct kmem_cache *cachep)
> +{
> +	unsigned int seed, count = cachep->num;
> +	struct rnd_state state;
> +
> +	if (count < 2)
> +		return;
> +
> +	cachep->random_seq = kcalloc(count, sizeof(freelist_idx_t), GFP_KERNEL);
> +	BUG_ON(cachep->random_seq == NULL);

Hello,

Please make the function return int and propagate the error to the cache creator.
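Joonsoo's request could look roughly like the sketch below. It is written in userspace C so that it compiles on its own. `struct cache`, `enable_cache()` and the use of calloc() in place of kcalloc() are assumptions for illustration, and `freelist_idx_t` is taken to be `unsigned char`.

The sketch returns an allocation failure as -ENOMEM instead of hitting BUG_ON(), and it omits the shuffle itself. Leaving `random_seq` NULL on failure would also keep the global-list fallback in freelist_state_initialize() usable.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

typedef unsigned char freelist_idx_t;	/* assumed byte-sized index */

/* Hypothetical, cut-down stand-in for struct kmem_cache. */
struct cache {
	unsigned int num;		/* objects per slab */
	freelist_idx_t *random_seq;	/* NULL: use the global lists */
};

/*
 * Allocate the per-cache sequence and report failure to the caller
 * rather than calling BUG_ON(). calloc() stands in for kcalloc().
 */
static int cache_random_seq_create(struct cache *cachep)
{
	unsigned int i, count = cachep->num;

	if (count < 2)
		return 0;		/* nothing to randomize */

	cachep->random_seq = calloc(count, sizeof(freelist_idx_t));
	if (!cachep->random_seq)
		return -ENOMEM;		/* random_seq stays NULL */

	/* Identity order only; the Fisher-Yates shuffle is omitted here. */
	for (i = 0; i < count; i++)
		cachep->random_seq[i] = (freelist_idx_t)i;
	return 0;
}

/* Hypothetical cache creator: propagate the error upward. */
static int enable_cache(struct cache *cachep)
{
	int err = cache_random_seq_create(cachep);

	if (err)
		return err;
	/* ... the rest of the per-cpu cache setup would follow ... */
	return 0;
}
```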

> +
> +	/* Get best entropy at this stage */
> +	get_random_bytes_arch(&seed, sizeof(seed));
> +	prandom_seed_state(&state, seed);
> +
> +	freelist_randomize(&state, cachep->random_seq, count);
> +}
> +
> +/* Destroy the per-cache random freelist sequence */
> +static void cache_random_seq_destroy(struct kmem_cache *cachep)
> +{
> +	kfree(cachep->random_seq);
> +	cachep->random_seq = NULL;
> +}
> +
> +/*
> + * Global static list are used when pre-computed cache list are not yet
> + * available. Lists of different sizes are created to optimize performance on
> + * SLABS with different object counts.
> + */
> +static freelist_idx_t freelist_random_seq_2[2];
> +static freelist_idx_t freelist_random_seq_4[4];
> +static freelist_idx_t freelist_random_seq_8[8];
> +static freelist_idx_t freelist_random_seq_16[16];
> +static freelist_idx_t freelist_random_seq_32[32];
> +static freelist_idx_t freelist_random_seq_64[64];
> +static freelist_idx_t freelist_random_seq_128[128];
> +static freelist_idx_t freelist_random_seq_256[256];
> +const static struct m_list {
> +	size_t count;
> +	freelist_idx_t *list;
> +} freelist_random_seqs[] = {
> +	{ ARRAY_SIZE(freelist_random_seq_2), freelist_random_seq_2 },
> +	{ ARRAY_SIZE(freelist_random_seq_4), freelist_random_seq_4 },
> +	{ ARRAY_SIZE(freelist_random_seq_8), freelist_random_seq_8 },
> +	{ ARRAY_SIZE(freelist_random_seq_16), freelist_random_seq_16 },
> +	{ ARRAY_SIZE(freelist_random_seq_32), freelist_random_seq_32 },
> +	{ ARRAY_SIZE(freelist_random_seq_64), freelist_random_seq_64 },
> +	{ ARRAY_SIZE(freelist_random_seq_128), freelist_random_seq_128 },
> +	{ ARRAY_SIZE(freelist_random_seq_256), freelist_random_seq_256 },
> +};

I'd like to remove this global static list, even if that means we can't
get a random sequence during the early boot-up process. At that stage the
kernel is not yet initialized, so a malicious user cannot do anything and
a random sequence doesn't add any security. After kernel initialization,
we will use the per-cache random sequence, so the problem surface is really
small. If you want to randomize the freelist sequence even in this case,
you can manually permute the sequence by calling prandom_u32_state(). But
I don't think it is necessary.
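Permuting the sequence manually amounts to the same Fisher-Yates shuffle that freelist_randomize() in the patch already performs. Below is a minimal userspace sketch. `struct xs_state`, `xs_next()` (a xorshift32 generator) and `permute()` are hypothetical stand-ins for struct rnd_state, prandom_u32_state() and freelist_randomize(). They were chosen only so that the example compiles outside the kernel, and xorshift32 is not a secure generator.

Like the patch, the sketch uses `rand % (i + 1)`. That introduces a slight modulo bias, which is negligible for lists of at most a few hundred entries.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical PRNG state; the seed must be non-zero. */
struct xs_state {
	uint32_t x;
};

/* Marsaglia xorshift32: fast, not cryptographically secure. */
static uint32_t xs_next(struct xs_state *s)
{
	s->x ^= s->x << 13;
	s->x ^= s->x >> 17;
	s->x ^= s->x << 5;
	return s->x;
}

/* Fill @list with 0..count-1, then shuffle it in place (Fisher-Yates). */
static void permute(struct xs_state *s, unsigned char *list,
		    unsigned int count)
{
	unsigned int i, j;
	unsigned char tmp;

	for (i = 0; i < count; i++)
		list[i] = (unsigned char)i;

	if (count < 2)
		return;		/* avoids count - 1 underflowing below */

	for (i = count - 1; i > 0; i--) {
		j = xs_next(s) % (i + 1);	/* 0 <= j <= i */
		tmp = list[i];
		list[i] = list[j];
		list[j] = tmp;
	}
}
```

Each swap keeps the array a permutation, so the result always contains every index exactly once, whatever the seed.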

Thanks.

> +
> +/* Pre-compute the global pre-computed lists early at boot */
> +static void __init freelist_random_init(void)
> +{
> +	unsigned int seed;
> +	size_t i;
> +	struct rnd_state state;
> +
> +	/* Get best entropy available at this stage */
> +	get_random_bytes_arch(&seed, sizeof(seed));
> +	prandom_seed_state(&state, seed);
> +
> +	for (i = 0; i < ARRAY_SIZE(freelist_random_seqs); i++) {
> +		freelist_randomize(&state, freelist_random_seqs[i].list,
> +				freelist_random_seqs[i].count);
> +	}
> +}
> +#else
> +static inline void __init freelist_random_init(void) { }
> +static inline void cache_random_seq_create(struct kmem_cache *cachep) { }
> +static inline void cache_random_seq_destroy(struct kmem_cache *cachep) { }
> +#endif /* CONFIG_FREELIST_RANDOM */
> +
> +
>  /*
>   * Initialisation.  Called after the page allocator have been initialised and
>   * before smp_init().
> @@ -1256,6 +1351,8 @@ void __init kmem_cache_init(void)
>  	if (!slab_max_order_set && totalram_pages > (32 << 20) >> PAGE_SHIFT)
>  		slab_max_order = SLAB_MAX_ORDER_HI;
>  
> +	freelist_random_init();
> +
>  	/* Bootstrap is tricky, because several objects are allocated
>  	 * from caches that do not exist yet:
>  	 * 1) initialize the kmem_cache cache: it contains the struct
> @@ -2337,6 +2434,8 @@ void __kmem_cache_release(struct kmem_cache *cachep)
>  	int i;
>  	struct kmem_cache_node *n;
>  
> +	cache_random_seq_destroy(cachep);
> +
>  	free_percpu(cachep->cpu_cache);
>  
>  	/* NUMA: free the node structures */
> @@ -2443,15 +2542,122 @@ static void cache_init_objs_debug(struct kmem_cache *cachep, struct page *page)
>  #endif
>  }
>  
> +#ifdef CONFIG_FREELIST_RANDOM
> +/* Hold information during a freelist initialization */
> +struct freelist_init_state {
> +	unsigned int padding;
> +	unsigned int pos;
> +	unsigned int count;
> +	unsigned int rand;
> +	struct m_list freelist_random_seq;
> +};
> +
> +/* Select the right pre-computed list and initialize state */
> +static void freelist_state_initialize(struct freelist_init_state *state,
> +				struct kmem_cache *cachep,
> +				unsigned int count)
> +{
> +	unsigned int idx;
> +	const unsigned int last_idx = ARRAY_SIZE(freelist_random_seqs) - 1;
> +
> +	memset(state, 0, sizeof(*state));
> +	state->count = count;
> +	state->pos = 0;
> +
> +	/* Use best entropy available to define a random shift */
> +	get_random_bytes_arch(&state->rand, sizeof(state->rand));
> +
> +	if (cachep->random_seq) {
> +		state->freelist_random_seq.list = cachep->random_seq;
> +		state->freelist_random_seq.count = count;
> +	} else {
> +		/* count is always >= 2 */
> +		idx = ilog2(count) - 1;
> +		if (idx >= last_idx)
> +			idx = last_idx;
> +		else if (roundup_pow_of_two(idx + 1) != count)
> +			idx++;
> +		state->freelist_random_seq = freelist_random_seqs[idx];
> +	}
> +}
> +
> +/* Get the next entry on the list depending on the target list size */
> +static freelist_idx_t get_next_entry(struct freelist_init_state *state)
> +{
> +	freelist_idx_t ret;
> +
> +	if (state->pos == state->freelist_random_seq.count) {
> +		state->padding += state->pos;
> +		state->pos = 0;
> +	}
> +
> +	/* Randomize the entry using the random shift */
> +	ret = state->freelist_random_seq.list[state->pos++];
> +	ret = (ret + state->rand) % state->freelist_random_seq.count;
> +	return ret;
> +}
> +
> +static freelist_idx_t next_random_slot(struct freelist_init_state *state)
> +{
> +	freelist_idx_t entry;
> +
> +	do {
> +		entry = get_next_entry(state);
> +	} while ((entry + state->padding) >= state->count);
> +
> +	return entry + state->padding;
> +}
> +
> +/*
> + * Shuffle the freelist initialization state based on pre-computed lists.
> + * return true if the list was successfully shuffled, false otherwise.
> + */
> +static bool shuffle_freelist(struct kmem_cache *cachep, struct page *page)
> +{
> +	unsigned int objfreelist, i, count = cachep->num;
> +	struct freelist_init_state state;
> +
> +	if (count < 2)
> +		return false;
> +
> +	objfreelist = 0;
> +	freelist_state_initialize(&state, cachep, count);
> +
> +	/* Take the first random entry as the objfreelist */
> +	if (OBJFREELIST_SLAB(cachep)) {
> +		objfreelist = next_random_slot(&state);
> +		page->freelist = index_to_obj(cachep, page, objfreelist) +
> +						obj_offset(cachep);
> +		count--;
> +	}
> +	for (i = 0; i < count; i++)
> +		set_free_obj(page, i, next_random_slot(&state));
> +
> +	if (OBJFREELIST_SLAB(cachep))
> +		set_free_obj(page, i, objfreelist);
> +	return true;
> +}
> +#else
> +static inline bool shuffle_freelist(struct kmem_cache *cachep,
> +				struct page *page)
> +{
> +	return false;
> +}
> +#endif /* CONFIG_FREELIST_RANDOM */
> +
>  static void cache_init_objs(struct kmem_cache *cachep,
>  			    struct page *page)
>  {
>  	int i;
>  	void *objp;
> +	bool shuffled;
>  
>  	cache_init_objs_debug(cachep, page);
>  
> -	if (OBJFREELIST_SLAB(cachep)) {
> +	/* Try to randomize the freelist if enabled */
> +	shuffled = shuffle_freelist(cachep, page);
> +
> +	if (!shuffled && OBJFREELIST_SLAB(cachep)) {
>  		page->freelist = index_to_obj(cachep, page, cachep->num - 1) +
>  						obj_offset(cachep);
>  	}
> @@ -2465,7 +2671,8 @@ static void cache_init_objs(struct kmem_cache *cachep,
>  			kasan_poison_object_data(cachep, objp);
>  		}
>  
> -		set_free_obj(page, i, i);
> +		if (!shuffled)
> +			set_free_obj(page, i, i);
>  	}
>  }
>  
> @@ -3815,6 +4022,8 @@ static int enable_cpucache(struct kmem_cache *cachep, gfp_t gfp)
>  	int shared = 0;
>  	int batchcount = 0;
>  
> +	cache_random_seq_create(cachep);
> +
>  	if (!is_root_cache(cachep)) {
>  		struct kmem_cache *root = memcg_root_cache(cachep);
>  		limit = root->limit;
> -- 
> 2.8.0.rc3.226.g39d4020
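The walk performed by get_next_entry() and next_random_slot() can be checked in userspace. Rotating a permutation of 0..n-1 by a constant modulo n still gives a permutation. Adding `padding` each time the list wraps then covers slabs with more than n objects. The sketch below mirrors that logic using hypothetical names (`walk_state`, `walk_next_slot()`, `walk_fill()`).

The sketch deliberately differs from the quoted code in one place: it reduces the shift modulo n before adding it. As quoted, `ret + state->rand` is computed in unsigned int and can wrap when `rand` is close to UINT_MAX. For a per-cache list whose length is not a power of two, the wrapped sum modulo n is then no longer a bijection (with n = 3 and rand = UINT_MAX, entries 0 and 1 both map to 0). This looks like a possible issue in the quoted code and may be worth double-checking.

```c
#include <assert.h>
#include <string.h>

/* Userspace mirror of struct freelist_init_state from the patch. */
struct walk_state {
	unsigned int padding;		/* offset added after each wrap */
	unsigned int pos;		/* next position in @list */
	unsigned int count;		/* objects in the slab */
	unsigned int rand;		/* per-slab random shift */
	const unsigned char *list;	/* permutation of 0..n-1 */
	unsigned int n;			/* entries in @list */
};

/* Next list entry, rotated by the shift (mirrors get_next_entry()). */
static unsigned int walk_next_entry(struct walk_state *s)
{
	unsigned int ret;

	if (s->pos == s->n) {
		/* List exhausted: move on to the next block of n slots. */
		s->padding += s->pos;
		s->pos = 0;
	}
	ret = s->list[s->pos++];
	/* Reduce the shift first so the sum cannot wrap. */
	return (ret + (s->rand % s->n)) % s->n;
}

/* Skip slots beyond the slab (mirrors next_random_slot()). */
static unsigned int walk_next_slot(struct walk_state *s)
{
	unsigned int entry;

	do {
		entry = walk_next_entry(s);
	} while (entry + s->padding >= s->count);
	return entry + s->padding;
}

/* Fill @out with @count slots; returns 1 if no slot repeats. */
static int walk_fill(const unsigned char *list, unsigned int n,
		     unsigned int count, unsigned int rand, unsigned int *out)
{
	struct walk_state s = { 0, 0, count, rand, list, n };
	unsigned char seen[1024];
	unsigned int i;

	if (count > sizeof(seen))
		return 0;
	memset(seen, 0, sizeof(seen));
	for (i = 0; i < count; i++) {
		out[i] = walk_next_slot(&s);
		if (seen[out[i]]++)
			return 0;	/* duplicate slot */
	}
	return 1;
}
```

Because the loop is called exactly `count` times, and each block of n slots contains every in-range slot once, the do/while in walk_next_slot() always terminates.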

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2] mm: SLAB freelist randomization
  2016-04-25 21:38         ` Andrew Morton
@ 2016-04-25 21:43           ` Thomas Garnier
  -1 siblings, 0 replies; 35+ messages in thread
From: Thomas Garnier @ 2016-04-25 21:43 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Kees Cook, Greg Thelen, Laura Abbott, kernel-hardening, LKML,
	Linux-MM

On Mon, Apr 25, 2016 at 2:38 PM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> On Mon, 25 Apr 2016 14:14:33 -0700 Thomas Garnier <thgarnie@google.com> wrote:
>
>> >>> +     /* Get best entropy at this stage */
>> >>> +     get_random_bytes_arch(&seed, sizeof(seed));
>> >>
>> >> See concerns in other email - isn't this a no-op if CONFIG_ARCH_RANDOM=n?
>> >>
>>
>> The arch_* functions will return 0, which will break the loop in
>> get_random_bytes_arch and make it use extract_entropy (as does
>> get_random_bytes).
>> (cf http://lxr.free-electrons.com/source/drivers/char/random.c#L1335)
>>
>
> oop, sorry, I misread the code.
>
> (and the get_random_bytes_arch() comment "This function will use the
> architecture-specific hardware random number generator if it is
> available" is misleading, so there)

No problem, better to double-check it. I agree it is misleading.

Thomas

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2] mm: SLAB freelist randomization
  2016-04-25 21:14       ` Thomas Garnier
@ 2016-04-25 21:38         ` Andrew Morton
  -1 siblings, 0 replies; 35+ messages in thread
From: Andrew Morton @ 2016-04-25 21:38 UTC (permalink / raw)
  To: Thomas Garnier
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Kees Cook, Greg Thelen, Laura Abbott, kernel-hardening, LKML,
	Linux-MM

On Mon, 25 Apr 2016 14:14:33 -0700 Thomas Garnier <thgarnie@google.com> wrote:

> >>> +     /* Get best entropy at this stage */
> >>> +     get_random_bytes_arch(&seed, sizeof(seed));
> >>
> >> See concerns in other email - isn't this a no-op if CONFIG_ARCH_RANDOM=n?
> >>
> 
> The arch_* functions will return 0, which will break the loop in
> get_random_bytes_arch and make it use extract_entropy (as does
> get_random_bytes).
> (cf http://lxr.free-electrons.com/source/drivers/char/random.c#L1335)
> 

oop, sorry, I misread the code.

(and the get_random_bytes_arch() comment "This function will use the
architecture-specific hardware random number generator if it is
available" is misleading, so there)

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2] mm: SLAB freelist randomization
  2016-04-25 21:13     ` Thomas Garnier
@ 2016-04-25 21:14       ` Thomas Garnier
  -1 siblings, 0 replies; 35+ messages in thread
From: Thomas Garnier @ 2016-04-25 21:14 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Kees Cook, Greg Thelen, Laura Abbott, kernel-hardening, LKML,
	Linux-MM

On Mon, Apr 25, 2016 at 2:13 PM, Thomas Garnier <thgarnie@google.com> wrote:
> On Mon, Apr 25, 2016 at 2:10 PM, Andrew Morton
> <akpm@linux-foundation.org> wrote:
>> On Mon, 25 Apr 2016 13:39:23 -0700 Thomas Garnier <thgarnie@google.com> wrote:
>>
>>> Provides an optional config (CONFIG_FREELIST_RANDOM) to randomize the
>>> SLAB freelist. The list is randomized during initialization of a new set
>>> of pages. The order for different freelist sizes is pre-computed at boot
>>> for performance. Each kmem_cache has its own randomized freelist, except
>>> early in boot, where global lists are used. This security feature reduces
>>> the predictability of the kernel SLAB allocator against heap overflows,
>>> rendering attacks much less stable.
>>>
>>> For example this attack against SLUB (also applicable against SLAB)
>>> would be affected:
>>> https://jon.oberheide.org/blog/2010/09/10/linux-kernel-can-slub-overflow/
>>>
>>> Also, since v4.6 the freelist has been moved to the end of the SLAB. This
>>> exposes a controllable heap to new attacks not yet publicly discussed:
>>> a kernel heap overflow can be transformed into multiple use-after-free
>>> conditions. This feature makes this type of attack harder too.
>>>
>>> To generate entropy, we use get_random_bytes_arch because 0 bits of
>>> entropy are available at this boot stage. In the worst case this function
>>> will fall back to the get_random_bytes sub API. We also generate a random
>>> shift that is applied to the pre-computed freelist for each new set of pages.
>>>
>>> The config option name is not specific to the SLAB as this approach will
>>> be extended to other allocators like SLUB.
>>>
>>> Performance results highlighted no major changes:
>>>
>>> slab_test, 1 run at boot. A difference is seen only on the 2048 size test,
>>> which is the worst-case scenario covered by freelist randomization: new
>>> slab pages are constantly being created during the 10000 allocations. The
>>> variance should be mainly due to getting new pages every few allocations.
>>>
>>> ...
>>>
>>> --- a/include/linux/slab_def.h
>>> +++ b/include/linux/slab_def.h
>>> @@ -80,6 +80,10 @@ struct kmem_cache {
>>>       struct kasan_cache kasan_info;
>>>  #endif
>>>
>>> +#ifdef CONFIG_FREELIST_RANDOM
>>
>> CONFIG_FREELIST_RANDOM bugs me a bit - "freelist" is so vague.
>> CONFIG_SLAB_FREELIST_RANDOM would be better.  I mean, what Kconfig
>> identifier could be used for implementing randomisation in
>> slub/slob/etc once CONFIG_FREELIST_RANDOM is used up?
>>
>>> +     void *random_seq;
>>> +#endif
>>> +
>>>       struct kmem_cache_node *node[MAX_NUMNODES];
>>>  };
>>>
>>> diff --git a/init/Kconfig b/init/Kconfig
>>> index 0c66640..73453d0 100644
>>> --- a/init/Kconfig
>>> +++ b/init/Kconfig
>>> @@ -1742,6 +1742,15 @@ config SLOB
>>>
>>>  endchoice
>>>
>>> +config FREELIST_RANDOM
>>> +     default n
>>> +     depends on SLAB
>>> +     bool "SLAB freelist randomization"
>>> +     help
>>> +       Randomizes the freelist order used on creating new SLABs. This
>>> +       security feature reduces the predictability of the kernel slab
>>> +       allocator against heap overflows.
>>> +
>>>  config SLUB_CPU_PARTIAL
>>>       default y
>>>       depends on SLUB && SMP
>>> diff --git a/mm/slab.c b/mm/slab.c
>>> index b82ee6b..89eb617 100644
>>> --- a/mm/slab.c
>>> +++ b/mm/slab.c
>>> @@ -116,6 +116,7 @@
>>>  #include     <linux/kmemcheck.h>
>>>  #include     <linux/memory.h>
>>>  #include     <linux/prefetch.h>
>>> +#include     <linux/log2.h>
>>>
>>>  #include     <net/sock.h>
>>>
>>> @@ -1230,6 +1231,100 @@ static void __init set_up_node(struct kmem_cache *cachep, int index)
>>>       }
>>>  }
>>>
>>> +#ifdef CONFIG_FREELIST_RANDOM
>>> +static void freelist_randomize(struct rnd_state *state, freelist_idx_t *list,
>>> +                     size_t count)
>>> +{
>>> +     size_t i;
>>> +     unsigned int rand;
>>> +
>>> +     for (i = 0; i < count; i++)
>>> +             list[i] = i;
>>> +
>>> +     /* Fisher-Yates shuffle */
>>> +     for (i = count - 1; i > 0; i--) {
>>> +             rand = prandom_u32_state(state);
>>> +             rand %= (i + 1);
>>> +             swap(list[i], list[rand]);
>>> +     }
>>> +}
>>> +
>>> +/* Create a random sequence per cache */
>>> +static void cache_random_seq_create(struct kmem_cache *cachep)
>>> +{
>>> +     unsigned int seed, count = cachep->num;
>>> +     struct rnd_state state;
>>> +
>>> +     if (count < 2)
>>> +             return;
>>> +
>>> +     cachep->random_seq = kcalloc(count, sizeof(freelist_idx_t), GFP_KERNEL);
>>> +     BUG_ON(cachep->random_seq == NULL);
>
> Regarding your previous email (trying to stay in one thread): I added a
> comment in this version explaining that we need the best entropy
> available at this boot stage.
>
>>
>> Yikes, that's a bit rude.  Is there no way of recovering from this?  If
>> the answer to that is really really "no" then I guess we should put a
>> __GFP_NOFAIL in there.  Add a comment explaining why (apologetically -
>> __GFP_NOFAIL is unpopular!) and remove the now-unneeded BUG_ON.
>>
>>
>
> We can always fall back to the static lists. I will remove the BUG_ON in
> the next iteration.
>
>>> +     /* Get best entropy at this stage */
>>> +     get_random_bytes_arch(&seed, sizeof(seed));
>>
>> See concerns in other email - isn't this a no-op if CONFIG_ARCH_RANDOM=n?
>>

The arch_* functions will return 0, which will break the loop in
get_random_bytes_arch and make it use extract_entropy (as does
get_random_bytes).
(cf http://lxr.free-electrons.com/source/drivers/char/random.c#L1335)

I might be missing something.
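The fallback Thomas describes can be sketched in userspace as follows. The two stubs are hypothetical: `fake_arch_get_random_long()` models an architecture built with CONFIG_ARCH_RANDOM=n, so it always fails, and `fake_extract_entropy()` stands in for the software pool. The real code in drivers/char/random.c is only paraphrased here from the description above. Words are taken from the arch RNG while it succeeds. Once it fails, the loop breaks and the remaining bytes come from the software pool.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stub: an architecture without a hardware RNG. */
static bool fake_arch_get_random_long(unsigned long *v)
{
	(void)v;
	return false;
}

/* Hypothetical stub for the software pool: writes a marker byte. */
static void fake_extract_entropy(void *buf, size_t nbytes)
{
	memset(buf, 0xA5, nbytes);
}

/*
 * Shape of the fallback: copy whole words from the arch RNG while it
 * succeeds, then fill whatever is left from the software pool.
 */
static void sketch_get_random_bytes_arch(void *buf, size_t nbytes)
{
	unsigned char *p = buf;

	while (nbytes) {
		unsigned long v;
		size_t chunk = nbytes < sizeof(v) ? nbytes : sizeof(v);

		if (!fake_arch_get_random_long(&v))
			break;		/* no arch RNG: stop here */
		memcpy(p, &v, chunk);
		p += chunk;
		nbytes -= chunk;
	}

	if (nbytes)
		fake_extract_entropy(p, nbytes);
}
```

With the always-failing stub, every byte of the seed comes from the software path. So the call is not a no-op when CONFIG_ARCH_RANDOM=n.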

>
>
>>
>>> +     prandom_seed_state(&state, seed);
>>> +
>>> +     freelist_randomize(&state, cachep->random_seq, count);
>>> +}
>>> +
>>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2] mm: SLAB freelist randomization
@ 2016-04-25 21:14       ` Thomas Garnier
  0 siblings, 0 replies; 35+ messages in thread
From: Thomas Garnier @ 2016-04-25 21:14 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Kees Cook, Greg Thelen, Laura Abbott, kernel-hardening, LKML,
	Linux-MM

On Mon, Apr 25, 2016 at 2:13 PM, Thomas Garnier <thgarnie@google.com> wrote:
> On Mon, Apr 25, 2016 at 2:10 PM, Andrew Morton
> <akpm@linux-foundation.org> wrote:
>> On Mon, 25 Apr 2016 13:39:23 -0700 Thomas Garnier <thgarnie@google.com> wrote:
>>
>>> Provides an optional config (CONFIG_FREELIST_RANDOM) to randomize the
>>> SLAB freelist. The list is randomized during initialization of a new set
>>> of pages. The order for different freelist sizes is pre-computed at boot
>>> for performance. Each kmem_cache has its own randomized freelist, except
>>> early in boot, where global lists are used. This security feature reduces
>>> the predictability of the kernel SLAB allocator against heap overflows,
>>> rendering attacks much less stable.
>>>
>>> For example this attack against SLUB (also applicable against SLAB)
>>> would be affected:
>>> https://jon.oberheide.org/blog/2010/09/10/linux-kernel-can-slub-overflow/
>>>
>>> Also, since v4.6 the freelist has been moved to the end of the SLAB. This
>>> exposes a controllable heap to new attacks not yet publicly discussed:
>>> a kernel heap overflow can be transformed into multiple use-after-free
>>> conditions. This feature makes this type of attack harder too.
>>>
>>> To generate entropy, we use get_random_bytes_arch because 0 bits of
>>> entropy are available at this boot stage. In the worst case this function
>>> will fall back to the get_random_bytes sub API. We also generate a random
>>> shift that is applied to the pre-computed freelist for each new set of pages.
>>>
>>> The config option name is not specific to the SLAB as this approach will
>>> be extended to other allocators like SLUB.
>>>
>>> Performance results highlighted no major changes:
>>>
>>> slab_test 1 run on boot. Difference only seen on the 2048 size test
>>> being the worse case scenario covered by freelist randomization. New
>>> slab pages are constantly being created on the 10000 allocations.
>>> Variance should be mainly due to getting new pages every few
>>> allocations.
>>>
>>> ...
>>>
>>> --- a/include/linux/slab_def.h
>>> +++ b/include/linux/slab_def.h
>>> @@ -80,6 +80,10 @@ struct kmem_cache {
>>>       struct kasan_cache kasan_info;
>>>  #endif
>>>
>>> +#ifdef CONFIG_FREELIST_RANDOM
>>
>> CONFIG_FREELIST_RANDOM bugs me a bit - "freelist" is so vague.
>> CONFIG_SLAB_FREELIST_RANDOM would be better.  I mean, what Kconfig
>> identifier could be used for implementing randomisation in
>> slub/slob/etc once CONFIG_FREELIST_RANDOM is used up?
>>
>>> +     void *random_seq;
>>> +#endif
>>> +
>>>       struct kmem_cache_node *node[MAX_NUMNODES];
>>>  };
>>>
>>> diff --git a/init/Kconfig b/init/Kconfig
>>> index 0c66640..73453d0 100644
>>> --- a/init/Kconfig
>>> +++ b/init/Kconfig
>>> @@ -1742,6 +1742,15 @@ config SLOB
>>>
>>>  endchoice
>>>
>>> +config FREELIST_RANDOM
>>> +     default n
>>> +     depends on SLAB
>>> +     bool "SLAB freelist randomization"
>>> +     help
>>> +       Randomizes the freelist order used on creating new SLABs. This
>>> +       security feature reduces the predictability of the kernel slab
>>> +       allocator against heap overflows.
>>> +
>>>  config SLUB_CPU_PARTIAL
>>>       default y
>>>       depends on SLUB && SMP
>>> diff --git a/mm/slab.c b/mm/slab.c
>>> index b82ee6b..89eb617 100644
>>> --- a/mm/slab.c
>>> +++ b/mm/slab.c
>>> @@ -116,6 +116,7 @@
>>>  #include     <linux/kmemcheck.h>
>>>  #include     <linux/memory.h>
>>>  #include     <linux/prefetch.h>
>>> +#include     <linux/log2.h>
>>>
>>>  #include     <net/sock.h>
>>>
>>> @@ -1230,6 +1231,100 @@ static void __init set_up_node(struct kmem_cache *cachep, int index)
>>>       }
>>>  }
>>>
>>> +#ifdef CONFIG_FREELIST_RANDOM
>>> +static void freelist_randomize(struct rnd_state *state, freelist_idx_t *list,
>>> +                     size_t count)
>>> +{
>>> +     size_t i;
>>> +     unsigned int rand;
>>> +
>>> +     for (i = 0; i < count; i++)
>>> +             list[i] = i;
>>> +
>>> +     /* Fisher-Yates shuffle */
>>> +     for (i = count - 1; i > 0; i--) {
>>> +             rand = prandom_u32_state(state);
>>> +             rand %= (i + 1);
>>> +             swap(list[i], list[rand]);
>>> +     }
>>> +}
>>> +
>>> +/* Create a random sequence per cache */
>>> +static void cache_random_seq_create(struct kmem_cache *cachep)
>>> +{
>>> +     unsigned int seed, count = cachep->num;
>>> +     struct rnd_state state;
>>> +
>>> +     if (count < 2)
>>> +             return;
>>> +
>>> +     cachep->random_seq = kcalloc(count, sizeof(freelist_idx_t), GFP_KERNEL);
>>> +     BUG_ON(cachep->random_seq == NULL);
>
> On your previous email. (trying to stay in one thread). I added a
> comment on this
> version to explain that we need best entropy at this boot stage.
>
>>
>> Yikes, that's a bit rude.  Is there no way of recovering from this?  If
>> the answer to that is really really "no" then I guess we should put a
>> __GFP_NOFAIL in there.  Add a comment explaining why (apologetically -
>> __GFP_NOFAIL is unpopular!) and remove the now-unneeded BUG_ON.
>>
>>
>
> We can always use the static. I will update on next iteration to remove the
> BUG_ON.
>
>>> +     /* Get best entropy at this stage */
>>> +     get_random_bytes_arch(&seed, sizeof(seed));
>>
>> See concerns in other email - isn't this a no-op if CONFIG_ARCH_RANDOM=n?
>>

The arch_* functions will return 0, which breaks the loop in
get_random_bytes_arch and makes it use extract_entropy (as
get_random_bytes does).
(cf http://lxr.free-electrons.com/source/drivers/char/random.c#L1335)

I might be missing something.
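Here is a userspace C sketch of the fallback control flow described above. Words are taken from the arch hook until it fails, and whatever remains is filled from the software pool. fake_arch_get_random_long() and fake_extract_entropy() are hypothetical stand-ins for arch_get_random_long() and extract_entropy(). They are not kernel APIs, and the loop is written from the description above rather than copied from random.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for arch_get_random_long(): reports failure when
 * there is no hardware RNG (roughly the CONFIG_ARCH_RANDOM=n case).
 */
static bool fake_arch_get_random_long(unsigned long *v, bool have_hw_rng)
{
	if (!have_hw_rng)
		return false;
	*v = 0x5A5A5A5AUL;	/* placeholder "hardware" value */
	return true;
}

/* Hypothetical stand-in for the software pool path (extract_entropy()). */
static void fake_extract_entropy(unsigned char *buf, size_t nbytes)
{
	memset(buf, 0xA5, nbytes);
}

/*
 * Copy hardware words until the arch hook fails, then fill whatever is
 * left from the software pool. Returns how many bytes came from the
 * "hardware" path (0 means everything came from the fallback).
 */
static size_t sketch_get_random_bytes_arch(void *buf, size_t nbytes,
					   bool have_hw_rng)
{
	unsigned char *p = buf;
	size_t left = nbytes;

	while (left) {
		unsigned long v;
		size_t chunk = left < sizeof(v) ? left : sizeof(v);

		if (!fake_arch_get_random_long(&v, have_hw_rng))
			break;	/* arch hook failed: stop using it */
		memcpy(p, &v, chunk);
		p += chunk;
		left -= chunk;
	}
	if (left)
		fake_extract_entropy(p, left);
	return nbytes - left;
}
```

With have_hw_rng set to false (the CONFIG_ARCH_RANDOM=n case being asked about), the sketch returns 0 and the whole buffer is filled by the fallback, so the call still writes the buffer. How much entropy that fallback pool actually holds this early in boot is a separate question.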

>
>
>>
>>> +     prandom_seed_state(&state, seed);
>>> +
>>> +     freelist_randomize(&state, cachep->random_seq, count);
>>> +}
>>> +
>>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2] mm: SLAB freelist randomization
  2016-04-25 21:10   ` Andrew Morton
@ 2016-04-25 21:13     ` Thomas Garnier
  -1 siblings, 0 replies; 35+ messages in thread
From: Thomas Garnier @ 2016-04-25 21:13 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Kees Cook, Greg Thelen, Laura Abbott, kernel-hardening, LKML,
	Linux-MM

On Mon, Apr 25, 2016 at 2:10 PM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> On Mon, 25 Apr 2016 13:39:23 -0700 Thomas Garnier <thgarnie@google.com> wrote:
>
>> Provides an optional config (CONFIG_FREELIST_RANDOM) to randomize the
>> SLAB freelist. The list is randomized during initialization of a new set
>> of pages. The order on different freelist sizes is pre-computed at boot
>> for performance. Each kmem_cache has its own randomized freelist except
>> early on boot where global lists are used. This security feature reduces
>> the predictability of the kernel SLAB allocator against heap overflows
>> rendering attacks much less stable.
>>
>> For example this attack against SLUB (also applicable against SLAB)
>> would be affected:
>> https://jon.oberheide.org/blog/2010/09/10/linux-kernel-can-slub-overflow/
>>
>> Also, since v4.6 the freelist was moved at the end of the SLAB. It means
>> a controllable heap is opened to new attacks not yet publicly discussed.
>> A kernel heap overflow can be transformed to multiple use-after-free.
>> This feature makes this type of attack harder too.
>>
>> To generate entropy, we use get_random_bytes_arch because 0 bits of
>> entropy is available in the boot stage. In the worse case this function
>> will fallback to the get_random_bytes sub API. We also generate a shift
>> random number to shift pre-computed freelist for each new set of pages.
>>
>> The config option name is not specific to the SLAB as this approach will
>> be extended to other allocators like SLUB.
>>
>> Performance results highlighted no major changes:
>>
>> slab_test 1 run on boot. Difference only seen on the 2048 size test
>> being the worse case scenario covered by freelist randomization. New
>> slab pages are constantly being created on the 10000 allocations.
>> Variance should be mainly due to getting new pages every few
>> allocations.
>>
>> ...
>>
>> --- a/include/linux/slab_def.h
>> +++ b/include/linux/slab_def.h
>> @@ -80,6 +80,10 @@ struct kmem_cache {
>>       struct kasan_cache kasan_info;
>>  #endif
>>
>> +#ifdef CONFIG_FREELIST_RANDOM
>
> CONFIG_FREELIST_RANDOM bugs me a bit - "freelist" is so vague.
> CONFIG_SLAB_FREELIST_RANDOM would be better.  I mean, what Kconfig
> identifier could be used for implementing randomisation in
> slub/slob/etc once CONFIG_FREELIST_RANDOM is used up?
>
>> +     void *random_seq;
>> +#endif
>> +
>>       struct kmem_cache_node *node[MAX_NUMNODES];
>>  };
>>
>> diff --git a/init/Kconfig b/init/Kconfig
>> index 0c66640..73453d0 100644
>> --- a/init/Kconfig
>> +++ b/init/Kconfig
>> @@ -1742,6 +1742,15 @@ config SLOB
>>
>>  endchoice
>>
>> +config FREELIST_RANDOM
>> +     default n
>> +     depends on SLAB
>> +     bool "SLAB freelist randomization"
>> +     help
>> +       Randomizes the freelist order used on creating new SLABs. This
>> +       security feature reduces the predictability of the kernel slab
>> +       allocator against heap overflows.
>> +
>>  config SLUB_CPU_PARTIAL
>>       default y
>>       depends on SLUB && SMP
>> diff --git a/mm/slab.c b/mm/slab.c
>> index b82ee6b..89eb617 100644
>> --- a/mm/slab.c
>> +++ b/mm/slab.c
>> @@ -116,6 +116,7 @@
>>  #include     <linux/kmemcheck.h>
>>  #include     <linux/memory.h>
>>  #include     <linux/prefetch.h>
>> +#include     <linux/log2.h>
>>
>>  #include     <net/sock.h>
>>
>> @@ -1230,6 +1231,100 @@ static void __init set_up_node(struct kmem_cache *cachep, int index)
>>       }
>>  }
>>
>> +#ifdef CONFIG_FREELIST_RANDOM
>> +static void freelist_randomize(struct rnd_state *state, freelist_idx_t *list,
>> +                     size_t count)
>> +{
>> +     size_t i;
>> +     unsigned int rand;
>> +
>> +     for (i = 0; i < count; i++)
>> +             list[i] = i;
>> +
>> +     /* Fisher-Yates shuffle */
>> +     for (i = count - 1; i > 0; i--) {
>> +             rand = prandom_u32_state(state);
>> +             rand %= (i + 1);
>> +             swap(list[i], list[rand]);
>> +     }
>> +}
>> +
>> +/* Create a random sequence per cache */
>> +static void cache_random_seq_create(struct kmem_cache *cachep)
>> +{
>> +     unsigned int seed, count = cachep->num;
>> +     struct rnd_state state;
>> +
>> +     if (count < 2)
>> +             return;
>> +
>> +     cachep->random_seq = kcalloc(count, sizeof(freelist_idx_t), GFP_KERNEL);
>> +     BUG_ON(cachep->random_seq == NULL);

Regarding your previous email (trying to keep this in one thread): I
added a comment in this version explaining that we need the best
entropy available at this boot stage.

>
> Yikes, that's a bit rude.  Is there no way of recovering from this?  If
> the answer to that is really really "no" then I guess we should put a
> __GFP_NOFAIL in there.  Add a comment explaining why (apologetically -
> __GFP_NOFAIL is unpopular!) and remove the now-unneeded BUG_ON.
>
>

We can always fall back to the static lists. I will remove the BUG_ON
in the next iteration.
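Below is a minimal userspace sketch of the recovery path suggested here. It assumes "the static" refers to the global freelist_random_seq_* arrays added further down in the patch. If the per-cache allocation fails, random_seq stays NULL and the shared list is used instead of hitting BUG_ON. struct cache_sketch, cache_seq_create_sketch(), cache_seq_get() and FALLBACK_LEN are hypothetical names, and calloc() stands in for kcalloc().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef unsigned char freelist_idx_t;	/* assumption: byte-sized index */

#define FALLBACK_LEN 64		/* hypothetical size of the shared list */

/* Hypothetical shared list, standing in for the static freelist_random_seq_* arrays. */
static freelist_idx_t fallback_seq[FALLBACK_LEN];

/* Hypothetical, trimmed-down view of the kmem_cache fields involved. */
struct cache_sketch {
	unsigned int num;		/* objects per slab */
	freelist_idx_t *random_seq;	/* per-cache sequence, or NULL */
};

/*
 * Try to allocate the per-cache sequence. On failure, leave random_seq
 * NULL instead of crashing. simulate_oom forces that path for testing.
 */
static void cache_seq_create_sketch(struct cache_sketch *c, int simulate_oom)
{
	c->random_seq = simulate_oom ? NULL :
			calloc(c->num, sizeof(freelist_idx_t));
}

/* Use the per-cache list if present, else the shared static one. */
static const freelist_idx_t *cache_seq_get(const struct cache_sketch *c)
{
	if (c->random_seq)
		return c->random_seq;
	assert(c->num <= FALLBACK_LEN);	/* shared list must be large enough */
	return fallback_seq;
}
```

The trade-off compared with __GFP_NOFAIL is that a failed allocation loses only the per-cache uniqueness of the order. The cache still works, provided the shared list was itself randomized.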

>> +     /* Get best entropy at this stage */
>> +     get_random_bytes_arch(&seed, sizeof(seed));
>
> See concerns in other email - isn't this a no-op if CONFIG_ARCH_RANDOM=n?
>


>
>> +     prandom_seed_state(&state, seed);
>> +
>> +     freelist_randomize(&state, cachep->random_seq, count);
>> +}
>> +
>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2] mm: SLAB freelist randomization
  2016-04-25 20:39 ` Thomas Garnier
@ 2016-04-25 21:10   ` Andrew Morton
  -1 siblings, 0 replies; 35+ messages in thread
From: Andrew Morton @ 2016-04-25 21:10 UTC (permalink / raw)
  To: Thomas Garnier
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Kees Cook, gthelen, labbott, kernel-hardening, linux-kernel,
	linux-mm

On Mon, 25 Apr 2016 13:39:23 -0700 Thomas Garnier <thgarnie@google.com> wrote:

> Provides an optional config (CONFIG_FREELIST_RANDOM) to randomize the
> SLAB freelist. The list is randomized during initialization of a new set
> of pages. The order on different freelist sizes is pre-computed at boot
> for performance. Each kmem_cache has its own randomized freelist except
> early on boot where global lists are used. This security feature reduces
> the predictability of the kernel SLAB allocator against heap overflows
> rendering attacks much less stable.
> 
> For example this attack against SLUB (also applicable against SLAB)
> would be affected:
> https://jon.oberheide.org/blog/2010/09/10/linux-kernel-can-slub-overflow/
> 
> Also, since v4.6 the freelist was moved at the end of the SLAB. It means
> a controllable heap is opened to new attacks not yet publicly discussed.
> A kernel heap overflow can be transformed to multiple use-after-free.
> This feature makes this type of attack harder too.
> 
> To generate entropy, we use get_random_bytes_arch because 0 bits of
> entropy is available in the boot stage. In the worse case this function
> will fallback to the get_random_bytes sub API. We also generate a shift
> random number to shift pre-computed freelist for each new set of pages.
> 
> The config option name is not specific to the SLAB as this approach will
> be extended to other allocators like SLUB.
> 
> Performance results highlighted no major changes:
> 
> slab_test 1 run on boot. Difference only seen on the 2048 size test
> being the worse case scenario covered by freelist randomization. New
> slab pages are constantly being created on the 10000 allocations.
> Variance should be mainly due to getting new pages every few
> allocations.
>
> ...
>
> --- a/include/linux/slab_def.h
> +++ b/include/linux/slab_def.h
> @@ -80,6 +80,10 @@ struct kmem_cache {
>  	struct kasan_cache kasan_info;
>  #endif
>  
> +#ifdef CONFIG_FREELIST_RANDOM

CONFIG_FREELIST_RANDOM bugs me a bit - "freelist" is so vague. 
CONFIG_SLAB_FREELIST_RANDOM would be better.  I mean, what Kconfig
identifier could be used for implementing randomisation in
slub/slob/etc once CONFIG_FREELIST_RANDOM is used up?

> +	void *random_seq;
> +#endif
> +
>  	struct kmem_cache_node *node[MAX_NUMNODES];
>  };
>  
> diff --git a/init/Kconfig b/init/Kconfig
> index 0c66640..73453d0 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1742,6 +1742,15 @@ config SLOB
>  
>  endchoice
>  
> +config FREELIST_RANDOM
> +	default n
> +	depends on SLAB
> +	bool "SLAB freelist randomization"
> +	help
> +	  Randomizes the freelist order used on creating new SLABs. This
> +	  security feature reduces the predictability of the kernel slab
> +	  allocator against heap overflows.
> +
>  config SLUB_CPU_PARTIAL
>  	default y
>  	depends on SLUB && SMP
> diff --git a/mm/slab.c b/mm/slab.c
> index b82ee6b..89eb617 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -116,6 +116,7 @@
>  #include	<linux/kmemcheck.h>
>  #include	<linux/memory.h>
>  #include	<linux/prefetch.h>
> +#include	<linux/log2.h>
>  
>  #include	<net/sock.h>
>  
> @@ -1230,6 +1231,100 @@ static void __init set_up_node(struct kmem_cache *cachep, int index)
>  	}
>  }
>  
> +#ifdef CONFIG_FREELIST_RANDOM
> +static void freelist_randomize(struct rnd_state *state, freelist_idx_t *list,
> +			size_t count)
> +{
> +	size_t i;
> +	unsigned int rand;
> +
> +	for (i = 0; i < count; i++)
> +		list[i] = i;
> +
> +	/* Fisher-Yates shuffle */
> +	for (i = count - 1; i > 0; i--) {
> +		rand = prandom_u32_state(state);
> +		rand %= (i + 1);
> +		swap(list[i], list[rand]);
> +	}
> +}
> +
> +/* Create a random sequence per cache */
> +static void cache_random_seq_create(struct kmem_cache *cachep)
> +{
> +	unsigned int seed, count = cachep->num;
> +	struct rnd_state state;
> +
> +	if (count < 2)
> +		return;
> +
> +	cachep->random_seq = kcalloc(count, sizeof(freelist_idx_t), GFP_KERNEL);
> +	BUG_ON(cachep->random_seq == NULL);

Yikes, that's a bit rude.  Is there no way of recovering from this?  If
the answer to that is really really "no" then I guess we should put a
__GFP_NOFAIL in there.  Add a comment explaining why (apologetically -
__GFP_NOFAIL is unpopular!) and remove the now-unneeded BUG_ON.


> +	/* Get best entropy at this stage */
> +	get_random_bytes_arch(&seed, sizeof(seed));

See concerns in other email - isn't this a no-op if CONFIG_ARCH_RANDOM=n?


> +	prandom_seed_state(&state, seed);
> +
> +	freelist_randomize(&state, cachep->random_seq, count);
> +}
> +

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v2] mm: SLAB freelist randomization
@ 2016-04-25 20:39 ` Thomas Garnier
  0 siblings, 0 replies; 35+ messages in thread
From: Thomas Garnier @ 2016-04-25 20:39 UTC (permalink / raw)
  To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Kees Cook
  Cc: gthelen, labbott, kernel-hardening, linux-kernel, linux-mm,
	Thomas Garnier

Provides an optional config (CONFIG_FREELIST_RANDOM) to randomize the
SLAB freelist. The list is randomized when a new set of pages is
initialized. For performance, the random order is pre-computed at boot
for the different freelist sizes. Each kmem_cache has its own randomized
freelist, except early in boot, when global lists are used. This
security feature reduces the predictability of the kernel SLAB allocator
against heap overflows, making such attacks much less reliable.
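The excerpt ends before it shows how the global lists are consumed. The following is therefore only one plausible way to reuse a list pre-computed for a power-of-two size on a slab with fewer objects, and it is an assumption. The idea is to round the object count up to a power of two (the computation <linux/log2.h> provides as roundup_pow_of_two()) and skip indices that do not exist in this slab. round_up_pow2() and order_from_global() are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned char freelist_idx_t;	/* assumption: byte-sized index */

/* Smallest power of two >= n (n >= 1), like roundup_pow_of_two(). */
static size_t round_up_pow2(size_t n)
{
	size_t p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/*
 * Walk a pre-computed permutation of round_up_pow2(count) entries and
 * keep only indices that exist in this slab. Writes them to out and
 * returns how many were written (always count for a valid permutation).
 */
static size_t order_from_global(const freelist_idx_t *global,
				size_t global_len, size_t count,
				freelist_idx_t *out)
{
	size_t i, n = 0;

	assert(global_len == round_up_pow2(count));
	for (i = 0; i < global_len; i++)
		if (global[i] < count)	/* skip objects past the slab's end */
			out[n++] = global[i];
	return n;
}
```

For example, with the 8-entry permutation {5, 2, 7, 0, 3, 6, 1, 4} and a 5-object slab, the sketch yields the order {2, 0, 3, 1, 4}.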

For example, this attack against SLUB (also applicable to SLAB) would
be affected:
https://jon.oberheide.org/blog/2010/09/10/linux-kernel-can-slub-overflow/

Also, since v4.6 the freelist has been moved to the end of the SLAB.
This means a controllable heap is exposed to new attacks that have not
yet been publicly discussed: a kernel heap overflow can be transformed
into multiple use-after-free bugs. This feature makes this type of
attack harder too.

To generate entropy, we use get_random_bytes_arch because no entropy
(0 bits) is available at this boot stage. In the worst case, this
function falls back to the get_random_bytes sub-API. We also generate a
random shift value to offset the pre-computed freelist for each new set
of pages.
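The per-cache shuffle in freelist_randomize() is a Fisher-Yates shuffle. This userspace sketch keeps its structure but substitutes xorshift32 for prandom_u32_state(). That is a different generator, chosen only to keep the example self-contained. The sketch also models the per-page "shift" as a rotation of the read position. The rotation is an assumption, because the code that applies the shift is not shown in this excerpt. shuffle_sketch() and shifted_index() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef unsigned char freelist_idx_t;	/* assumption: byte-sized index */

/* xorshift32: a substitute for prandom_u32_state(); seed must be non-zero. */
static uint32_t xorshift32(uint32_t *state)
{
	uint32_t x = *state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	*state = x;
	return x;
}

/* Same structure as freelist_randomize(): identity fill, then Fisher-Yates. */
static void shuffle_sketch(uint32_t *state, freelist_idx_t *list, size_t count)
{
	size_t i;

	for (i = 0; i < count; i++)
		list[i] = (freelist_idx_t)i;
	if (count < 2)
		return;		/* also keeps count - 1 from underflowing */
	for (i = count - 1; i > 0; i--) {
		size_t j = xorshift32(state) % (i + 1);	/* 0 <= j <= i */
		freelist_idx_t tmp = list[i];

		list[i] = list[j];
		list[j] = tmp;
	}
}

/* Hypothetical per-slab shift: start reading at `shift` and wrap around. */
static freelist_idx_t shifted_index(const freelist_idx_t *seq, size_t count,
				    size_t shift, size_t pos)
{
	assert(pos < count);
	return seq[(shift + pos) % count];
}
```

Like the patch, the sketch uses rand % (i + 1). This has a slight modulo bias when i + 1 does not divide 2^32, but for slab-sized object counts the bias is tiny.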

The config option name is not specific to SLAB because this approach
will be extended to other allocators such as SLUB.

Performance results show no major changes:

slab_test, 1 run at boot. A difference is seen only on the 2048 size
test, which is the worst case covered by freelist randomization: new
slab pages are constantly being created during the 10000 allocations.
Variance should be mainly due to getting new pages every few
allocations.

Before:

Single thread testing
=====================
1. Kmalloc: Repeatedly allocate then free test
10000 times kmalloc(8) -> 99 cycles kfree -> 112 cycles
10000 times kmalloc(16) -> 109 cycles kfree -> 140 cycles
10000 times kmalloc(32) -> 129 cycles kfree -> 137 cycles
10000 times kmalloc(64) -> 141 cycles kfree -> 141 cycles
10000 times kmalloc(128) -> 152 cycles kfree -> 148 cycles
10000 times kmalloc(256) -> 195 cycles kfree -> 167 cycles
10000 times kmalloc(512) -> 257 cycles kfree -> 199 cycles
10000 times kmalloc(1024) -> 393 cycles kfree -> 251 cycles
10000 times kmalloc(2048) -> 649 cycles kfree -> 228 cycles
10000 times kmalloc(4096) -> 806 cycles kfree -> 370 cycles
10000 times kmalloc(8192) -> 814 cycles kfree -> 411 cycles
10000 times kmalloc(16384) -> 892 cycles kfree -> 455 cycles
2. Kmalloc: alloc/free test
10000 times kmalloc(8)/kfree -> 121 cycles
10000 times kmalloc(16)/kfree -> 121 cycles
10000 times kmalloc(32)/kfree -> 121 cycles
10000 times kmalloc(64)/kfree -> 121 cycles
10000 times kmalloc(128)/kfree -> 121 cycles
10000 times kmalloc(256)/kfree -> 119 cycles
10000 times kmalloc(512)/kfree -> 119 cycles
10000 times kmalloc(1024)/kfree -> 119 cycles
10000 times kmalloc(2048)/kfree -> 119 cycles
10000 times kmalloc(4096)/kfree -> 121 cycles
10000 times kmalloc(8192)/kfree -> 119 cycles
10000 times kmalloc(16384)/kfree -> 119 cycles

After:

Single thread testing
=====================
1. Kmalloc: Repeatedly allocate then free test
10000 times kmalloc(8) -> 130 cycles kfree -> 86 cycles
10000 times kmalloc(16) -> 118 cycles kfree -> 86 cycles
10000 times kmalloc(32) -> 121 cycles kfree -> 85 cycles
10000 times kmalloc(64) -> 176 cycles kfree -> 102 cycles
10000 times kmalloc(128) -> 178 cycles kfree -> 100 cycles
10000 times kmalloc(256) -> 205 cycles kfree -> 109 cycles
10000 times kmalloc(512) -> 262 cycles kfree -> 136 cycles
10000 times kmalloc(1024) -> 342 cycles kfree -> 157 cycles
10000 times kmalloc(2048) -> 701 cycles kfree -> 238 cycles
10000 times kmalloc(4096) -> 803 cycles kfree -> 364 cycles
10000 times kmalloc(8192) -> 835 cycles kfree -> 404 cycles
10000 times kmalloc(16384) -> 896 cycles kfree -> 441 cycles
2. Kmalloc: alloc/free test
10000 times kmalloc(8)/kfree -> 121 cycles
10000 times kmalloc(16)/kfree -> 121 cycles
10000 times kmalloc(32)/kfree -> 123 cycles
10000 times kmalloc(64)/kfree -> 142 cycles
10000 times kmalloc(128)/kfree -> 121 cycles
10000 times kmalloc(256)/kfree -> 119 cycles
10000 times kmalloc(512)/kfree -> 119 cycles
10000 times kmalloc(1024)/kfree -> 119 cycles
10000 times kmalloc(2048)/kfree -> 119 cycles
10000 times kmalloc(4096)/kfree -> 119 cycles
10000 times kmalloc(8192)/kfree -> 119 cycles
10000 times kmalloc(16384)/kfree -> 119 cycles
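This sketch expresses the single-run numbers in relative terms by converting the test-1 kmalloc() cycle counts from the Before and After tables into percentage changes. The arrays are copied from the tables, and pct_change() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Test 1 ("Repeatedly allocate then free") kmalloc() cycles from the tables. */
static const unsigned int sizes[]  = {8, 16, 32, 64, 128, 256, 512,
				      1024, 2048, 4096, 8192, 16384};
static const unsigned int before[] = {99, 109, 129, 141, 152, 195, 257,
				      393, 649, 806, 814, 892};
static const unsigned int after[]  = {130, 118, 121, 176, 178, 205, 262,
				      342, 701, 803, 835, 896};

/* Relative change in percent; positive means "after" took more cycles. */
static double pct_change(unsigned int b, unsigned int a)
{
	return 100.0 * ((double)a - (double)b) / (double)b;
}
```

Some examples: pct_change(649, 701) (the 2048 row) is about +8.0%, pct_change(99, 130) (the 8 row) is about +31.3%, and pct_change(393, 342) (the 1024 row) is about -13.0%.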

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
Based on next-20160422
---
 include/linux/slab_def.h |   4 +
 init/Kconfig             |   9 ++
 mm/slab.c                | 213 ++++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 224 insertions(+), 2 deletions(-)

diff --git a/include/linux/slab_def.h b/include/linux/slab_def.h
index 9edbbf3..182ec26 100644
--- a/include/linux/slab_def.h
+++ b/include/linux/slab_def.h
@@ -80,6 +80,10 @@ struct kmem_cache {
 	struct kasan_cache kasan_info;
 #endif
 
+#ifdef CONFIG_FREELIST_RANDOM
+	void *random_seq;
+#endif
+
 	struct kmem_cache_node *node[MAX_NUMNODES];
 };
 
diff --git a/init/Kconfig b/init/Kconfig
index 0c66640..73453d0 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1742,6 +1742,15 @@ config SLOB
 
 endchoice
 
+config FREELIST_RANDOM
+	default n
+	depends on SLAB
+	bool "SLAB freelist randomization"
+	help
+	  Randomizes the freelist order used on creating new SLABs. This
+	  security feature reduces the predictability of the kernel slab
+	  allocator against heap overflows.
+
 config SLUB_CPU_PARTIAL
 	default y
 	depends on SLUB && SMP
diff --git a/mm/slab.c b/mm/slab.c
index b82ee6b..89eb617 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -116,6 +116,7 @@
 #include	<linux/kmemcheck.h>
 #include	<linux/memory.h>
 #include	<linux/prefetch.h>
+#include	<linux/log2.h>
 
 #include	<net/sock.h>
 
@@ -1230,6 +1231,100 @@ static void __init set_up_node(struct kmem_cache *cachep, int index)
 	}
 }
 
+#ifdef CONFIG_FREELIST_RANDOM
+static void freelist_randomize(struct rnd_state *state, freelist_idx_t *list,
+			size_t count)
+{
+	size_t i;
+	unsigned int rand;
+
+	for (i = 0; i < count; i++)
+		list[i] = i;
+
+	/* Fisher-Yates shuffle */
+	for (i = count - 1; i > 0; i--) {
+		rand = prandom_u32_state(state);
+		rand %= (i + 1);
+		swap(list[i], list[rand]);
+	}
+}
+
+/* Create a random sequence per cache */
+static void cache_random_seq_create(struct kmem_cache *cachep)
+{
+	unsigned int seed, count = cachep->num;
+	struct rnd_state state;
+
+	if (count < 2)
+		return;
+
+	cachep->random_seq = kcalloc(count, sizeof(freelist_idx_t), GFP_KERNEL);
+	BUG_ON(cachep->random_seq == NULL);
+
+	/* Get best entropy at this stage */
+	get_random_bytes_arch(&seed, sizeof(seed));
+	prandom_seed_state(&state, seed);
+
+	freelist_randomize(&state, cachep->random_seq, count);
+}
+
+/* Destroy the per-cache random freelist sequence */
+static void cache_random_seq_destroy(struct kmem_cache *cachep)
+{
+	kfree(cachep->random_seq);
+	cachep->random_seq = NULL;
+}
+
+/*
+ * Global static lists are used when the per-cache pre-computed list is not
+ * yet available. Lists of different sizes are created to optimize
+ * performance on SLABs with different object counts.
+ */
+static freelist_idx_t freelist_random_seq_2[2];
+static freelist_idx_t freelist_random_seq_4[4];
+static freelist_idx_t freelist_random_seq_8[8];
+static freelist_idx_t freelist_random_seq_16[16];
+static freelist_idx_t freelist_random_seq_32[32];
+static freelist_idx_t freelist_random_seq_64[64];
+static freelist_idx_t freelist_random_seq_128[128];
+static freelist_idx_t freelist_random_seq_256[256];
+static const struct m_list {
+	size_t count;
+	freelist_idx_t *list;
+} freelist_random_seqs[] = {
+	{ ARRAY_SIZE(freelist_random_seq_2), freelist_random_seq_2 },
+	{ ARRAY_SIZE(freelist_random_seq_4), freelist_random_seq_4 },
+	{ ARRAY_SIZE(freelist_random_seq_8), freelist_random_seq_8 },
+	{ ARRAY_SIZE(freelist_random_seq_16), freelist_random_seq_16 },
+	{ ARRAY_SIZE(freelist_random_seq_32), freelist_random_seq_32 },
+	{ ARRAY_SIZE(freelist_random_seq_64), freelist_random_seq_64 },
+	{ ARRAY_SIZE(freelist_random_seq_128), freelist_random_seq_128 },
+	{ ARRAY_SIZE(freelist_random_seq_256), freelist_random_seq_256 },
+};
+
+/* Pre-compute the global lists early at boot */
+static void __init freelist_random_init(void)
+{
+	unsigned int seed;
+	size_t i;
+	struct rnd_state state;
+
+	/* Get best entropy available at this stage */
+	get_random_bytes_arch(&seed, sizeof(seed));
+	prandom_seed_state(&state, seed);
+
+	for (i = 0; i < ARRAY_SIZE(freelist_random_seqs); i++) {
+		freelist_randomize(&state, freelist_random_seqs[i].list,
+				freelist_random_seqs[i].count);
+	}
+}
+#else
+static inline void __init freelist_random_init(void) { }
+static inline void cache_random_seq_create(struct kmem_cache *cachep) { }
+static inline void cache_random_seq_destroy(struct kmem_cache *cachep) { }
+#endif /* CONFIG_FREELIST_RANDOM */
+
+
 /*
  * Initialisation.  Called after the page allocator have been initialised and
  * before smp_init().
@@ -1256,6 +1351,8 @@ void __init kmem_cache_init(void)
 	if (!slab_max_order_set && totalram_pages > (32 << 20) >> PAGE_SHIFT)
 		slab_max_order = SLAB_MAX_ORDER_HI;
 
+	freelist_random_init();
+
 	/* Bootstrap is tricky, because several objects are allocated
 	 * from caches that do not exist yet:
 	 * 1) initialize the kmem_cache cache: it contains the struct
@@ -2337,6 +2434,8 @@ void __kmem_cache_release(struct kmem_cache *cachep)
 	int i;
 	struct kmem_cache_node *n;
 
+	cache_random_seq_destroy(cachep);
+
 	free_percpu(cachep->cpu_cache);
 
 	/* NUMA: free the node structures */
@@ -2443,15 +2542,122 @@ static void cache_init_objs_debug(struct kmem_cache *cachep, struct page *page)
 #endif
 }
 
+#ifdef CONFIG_FREELIST_RANDOM
+/* Hold information during a freelist initialization */
+struct freelist_init_state {
+	unsigned int padding;
+	unsigned int pos;
+	unsigned int count;
+	unsigned int rand;
+	struct m_list freelist_random_seq;
+};
+
+/* Select the right pre-computed list and initialize state */
+static void freelist_state_initialize(struct freelist_init_state *state,
+				struct kmem_cache *cachep,
+				unsigned int count)
+{
+	unsigned int idx;
+	const unsigned int last_idx = ARRAY_SIZE(freelist_random_seqs) - 1;
+
+	memset(state, 0, sizeof(*state));
+	state->count = count;
+	state->pos = 0;
+
+	/* Use best entropy available to define a random shift */
+	get_random_bytes_arch(&state->rand, sizeof(state->rand));
+
+	if (cachep->random_seq) {
+		state->freelist_random_seq.list = cachep->random_seq;
+		state->freelist_random_seq.count = count;
+	} else {
+		/* count is always >= 2 */
+		idx = ilog2(count) - 1;
+		if (idx >= last_idx)
+			idx = last_idx;
+		else if (roundup_pow_of_two(idx + 1) != count)
+			idx++;
+		state->freelist_random_seq = freelist_random_seqs[idx];
+	}
+}
+
+/* Get the next entry on the list depending on the target list size */
+static freelist_idx_t get_next_entry(struct freelist_init_state *state)
+{
+	unsigned int ret, count = state->freelist_random_seq.count;
+
+	if (state->pos == count) {
+		state->padding += state->pos;
+		state->pos = 0;
+	}
+
+	/* Reduce the shift first so ret + shift cannot wrap around */
+	ret = state->freelist_random_seq.list[state->pos++];
+	ret = (ret + state->rand % count) % count;
+	return ret;
+}
+
+static freelist_idx_t next_random_slot(struct freelist_init_state *state)
+{
+	freelist_idx_t entry;
+
+	do {
+		entry = get_next_entry(state);
+	} while ((entry + state->padding) >= state->count);
+
+	return entry + state->padding;
+}
+
+/*
+ * Shuffle the freelist initialization state based on pre-computed lists.
+ * Return true if the list was successfully shuffled, false otherwise.
+ */
+static bool shuffle_freelist(struct kmem_cache *cachep, struct page *page)
+{
+	unsigned int objfreelist, i, count = cachep->num;
+	struct freelist_init_state state;
+
+	if (count < 2)
+		return false;
+
+	objfreelist = 0;
+	freelist_state_initialize(&state, cachep, count);
+
+	/* Take the first random entry as the objfreelist */
+	if (OBJFREELIST_SLAB(cachep)) {
+		objfreelist = next_random_slot(&state);
+		page->freelist = index_to_obj(cachep, page, objfreelist) +
+						obj_offset(cachep);
+		count--;
+	}
+	for (i = 0; i < count; i++)
+		set_free_obj(page, i, next_random_slot(&state));
+
+	if (OBJFREELIST_SLAB(cachep))
+		set_free_obj(page, i, objfreelist);
+	return true;
+}
+#else
+static inline bool shuffle_freelist(struct kmem_cache *cachep,
+				struct page *page)
+{
+	return false;
+}
+#endif /* CONFIG_FREELIST_RANDOM */
+
 static void cache_init_objs(struct kmem_cache *cachep,
 			    struct page *page)
 {
 	int i;
 	void *objp;
+	bool shuffled;
 
 	cache_init_objs_debug(cachep, page);
 
-	if (OBJFREELIST_SLAB(cachep)) {
+	/* Try to randomize the freelist if enabled */
+	shuffled = shuffle_freelist(cachep, page);
+
+	if (!shuffled && OBJFREELIST_SLAB(cachep)) {
 		page->freelist = index_to_obj(cachep, page, cachep->num - 1) +
 						obj_offset(cachep);
 	}
@@ -2465,7 +2671,8 @@ static void cache_init_objs(struct kmem_cache *cachep,
 			kasan_poison_object_data(cachep, objp);
 		}
 
-		set_free_obj(page, i, i);
+		if (!shuffled)
+			set_free_obj(page, i, i);
 	}
 }
 
@@ -3815,6 +4022,8 @@ static int enable_cpucache(struct kmem_cache *cachep, gfp_t gfp)
 	int shared = 0;
 	int batchcount = 0;
 
+	cache_random_seq_create(cachep);
+
 	if (!is_root_cache(cachep)) {
 		struct kmem_cache *root = memcg_root_cache(cachep);
 		limit = root->limit;
-- 
2.8.0.rc3.226.g39d4020


* [PATCH v2] mm: SLAB freelist randomization
@ 2016-04-25 20:39 ` Thomas Garnier
From: Thomas Garnier @ 2016-04-25 20:39 UTC
  To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Kees Cook
  Cc: gthelen, labbott, kernel-hardening, linux-kernel, linux-mm,
	Thomas Garnier

Provides an optional config (CONFIG_FREELIST_RANDOM) to randomize the
SLAB freelist. The list is randomized during initialization of a new set
of pages. The order for different freelist sizes is pre-computed at boot
for performance. Each kmem_cache has its own randomized freelist, except
early in boot, when global lists are used. This security feature reduces
the predictability of the kernel SLAB allocator against heap overflows,
making attacks much less reliable.
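The per-cache order described above comes from a Fisher-Yates shuffle of
the object indices, seeded once per cache. A minimal userspace sketch
follows, assuming a simple xorshift generator as a stand-in for the
kernel's prandom_u32_state(); xorshift32() and shuffle_indices() are
hypothetical names, not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the kernel's prandom_u32_state(): a 32-bit
 * xorshift PRNG. It is not cryptographically strong and is used here only
 * to drive the shuffle.
 */
static uint32_t xorshift32(uint32_t *state)
{
	uint32_t x = *state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	*state = x;
	return x;
}

/*
 * Fill list with 0..count-1, then apply a Fisher-Yates shuffle, following
 * the structure of freelist_randomize() in the patch.
 */
static void shuffle_indices(uint32_t *state, uint16_t *list, size_t count)
{
	size_t i;

	assert(*state != 0);	/* an all-zero xorshift state never changes */
	for (i = 0; i < count; i++)
		list[i] = (uint16_t)i;

	/* Walk down from the last slot, swapping it with a random slot <= it. */
	for (i = count; i > 1; i--) {
		size_t j = xorshift32(state) % i;
		uint16_t tmp = list[i - 1];

		list[i - 1] = list[j];
		list[j] = tmp;
	}
}
```

As in the patch, the modulo reduction introduces a slight bias, which is
negligible for lists of at most a few hundred entries. Counting down to
i > 1 also avoids the size_t underflow that "count - 1" would cause for an
empty list.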

For example, this attack against SLUB (also applicable to SLAB)
would be affected:
https://jon.oberheide.org/blog/2010/09/10/linux-kernel-can-slub-overflow/

Also, since v4.6 the freelist has been moved to the end of the SLAB. This
means a controllable heap is open to new attacks not yet publicly
discussed: a kernel heap overflow can be transformed into multiple
use-after-frees. This feature makes this type of attack harder too.

To generate entropy, we use get_random_bytes_arch because no entropy is
available at this boot stage. In the worst case, this function falls
back to get_random_bytes. We also generate a random number used to shift
the pre-computed freelist for each new set of pages.
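Rotating a permutation of 0..count-1 by a fixed shift yields another
permutation, provided the addition cannot wrap. If a full 32-bit random
value were added to an index directly, "index + shift" could wrap for some
indices and not others, and two slots could then map to the same object.
The sketch below reduces the shift modulo the list size first;
apply_shift() is a hypothetical helper, not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: return entry `pos` of a pre-computed permutation of
 * 0..count-1, rotated by `shift`. Reducing the shift modulo count first
 * keeps entry + shift below 2 * count, so the sum cannot wrap a uint32_t
 * and the result is still a permutation for a fixed shift.
 */
static uint32_t apply_shift(const uint16_t *list, size_t count, size_t pos,
			    uint32_t shift)
{
	uint32_t n = (uint32_t)count;

	assert(count > 0 && pos < count);
	return ((uint32_t)list[pos] + shift % n) % n;
}
```

Drawing one shift per new set of pages, as described above, lets a single
pre-computed list produce count different orders at the cost of one
random number per page set.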

The config option name is not specific to SLAB, as this approach will
be extended to other allocators like SLUB.

Performance results highlighted no major changes:

slab_test, 1 run at boot. A difference is only seen on the 2048 size
test, which is the worst-case scenario for freelist randomization: new
slab pages are constantly being created during the 10000 allocations.
The variance is likely due mainly to getting new pages every few
allocations.

Before:

Single thread testing
=====================
1. Kmalloc: Repeatedly allocate then free test
10000 times kmalloc(8) -> 99 cycles kfree -> 112 cycles
10000 times kmalloc(16) -> 109 cycles kfree -> 140 cycles
10000 times kmalloc(32) -> 129 cycles kfree -> 137 cycles
10000 times kmalloc(64) -> 141 cycles kfree -> 141 cycles
10000 times kmalloc(128) -> 152 cycles kfree -> 148 cycles
10000 times kmalloc(256) -> 195 cycles kfree -> 167 cycles
10000 times kmalloc(512) -> 257 cycles kfree -> 199 cycles
10000 times kmalloc(1024) -> 393 cycles kfree -> 251 cycles
10000 times kmalloc(2048) -> 649 cycles kfree -> 228 cycles
10000 times kmalloc(4096) -> 806 cycles kfree -> 370 cycles
10000 times kmalloc(8192) -> 814 cycles kfree -> 411 cycles
10000 times kmalloc(16384) -> 892 cycles kfree -> 455 cycles
2. Kmalloc: alloc/free test
10000 times kmalloc(8)/kfree -> 121 cycles
10000 times kmalloc(16)/kfree -> 121 cycles
10000 times kmalloc(32)/kfree -> 121 cycles
10000 times kmalloc(64)/kfree -> 121 cycles
10000 times kmalloc(128)/kfree -> 121 cycles
10000 times kmalloc(256)/kfree -> 119 cycles
10000 times kmalloc(512)/kfree -> 119 cycles
10000 times kmalloc(1024)/kfree -> 119 cycles
10000 times kmalloc(2048)/kfree -> 119 cycles
10000 times kmalloc(4096)/kfree -> 121 cycles
10000 times kmalloc(8192)/kfree -> 119 cycles
10000 times kmalloc(16384)/kfree -> 119 cycles

After:

Single thread testing
=====================
1. Kmalloc: Repeatedly allocate then free test
10000 times kmalloc(8) -> 130 cycles kfree -> 86 cycles
10000 times kmalloc(16) -> 118 cycles kfree -> 86 cycles
10000 times kmalloc(32) -> 121 cycles kfree -> 85 cycles
10000 times kmalloc(64) -> 176 cycles kfree -> 102 cycles
10000 times kmalloc(128) -> 178 cycles kfree -> 100 cycles
10000 times kmalloc(256) -> 205 cycles kfree -> 109 cycles
10000 times kmalloc(512) -> 262 cycles kfree -> 136 cycles
10000 times kmalloc(1024) -> 342 cycles kfree -> 157 cycles
10000 times kmalloc(2048) -> 701 cycles kfree -> 238 cycles
10000 times kmalloc(4096) -> 803 cycles kfree -> 364 cycles
10000 times kmalloc(8192) -> 835 cycles kfree -> 404 cycles
10000 times kmalloc(16384) -> 896 cycles kfree -> 441 cycles
2. Kmalloc: alloc/free test
10000 times kmalloc(8)/kfree -> 121 cycles
10000 times kmalloc(16)/kfree -> 121 cycles
10000 times kmalloc(32)/kfree -> 123 cycles
10000 times kmalloc(64)/kfree -> 142 cycles
10000 times kmalloc(128)/kfree -> 121 cycles
10000 times kmalloc(256)/kfree -> 119 cycles
10000 times kmalloc(512)/kfree -> 119 cycles
10000 times kmalloc(1024)/kfree -> 119 cycles
10000 times kmalloc(2048)/kfree -> 119 cycles
10000 times kmalloc(4096)/kfree -> 119 cycles
10000 times kmalloc(8192)/kfree -> 119 cycles
10000 times kmalloc(16384)/kfree -> 119 cycles

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
Based on next-20160422
---
 include/linux/slab_def.h |   4 +
 init/Kconfig             |   9 ++
 mm/slab.c                | 213 ++++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 224 insertions(+), 2 deletions(-)

diff --git a/include/linux/slab_def.h b/include/linux/slab_def.h
index 9edbbf3..182ec26 100644
--- a/include/linux/slab_def.h
+++ b/include/linux/slab_def.h
@@ -80,6 +80,10 @@ struct kmem_cache {
 	struct kasan_cache kasan_info;
 #endif
 
+#ifdef CONFIG_FREELIST_RANDOM
+	void *random_seq;
+#endif
+
 	struct kmem_cache_node *node[MAX_NUMNODES];
 };
 
diff --git a/init/Kconfig b/init/Kconfig
index 0c66640..73453d0 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1742,6 +1742,15 @@ config SLOB
 
 endchoice
 
+config FREELIST_RANDOM
+	default n
+	depends on SLAB
+	bool "SLAB freelist randomization"
+	help
+	  Randomizes the freelist order used on creating new SLABs. This
+	  security feature reduces the predictability of the kernel slab
+	  allocator against heap overflows.
+
 config SLUB_CPU_PARTIAL
 	default y
 	depends on SLUB && SMP
diff --git a/mm/slab.c b/mm/slab.c
index b82ee6b..89eb617 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -116,6 +116,7 @@
 #include	<linux/kmemcheck.h>
 #include	<linux/memory.h>
 #include	<linux/prefetch.h>
+#include	<linux/log2.h>
 
 #include	<net/sock.h>
 
@@ -1230,6 +1231,100 @@ static void __init set_up_node(struct kmem_cache *cachep, int index)
 	}
 }
 
+#ifdef CONFIG_FREELIST_RANDOM
+static void freelist_randomize(struct rnd_state *state, freelist_idx_t *list,
+			size_t count)
+{
+	size_t i;
+	unsigned int rand;
+
+	for (i = 0; i < count; i++)
+		list[i] = i;
+
+	/* Fisher-Yates shuffle */
+	for (i = count - 1; i > 0; i--) {
+		rand = prandom_u32_state(state);
+		rand %= (i + 1);
+		swap(list[i], list[rand]);
+	}
+}
+
+/* Create a random sequence per cache */
+static void cache_random_seq_create(struct kmem_cache *cachep)
+{
+	unsigned int seed, count = cachep->num;
+	struct rnd_state state;
+
+	if (count < 2)
+		return;
+
+	cachep->random_seq = kcalloc(count, sizeof(freelist_idx_t), GFP_KERNEL);
+	BUG_ON(cachep->random_seq == NULL);
+
+	/* Get best entropy at this stage */
+	get_random_bytes_arch(&seed, sizeof(seed));
+	prandom_seed_state(&state, seed);
+
+	freelist_randomize(&state, cachep->random_seq, count);
+}
+
+/* Destroy the per-cache random freelist sequence */
+static void cache_random_seq_destroy(struct kmem_cache *cachep)
+{
+	kfree(cachep->random_seq);
+	cachep->random_seq = NULL;
+}
+
+/*
+ * Global static lists are used when the per-cache pre-computed list is not
+ * yet available. Lists of different sizes are created to optimize
+ * performance on SLABs with different object counts.
+ */
+static freelist_idx_t freelist_random_seq_2[2];
+static freelist_idx_t freelist_random_seq_4[4];
+static freelist_idx_t freelist_random_seq_8[8];
+static freelist_idx_t freelist_random_seq_16[16];
+static freelist_idx_t freelist_random_seq_32[32];
+static freelist_idx_t freelist_random_seq_64[64];
+static freelist_idx_t freelist_random_seq_128[128];
+static freelist_idx_t freelist_random_seq_256[256];
+static const struct m_list {
+	size_t count;
+	freelist_idx_t *list;
+} freelist_random_seqs[] = {
+	{ ARRAY_SIZE(freelist_random_seq_2), freelist_random_seq_2 },
+	{ ARRAY_SIZE(freelist_random_seq_4), freelist_random_seq_4 },
+	{ ARRAY_SIZE(freelist_random_seq_8), freelist_random_seq_8 },
+	{ ARRAY_SIZE(freelist_random_seq_16), freelist_random_seq_16 },
+	{ ARRAY_SIZE(freelist_random_seq_32), freelist_random_seq_32 },
+	{ ARRAY_SIZE(freelist_random_seq_64), freelist_random_seq_64 },
+	{ ARRAY_SIZE(freelist_random_seq_128), freelist_random_seq_128 },
+	{ ARRAY_SIZE(freelist_random_seq_256), freelist_random_seq_256 },
+};
+
+/* Pre-compute the global lists early at boot */
+static void __init freelist_random_init(void)
+{
+	unsigned int seed;
+	size_t i;
+	struct rnd_state state;
+
+	/* Get best entropy available at this stage */
+	get_random_bytes_arch(&seed, sizeof(seed));
+	prandom_seed_state(&state, seed);
+
+	for (i = 0; i < ARRAY_SIZE(freelist_random_seqs); i++) {
+		freelist_randomize(&state, freelist_random_seqs[i].list,
+				freelist_random_seqs[i].count);
+	}
+}
+#else
+static inline void __init freelist_random_init(void) { }
+static inline void cache_random_seq_create(struct kmem_cache *cachep) { }
+static inline void cache_random_seq_destroy(struct kmem_cache *cachep) { }
+#endif /* CONFIG_FREELIST_RANDOM */
+
+
 /*
  * Initialisation.  Called after the page allocator have been initialised and
  * before smp_init().
@@ -1256,6 +1351,8 @@ void __init kmem_cache_init(void)
 	if (!slab_max_order_set && totalram_pages > (32 << 20) >> PAGE_SHIFT)
 		slab_max_order = SLAB_MAX_ORDER_HI;
 
+	freelist_random_init();
+
 	/* Bootstrap is tricky, because several objects are allocated
 	 * from caches that do not exist yet:
 	 * 1) initialize the kmem_cache cache: it contains the struct
@@ -2337,6 +2434,8 @@ void __kmem_cache_release(struct kmem_cache *cachep)
 	int i;
 	struct kmem_cache_node *n;
 
+	cache_random_seq_destroy(cachep);
+
 	free_percpu(cachep->cpu_cache);
 
 	/* NUMA: free the node structures */
@@ -2443,15 +2542,122 @@ static void cache_init_objs_debug(struct kmem_cache *cachep, struct page *page)
 #endif
 }
 
+#ifdef CONFIG_FREELIST_RANDOM
+/* Hold information during a freelist initialization */
+struct freelist_init_state {
+	unsigned int padding;
+	unsigned int pos;
+	unsigned int count;
+	unsigned int rand;
+	struct m_list freelist_random_seq;
+};
+
+/* Select the right pre-computed list and initialize state */
+static void freelist_state_initialize(struct freelist_init_state *state,
+				struct kmem_cache *cachep,
+				unsigned int count)
+{
+	unsigned int idx;
+	const unsigned int last_idx = ARRAY_SIZE(freelist_random_seqs) - 1;
+
+	memset(state, 0, sizeof(*state));
+	state->count = count;
+	state->pos = 0;
+
+	/* Use best entropy available to define a random shift */
+	get_random_bytes_arch(&state->rand, sizeof(state->rand));
+
+	if (cachep->random_seq) {
+		state->freelist_random_seq.list = cachep->random_seq;
+		state->freelist_random_seq.count = count;
+	} else {
+		/* count is always >= 2 */
+		idx = ilog2(count) - 1;
+		if (idx >= last_idx)
+			idx = last_idx;
+		else if (roundup_pow_of_two(idx + 1) != count)
+			idx++;
+		state->freelist_random_seq = freelist_random_seqs[idx];
+	}
+}
+
+/* Get the next entry on the list depending on the target list size */
+static freelist_idx_t get_next_entry(struct freelist_init_state *state)
+{
+	unsigned int ret, count = state->freelist_random_seq.count;
+
+	if (state->pos == count) {
+		state->padding += state->pos;
+		state->pos = 0;
+	}
+
+	/* Reduce the shift first so ret + shift cannot wrap around */
+	ret = state->freelist_random_seq.list[state->pos++];
+	ret = (ret + state->rand % count) % count;
+	return ret;
+}
+
+static freelist_idx_t next_random_slot(struct freelist_init_state *state)
+{
+	freelist_idx_t entry;
+
+	do {
+		entry = get_next_entry(state);
+	} while ((entry + state->padding) >= state->count);
+
+	return entry + state->padding;
+}
+
+/*
+ * Shuffle the freelist initialization state based on pre-computed lists.
+ * Return true if the list was successfully shuffled, false otherwise.
+ */
+static bool shuffle_freelist(struct kmem_cache *cachep, struct page *page)
+{
+	unsigned int objfreelist, i, count = cachep->num;
+	struct freelist_init_state state;
+
+	if (count < 2)
+		return false;
+
+	objfreelist = 0;
+	freelist_state_initialize(&state, cachep, count);
+
+	/* Take the first random entry as the objfreelist */
+	if (OBJFREELIST_SLAB(cachep)) {
+		objfreelist = next_random_slot(&state);
+		page->freelist = index_to_obj(cachep, page, objfreelist) +
+						obj_offset(cachep);
+		count--;
+	}
+	for (i = 0; i < count; i++)
+		set_free_obj(page, i, next_random_slot(&state));
+
+	if (OBJFREELIST_SLAB(cachep))
+		set_free_obj(page, i, objfreelist);
+	return true;
+}
+#else
+static inline bool shuffle_freelist(struct kmem_cache *cachep,
+				struct page *page)
+{
+	return false;
+}
+#endif /* CONFIG_FREELIST_RANDOM */
+
 static void cache_init_objs(struct kmem_cache *cachep,
 			    struct page *page)
 {
 	int i;
 	void *objp;
+	bool shuffled;
 
 	cache_init_objs_debug(cachep, page);
 
-	if (OBJFREELIST_SLAB(cachep)) {
+	/* Try to randomize the freelist if enabled */
+	shuffled = shuffle_freelist(cachep, page);
+
+	if (!shuffled && OBJFREELIST_SLAB(cachep)) {
 		page->freelist = index_to_obj(cachep, page, cachep->num - 1) +
 						obj_offset(cachep);
 	}
@@ -2465,7 +2671,8 @@ static void cache_init_objs(struct kmem_cache *cachep,
 			kasan_poison_object_data(cachep, objp);
 		}
 
-		set_free_obj(page, i, i);
+		if (!shuffled)
+			set_free_obj(page, i, i);
 	}
 }
 
@@ -3815,6 +4022,8 @@ static int enable_cpucache(struct kmem_cache *cachep, gfp_t gfp)
 	int shared = 0;
 	int batchcount = 0;
 
+	cache_random_seq_create(cachep);
+
 	if (!is_root_cache(cachep)) {
 		struct kmem_cache *root = memcg_root_cache(cachep);
 		limit = root->limit;
-- 
2.8.0.rc3.226.g39d4020



* [PATCH v2] mm: SLAB freelist randomization
@ 2016-04-18 17:00 ` Thomas Garnier
From: Thomas Garnier @ 2016-04-18 17:00 UTC
  To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Kees Cook
  Cc: gthelen, labbott, kernel-hardening, linux-kernel, linux-mm,
	Thomas Garnier

Provides an optional config (CONFIG_FREELIST_RANDOM) to randomize the
SLAB freelist. The list is randomized during initialization of a new set
of pages. The order for different freelist sizes is pre-computed at boot
for performance. This security feature reduces the predictability of the
kernel SLAB allocator against heap overflows, making attacks much less
reliable.
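The pre-computed lists have power-of-two sizes from 2 to 256. The sketch
below shows one way such a list can be mapped onto a slab with an
arbitrary object count: pick the smallest list holding at least count
entries (capped at 256), skip entries that fall outside the slab, and,
for slabs with more than 256 objects, replay the list with an increasing
offset (the patch calls it padding). pick_list_size() and walk_freelist()
are hypothetical names. The selection rule here is a plain power-of-two
round-up; as far as we can tell, the patch's ilog2()-based expression
picks the next larger list for exact powers of two below 256 (for
example, 8 entries for a count of 4).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_LIST_SIZE 256u	/* largest pre-computed list, as in the patch */

/* Smallest power of two >= count, clamped to [2, MAX_LIST_SIZE]. */
static unsigned int pick_list_size(unsigned int count)
{
	unsigned int size = 2;

	while (size < count && size < MAX_LIST_SIZE)
		size <<= 1;
	return size;
}

/*
 * Produce a freelist order for `count` objects from a shuffled list of
 * `size` entries. Entries that land at or beyond count are skipped; when
 * count > size, the list is replayed with padding += size, similar to
 * next_random_slot() in the patch.
 */
static void walk_freelist(const uint16_t *list, unsigned int size,
			  unsigned int count, uint16_t *out)
{
	unsigned int pos = 0, padding = 0, filled = 0;

	assert(size > 0);
	while (filled < count) {
		unsigned int slot;

		if (pos == size) {	/* list exhausted: move to next window */
			padding += size;
			pos = 0;
		}
		slot = list[pos++] + padding;
		if (slot < count)
			out[filled++] = (uint16_t)slot;
	}
}
```

Every window of size indices contributes each in-range slot exactly once,
so the output is a permutation of 0..count-1; a list much larger than
count only costs extra skipped entries.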

For example, this attack against SLUB (also applicable to SLAB)
would be affected:
https://jon.oberheide.org/blog/2010/09/10/linux-kernel-can-slub-overflow/

Also, since v4.6 the freelist has been moved to the end of the SLAB. This
means a controllable heap is open to new attacks not yet publicly
discussed: a kernel heap overflow can be transformed into multiple
use-after-frees. This feature makes this type of attack harder too.

To generate entropy, we use get_random_bytes_arch because no entropy is
available at this boot stage. In the worst case, this function falls
back to get_random_bytes.

The config option name is not specific to SLAB, as this approach will
be extended to other allocators like SLUB.

Performance results highlighted no major changes:

Netperf average on 10 runs:

threads,base,change
16,576943.10,585905.90 (101.55%)
32,564082.00,569741.20 (101.00%)
48,558334.30,561851.20 (100.63%)
64,552025.20,556448.30 (100.80%)
80,552294.40,551743.10 (99.90%)
96,552435.30,547529.20 (99.11%)
112,551320.60,550183.20 (99.79%)
128,549138.30,550542.70 (100.26%)
144,549344.50,544529.10 (99.12%)
160,550360.80,539929.30 (98.10%)

slab_test, 1 run at boot. "After" is faster, except for an odd result on
size 2048.

Before:

Single thread testing
=====================
1. Kmalloc: Repeatedly allocate then free test
10000 times kmalloc(8) -> 137 cycles kfree -> 126 cycles
10000 times kmalloc(16) -> 118 cycles kfree -> 119 cycles
10000 times kmalloc(32) -> 112 cycles kfree -> 119 cycles
10000 times kmalloc(64) -> 126 cycles kfree -> 123 cycles
10000 times kmalloc(128) -> 135 cycles kfree -> 131 cycles
10000 times kmalloc(256) -> 165 cycles kfree -> 104 cycles
10000 times kmalloc(512) -> 174 cycles kfree -> 126 cycles
10000 times kmalloc(1024) -> 242 cycles kfree -> 160 cycles
10000 times kmalloc(2048) -> 478 cycles kfree -> 239 cycles
10000 times kmalloc(4096) -> 747 cycles kfree -> 364 cycles
10000 times kmalloc(8192) -> 774 cycles kfree -> 404 cycles
10000 times kmalloc(16384) -> 849 cycles kfree -> 430 cycles
2. Kmalloc: alloc/free test
10000 times kmalloc(8)/kfree -> 118 cycles
10000 times kmalloc(16)/kfree -> 118 cycles
10000 times kmalloc(32)/kfree -> 118 cycles
10000 times kmalloc(64)/kfree -> 121 cycles
10000 times kmalloc(128)/kfree -> 118 cycles
10000 times kmalloc(256)/kfree -> 115 cycles
10000 times kmalloc(512)/kfree -> 115 cycles
10000 times kmalloc(1024)/kfree -> 115 cycles
10000 times kmalloc(2048)/kfree -> 115 cycles
10000 times kmalloc(4096)/kfree -> 115 cycles
10000 times kmalloc(8192)/kfree -> 115 cycles
10000 times kmalloc(16384)/kfree -> 115 cycles

After:

Single thread testing
=====================
1. Kmalloc: Repeatedly allocate then free test
10000 times kmalloc(8) -> 99 cycles kfree -> 84 cycles
10000 times kmalloc(16) -> 88 cycles kfree -> 83 cycles
10000 times kmalloc(32) -> 90 cycles kfree -> 81 cycles
10000 times kmalloc(64) -> 107 cycles kfree -> 97 cycles
10000 times kmalloc(128) -> 134 cycles kfree -> 89 cycles
10000 times kmalloc(256) -> 145 cycles kfree -> 97 cycles
10000 times kmalloc(512) -> 177 cycles kfree -> 116 cycles
10000 times kmalloc(1024) -> 223 cycles kfree -> 151 cycles
10000 times kmalloc(2048) -> 1429 cycles kfree -> 221 cycles
10000 times kmalloc(4096) -> 720 cycles kfree -> 348 cycles
10000 times kmalloc(8192) -> 788 cycles kfree -> 393 cycles
10000 times kmalloc(16384) -> 867 cycles kfree -> 433 cycles
2. Kmalloc: alloc/free test
10000 times kmalloc(8)/kfree -> 115 cycles
10000 times kmalloc(16)/kfree -> 115 cycles
10000 times kmalloc(32)/kfree -> 115 cycles
10000 times kmalloc(64)/kfree -> 120 cycles
10000 times kmalloc(128)/kfree -> 127 cycles
10000 times kmalloc(256)/kfree -> 119 cycles
10000 times kmalloc(512)/kfree -> 112 cycles
10000 times kmalloc(1024)/kfree -> 112 cycles
10000 times kmalloc(2048)/kfree -> 112 cycles
10000 times kmalloc(4096)/kfree -> 112 cycles
10000 times kmalloc(8192)/kfree -> 112 cycles
10000 times kmalloc(16384)/kfree -> 112 cycles

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
Based on next-20160418
---
 init/Kconfig |   9 ++++
 mm/slab.c    | 166 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 174 insertions(+), 1 deletion(-)

diff --git a/init/Kconfig b/init/Kconfig
index 0dfd09d..ee35418 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1742,6 +1742,15 @@ config SLOB
 
 endchoice
 
+config FREELIST_RANDOM
+	default n
+	depends on SLAB
+	bool "SLAB freelist randomization"
+	help
+	  Randomizes the freelist order used on creating new SLABs. This
+	  security feature reduces the predictability of the kernel slab
+	  allocator against heap overflows.
+
 config SLUB_CPU_PARTIAL
 	default y
 	depends on SLUB && SMP
diff --git a/mm/slab.c b/mm/slab.c
index b70aabf..8371d80 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -116,6 +116,7 @@
 #include	<linux/kmemcheck.h>
 #include	<linux/memory.h>
 #include	<linux/prefetch.h>
+#include	<linux/log2.h>
 
 #include	<net/sock.h>
 
@@ -1229,6 +1230,62 @@ static void __init set_up_node(struct kmem_cache *cachep, int index)
 	}
 }
 
+#ifdef CONFIG_FREELIST_RANDOM
+/*
+ * Master lists are pre-computed random lists.
+ * Lists of different sizes are used to optimize performance on SLABs with
+ * different object counts.
+ */
+static freelist_idx_t master_list_2[2];
+static freelist_idx_t master_list_4[4];
+static freelist_idx_t master_list_8[8];
+static freelist_idx_t master_list_16[16];
+static freelist_idx_t master_list_32[32];
+static freelist_idx_t master_list_64[64];
+static freelist_idx_t master_list_128[128];
+static freelist_idx_t master_list_256[256];
+static const struct m_list {
+	size_t count;
+	freelist_idx_t *list;
+} master_lists[] = {
+	{ ARRAY_SIZE(master_list_2), master_list_2 },
+	{ ARRAY_SIZE(master_list_4), master_list_4 },
+	{ ARRAY_SIZE(master_list_8), master_list_8 },
+	{ ARRAY_SIZE(master_list_16), master_list_16 },
+	{ ARRAY_SIZE(master_list_32), master_list_32 },
+	{ ARRAY_SIZE(master_list_64), master_list_64 },
+	{ ARRAY_SIZE(master_list_128), master_list_128 },
+	{ ARRAY_SIZE(master_list_256), master_list_256 },
+};
+
+/* Pre-compute the freelist master lists at boot */
+static void __init freelist_random_init(void)
+{
+	unsigned int seed;
+	size_t z, i, rand;
+	struct rnd_state slab_rand;
+
+	get_random_bytes_arch(&seed, sizeof(seed));
+	prandom_seed_state(&slab_rand, seed);
+
+	for (z = 0; z < ARRAY_SIZE(master_lists); z++) {
+		for (i = 0; i < master_lists[z].count; i++)
+			master_lists[z].list[i] = i;
+
+		/* Fisher-Yates shuffle */
+		for (i = master_lists[z].count - 1; i > 0; i--) {
+			rand = prandom_u32_state(&slab_rand);
+			rand %= (i + 1);
+			swap(master_lists[z].list[i],
+				master_lists[z].list[rand]);
+		}
+	}
+}
+#else
+static inline void __init freelist_random_init(void) { }
+#endif /* CONFIG_FREELIST_RANDOM */
+
+
 /*
  * Initialisation.  Called after the page allocator have been initialised and
  * before smp_init().
@@ -1255,6 +1312,8 @@ void __init kmem_cache_init(void)
 	if (!slab_max_order_set && totalram_pages > (32 << 20) >> PAGE_SHIFT)
 		slab_max_order = SLAB_MAX_ORDER_HI;
 
+	freelist_random_init();
+
 	/* Bootstrap is tricky, because several objects are allocated
 	 * from caches that do not exist yet:
 	 * 1) initialize the kmem_cache cache: it contains the struct
@@ -2442,6 +2501,107 @@ static void cache_init_objs_debug(struct kmem_cache *cachep, struct page *page)
 #endif
 }
 
+#ifdef CONFIG_FREELIST_RANDOM
+/* Identify if the target freelist matches the pre-computed list */
+enum master_type {
+	match,
+	less,
+	more
+};
+
+/* Hold information during a freelist initialization */
+struct freelist_init_state {
+	unsigned int padding;
+	unsigned int pos;
+	unsigned int count;
+	struct m_list master_list;
+	unsigned int master_count;
+	enum master_type type;
+};
+
+/* Select the right pre-computed master list and initialize state */
+static void freelist_state_initialize(struct freelist_init_state *state,
+				      unsigned int count)
+{
+	unsigned int idx;
+	const unsigned int last_idx = ARRAY_SIZE(master_lists) - 1;
+
+	memset(state, 0, sizeof(*state));
+	state->count = count;
+	state->pos = 0;
+	/* count is always >= 2 */
+	idx = ilog2(count) - 1;
+	if (idx >= last_idx)
+		idx = last_idx;
+	else if (roundup_pow_of_two(idx + 1) != count)
+		idx++;
+	state->master_list = master_lists[idx];
+	if (state->master_list.count == state->count)
+		state->type = match;
+	else if (state->master_list.count > state->count)
+		state->type = more;
+	else
+		state->type = less;
+}
+
+/* Get the next entry on the master list depending on the target list size */
+static freelist_idx_t get_next_entry(struct freelist_init_state *state)
+{
+	if (state->type == less && state->pos == state->master_list.count) {
+		state->padding += state->pos;
+		state->pos = 0;
+	}
+	BUG_ON(state->pos >= state->master_list.count);
+	return state->master_list.list[state->pos++];
+}
+
+static freelist_idx_t next_random_slot(struct freelist_init_state *state)
+{
+	freelist_idx_t cur, entry;
+
+	entry = get_next_entry(state);
+
+	if (state->type != match) {
+		while ((entry + state->padding) >= state->count)
+			entry = get_next_entry(state);
+		cur = entry + state->padding;
+		BUG_ON(cur >= state->count);
+	} else {
+		cur = entry;
+	}
+
+	return cur;
+}
+
+/* Shuffle the freelist initialization state based on pre-computed lists */
+static void shuffle_freelist(struct kmem_cache *cachep, struct page *page,
+			     unsigned int count)
+{
+	unsigned int i;
+	struct freelist_init_state state;
+
+	if (count < 2 || (OBJFREELIST_SLAB(cachep) && count < 3)) {
+		for (i = 0; i < count; i++)
+			set_free_obj(page, i, i);
+		return;
+	}
+
+	/* Last chunk is used already in this case */
+	if (OBJFREELIST_SLAB(cachep))
+		count--;
+
+	freelist_state_initialize(&state, count);
+	for (i = 0; i < count; i++)
+		set_free_obj(page, i, next_random_slot(&state));
+
+	if (OBJFREELIST_SLAB(cachep))
+		set_free_obj(page, i, i);
+}
+#else
+static inline void shuffle_freelist(struct kmem_cache *cachep,
+				    struct page *page, unsigned int count) { }
+#endif /* CONFIG_FREELIST_RANDOM */
+
 static void cache_init_objs(struct kmem_cache *cachep,
 			    struct page *page)
 {
@@ -2464,8 +2624,12 @@ static void cache_init_objs(struct kmem_cache *cachep,
 			kasan_poison_object_data(cachep, objp);
 		}
 
-		set_free_obj(page, i, i);
+		/* If enabled, initialization is done in shuffle_freelist */
+		if (!IS_ENABLED(CONFIG_FREELIST_RANDOM))
+			set_free_obj(page, i, i);
 	}
+
+	shuffle_freelist(cachep, page, cachep->num);
 }
 
 static void kmem_flagcheck(struct kmem_cache *cachep, gfp_t flags)
-- 
2.8.0.rc3.226.g39d4020


* [PATCH v2] mm: SLAB freelist randomization
@ 2016-04-18 17:00 ` Thomas Garnier
From: Thomas Garnier @ 2016-04-18 17:00 UTC
  To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Kees Cook
  Cc: gthelen, labbott, kernel-hardening, linux-kernel, linux-mm,
	Thomas Garnier

Provides an optional config (CONFIG_FREELIST_RANDOM) to randomize the
SLAB freelist. The list is randomized during initialization of a new set
of pages. The order for each supported freelist size is pre-computed at
boot for performance. This security feature reduces the predictability
of the kernel SLAB allocator against heap overflows, rendering attacks
much less stable.

For example, this attack against SLUB (also applicable to SLAB) would be
affected:
https://jon.oberheide.org/blog/2010/09/10/linux-kernel-can-slub-overflow/

Also, since v4.6 the freelist has been placed at the end of the SLAB. This
means a controllable heap is open to new attacks not yet publicly
discussed: a kernel heap overflow can be transformed into multiple
use-after-free bugs. This feature makes this type of attack harder too.

To generate entropy, we use get_random_bytes_arch because 0 bits of
entropy are available at that boot stage. In the worst case, this
function falls back to the get_random_bytes sub-API.

The config option name is not specific to SLAB, as this approach will
be extended to other allocators such as SLUB.

Performance results showed no major changes:

Netperf average on 10 runs:

threads,base,change (change/base)
16,576943.10,585905.90 (101.55%)
32,564082.00,569741.20 (101.00%)
48,558334.30,561851.20 (100.63%)
64,552025.20,556448.30 (100.80%)
80,552294.40,551743.10 (99.90%)
96,552435.30,547529.20 (99.11%)
112,551320.60,550183.20 (99.79%)
128,549138.30,550542.70 (100.26%)
144,549344.50,544529.10 (99.12%)
160,550360.80,539929.30 (98.10%)

slab_test, 1 run at boot. After is generally faster, except for an odd
result on size 2048.

Before:

Single thread testing
=====================
1. Kmalloc: Repeatedly allocate then free test
10000 times kmalloc(8) -> 137 cycles kfree -> 126 cycles
10000 times kmalloc(16) -> 118 cycles kfree -> 119 cycles
10000 times kmalloc(32) -> 112 cycles kfree -> 119 cycles
10000 times kmalloc(64) -> 126 cycles kfree -> 123 cycles
10000 times kmalloc(128) -> 135 cycles kfree -> 131 cycles
10000 times kmalloc(256) -> 165 cycles kfree -> 104 cycles
10000 times kmalloc(512) -> 174 cycles kfree -> 126 cycles
10000 times kmalloc(1024) -> 242 cycles kfree -> 160 cycles
10000 times kmalloc(2048) -> 478 cycles kfree -> 239 cycles
10000 times kmalloc(4096) -> 747 cycles kfree -> 364 cycles
10000 times kmalloc(8192) -> 774 cycles kfree -> 404 cycles
10000 times kmalloc(16384) -> 849 cycles kfree -> 430 cycles
2. Kmalloc: alloc/free test
10000 times kmalloc(8)/kfree -> 118 cycles
10000 times kmalloc(16)/kfree -> 118 cycles
10000 times kmalloc(32)/kfree -> 118 cycles
10000 times kmalloc(64)/kfree -> 121 cycles
10000 times kmalloc(128)/kfree -> 118 cycles
10000 times kmalloc(256)/kfree -> 115 cycles
10000 times kmalloc(512)/kfree -> 115 cycles
10000 times kmalloc(1024)/kfree -> 115 cycles
10000 times kmalloc(2048)/kfree -> 115 cycles
10000 times kmalloc(4096)/kfree -> 115 cycles
10000 times kmalloc(8192)/kfree -> 115 cycles
10000 times kmalloc(16384)/kfree -> 115 cycles

After:

Single thread testing
=====================
1. Kmalloc: Repeatedly allocate then free test
10000 times kmalloc(8) -> 99 cycles kfree -> 84 cycles
10000 times kmalloc(16) -> 88 cycles kfree -> 83 cycles
10000 times kmalloc(32) -> 90 cycles kfree -> 81 cycles
10000 times kmalloc(64) -> 107 cycles kfree -> 97 cycles
10000 times kmalloc(128) -> 134 cycles kfree -> 89 cycles
10000 times kmalloc(256) -> 145 cycles kfree -> 97 cycles
10000 times kmalloc(512) -> 177 cycles kfree -> 116 cycles
10000 times kmalloc(1024) -> 223 cycles kfree -> 151 cycles
10000 times kmalloc(2048) -> 1429 cycles kfree -> 221 cycles
10000 times kmalloc(4096) -> 720 cycles kfree -> 348 cycles
10000 times kmalloc(8192) -> 788 cycles kfree -> 393 cycles
10000 times kmalloc(16384) -> 867 cycles kfree -> 433 cycles
2. Kmalloc: alloc/free test
10000 times kmalloc(8)/kfree -> 115 cycles
10000 times kmalloc(16)/kfree -> 115 cycles
10000 times kmalloc(32)/kfree -> 115 cycles
10000 times kmalloc(64)/kfree -> 120 cycles
10000 times kmalloc(128)/kfree -> 127 cycles
10000 times kmalloc(256)/kfree -> 119 cycles
10000 times kmalloc(512)/kfree -> 112 cycles
10000 times kmalloc(1024)/kfree -> 112 cycles
10000 times kmalloc(2048)/kfree -> 112 cycles
10000 times kmalloc(4096)/kfree -> 112 cycles
10000 times kmalloc(8192)/kfree -> 112 cycles
10000 times kmalloc(16384)/kfree -> 112 cycles

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
Based on next-20160418
---
 init/Kconfig |   9 ++++
 mm/slab.c    | 166 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 174 insertions(+), 1 deletion(-)

diff --git a/init/Kconfig b/init/Kconfig
index 0dfd09d..ee35418 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1742,6 +1742,15 @@ config SLOB
 
 endchoice
 
+config FREELIST_RANDOM
+	default n
+	depends on SLAB
+	bool "SLAB freelist randomization"
+	help
+	  Randomizes the freelist order used when creating new SLABs. This
+	  security feature reduces the predictability of the kernel slab
+	  allocator against heap overflows.
+
 config SLUB_CPU_PARTIAL
 	default y
 	depends on SLUB && SMP
diff --git a/mm/slab.c b/mm/slab.c
index b70aabf..8371d80 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -116,6 +116,7 @@
 #include	<linux/kmemcheck.h>
 #include	<linux/memory.h>
 #include	<linux/prefetch.h>
+#include	<linux/log2.h>
 
 #include	<net/sock.h>
 
@@ -1229,6 +1230,62 @@ static void __init set_up_node(struct kmem_cache *cachep, int index)
 	}
 }
 
+#ifdef CONFIG_FREELIST_RANDOM
+/*
+ * Master lists are pre-computed random lists.
+ * Lists of different sizes are used to optimize performance on SLABs with
+ * different object counts.
+ */
+static freelist_idx_t master_list_2[2];
+static freelist_idx_t master_list_4[4];
+static freelist_idx_t master_list_8[8];
+static freelist_idx_t master_list_16[16];
+static freelist_idx_t master_list_32[32];
+static freelist_idx_t master_list_64[64];
+static freelist_idx_t master_list_128[128];
+static freelist_idx_t master_list_256[256];
+static const struct m_list {
+	size_t count;
+	freelist_idx_t *list;
+} master_lists[] = {
+	{ ARRAY_SIZE(master_list_2), master_list_2 },
+	{ ARRAY_SIZE(master_list_4), master_list_4 },
+	{ ARRAY_SIZE(master_list_8), master_list_8 },
+	{ ARRAY_SIZE(master_list_16), master_list_16 },
+	{ ARRAY_SIZE(master_list_32), master_list_32 },
+	{ ARRAY_SIZE(master_list_64), master_list_64 },
+	{ ARRAY_SIZE(master_list_128), master_list_128 },
+	{ ARRAY_SIZE(master_list_256), master_list_256 },
+};
+
+/* Pre-compute the freelist master lists at boot */
+static void __init freelist_random_init(void)
+{
+	unsigned int seed;
+	size_t z, i, rand;
+	struct rnd_state slab_rand;
+
+	get_random_bytes_arch(&seed, sizeof(seed));
+	prandom_seed_state(&slab_rand, seed);
+
+	for (z = 0; z < ARRAY_SIZE(master_lists); z++) {
+		for (i = 0; i < master_lists[z].count; i++)
+			master_lists[z].list[i] = i;
+
+		/* Fisher-Yates shuffle */
+		for (i = master_lists[z].count - 1; i > 0; i--) {
+			rand = prandom_u32_state(&slab_rand);
+			rand %= (i + 1);
+			swap(master_lists[z].list[i],
+				master_lists[z].list[rand]);
+		}
+	}
+}
+#else
+static inline void __init freelist_random_init(void) { }
+#endif /* CONFIG_FREELIST_RANDOM */
+
+
 /*
  * Initialisation.  Called after the page allocator have been initialised and
  * before smp_init().
@@ -1255,6 +1312,8 @@ void __init kmem_cache_init(void)
 	if (!slab_max_order_set && totalram_pages > (32 << 20) >> PAGE_SHIFT)
 		slab_max_order = SLAB_MAX_ORDER_HI;
 
+	freelist_random_init();
+
 	/* Bootstrap is tricky, because several objects are allocated
 	 * from caches that do not exist yet:
 	 * 1) initialize the kmem_cache cache: it contains the struct
@@ -2442,6 +2501,107 @@ static void cache_init_objs_debug(struct kmem_cache *cachep, struct page *page)
 #endif
 }
 
+#ifdef CONFIG_FREELIST_RANDOM
+/* Identify if the target freelist matches the pre-computed list */
+enum master_type {
+	match,
+	less,
+	more
+};
+
+/* Hold information during a freelist initialization */
+struct freelist_init_state {
+	unsigned int padding;
+	unsigned int pos;
+	unsigned int count;
+	struct m_list master_list;
+	unsigned int master_count;
+	enum master_type type;
+};
+
+/* Select the right pre-computed master list and initialize state */
+static void freelist_state_initialize(struct freelist_init_state *state,
+				      unsigned int count)
+{
+	unsigned int idx;
+	const unsigned int last_idx = ARRAY_SIZE(master_lists) - 1;
+
+	memset(state, 0, sizeof(*state));
+	state->count = count;
+	state->pos = 0;
+	/* count is always >= 2 */
+	idx = ilog2(count) - 1;
+	if (idx >= last_idx)
+		idx = last_idx;
+	else if (!is_power_of_2(count))
+		idx++;
+	state->master_list = master_lists[idx];
+	if (state->master_list.count == state->count)
+		state->type = match;
+	else if (state->master_list.count > state->count)
+		state->type = more;
+	else
+		state->type = less;
+}
+
+/* Get the next entry on the master list depending on the target list size */
+static freelist_idx_t get_next_entry(struct freelist_init_state *state)
+{
+	if (state->type == less && state->pos == state->master_list.count) {
+		state->padding += state->pos;
+		state->pos = 0;
+	}
+	BUG_ON(state->pos >= state->master_list.count);
+	return state->master_list.list[state->pos++];
+}
+
+static freelist_idx_t next_random_slot(struct freelist_init_state *state)
+{
+	freelist_idx_t cur, entry;
+
+	entry = get_next_entry(state);
+
+	if (state->type != match) {
+		while ((entry + state->padding) >= state->count)
+			entry = get_next_entry(state);
+		cur = entry + state->padding;
+		BUG_ON(cur >= state->count);
+	} else {
+		cur = entry;
+	}
+
+	return cur;
+}
+
+/* Shuffle the freelist initialization state based on pre-computed lists */
+static void shuffle_freelist(struct kmem_cache *cachep, struct page *page,
+			     unsigned int count)
+{
+	unsigned int i;
+	struct freelist_init_state state;
+
+	if (count < 2 || (OBJFREELIST_SLAB(cachep) && count < 3)) {
+		for (i = 0; i < count; i++)
+			set_free_obj(page, i, i);
+		return;
+	}
+
+	/* Last chunk is used already in this case */
+	if (OBJFREELIST_SLAB(cachep))
+		count--;
+
+	freelist_state_initialize(&state, count);
+	for (i = 0; i < count; i++)
+		set_free_obj(page, i, next_random_slot(&state));
+
+	if (OBJFREELIST_SLAB(cachep))
+		set_free_obj(page, i, i);
+}
+#else
+static inline void shuffle_freelist(struct kmem_cache *cachep,
+				    struct page *page, unsigned int count) { }
+#endif /* CONFIG_FREELIST_RANDOM */
+
 static void cache_init_objs(struct kmem_cache *cachep,
 			    struct page *page)
 {
@@ -2464,8 +2624,12 @@ static void cache_init_objs(struct kmem_cache *cachep,
 			kasan_poison_object_data(cachep, objp);
 		}
 
-		set_free_obj(page, i, i);
+		/* If enabled, initialization is done in shuffle_freelist */
+		if (!IS_ENABLED(CONFIG_FREELIST_RANDOM))
+			set_free_obj(page, i, i);
 	}
+
+	shuffle_freelist(cachep, page, cachep->num);
 }
 
 static void kmem_flagcheck(struct kmem_cache *cachep, gfp_t flags)
-- 
2.8.0.rc3.226.g39d4020



Thread overview: 35+ messages
2016-04-18 17:14 [PATCH v2] mm: SLAB freelist randomization Thomas Garnier
2016-04-18 17:14 ` [kernel-hardening] " Thomas Garnier
2016-04-18 17:14 ` Thomas Garnier
2016-04-19  7:15 ` Joonsoo Kim
2016-04-19  7:15   ` [kernel-hardening] " Joonsoo Kim
2016-04-19  7:15   ` Joonsoo Kim
2016-04-19 16:44   ` Thomas Garnier
2016-04-19 16:44     ` [kernel-hardening] " Thomas Garnier
2016-04-19 16:44     ` Thomas Garnier
2016-04-20  8:08     ` Joonsoo Kim
2016-04-20  8:08       ` [kernel-hardening] " Joonsoo Kim
2016-04-20  8:08       ` Joonsoo Kim
2016-04-20 14:47       ` Thomas Garnier
2016-04-20 14:47         ` [kernel-hardening] " Thomas Garnier
2016-04-20 14:47         ` Thomas Garnier
  -- strict thread matches above, loose matches on Subject: below --
2016-04-25 20:39 Thomas Garnier
2016-04-25 20:39 ` Thomas Garnier
2016-04-25 21:10 ` Andrew Morton
2016-04-25 21:10   ` Andrew Morton
2016-04-25 21:13   ` Thomas Garnier
2016-04-25 21:13     ` Thomas Garnier
2016-04-25 21:14     ` Thomas Garnier
2016-04-25 21:14       ` Thomas Garnier
2016-04-25 21:38       ` Andrew Morton
2016-04-25 21:38         ` Andrew Morton
2016-04-25 21:43         ` Thomas Garnier
2016-04-25 21:43           ` Thomas Garnier
2016-04-26  0:40 ` Joonsoo Kim
2016-04-26  0:40   ` Joonsoo Kim
2016-04-26  1:58   ` Thomas Garnier
2016-04-26  1:58     ` Thomas Garnier
2016-04-26 14:19 ` Christoph Lameter
2016-04-26 14:19   ` Christoph Lameter
2016-04-18 17:00 Thomas Garnier
2016-04-18 17:00 ` Thomas Garnier
