linux-security-module.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v7 0/3] add init_on_alloc/init_on_free boot options
@ 2019-06-17 15:10 Alexander Potapenko
  2019-06-17 15:10 ` [PATCH v7 1/2] mm: security: introduce init_on_alloc=1 and init_on_free=1 " Alexander Potapenko
  2019-06-17 15:10 ` [PATCH v7 2/2] mm: init: report memory auto-initialization features at boot time Alexander Potapenko
  0 siblings, 2 replies; 18+ messages in thread
From: Alexander Potapenko @ 2019-06-17 15:10 UTC (permalink / raw)
  To: Andrew Morton, Christoph Lameter, Kees Cook
  Cc: Alexander Potapenko, Masahiro Yamada, Michal Hocko, James Morris,
	Serge E. Hallyn, Nick Desaulniers, Kostya Serebryany,
	Dmitry Vyukov, Sandeep Patil, Laura Abbott, Randy Dunlap,
	Jann Horn, Mark Rutland, Marco Elver, linux-mm,
	linux-security-module, kernel-hardening

Provide init_on_alloc and init_on_free boot options.

These are aimed at preventing possible information leaks and making the
control-flow bugs that depend on uninitialized values more deterministic.

Enabling either of the options guarantees that the memory returned by the
page allocator and SL[AU]B is initialized with zeroes.
SLOB allocator isn't supported at the moment, as its emulation of kmem
caches complicates handling of SLAB_TYPESAFE_BY_RCU caches correctly.

Enabling init_on_free also guarantees that pages and heap objects are
initialized right after they're freed, so it won't be possible to access
stale data by using a dangling pointer.

As suggested by Michal Hocko, right now we don't let the heap users to
disable initialization for certain allocations. There's not enough
evidence that doing so can speed up real-life cases, and introducing
ways to opt-out may result in things going out of control.

To: Andrew Morton <akpm@linux-foundation.org>
To: Christoph Lameter <cl@linux.com>
To: Kees Cook <keescook@chromium.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: James Morris <jmorris@namei.org>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Sandeep Patil <sspatil@android.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Jann Horn <jannh@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Marco Elver <elver@google.com>
Cc: linux-mm@kvack.org
Cc: linux-security-module@vger.kernel.org
Cc: kernel-hardening@lists.openwall.com

Alexander Potapenko (2):
  mm: security: introduce init_on_alloc=1 and init_on_free=1 boot
    options
  mm: init: report memory auto-initialization features at boot time

 .../admin-guide/kernel-parameters.txt         |  9 +++
 drivers/infiniband/core/uverbs_ioctl.c        |  2 +-
 include/linux/mm.h                            | 22 +++++++
 init/main.c                                   | 24 +++++++
 kernel/kexec_core.c                           |  2 +-
 mm/dmapool.c                                  |  2 +-
 mm/page_alloc.c                               | 63 ++++++++++++++++---
 mm/slab.c                                     | 16 ++++-
 mm/slab.h                                     | 19 ++++++
 mm/slub.c                                     | 33 ++++++++--
 net/core/sock.c                               |  2 +-
 security/Kconfig.hardening                    | 29 +++++++++
 12 files changed, 204 insertions(+), 19 deletions(-)
---
 v3: dropped __GFP_NO_AUTOINIT patches
 v5: dropped support for SLOB allocator, handle SLAB_TYPESAFE_BY_RCU
 v6: changed wording in boot-time message
 v7: dropped the test_meminit.c patch (picked by Andrew Morton already),
     minor wording changes
-- 
2.22.0.410.gd8fdbe21b5-goog


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v7 1/2] mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options
  2019-06-17 15:10 [PATCH v7 0/3] add init_on_alloc/init_on_free boot options Alexander Potapenko
@ 2019-06-17 15:10 ` Alexander Potapenko
  2019-06-17 22:10   ` Andrew Morton
                     ` (2 more replies)
  2019-06-17 15:10 ` [PATCH v7 2/2] mm: init: report memory auto-initialization features at boot time Alexander Potapenko
  1 sibling, 3 replies; 18+ messages in thread
From: Alexander Potapenko @ 2019-06-17 15:10 UTC (permalink / raw)
  To: Andrew Morton, Christoph Lameter, Kees Cook
  Cc: Alexander Potapenko, Masahiro Yamada, Michal Hocko, James Morris,
	Serge E. Hallyn, Nick Desaulniers, Kostya Serebryany,
	Dmitry Vyukov, Sandeep Patil, Laura Abbott, Randy Dunlap,
	Jann Horn, Mark Rutland, Marco Elver, linux-mm,
	linux-security-module, kernel-hardening

The new options are needed to prevent possible information leaks and
make control-flow bugs that depend on uninitialized values more
deterministic.

init_on_alloc=1 makes the kernel initialize newly allocated pages and heap
objects with zeroes. Initialization is done at allocation time at the
places where checks for __GFP_ZERO are performed.

init_on_free=1 makes the kernel initialize freed pages and heap objects
with zeroes upon their deletion. This helps to ensure sensitive data
doesn't leak via use-after-free accesses.

Both init_on_alloc=1 and init_on_free=1 guarantee that the allocator
returns zeroed memory. The two exceptions are slab caches with
constructors and SLAB_TYPESAFE_BY_RCU flag. Those are never
zero-initialized to preserve their semantics.

Both init_on_alloc and init_on_free default to zero, but those defaults
can be overridden with CONFIG_INIT_ON_ALLOC_DEFAULT_ON and
CONFIG_INIT_ON_FREE_DEFAULT_ON.

Slowdown for the new features compared to init_on_free=0,
init_on_alloc=0:

hackbench, init_on_free=1:  +7.62% sys time (st.err 0.74%)
hackbench, init_on_alloc=1: +7.75% sys time (st.err 2.14%)

Linux build with -j12, init_on_free=1:  +8.38% wall time (st.err 0.39%)
Linux build with -j12, init_on_free=1:  +24.42% sys time (st.err 0.52%)
Linux build with -j12, init_on_alloc=1: -0.13% wall time (st.err 0.42%)
Linux build with -j12, init_on_alloc=1: +0.57% sys time (st.err 0.40%)

The slowdown for init_on_free=0, init_on_alloc=0 compared to the
baseline is within the standard error.

The new features are also going to pave the way for hardware memory
tagging (e.g. arm64's MTE), which will require both on_alloc and on_free
hooks to set the tags for heap objects. With MTE, tagging will have the
same cost as memory initialization.

Although init_on_free is rather costly, there are paranoid use-cases where
in-memory data lifetime is desired to be minimized. There are various
arguments for/against the realism of the associated threat models, but
given that we'll need the infrastructre for MTE anyway, and there are
people who want wipe-on-free behavior no matter what the performance cost,
it seems reasonable to include it in this series.

Signed-off-by: Alexander Potapenko <glider@google.com>
Acked-by: Kees Cook <keescook@chromium.org>
To: Andrew Morton <akpm@linux-foundation.org>
To: Christoph Lameter <cl@linux.com>
To: Kees Cook <keescook@chromium.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: James Morris <jmorris@namei.org>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Sandeep Patil <sspatil@android.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Jann Horn <jannh@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Marco Elver <elver@google.com>
Cc: linux-mm@kvack.org
Cc: linux-security-module@vger.kernel.org
Cc: kernel-hardening@lists.openwall.com
---
 v2:
  - unconditionally initialize pages in kernel_init_free_pages()
  - comment from Randy Dunlap: drop 'default false' lines from Kconfig.hardening
 v3:
  - don't call kernel_init_free_pages() from memblock_free_pages()
  - adopted some Kees' comments for the patch description
 v4:
  - use NULL instead of 0 in slab_alloc_node() (found by kbuild test robot)
  - don't write to NULL object in slab_alloc_node() (found by Android
    testing)
 v5:
  - adjusted documentation wording as suggested by Kees
  - disable SLAB_POISON if auto-initialization is on
  - don't wipe RCU cache allocations made without __GFP_ZERO
  - dropped SLOB support
 v7:
  - rebase the patch, added the Acked-by: tag
---
 .../admin-guide/kernel-parameters.txt         |  9 +++
 drivers/infiniband/core/uverbs_ioctl.c        |  2 +-
 include/linux/mm.h                            | 22 +++++++
 kernel/kexec_core.c                           |  2 +-
 mm/dmapool.c                                  |  2 +-
 mm/page_alloc.c                               | 63 ++++++++++++++++---
 mm/slab.c                                     | 16 ++++-
 mm/slab.h                                     | 19 ++++++
 mm/slub.c                                     | 33 ++++++++--
 net/core/sock.c                               |  2 +-
 security/Kconfig.hardening                    | 29 +++++++++
 11 files changed, 180 insertions(+), 19 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 138f6664b2e2..84ee1121a2b9 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1673,6 +1673,15 @@
 
 	initrd=		[BOOT] Specify the location of the initial ramdisk
 
+	init_on_alloc=	[MM] Fill newly allocated pages and heap objects with
+			zeroes.
+			Format: 0 | 1
+			Default set by CONFIG_INIT_ON_ALLOC_DEFAULT_ON.
+
+	init_on_free=	[MM] Fill freed pages and heap objects with zeroes.
+			Format: 0 | 1
+			Default set by CONFIG_INIT_ON_FREE_DEFAULT_ON.
+
 	init_pkru=	[x86] Specify the default memory protection keys rights
 			register contents for all processes.  0x55555554 by
 			default (disallow access to all but pkey 0).  Can
diff --git a/drivers/infiniband/core/uverbs_ioctl.c b/drivers/infiniband/core/uverbs_ioctl.c
index 829b0c6944d8..61758201d9b2 100644
--- a/drivers/infiniband/core/uverbs_ioctl.c
+++ b/drivers/infiniband/core/uverbs_ioctl.c
@@ -127,7 +127,7 @@ __malloc void *_uverbs_alloc(struct uverbs_attr_bundle *bundle, size_t size,
 	res = (void *)pbundle->internal_buffer + pbundle->internal_used;
 	pbundle->internal_used =
 		ALIGN(new_used, sizeof(*pbundle->internal_buffer));
-	if (flags & __GFP_ZERO)
+	if (want_init_on_alloc(flags))
 		memset(res, 0, size);
 	return res;
 }
diff --git a/include/linux/mm.h b/include/linux/mm.h
index dd0b5f4e1e45..96be2604f313 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2696,6 +2696,28 @@ static inline void kernel_poison_pages(struct page *page, int numpages,
 					int enable) { }
 #endif
 
+#ifdef CONFIG_INIT_ON_ALLOC_DEFAULT_ON
+DECLARE_STATIC_KEY_TRUE(init_on_alloc);
+#else
+DECLARE_STATIC_KEY_FALSE(init_on_alloc);
+#endif
+static inline bool want_init_on_alloc(gfp_t flags)
+{
+	if (static_branch_unlikely(&init_on_alloc))
+		return true;
+	return flags & __GFP_ZERO;
+}
+
+#ifdef CONFIG_INIT_ON_FREE_DEFAULT_ON
+DECLARE_STATIC_KEY_TRUE(init_on_free);
+#else
+DECLARE_STATIC_KEY_FALSE(init_on_free);
+#endif
+static inline bool want_init_on_free(void)
+{
+	return static_branch_unlikely(&init_on_free);
+}
+
 extern bool _debug_pagealloc_enabled;
 
 static inline bool debug_pagealloc_enabled(void)
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index fd5c95ff9251..2f75dd0d0d81 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -315,7 +315,7 @@ static struct page *kimage_alloc_pages(gfp_t gfp_mask, unsigned int order)
 		arch_kexec_post_alloc_pages(page_address(pages), count,
 					    gfp_mask);
 
-		if (gfp_mask & __GFP_ZERO)
+		if (want_init_on_alloc(gfp_mask))
 			for (i = 0; i < count; i++)
 				clear_highpage(pages + i);
 	}
diff --git a/mm/dmapool.c b/mm/dmapool.c
index 8c94c89a6f7e..e164012d3491 100644
--- a/mm/dmapool.c
+++ b/mm/dmapool.c
@@ -378,7 +378,7 @@ void *dma_pool_alloc(struct dma_pool *pool, gfp_t mem_flags,
 #endif
 	spin_unlock_irqrestore(&pool->lock, flags);
 
-	if (mem_flags & __GFP_ZERO)
+	if (want_init_on_alloc(mem_flags))
 		memset(retval, 0, pool->size);
 
 	return retval;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d66bc8abe0af..50a3b104a491 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -136,6 +136,48 @@ unsigned long totalcma_pages __read_mostly;
 
 int percpu_pagelist_fraction;
 gfp_t gfp_allowed_mask __read_mostly = GFP_BOOT_MASK;
+#ifdef CONFIG_INIT_ON_ALLOC_DEFAULT_ON
+DEFINE_STATIC_KEY_TRUE(init_on_alloc);
+#else
+DEFINE_STATIC_KEY_FALSE(init_on_alloc);
+#endif
+#ifdef CONFIG_INIT_ON_FREE_DEFAULT_ON
+DEFINE_STATIC_KEY_TRUE(init_on_free);
+#else
+DEFINE_STATIC_KEY_FALSE(init_on_free);
+#endif
+
+static int __init early_init_on_alloc(char *buf)
+{
+	int ret;
+	bool bool_result;
+
+	if (!buf)
+		return -EINVAL;
+	ret = kstrtobool(buf, &bool_result);
+	if (bool_result)
+		static_branch_enable(&init_on_alloc);
+	else
+		static_branch_disable(&init_on_alloc);
+	return ret;
+}
+early_param("init_on_alloc", early_init_on_alloc);
+
+static int __init early_init_on_free(char *buf)
+{
+	int ret;
+	bool bool_result;
+
+	if (!buf)
+		return -EINVAL;
+	ret = kstrtobool(buf, &bool_result);
+	if (bool_result)
+		static_branch_enable(&init_on_free);
+	else
+		static_branch_disable(&init_on_free);
+	return ret;
+}
+early_param("init_on_free", early_init_on_free);
 
 /*
  * A cached value of the page's pageblock's migratetype, used when the page is
@@ -1090,6 +1132,14 @@ static int free_tail_pages_check(struct page *head_page, struct page *page)
 	return ret;
 }
 
+static void kernel_init_free_pages(struct page *page, int numpages)
+{
+	int i;
+
+	for (i = 0; i < numpages; i++)
+		clear_highpage(page + i);
+}
+
 static __always_inline bool free_pages_prepare(struct page *page,
 					unsigned int order, bool check_free)
 {
@@ -1142,6 +1192,8 @@ static __always_inline bool free_pages_prepare(struct page *page,
 	}
 	arch_free_page(page, order);
 	kernel_poison_pages(page, 1 << order, 0);
+	if (want_init_on_free())
+		kernel_init_free_pages(page, 1 << order);
 	if (debug_pagealloc_enabled())
 		kernel_map_pages(page, 1 << order, 0);
 
@@ -2020,8 +2072,8 @@ static inline int check_new_page(struct page *page)
 
 static inline bool free_pages_prezeroed(void)
 {
-	return IS_ENABLED(CONFIG_PAGE_POISONING_ZERO) &&
-		page_poisoning_enabled();
+	return (IS_ENABLED(CONFIG_PAGE_POISONING_ZERO) &&
+		page_poisoning_enabled()) || want_init_on_free();
 }
 
 #ifdef CONFIG_DEBUG_VM
@@ -2075,13 +2127,10 @@ inline void post_alloc_hook(struct page *page, unsigned int order,
 static void prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags,
 							unsigned int alloc_flags)
 {
-	int i;
-
 	post_alloc_hook(page, order, gfp_flags);
 
-	if (!free_pages_prezeroed() && (gfp_flags & __GFP_ZERO))
-		for (i = 0; i < (1 << order); i++)
-			clear_highpage(page + i);
+	if (!free_pages_prezeroed() && want_init_on_alloc(gfp_flags))
+		kernel_init_free_pages(page, 1 << order);
 
 	if (order && (gfp_flags & __GFP_COMP))
 		prep_compound_page(page, order);
diff --git a/mm/slab.c b/mm/slab.c
index f7117ad9b3a3..98a89d7c922d 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1830,6 +1830,14 @@ static bool set_objfreelist_slab_cache(struct kmem_cache *cachep,
 
 	cachep->num = 0;
 
+	/*
+	 * If slab auto-initialization on free is enabled, store the freelist
+	 * off-slab, so that its contents don't end up in one of the allocated
+	 * objects.
+	 */
+	if (unlikely(slab_want_init_on_free(cachep)))
+		return false;
+
 	if (cachep->ctor || flags & SLAB_TYPESAFE_BY_RCU)
 		return false;
 
@@ -3263,7 +3271,7 @@ slab_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid,
 	local_irq_restore(save_flags);
 	ptr = cache_alloc_debugcheck_after(cachep, flags, ptr, caller);
 
-	if (unlikely(flags & __GFP_ZERO) && ptr)
+	if (unlikely(slab_want_init_on_alloc(flags, cachep)) && ptr)
 		memset(ptr, 0, cachep->object_size);
 
 	slab_post_alloc_hook(cachep, flags, 1, &ptr);
@@ -3320,7 +3328,7 @@ slab_alloc(struct kmem_cache *cachep, gfp_t flags, unsigned long caller)
 	objp = cache_alloc_debugcheck_after(cachep, flags, objp, caller);
 	prefetchw(objp);
 
-	if (unlikely(flags & __GFP_ZERO) && objp)
+	if (unlikely(slab_want_init_on_alloc(flags, cachep)) && objp)
 		memset(objp, 0, cachep->object_size);
 
 	slab_post_alloc_hook(cachep, flags, 1, &objp);
@@ -3441,6 +3449,8 @@ void ___cache_free(struct kmem_cache *cachep, void *objp,
 	struct array_cache *ac = cpu_cache_get(cachep);
 
 	check_irq_off();
+	if (unlikely(slab_want_init_on_free(cachep)))
+		memset(objp, 0, cachep->object_size);
 	kmemleak_free_recursive(objp, cachep->flags);
 	objp = cache_free_debugcheck(cachep, objp, caller);
 
@@ -3528,7 +3538,7 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
 	cache_alloc_debugcheck_after_bulk(s, flags, size, p, _RET_IP_);
 
 	/* Clear memory outside IRQ disabled section */
-	if (unlikely(flags & __GFP_ZERO))
+	if (unlikely(slab_want_init_on_alloc(flags, s)))
 		for (i = 0; i < size; i++)
 			memset(p[i], 0, s->object_size);
 
diff --git a/mm/slab.h b/mm/slab.h
index 43ac818b8592..31032d488b29 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -524,4 +524,23 @@ static inline int cache_random_seq_create(struct kmem_cache *cachep,
 static inline void cache_random_seq_destroy(struct kmem_cache *cachep) { }
 #endif /* CONFIG_SLAB_FREELIST_RANDOM */
 
+static inline bool slab_want_init_on_alloc(gfp_t flags, struct kmem_cache *c)
+{
+	if (static_branch_unlikely(&init_on_alloc)) {
+		if (c->ctor)
+			return false;
+		if (c->flags & SLAB_TYPESAFE_BY_RCU)
+			return flags & __GFP_ZERO;
+		return true;
+	}
+	return flags & __GFP_ZERO;
+}
+
+static inline bool slab_want_init_on_free(struct kmem_cache *c)
+{
+	if (static_branch_unlikely(&init_on_free))
+		return !(c->ctor || (c->flags & SLAB_TYPESAFE_BY_RCU));
+	return false;
+}
+
 #endif /* MM_SLAB_H */
diff --git a/mm/slub.c b/mm/slub.c
index cd04dbd2b5d0..9c4a8b9a955c 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1279,6 +1279,12 @@ static int __init setup_slub_debug(char *str)
 	if (*str == ',')
 		slub_debug_slabs = str + 1;
 out:
+	if ((static_branch_unlikely(&init_on_alloc) ||
+	     static_branch_unlikely(&init_on_free)) &&
+	    (slub_debug & SLAB_POISON)) {
+		pr_warn("disabling SLAB_POISON: can't be used together with memory auto-initialization\n");
+		slub_debug &= ~SLAB_POISON;
+	}
 	return 1;
 }
 
@@ -1424,6 +1430,19 @@ static __always_inline bool slab_free_hook(struct kmem_cache *s, void *x)
 static inline bool slab_free_freelist_hook(struct kmem_cache *s,
 					   void **head, void **tail)
 {
+
+	void *object;
+	void *next = *head;
+	void *old_tail = *tail ? *tail : *head;
+
+	if (slab_want_init_on_free(s))
+		do {
+			object = next;
+			next = get_freepointer(s, object);
+			memset(object, 0, s->size);
+			set_freepointer(s, object, next);
+		} while (object != old_tail);
+
 /*
  * Compiler cannot detect this function can be removed if slab_free_hook()
  * evaluates to nothing.  Thus, catch all relevant config debug options here.
@@ -1433,9 +1452,7 @@ static inline bool slab_free_freelist_hook(struct kmem_cache *s,
 	defined(CONFIG_DEBUG_OBJECTS_FREE) ||	\
 	defined(CONFIG_KASAN)
 
-	void *object;
-	void *next = *head;
-	void *old_tail = *tail ? *tail : *head;
+	next = *head;
 
 	/* Head and tail of the reconstructed freelist */
 	*head = NULL;
@@ -2741,8 +2758,14 @@ static __always_inline void *slab_alloc_node(struct kmem_cache *s,
 		prefetch_freepointer(s, next_object);
 		stat(s, ALLOC_FASTPATH);
 	}
+	/*
+	 * If the object has been wiped upon free, make sure it's fully
+	 * initialized by zeroing out freelist pointer.
+	 */
+	if (unlikely(slab_want_init_on_free(s)) && object)
+		*(void **)object = NULL;
 
-	if (unlikely(gfpflags & __GFP_ZERO) && object)
+	if (unlikely(slab_want_init_on_alloc(gfpflags, s)) && object)
 		memset(object, 0, s->object_size);
 
 	slab_post_alloc_hook(s, gfpflags, 1, &object);
@@ -3163,7 +3186,7 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
 	local_irq_enable();
 
 	/* Clear memory outside IRQ disabled fastpath loop */
-	if (unlikely(flags & __GFP_ZERO)) {
+	if (unlikely(slab_want_init_on_alloc(flags, s))) {
 		int j;
 
 		for (j = 0; j < i; j++)
diff --git a/net/core/sock.c b/net/core/sock.c
index 2b3701958486..8ed13d2487b2 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1596,7 +1596,7 @@ static struct sock *sk_prot_alloc(struct proto *prot, gfp_t priority,
 		sk = kmem_cache_alloc(slab, priority & ~__GFP_ZERO);
 		if (!sk)
 			return sk;
-		if (priority & __GFP_ZERO)
+		if (want_init_on_alloc(priority))
 			sk_prot_clear_nulls(sk, prot->obj_size);
 	} else
 		sk = kmalloc(prot->obj_size, priority);
diff --git a/security/Kconfig.hardening b/security/Kconfig.hardening
index c6cb2d9b2905..a1ffe2eb4d5f 100644
--- a/security/Kconfig.hardening
+++ b/security/Kconfig.hardening
@@ -160,6 +160,35 @@ config STACKLEAK_RUNTIME_DISABLE
 	  runtime to control kernel stack erasing for kernels built with
 	  CONFIG_GCC_PLUGIN_STACKLEAK.
 
+config INIT_ON_ALLOC_DEFAULT_ON
+	bool "Enable heap memory zeroing on allocation by default"
+	help
+	  This has the effect of setting "init_on_alloc=1" on the kernel
+	  command line. This can be disabled with "init_on_alloc=0".
+	  When "init_on_alloc" is enabled, all page allocator and slab
+	  allocator memory will be zeroed when allocated, eliminating
+	  many kinds of "uninitialized heap memory" flaws, especially
+	  heap content exposures. The performance impact varies by
+	  workload, but most cases see <1% impact. Some synthetic
+	  workloads have measured as high as 7%.
+
+config INIT_ON_FREE_DEFAULT_ON
+	bool "Enable heap memory zeroing on free by default"
+	help
+	  This has the effect of setting "init_on_free=1" on the kernel
+	  command line. This can be disabled with "init_on_free=0".
+	  Similar to "init_on_alloc", when "init_on_free" is enabled,
+	  all page allocator and slab allocator memory will be zeroed
+	  when freed, eliminating many kinds of "uninitialized heap memory"
+	  flaws, especially heap content exposures. The primary difference
+	  with "init_on_free" is that data lifetime in memory is reduced,
+	  as anything freed is wiped immediately, making live forensics or
+	  cold boot memory attacks unable to recover freed memory contents.
+	  The performance impact varies by workload, but is more expensive
+	  than "init_on_alloc" due to the negative cache effects of
+	  touching "cold" memory areas. Most cases see 3-5% impact. Some
+	  synthetic workloads have measured as high as 8%.
+
 endmenu
 
 endmenu
-- 
2.22.0.410.gd8fdbe21b5-goog


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v7 2/2] mm: init: report memory auto-initialization features at boot time
  2019-06-17 15:10 [PATCH v7 0/3] add init_on_alloc/init_on_free boot options Alexander Potapenko
  2019-06-17 15:10 ` [PATCH v7 1/2] mm: security: introduce init_on_alloc=1 and init_on_free=1 " Alexander Potapenko
@ 2019-06-17 15:10 ` Alexander Potapenko
  1 sibling, 0 replies; 18+ messages in thread
From: Alexander Potapenko @ 2019-06-17 15:10 UTC (permalink / raw)
  To: Andrew Morton, Christoph Lameter
  Cc: Alexander Potapenko, Kees Cook, Dmitry Vyukov, James Morris,
	Jann Horn, Kostya Serebryany, Laura Abbott, Mark Rutland,
	Masahiro Yamada, Matthew Wilcox, Nick Desaulniers, Randy Dunlap,
	Sandeep Patil, Serge E. Hallyn, Souptick Joarder, Marco Elver,
	Kaiwan N Billimoria, kernel-hardening, linux-mm,
	linux-security-module

Print the currently enabled stack and heap initialization modes.

Stack initialization is enabled by a config flag, while heap
initialization is configured at boot time with defaults being set
in the config. It's more convenient for the user to have all information
about these hardening measures in one place at boot, so the user can
reason about the expected behavior of the running system.

The possible options for stack are:
 - "all" for CONFIG_INIT_STACK_ALL;
 - "byref_all" for CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF_ALL;
 - "byref" for CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF;
 - "__user" for CONFIG_GCC_PLUGIN_STRUCTLEAK_USER;
 - "off" otherwise.

Depending on the values of init_on_alloc and init_on_free boottime
options we also report "heap alloc" and "heap free" as "on"/"off".

In the init_on_free mode initializing pages at boot time may take a
while, so print a notice about that as well. This depends on how much
memory is installed, the memory bandwidth, etc.
On a relatively modern x86 system, it takes about 0.75s/GB to wipe all
memory:

  [    0.418722] mem auto-init: stack:byref_all, heap alloc:off, heap free:on
  [    0.419765] mem auto-init: clearing system memory may take some time...
  [   12.376605] Memory: 16408564K/16776672K available (14339K kernel code, 1397K rwdata, 3756K rodata, 1636K init, 11460K bss, 368108K reserved, 0K cma-reserved)

Signed-off-by: Alexander Potapenko <glider@google.com>
Suggested-by: Kees Cook <keescook@chromium.org>
Acked-by: Kees Cook <keescook@chromium.org>
To: Andrew Morton <akpm@linux-foundation.org>
To: Christoph Lameter <cl@linux.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: James Morris <jmorris@namei.org>
Cc: Jann Horn <jannh@google.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Sandeep Patil <sspatil@android.com>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Souptick Joarder <jrdr.linux@gmail.com>
Cc: Marco Elver <elver@google.com>
Cc: Kaiwan N Billimoria <kaiwan@kaiwantech.com>
Cc: kernel-hardening@lists.openwall.com
Cc: linux-mm@kvack.org
Cc: linux-security-module@vger.kernel.org
---
 v6:
 - update patch description, fixed message about clearing memory
 v7:
 - rebase the patch, add the Acked-by: tag;
 - more description updates as suggested by Kees;
 - make report_meminit() static.
---
 init/main.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/init/main.c b/init/main.c
index 66a196c5e4c3..ff5803b0841c 100644
--- a/init/main.c
+++ b/init/main.c
@@ -520,6 +520,29 @@ static inline void initcall_debug_enable(void)
 }
 #endif
 
+/* Report memory auto-initialization states for this boot. */
+static void __init report_meminit(void)
+{
+	const char *stack;
+
+	if (IS_ENABLED(CONFIG_INIT_STACK_ALL))
+		stack = "all";
+	else if (IS_ENABLED(CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF_ALL))
+		stack = "byref_all";
+	else if (IS_ENABLED(CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF))
+		stack = "byref";
+	else if (IS_ENABLED(CONFIG_GCC_PLUGIN_STRUCTLEAK_USER))
+		stack = "__user";
+	else
+		stack = "off";
+
+	pr_info("mem auto-init: stack:%s, heap alloc:%s, heap free:%s\n",
+		stack, want_init_on_alloc(GFP_KERNEL) ? "on" : "off",
+		want_init_on_free() ? "on" : "off");
+	if (want_init_on_free())
+		pr_info("mem auto-init: clearing system memory may take some time...\n");
+}
+
 /*
  * Set up kernel memory allocators
  */
@@ -530,6 +553,7 @@ static void __init mm_init(void)
 	 * bigger than MAX_ORDER unless SPARSEMEM.
 	 */
 	page_ext_init_flatmem();
+	report_meminit();
 	mem_init();
 	kmem_cache_init();
 	pgtable_init();
-- 
2.22.0.410.gd8fdbe21b5-goog


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v7 1/2] mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options
  2019-06-17 15:10 ` [PATCH v7 1/2] mm: security: introduce init_on_alloc=1 and init_on_free=1 " Alexander Potapenko
@ 2019-06-17 22:10   ` Andrew Morton
  2019-06-18  5:07     ` Kees Cook
  2019-06-21  7:09   ` Michal Hocko
  2019-06-21 12:36   ` Qian Cai
  2 siblings, 1 reply; 18+ messages in thread
From: Andrew Morton @ 2019-06-17 22:10 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Christoph Lameter, Kees Cook, Masahiro Yamada, Michal Hocko,
	James Morris, Serge E. Hallyn, Nick Desaulniers,
	Kostya Serebryany, Dmitry Vyukov, Sandeep Patil, Laura Abbott,
	Randy Dunlap, Jann Horn, Mark Rutland, Marco Elver, linux-mm,
	linux-security-module, kernel-hardening

On Mon, 17 Jun 2019 17:10:49 +0200 Alexander Potapenko <glider@google.com> wrote:

> Slowdown for the new features compared to init_on_free=0,
> init_on_alloc=0:
> 
> hackbench, init_on_free=1:  +7.62% sys time (st.err 0.74%)
> hackbench, init_on_alloc=1: +7.75% sys time (st.err 2.14%)

Sanity check time.  Is anyone really going to use this?  Seriously,
honestly, for real?  If "yes" then how did we determine that?

Also, a bit of a nit: "init_on_alloc" and "init_on_free" aren't very
well chosen names for the boot options - they could refer to any kernel
object at all, really.  init_pages_on_alloc would be better?  I don't think
this matters much - the boot options are already chaotic.  But still...



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v7 1/2] mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options
  2019-06-17 22:10   ` Andrew Morton
@ 2019-06-18  5:07     ` Kees Cook
  2019-06-18  5:19       ` Andrew Morton
  0 siblings, 1 reply; 18+ messages in thread
From: Kees Cook @ 2019-06-18  5:07 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Alexander Potapenko, Christoph Lameter, Masahiro Yamada,
	Michal Hocko, James Morris, Serge E. Hallyn, Nick Desaulniers,
	Kostya Serebryany, Dmitry Vyukov, Sandeep Patil, Laura Abbott,
	Randy Dunlap, Jann Horn, Mark Rutland, Marco Elver, linux-mm,
	linux-security-module, kernel-hardening

On Mon, Jun 17, 2019 at 03:10:27PM -0700, Andrew Morton wrote:
> On Mon, 17 Jun 2019 17:10:49 +0200 Alexander Potapenko <glider@google.com> wrote:
> 
> > Slowdown for the new features compared to init_on_free=0,
> > init_on_alloc=0:
> > 
> > hackbench, init_on_free=1:  +7.62% sys time (st.err 0.74%)
> > hackbench, init_on_alloc=1: +7.75% sys time (st.err 2.14%)
> 
> Sanity check time.  Is anyone really going to use this?  Seriously,
> honestly, for real?  If "yes" then how did we determine that?

Absolutely! This is expected to be on-by-default on Android and Chrome
OS. And it gives the opportunity for anyone else to use it under distros
too via the boot args. (The init_on_free feature is regularly requested
by folks where memory forensics is included in their thread models.)

As for the performance implications, the request during review was to
do that separately.

> Also, a bit of a nit: "init_on_alloc" and "init_on_free" aren't very
> well chosen names for the boot options - they could refer to any kernel
> object at all, really.  init_pages_on_alloc would be better?  I don't think
> this matters much - the boot options are already chaotic.  But still...

I agree; it's awkward. It covers both the page allocator and the slab
allocator, though, so naming it "page" seems not great. It's part of
a larger effort to auto-initialize all memory (stack auto-init has
been around in a few forms with the Clang support now in Linus's tree
for v5.2), and the feature has kind of ended up with the short name
of "meminit". As this is the "heap" side of "meminit", what about
"meminit.alloc=..." and "meminit.free=..." as alternative straw-men?

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v7 1/2] mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options
  2019-06-18  5:07     ` Kees Cook
@ 2019-06-18  5:19       ` Andrew Morton
  2019-06-18  5:26         ` Kees Cook
  0 siblings, 1 reply; 18+ messages in thread
From: Andrew Morton @ 2019-06-18  5:19 UTC (permalink / raw)
  To: Kees Cook
  Cc: Alexander Potapenko, Christoph Lameter, Masahiro Yamada,
	Michal Hocko, James Morris, Serge E. Hallyn, Nick Desaulniers,
	Kostya Serebryany, Dmitry Vyukov, Sandeep Patil, Laura Abbott,
	Randy Dunlap, Jann Horn, Mark Rutland, Marco Elver, linux-mm,
	linux-security-module, kernel-hardening

On Mon, 17 Jun 2019 22:07:41 -0700 Kees Cook <keescook@chromium.org> wrote:

> This is expected to be on-by-default on Android and Chrome
> OS. And it gives the opportunity for anyone else to use it under distros
> too via the boot args. (The init_on_free feature is regularly requested
> by folks where memory forensics is included in their thread models.)

Thanks.  I added the above to the changelog.  I assumed s/thread/threat/

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v7 1/2] mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options
  2019-06-18  5:19       ` Andrew Morton
@ 2019-06-18  5:26         ` Kees Cook
  0 siblings, 0 replies; 18+ messages in thread
From: Kees Cook @ 2019-06-18  5:26 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Alexander Potapenko, Christoph Lameter, Masahiro Yamada,
	Michal Hocko, James Morris, Serge E. Hallyn, Nick Desaulniers,
	Kostya Serebryany, Dmitry Vyukov, Sandeep Patil, Laura Abbott,
	Randy Dunlap, Jann Horn, Mark Rutland, Marco Elver, linux-mm,
	linux-security-module, kernel-hardening

On Mon, Jun 17, 2019 at 10:19:32PM -0700, Andrew Morton wrote:
> On Mon, 17 Jun 2019 22:07:41 -0700 Kees Cook <keescook@chromium.org> wrote:
> 
> > This is expected to be on-by-default on Android and Chrome
> > OS. And it gives the opportunity for anyone else to use it under distros
> > too via the boot args. (The init_on_free feature is regularly requested
> > by folks where memory forensics is included in their thread models.)
> 
> Thanks.  I added the above to the changelog.  I assumed s/thread/threat/

Heh whoops, yes, "threat" was intended. Thanks! :)

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v7 1/2] mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options
  2019-06-17 15:10 ` [PATCH v7 1/2] mm: security: introduce init_on_alloc=1 and init_on_free=1 " Alexander Potapenko
  2019-06-17 22:10   ` Andrew Morton
@ 2019-06-21  7:09   ` Michal Hocko
  2019-06-21  8:57     ` Alexander Potapenko
  2019-06-21 12:36   ` Qian Cai
  2 siblings, 1 reply; 18+ messages in thread
From: Michal Hocko @ 2019-06-21  7:09 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Andrew Morton, Christoph Lameter, Kees Cook, Masahiro Yamada,
	James Morris, Serge E. Hallyn, Nick Desaulniers,
	Kostya Serebryany, Dmitry Vyukov, Sandeep Patil, Laura Abbott,
	Randy Dunlap, Jann Horn, Mark Rutland, Marco Elver, linux-mm,
	linux-security-module, kernel-hardening

On Mon 17-06-19 17:10:49, Alexander Potapenko wrote:
> The new options are needed to prevent possible information leaks and
> make control-flow bugs that depend on uninitialized values more
> deterministic.
> 
> init_on_alloc=1 makes the kernel initialize newly allocated pages and heap
> objects with zeroes. Initialization is done at allocation time at the
> places where checks for __GFP_ZERO are performed.
> 
> init_on_free=1 makes the kernel initialize freed pages and heap objects
> with zeroes upon their deletion. This helps to ensure sensitive data
> doesn't leak via use-after-free accesses.
> 
> Both init_on_alloc=1 and init_on_free=1 guarantee that the allocator
> returns zeroed memory. The two exceptions are slab caches with
> constructors and SLAB_TYPESAFE_BY_RCU flag. Those are never
> zero-initialized to preserve their semantics.
> 
> Both init_on_alloc and init_on_free default to zero, but those defaults
> can be overridden with CONFIG_INIT_ON_ALLOC_DEFAULT_ON and
> CONFIG_INIT_ON_FREE_DEFAULT_ON.
> 
> Slowdown for the new features compared to init_on_free=0,
> init_on_alloc=0:
> 
> hackbench, init_on_free=1:  +7.62% sys time (st.err 0.74%)
> hackbench, init_on_alloc=1: +7.75% sys time (st.err 2.14%)
> 
> Linux build with -j12, init_on_free=1:  +8.38% wall time (st.err 0.39%)
> Linux build with -j12, init_on_free=1:  +24.42% sys time (st.err 0.52%)
> Linux build with -j12, init_on_alloc=1: -0.13% wall time (st.err 0.42%)
> Linux build with -j12, init_on_alloc=1: +0.57% sys time (st.err 0.40%)
> 
> The slowdown for init_on_free=0, init_on_alloc=0 compared to the
> baseline is within the standard error.
> 
> The new features are also going to pave the way for hardware memory
> tagging (e.g. arm64's MTE), which will require both on_alloc and on_free
> hooks to set the tags for heap objects. With MTE, tagging will have the
> same cost as memory initialization.
> 
> Although init_on_free is rather costly, there are paranoid use-cases where
> in-memory data lifetime is desired to be minimized. There are various
> arguments for/against the realism of the associated threat models, but
> given that we'll need the infrastructre for MTE anyway, and there are
> people who want wipe-on-free behavior no matter what the performance cost,
> it seems reasonable to include it in this series.

Thanks for reworking the original implemenation. This looks much better!

> Signed-off-by: Alexander Potapenko <glider@google.com>
> Acked-by: Kees Cook <keescook@chromium.org>
> To: Andrew Morton <akpm@linux-foundation.org>
> To: Christoph Lameter <cl@linux.com>
> To: Kees Cook <keescook@chromium.org>
> Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: James Morris <jmorris@namei.org>
> Cc: "Serge E. Hallyn" <serge@hallyn.com>
> Cc: Nick Desaulniers <ndesaulniers@google.com>
> Cc: Kostya Serebryany <kcc@google.com>
> Cc: Dmitry Vyukov <dvyukov@google.com>
> Cc: Sandeep Patil <sspatil@android.com>
> Cc: Laura Abbott <labbott@redhat.com>
> Cc: Randy Dunlap <rdunlap@infradead.org>
> Cc: Jann Horn <jannh@google.com>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Marco Elver <elver@google.com>
> Cc: linux-mm@kvack.org
> Cc: linux-security-module@vger.kernel.org
> Cc: kernel-hardening@lists.openwall.com

Acked-by: Michal Hocko <mhocko@suse.cz> # page allocator parts.

kmalloc based parts look good to me as well but I am not sure I fill
qualified to give my ack there without much more digging and I do not
have much time for that now.

[...]
> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index fd5c95ff9251..2f75dd0d0d81 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -315,7 +315,7 @@ static struct page *kimage_alloc_pages(gfp_t gfp_mask, unsigned int order)
>  		arch_kexec_post_alloc_pages(page_address(pages), count,
>  					    gfp_mask);
>  
> -		if (gfp_mask & __GFP_ZERO)
> +		if (want_init_on_alloc(gfp_mask))
>  			for (i = 0; i < count; i++)
>  				clear_highpage(pages + i);
>  	}

I am not really sure I follow here. Why do we want to handle
want_init_on_alloc here? The allocated memory comes from the page
allocator and so it will get zeroed there. arch_kexec_post_alloc_pages
might touch the content there but is there any actual risk of any kind
of leak?

> diff --git a/mm/dmapool.c b/mm/dmapool.c
> index 8c94c89a6f7e..e164012d3491 100644
> --- a/mm/dmapool.c
> +++ b/mm/dmapool.c
> @@ -378,7 +378,7 @@ void *dma_pool_alloc(struct dma_pool *pool, gfp_t mem_flags,
>  #endif
>  	spin_unlock_irqrestore(&pool->lock, flags);
>  
> -	if (mem_flags & __GFP_ZERO)
> +	if (want_init_on_alloc(mem_flags))
>  		memset(retval, 0, pool->size);
>  
>  	return retval;

Don't you miss dma_pool_free and want_init_on_free?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v7 1/2] mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options
  2019-06-21  7:09   ` Michal Hocko
@ 2019-06-21  8:57     ` Alexander Potapenko
  2019-06-21  9:11       ` Michal Hocko
  2019-06-21 14:10       ` Alexander Potapenko
  0 siblings, 2 replies; 18+ messages in thread
From: Alexander Potapenko @ 2019-06-21  8:57 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Christoph Lameter, Kees Cook, Masahiro Yamada,
	James Morris, Serge E. Hallyn, Nick Desaulniers,
	Kostya Serebryany, Dmitry Vyukov, Sandeep Patil, Laura Abbott,
	Randy Dunlap, Jann Horn, Mark Rutland, Marco Elver,
	Linux Memory Management List, linux-security-module,
	Kernel Hardening

On Fri, Jun 21, 2019 at 9:09 AM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Mon 17-06-19 17:10:49, Alexander Potapenko wrote:
> > The new options are needed to prevent possible information leaks and
> > make control-flow bugs that depend on uninitialized values more
> > deterministic.
> >
> > init_on_alloc=1 makes the kernel initialize newly allocated pages and heap
> > objects with zeroes. Initialization is done at allocation time at the
> > places where checks for __GFP_ZERO are performed.
> >
> > init_on_free=1 makes the kernel initialize freed pages and heap objects
> > with zeroes upon their deletion. This helps to ensure sensitive data
> > doesn't leak via use-after-free accesses.
> >
> > Both init_on_alloc=1 and init_on_free=1 guarantee that the allocator
> > returns zeroed memory. The two exceptions are slab caches with
> > constructors and SLAB_TYPESAFE_BY_RCU flag. Those are never
> > zero-initialized to preserve their semantics.
> >
> > Both init_on_alloc and init_on_free default to zero, but those defaults
> > can be overridden with CONFIG_INIT_ON_ALLOC_DEFAULT_ON and
> > CONFIG_INIT_ON_FREE_DEFAULT_ON.
> >
> > Slowdown for the new features compared to init_on_free=0,
> > init_on_alloc=0:
> >
> > hackbench, init_on_free=1:  +7.62% sys time (st.err 0.74%)
> > hackbench, init_on_alloc=1: +7.75% sys time (st.err 2.14%)
> >
> > Linux build with -j12, init_on_free=1:  +8.38% wall time (st.err 0.39%)
> > Linux build with -j12, init_on_free=1:  +24.42% sys time (st.err 0.52%)
> > Linux build with -j12, init_on_alloc=1: -0.13% wall time (st.err 0.42%)
> > Linux build with -j12, init_on_alloc=1: +0.57% sys time (st.err 0.40%)
> >
> > The slowdown for init_on_free=0, init_on_alloc=0 compared to the
> > baseline is within the standard error.
> >
> > The new features are also going to pave the way for hardware memory
> > tagging (e.g. arm64's MTE), which will require both on_alloc and on_free
> > hooks to set the tags for heap objects. With MTE, tagging will have the
> > same cost as memory initialization.
> >
> > Although init_on_free is rather costly, there are paranoid use-cases where
> > in-memory data lifetime is desired to be minimized. There are various
> > arguments for/against the realism of the associated threat models, but
> > given that we'll need the infrastructre for MTE anyway, and there are
> > people who want wipe-on-free behavior no matter what the performance cost,
> > it seems reasonable to include it in this series.
>
> Thanks for reworking the original implemenation. This looks much better!
>
> > Signed-off-by: Alexander Potapenko <glider@google.com>
> > Acked-by: Kees Cook <keescook@chromium.org>
> > To: Andrew Morton <akpm@linux-foundation.org>
> > To: Christoph Lameter <cl@linux.com>
> > To: Kees Cook <keescook@chromium.org>
> > Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
> > Cc: Michal Hocko <mhocko@kernel.org>
> > Cc: James Morris <jmorris@namei.org>
> > Cc: "Serge E. Hallyn" <serge@hallyn.com>
> > Cc: Nick Desaulniers <ndesaulniers@google.com>
> > Cc: Kostya Serebryany <kcc@google.com>
> > Cc: Dmitry Vyukov <dvyukov@google.com>
> > Cc: Sandeep Patil <sspatil@android.com>
> > Cc: Laura Abbott <labbott@redhat.com>
> > Cc: Randy Dunlap <rdunlap@infradead.org>
> > Cc: Jann Horn <jannh@google.com>
> > Cc: Mark Rutland <mark.rutland@arm.com>
> > Cc: Marco Elver <elver@google.com>
> > Cc: linux-mm@kvack.org
> > Cc: linux-security-module@vger.kernel.org
> > Cc: kernel-hardening@lists.openwall.com
>
> Acked-by: Michal Hocko <mhocko@suse.cz> # page allocator parts.
>
> kmalloc based parts look good to me as well but I am not sure I fill
> qualified to give my ack there without much more digging and I do not
> have much time for that now.
>
> [...]
> > diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> > index fd5c95ff9251..2f75dd0d0d81 100644
> > --- a/kernel/kexec_core.c
> > +++ b/kernel/kexec_core.c
> > @@ -315,7 +315,7 @@ static struct page *kimage_alloc_pages(gfp_t gfp_mask, unsigned int order)
> >               arch_kexec_post_alloc_pages(page_address(pages), count,
> >                                           gfp_mask);
> >
> > -             if (gfp_mask & __GFP_ZERO)
> > +             if (want_init_on_alloc(gfp_mask))
> >                       for (i = 0; i < count; i++)
> >                               clear_highpage(pages + i);
> >       }
>
> I am not really sure I follow here. Why do we want to handle
> want_init_on_alloc here? The allocated memory comes from the page
> allocator and so it will get zeroed there. arch_kexec_post_alloc_pages
> might touch the content there but is there any actual risk of any kind
> of leak?
You're right, we don't want to initialize this memory if init_on_alloc is on.
We need something along the lines of:
  if (!static_branch_unlikely(&init_on_alloc))
    if (gfp_mask & __GFP_ZERO)
      // clear the pages

Another option would be to disable initialization in alloc_pages() using a flag.
>
> > diff --git a/mm/dmapool.c b/mm/dmapool.c
> > index 8c94c89a6f7e..e164012d3491 100644
> > --- a/mm/dmapool.c
> > +++ b/mm/dmapool.c
> > @@ -378,7 +378,7 @@ void *dma_pool_alloc(struct dma_pool *pool, gfp_t mem_flags,
> >  #endif
> >       spin_unlock_irqrestore(&pool->lock, flags);
> >
> > -     if (mem_flags & __GFP_ZERO)
> > +     if (want_init_on_alloc(mem_flags))
> >               memset(retval, 0, pool->size);
> >
> >       return retval;
>
> Don't you miss dma_pool_free and want_init_on_free?
Agreed.
I'll fix this and add tests for DMA pools as well.
> --
> Michal Hocko
> SUSE Labs



-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v7 1/2] mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options
  2019-06-21  8:57     ` Alexander Potapenko
@ 2019-06-21  9:11       ` Michal Hocko
  2019-06-21  9:18         ` Alexander Potapenko
  2019-06-21 14:10       ` Alexander Potapenko
  1 sibling, 1 reply; 18+ messages in thread
From: Michal Hocko @ 2019-06-21  9:11 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Andrew Morton, Christoph Lameter, Kees Cook, Masahiro Yamada,
	James Morris, Serge E. Hallyn, Nick Desaulniers,
	Kostya Serebryany, Dmitry Vyukov, Sandeep Patil, Laura Abbott,
	Randy Dunlap, Jann Horn, Mark Rutland, Marco Elver,
	Linux Memory Management List, linux-security-module,
	Kernel Hardening

On Fri 21-06-19 10:57:35, Alexander Potapenko wrote:
> On Fri, Jun 21, 2019 at 9:09 AM Michal Hocko <mhocko@kernel.org> wrote:
[...]
> > > diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> > > index fd5c95ff9251..2f75dd0d0d81 100644
> > > --- a/kernel/kexec_core.c
> > > +++ b/kernel/kexec_core.c
> > > @@ -315,7 +315,7 @@ static struct page *kimage_alloc_pages(gfp_t gfp_mask, unsigned int order)
> > >               arch_kexec_post_alloc_pages(page_address(pages), count,
> > >                                           gfp_mask);
> > >
> > > -             if (gfp_mask & __GFP_ZERO)
> > > +             if (want_init_on_alloc(gfp_mask))
> > >                       for (i = 0; i < count; i++)
> > >                               clear_highpage(pages + i);
> > >       }
> >
> > I am not really sure I follow here. Why do we want to handle
> > want_init_on_alloc here? The allocated memory comes from the page
> > allocator and so it will get zeroed there. arch_kexec_post_alloc_pages
> > might touch the content there but is there any actual risk of any kind
> > of leak?
> You're right, we don't want to initialize this memory if init_on_alloc is on.
> We need something along the lines of:
>   if (!static_branch_unlikely(&init_on_alloc))
>     if (gfp_mask & __GFP_ZERO)
>       // clear the pages
> 
> Another option would be to disable initialization in alloc_pages() using a flag.

Or we can simply not care and keen the code the way it is. First of all
it seems that nobody actually does use __GFP_ZERO unless I have missed
soemthing
	- kimage_alloc_pages(KEXEC_CONTROL_MEMORY_GFP, order); # GFP_KERNEL | __GFP_NORETRY
	- kimage_alloc_pages(gfp_mask, 0);
		- kimage_alloc_page(image, GFP_KERNEL, KIMAGE_NO_DEST);
		- kimage_alloc_page(image, GFP_HIGHUSER, maddr);

but even if we actually had a user do we care about double intialization
for something kexec related? It is not any hot path AFAIR.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v7 1/2] mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options
  2019-06-21  9:11       ` Michal Hocko
@ 2019-06-21  9:18         ` Alexander Potapenko
  0 siblings, 0 replies; 18+ messages in thread
From: Alexander Potapenko @ 2019-06-21  9:18 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Christoph Lameter, Kees Cook, Masahiro Yamada,
	James Morris, Serge E. Hallyn, Nick Desaulniers,
	Kostya Serebryany, Dmitry Vyukov, Sandeep Patil, Laura Abbott,
	Randy Dunlap, Jann Horn, Mark Rutland, Marco Elver,
	Linux Memory Management List, linux-security-module,
	Kernel Hardening

On Fri, Jun 21, 2019 at 11:12 AM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Fri 21-06-19 10:57:35, Alexander Potapenko wrote:
> > On Fri, Jun 21, 2019 at 9:09 AM Michal Hocko <mhocko@kernel.org> wrote:
> [...]
> > > > diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> > > > index fd5c95ff9251..2f75dd0d0d81 100644
> > > > --- a/kernel/kexec_core.c
> > > > +++ b/kernel/kexec_core.c
> > > > @@ -315,7 +315,7 @@ static struct page *kimage_alloc_pages(gfp_t gfp_mask, unsigned int order)
> > > >               arch_kexec_post_alloc_pages(page_address(pages), count,
> > > >                                           gfp_mask);
> > > >
> > > > -             if (gfp_mask & __GFP_ZERO)
> > > > +             if (want_init_on_alloc(gfp_mask))
> > > >                       for (i = 0; i < count; i++)
> > > >                               clear_highpage(pages + i);
> > > >       }
> > >
> > > I am not really sure I follow here. Why do we want to handle
> > > want_init_on_alloc here? The allocated memory comes from the page
> > > allocator and so it will get zeroed there. arch_kexec_post_alloc_pages
> > > might touch the content there but is there any actual risk of any kind
> > > of leak?
> > You're right, we don't want to initialize this memory if init_on_alloc is on.
> > We need something along the lines of:
> >   if (!static_branch_unlikely(&init_on_alloc))
> >     if (gfp_mask & __GFP_ZERO)
> >       // clear the pages
> >
> > Another option would be to disable initialization in alloc_pages() using a flag.
>
> Or we can simply not care and keen the code the way it is. First of all
> it seems that nobody actually does use __GFP_ZERO unless I have missed
> soemthing
>         - kimage_alloc_pages(KEXEC_CONTROL_MEMORY_GFP, order); # GFP_KERNEL | __GFP_NORETRY
>         - kimage_alloc_pages(gfp_mask, 0);
>                 - kimage_alloc_page(image, GFP_KERNEL, KIMAGE_NO_DEST);
>                 - kimage_alloc_page(image, GFP_HIGHUSER, maddr);
>
> but even if we actually had a user do we care about double intialization
> for something kexec related? It is not any hot path AFAIR.
Yes, sounds good. Spraying the code with too many checks for
init_on_alloc doesn't really look nice.

> --
> Michal Hocko
> SUSE Labs



-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v7 1/2] mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options
  2019-06-17 15:10 ` [PATCH v7 1/2] mm: security: introduce init_on_alloc=1 and init_on_free=1 " Alexander Potapenko
  2019-06-17 22:10   ` Andrew Morton
  2019-06-21  7:09   ` Michal Hocko
@ 2019-06-21 12:36   ` Qian Cai
  2019-06-21 13:31     ` Alexander Potapenko
  2 siblings, 1 reply; 18+ messages in thread
From: Qian Cai @ 2019-06-21 12:36 UTC (permalink / raw)
  To: Alexander Potapenko, Andrew Morton, Christoph Lameter, Kees Cook
  Cc: Masahiro Yamada, Michal Hocko, James Morris, Serge E. Hallyn,
	Nick Desaulniers, Kostya Serebryany, Dmitry Vyukov,
	Sandeep Patil, Laura Abbott, Randy Dunlap, Jann Horn,
	Mark Rutland, Marco Elver, linux-mm, linux-security-module,
	kernel-hardening

On Mon, 2019-06-17 at 17:10 +0200, Alexander Potapenko wrote:
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index d66bc8abe0af..50a3b104a491 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -136,6 +136,48 @@ unsigned long totalcma_pages __read_mostly;
>  
>  int percpu_pagelist_fraction;
>  gfp_t gfp_allowed_mask __read_mostly = GFP_BOOT_MASK;
> +#ifdef CONFIG_INIT_ON_ALLOC_DEFAULT_ON
> +DEFINE_STATIC_KEY_TRUE(init_on_alloc);
> +#else
> +DEFINE_STATIC_KEY_FALSE(init_on_alloc);
> +#endif
> +#ifdef CONFIG_INIT_ON_FREE_DEFAULT_ON
> +DEFINE_STATIC_KEY_TRUE(init_on_free);
> +#else
> +DEFINE_STATIC_KEY_FALSE(init_on_free);
> +#endif
> +

There is a problem here running kernels built with clang,

[    0.000000] static_key_disable(): static key 'init_on_free+0x0/0x4' used
before call to jump_label_init()
[    0.000000] WARNING: CPU: 0 PID: 0 at ./include/linux/jump_label.h:314
early_init_on_free+0x1c0/0x200
[    0.000000] Modules linked in:
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.2.0-rc5-next-20190620+
#11
[    0.000000] pstate: 60000089 (nZCv daIf -PAN -UAO)
[    0.000000] pc : early_init_on_free+0x1c0/0x200
[    0.000000] lr : early_init_on_free+0x1c0/0x200
[    0.000000] sp : ffff100012c07df0
[    0.000000] x29: ffff100012c07e20 x28: ffff1000110a01ec 
[    0.000000] x27: 0000000000000001 x26: ffff100011716f88 
[    0.000000] x25: ffff100010d367ae x24: ffff100010d367a5 
[    0.000000] x23: ffff100010d36afd x22: ffff100011716758 
[    0.000000] x21: 0000000000000000 x20: 0000000000000000 
[    0.000000] x19: 0000000000000000 x18: 000000000000002e 
[    0.000000] x17: 000000000000000f x16: 0000000000000040 
[    0.000000] x15: 0000000000000000 x14: 6d756a206f74206c 
[    0.000000] x13: 6c61632065726f66 x12: 6562206465737520 
[    0.000000] x11: 0000000000000000 x10: 0000000000000000 
[    0.000000] x9 : 0000000000000000 x8 : 0000000000000000 
[    0.000000] x7 : 73203a2928656c62 x6 : ffff1000144367ad 
[    0.000000] x5 : ffff100012c07b28 x4 : 000000000000000f 
[    0.000000] x3 : ffff1000101b36ec x2 : 0000000000000001 
[    0.000000] x1 : 0000000000000001 x0 : 000000000000005d 
[    0.000000] Call trace:
[    0.000000]  early_init_on_free+0x1c0/0x200
[    0.000000]  do_early_param+0xd0/0x104
[    0.000000]  parse_args+0x204/0x54c
[    0.000000]  parse_early_param+0x70/0x8c
[    0.000000]  setup_arch+0xa8/0x268
[    0.000000]  start_kernel+0x80/0x588
[    0.000000] random: get_random_bytes called from __warn+0x164/0x208 with
crng_init=0

> diff --git a/mm/slub.c b/mm/slub.c
> index cd04dbd2b5d0..9c4a8b9a955c 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1279,6 +1279,12 @@ static int __init setup_slub_debug(char *str)
>  	if (*str == ',')
>  		slub_debug_slabs = str + 1;
>  out:
> +	if ((static_branch_unlikely(&init_on_alloc) ||
> +	     static_branch_unlikely(&init_on_free)) &&
> +	    (slub_debug & SLAB_POISON)) {
> +		pr_warn("disabling SLAB_POISON: can't be used together with
> memory auto-initialization\n");
> +		slub_debug &= ~SLAB_POISON;
> +	}
>  	return 1;
>  }

I don't think it is good idea to disable SLAB_POISON here as if people have
decided to enable SLUB_DEBUG later already, they probably care more to make sure
those additional checks with SLAB_POISON are still running to catch memory
corruption.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v7 1/2] mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options
  2019-06-21 12:36   ` Qian Cai
@ 2019-06-21 13:31     ` Alexander Potapenko
  2019-06-21 13:36       ` Qian Cai
  0 siblings, 1 reply; 18+ messages in thread
From: Alexander Potapenko @ 2019-06-21 13:31 UTC (permalink / raw)
  To: Qian Cai
  Cc: Andrew Morton, Christoph Lameter, Kees Cook, Masahiro Yamada,
	Michal Hocko, James Morris, Serge E. Hallyn, Nick Desaulniers,
	Kostya Serebryany, Dmitry Vyukov, Sandeep Patil, Laura Abbott,
	Randy Dunlap, Jann Horn, Mark Rutland, Marco Elver,
	Linux Memory Management List, linux-security-module,
	Kernel Hardening

On Fri, Jun 21, 2019 at 2:36 PM Qian Cai <cai@lca.pw> wrote:
>
> On Mon, 2019-06-17 at 17:10 +0200, Alexander Potapenko wrote:
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index d66bc8abe0af..50a3b104a491 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -136,6 +136,48 @@ unsigned long totalcma_pages __read_mostly;
> >
> >  int percpu_pagelist_fraction;
> >  gfp_t gfp_allowed_mask __read_mostly = GFP_BOOT_MASK;
> > +#ifdef CONFIG_INIT_ON_ALLOC_DEFAULT_ON
> > +DEFINE_STATIC_KEY_TRUE(init_on_alloc);
> > +#else
> > +DEFINE_STATIC_KEY_FALSE(init_on_alloc);
> > +#endif
> > +#ifdef CONFIG_INIT_ON_FREE_DEFAULT_ON
> > +DEFINE_STATIC_KEY_TRUE(init_on_free);
> > +#else
> > +DEFINE_STATIC_KEY_FALSE(init_on_free);
> > +#endif
> > +
>
> There is a problem here running kernels built with clang,
>
> [    0.000000] static_key_disable(): static key 'init_on_free+0x0/0x4' used
> before call to jump_label_init()
> [    0.000000] WARNING: CPU: 0 PID: 0 at ./include/linux/jump_label.h:314
> early_init_on_free+0x1c0/0x200
> [    0.000000] Modules linked in:
> [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.2.0-rc5-next-20190620+
> #11
> [    0.000000] pstate: 60000089 (nZCv daIf -PAN -UAO)
> [    0.000000] pc : early_init_on_free+0x1c0/0x200
> [    0.000000] lr : early_init_on_free+0x1c0/0x200
> [    0.000000] sp : ffff100012c07df0
> [    0.000000] x29: ffff100012c07e20 x28: ffff1000110a01ec
> [    0.000000] x27: 0000000000000001 x26: ffff100011716f88
> [    0.000000] x25: ffff100010d367ae x24: ffff100010d367a5
> [    0.000000] x23: ffff100010d36afd x22: ffff100011716758
> [    0.000000] x21: 0000000000000000 x20: 0000000000000000
> [    0.000000] x19: 0000000000000000 x18: 000000000000002e
> [    0.000000] x17: 000000000000000f x16: 0000000000000040
> [    0.000000] x15: 0000000000000000 x14: 6d756a206f74206c
> [    0.000000] x13: 6c61632065726f66 x12: 6562206465737520
> [    0.000000] x11: 0000000000000000 x10: 0000000000000000
> [    0.000000] x9 : 0000000000000000 x8 : 0000000000000000
> [    0.000000] x7 : 73203a2928656c62 x6 : ffff1000144367ad
> [    0.000000] x5 : ffff100012c07b28 x4 : 000000000000000f
> [    0.000000] x3 : ffff1000101b36ec x2 : 0000000000000001
> [    0.000000] x1 : 0000000000000001 x0 : 000000000000005d
> [    0.000000] Call trace:
> [    0.000000]  early_init_on_free+0x1c0/0x200
> [    0.000000]  do_early_param+0xd0/0x104
> [    0.000000]  parse_args+0x204/0x54c
> [    0.000000]  parse_early_param+0x70/0x8c
> [    0.000000]  setup_arch+0xa8/0x268
> [    0.000000]  start_kernel+0x80/0x588
> [    0.000000] random: get_random_bytes called from __warn+0x164/0x208 with
> crng_init=0
>
> > diff --git a/mm/slub.c b/mm/slub.c
> > index cd04dbd2b5d0..9c4a8b9a955c 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -1279,6 +1279,12 @@ static int __init setup_slub_debug(char *str)
> >       if (*str == ',')
> >               slub_debug_slabs = str + 1;
> >  out:
> > +     if ((static_branch_unlikely(&init_on_alloc) ||
> > +          static_branch_unlikely(&init_on_free)) &&
> > +         (slub_debug & SLAB_POISON)) {
> > +             pr_warn("disabling SLAB_POISON: can't be used together with
> > memory auto-initialization\n");
> > +             slub_debug &= ~SLAB_POISON;
> > +     }
> >       return 1;
> >  }
>
> I don't think it is good idea to disable SLAB_POISON here as if people have
> decided to enable SLUB_DEBUG later already, they probably care more to make sure
> those additional checks with SLAB_POISON are still running to catch memory
> corruption.
The problem is that freed buffers can't be both poisoned and zeroed at
the same time.
Do you think we need to disable memory initialization in that case instead?


-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v7 1/2] mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options
  2019-06-21 13:31     ` Alexander Potapenko
@ 2019-06-21 13:36       ` Qian Cai
  0 siblings, 0 replies; 18+ messages in thread
From: Qian Cai @ 2019-06-21 13:36 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Andrew Morton, Christoph Lameter, Kees Cook, Masahiro Yamada,
	Michal Hocko, James Morris, Serge E. Hallyn, Nick Desaulniers,
	Kostya Serebryany, Dmitry Vyukov, Sandeep Patil, Laura Abbott,
	Randy Dunlap, Jann Horn, Mark Rutland, Marco Elver,
	Linux Memory Management List, linux-security-module,
	Kernel Hardening

On Fri, 2019-06-21 at 15:31 +0200, Alexander Potapenko wrote:
> On Fri, Jun 21, 2019 at 2:36 PM Qian Cai <cai@lca.pw> wrote:
> > 
> > On Mon, 2019-06-17 at 17:10 +0200, Alexander Potapenko wrote:
> > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > index d66bc8abe0af..50a3b104a491 100644
> > > --- a/mm/page_alloc.c
> > > +++ b/mm/page_alloc.c
> > > @@ -136,6 +136,48 @@ unsigned long totalcma_pages __read_mostly;
> > > 
> > >  int percpu_pagelist_fraction;
> > >  gfp_t gfp_allowed_mask __read_mostly = GFP_BOOT_MASK;
> > > +#ifdef CONFIG_INIT_ON_ALLOC_DEFAULT_ON
> > > +DEFINE_STATIC_KEY_TRUE(init_on_alloc);
> > > +#else
> > > +DEFINE_STATIC_KEY_FALSE(init_on_alloc);
> > > +#endif
> > > +#ifdef CONFIG_INIT_ON_FREE_DEFAULT_ON
> > > +DEFINE_STATIC_KEY_TRUE(init_on_free);
> > > +#else
> > > +DEFINE_STATIC_KEY_FALSE(init_on_free);
> > > +#endif
> > > +
> > 
> > There is a problem here running kernels built with clang,
> > 
> > [    0.000000] static_key_disable(): static key 'init_on_free+0x0/0x4' used
> > before call to jump_label_init()
> > [    0.000000] WARNING: CPU: 0 PID: 0 at ./include/linux/jump_label.h:314
> > early_init_on_free+0x1c0/0x200
> > [    0.000000] Modules linked in:
> > [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.2.0-rc5-next-
> > 20190620+
> > #11
> > [    0.000000] pstate: 60000089 (nZCv daIf -PAN -UAO)
> > [    0.000000] pc : early_init_on_free+0x1c0/0x200
> > [    0.000000] lr : early_init_on_free+0x1c0/0x200
> > [    0.000000] sp : ffff100012c07df0
> > [    0.000000] x29: ffff100012c07e20 x28: ffff1000110a01ec
> > [    0.000000] x27: 0000000000000001 x26: ffff100011716f88
> > [    0.000000] x25: ffff100010d367ae x24: ffff100010d367a5
> > [    0.000000] x23: ffff100010d36afd x22: ffff100011716758
> > [    0.000000] x21: 0000000000000000 x20: 0000000000000000
> > [    0.000000] x19: 0000000000000000 x18: 000000000000002e
> > [    0.000000] x17: 000000000000000f x16: 0000000000000040
> > [    0.000000] x15: 0000000000000000 x14: 6d756a206f74206c
> > [    0.000000] x13: 6c61632065726f66 x12: 6562206465737520
> > [    0.000000] x11: 0000000000000000 x10: 0000000000000000
> > [    0.000000] x9 : 0000000000000000 x8 : 0000000000000000
> > [    0.000000] x7 : 73203a2928656c62 x6 : ffff1000144367ad
> > [    0.000000] x5 : ffff100012c07b28 x4 : 000000000000000f
> > [    0.000000] x3 : ffff1000101b36ec x2 : 0000000000000001
> > [    0.000000] x1 : 0000000000000001 x0 : 000000000000005d
> > [    0.000000] Call trace:
> > [    0.000000]  early_init_on_free+0x1c0/0x200
> > [    0.000000]  do_early_param+0xd0/0x104
> > [    0.000000]  parse_args+0x204/0x54c
> > [    0.000000]  parse_early_param+0x70/0x8c
> > [    0.000000]  setup_arch+0xa8/0x268
> > [    0.000000]  start_kernel+0x80/0x588
> > [    0.000000] random: get_random_bytes called from __warn+0x164/0x208 with
> > crng_init=0
> > 
> > > diff --git a/mm/slub.c b/mm/slub.c
> > > index cd04dbd2b5d0..9c4a8b9a955c 100644
> > > --- a/mm/slub.c
> > > +++ b/mm/slub.c
> > > @@ -1279,6 +1279,12 @@ static int __init setup_slub_debug(char *str)
> > >       if (*str == ',')
> > >               slub_debug_slabs = str + 1;
> > >  out:
> > > +     if ((static_branch_unlikely(&init_on_alloc) ||
> > > +          static_branch_unlikely(&init_on_free)) &&
> > > +         (slub_debug & SLAB_POISON)) {
> > > +             pr_warn("disabling SLAB_POISON: can't be used together with
> > > memory auto-initialization\n");
> > > +             slub_debug &= ~SLAB_POISON;
> > > +     }
> > >       return 1;
> > >  }
> > 
> > I don't think it is good idea to disable SLAB_POISON here as if people have
> > decided to enable SLUB_DEBUG later already, they probably care more to make
> > sure
> > those additional checks with SLAB_POISON are still running to catch memory
> > corruption.
> 
> The problem is that freed buffers can't be both poisoned and zeroed at
> the same time.
> Do you think we need to disable memory initialization in that case instead?

Yes, disable init_on_free|alloc and keep SLAB_POISON sounds good to me.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v7 1/2] mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options
  2019-06-21  8:57     ` Alexander Potapenko
  2019-06-21  9:11       ` Michal Hocko
@ 2019-06-21 14:10       ` Alexander Potapenko
  2019-06-21 15:12         ` Michal Hocko
  1 sibling, 1 reply; 18+ messages in thread
From: Alexander Potapenko @ 2019-06-21 14:10 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Christoph Lameter, Kees Cook, Masahiro Yamada,
	James Morris, Serge E. Hallyn, Nick Desaulniers,
	Kostya Serebryany, Dmitry Vyukov, Sandeep Patil, Laura Abbott,
	Randy Dunlap, Jann Horn, Mark Rutland, Marco Elver,
	Linux Memory Management List, linux-security-module,
	Kernel Hardening

On Fri, Jun 21, 2019 at 10:57 AM Alexander Potapenko <glider@google.com> wrote:
>
> On Fri, Jun 21, 2019 at 9:09 AM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > On Mon 17-06-19 17:10:49, Alexander Potapenko wrote:
> > > The new options are needed to prevent possible information leaks and
> > > make control-flow bugs that depend on uninitialized values more
> > > deterministic.
> > >
> > > init_on_alloc=1 makes the kernel initialize newly allocated pages and heap
> > > objects with zeroes. Initialization is done at allocation time at the
> > > places where checks for __GFP_ZERO are performed.
> > >
> > > init_on_free=1 makes the kernel initialize freed pages and heap objects
> > > with zeroes upon their deletion. This helps to ensure sensitive data
> > > doesn't leak via use-after-free accesses.
> > >
> > > Both init_on_alloc=1 and init_on_free=1 guarantee that the allocator
> > > returns zeroed memory. The two exceptions are slab caches with
> > > constructors and SLAB_TYPESAFE_BY_RCU flag. Those are never
> > > zero-initialized to preserve their semantics.
> > >
> > > Both init_on_alloc and init_on_free default to zero, but those defaults
> > > can be overridden with CONFIG_INIT_ON_ALLOC_DEFAULT_ON and
> > > CONFIG_INIT_ON_FREE_DEFAULT_ON.
> > >
> > > Slowdown for the new features compared to init_on_free=0,
> > > init_on_alloc=0:
> > >
> > > hackbench, init_on_free=1:  +7.62% sys time (st.err 0.74%)
> > > hackbench, init_on_alloc=1: +7.75% sys time (st.err 2.14%)
> > >
> > > Linux build with -j12, init_on_free=1:  +8.38% wall time (st.err 0.39%)
> > > Linux build with -j12, init_on_free=1:  +24.42% sys time (st.err 0.52%)
> > > Linux build with -j12, init_on_alloc=1: -0.13% wall time (st.err 0.42%)
> > > Linux build with -j12, init_on_alloc=1: +0.57% sys time (st.err 0.40%)
> > >
> > > The slowdown for init_on_free=0, init_on_alloc=0 compared to the
> > > baseline is within the standard error.
> > >
> > > The new features are also going to pave the way for hardware memory
> > > tagging (e.g. arm64's MTE), which will require both on_alloc and on_free
> > > hooks to set the tags for heap objects. With MTE, tagging will have the
> > > same cost as memory initialization.
> > >
> > > Although init_on_free is rather costly, there are paranoid use-cases where
> > > in-memory data lifetime is desired to be minimized. There are various
> > > arguments for/against the realism of the associated threat models, but
> > > given that we'll need the infrastructre for MTE anyway, and there are
> > > people who want wipe-on-free behavior no matter what the performance cost,
> > > it seems reasonable to include it in this series.
> >
> > Thanks for reworking the original implemenation. This looks much better!
> >
> > > Signed-off-by: Alexander Potapenko <glider@google.com>
> > > Acked-by: Kees Cook <keescook@chromium.org>
> > > To: Andrew Morton <akpm@linux-foundation.org>
> > > To: Christoph Lameter <cl@linux.com>
> > > To: Kees Cook <keescook@chromium.org>
> > > Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
> > > Cc: Michal Hocko <mhocko@kernel.org>
> > > Cc: James Morris <jmorris@namei.org>
> > > Cc: "Serge E. Hallyn" <serge@hallyn.com>
> > > Cc: Nick Desaulniers <ndesaulniers@google.com>
> > > Cc: Kostya Serebryany <kcc@google.com>
> > > Cc: Dmitry Vyukov <dvyukov@google.com>
> > > Cc: Sandeep Patil <sspatil@android.com>
> > > Cc: Laura Abbott <labbott@redhat.com>
> > > Cc: Randy Dunlap <rdunlap@infradead.org>
> > > Cc: Jann Horn <jannh@google.com>
> > > Cc: Mark Rutland <mark.rutland@arm.com>
> > > Cc: Marco Elver <elver@google.com>
> > > Cc: linux-mm@kvack.org
> > > Cc: linux-security-module@vger.kernel.org
> > > Cc: kernel-hardening@lists.openwall.com
> >
> > Acked-by: Michal Hocko <mhocko@suse.cz> # page allocator parts.
> >
> > kmalloc based parts look good to me as well but I am not sure I fill
> > qualified to give my ack there without much more digging and I do not
> > have much time for that now.
> >
> > [...]
> > > diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> > > index fd5c95ff9251..2f75dd0d0d81 100644
> > > --- a/kernel/kexec_core.c
> > > +++ b/kernel/kexec_core.c
> > > @@ -315,7 +315,7 @@ static struct page *kimage_alloc_pages(gfp_t gfp_mask, unsigned int order)
> > >               arch_kexec_post_alloc_pages(page_address(pages), count,
> > >                                           gfp_mask);
> > >
> > > -             if (gfp_mask & __GFP_ZERO)
> > > +             if (want_init_on_alloc(gfp_mask))
> > >                       for (i = 0; i < count; i++)
> > >                               clear_highpage(pages + i);
> > >       }
> >
> > I am not really sure I follow here. Why do we want to handle
> > want_init_on_alloc here? The allocated memory comes from the page
> > allocator and so it will get zeroed there. arch_kexec_post_alloc_pages
> > might touch the content there but is there any actual risk of any kind
> > of leak?
> You're right, we don't want to initialize this memory if init_on_alloc is on.
> We need something along the lines of:
>   if (!static_branch_unlikely(&init_on_alloc))
>     if (gfp_mask & __GFP_ZERO)
>       // clear the pages
>
> Another option would be to disable initialization in alloc_pages() using a flag.
> >
> > > diff --git a/mm/dmapool.c b/mm/dmapool.c
> > > index 8c94c89a6f7e..e164012d3491 100644
> > > --- a/mm/dmapool.c
> > > +++ b/mm/dmapool.c
> > > @@ -378,7 +378,7 @@ void *dma_pool_alloc(struct dma_pool *pool, gfp_t mem_flags,
> > >  #endif
> > >       spin_unlock_irqrestore(&pool->lock, flags);
> > >
> > > -     if (mem_flags & __GFP_ZERO)
> > > +     if (want_init_on_alloc(mem_flags))
> > >               memset(retval, 0, pool->size);
> > >
> > >       return retval;
> >
> > Don't you miss dma_pool_free and want_init_on_free?
> Agreed.
> I'll fix this and add tests for DMA pools as well.
This doesn't seem to be easy though. One needs a real DMA-capable
device to allocate using DMA pools.
On the other hand, what happens to a DMA pool when it's destroyed,
isn't it wiped by pagealloc?

I'm inclined towards not touching mm/dmapool.c in this patch series,
as it is probably orthogonal to the idea of hardening the
heap/pagealloc.
> > --
> > Michal Hocko
> > SUSE Labs
>
>
>
> --
> Alexander Potapenko
> Software Engineer
>
> Google Germany GmbH
> Erika-Mann-Straße, 33
> 80636 München
>
> Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
> Registergericht und -nummer: Hamburg, HRB 86891
> Sitz der Gesellschaft: Hamburg



-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v7 1/2] mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options
  2019-06-21 14:10       ` Alexander Potapenko
@ 2019-06-21 15:12         ` Michal Hocko
  2019-06-21 15:24           ` Alexander Potapenko
  0 siblings, 1 reply; 18+ messages in thread
From: Michal Hocko @ 2019-06-21 15:12 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Andrew Morton, Christoph Lameter, Kees Cook, Masahiro Yamada,
	James Morris, Serge E. Hallyn, Nick Desaulniers,
	Kostya Serebryany, Dmitry Vyukov, Sandeep Patil, Laura Abbott,
	Randy Dunlap, Jann Horn, Mark Rutland, Marco Elver,
	Linux Memory Management List, linux-security-module,
	Kernel Hardening

On Fri 21-06-19 16:10:19, Alexander Potapenko wrote:
> On Fri, Jun 21, 2019 at 10:57 AM Alexander Potapenko <glider@google.com> wrote:
[...]
> > > > diff --git a/mm/dmapool.c b/mm/dmapool.c
> > > > index 8c94c89a6f7e..e164012d3491 100644
> > > > --- a/mm/dmapool.c
> > > > +++ b/mm/dmapool.c
> > > > @@ -378,7 +378,7 @@ void *dma_pool_alloc(struct dma_pool *pool, gfp_t mem_flags,
> > > >  #endif
> > > >       spin_unlock_irqrestore(&pool->lock, flags);
> > > >
> > > > -     if (mem_flags & __GFP_ZERO)
> > > > +     if (want_init_on_alloc(mem_flags))
> > > >               memset(retval, 0, pool->size);
> > > >
> > > >       return retval;
> > >
> > > Don't you miss dma_pool_free and want_init_on_free?
> > Agreed.
> > I'll fix this and add tests for DMA pools as well.
> This doesn't seem to be easy though. One needs a real DMA-capable
> device to allocate using DMA pools.
> On the other hand, what happens to a DMA pool when it's destroyed,
> isn't it wiped by pagealloc?

Yes it should be returned to the page allocator AFAIR. But it is when we
are returning an object to the pool when you want to wipe the data, no?
Why cannot you do it along the already existing poisoning?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v7 1/2] mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options
  2019-06-21 15:12         ` Michal Hocko
@ 2019-06-21 15:24           ` Alexander Potapenko
  2019-06-21 15:54             ` Michal Hocko
  0 siblings, 1 reply; 18+ messages in thread
From: Alexander Potapenko @ 2019-06-21 15:24 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Christoph Lameter, Kees Cook, Masahiro Yamada,
	James Morris, Serge E. Hallyn, Nick Desaulniers,
	Kostya Serebryany, Dmitry Vyukov, Sandeep Patil, Laura Abbott,
	Randy Dunlap, Jann Horn, Mark Rutland, Marco Elver,
	Linux Memory Management List, linux-security-module,
	Kernel Hardening

On Fri, Jun 21, 2019 at 5:12 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Fri 21-06-19 16:10:19, Alexander Potapenko wrote:
> > On Fri, Jun 21, 2019 at 10:57 AM Alexander Potapenko <glider@google.com> wrote:
> [...]
> > > > > diff --git a/mm/dmapool.c b/mm/dmapool.c
> > > > > index 8c94c89a6f7e..e164012d3491 100644
> > > > > --- a/mm/dmapool.c
> > > > > +++ b/mm/dmapool.c
> > > > > @@ -378,7 +378,7 @@ void *dma_pool_alloc(struct dma_pool *pool, gfp_t mem_flags,
> > > > >  #endif
> > > > >       spin_unlock_irqrestore(&pool->lock, flags);
> > > > >
> > > > > -     if (mem_flags & __GFP_ZERO)
> > > > > +     if (want_init_on_alloc(mem_flags))
> > > > >               memset(retval, 0, pool->size);
> > > > >
> > > > >       return retval;
> > > >
> > > > Don't you miss dma_pool_free and want_init_on_free?
> > > Agreed.
> > > I'll fix this and add tests for DMA pools as well.
> > This doesn't seem to be easy though. One needs a real DMA-capable
> > device to allocate using DMA pools.
> > On the other hand, what happens to a DMA pool when it's destroyed,
> > isn't it wiped by pagealloc?
>
> Yes it should be returned to the page allocator AFAIR. But it is when we
> are returning an object to the pool when you want to wipe the data, no?
My concern was that dma allocation is something orthogonal to heap and
page allocator.
I also don't know how many other allocators are left overboard, e.g.
we don't do anything to lib/genalloc.c yet.

> Why cannot you do it along the already existing poisoning?
I can sure keep these bits.
Any idea how the correct behavior of dma_pool_alloc/free can be tested?
> --
> Michal Hocko
> SUSE Labs



-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v7 1/2] mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options
  2019-06-21 15:24           ` Alexander Potapenko
@ 2019-06-21 15:54             ` Michal Hocko
  0 siblings, 0 replies; 18+ messages in thread
From: Michal Hocko @ 2019-06-21 15:54 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Andrew Morton, Christoph Lameter, Kees Cook, Masahiro Yamada,
	James Morris, Serge E. Hallyn, Nick Desaulniers,
	Kostya Serebryany, Dmitry Vyukov, Sandeep Patil, Laura Abbott,
	Randy Dunlap, Jann Horn, Mark Rutland, Marco Elver,
	Linux Memory Management List, linux-security-module,
	Kernel Hardening

On Fri 21-06-19 17:24:21, Alexander Potapenko wrote:
> On Fri, Jun 21, 2019 at 5:12 PM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > On Fri 21-06-19 16:10:19, Alexander Potapenko wrote:
> > > On Fri, Jun 21, 2019 at 10:57 AM Alexander Potapenko <glider@google.com> wrote:
> > [...]
> > > > > > diff --git a/mm/dmapool.c b/mm/dmapool.c
> > > > > > index 8c94c89a6f7e..e164012d3491 100644
> > > > > > --- a/mm/dmapool.c
> > > > > > +++ b/mm/dmapool.c
> > > > > > @@ -378,7 +378,7 @@ void *dma_pool_alloc(struct dma_pool *pool, gfp_t mem_flags,
> > > > > >  #endif
> > > > > >       spin_unlock_irqrestore(&pool->lock, flags);
> > > > > >
> > > > > > -     if (mem_flags & __GFP_ZERO)
> > > > > > +     if (want_init_on_alloc(mem_flags))
> > > > > >               memset(retval, 0, pool->size);
> > > > > >
> > > > > >       return retval;
> > > > >
> > > > > Don't you miss dma_pool_free and want_init_on_free?
> > > > Agreed.
> > > > I'll fix this and add tests for DMA pools as well.
> > > This doesn't seem to be easy though. One needs a real DMA-capable
> > > device to allocate using DMA pools.
> > > On the other hand, what happens to a DMA pool when it's destroyed,
> > > isn't it wiped by pagealloc?
> >
> > Yes it should be returned to the page allocator AFAIR. But it is when we
> > are returning an object to the pool when you want to wipe the data, no?
> My concern was that dma allocation is something orthogonal to heap and
> page allocator.
> I also don't know how many other allocators are left overboard, e.g.
> we don't do anything to lib/genalloc.c yet.

Well, that really depends what would you like to achieve by this
functionality. There are likely to be all sorts of allocators on top of
the core ones (e.g. mempool allocator). The question is whether you
really want to cover them all. Are they security relevant?

> > Why cannot you do it along the already existing poisoning?
> I can sure keep these bits.
> Any idea how the correct behavior of dma_pool_alloc/free can be tested?

Well, I would say that you have to rely on the review process here more
than any specific testing. In any case other allocators can be handled
incrementally. This is not all or nothing kinda stuff. I have pointed
out dma_pool because it only addresses one half of the work and it was
not clear why. If you want to drop dma_pool then this will be fine by
me. As this is a hardening feature you want to get coverage as large as
possible rather than 100%.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2019-06-21 15:55 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-17 15:10 [PATCH v7 0/3] add init_on_alloc/init_on_free boot options Alexander Potapenko
2019-06-17 15:10 ` [PATCH v7 1/2] mm: security: introduce init_on_alloc=1 and init_on_free=1 " Alexander Potapenko
2019-06-17 22:10   ` Andrew Morton
2019-06-18  5:07     ` Kees Cook
2019-06-18  5:19       ` Andrew Morton
2019-06-18  5:26         ` Kees Cook
2019-06-21  7:09   ` Michal Hocko
2019-06-21  8:57     ` Alexander Potapenko
2019-06-21  9:11       ` Michal Hocko
2019-06-21  9:18         ` Alexander Potapenko
2019-06-21 14:10       ` Alexander Potapenko
2019-06-21 15:12         ` Michal Hocko
2019-06-21 15:24           ` Alexander Potapenko
2019-06-21 15:54             ` Michal Hocko
2019-06-21 12:36   ` Qian Cai
2019-06-21 13:31     ` Alexander Potapenko
2019-06-21 13:36       ` Qian Cai
2019-06-17 15:10 ` [PATCH v7 2/2] mm: init: report memory auto-initialization features at boot time Alexander Potapenko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).