All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/4] RFC: add init_on_alloc/init_on_free boot options
@ 2019-05-14 14:35 Alexander Potapenko
  2019-05-14 14:35   ` Alexander Potapenko
                   ` (3 more replies)
  0 siblings, 4 replies; 52+ messages in thread
From: Alexander Potapenko @ 2019-05-14 14:35 UTC (permalink / raw)
  To: akpm, cl, keescook; +Cc: kernel-hardening

Provide init_on_alloc and init_on_free boot options.

These are aimed at preventing possible information leaks and making the
control-flow bugs that depend on uninitialized values more deterministic.

Enabling either of the options guarantees that the memory returned by the
page allocator and SL[AOU]B is initialized with zeroes.

Enabling init_on_free also guarantees that pages and heap objects are
initialized right after they're freed, so it won't be possible to access
stale data by using a dangling pointer.

Alexander Potapenko (4):
  mm: security: introduce init_on_alloc=1 and init_on_free=1 boot
    options
  lib: introduce test_meminit module
  gfp: mm: introduce __GFP_NO_AUTOINIT
  net: apply __GFP_NO_AUTOINIT to AF_UNIX sk_buff allocations

 .../admin-guide/kernel-parameters.txt         |   8 +
 drivers/infiniband/core/uverbs_ioctl.c        |   2 +-
 include/linux/gfp.h                           |  13 +-
 include/linux/mm.h                            |  22 ++
 include/net/sock.h                            |   5 +
 kernel/kexec_core.c                           |   5 +-
 lib/Kconfig.debug                             |   8 +
 lib/Makefile                                  |   1 +
 lib/test_meminit.c                            | 205 ++++++++++++++++++
 mm/dmapool.c                                  |   2 +-
 mm/page_alloc.c                               |  68 +++++-
 mm/slab.c                                     |  18 +-
 mm/slab.h                                     |  16 ++
 mm/slob.c                                     |  23 +-
 mm/slub.c                                     |  28 ++-
 net/core/sock.c                               |  31 ++-
 net/unix/af_unix.c                            |  13 +-
 security/Kconfig.hardening                    |  14 ++
 18 files changed, 443 insertions(+), 39 deletions(-)
 create mode 100644 lib/test_meminit.c

-- 
2.21.0.1020.gf2820cf01a-goog

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v2 1/4] mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options
  2019-05-14 14:35 [PATCH v2 0/4] RFC: add init_on_alloc/init_on_free boot options Alexander Potapenko
@ 2019-05-14 14:35   ` Alexander Potapenko
  2019-05-14 14:35   ` Alexander Potapenko
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 52+ messages in thread
From: Alexander Potapenko @ 2019-05-14 14:35 UTC (permalink / raw)
  To: akpm, cl, keescook
  Cc: kernel-hardening, Masahiro Yamada, James Morris, Serge E. Hallyn,
	Nick Desaulniers, Kostya Serebryany, Dmitry Vyukov,
	Sandeep Patil, Laura Abbott, Randy Dunlap, Jann Horn,
	Mark Rutland, linux-mm, linux-security-module

The new options are needed to prevent possible information leaks and
make control-flow bugs that depend on uninitialized values more
deterministic.

init_on_alloc=1 makes the kernel initialize newly allocated pages and heap
objects with zeroes. Initialization is done at allocation time at the
places where checks for __GFP_ZERO are performed.

init_on_free=1 makes the kernel initialize freed pages and heap objects
with zeroes upon their deletion. This helps to ensure sensitive data
doesn't leak via use-after-free accesses.

Both init_on_alloc=1 and init_on_free=1 guarantee that the allocator
returns zeroed memory. The only exception is slab caches with
constructors. Those are never zero-initialized to preserve their semantics.

For SLOB allocator init_on_free=1 also implies init_on_alloc=1 behavior,
i.e. objects are zeroed at both allocation and deallocation time.
This is done because SLOB may otherwise return multiple freelist pointers
in the allocated object. For SLAB and SLUB enabling either init_on_alloc
or init_on_free leads to one-time initialization of the object.

Both init_on_alloc and init_on_free default to zero, but those defaults
can be overridden with CONFIG_INIT_ON_ALLOC_DEFAULT_ON and
CONFIG_INIT_ON_FREE_DEFAULT_ON.

Slowdown for the new features compared to init_on_free=0,
init_on_alloc=0:

hackbench, init_on_free=1:  +7.62% sys time (st.err 0.74%)
hackbench, init_on_alloc=1: +7.75% sys time (st.err 2.14%)

Linux build with -j12, init_on_free=1:  +8.38% wall time (st.err 0.39%)
Linux build with -j12, init_on_free=1:  +24.42% sys time (st.err 0.52%)
Linux build with -j12, init_on_alloc=1: -0.13% wall time (st.err 0.42%)
Linux build with -j12, init_on_alloc=1: +0.57% sys time (st.err 0.40%)

The slowdown for init_on_free=0, init_on_alloc=0 compared to the
baseline is within the standard error.

Signed-off-by: Alexander Potapenko <glider@google.com>
To: Andrew Morton <akpm@linux-foundation.org>
To: Christoph Lameter <cl@linux.com>
To: Kees Cook <keescook@chromium.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: James Morris <jmorris@namei.org>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Sandeep Patil <sspatil@android.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Jann Horn <jannh@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: linux-mm@kvack.org
Cc: linux-security-module@vger.kernel.org
Cc: kernel-hardening@lists.openwall.com
---
 v2:
  - unconditionally initialize pages in kernel_init_free_pages()
  - comment from Randy Dunlap: drop 'default false' lines from Kconfig.hardening
---
 .../admin-guide/kernel-parameters.txt         |  8 +++
 drivers/infiniband/core/uverbs_ioctl.c        |  2 +-
 include/linux/mm.h                            | 22 ++++++
 kernel/kexec_core.c                           |  2 +-
 mm/dmapool.c                                  |  2 +-
 mm/page_alloc.c                               | 68 ++++++++++++++++---
 mm/slab.c                                     | 16 ++++-
 mm/slab.h                                     | 16 +++++
 mm/slob.c                                     | 22 +++++-
 mm/slub.c                                     | 27 ++++++--
 net/core/sock.c                               |  2 +-
 security/Kconfig.hardening                    | 14 ++++
 12 files changed, 178 insertions(+), 23 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 08df58805703..cece9a56ddb1 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1673,6 +1673,14 @@
 
 	initrd=		[BOOT] Specify the location of the initial ramdisk
 
+	init_on_alloc=	[MM] Fill newly allocated pages and heap objects with
+			zeroes.
+			Format: 0 | 1
+			Default set by CONFIG_INIT_ON_ALLOC_DEFAULT_ON.
+	init_on_free=	[MM] Fill freed pages and heap objects with zeroes.
+			Format: 0 | 1
+			Default set by CONFIG_INIT_ON_FREE_DEFAULT_ON.
+
 	init_pkru=	[x86] Specify the default memory protection keys rights
 			register contents for all processes.  0x55555554 by
 			default (disallow access to all but pkey 0).  Can
diff --git a/drivers/infiniband/core/uverbs_ioctl.c b/drivers/infiniband/core/uverbs_ioctl.c
index 829b0c6944d8..61758201d9b2 100644
--- a/drivers/infiniband/core/uverbs_ioctl.c
+++ b/drivers/infiniband/core/uverbs_ioctl.c
@@ -127,7 +127,7 @@ __malloc void *_uverbs_alloc(struct uverbs_attr_bundle *bundle, size_t size,
 	res = (void *)pbundle->internal_buffer + pbundle->internal_used;
 	pbundle->internal_used =
 		ALIGN(new_used, sizeof(*pbundle->internal_buffer));
-	if (flags & __GFP_ZERO)
+	if (want_init_on_alloc(flags))
 		memset(res, 0, size);
 	return res;
 }
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 083d7b4863ed..18d96f1d07c5 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2610,6 +2610,28 @@ static inline void kernel_poison_pages(struct page *page, int numpages,
 					int enable) { }
 #endif
 
+#ifdef CONFIG_INIT_ON_ALLOC_DEFAULT_ON
+DECLARE_STATIC_KEY_TRUE(init_on_alloc);
+#else
+DECLARE_STATIC_KEY_FALSE(init_on_alloc);
+#endif
+static inline bool want_init_on_alloc(gfp_t flags)
+{
+	if (static_branch_unlikely(&init_on_alloc))
+		return true;
+	return flags & __GFP_ZERO;
+}
+
+#ifdef CONFIG_INIT_ON_FREE_DEFAULT_ON
+DECLARE_STATIC_KEY_TRUE(init_on_free);
+#else
+DECLARE_STATIC_KEY_FALSE(init_on_free);
+#endif
+static inline bool want_init_on_free(void)
+{
+	return static_branch_unlikely(&init_on_free);
+}
+
 extern bool _debug_pagealloc_enabled;
 
 static inline bool debug_pagealloc_enabled(void)
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index fd5c95ff9251..2f75dd0d0d81 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -315,7 +315,7 @@ static struct page *kimage_alloc_pages(gfp_t gfp_mask, unsigned int order)
 		arch_kexec_post_alloc_pages(page_address(pages), count,
 					    gfp_mask);
 
-		if (gfp_mask & __GFP_ZERO)
+		if (want_init_on_alloc(gfp_mask))
 			for (i = 0; i < count; i++)
 				clear_highpage(pages + i);
 	}
diff --git a/mm/dmapool.c b/mm/dmapool.c
index 76a160083506..493d151067cb 100644
--- a/mm/dmapool.c
+++ b/mm/dmapool.c
@@ -381,7 +381,7 @@ void *dma_pool_alloc(struct dma_pool *pool, gfp_t mem_flags,
 #endif
 	spin_unlock_irqrestore(&pool->lock, flags);
 
-	if (mem_flags & __GFP_ZERO)
+	if (want_init_on_alloc(mem_flags))
 		memset(retval, 0, pool->size);
 
 	return retval;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 59661106da16..463c681a3633 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -133,6 +133,48 @@ unsigned long totalcma_pages __read_mostly;
 
 int percpu_pagelist_fraction;
 gfp_t gfp_allowed_mask __read_mostly = GFP_BOOT_MASK;
+#ifdef CONFIG_INIT_ON_ALLOC_DEFAULT_ON
+DEFINE_STATIC_KEY_TRUE(init_on_alloc);
+#else
+DEFINE_STATIC_KEY_FALSE(init_on_alloc);
+#endif
+#ifdef CONFIG_INIT_ON_FREE_DEFAULT_ON
+DEFINE_STATIC_KEY_TRUE(init_on_free);
+#else
+DEFINE_STATIC_KEY_FALSE(init_on_free);
+#endif
+
+static int __init early_init_on_alloc(char *buf)
+{
+	int ret;
+	bool bool_result;
+
+	if (!buf)
+		return -EINVAL;
+	ret = kstrtobool(buf, &bool_result);
+	if (bool_result)
+		static_branch_enable(&init_on_alloc);
+	else
+		static_branch_disable(&init_on_alloc);
+	return ret;
+}
+early_param("init_on_alloc", early_init_on_alloc);
+
+static int __init early_init_on_free(char *buf)
+{
+	int ret;
+	bool bool_result;
+
+	if (!buf)
+		return -EINVAL;
+	ret = kstrtobool(buf, &bool_result);
+	if (bool_result)
+		static_branch_enable(&init_on_free);
+	else
+		static_branch_disable(&init_on_free);
+	return ret;
+}
+early_param("init_on_free", early_init_on_free);
 
 /*
  * A cached value of the page's pageblock's migratetype, used when the page is
@@ -1092,6 +1134,14 @@ static int free_tail_pages_check(struct page *head_page, struct page *page)
 	return ret;
 }
 
+static void kernel_init_free_pages(struct page *page, int numpages)
+{
+	int i;
+
+	for (i = 0; i < numpages; i++)
+		clear_highpage(page + i);
+}
+
 static __always_inline bool free_pages_prepare(struct page *page,
 					unsigned int order, bool check_free)
 {
@@ -1144,9 +1194,10 @@ static __always_inline bool free_pages_prepare(struct page *page,
 	}
 	arch_free_page(page, order);
 	kernel_poison_pages(page, 1 << order, 0);
+	if (want_init_on_free())
+		kernel_init_free_pages(page, 1 << order);
 	if (debug_pagealloc_enabled())
 		kernel_map_pages(page, 1 << order, 0);
-
 	kasan_free_nondeferred_pages(page, order);
 
 	return true;
@@ -1452,8 +1503,10 @@ meminit_pfn_in_nid(unsigned long pfn, int node,
 void __init memblock_free_pages(struct page *page, unsigned long pfn,
 							unsigned int order)
 {
-	if (early_page_uninitialised(pfn))
+	if (early_page_uninitialised(pfn)) {
+		kernel_init_free_pages(page, 1 << order);
 		return;
+	}
 	__free_pages_core(page, order);
 }
 
@@ -1971,8 +2024,8 @@ static inline int check_new_page(struct page *page)
 
 static inline bool free_pages_prezeroed(void)
 {
-	return IS_ENABLED(CONFIG_PAGE_POISONING_ZERO) &&
-		page_poisoning_enabled();
+	return (IS_ENABLED(CONFIG_PAGE_POISONING_ZERO) &&
+		page_poisoning_enabled()) || want_init_on_free();
 }
 
 #ifdef CONFIG_DEBUG_VM
@@ -2026,13 +2079,10 @@ inline void post_alloc_hook(struct page *page, unsigned int order,
 static void prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags,
 							unsigned int alloc_flags)
 {
-	int i;
-
 	post_alloc_hook(page, order, gfp_flags);
 
-	if (!free_pages_prezeroed() && (gfp_flags & __GFP_ZERO))
-		for (i = 0; i < (1 << order); i++)
-			clear_highpage(page + i);
+	if (!free_pages_prezeroed() && want_init_on_alloc(gfp_flags))
+		kernel_init_free_pages(page, 1 << order);
 
 	if (order && (gfp_flags & __GFP_COMP))
 		prep_compound_page(page, order);
diff --git a/mm/slab.c b/mm/slab.c
index 284ab737faee..d00e9de26a45 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1855,6 +1855,14 @@ static bool set_objfreelist_slab_cache(struct kmem_cache *cachep,
 
 	cachep->num = 0;
 
+	/*
+	 * If slab auto-initialization on free is enabled, store the freelist
+	 * off-slab, so that its contents don't end up in one of the allocated
+	 * objects.
+	 */
+	if (unlikely(slab_want_init_on_free(cachep)))
+		return false;
+
 	if (cachep->ctor || flags & SLAB_TYPESAFE_BY_RCU)
 		return false;
 
@@ -3294,7 +3302,7 @@ slab_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid,
 	local_irq_restore(save_flags);
 	ptr = cache_alloc_debugcheck_after(cachep, flags, ptr, caller);
 
-	if (unlikely(flags & __GFP_ZERO) && ptr)
+	if (unlikely(slab_want_init_on_alloc(flags, cachep)) && ptr)
 		memset(ptr, 0, cachep->object_size);
 
 	slab_post_alloc_hook(cachep, flags, 1, &ptr);
@@ -3351,7 +3359,7 @@ slab_alloc(struct kmem_cache *cachep, gfp_t flags, unsigned long caller)
 	objp = cache_alloc_debugcheck_after(cachep, flags, objp, caller);
 	prefetchw(objp);
 
-	if (unlikely(flags & __GFP_ZERO) && objp)
+	if (unlikely(slab_want_init_on_alloc(flags, cachep)) && objp)
 		memset(objp, 0, cachep->object_size);
 
 	slab_post_alloc_hook(cachep, flags, 1, &objp);
@@ -3472,6 +3480,8 @@ void ___cache_free(struct kmem_cache *cachep, void *objp,
 	struct array_cache *ac = cpu_cache_get(cachep);
 
 	check_irq_off();
+	if (unlikely(slab_want_init_on_free(cachep)))
+		memset(objp, 0, cachep->object_size);
 	kmemleak_free_recursive(objp, cachep->flags);
 	objp = cache_free_debugcheck(cachep, objp, caller);
 
@@ -3559,7 +3569,7 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
 	cache_alloc_debugcheck_after_bulk(s, flags, size, p, _RET_IP_);
 
 	/* Clear memory outside IRQ disabled section */
-	if (unlikely(flags & __GFP_ZERO))
+	if (unlikely(slab_want_init_on_alloc(flags, s)))
 		for (i = 0; i < size; i++)
 			memset(p[i], 0, s->object_size);
 
diff --git a/mm/slab.h b/mm/slab.h
index 43ac818b8592..24ae887359b8 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -524,4 +524,20 @@ static inline int cache_random_seq_create(struct kmem_cache *cachep,
 static inline void cache_random_seq_destroy(struct kmem_cache *cachep) { }
 #endif /* CONFIG_SLAB_FREELIST_RANDOM */
 
+static inline bool slab_want_init_on_alloc(gfp_t flags, struct kmem_cache *c)
+{
+	if (static_branch_unlikely(&init_on_alloc))
+		return !(c->ctor);
+	else
+		return flags & __GFP_ZERO;
+}
+
+static inline bool slab_want_init_on_free(struct kmem_cache *c)
+{
+	if (static_branch_unlikely(&init_on_free))
+		return !(c->ctor);
+	else
+		return false;
+}
+
 #endif /* MM_SLAB_H */
diff --git a/mm/slob.c b/mm/slob.c
index 307c2c9feb44..351d3dfee000 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -212,6 +212,19 @@ static void slob_free_pages(void *b, int order)
 	free_pages((unsigned long)b, order);
 }
 
+/*
+ * init_on_free=1 also implies initialization at allocation time.
+ * This is because newly allocated objects may contain freelist pointers
+ * somewhere in the middle.
+ */
+static inline bool slob_want_init_on_alloc(gfp_t flags, struct kmem_cache *c)
+{
+	if (static_branch_unlikely(&init_on_alloc) ||
+	    static_branch_unlikely(&init_on_free))
+		return c ? (!c->ctor) : true;
+	return flags & __GFP_ZERO;
+}
+
 /*
  * Allocate a slob block within a given slob_page sp.
  */
@@ -330,8 +343,6 @@ static void *slob_alloc(size_t size, gfp_t gfp, int align, int node)
 		BUG_ON(!b);
 		spin_unlock_irqrestore(&slob_lock, flags);
 	}
-	if (unlikely(gfp & __GFP_ZERO))
-		memset(b, 0, size);
 	return b;
 }
 
@@ -366,6 +377,9 @@ static void slob_free(void *block, int size)
 		return;
 	}
 
+	if (unlikely(want_init_on_free()))
+		memset(block, 0, size);
+
 	if (!slob_page_free(sp)) {
 		/* This slob page is about to become partially free. Easy! */
 		sp->units = units;
@@ -461,6 +475,8 @@ __do_kmalloc_node(size_t size, gfp_t gfp, int node, unsigned long caller)
 	}
 
 	kmemleak_alloc(ret, size, 1, gfp);
+	if (unlikely(slob_want_init_on_alloc(gfp, 0)))
+		memset(ret, 0, size);
 	return ret;
 }
 
@@ -559,6 +575,8 @@ static void *slob_alloc_node(struct kmem_cache *c, gfp_t flags, int node)
 		WARN_ON_ONCE(flags & __GFP_ZERO);
 		c->ctor(b);
 	}
+	if (unlikely(slob_want_init_on_alloc(flags, c)))
+		memset(b, 0, c->size);
 
 	kmemleak_alloc_recursive(b, c->size, 1, c->flags, flags);
 	return b;
diff --git a/mm/slub.c b/mm/slub.c
index 6b28cd2b5a58..01424e910800 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1423,6 +1423,19 @@ static __always_inline bool slab_free_hook(struct kmem_cache *s, void *x)
 static inline bool slab_free_freelist_hook(struct kmem_cache *s,
 					   void **head, void **tail)
 {
+
+	void *object;
+	void *next = *head;
+	void *old_tail = *tail ? *tail : *head;
+
+	if (slab_want_init_on_free(s))
+		do {
+			object = next;
+			next = get_freepointer(s, object);
+			memset(object, 0, s->size);
+			set_freepointer(s, object, next);
+		} while (object != old_tail);
+
 /*
  * Compiler cannot detect this function can be removed if slab_free_hook()
  * evaluates to nothing.  Thus, catch all relevant config debug options here.
@@ -1432,9 +1445,7 @@ static inline bool slab_free_freelist_hook(struct kmem_cache *s,
 	defined(CONFIG_DEBUG_OBJECTS_FREE) ||	\
 	defined(CONFIG_KASAN)
 
-	void *object;
-	void *next = *head;
-	void *old_tail = *tail ? *tail : *head;
+	next = *head;
 
 	/* Head and tail of the reconstructed freelist */
 	*head = NULL;
@@ -2740,8 +2751,14 @@ static __always_inline void *slab_alloc_node(struct kmem_cache *s,
 		prefetch_freepointer(s, next_object);
 		stat(s, ALLOC_FASTPATH);
 	}
+	/*
+	 * If the object has been wiped upon free, make sure it's fully
+	 * initialized by zeroing out freelist pointer.
+	 */
+	if (slab_want_init_on_free(s))
+		*(void **)object = 0;
 
-	if (unlikely(gfpflags & __GFP_ZERO) && object)
+	if (unlikely(slab_want_init_on_alloc(gfpflags, s)) && object)
 		memset(object, 0, s->object_size);
 
 	slab_post_alloc_hook(s, gfpflags, 1, &object);
@@ -3163,7 +3180,7 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
 	local_irq_enable();
 
 	/* Clear memory outside IRQ disabled fastpath loop */
-	if (unlikely(flags & __GFP_ZERO)) {
+	if (unlikely(slab_want_init_on_alloc(flags, s))) {
 		int j;
 
 		for (j = 0; j < i; j++)
diff --git a/net/core/sock.c b/net/core/sock.c
index 75b1c950b49f..9ceb90c875bc 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1602,7 +1602,7 @@ static struct sock *sk_prot_alloc(struct proto *prot, gfp_t priority,
 		sk = kmem_cache_alloc(slab, priority & ~__GFP_ZERO);
 		if (!sk)
 			return sk;
-		if (priority & __GFP_ZERO)
+		if (want_init_on_alloc(priority))
 			sk_prot_clear_nulls(sk, prot->obj_size);
 	} else
 		sk = kmalloc(prot->obj_size, priority);
diff --git a/security/Kconfig.hardening b/security/Kconfig.hardening
index 0a1d4ca314f4..87883e3e3c2a 100644
--- a/security/Kconfig.hardening
+++ b/security/Kconfig.hardening
@@ -159,6 +159,20 @@ config STACKLEAK_RUNTIME_DISABLE
 	  runtime to control kernel stack erasing for kernels built with
 	  CONFIG_GCC_PLUGIN_STACKLEAK.
 
+config INIT_ON_ALLOC_DEFAULT_ON
+	bool "Set init_on_alloc=1 by default"
+	help
+	  Enable init_on_alloc=1 by default, making the kernel initialize every
+	  page and heap allocation with zeroes.
+	  init_on_alloc can be overridden via command line.
+
+config INIT_ON_FREE_DEFAULT_ON
+	bool "Set init_on_free=1 by default"
+	help
+	  Enable init_on_free=1 by default, making the kernel initialize freed
+	  pages and slab memory with zeroes.
+	  init_on_free can be overridden via command line.
+
 endmenu
 
 endmenu
-- 
2.21.0.1020.gf2820cf01a-goog


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v2 1/4] mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options
@ 2019-05-14 14:35   ` Alexander Potapenko
  0 siblings, 0 replies; 52+ messages in thread
From: Alexander Potapenko @ 2019-05-14 14:35 UTC (permalink / raw)
  To: akpm, cl, keescook
  Cc: kernel-hardening, Masahiro Yamada, James Morris, Serge E. Hallyn,
	Nick Desaulniers, Kostya Serebryany, Dmitry Vyukov,
	Sandeep Patil, Laura Abbott, Randy Dunlap, Jann Horn,
	Mark Rutland, linux-mm, linux-security-module

The new options are needed to prevent possible information leaks and
make control-flow bugs that depend on uninitialized values more
deterministic.

init_on_alloc=1 makes the kernel initialize newly allocated pages and heap
objects with zeroes. Initialization is done at allocation time at the
places where checks for __GFP_ZERO are performed.

init_on_free=1 makes the kernel initialize freed pages and heap objects
with zeroes upon their deletion. This helps to ensure sensitive data
doesn't leak via use-after-free accesses.

Both init_on_alloc=1 and init_on_free=1 guarantee that the allocator
returns zeroed memory. The only exception is slab caches with
constructors. Those are never zero-initialized to preserve their semantics.

For SLOB allocator init_on_free=1 also implies init_on_alloc=1 behavior,
i.e. objects are zeroed at both allocation and deallocation time.
This is done because SLOB may otherwise return multiple freelist pointers
in the allocated object. For SLAB and SLUB enabling either init_on_alloc
or init_on_free leads to one-time initialization of the object.

Both init_on_alloc and init_on_free default to zero, but those defaults
can be overridden with CONFIG_INIT_ON_ALLOC_DEFAULT_ON and
CONFIG_INIT_ON_FREE_DEFAULT_ON.

Slowdown for the new features compared to init_on_free=0,
init_on_alloc=0:

hackbench, init_on_free=1:  +7.62% sys time (st.err 0.74%)
hackbench, init_on_alloc=1: +7.75% sys time (st.err 2.14%)

Linux build with -j12, init_on_free=1:  +8.38% wall time (st.err 0.39%)
Linux build with -j12, init_on_free=1:  +24.42% sys time (st.err 0.52%)
Linux build with -j12, init_on_alloc=1: -0.13% wall time (st.err 0.42%)
Linux build with -j12, init_on_alloc=1: +0.57% sys time (st.err 0.40%)

The slowdown for init_on_free=0, init_on_alloc=0 compared to the
baseline is within the standard error.

Signed-off-by: Alexander Potapenko <glider@google.com>
To: Andrew Morton <akpm@linux-foundation.org>
To: Christoph Lameter <cl@linux.com>
To: Kees Cook <keescook@chromium.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: James Morris <jmorris@namei.org>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Sandeep Patil <sspatil@android.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Jann Horn <jannh@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: linux-mm@kvack.org
Cc: linux-security-module@vger.kernel.org
Cc: kernel-hardening@lists.openwall.com
---
 v2:
  - unconditionally initialize pages in kernel_init_free_pages()
  - comment from Randy Dunlap: drop 'default false' lines from Kconfig.hardening
---
 .../admin-guide/kernel-parameters.txt         |  8 +++
 drivers/infiniband/core/uverbs_ioctl.c        |  2 +-
 include/linux/mm.h                            | 22 ++++++
 kernel/kexec_core.c                           |  2 +-
 mm/dmapool.c                                  |  2 +-
 mm/page_alloc.c                               | 68 ++++++++++++++++---
 mm/slab.c                                     | 16 ++++-
 mm/slab.h                                     | 16 +++++
 mm/slob.c                                     | 22 +++++-
 mm/slub.c                                     | 27 ++++++--
 net/core/sock.c                               |  2 +-
 security/Kconfig.hardening                    | 14 ++++
 12 files changed, 178 insertions(+), 23 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 08df58805703..cece9a56ddb1 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1673,6 +1673,14 @@
 
 	initrd=		[BOOT] Specify the location of the initial ramdisk
 
+	init_on_alloc=	[MM] Fill newly allocated pages and heap objects with
+			zeroes.
+			Format: 0 | 1
+			Default set by CONFIG_INIT_ON_ALLOC_DEFAULT_ON.
+	init_on_free=	[MM] Fill freed pages and heap objects with zeroes.
+			Format: 0 | 1
+			Default set by CONFIG_INIT_ON_FREE_DEFAULT_ON.
+
 	init_pkru=	[x86] Specify the default memory protection keys rights
 			register contents for all processes.  0x55555554 by
 			default (disallow access to all but pkey 0).  Can
diff --git a/drivers/infiniband/core/uverbs_ioctl.c b/drivers/infiniband/core/uverbs_ioctl.c
index 829b0c6944d8..61758201d9b2 100644
--- a/drivers/infiniband/core/uverbs_ioctl.c
+++ b/drivers/infiniband/core/uverbs_ioctl.c
@@ -127,7 +127,7 @@ __malloc void *_uverbs_alloc(struct uverbs_attr_bundle *bundle, size_t size,
 	res = (void *)pbundle->internal_buffer + pbundle->internal_used;
 	pbundle->internal_used =
 		ALIGN(new_used, sizeof(*pbundle->internal_buffer));
-	if (flags & __GFP_ZERO)
+	if (want_init_on_alloc(flags))
 		memset(res, 0, size);
 	return res;
 }
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 083d7b4863ed..18d96f1d07c5 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2610,6 +2610,28 @@ static inline void kernel_poison_pages(struct page *page, int numpages,
 					int enable) { }
 #endif
 
+#ifdef CONFIG_INIT_ON_ALLOC_DEFAULT_ON
+DECLARE_STATIC_KEY_TRUE(init_on_alloc);
+#else
+DECLARE_STATIC_KEY_FALSE(init_on_alloc);
+#endif
+static inline bool want_init_on_alloc(gfp_t flags)
+{
+	if (static_branch_unlikely(&init_on_alloc))
+		return true;
+	return flags & __GFP_ZERO;
+}
+
+#ifdef CONFIG_INIT_ON_FREE_DEFAULT_ON
+DECLARE_STATIC_KEY_TRUE(init_on_free);
+#else
+DECLARE_STATIC_KEY_FALSE(init_on_free);
+#endif
+static inline bool want_init_on_free(void)
+{
+	return static_branch_unlikely(&init_on_free);
+}
+
 extern bool _debug_pagealloc_enabled;
 
 static inline bool debug_pagealloc_enabled(void)
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index fd5c95ff9251..2f75dd0d0d81 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -315,7 +315,7 @@ static struct page *kimage_alloc_pages(gfp_t gfp_mask, unsigned int order)
 		arch_kexec_post_alloc_pages(page_address(pages), count,
 					    gfp_mask);
 
-		if (gfp_mask & __GFP_ZERO)
+		if (want_init_on_alloc(gfp_mask))
 			for (i = 0; i < count; i++)
 				clear_highpage(pages + i);
 	}
diff --git a/mm/dmapool.c b/mm/dmapool.c
index 76a160083506..493d151067cb 100644
--- a/mm/dmapool.c
+++ b/mm/dmapool.c
@@ -381,7 +381,7 @@ void *dma_pool_alloc(struct dma_pool *pool, gfp_t mem_flags,
 #endif
 	spin_unlock_irqrestore(&pool->lock, flags);
 
-	if (mem_flags & __GFP_ZERO)
+	if (want_init_on_alloc(mem_flags))
 		memset(retval, 0, pool->size);
 
 	return retval;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 59661106da16..463c681a3633 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -133,6 +133,48 @@ unsigned long totalcma_pages __read_mostly;
 
 int percpu_pagelist_fraction;
 gfp_t gfp_allowed_mask __read_mostly = GFP_BOOT_MASK;
+#ifdef CONFIG_INIT_ON_ALLOC_DEFAULT_ON
+DEFINE_STATIC_KEY_TRUE(init_on_alloc);
+#else
+DEFINE_STATIC_KEY_FALSE(init_on_alloc);
+#endif
+#ifdef CONFIG_INIT_ON_FREE_DEFAULT_ON
+DEFINE_STATIC_KEY_TRUE(init_on_free);
+#else
+DEFINE_STATIC_KEY_FALSE(init_on_free);
+#endif
+
+static int __init early_init_on_alloc(char *buf)
+{
+	int ret;
+	bool bool_result;
+
+	if (!buf)
+		return -EINVAL;
+	ret = kstrtobool(buf, &bool_result);
+	if (bool_result)
+		static_branch_enable(&init_on_alloc);
+	else
+		static_branch_disable(&init_on_alloc);
+	return ret;
+}
+early_param("init_on_alloc", early_init_on_alloc);
+
+static int __init early_init_on_free(char *buf)
+{
+	int ret;
+	bool bool_result;
+
+	if (!buf)
+		return -EINVAL;
+	ret = kstrtobool(buf, &bool_result);
+	if (bool_result)
+		static_branch_enable(&init_on_free);
+	else
+		static_branch_disable(&init_on_free);
+	return ret;
+}
+early_param("init_on_free", early_init_on_free);
 
 /*
  * A cached value of the page's pageblock's migratetype, used when the page is
@@ -1092,6 +1134,14 @@ static int free_tail_pages_check(struct page *head_page, struct page *page)
 	return ret;
 }
 
+static void kernel_init_free_pages(struct page *page, int numpages)
+{
+	int i;
+
+	for (i = 0; i < numpages; i++)
+		clear_highpage(page + i);
+}
+
 static __always_inline bool free_pages_prepare(struct page *page,
 					unsigned int order, bool check_free)
 {
@@ -1144,9 +1194,10 @@ static __always_inline bool free_pages_prepare(struct page *page,
 	}
 	arch_free_page(page, order);
 	kernel_poison_pages(page, 1 << order, 0);
+	if (want_init_on_free())
+		kernel_init_free_pages(page, 1 << order);
 	if (debug_pagealloc_enabled())
 		kernel_map_pages(page, 1 << order, 0);
-
 	kasan_free_nondeferred_pages(page, order);
 
 	return true;
@@ -1452,8 +1503,10 @@ meminit_pfn_in_nid(unsigned long pfn, int node,
 void __init memblock_free_pages(struct page *page, unsigned long pfn,
 							unsigned int order)
 {
-	if (early_page_uninitialised(pfn))
+	if (early_page_uninitialised(pfn)) {
+		kernel_init_free_pages(page, 1 << order);
 		return;
+	}
 	__free_pages_core(page, order);
 }
 
@@ -1971,8 +2024,8 @@ static inline int check_new_page(struct page *page)
 
 static inline bool free_pages_prezeroed(void)
 {
-	return IS_ENABLED(CONFIG_PAGE_POISONING_ZERO) &&
-		page_poisoning_enabled();
+	return (IS_ENABLED(CONFIG_PAGE_POISONING_ZERO) &&
+		page_poisoning_enabled()) || want_init_on_free();
 }
 
 #ifdef CONFIG_DEBUG_VM
@@ -2026,13 +2079,10 @@ inline void post_alloc_hook(struct page *page, unsigned int order,
 static void prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags,
 							unsigned int alloc_flags)
 {
-	int i;
-
 	post_alloc_hook(page, order, gfp_flags);
 
-	if (!free_pages_prezeroed() && (gfp_flags & __GFP_ZERO))
-		for (i = 0; i < (1 << order); i++)
-			clear_highpage(page + i);
+	if (!free_pages_prezeroed() && want_init_on_alloc(gfp_flags))
+		kernel_init_free_pages(page, 1 << order);
 
 	if (order && (gfp_flags & __GFP_COMP))
 		prep_compound_page(page, order);
diff --git a/mm/slab.c b/mm/slab.c
index 284ab737faee..d00e9de26a45 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1855,6 +1855,14 @@ static bool set_objfreelist_slab_cache(struct kmem_cache *cachep,
 
 	cachep->num = 0;
 
+	/*
+	 * If slab auto-initialization on free is enabled, store the freelist
+	 * off-slab, so that its contents don't end up in one of the allocated
+	 * objects.
+	 */
+	if (unlikely(slab_want_init_on_free(cachep)))
+		return false;
+
 	if (cachep->ctor || flags & SLAB_TYPESAFE_BY_RCU)
 		return false;
 
@@ -3294,7 +3302,7 @@ slab_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid,
 	local_irq_restore(save_flags);
 	ptr = cache_alloc_debugcheck_after(cachep, flags, ptr, caller);
 
-	if (unlikely(flags & __GFP_ZERO) && ptr)
+	if (unlikely(slab_want_init_on_alloc(flags, cachep)) && ptr)
 		memset(ptr, 0, cachep->object_size);
 
 	slab_post_alloc_hook(cachep, flags, 1, &ptr);
@@ -3351,7 +3359,7 @@ slab_alloc(struct kmem_cache *cachep, gfp_t flags, unsigned long caller)
 	objp = cache_alloc_debugcheck_after(cachep, flags, objp, caller);
 	prefetchw(objp);
 
-	if (unlikely(flags & __GFP_ZERO) && objp)
+	if (unlikely(slab_want_init_on_alloc(flags, cachep)) && objp)
 		memset(objp, 0, cachep->object_size);
 
 	slab_post_alloc_hook(cachep, flags, 1, &objp);
@@ -3472,6 +3480,8 @@ void ___cache_free(struct kmem_cache *cachep, void *objp,
 	struct array_cache *ac = cpu_cache_get(cachep);
 
 	check_irq_off();
+	if (unlikely(slab_want_init_on_free(cachep)))
+		memset(objp, 0, cachep->object_size);
 	kmemleak_free_recursive(objp, cachep->flags);
 	objp = cache_free_debugcheck(cachep, objp, caller);
 
@@ -3559,7 +3569,7 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
 	cache_alloc_debugcheck_after_bulk(s, flags, size, p, _RET_IP_);
 
 	/* Clear memory outside IRQ disabled section */
-	if (unlikely(flags & __GFP_ZERO))
+	if (unlikely(slab_want_init_on_alloc(flags, s)))
 		for (i = 0; i < size; i++)
 			memset(p[i], 0, s->object_size);
 
diff --git a/mm/slab.h b/mm/slab.h
index 43ac818b8592..24ae887359b8 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -524,4 +524,20 @@ static inline int cache_random_seq_create(struct kmem_cache *cachep,
 static inline void cache_random_seq_destroy(struct kmem_cache *cachep) { }
 #endif /* CONFIG_SLAB_FREELIST_RANDOM */
 
+static inline bool slab_want_init_on_alloc(gfp_t flags, struct kmem_cache *c)
+{
+	if (static_branch_unlikely(&init_on_alloc))
+		return !(c->ctor);
+	else
+		return flags & __GFP_ZERO;
+}
+
+static inline bool slab_want_init_on_free(struct kmem_cache *c)
+{
+	if (static_branch_unlikely(&init_on_free))
+		return !(c->ctor);
+	else
+		return false;
+}
+
 #endif /* MM_SLAB_H */
diff --git a/mm/slob.c b/mm/slob.c
index 307c2c9feb44..351d3dfee000 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -212,6 +212,19 @@ static void slob_free_pages(void *b, int order)
 	free_pages((unsigned long)b, order);
 }
 
+/*
+ * init_on_free=1 also implies initialization at allocation time.
+ * This is because newly allocated objects may contain freelist pointers
+ * somewhere in the middle.
+ */
+static inline bool slob_want_init_on_alloc(gfp_t flags, struct kmem_cache *c)
+{
+	if (static_branch_unlikely(&init_on_alloc) ||
+	    static_branch_unlikely(&init_on_free))
+		return c ? (!c->ctor) : true;
+	return flags & __GFP_ZERO;
+}
+
 /*
  * Allocate a slob block within a given slob_page sp.
  */
@@ -330,8 +343,6 @@ static void *slob_alloc(size_t size, gfp_t gfp, int align, int node)
 		BUG_ON(!b);
 		spin_unlock_irqrestore(&slob_lock, flags);
 	}
-	if (unlikely(gfp & __GFP_ZERO))
-		memset(b, 0, size);
 	return b;
 }
 
@@ -366,6 +377,9 @@ static void slob_free(void *block, int size)
 		return;
 	}
 
+	if (unlikely(want_init_on_free()))
+		memset(block, 0, size);
+
 	if (!slob_page_free(sp)) {
 		/* This slob page is about to become partially free. Easy! */
 		sp->units = units;
@@ -461,6 +475,8 @@ __do_kmalloc_node(size_t size, gfp_t gfp, int node, unsigned long caller)
 	}
 
 	kmemleak_alloc(ret, size, 1, gfp);
+	if (unlikely(slob_want_init_on_alloc(gfp, 0)))
+		memset(ret, 0, size);
 	return ret;
 }
 
@@ -559,6 +575,8 @@ static void *slob_alloc_node(struct kmem_cache *c, gfp_t flags, int node)
 		WARN_ON_ONCE(flags & __GFP_ZERO);
 		c->ctor(b);
 	}
+	if (unlikely(slob_want_init_on_alloc(flags, c)))
+		memset(b, 0, c->size);
 
 	kmemleak_alloc_recursive(b, c->size, 1, c->flags, flags);
 	return b;
diff --git a/mm/slub.c b/mm/slub.c
index 6b28cd2b5a58..01424e910800 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1423,6 +1423,19 @@ static __always_inline bool slab_free_hook(struct kmem_cache *s, void *x)
 static inline bool slab_free_freelist_hook(struct kmem_cache *s,
 					   void **head, void **tail)
 {
+
+	void *object;
+	void *next = *head;
+	void *old_tail = *tail ? *tail : *head;
+
+	if (slab_want_init_on_free(s))
+		do {
+			object = next;
+			next = get_freepointer(s, object);
+			memset(object, 0, s->size);
+			set_freepointer(s, object, next);
+		} while (object != old_tail);
+
 /*
  * Compiler cannot detect this function can be removed if slab_free_hook()
  * evaluates to nothing.  Thus, catch all relevant config debug options here.
@@ -1432,9 +1445,7 @@ static inline bool slab_free_freelist_hook(struct kmem_cache *s,
 	defined(CONFIG_DEBUG_OBJECTS_FREE) ||	\
 	defined(CONFIG_KASAN)
 
-	void *object;
-	void *next = *head;
-	void *old_tail = *tail ? *tail : *head;
+	next = *head;
 
 	/* Head and tail of the reconstructed freelist */
 	*head = NULL;
@@ -2740,8 +2751,14 @@ static __always_inline void *slab_alloc_node(struct kmem_cache *s,
 		prefetch_freepointer(s, next_object);
 		stat(s, ALLOC_FASTPATH);
 	}
+	/*
+	 * If the object has been wiped upon free, make sure it's fully
+	 * initialized by zeroing out freelist pointer.
+	 */
+	if (slab_want_init_on_free(s))
+		*(void **)object = 0;
 
-	if (unlikely(gfpflags & __GFP_ZERO) && object)
+	if (unlikely(slab_want_init_on_alloc(gfpflags, s)) && object)
 		memset(object, 0, s->object_size);
 
 	slab_post_alloc_hook(s, gfpflags, 1, &object);
@@ -3163,7 +3180,7 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
 	local_irq_enable();
 
 	/* Clear memory outside IRQ disabled fastpath loop */
-	if (unlikely(flags & __GFP_ZERO)) {
+	if (unlikely(slab_want_init_on_alloc(flags, s))) {
 		int j;
 
 		for (j = 0; j < i; j++)
diff --git a/net/core/sock.c b/net/core/sock.c
index 75b1c950b49f..9ceb90c875bc 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1602,7 +1602,7 @@ static struct sock *sk_prot_alloc(struct proto *prot, gfp_t priority,
 		sk = kmem_cache_alloc(slab, priority & ~__GFP_ZERO);
 		if (!sk)
 			return sk;
-		if (priority & __GFP_ZERO)
+		if (want_init_on_alloc(priority))
 			sk_prot_clear_nulls(sk, prot->obj_size);
 	} else
 		sk = kmalloc(prot->obj_size, priority);
diff --git a/security/Kconfig.hardening b/security/Kconfig.hardening
index 0a1d4ca314f4..87883e3e3c2a 100644
--- a/security/Kconfig.hardening
+++ b/security/Kconfig.hardening
@@ -159,6 +159,20 @@ config STACKLEAK_RUNTIME_DISABLE
 	  runtime to control kernel stack erasing for kernels built with
 	  CONFIG_GCC_PLUGIN_STACKLEAK.
 
+config INIT_ON_ALLOC_DEFAULT_ON
+	bool "Set init_on_alloc=1 by default"
+	help
+	  Enable init_on_alloc=1 by default, making the kernel initialize every
+	  page and heap allocation with zeroes.
+	  init_on_alloc can be overridden via command line.
+
+config INIT_ON_FREE_DEFAULT_ON
+	bool "Set init_on_free=1 by default"
+	help
+	  Enable init_on_free=1 by default, making the kernel initialize freed
+	  pages and slab memory with zeroes.
+	  init_on_free can be overridden via command line.
+
 endmenu
 
 endmenu
-- 
2.21.0.1020.gf2820cf01a-goog


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v2 2/4] lib: introduce test_meminit module
  2019-05-14 14:35 [PATCH v2 0/4] RFC: add init_on_alloc/init_on_free boot options Alexander Potapenko
@ 2019-05-14 14:35   ` Alexander Potapenko
  2019-05-14 14:35   ` Alexander Potapenko
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 52+ messages in thread
From: Alexander Potapenko @ 2019-05-14 14:35 UTC (permalink / raw)
  To: akpm, cl, keescook
  Cc: kernel-hardening, Nick Desaulniers, Kostya Serebryany,
	Dmitry Vyukov, Sandeep Patil, Laura Abbott, Jann Horn, linux-mm,
	linux-security-module

Add tests for heap and pagealloc initialization.
These can be used to check init_on_alloc and init_on_free implementations
as well as other approaches to initialization.

Signed-off-by: Alexander Potapenko <glider@google.com>
To: Kees Cook <keescook@chromium.org>
To: Andrew Morton <akpm@linux-foundation.org>
To: Christoph Lameter <cl@linux.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Sandeep Patil <sspatil@android.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: linux-mm@kvack.org
Cc: linux-security-module@vger.kernel.org
Cc: kernel-hardening@lists.openwall.com
---
 lib/Kconfig.debug  |   8 ++
 lib/Makefile       |   1 +
 lib/test_meminit.c | 205 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 214 insertions(+)
 create mode 100644 lib/test_meminit.c

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index d695ec1477f3..6c3fc68a4a77 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -2020,6 +2020,14 @@ config TEST_STACKINIT
 
 	  If unsure, say N.
 
+config TEST_MEMINIT
+	tristate "Test level of heap/page initialization"
+	help
+	  Test if the kernel is zero-initializing heap and page allocations.
+	  This can be useful to test init_on_alloc and init_on_free features.
+
+	  If unsure, say N.
+
 endif # RUNTIME_TESTING_MENU
 
 config MEMTEST
diff --git a/lib/Makefile b/lib/Makefile
index 83d7df2661ff..29c5afbe9882 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -91,6 +91,7 @@ obj-$(CONFIG_TEST_DEBUG_VIRTUAL) += test_debug_virtual.o
 obj-$(CONFIG_TEST_MEMCAT_P) += test_memcat_p.o
 obj-$(CONFIG_TEST_OBJAGG) += test_objagg.o
 obj-$(CONFIG_TEST_STACKINIT) += test_stackinit.o
+obj-$(CONFIG_TEST_MEMINIT) += test_meminit.o
 
 obj-$(CONFIG_TEST_LIVEPATCH) += livepatch/
 
diff --git a/lib/test_meminit.c b/lib/test_meminit.c
new file mode 100644
index 000000000000..67d759498030
--- /dev/null
+++ b/lib/test_meminit.c
@@ -0,0 +1,205 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Test cases for SL[AOU]B/page initialization at alloc/free time.
+ */
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/mm.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+
+#define GARBAGE_INT (0x09A7BA9E)
+#define GARBAGE_BYTE (0x9E)
+
+#define REPORT_FAILURES_IN_FN() \
+	do {	\
+		if (failures)	\
+			pr_info("%s failed %d out of %d times\n",	\
+				__func__, failures, num_tests);		\
+		else		\
+			pr_info("all %d tests in %s passed\n",		\
+				num_tests, __func__);			\
+	} while (0)
+
+/* Calculate the number of uninitialized bytes in the buffer. */
+static int count_nonzero_bytes(void *ptr, size_t size)
+{
+	int i, ret = 0;
+	unsigned char *p = (unsigned char *)ptr;
+
+	for (i = 0; i < size; i++)
+		if (p[i])
+			ret++;
+	return ret;
+}
+
+static void fill_with_garbage(void *ptr, size_t size)
+{
+	unsigned int *p = (unsigned int *)ptr;
+	int i = 0;
+
+	while (size >= sizeof(*p)) {
+		p[i] = GARBAGE_INT;
+		i++;
+		size -= sizeof(*p);
+	}
+	if (size)
+		memset(&p[i], GARBAGE_BYTE, size);
+}
+
+static int __init do_alloc_pages_order(int order, int *total_failures)
+{
+	struct page *page;
+	void *buf;
+	size_t size = PAGE_SIZE << order;
+
+	page = alloc_pages(GFP_KERNEL, order);
+	buf = page_address(page);
+	fill_with_garbage(buf, size);
+	__free_pages(page, order);
+
+	page = alloc_pages(GFP_KERNEL, order);
+	buf = page_address(page);
+	if (count_nonzero_bytes(buf, size))
+		(*total_failures)++;
+	fill_with_garbage(buf, size);
+	__free_pages(page, order);
+	return 1;
+}
+
+static int __init test_pages(int *total_failures)
+{
+	int failures = 0, num_tests = 0;
+	int i;
+
+	for (i = 0; i < 10; i++)
+		num_tests += do_alloc_pages_order(i, &failures);
+
+	REPORT_FAILURES_IN_FN();
+	*total_failures += failures;
+	return num_tests;
+}
+
+static int __init do_kmalloc_size(size_t size, int *total_failures)
+{
+	void *buf;
+
+	buf = kmalloc(size, GFP_KERNEL);
+	fill_with_garbage(buf, size);
+	kfree(buf);
+
+	buf = kmalloc(size, GFP_KERNEL);
+	if (count_nonzero_bytes(buf, size))
+		(*total_failures)++;
+	fill_with_garbage(buf, size);
+	kfree(buf);
+	return 1;
+}
+
+static int __init do_vmalloc_size(size_t size, int *total_failures)
+{
+	void *buf;
+
+	buf = vmalloc(size);
+	fill_with_garbage(buf, size);
+	vfree(buf);
+
+	buf = vmalloc(size);
+	if (count_nonzero_bytes(buf, size))
+		(*total_failures)++;
+	fill_with_garbage(buf, size);
+	vfree(buf);
+	return 1;
+}
+
+static int __init test_kvmalloc(int *total_failures)
+{
+	int failures = 0, num_tests = 0;
+	int i, size;
+
+	for (i = 0; i < 20; i++) {
+		size = 1 << i;
+		num_tests += do_kmalloc_size(size, &failures);
+		num_tests += do_vmalloc_size(size, &failures);
+	}
+
+	REPORT_FAILURES_IN_FN();
+	*total_failures += failures;
+	return num_tests;
+}
+
+#define CTOR_BYTES 4
+/* Initialize the first 4 bytes of the object. */
+void some_ctor(void *obj)
+{
+	memset(obj, 'A', CTOR_BYTES);
+}
+
+static int __init do_kmem_cache_size(size_t size, bool want_ctor,
+				     int *total_failures)
+{
+	struct kmem_cache *c;
+	void *buf;
+	int iter, bytes = 0;
+	int fail = 0;
+
+	c = kmem_cache_create("test_cache", size, 1, 0,
+			      want_ctor ? some_ctor : NULL);
+	for (iter = 0; iter < 10; iter++) {
+		buf = kmem_cache_alloc(c, GFP_KERNEL);
+		if (!want_ctor || iter == 0)
+			bytes = count_nonzero_bytes(buf, size);
+		if (want_ctor) {
+			/*
+			 * Newly initialized memory must be initialized using
+			 * the constructor.
+			 */
+			if (iter == 0 && bytes < CTOR_BYTES)
+				fail = 1;
+		} else {
+			if (bytes)
+				fail = 1;
+		}
+		fill_with_garbage(buf, size);
+		kmem_cache_free(c, buf);
+	}
+	kmem_cache_destroy(c);
+
+	*total_failures += fail;
+	return 1;
+}
+
+static int __init test_kmemcache(int *total_failures)
+{
+	int failures = 0, num_tests = 0;
+	int i, size;
+
+	for (i = 0; i < 10; i++) {
+		size = 4 << i;
+		num_tests += do_kmem_cache_size(size, false, &failures);
+		num_tests += do_kmem_cache_size(size, true, &failures);
+	}
+	REPORT_FAILURES_IN_FN();
+	*total_failures += failures;
+	return num_tests;
+}
+
+static int __init test_meminit_init(void)
+{
+	int failures = 0, num_tests = 0;
+
+	num_tests += test_pages(&failures);
+	num_tests += test_kvmalloc(&failures);
+	num_tests += test_kmemcache(&failures);
+
+	if (failures == 0)
+		pr_info("all %d tests passed!\n", num_tests);
+	else
+		pr_info("failures: %d out of %d\n", failures, num_tests);
+
+	return failures ? -EINVAL : 0;
+}
+module_init(test_meminit_init);
-- 
2.21.0.1020.gf2820cf01a-goog


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v2 2/4] lib: introduce test_meminit module
@ 2019-05-14 14:35   ` Alexander Potapenko
  0 siblings, 0 replies; 52+ messages in thread
From: Alexander Potapenko @ 2019-05-14 14:35 UTC (permalink / raw)
  To: akpm, cl, keescook
  Cc: kernel-hardening, Nick Desaulniers, Kostya Serebryany,
	Dmitry Vyukov, Sandeep Patil, Laura Abbott, Jann Horn, linux-mm,
	linux-security-module

Add tests for heap and pagealloc initialization.
These can be used to check init_on_alloc and init_on_free implementations
as well as other approaches to initialization.

Signed-off-by: Alexander Potapenko <glider@google.com>
To: Kees Cook <keescook@chromium.org>
To: Andrew Morton <akpm@linux-foundation.org>
To: Christoph Lameter <cl@linux.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Sandeep Patil <sspatil@android.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: linux-mm@kvack.org
Cc: linux-security-module@vger.kernel.org
Cc: kernel-hardening@lists.openwall.com
---
 lib/Kconfig.debug  |   8 ++
 lib/Makefile       |   1 +
 lib/test_meminit.c | 205 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 214 insertions(+)
 create mode 100644 lib/test_meminit.c

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index d695ec1477f3..6c3fc68a4a77 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -2020,6 +2020,14 @@ config TEST_STACKINIT
 
 	  If unsure, say N.
 
+config TEST_MEMINIT
+	tristate "Test level of heap/page initialization"
+	help
+	  Test if the kernel is zero-initializing heap and page allocations.
+	  This can be useful to test init_on_alloc and init_on_free features.
+
+	  If unsure, say N.
+
 endif # RUNTIME_TESTING_MENU
 
 config MEMTEST
diff --git a/lib/Makefile b/lib/Makefile
index 83d7df2661ff..29c5afbe9882 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -91,6 +91,7 @@ obj-$(CONFIG_TEST_DEBUG_VIRTUAL) += test_debug_virtual.o
 obj-$(CONFIG_TEST_MEMCAT_P) += test_memcat_p.o
 obj-$(CONFIG_TEST_OBJAGG) += test_objagg.o
 obj-$(CONFIG_TEST_STACKINIT) += test_stackinit.o
+obj-$(CONFIG_TEST_MEMINIT) += test_meminit.o
 
 obj-$(CONFIG_TEST_LIVEPATCH) += livepatch/
 
diff --git a/lib/test_meminit.c b/lib/test_meminit.c
new file mode 100644
index 000000000000..67d759498030
--- /dev/null
+++ b/lib/test_meminit.c
@@ -0,0 +1,205 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Test cases for SL[AOU]B/page initialization at alloc/free time.
+ */
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/mm.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+
+#define GARBAGE_INT (0x09A7BA9E)
+#define GARBAGE_BYTE (0x9E)
+
+#define REPORT_FAILURES_IN_FN() \
+	do {	\
+		if (failures)	\
+			pr_info("%s failed %d out of %d times\n",	\
+				__func__, failures, num_tests);		\
+		else		\
+			pr_info("all %d tests in %s passed\n",		\
+				num_tests, __func__);			\
+	} while (0)
+
+/* Calculate the number of uninitialized bytes in the buffer. */
+static int count_nonzero_bytes(void *ptr, size_t size)
+{
+	int i, ret = 0;
+	unsigned char *p = (unsigned char *)ptr;
+
+	for (i = 0; i < size; i++)
+		if (p[i])
+			ret++;
+	return ret;
+}
+
+static void fill_with_garbage(void *ptr, size_t size)
+{
+	unsigned int *p = (unsigned int *)ptr;
+	int i = 0;
+
+	while (size >= sizeof(*p)) {
+		p[i] = GARBAGE_INT;
+		i++;
+		size -= sizeof(*p);
+	}
+	if (size)
+		memset(&p[i], GARBAGE_BYTE, size);
+}
+
+static int __init do_alloc_pages_order(int order, int *total_failures)
+{
+	struct page *page;
+	void *buf;
+	size_t size = PAGE_SIZE << order;
+
+	page = alloc_pages(GFP_KERNEL, order);
+	buf = page_address(page);
+	fill_with_garbage(buf, size);
+	__free_pages(page, order);
+
+	page = alloc_pages(GFP_KERNEL, order);
+	buf = page_address(page);
+	if (count_nonzero_bytes(buf, size))
+		(*total_failures)++;
+	fill_with_garbage(buf, size);
+	__free_pages(page, order);
+	return 1;
+}
+
+static int __init test_pages(int *total_failures)
+{
+	int failures = 0, num_tests = 0;
+	int i;
+
+	for (i = 0; i < 10; i++)
+		num_tests += do_alloc_pages_order(i, &failures);
+
+	REPORT_FAILURES_IN_FN();
+	*total_failures += failures;
+	return num_tests;
+}
+
+static int __init do_kmalloc_size(size_t size, int *total_failures)
+{
+	void *buf;
+
+	buf = kmalloc(size, GFP_KERNEL);
+	fill_with_garbage(buf, size);
+	kfree(buf);
+
+	buf = kmalloc(size, GFP_KERNEL);
+	if (count_nonzero_bytes(buf, size))
+		(*total_failures)++;
+	fill_with_garbage(buf, size);
+	kfree(buf);
+	return 1;
+}
+
+static int __init do_vmalloc_size(size_t size, int *total_failures)
+{
+	void *buf;
+
+	buf = vmalloc(size);
+	fill_with_garbage(buf, size);
+	vfree(buf);
+
+	buf = vmalloc(size);
+	if (count_nonzero_bytes(buf, size))
+		(*total_failures)++;
+	fill_with_garbage(buf, size);
+	vfree(buf);
+	return 1;
+}
+
+static int __init test_kvmalloc(int *total_failures)
+{
+	int failures = 0, num_tests = 0;
+	int i, size;
+
+	for (i = 0; i < 20; i++) {
+		size = 1 << i;
+		num_tests += do_kmalloc_size(size, &failures);
+		num_tests += do_vmalloc_size(size, &failures);
+	}
+
+	REPORT_FAILURES_IN_FN();
+	*total_failures += failures;
+	return num_tests;
+}
+
+#define CTOR_BYTES 4
+/* Initialize the first 4 bytes of the object. */
+void some_ctor(void *obj)
+{
+	memset(obj, 'A', CTOR_BYTES);
+}
+
+static int __init do_kmem_cache_size(size_t size, bool want_ctor,
+				     int *total_failures)
+{
+	struct kmem_cache *c;
+	void *buf;
+	int iter, bytes = 0;
+	int fail = 0;
+
+	c = kmem_cache_create("test_cache", size, 1, 0,
+			      want_ctor ? some_ctor : NULL);
+	for (iter = 0; iter < 10; iter++) {
+		buf = kmem_cache_alloc(c, GFP_KERNEL);
+		if (!want_ctor || iter == 0)
+			bytes = count_nonzero_bytes(buf, size);
+		if (want_ctor) {
+			/*
+			 * Newly initialized memory must be initialized using
+			 * the constructor.
+			 */
+			if (iter == 0 && bytes < CTOR_BYTES)
+				fail = 1;
+		} else {
+			if (bytes)
+				fail = 1;
+		}
+		fill_with_garbage(buf, size);
+		kmem_cache_free(c, buf);
+	}
+	kmem_cache_destroy(c);
+
+	*total_failures += fail;
+	return 1;
+}
+
+static int __init test_kmemcache(int *total_failures)
+{
+	int failures = 0, num_tests = 0;
+	int i, size;
+
+	for (i = 0; i < 10; i++) {
+		size = 4 << i;
+		num_tests += do_kmem_cache_size(size, false, &failures);
+		num_tests += do_kmem_cache_size(size, true, &failures);
+	}
+	REPORT_FAILURES_IN_FN();
+	*total_failures += failures;
+	return num_tests;
+}
+
+static int __init test_meminit_init(void)
+{
+	int failures = 0, num_tests = 0;
+
+	num_tests += test_pages(&failures);
+	num_tests += test_kvmalloc(&failures);
+	num_tests += test_kmemcache(&failures);
+
+	if (failures == 0)
+		pr_info("all %d tests passed!\n", num_tests);
+	else
+		pr_info("failures: %d out of %d\n", failures, num_tests);
+
+	return failures ? -EINVAL : 0;
+}
+module_init(test_meminit_init);
-- 
2.21.0.1020.gf2820cf01a-goog


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v2 3/4] gfp: mm: introduce __GFP_NO_AUTOINIT
  2019-05-14 14:35 [PATCH v2 0/4] RFC: add init_on_alloc/init_on_free boot options Alexander Potapenko
@ 2019-05-14 14:35   ` Alexander Potapenko
  2019-05-14 14:35   ` Alexander Potapenko
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 52+ messages in thread
From: Alexander Potapenko @ 2019-05-14 14:35 UTC (permalink / raw)
  To: akpm, cl, keescook
  Cc: kernel-hardening, Masahiro Yamada, James Morris, Serge E. Hallyn,
	Nick Desaulniers, Kostya Serebryany, Dmitry Vyukov,
	Sandeep Patil, Laura Abbott, Randy Dunlap, Jann Horn,
	Mark Rutland, Souptick Joarder, Matthew Wilcox, linux-mm,
	linux-security-module

When passed to an allocator (either pagealloc or SL[AOU]B),
__GFP_NO_AUTOINIT tells it to not initialize the requested memory if the
init_on_alloc boot option is enabled. This can be useful in the cases
newly allocated memory is going to be initialized by the caller right
away.

__GFP_NO_AUTOINIT doesn't affect init_on_free behavior, except for SLOB,
where init_on_free implies init_on_alloc.

__GFP_NO_AUTOINIT basically defeats the hardening against information
leaks provided by init_on_alloc, so one should use it with caution.

This patch also adds __GFP_NO_AUTOINIT to alloc_pages() calls in SL[AOU]B.
Doing so is safe, because the heap allocators initialize the pages they
receive before passing memory to the callers.

Slowdown for the initialization features compared to init_on_free=0,
init_on_alloc=0:

hackbench, init_on_free=1:  +6.84% sys time (st.err 0.74%)
hackbench, init_on_alloc=1: +7.25% sys time (st.err 0.72%)

Linux build with -j12, init_on_free=1:  +8.52% wall time (st.err 0.42%)
Linux build with -j12, init_on_free=1:  +24.31% sys time (st.err 0.47%)
Linux build with -j12, init_on_alloc=1: -0.16% wall time (st.err 0.40%)
Linux build with -j12, init_on_alloc=1: +1.24% sys time (st.err 0.39%)

The slowdown for init_on_free=0, init_on_alloc=0 compared to the
baseline is within the standard error.

Signed-off-by: Alexander Potapenko <glider@google.com>
To: Andrew Morton <akpm@linux-foundation.org>
To: Kees Cook <keescook@chromium.org>
To: Christoph Lameter <cl@linux.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: James Morris <jmorris@namei.org>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Sandeep Patil <sspatil@android.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Jann Horn <jannh@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Souptick Joarder <jrdr.linux@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: linux-mm@kvack.org
Cc: linux-security-module@vger.kernel.org
Cc: kernel-hardening@lists.openwall.com
---
 v2:
  - renamed __GFP_NOINIT to __GFP_NO_AUTOINIT, updated patch
    name/description
---
 include/linux/gfp.h | 13 +++++++++----
 include/linux/mm.h  |  2 +-
 kernel/kexec_core.c |  3 ++-
 mm/slab.c           |  2 +-
 mm/slob.c           |  3 ++-
 mm/slub.c           |  1 +
 6 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index fdab7de7490d..e1a83bd0ca67 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -44,6 +44,7 @@ struct vm_area_struct;
 #else
 #define ___GFP_NOLOCKDEP	0
 #endif
+#define ___GFP_NO_AUTOINIT	0x1000000u
 /* If the above are modified, __GFP_BITS_SHIFT may need updating */
 
 /*
@@ -208,16 +209,20 @@ struct vm_area_struct;
  * %__GFP_COMP address compound page metadata.
  *
  * %__GFP_ZERO returns a zeroed page on success.
+ *
+ * %__GFP_NO_AUTOINIT requests non-initialized memory from the underlying
+ * allocator.
  */
-#define __GFP_NOWARN	((__force gfp_t)___GFP_NOWARN)
-#define __GFP_COMP	((__force gfp_t)___GFP_COMP)
-#define __GFP_ZERO	((__force gfp_t)___GFP_ZERO)
+#define __GFP_NOWARN		((__force gfp_t)___GFP_NOWARN)
+#define __GFP_COMP		((__force gfp_t)___GFP_COMP)
+#define __GFP_ZERO		((__force gfp_t)___GFP_ZERO)
+#define __GFP_NO_AUTOINIT	((__force gfp_t)___GFP_NO_AUTOINIT)
 
 /* Disable lockdep for GFP context tracking */
 #define __GFP_NOLOCKDEP ((__force gfp_t)___GFP_NOLOCKDEP)
 
 /* Room for N __GFP_FOO bits */
-#define __GFP_BITS_SHIFT (23 + IS_ENABLED(CONFIG_LOCKDEP))
+#define __GFP_BITS_SHIFT (25)
 #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
 
 /**
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 18d96f1d07c5..ce6c63396002 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2618,7 +2618,7 @@ DECLARE_STATIC_KEY_FALSE(init_on_alloc);
 static inline bool want_init_on_alloc(gfp_t flags)
 {
 	if (static_branch_unlikely(&init_on_alloc))
-		return true;
+		return !(flags & __GFP_NO_AUTOINIT);
 	return flags & __GFP_ZERO;
 }
 
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 2f75dd0d0d81..7fc37bacac79 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -302,7 +302,8 @@ static struct page *kimage_alloc_pages(gfp_t gfp_mask, unsigned int order)
 {
 	struct page *pages;
 
-	pages = alloc_pages(gfp_mask & ~__GFP_ZERO, order);
+	pages = alloc_pages((gfp_mask & ~__GFP_ZERO) | __GFP_NO_AUTOINIT,
+			    order);
 	if (pages) {
 		unsigned int count, i;
 
diff --git a/mm/slab.c b/mm/slab.c
index d00e9de26a45..1089461fc22b 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1393,7 +1393,7 @@ static struct page *kmem_getpages(struct kmem_cache *cachep, gfp_t flags,
 	struct page *page;
 	int nr_pages;
 
-	flags |= cachep->allocflags;
+	flags |= (cachep->allocflags | __GFP_NO_AUTOINIT);
 
 	page = __alloc_pages_node(nodeid, flags, cachep->gfporder);
 	if (!page) {
diff --git a/mm/slob.c b/mm/slob.c
index 351d3dfee000..d505f36aa398 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -192,6 +192,7 @@ static void *slob_new_pages(gfp_t gfp, int order, int node)
 {
 	void *page;
 
+	gfp |= __GFP_NO_AUTOINIT;
 #ifdef CONFIG_NUMA
 	if (node != NUMA_NO_NODE)
 		page = __alloc_pages_node(node, gfp, order);
@@ -221,7 +222,7 @@ static inline bool slob_want_init_on_alloc(gfp_t flags, struct kmem_cache *c)
 {
 	if (static_branch_unlikely(&init_on_alloc) ||
 	    static_branch_unlikely(&init_on_free))
-		return c ? (!c->ctor) : true;
+		return c ? (!c->ctor) : !(flags & __GFP_NO_AUTOINIT);
 	return flags & __GFP_ZERO;
 }
 
diff --git a/mm/slub.c b/mm/slub.c
index 01424e910800..0aa306f5769a 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1495,6 +1495,7 @@ static inline struct page *alloc_slab_page(struct kmem_cache *s,
 	struct page *page;
 	unsigned int order = oo_order(oo);
 
+	flags |= __GFP_NO_AUTOINIT;
 	if (node == NUMA_NO_NODE)
 		page = alloc_pages(flags, order);
 	else
-- 
2.21.0.1020.gf2820cf01a-goog


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v2 3/4] gfp: mm: introduce __GFP_NO_AUTOINIT
@ 2019-05-14 14:35   ` Alexander Potapenko
  0 siblings, 0 replies; 52+ messages in thread
From: Alexander Potapenko @ 2019-05-14 14:35 UTC (permalink / raw)
  To: akpm, cl, keescook
  Cc: kernel-hardening, Masahiro Yamada, James Morris, Serge E. Hallyn,
	Nick Desaulniers, Kostya Serebryany, Dmitry Vyukov,
	Sandeep Patil, Laura Abbott, Randy Dunlap, Jann Horn,
	Mark Rutland, Souptick Joarder, Matthew Wilcox, linux-mm,
	linux-security-module

When passed to an allocator (either pagealloc or SL[AOU]B),
__GFP_NO_AUTOINIT tells it to not initialize the requested memory if the
init_on_alloc boot option is enabled. This can be useful in the cases
newly allocated memory is going to be initialized by the caller right
away.

__GFP_NO_AUTOINIT doesn't affect init_on_free behavior, except for SLOB,
where init_on_free implies init_on_alloc.

__GFP_NO_AUTOINIT basically defeats the hardening against information
leaks provided by init_on_alloc, so one should use it with caution.

This patch also adds __GFP_NO_AUTOINIT to alloc_pages() calls in SL[AOU]B.
Doing so is safe, because the heap allocators initialize the pages they
receive before passing memory to the callers.

Slowdown for the initialization features compared to init_on_free=0,
init_on_alloc=0:

hackbench, init_on_free=1:  +6.84% sys time (st.err 0.74%)
hackbench, init_on_alloc=1: +7.25% sys time (st.err 0.72%)

Linux build with -j12, init_on_free=1:  +8.52% wall time (st.err 0.42%)
Linux build with -j12, init_on_free=1:  +24.31% sys time (st.err 0.47%)
Linux build with -j12, init_on_alloc=1: -0.16% wall time (st.err 0.40%)
Linux build with -j12, init_on_alloc=1: +1.24% sys time (st.err 0.39%)

The slowdown for init_on_free=0, init_on_alloc=0 compared to the
baseline is within the standard error.

Signed-off-by: Alexander Potapenko <glider@google.com>
To: Andrew Morton <akpm@linux-foundation.org>
To: Kees Cook <keescook@chromium.org>
To: Christoph Lameter <cl@linux.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: James Morris <jmorris@namei.org>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Sandeep Patil <sspatil@android.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Jann Horn <jannh@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Souptick Joarder <jrdr.linux@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: linux-mm@kvack.org
Cc: linux-security-module@vger.kernel.org
Cc: kernel-hardening@lists.openwall.com
---
 v2:
  - renamed __GFP_NOINIT to __GFP_NO_AUTOINIT, updated patch
    name/description
---
 include/linux/gfp.h | 13 +++++++++----
 include/linux/mm.h  |  2 +-
 kernel/kexec_core.c |  3 ++-
 mm/slab.c           |  2 +-
 mm/slob.c           |  3 ++-
 mm/slub.c           |  1 +
 6 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index fdab7de7490d..e1a83bd0ca67 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -44,6 +44,7 @@ struct vm_area_struct;
 #else
 #define ___GFP_NOLOCKDEP	0
 #endif
+#define ___GFP_NO_AUTOINIT	0x1000000u
 /* If the above are modified, __GFP_BITS_SHIFT may need updating */
 
 /*
@@ -208,16 +209,20 @@ struct vm_area_struct;
  * %__GFP_COMP address compound page metadata.
  *
  * %__GFP_ZERO returns a zeroed page on success.
+ *
+ * %__GFP_NO_AUTOINIT requests non-initialized memory from the underlying
+ * allocator.
  */
-#define __GFP_NOWARN	((__force gfp_t)___GFP_NOWARN)
-#define __GFP_COMP	((__force gfp_t)___GFP_COMP)
-#define __GFP_ZERO	((__force gfp_t)___GFP_ZERO)
+#define __GFP_NOWARN		((__force gfp_t)___GFP_NOWARN)
+#define __GFP_COMP		((__force gfp_t)___GFP_COMP)
+#define __GFP_ZERO		((__force gfp_t)___GFP_ZERO)
+#define __GFP_NO_AUTOINIT	((__force gfp_t)___GFP_NO_AUTOINIT)
 
 /* Disable lockdep for GFP context tracking */
 #define __GFP_NOLOCKDEP ((__force gfp_t)___GFP_NOLOCKDEP)
 
 /* Room for N __GFP_FOO bits */
-#define __GFP_BITS_SHIFT (23 + IS_ENABLED(CONFIG_LOCKDEP))
+#define __GFP_BITS_SHIFT (25)
 #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
 
 /**
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 18d96f1d07c5..ce6c63396002 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2618,7 +2618,7 @@ DECLARE_STATIC_KEY_FALSE(init_on_alloc);
 static inline bool want_init_on_alloc(gfp_t flags)
 {
 	if (static_branch_unlikely(&init_on_alloc))
-		return true;
+		return !(flags & __GFP_NO_AUTOINIT);
 	return flags & __GFP_ZERO;
 }
 
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 2f75dd0d0d81..7fc37bacac79 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -302,7 +302,8 @@ static struct page *kimage_alloc_pages(gfp_t gfp_mask, unsigned int order)
 {
 	struct page *pages;
 
-	pages = alloc_pages(gfp_mask & ~__GFP_ZERO, order);
+	pages = alloc_pages((gfp_mask & ~__GFP_ZERO) | __GFP_NO_AUTOINIT,
+			    order);
 	if (pages) {
 		unsigned int count, i;
 
diff --git a/mm/slab.c b/mm/slab.c
index d00e9de26a45..1089461fc22b 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1393,7 +1393,7 @@ static struct page *kmem_getpages(struct kmem_cache *cachep, gfp_t flags,
 	struct page *page;
 	int nr_pages;
 
-	flags |= cachep->allocflags;
+	flags |= (cachep->allocflags | __GFP_NO_AUTOINIT);
 
 	page = __alloc_pages_node(nodeid, flags, cachep->gfporder);
 	if (!page) {
diff --git a/mm/slob.c b/mm/slob.c
index 351d3dfee000..d505f36aa398 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -192,6 +192,7 @@ static void *slob_new_pages(gfp_t gfp, int order, int node)
 {
 	void *page;
 
+	gfp |= __GFP_NO_AUTOINIT;
 #ifdef CONFIG_NUMA
 	if (node != NUMA_NO_NODE)
 		page = __alloc_pages_node(node, gfp, order);
@@ -221,7 +222,7 @@ static inline bool slob_want_init_on_alloc(gfp_t flags, struct kmem_cache *c)
 {
 	if (static_branch_unlikely(&init_on_alloc) ||
 	    static_branch_unlikely(&init_on_free))
-		return c ? (!c->ctor) : true;
+		return c ? (!c->ctor) : !(flags & __GFP_NO_AUTOINIT);
 	return flags & __GFP_ZERO;
 }
 
diff --git a/mm/slub.c b/mm/slub.c
index 01424e910800..0aa306f5769a 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1495,6 +1495,7 @@ static inline struct page *alloc_slab_page(struct kmem_cache *s,
 	struct page *page;
 	unsigned int order = oo_order(oo);
 
+	flags |= __GFP_NO_AUTOINIT;
 	if (node == NUMA_NO_NODE)
 		page = alloc_pages(flags, order);
 	else
-- 
2.21.0.1020.gf2820cf01a-goog


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v2 4/4] net: apply __GFP_NO_AUTOINIT to AF_UNIX sk_buff allocations
  2019-05-14 14:35 [PATCH v2 0/4] RFC: add init_on_alloc/init_on_free boot options Alexander Potapenko
@ 2019-05-14 14:35   ` Alexander Potapenko
  2019-05-14 14:35   ` Alexander Potapenko
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 52+ messages in thread
From: Alexander Potapenko @ 2019-05-14 14:35 UTC (permalink / raw)
  To: akpm, cl, keescook
  Cc: kernel-hardening, Masahiro Yamada, James Morris, Serge E. Hallyn,
	Nick Desaulniers, Kostya Serebryany, Dmitry Vyukov,
	Sandeep Patil, Laura Abbott, Randy Dunlap, Jann Horn,
	Mark Rutland, linux-mm, linux-security-module

Add sock_alloc_send_pskb_noinit(), which is similar to
sock_alloc_send_pskb(), but allocates with __GFP_NO_AUTOINIT.
This helps reduce the slowdown on hackbench in the init_on_alloc mode
from 6.84% to 3.45%.

Slowdown for the initialization features compared to init_on_free=0,
init_on_alloc=0:

hackbench, init_on_free=1:  +7.71% sys time (st.err 0.45%)
hackbench, init_on_alloc=1: +3.45% sys time (st.err 0.86%)

Linux build with -j12, init_on_free=1:  +8.34% wall time (st.err 0.39%)
Linux build with -j12, init_on_free=1:  +24.13% sys time (st.err 0.47%)
Linux build with -j12, init_on_alloc=1: -0.04% wall time (st.err 0.46%)
Linux build with -j12, init_on_alloc=1: +0.50% sys time (st.err 0.45%)

The slowdown for init_on_free=0, init_on_alloc=0 compared to the
baseline is within the standard error.

Signed-off-by: Alexander Potapenko <glider@google.com>
To: Andrew Morton <akpm@linux-foundation.org>
To: Christoph Lameter <cl@linux.com>
To: Kees Cook <keescook@chromium.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: James Morris <jmorris@namei.org>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Sandeep Patil <sspatil@android.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Jann Horn <jannh@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: linux-mm@kvack.org
Cc: linux-security-module@vger.kernel.org
Cc: kernel-hardening@lists.openwall.com
---
 v2:
  - changed __GFP_NOINIT to __GFP_NO_AUTOINIT
---
 include/net/sock.h |  5 +++++
 net/core/sock.c    | 29 +++++++++++++++++++++++++----
 net/unix/af_unix.c | 13 +++++++------
 3 files changed, 37 insertions(+), 10 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 4d208c0f9c14..0dcb90a0c14d 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1626,6 +1626,11 @@ struct sk_buff *sock_alloc_send_skb(struct sock *sk, unsigned long size,
 struct sk_buff *sock_alloc_send_pskb(struct sock *sk, unsigned long header_len,
 				     unsigned long data_len, int noblock,
 				     int *errcode, int max_page_order);
+struct sk_buff *sock_alloc_send_pskb_noinit(struct sock *sk,
+					    unsigned long header_len,
+					    unsigned long data_len,
+					    int noblock, int *errcode,
+					    int max_page_order);
 void *sock_kmalloc(struct sock *sk, int size, gfp_t priority);
 void sock_kfree_s(struct sock *sk, void *mem, int size);
 void sock_kzfree_s(struct sock *sk, void *mem, int size);
diff --git a/net/core/sock.c b/net/core/sock.c
index 9ceb90c875bc..7c24b70b7069 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2192,9 +2192,11 @@ static long sock_wait_for_wmem(struct sock *sk, long timeo)
  *	Generic send/receive buffer handlers
  */
 
-struct sk_buff *sock_alloc_send_pskb(struct sock *sk, unsigned long header_len,
-				     unsigned long data_len, int noblock,
-				     int *errcode, int max_page_order)
+struct sk_buff *sock_alloc_send_pskb_internal(struct sock *sk,
+					      unsigned long header_len,
+					      unsigned long data_len,
+					      int noblock, int *errcode,
+					      int max_page_order, gfp_t gfp)
 {
 	struct sk_buff *skb;
 	long timeo;
@@ -2223,7 +2225,7 @@ struct sk_buff *sock_alloc_send_pskb(struct sock *sk, unsigned long header_len,
 		timeo = sock_wait_for_wmem(sk, timeo);
 	}
 	skb = alloc_skb_with_frags(header_len, data_len, max_page_order,
-				   errcode, sk->sk_allocation);
+				   errcode, sk->sk_allocation | gfp);
 	if (skb)
 		skb_set_owner_w(skb, sk);
 	return skb;
@@ -2234,8 +2236,27 @@ struct sk_buff *sock_alloc_send_pskb(struct sock *sk, unsigned long header_len,
 	*errcode = err;
 	return NULL;
 }
+
+struct sk_buff *sock_alloc_send_pskb(struct sock *sk, unsigned long header_len,
+				     unsigned long data_len, int noblock,
+				     int *errcode, int max_page_order)
+{
+	return sock_alloc_send_pskb_internal(sk, header_len, data_len,
+		noblock, errcode, max_page_order, /*gfp*/0);
+}
 EXPORT_SYMBOL(sock_alloc_send_pskb);
 
+struct sk_buff *sock_alloc_send_pskb_noinit(struct sock *sk,
+					    unsigned long header_len,
+					    unsigned long data_len,
+					    int noblock, int *errcode,
+					    int max_page_order)
+{
+	return sock_alloc_send_pskb_internal(sk, header_len, data_len,
+		noblock, errcode, max_page_order, /*gfp*/__GFP_NO_AUTOINIT);
+}
+EXPORT_SYMBOL(sock_alloc_send_pskb_noinit);
+
 struct sk_buff *sock_alloc_send_skb(struct sock *sk, unsigned long size,
 				    int noblock, int *errcode)
 {
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index e68d7454f2e3..a4c15620b66d 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1627,9 +1627,9 @@ static int unix_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
 		BUILD_BUG_ON(SKB_MAX_ALLOC < PAGE_SIZE);
 	}
 
-	skb = sock_alloc_send_pskb(sk, len - data_len, data_len,
-				   msg->msg_flags & MSG_DONTWAIT, &err,
-				   PAGE_ALLOC_COSTLY_ORDER);
+	skb = sock_alloc_send_pskb_noinit(sk, len - data_len, data_len,
+					  msg->msg_flags & MSG_DONTWAIT, &err,
+					  PAGE_ALLOC_COSTLY_ORDER);
 	if (skb == NULL)
 		goto out;
 
@@ -1824,9 +1824,10 @@ static int unix_stream_sendmsg(struct socket *sock, struct msghdr *msg,
 
 		data_len = min_t(size_t, size, PAGE_ALIGN(data_len));
 
-		skb = sock_alloc_send_pskb(sk, size - data_len, data_len,
-					   msg->msg_flags & MSG_DONTWAIT, &err,
-					   get_order(UNIX_SKB_FRAGS_SZ));
+		skb = sock_alloc_send_pskb_noinit(sk, size - data_len, data_len,
+						  msg->msg_flags & MSG_DONTWAIT,
+						  &err,
+						  get_order(UNIX_SKB_FRAGS_SZ));
 		if (!skb)
 			goto out_err;
 
-- 
2.21.0.1020.gf2820cf01a-goog


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v2 4/4] net: apply __GFP_NO_AUTOINIT to AF_UNIX sk_buff allocations
@ 2019-05-14 14:35   ` Alexander Potapenko
  0 siblings, 0 replies; 52+ messages in thread
From: Alexander Potapenko @ 2019-05-14 14:35 UTC (permalink / raw)
  To: akpm, cl, keescook
  Cc: kernel-hardening, Masahiro Yamada, James Morris, Serge E. Hallyn,
	Nick Desaulniers, Kostya Serebryany, Dmitry Vyukov,
	Sandeep Patil, Laura Abbott, Randy Dunlap, Jann Horn,
	Mark Rutland, linux-mm, linux-security-module

Add sock_alloc_send_pskb_noinit(), which is similar to
sock_alloc_send_pskb(), but allocates with __GFP_NO_AUTOINIT.
This helps reduce the slowdown on hackbench in the init_on_alloc mode
from 6.84% to 3.45%.

Slowdown for the initialization features compared to init_on_free=0,
init_on_alloc=0:

hackbench, init_on_free=1:  +7.71% sys time (st.err 0.45%)
hackbench, init_on_alloc=1: +3.45% sys time (st.err 0.86%)

Linux build with -j12, init_on_free=1:  +8.34% wall time (st.err 0.39%)
Linux build with -j12, init_on_free=1:  +24.13% sys time (st.err 0.47%)
Linux build with -j12, init_on_alloc=1: -0.04% wall time (st.err 0.46%)
Linux build with -j12, init_on_alloc=1: +0.50% sys time (st.err 0.45%)

The slowdown for init_on_free=0, init_on_alloc=0 compared to the
baseline is within the standard error.

Signed-off-by: Alexander Potapenko <glider@google.com>
To: Andrew Morton <akpm@linux-foundation.org>
To: Christoph Lameter <cl@linux.com>
To: Kees Cook <keescook@chromium.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: James Morris <jmorris@namei.org>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Sandeep Patil <sspatil@android.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Jann Horn <jannh@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: linux-mm@kvack.org
Cc: linux-security-module@vger.kernel.org
Cc: kernel-hardening@lists.openwall.com
---
 v2:
  - changed __GFP_NOINIT to __GFP_NO_AUTOINIT
---
 include/net/sock.h |  5 +++++
 net/core/sock.c    | 29 +++++++++++++++++++++++++----
 net/unix/af_unix.c | 13 +++++++------
 3 files changed, 37 insertions(+), 10 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 4d208c0f9c14..0dcb90a0c14d 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1626,6 +1626,11 @@ struct sk_buff *sock_alloc_send_skb(struct sock *sk, unsigned long size,
 struct sk_buff *sock_alloc_send_pskb(struct sock *sk, unsigned long header_len,
 				     unsigned long data_len, int noblock,
 				     int *errcode, int max_page_order);
+struct sk_buff *sock_alloc_send_pskb_noinit(struct sock *sk,
+					    unsigned long header_len,
+					    unsigned long data_len,
+					    int noblock, int *errcode,
+					    int max_page_order);
 void *sock_kmalloc(struct sock *sk, int size, gfp_t priority);
 void sock_kfree_s(struct sock *sk, void *mem, int size);
 void sock_kzfree_s(struct sock *sk, void *mem, int size);
diff --git a/net/core/sock.c b/net/core/sock.c
index 9ceb90c875bc..7c24b70b7069 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2192,9 +2192,11 @@ static long sock_wait_for_wmem(struct sock *sk, long timeo)
  *	Generic send/receive buffer handlers
  */
 
-struct sk_buff *sock_alloc_send_pskb(struct sock *sk, unsigned long header_len,
-				     unsigned long data_len, int noblock,
-				     int *errcode, int max_page_order)
+struct sk_buff *sock_alloc_send_pskb_internal(struct sock *sk,
+					      unsigned long header_len,
+					      unsigned long data_len,
+					      int noblock, int *errcode,
+					      int max_page_order, gfp_t gfp)
 {
 	struct sk_buff *skb;
 	long timeo;
@@ -2223,7 +2225,7 @@ struct sk_buff *sock_alloc_send_pskb(struct sock *sk, unsigned long header_len,
 		timeo = sock_wait_for_wmem(sk, timeo);
 	}
 	skb = alloc_skb_with_frags(header_len, data_len, max_page_order,
-				   errcode, sk->sk_allocation);
+				   errcode, sk->sk_allocation | gfp);
 	if (skb)
 		skb_set_owner_w(skb, sk);
 	return skb;
@@ -2234,8 +2236,27 @@ struct sk_buff *sock_alloc_send_pskb(struct sock *sk, unsigned long header_len,
 	*errcode = err;
 	return NULL;
 }
+
+struct sk_buff *sock_alloc_send_pskb(struct sock *sk, unsigned long header_len,
+				     unsigned long data_len, int noblock,
+				     int *errcode, int max_page_order)
+{
+	return sock_alloc_send_pskb_internal(sk, header_len, data_len,
+		noblock, errcode, max_page_order, /*gfp*/0);
+}
 EXPORT_SYMBOL(sock_alloc_send_pskb);
 
+struct sk_buff *sock_alloc_send_pskb_noinit(struct sock *sk,
+					    unsigned long header_len,
+					    unsigned long data_len,
+					    int noblock, int *errcode,
+					    int max_page_order)
+{
+	return sock_alloc_send_pskb_internal(sk, header_len, data_len,
+		noblock, errcode, max_page_order, /*gfp*/__GFP_NO_AUTOINIT);
+}
+EXPORT_SYMBOL(sock_alloc_send_pskb_noinit);
+
 struct sk_buff *sock_alloc_send_skb(struct sock *sk, unsigned long size,
 				    int noblock, int *errcode)
 {
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index e68d7454f2e3..a4c15620b66d 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1627,9 +1627,9 @@ static int unix_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
 		BUILD_BUG_ON(SKB_MAX_ALLOC < PAGE_SIZE);
 	}
 
-	skb = sock_alloc_send_pskb(sk, len - data_len, data_len,
-				   msg->msg_flags & MSG_DONTWAIT, &err,
-				   PAGE_ALLOC_COSTLY_ORDER);
+	skb = sock_alloc_send_pskb_noinit(sk, len - data_len, data_len,
+					  msg->msg_flags & MSG_DONTWAIT, &err,
+					  PAGE_ALLOC_COSTLY_ORDER);
 	if (skb == NULL)
 		goto out;
 
@@ -1824,9 +1824,10 @@ static int unix_stream_sendmsg(struct socket *sock, struct msghdr *msg,
 
 		data_len = min_t(size_t, size, PAGE_ALIGN(data_len));
 
-		skb = sock_alloc_send_pskb(sk, size - data_len, data_len,
-					   msg->msg_flags & MSG_DONTWAIT, &err,
-					   get_order(UNIX_SKB_FRAGS_SZ));
+		skb = sock_alloc_send_pskb_noinit(sk, size - data_len, data_len,
+						  msg->msg_flags & MSG_DONTWAIT,
+						  &err,
+						  get_order(UNIX_SKB_FRAGS_SZ));
 		if (!skb)
 			goto out_err;
 
-- 
2.21.0.1020.gf2820cf01a-goog


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [PATCH v2 2/4] lib: introduce test_meminit module
  2019-05-14 14:35   ` Alexander Potapenko
  (?)
@ 2019-05-16  1:02   ` Kees Cook
  2019-05-17 15:51       ` Alexander Potapenko
  -1 siblings, 1 reply; 52+ messages in thread
From: Kees Cook @ 2019-05-16  1:02 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: akpm, cl, kernel-hardening, Nick Desaulniers, Kostya Serebryany,
	Dmitry Vyukov, Sandeep Patil, Laura Abbott, Jann Horn, linux-mm,
	linux-security-module

On Tue, May 14, 2019 at 04:35:35PM +0200, Alexander Potapenko wrote:
> Add tests for heap and pagealloc initialization.
> These can be used to check init_on_alloc and init_on_free implementations
> as well as other approaches to initialization.

This is nice! Easy way to test the results. It might be helpful to show
here what to expect when loading this module:

with either init_on_alloc=1 or init_on_free=1, I happily see:

	test_meminit: all 10 tests in test_pages passed
	test_meminit: all 40 tests in test_kvmalloc passed
	test_meminit: all 20 tests in test_kmemcache passed
	test_meminit: all 70 tests passed!

and without:

	test_meminit: test_pages failed 10 out of 10 times
	test_meminit: test_kvmalloc failed 40 out of 40 times
	test_meminit: test_kmemcache failed 10 out of 20 times
	test_meminit: failures: 60 out of 70


> 
> Signed-off-by: Alexander Potapenko <glider@google.com>

Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Kees Cook <keescook@chromium.org>

note below...

> [...]
> diff --git a/lib/test_meminit.c b/lib/test_meminit.c
> new file mode 100644
> index 000000000000..67d759498030
> --- /dev/null
> +++ b/lib/test_meminit.c
> @@ -0,0 +1,205 @@
> +// SPDX-License-Identifier: GPL-2.0
> [...]
> +module_init(test_meminit_init);

I get a warning at build about missing the license:

WARNING: modpost: missing MODULE_LICENSE() in lib/test_meminit.o

So, following the SPDX line, just add:

MODULE_LICENSE("GPL");

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v2 1/4] mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options
  2019-05-14 14:35   ` Alexander Potapenko
  (?)
@ 2019-05-16 16:19   ` Kees Cook
  2019-05-16 16:42       ` Alexander Potapenko
  -1 siblings, 1 reply; 52+ messages in thread
From: Kees Cook @ 2019-05-16 16:19 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: akpm, cl, kernel-hardening, Masahiro Yamada, James Morris,
	Serge E. Hallyn, Nick Desaulniers, Kostya Serebryany,
	Dmitry Vyukov, Sandeep Patil, Laura Abbott, Randy Dunlap,
	Jann Horn, Mark Rutland, linux-mm, linux-security-module

On Tue, May 14, 2019 at 04:35:34PM +0200, Alexander Potapenko wrote:
> Slowdown for the new features compared to init_on_free=0,
> init_on_alloc=0:
> 
> hackbench, init_on_free=1:  +7.62% sys time (st.err 0.74%)
> hackbench, init_on_alloc=1: +7.75% sys time (st.err 2.14%)

I wonder if the patch series should be reorganized to introduce
__GFP_NO_AUTOINIT first, so that when the commit with benchmarks appears,
we get the "final" numbers...

> Linux build with -j12, init_on_free=1:  +8.38% wall time (st.err 0.39%)
> Linux build with -j12, init_on_free=1:  +24.42% sys time (st.err 0.52%)
> Linux build with -j12, init_on_alloc=1: -0.13% wall time (st.err 0.42%)
> Linux build with -j12, init_on_alloc=1: +0.57% sys time (st.err 0.40%)

I'm working on reproducing these benchmarks. I'd really like to narrow
down the +24% number here. But it does 

> The slowdown for init_on_free=0, init_on_alloc=0 compared to the
> baseline is within the standard error.

I think the use of static keys here is really great: this is available
by default for anyone that wants to turn it on.

I'm thinking, given the configuable nature of this, it'd be worth adding
a little more detail at boot time. I think maybe a separate patch could
be added to describe the kernel's memory auto-initialization features,
and add something like this to mm_init():

+void __init report_meminit(void)
+{
+	const char *stack;
+
+	if (IS_ENABLED(CONFIG_INIT_STACK_ALL))
+		stack = "all";
+	else if (IS_ENABLED(CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF_ALL))
+		stack = "byref_all";
+	else if (IS_ENABLED(CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF))
+		stack = "byref";
+	else if (IS_ENABLED(CONFIG_GCC_PLUGIN_STRUCTLEAK_USER))
+		stack = "__user";
+	else
+		stack = "off";
+
+	/* Report memory auto-initialization states for this boot. */
+	pr_info("mem auto-init: stack:%s, heap alloc:%s, heap free:%s\n",
+		stack, want_init_on_alloc(GFP_KERNEL) ? "on" : "off",
+		want_init_on_free() ? "on" : "off");
+}

To get a boot line like:

	mem auto-init: stack:off, heap alloc:off, heap free:on

And one other thought I had was that in the init_on_free=1 case, there is
a large pause at boot while memory is being cleared. I think it'd be handy
to include a comment about that, just to keep people from being surprised:

diff --git a/init/main.c b/init/main.c
index cf0c3948ce0e..aea278392338 100644
--- a/init/main.c
+++ b/init/main.c
@@ -529,6 +529,8 @@ static void __init mm_init(void)
 	 * bigger than MAX_ORDER unless SPARSEMEM.
 	 */
 	page_ext_init_flatmem();
+	if (want_init_on_free())
+		pr_info("Clearing system memory ...\n");
 	mem_init();
 	kmem_cache_init();
 	pgtable_init();

Beyond these thoughts, I think this series is in good shape.

Andrew (or anyone else) do you have any concerns about this?

-- 
Kees Cook

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [PATCH v2 1/4] mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options
  2019-05-16 16:19   ` Kees Cook
@ 2019-05-16 16:42       ` Alexander Potapenko
  0 siblings, 0 replies; 52+ messages in thread
From: Alexander Potapenko @ 2019-05-16 16:42 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andrew Morton, Christoph Lameter, Kernel Hardening,
	Masahiro Yamada, James Morris, Serge E. Hallyn, Nick Desaulniers,
	Kostya Serebryany, Dmitry Vyukov, Sandeep Patil, Laura Abbott,
	Randy Dunlap, Jann Horn, Mark Rutland,
	Linux Memory Management List, linux-security-module

From: Kees Cook <keescook@chromium.org>
Date: Thu, May 16, 2019 at 6:20 PM
To: Alexander Potapenko
Cc: <akpm@linux-foundation.org>, <cl@linux.com>,
<kernel-hardening@lists.openwall.com>, Masahiro Yamada, James Morris,
Serge E. Hallyn, Nick Desaulniers, Kostya Serebryany, Dmitry Vyukov,
Sandeep Patil, Laura Abbott, Randy Dunlap, Jann Horn, Mark Rutland,
<linux-mm@kvack.org>, <linux-security-module@vger.kernel.org>

> On Tue, May 14, 2019 at 04:35:34PM +0200, Alexander Potapenko wrote:
> > Slowdown for the new features compared to init_on_free=0,
> > init_on_alloc=0:
> >
> > hackbench, init_on_free=1:  +7.62% sys time (st.err 0.74%)
> > hackbench, init_on_alloc=1: +7.75% sys time (st.err 2.14%)
>
> I wonder if the patch series should be reorganized to introduce
> __GFP_NO_AUTOINIT first, so that when the commit with benchmarks appears,
> we get the "final" numbers...
>
> > Linux build with -j12, init_on_free=1:  +8.38% wall time (st.err 0.39%)
> > Linux build with -j12, init_on_free=1:  +24.42% sys time (st.err 0.52%)
> > Linux build with -j12, init_on_alloc=1: -0.13% wall time (st.err 0.42%)
> > Linux build with -j12, init_on_alloc=1: +0.57% sys time (st.err 0.40%)
>
> I'm working on reproducing these benchmarks. I'd really like to narrow
> down the +24% number here. But it does
I suspect the slowdown of init_on_free is bigger than that of
PAX_SANITIZE_MEMORY, as we've set the goal to have fully zeroed memory
at alloc time.
If we want a mode that only wipes the user data upon free() but
doesn't eliminate all uninit memory, then we can make it faster.
> > The slowdown for init_on_free=0, init_on_alloc=0 compared to the
> > baseline is within the standard error.
>
> I think the use of static keys here is really great: this is available
> by default for anyone that wants to turn it on.
>
> I'm thinking, given the configuable nature of this, it'd be worth adding
> a little more detail at boot time. I think maybe a separate patch could
> be added to describe the kernel's memory auto-initialization features,
> and add something like this to mm_init():
>
> +void __init report_meminit(void)
> +{
> +       const char *stack;
> +
> +       if (IS_ENABLED(CONFIG_INIT_STACK_ALL))
> +               stack = "all";
> +       else if (IS_ENABLED(CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF_ALL))
> +               stack = "byref_all";
> +       else if (IS_ENABLED(CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF))
> +               stack = "byref";
> +       else if (IS_ENABLED(CONFIG_GCC_PLUGIN_STRUCTLEAK_USER))
> +               stack = "__user";
> +       else
> +               stack = "off";
> +
> +       /* Report memory auto-initialization states for this boot. */
> +       pr_info("mem auto-init: stack:%s, heap alloc:%s, heap free:%s\n",
> +               stack, want_init_on_alloc(GFP_KERNEL) ? "on" : "off",
> +               want_init_on_free() ? "on" : "off");
> +}
>
> To get a boot line like:
>
>         mem auto-init: stack:off, heap alloc:off, heap free:on
For stack there's no binary on/off, as you can potentially build half
of the kernel with stack instrumentation and another half without it.
We could make the instrumentation insert a static global flag into
each translation unit, but this won't give us any interesting info.

> And one other thought I had was that in the init_on_free=1 case, there is
> a large pause at boot while memory is being cleared. I think it'd be handy
> to include a comment about that, just to keep people from being surprised:
>
> diff --git a/init/main.c b/init/main.c
> index cf0c3948ce0e..aea278392338 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -529,6 +529,8 @@ static void __init mm_init(void)
>          * bigger than MAX_ORDER unless SPARSEMEM.
>          */
>         page_ext_init_flatmem();
> +       if (want_init_on_free())
> +               pr_info("Clearing system memory ...\n");
>         mem_init();
>         kmem_cache_init();
>         pgtable_init();
>
> Beyond these thoughts, I think this series is in good shape.
>
> Andrew (or anyone else) do you have any concerns about this?
>
> --
> Kees Cook



-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v2 1/4] mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options
@ 2019-05-16 16:42       ` Alexander Potapenko
  0 siblings, 0 replies; 52+ messages in thread
From: Alexander Potapenko @ 2019-05-16 16:42 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andrew Morton, Christoph Lameter, Kernel Hardening,
	Masahiro Yamada, James Morris, Serge E. Hallyn, Nick Desaulniers,
	Kostya Serebryany, Dmitry Vyukov, Sandeep Patil, Laura Abbott,
	Randy Dunlap, Jann Horn, Mark Rutland,
	Linux Memory Management List, linux-security-module

From: Kees Cook <keescook@chromium.org>
Date: Thu, May 16, 2019 at 6:20 PM
To: Alexander Potapenko
Cc: <akpm@linux-foundation.org>, <cl@linux.com>,
<kernel-hardening@lists.openwall.com>, Masahiro Yamada, James Morris,
Serge E. Hallyn, Nick Desaulniers, Kostya Serebryany, Dmitry Vyukov,
Sandeep Patil, Laura Abbott, Randy Dunlap, Jann Horn, Mark Rutland,
<linux-mm@kvack.org>, <linux-security-module@vger.kernel.org>

> On Tue, May 14, 2019 at 04:35:34PM +0200, Alexander Potapenko wrote:
> > Slowdown for the new features compared to init_on_free=0,
> > init_on_alloc=0:
> >
> > hackbench, init_on_free=1:  +7.62% sys time (st.err 0.74%)
> > hackbench, init_on_alloc=1: +7.75% sys time (st.err 2.14%)
>
> I wonder if the patch series should be reorganized to introduce
> __GFP_NO_AUTOINIT first, so that when the commit with benchmarks appears,
> we get the "final" numbers...
>
> > Linux build with -j12, init_on_free=1:  +8.38% wall time (st.err 0.39%)
> > Linux build with -j12, init_on_free=1:  +24.42% sys time (st.err 0.52%)
> > Linux build with -j12, init_on_alloc=1: -0.13% wall time (st.err 0.42%)
> > Linux build with -j12, init_on_alloc=1: +0.57% sys time (st.err 0.40%)
>
> I'm working on reproducing these benchmarks. I'd really like to narrow
> down the +24% number here. But it does
I suspect the slowdown of init_on_free is bigger than that of
PAX_SANITIZE_MEMORY, as we've set the goal to have fully zeroed memory
at alloc time.
If we want a mode that only wipes the user data upon free() but
doesn't eliminate all uninit memory, then we can make it faster.
> > The slowdown for init_on_free=0, init_on_alloc=0 compared to the
> > baseline is within the standard error.
>
> I think the use of static keys here is really great: this is available
> by default for anyone that wants to turn it on.
>
> I'm thinking, given the configuable nature of this, it'd be worth adding
> a little more detail at boot time. I think maybe a separate patch could
> be added to describe the kernel's memory auto-initialization features,
> and add something like this to mm_init():
>
> +void __init report_meminit(void)
> +{
> +       const char *stack;
> +
> +       if (IS_ENABLED(CONFIG_INIT_STACK_ALL))
> +               stack = "all";
> +       else if (IS_ENABLED(CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF_ALL))
> +               stack = "byref_all";
> +       else if (IS_ENABLED(CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF))
> +               stack = "byref";
> +       else if (IS_ENABLED(CONFIG_GCC_PLUGIN_STRUCTLEAK_USER))
> +               stack = "__user";
> +       else
> +               stack = "off";
> +
> +       /* Report memory auto-initialization states for this boot. */
> +       pr_info("mem auto-init: stack:%s, heap alloc:%s, heap free:%s\n",
> +               stack, want_init_on_alloc(GFP_KERNEL) ? "on" : "off",
> +               want_init_on_free() ? "on" : "off");
> +}
>
> To get a boot line like:
>
>         mem auto-init: stack:off, heap alloc:off, heap free:on
For stack there's no binary on/off, as you can potentially build half
of the kernel with stack instrumentation and another half without it.
We could make the instrumentation insert a static global flag into
each translation unit, but this won't give us any interesting info.

> And one other thought I had was that in the init_on_free=1 case, there is
> a large pause at boot while memory is being cleared. I think it'd be handy
> to include a comment about that, just to keep people from being surprised:
>
> diff --git a/init/main.c b/init/main.c
> index cf0c3948ce0e..aea278392338 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -529,6 +529,8 @@ static void __init mm_init(void)
>          * bigger than MAX_ORDER unless SPARSEMEM.
>          */
>         page_ext_init_flatmem();
> +       if (want_init_on_free())
> +               pr_info("Clearing system memory ...\n");
>         mem_init();
>         kmem_cache_init();
>         pgtable_init();
>
> Beyond these thoughts, I think this series is in good shape.
>
> Andrew (or anyone else) do you have any concerns about this?
>
> --
> Kees Cook



-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v2 4/4] net: apply __GFP_NO_AUTOINIT to AF_UNIX sk_buff allocations
  2019-05-14 14:35   ` Alexander Potapenko
  (?)
@ 2019-05-16 16:53   ` Kees Cook
  2019-05-17  0:26     ` Kees Cook
  -1 siblings, 1 reply; 52+ messages in thread
From: Kees Cook @ 2019-05-16 16:53 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: akpm, cl, kernel-hardening, Masahiro Yamada, James Morris,
	Serge E. Hallyn, Nick Desaulniers, Kostya Serebryany,
	Dmitry Vyukov, Sandeep Patil, Laura Abbott, Randy Dunlap,
	Jann Horn, Mark Rutland, linux-mm, linux-security-module

On Tue, May 14, 2019 at 04:35:37PM +0200, Alexander Potapenko wrote:
> Add sock_alloc_send_pskb_noinit(), which is similar to
> sock_alloc_send_pskb(), but allocates with __GFP_NO_AUTOINIT.
> This helps reduce the slowdown on hackbench in the init_on_alloc mode
> from 6.84% to 3.45%.

Out of curiosity, why the creation of the new function over adding a
gfp flag argument to sock_alloc_send_pskb() and updating callers? (There
are only 6 callers, and this change already updates 2 of those.)

> Slowdown for the initialization features compared to init_on_free=0,
> init_on_alloc=0:
> 
> hackbench, init_on_free=1:  +7.71% sys time (st.err 0.45%)
> hackbench, init_on_alloc=1: +3.45% sys time (st.err 0.86%)

In the commit log it might be worth mentioning that this is only
changing the init_on_alloc case (in case it's not already obvious to
folks). Perhaps there needs to be a split of __GFP_NO_AUTOINIT into
__GFP_NO_AUTO_ALLOC_INIT and __GFP_NO_AUTO_FREE_INIT? Right now __GFP_NO_AUTOINIT is only checked for init_on_alloc:

static inline bool want_init_on_alloc(gfp_t flags)
{
        if (static_branch_unlikely(&init_on_alloc))
                return !(flags & __GFP_NO_AUTOINIT);
        return flags & __GFP_ZERO;
}
...
static inline bool want_init_on_free(void)
{
        return static_branch_unlikely(&init_on_free);
}

On a related note, it might be nice to add an exclusion list to
the kmem_cache_create() cases, since it seems likely that further
tuning will be needed there. For example, with the init_on_free-similar
PAX_MEMORY_SANITIZE changes in the last public release of PaX/grsecurity,
the following were excluded from wipe-on-free:

	buffer_head
	names_cache
	mm_struct
	vm_area_struct
	anon_vma
	anon_vma_chain
	skbuff_head_cache
	skbuff_fclone_cache

Adding these and others (with details on why they were selected),
might improve init_on_free performance further without trading too
much coverage.

Having a kernel param with a comma-separated list of cache names and
the logic to add __GFP_NO_AUTOINIT at creation time would be a nice
(and cheap!) debug feature to let folks tune things for their specific
workloads, if they choose to. (And it could maybe also know what "none"
meant, to actually remove the built-in exclusions, similar to what
PaX's "pax_sanitize_slab=full" does.)

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v2 1/4] mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options
  2019-05-16 16:42       ` Alexander Potapenko
  (?)
@ 2019-05-16 17:03       ` Kees Cook
  -1 siblings, 0 replies; 52+ messages in thread
From: Kees Cook @ 2019-05-16 17:03 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Andrew Morton, Christoph Lameter, Kernel Hardening,
	Masahiro Yamada, James Morris, Serge E. Hallyn, Nick Desaulniers,
	Kostya Serebryany, Dmitry Vyukov, Sandeep Patil, Laura Abbott,
	Randy Dunlap, Jann Horn, Mark Rutland,
	Linux Memory Management List, linux-security-module

On Thu, May 16, 2019 at 06:42:37PM +0200, Alexander Potapenko wrote:
> I suspect the slowdown of init_on_free is bigger than that of
> PAX_SANITIZE_MEMORY, as we've set the goal to have fully zeroed memory
> at alloc time.
> If we want a mode that only wipes the user data upon free() but
> doesn't eliminate all uninit memory, then we can make it faster.

Yeah, I sent a separate email that discusses this a bit more.

I think the goals of init_on_alloc and init_on_free are likely a bit
different. Given init_on_alloc's much more cache-friendly performance,
I think that it's likely the right way forward for getting to fully zeroed
memory at alloc time. (Though I note that it already includes exclusions:
such tradeoffs won't be unique to init_on_free.)

init_on_free appears to give us similar coverage (but also reduces the
lifetime of unused data), but isn't cache-friendly so it looks to need
a lot more tuning/trade-offs. (Not that we shouldn't include it! It'll
just need a bit more care to be reasonable.)

> > +void __init report_meminit(void)
> > +{
> > +       const char *stack;
> > +
> > +       if (IS_ENABLED(CONFIG_INIT_STACK_ALL))
> > +               stack = "all";
> > +       else if (IS_ENABLED(CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF_ALL))
> > +               stack = "byref_all";
> > +       else if (IS_ENABLED(CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF))
> > +               stack = "byref";
> > +       else if (IS_ENABLED(CONFIG_GCC_PLUGIN_STRUCTLEAK_USER))
> > +               stack = "__user";
> > +       else
> > +               stack = "off";
> > +
> > +       /* Report memory auto-initialization states for this boot. */
> > +       pr_info("mem auto-init: stack:%s, heap alloc:%s, heap free:%s\n",
> > +               stack, want_init_on_alloc(GFP_KERNEL) ? "on" : "off",
> > +               want_init_on_free() ? "on" : "off");
> > +}
> >
> > To get a boot line like:
> >
> >         mem auto-init: stack:off, heap alloc:off, heap free:on
> For stack there's no binary on/off, as you can potentially build half
> of the kernel with stack instrumentation and another half without it.
> We could make the instrumentation insert a static global flag into
> each translation unit, but this won't give us any interesting info.

Well, yes, that's technically true, but I think reporting the kernel
config here would make sense. If someone intentionally bypasses the
stack auto-init for portions of the kernel, we can't meaningfully report
it here. There will be exceptions for stack auto-init and heap auto-init.

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v2 4/4] net: apply __GFP_NO_AUTOINIT to AF_UNIX sk_buff allocations
  2019-05-16 16:53   ` Kees Cook
@ 2019-05-17  0:26     ` Kees Cook
  2019-05-17  8:49         ` Alexander Potapenko
  0 siblings, 1 reply; 52+ messages in thread
From: Kees Cook @ 2019-05-17  0:26 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: akpm, cl, kernel-hardening, Masahiro Yamada, James Morris,
	Serge E. Hallyn, Nick Desaulniers, Kostya Serebryany,
	Dmitry Vyukov, Sandeep Patil, Laura Abbott, Randy Dunlap,
	Jann Horn, Mark Rutland, linux-mm, linux-security-module

On Thu, May 16, 2019 at 09:53:01AM -0700, Kees Cook wrote:
> On Tue, May 14, 2019 at 04:35:37PM +0200, Alexander Potapenko wrote:
> > Add sock_alloc_send_pskb_noinit(), which is similar to
> > sock_alloc_send_pskb(), but allocates with __GFP_NO_AUTOINIT.
> > This helps reduce the slowdown on hackbench in the init_on_alloc mode
> > from 6.84% to 3.45%.
> 
> Out of curiosity, why the creation of the new function over adding a
> gfp flag argument to sock_alloc_send_pskb() and updating callers? (There
> are only 6 callers, and this change already updates 2 of those.)
> 
> > Slowdown for the initialization features compared to init_on_free=0,
> > init_on_alloc=0:
> > 
> > hackbench, init_on_free=1:  +7.71% sys time (st.err 0.45%)
> > hackbench, init_on_alloc=1: +3.45% sys time (st.err 0.86%)

So I've run some of my own wall-clock timings of kernel builds (which
should be an pretty big "worst case" situation, and I see much smaller
performance changes:

everything off
	Run times: 289.18 288.61 289.66 287.71 287.67
	Min: 287.67 Max: 289.66 Mean: 288.57 Std Dev: 0.79
		baseline

init_on_alloc=1
	Run times: 289.72 286.95 287.87 287.34 287.35
	Min: 286.95 Max: 289.72 Mean: 287.85 Std Dev: 0.98
		0.25% faster (within the std dev noise)

init_on_free=1
	Run times: 303.26 301.44 301.19 301.55 301.39
	Min: 301.19 Max: 303.26 Mean: 301.77 Std Dev: 0.75
		4.57% slower

init_on_free=1 with the PAX_MEMORY_SANITIZE slabs excluded:
	Run times: 299.19 299.85 298.95 298.23 298.64
	Min: 298.23 Max: 299.85 Mean: 298.97 Std Dev: 0.55
		3.60% slower

So the tuning certainly improved things by 1%. My perf numbers don't
show the 24% hit you were seeing at all, though.

> In the commit log it might be worth mentioning that this is only
> changing the init_on_alloc case (in case it's not already obvious to
> folks). Perhaps there needs to be a split of __GFP_NO_AUTOINIT into
> __GFP_NO_AUTO_ALLOC_INIT and __GFP_NO_AUTO_FREE_INIT? Right now
> __GFP_NO_AUTOINIT is only checked for init_on_alloc:

I was obviously crazy here. :) GFP isn't present for free(), but a SLAB
flag works (as was done in PAX_MEMORY_SANITIZE). I'll send the patch I
used for the above timing test.

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH 5/4] mm: Introduce SLAB_NO_FREE_INIT and mark excluded caches
  2019-05-14 14:35   ` Alexander Potapenko
  (?)
  (?)
@ 2019-05-17  0:50   ` Kees Cook
  2019-05-17  8:34       ` Alexander Potapenko
  2019-05-20  6:10       ` Mathias Krause
  -1 siblings, 2 replies; 52+ messages in thread
From: Kees Cook @ 2019-05-17  0:50 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: akpm, cl, kernel-hardening, Masahiro Yamada, James Morris,
	Serge E. Hallyn, Nick Desaulniers, Kostya Serebryany,
	Dmitry Vyukov, Sandeep Patil, Laura Abbott, Randy Dunlap,
	Jann Horn, Mark Rutland, linux-mm, linux-security-module

In order to improve the init_on_free performance, some frequently
freed caches with less sensitive contents can be excluded from the
init_on_free behavior.

This patch is modified from Brad Spengler/PaX Team's code in the
last public patch of grsecurity/PaX based on my understanding of the
code. Changes or omissions from the original code are mine and don't
reflect the original grsecurity/PaX code.

Signed-off-by: Kees Cook <keescook@chromium.org>
---
 fs/buffer.c          | 2 +-
 fs/dcache.c          | 3 ++-
 include/linux/slab.h | 3 +++
 kernel/fork.c        | 6 ++++--
 mm/rmap.c            | 5 +++--
 mm/slab.h            | 5 +++--
 net/core/skbuff.c    | 6 ++++--
 7 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 0faa41fb4c88..04a85bd4cf2e 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -3457,7 +3457,7 @@ void __init buffer_init(void)
 	bh_cachep = kmem_cache_create("buffer_head",
 			sizeof(struct buffer_head), 0,
 				(SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|
-				SLAB_MEM_SPREAD),
+				SLAB_MEM_SPREAD|SLAB_NO_FREE_INIT),
 				NULL);
 
 	/*
diff --git a/fs/dcache.c b/fs/dcache.c
index 8136bda27a1f..323b039accba 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -3139,7 +3139,8 @@ void __init vfs_caches_init_early(void)
 void __init vfs_caches_init(void)
 {
 	names_cachep = kmem_cache_create_usercopy("names_cache", PATH_MAX, 0,
-			SLAB_HWCACHE_ALIGN|SLAB_PANIC, 0, PATH_MAX, NULL);
+			SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_NO_FREE_INIT, 0,
+			PATH_MAX, NULL);
 
 	dcache_init();
 	inode_init();
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 9449b19c5f10..7eba9ad8830d 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -92,6 +92,9 @@
 /* Avoid kmemleak tracing */
 #define SLAB_NOLEAKTRACE	((slab_flags_t __force)0x00800000U)
 
+/* Exclude slab from zero-on-free when init_on_free is enabled */
+#define SLAB_NO_FREE_INIT	((slab_flags_t __force)0x01000000U)
+
 /* Fault injection mark */
 #ifdef CONFIG_FAILSLAB
 # define SLAB_FAILSLAB		((slab_flags_t __force)0x02000000U)
diff --git a/kernel/fork.c b/kernel/fork.c
index b4cba953040a..9868585f5520 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2550,11 +2550,13 @@ void __init proc_caches_init(void)
 
 	mm_cachep = kmem_cache_create_usercopy("mm_struct",
 			mm_size, ARCH_MIN_MMSTRUCT_ALIGN,
-			SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_ACCOUNT,
+			SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_ACCOUNT|
+			SLAB_NO_FREE_INIT,
 			offsetof(struct mm_struct, saved_auxv),
 			sizeof_field(struct mm_struct, saved_auxv),
 			NULL);
-	vm_area_cachep = KMEM_CACHE(vm_area_struct, SLAB_PANIC|SLAB_ACCOUNT);
+	vm_area_cachep = KMEM_CACHE(vm_area_struct, SLAB_PANIC|SLAB_ACCOUNT|
+						    SLAB_NO_FREE_INIT);
 	mmap_init();
 	nsproxy_cache_init();
 }
diff --git a/mm/rmap.c b/mm/rmap.c
index e5dfe2ae6b0d..b7b8013eeb0a 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -432,10 +432,11 @@ static void anon_vma_ctor(void *data)
 void __init anon_vma_init(void)
 {
 	anon_vma_cachep = kmem_cache_create("anon_vma", sizeof(struct anon_vma),
-			0, SLAB_TYPESAFE_BY_RCU|SLAB_PANIC|SLAB_ACCOUNT,
+			0, SLAB_TYPESAFE_BY_RCU|SLAB_PANIC|SLAB_ACCOUNT|
+			SLAB_NO_FREE_INIT,
 			anon_vma_ctor);
 	anon_vma_chain_cachep = KMEM_CACHE(anon_vma_chain,
-			SLAB_PANIC|SLAB_ACCOUNT);
+			SLAB_PANIC|SLAB_ACCOUNT|SLAB_NO_FREE_INIT);
 }
 
 /*
diff --git a/mm/slab.h b/mm/slab.h
index 24ae887359b8..f95b4f03c57b 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -129,7 +129,8 @@ static inline slab_flags_t kmem_cache_flags(unsigned int object_size,
 /* Legal flag mask for kmem_cache_create(), for various configurations */
 #define SLAB_CORE_FLAGS (SLAB_HWCACHE_ALIGN | SLAB_CACHE_DMA | \
 			 SLAB_CACHE_DMA32 | SLAB_PANIC | \
-			 SLAB_TYPESAFE_BY_RCU | SLAB_DEBUG_OBJECTS )
+			 SLAB_TYPESAFE_BY_RCU | SLAB_DEBUG_OBJECTS | \
+			 SLAB_NO_FREE_INIT)
 
 #if defined(CONFIG_DEBUG_SLAB)
 #define SLAB_DEBUG_FLAGS (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER)
@@ -535,7 +536,7 @@ static inline bool slab_want_init_on_alloc(gfp_t flags, struct kmem_cache *c)
 static inline bool slab_want_init_on_free(struct kmem_cache *c)
 {
 	if (static_branch_unlikely(&init_on_free))
-		return !(c->ctor);
+		return !(c->ctor) && ((c->flags & SLAB_NO_FREE_INIT) == 0);
 	else
 		return false;
 }
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index e89be6282693..b65902d2c042 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3981,14 +3981,16 @@ void __init skb_init(void)
 	skbuff_head_cache = kmem_cache_create_usercopy("skbuff_head_cache",
 					      sizeof(struct sk_buff),
 					      0,
-					      SLAB_HWCACHE_ALIGN|SLAB_PANIC,
+					      SLAB_HWCACHE_ALIGN|SLAB_PANIC|
+					      SLAB_NO_FREE_INIT,
 					      offsetof(struct sk_buff, cb),
 					      sizeof_field(struct sk_buff, cb),
 					      NULL);
 	skbuff_fclone_cache = kmem_cache_create("skbuff_fclone_cache",
 						sizeof(struct sk_buff_fclones),
 						0,
-						SLAB_HWCACHE_ALIGN|SLAB_PANIC,
+						SLAB_HWCACHE_ALIGN|SLAB_PANIC|
+						SLAB_NO_FREE_INIT,
 						NULL);
 	skb_extensions_init();
 }
-- 
2.17.1


-- 
Kees Cook

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [PATCH v2 1/4] mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options
  2019-05-14 14:35   ` Alexander Potapenko
  (?)
  (?)
@ 2019-05-17  1:26   ` Kees Cook
  2019-05-17 14:38       ` Alexander Potapenko
  -1 siblings, 1 reply; 52+ messages in thread
From: Kees Cook @ 2019-05-17  1:26 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: akpm, cl, kernel-hardening, Masahiro Yamada, James Morris,
	Serge E. Hallyn, Nick Desaulniers, Kostya Serebryany,
	Dmitry Vyukov, Sandeep Patil, Laura Abbott, Randy Dunlap,
	Jann Horn, Mark Rutland, linux-mm, linux-security-module

On Tue, May 14, 2019 at 04:35:34PM +0200, Alexander Potapenko wrote:
> [...]
> diff --git a/mm/slab.h b/mm/slab.h
> index 43ac818b8592..24ae887359b8 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -524,4 +524,20 @@ static inline int cache_random_seq_create(struct kmem_cache *cachep,
> [...]
> +static inline bool slab_want_init_on_free(struct kmem_cache *c)
> +{
> +	if (static_branch_unlikely(&init_on_free))
> +		return !(c->ctor);

BTW, why is this checking for c->ctor here? Shouldn't it not matter for
the free case?

> +	else
> +		return false;
> +}

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 5/4] mm: Introduce SLAB_NO_FREE_INIT and mark excluded caches
  2019-05-17  0:50   ` [PATCH 5/4] mm: Introduce SLAB_NO_FREE_INIT and mark excluded caches Kees Cook
@ 2019-05-17  8:34       ` Alexander Potapenko
  2019-05-20  6:10       ` Mathias Krause
  1 sibling, 0 replies; 52+ messages in thread
From: Alexander Potapenko @ 2019-05-17  8:34 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andrew Morton, Christoph Lameter, Kernel Hardening,
	Masahiro Yamada, James Morris, Serge E. Hallyn, Nick Desaulniers,
	Kostya Serebryany, Dmitry Vyukov, Sandeep Patil, Laura Abbott,
	Randy Dunlap, Jann Horn, Mark Rutland,
	Linux Memory Management List, linux-security-module

On Fri, May 17, 2019 at 2:50 AM Kees Cook <keescook@chromium.org> wrote:
>
> In order to improve the init_on_free performance, some frequently
> freed caches with less sensitive contents can be excluded from the
> init_on_free behavior.
Did you see any notable performance improvement with this patch?
A similar one gave me only 1-2% on the parallel Linux build.
> This patch is modified from Brad Spengler/PaX Team's code in the
> last public patch of grsecurity/PaX based on my understanding of the
> code. Changes or omissions from the original code are mine and don't
> reflect the original grsecurity/PaX code.
>
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---
>  fs/buffer.c          | 2 +-
>  fs/dcache.c          | 3 ++-
>  include/linux/slab.h | 3 +++
>  kernel/fork.c        | 6 ++++--
>  mm/rmap.c            | 5 +++--
>  mm/slab.h            | 5 +++--
>  net/core/skbuff.c    | 6 ++++--
>  7 files changed, 20 insertions(+), 10 deletions(-)
>
> diff --git a/fs/buffer.c b/fs/buffer.c
> index 0faa41fb4c88..04a85bd4cf2e 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -3457,7 +3457,7 @@ void __init buffer_init(void)
>         bh_cachep = kmem_cache_create("buffer_head",
>                         sizeof(struct buffer_head), 0,
>                                 (SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|
> -                               SLAB_MEM_SPREAD),
> +                               SLAB_MEM_SPREAD|SLAB_NO_FREE_INIT),
>                                 NULL);
>
>         /*
> diff --git a/fs/dcache.c b/fs/dcache.c
> index 8136bda27a1f..323b039accba 100644
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -3139,7 +3139,8 @@ void __init vfs_caches_init_early(void)
>  void __init vfs_caches_init(void)
>  {
>         names_cachep = kmem_cache_create_usercopy("names_cache", PATH_MAX, 0,
> -                       SLAB_HWCACHE_ALIGN|SLAB_PANIC, 0, PATH_MAX, NULL);
> +                       SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_NO_FREE_INIT, 0,
> +                       PATH_MAX, NULL);
>
>         dcache_init();
>         inode_init();
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index 9449b19c5f10..7eba9ad8830d 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -92,6 +92,9 @@
>  /* Avoid kmemleak tracing */
>  #define SLAB_NOLEAKTRACE       ((slab_flags_t __force)0x00800000U)
>
> +/* Exclude slab from zero-on-free when init_on_free is enabled */
> +#define SLAB_NO_FREE_INIT      ((slab_flags_t __force)0x01000000U)
> +
>  /* Fault injection mark */
>  #ifdef CONFIG_FAILSLAB
>  # define SLAB_FAILSLAB         ((slab_flags_t __force)0x02000000U)
> diff --git a/kernel/fork.c b/kernel/fork.c
> index b4cba953040a..9868585f5520 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -2550,11 +2550,13 @@ void __init proc_caches_init(void)
>
>         mm_cachep = kmem_cache_create_usercopy("mm_struct",
>                         mm_size, ARCH_MIN_MMSTRUCT_ALIGN,
> -                       SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_ACCOUNT,
> +                       SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_ACCOUNT|
> +                       SLAB_NO_FREE_INIT,
>                         offsetof(struct mm_struct, saved_auxv),
>                         sizeof_field(struct mm_struct, saved_auxv),
>                         NULL);
> -       vm_area_cachep = KMEM_CACHE(vm_area_struct, SLAB_PANIC|SLAB_ACCOUNT);
> +       vm_area_cachep = KMEM_CACHE(vm_area_struct, SLAB_PANIC|SLAB_ACCOUNT|
> +                                                   SLAB_NO_FREE_INIT);
>         mmap_init();
>         nsproxy_cache_init();
>  }
> diff --git a/mm/rmap.c b/mm/rmap.c
> index e5dfe2ae6b0d..b7b8013eeb0a 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -432,10 +432,11 @@ static void anon_vma_ctor(void *data)
>  void __init anon_vma_init(void)
>  {
>         anon_vma_cachep = kmem_cache_create("anon_vma", sizeof(struct anon_vma),
> -                       0, SLAB_TYPESAFE_BY_RCU|SLAB_PANIC|SLAB_ACCOUNT,
> +                       0, SLAB_TYPESAFE_BY_RCU|SLAB_PANIC|SLAB_ACCOUNT|
> +                       SLAB_NO_FREE_INIT,
>                         anon_vma_ctor);
>         anon_vma_chain_cachep = KMEM_CACHE(anon_vma_chain,
> -                       SLAB_PANIC|SLAB_ACCOUNT);
> +                       SLAB_PANIC|SLAB_ACCOUNT|SLAB_NO_FREE_INIT);
>  }
>
>  /*
> diff --git a/mm/slab.h b/mm/slab.h
> index 24ae887359b8..f95b4f03c57b 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -129,7 +129,8 @@ static inline slab_flags_t kmem_cache_flags(unsigned int object_size,
>  /* Legal flag mask for kmem_cache_create(), for various configurations */
>  #define SLAB_CORE_FLAGS (SLAB_HWCACHE_ALIGN | SLAB_CACHE_DMA | \
>                          SLAB_CACHE_DMA32 | SLAB_PANIC | \
> -                        SLAB_TYPESAFE_BY_RCU | SLAB_DEBUG_OBJECTS )
> +                        SLAB_TYPESAFE_BY_RCU | SLAB_DEBUG_OBJECTS | \
> +                        SLAB_NO_FREE_INIT)
>
>  #if defined(CONFIG_DEBUG_SLAB)
>  #define SLAB_DEBUG_FLAGS (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER)
> @@ -535,7 +536,7 @@ static inline bool slab_want_init_on_alloc(gfp_t flags, struct kmem_cache *c)
>  static inline bool slab_want_init_on_free(struct kmem_cache *c)
>  {
>         if (static_branch_unlikely(&init_on_free))
> -               return !(c->ctor);
> +               return !(c->ctor) && ((c->flags & SLAB_NO_FREE_INIT) == 0);
>         else
>                 return false;
>  }
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index e89be6282693..b65902d2c042 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -3981,14 +3981,16 @@ void __init skb_init(void)
>         skbuff_head_cache = kmem_cache_create_usercopy("skbuff_head_cache",
>                                               sizeof(struct sk_buff),
>                                               0,
> -                                             SLAB_HWCACHE_ALIGN|SLAB_PANIC,
> +                                             SLAB_HWCACHE_ALIGN|SLAB_PANIC|
> +                                             SLAB_NO_FREE_INIT,
>                                               offsetof(struct sk_buff, cb),
>                                               sizeof_field(struct sk_buff, cb),
>                                               NULL);
>         skbuff_fclone_cache = kmem_cache_create("skbuff_fclone_cache",
>                                                 sizeof(struct sk_buff_fclones),
>                                                 0,
> -                                               SLAB_HWCACHE_ALIGN|SLAB_PANIC,
> +                                               SLAB_HWCACHE_ALIGN|SLAB_PANIC|
> +                                               SLAB_NO_FREE_INIT,
>                                                 NULL);
>         skb_extensions_init();
>  }
> --
> 2.17.1
>
>
> --
> Kees Cook



-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 5/4] mm: Introduce SLAB_NO_FREE_INIT and mark excluded caches
@ 2019-05-17  8:34       ` Alexander Potapenko
  0 siblings, 0 replies; 52+ messages in thread
From: Alexander Potapenko @ 2019-05-17  8:34 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andrew Morton, Christoph Lameter, Kernel Hardening,
	Masahiro Yamada, James Morris, Serge E. Hallyn, Nick Desaulniers,
	Kostya Serebryany, Dmitry Vyukov, Sandeep Patil, Laura Abbott,
	Randy Dunlap, Jann Horn, Mark Rutland,
	Linux Memory Management List, linux-security-module

On Fri, May 17, 2019 at 2:50 AM Kees Cook <keescook@chromium.org> wrote:
>
> In order to improve the init_on_free performance, some frequently
> freed caches with less sensitive contents can be excluded from the
> init_on_free behavior.
Did you see any notable performance improvement with this patch?
A similar one gave me only 1-2% on the parallel Linux build.
> This patch is modified from Brad Spengler/PaX Team's code in the
> last public patch of grsecurity/PaX based on my understanding of the
> code. Changes or omissions from the original code are mine and don't
> reflect the original grsecurity/PaX code.
>
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---
>  fs/buffer.c          | 2 +-
>  fs/dcache.c          | 3 ++-
>  include/linux/slab.h | 3 +++
>  kernel/fork.c        | 6 ++++--
>  mm/rmap.c            | 5 +++--
>  mm/slab.h            | 5 +++--
>  net/core/skbuff.c    | 6 ++++--
>  7 files changed, 20 insertions(+), 10 deletions(-)
>
> diff --git a/fs/buffer.c b/fs/buffer.c
> index 0faa41fb4c88..04a85bd4cf2e 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -3457,7 +3457,7 @@ void __init buffer_init(void)
>         bh_cachep = kmem_cache_create("buffer_head",
>                         sizeof(struct buffer_head), 0,
>                                 (SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|
> -                               SLAB_MEM_SPREAD),
> +                               SLAB_MEM_SPREAD|SLAB_NO_FREE_INIT),
>                                 NULL);
>
>         /*
> diff --git a/fs/dcache.c b/fs/dcache.c
> index 8136bda27a1f..323b039accba 100644
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -3139,7 +3139,8 @@ void __init vfs_caches_init_early(void)
>  void __init vfs_caches_init(void)
>  {
>         names_cachep = kmem_cache_create_usercopy("names_cache", PATH_MAX, 0,
> -                       SLAB_HWCACHE_ALIGN|SLAB_PANIC, 0, PATH_MAX, NULL);
> +                       SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_NO_FREE_INIT, 0,
> +                       PATH_MAX, NULL);
>
>         dcache_init();
>         inode_init();
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index 9449b19c5f10..7eba9ad8830d 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -92,6 +92,9 @@
>  /* Avoid kmemleak tracing */
>  #define SLAB_NOLEAKTRACE       ((slab_flags_t __force)0x00800000U)
>
> +/* Exclude slab from zero-on-free when init_on_free is enabled */
> +#define SLAB_NO_FREE_INIT      ((slab_flags_t __force)0x01000000U)
> +
>  /* Fault injection mark */
>  #ifdef CONFIG_FAILSLAB
>  # define SLAB_FAILSLAB         ((slab_flags_t __force)0x02000000U)
> diff --git a/kernel/fork.c b/kernel/fork.c
> index b4cba953040a..9868585f5520 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -2550,11 +2550,13 @@ void __init proc_caches_init(void)
>
>         mm_cachep = kmem_cache_create_usercopy("mm_struct",
>                         mm_size, ARCH_MIN_MMSTRUCT_ALIGN,
> -                       SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_ACCOUNT,
> +                       SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_ACCOUNT|
> +                       SLAB_NO_FREE_INIT,
>                         offsetof(struct mm_struct, saved_auxv),
>                         sizeof_field(struct mm_struct, saved_auxv),
>                         NULL);
> -       vm_area_cachep = KMEM_CACHE(vm_area_struct, SLAB_PANIC|SLAB_ACCOUNT);
> +       vm_area_cachep = KMEM_CACHE(vm_area_struct, SLAB_PANIC|SLAB_ACCOUNT|
> +                                                   SLAB_NO_FREE_INIT);
>         mmap_init();
>         nsproxy_cache_init();
>  }
> diff --git a/mm/rmap.c b/mm/rmap.c
> index e5dfe2ae6b0d..b7b8013eeb0a 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -432,10 +432,11 @@ static void anon_vma_ctor(void *data)
>  void __init anon_vma_init(void)
>  {
>         anon_vma_cachep = kmem_cache_create("anon_vma", sizeof(struct anon_vma),
> -                       0, SLAB_TYPESAFE_BY_RCU|SLAB_PANIC|SLAB_ACCOUNT,
> +                       0, SLAB_TYPESAFE_BY_RCU|SLAB_PANIC|SLAB_ACCOUNT|
> +                       SLAB_NO_FREE_INIT,
>                         anon_vma_ctor);
>         anon_vma_chain_cachep = KMEM_CACHE(anon_vma_chain,
> -                       SLAB_PANIC|SLAB_ACCOUNT);
> +                       SLAB_PANIC|SLAB_ACCOUNT|SLAB_NO_FREE_INIT);
>  }
>
>  /*
> diff --git a/mm/slab.h b/mm/slab.h
> index 24ae887359b8..f95b4f03c57b 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -129,7 +129,8 @@ static inline slab_flags_t kmem_cache_flags(unsigned int object_size,
>  /* Legal flag mask for kmem_cache_create(), for various configurations */
>  #define SLAB_CORE_FLAGS (SLAB_HWCACHE_ALIGN | SLAB_CACHE_DMA | \
>                          SLAB_CACHE_DMA32 | SLAB_PANIC | \
> -                        SLAB_TYPESAFE_BY_RCU | SLAB_DEBUG_OBJECTS )
> +                        SLAB_TYPESAFE_BY_RCU | SLAB_DEBUG_OBJECTS | \
> +                        SLAB_NO_FREE_INIT)
>
>  #if defined(CONFIG_DEBUG_SLAB)
>  #define SLAB_DEBUG_FLAGS (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER)
> @@ -535,7 +536,7 @@ static inline bool slab_want_init_on_alloc(gfp_t flags, struct kmem_cache *c)
>  static inline bool slab_want_init_on_free(struct kmem_cache *c)
>  {
>         if (static_branch_unlikely(&init_on_free))
> -               return !(c->ctor);
> +               return !(c->ctor) && ((c->flags & SLAB_NO_FREE_INIT) == 0);
>         else
>                 return false;
>  }
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index e89be6282693..b65902d2c042 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -3981,14 +3981,16 @@ void __init skb_init(void)
>         skbuff_head_cache = kmem_cache_create_usercopy("skbuff_head_cache",
>                                               sizeof(struct sk_buff),
>                                               0,
> -                                             SLAB_HWCACHE_ALIGN|SLAB_PANIC,
> +                                             SLAB_HWCACHE_ALIGN|SLAB_PANIC|
> +                                             SLAB_NO_FREE_INIT,
>                                               offsetof(struct sk_buff, cb),
>                                               sizeof_field(struct sk_buff, cb),
>                                               NULL);
>         skbuff_fclone_cache = kmem_cache_create("skbuff_fclone_cache",
>                                                 sizeof(struct sk_buff_fclones),
>                                                 0,
> -                                               SLAB_HWCACHE_ALIGN|SLAB_PANIC,
> +                                               SLAB_HWCACHE_ALIGN|SLAB_PANIC|
> +                                               SLAB_NO_FREE_INIT,
>                                                 NULL);
>         skb_extensions_init();
>  }
> --
> 2.17.1
>
>
> --
> Kees Cook



-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v2 4/4] net: apply __GFP_NO_AUTOINIT to AF_UNIX sk_buff allocations
  2019-05-17  0:26     ` Kees Cook
@ 2019-05-17  8:49         ` Alexander Potapenko
  0 siblings, 0 replies; 52+ messages in thread
From: Alexander Potapenko @ 2019-05-17  8:49 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andrew Morton, Christoph Lameter, Kernel Hardening,
	Masahiro Yamada, James Morris, Serge E. Hallyn, Nick Desaulniers,
	Kostya Serebryany, Dmitry Vyukov, Sandeep Patil, Laura Abbott,
	Randy Dunlap, Jann Horn, Mark Rutland,
	Linux Memory Management List, linux-security-module

On Fri, May 17, 2019 at 2:26 AM Kees Cook <keescook@chromium.org> wrote:
>
> On Thu, May 16, 2019 at 09:53:01AM -0700, Kees Cook wrote:
> > On Tue, May 14, 2019 at 04:35:37PM +0200, Alexander Potapenko wrote:
> > > Add sock_alloc_send_pskb_noinit(), which is similar to
> > > sock_alloc_send_pskb(), but allocates with __GFP_NO_AUTOINIT.
> > > This helps reduce the slowdown on hackbench in the init_on_alloc mode
> > > from 6.84% to 3.45%.
> >
> > Out of curiosity, why the creation of the new function over adding a
> > gfp flag argument to sock_alloc_send_pskb() and updating callers? (There
> > are only 6 callers, and this change already updates 2 of those.)
> >
> > > Slowdown for the initialization features compared to init_on_free=0,
> > > init_on_alloc=0:
> > >
> > > hackbench, init_on_free=1:  +7.71% sys time (st.err 0.45%)
> > > hackbench, init_on_alloc=1: +3.45% sys time (st.err 0.86%)
>
> So I've run some of my own wall-clock timings of kernel builds (which
> should be an pretty big "worst case" situation, and I see much smaller
> performance changes:
How many cores were you using? I suspect the numbers may vary a bit
depending on that.
> everything off
>         Run times: 289.18 288.61 289.66 287.71 287.67
>         Min: 287.67 Max: 289.66 Mean: 288.57 Std Dev: 0.79
>                 baseline
>
> init_on_alloc=1
>         Run times: 289.72 286.95 287.87 287.34 287.35
>         Min: 286.95 Max: 289.72 Mean: 287.85 Std Dev: 0.98
>                 0.25% faster (within the std dev noise)
>
> init_on_free=1
>         Run times: 303.26 301.44 301.19 301.55 301.39
>         Min: 301.19 Max: 303.26 Mean: 301.77 Std Dev: 0.75
>                 4.57% slower
>
> init_on_free=1 with the PAX_MEMORY_SANITIZE slabs excluded:
>         Run times: 299.19 299.85 298.95 298.23 298.64
>         Min: 298.23 Max: 299.85 Mean: 298.97 Std Dev: 0.55
>                 3.60% slower
>
> So the tuning certainly improved things by 1%. My perf numbers don't
> show the 24% hit you were seeing at all, though.
Note that 24% is the _sys_ time slowdown. The wall time slowdown seen
in this case was 8.34%

> > In the commit log it might be worth mentioning that this is only
> > changing the init_on_alloc case (in case it's not already obvious to
> > folks). Perhaps there needs to be a split of __GFP_NO_AUTOINIT into
> > __GFP_NO_AUTO_ALLOC_INIT and __GFP_NO_AUTO_FREE_INIT? Right now
> > __GFP_NO_AUTOINIT is only checked for init_on_alloc:
>
> I was obviously crazy here. :) GFP isn't present for free(), but a SLAB
> flag works (as was done in PAX_MEMORY_SANITIZE). I'll send the patch I
> used for the above timing test.
>
> --
> Kees Cook



-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v2 4/4] net: apply __GFP_NO_AUTOINIT to AF_UNIX sk_buff allocations
@ 2019-05-17  8:49         ` Alexander Potapenko
  0 siblings, 0 replies; 52+ messages in thread
From: Alexander Potapenko @ 2019-05-17  8:49 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andrew Morton, Christoph Lameter, Kernel Hardening,
	Masahiro Yamada, James Morris, Serge E. Hallyn, Nick Desaulniers,
	Kostya Serebryany, Dmitry Vyukov, Sandeep Patil, Laura Abbott,
	Randy Dunlap, Jann Horn, Mark Rutland,
	Linux Memory Management List, linux-security-module

On Fri, May 17, 2019 at 2:26 AM Kees Cook <keescook@chromium.org> wrote:
>
> On Thu, May 16, 2019 at 09:53:01AM -0700, Kees Cook wrote:
> > On Tue, May 14, 2019 at 04:35:37PM +0200, Alexander Potapenko wrote:
> > > Add sock_alloc_send_pskb_noinit(), which is similar to
> > > sock_alloc_send_pskb(), but allocates with __GFP_NO_AUTOINIT.
> > > This helps reduce the slowdown on hackbench in the init_on_alloc mode
> > > from 6.84% to 3.45%.
> >
> > Out of curiosity, why the creation of the new function over adding a
> > gfp flag argument to sock_alloc_send_pskb() and updating callers? (There
> > are only 6 callers, and this change already updates 2 of those.)
> >
> > > Slowdown for the initialization features compared to init_on_free=0,
> > > init_on_alloc=0:
> > >
> > > hackbench, init_on_free=1:  +7.71% sys time (st.err 0.45%)
> > > hackbench, init_on_alloc=1: +3.45% sys time (st.err 0.86%)
>
> So I've run some of my own wall-clock timings of kernel builds (which
> should be an pretty big "worst case" situation, and I see much smaller
> performance changes:
How many cores were you using? I suspect the numbers may vary a bit
depending on that.
> everything off
>         Run times: 289.18 288.61 289.66 287.71 287.67
>         Min: 287.67 Max: 289.66 Mean: 288.57 Std Dev: 0.79
>                 baseline
>
> init_on_alloc=1
>         Run times: 289.72 286.95 287.87 287.34 287.35
>         Min: 286.95 Max: 289.72 Mean: 287.85 Std Dev: 0.98
>                 0.25% faster (within the std dev noise)
>
> init_on_free=1
>         Run times: 303.26 301.44 301.19 301.55 301.39
>         Min: 301.19 Max: 303.26 Mean: 301.77 Std Dev: 0.75
>                 4.57% slower
>
> init_on_free=1 with the PAX_MEMORY_SANITIZE slabs excluded:
>         Run times: 299.19 299.85 298.95 298.23 298.64
>         Min: 298.23 Max: 299.85 Mean: 298.97 Std Dev: 0.55
>                 3.60% slower
>
> So the tuning certainly improved things by 1%. My perf numbers don't
> show the 24% hit you were seeing at all, though.
Note that 24% is the _sys_ time slowdown. The wall time slowdown seen
in this case was 8.34%

> > In the commit log it might be worth mentioning that this is only
> > changing the init_on_alloc case (in case it's not already obvious to
> > folks). Perhaps there needs to be a split of __GFP_NO_AUTOINIT into
> > __GFP_NO_AUTO_ALLOC_INIT and __GFP_NO_AUTO_FREE_INIT? Right now
> > __GFP_NO_AUTOINIT is only checked for init_on_alloc:
>
> I was obviously crazy here. :) GFP isn't present for free(), but a SLAB
> flag works (as was done in PAX_MEMORY_SANITIZE). I'll send the patch I
> used for the above timing test.
>
> --
> Kees Cook



-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v2 3/4] gfp: mm: introduce __GFP_NO_AUTOINIT
  2019-05-14 14:35   ` Alexander Potapenko
  (?)
@ 2019-05-17 12:59   ` Michal Hocko
  2019-05-17 13:18       ` Alexander Potapenko
  -1 siblings, 1 reply; 52+ messages in thread
From: Michal Hocko @ 2019-05-17 12:59 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: akpm, cl, keescook, kernel-hardening, Masahiro Yamada,
	James Morris, Serge E. Hallyn, Nick Desaulniers,
	Kostya Serebryany, Dmitry Vyukov, Sandeep Patil, Laura Abbott,
	Randy Dunlap, Jann Horn, Mark Rutland, Souptick Joarder,
	Matthew Wilcox, linux-mm, linux-security-module

[It would be great to keep people involved in the previous version in the
CC list]

On Tue 14-05-19 16:35:36, Alexander Potapenko wrote:
> When passed to an allocator (either pagealloc or SL[AOU]B),
> __GFP_NO_AUTOINIT tells it to not initialize the requested memory if the
> init_on_alloc boot option is enabled. This can be useful in the cases
> newly allocated memory is going to be initialized by the caller right
> away.
> 
> __GFP_NO_AUTOINIT doesn't affect init_on_free behavior, except for SLOB,
> where init_on_free implies init_on_alloc.
> 
> __GFP_NO_AUTOINIT basically defeats the hardening against information
> leaks provided by init_on_alloc, so one should use it with caution.
> 
> This patch also adds __GFP_NO_AUTOINIT to alloc_pages() calls in SL[AOU]B.
> Doing so is safe, because the heap allocators initialize the pages they
> receive before passing memory to the callers.

I still do not like the idea of a new gfp flag as explained in the
previous email. People will simply use it incorectly or arbitrarily.
We have that juicy experience from the past.

Freeing a memory is an opt-in feature and the slab allocator can already
tell many (with constructor or GFP_ZERO) do not need it.

So can we go without this gfp thing and see whether somebody actually
finds a performance problem with the feature enabled and think about
what can we do about it rather than add this maint. nightmare from the
very beginning?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v2 3/4] gfp: mm: introduce __GFP_NO_AUTOINIT
  2019-05-17 12:59   ` Michal Hocko
@ 2019-05-17 13:18       ` Alexander Potapenko
  0 siblings, 0 replies; 52+ messages in thread
From: Alexander Potapenko @ 2019-05-17 13:18 UTC (permalink / raw)
  To: Michal Hocko, Kees Cook
  Cc: Andrew Morton, Christoph Lameter, Kernel Hardening,
	Masahiro Yamada, James Morris, Serge E. Hallyn, Nick Desaulniers,
	Kostya Serebryany, Dmitry Vyukov, Sandeep Patil, Laura Abbott,
	Randy Dunlap, Jann Horn, Mark Rutland, Souptick Joarder,
	Matthew Wilcox, Linux Memory Management List,
	linux-security-module

On Fri, May 17, 2019 at 2:59 PM Michal this flag Hocko
<mhocko@kernel.org> wrote:
>
> [It would be great to keep people involved in the previous version in the
> CC list]
Yes, I've been trying to keep everyone in the loop, but your email
fell through the cracks.
Sorry about that.
> On Tue 14-05-19 16:35:36, Alexander Potapenko wrote:
> > When passed to an allocator (either pagealloc or SL[AOU]B),
> > __GFP_NO_AUTOINIT tells it to not initialize the requested memory if the
> > init_on_alloc boot option is enabled. This can be useful in the cases
> > newly allocated memory is going to be initialized by the caller right
> > away.
> >
> > __GFP_NO_AUTOINIT doesn't affect init_on_free behavior, except for SLOB,
> > where init_on_free implies init_on_alloc.
> >
> > __GFP_NO_AUTOINIT basically defeats the hardening against information
> > leaks provided by init_on_alloc, so one should use it with caution.
> >
> > This patch also adds __GFP_NO_AUTOINIT to alloc_pages() calls in SL[AOU]B.
> > Doing so is safe, because the heap allocators initialize the pages they
> > receive before passing memory to the callers.
>
> I still do not like the idea of a new gfp flag as explained in the
> previous email. People will simply use it incorectly or arbitrarily.
> We have that juicy experience from the past.

Just to preserve some context, here's the previous email:
https://patchwork.kernel.org/patch/10907595/
(plus the patch removing GFP_TEMPORARY for the curious ones:
https://lwn.net/Articles/729145/)

> Freeing a memory is an opt-in feature and the slab allocator can already
> tell many (with constructor or GFP_ZERO) do not need it.
Sorry, I didn't understand this piece. Could you please elaborate?

> So can we go without this gfp thing and see whether somebody actually
> finds a performance problem with the feature enabled and think about
> what can we do about it rather than add this maint. nightmare from the
> very beginning?

There were two reasons to introduce this flag initially.
The first was double initialization of pages allocated for SLUB.
However the benchmark results provided in this and the previous patch
don't show any noticeable difference - most certainly because the cost
of initializing the page is amortized.
The second one was to fine-tune hackbench, for which the slowdown
drops by a factor of 2.
But optimizing a mitigation for certain benchmarks is a questionable
measure, so maybe we could really go without it.

Kees, what do you think?
> --
> Michal Hocko
> SUSE Labs



-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v2 3/4] gfp: mm: introduce __GFP_NO_AUTOINIT
@ 2019-05-17 13:18       ` Alexander Potapenko
  0 siblings, 0 replies; 52+ messages in thread
From: Alexander Potapenko @ 2019-05-17 13:18 UTC (permalink / raw)
  To: Michal Hocko, Kees Cook
  Cc: Andrew Morton, Christoph Lameter, Kernel Hardening,
	Masahiro Yamada, James Morris, Serge E. Hallyn, Nick Desaulniers,
	Kostya Serebryany, Dmitry Vyukov, Sandeep Patil, Laura Abbott,
	Randy Dunlap, Jann Horn, Mark Rutland, Souptick Joarder,
	Matthew Wilcox, Linux Memory Management List,
	linux-security-module

On Fri, May 17, 2019 at 2:59 PM Michal this flag Hocko
<mhocko@kernel.org> wrote:
>
> [It would be great to keep people involved in the previous version in the
> CC list]
Yes, I've been trying to keep everyone in the loop, but your email
fell through the cracks.
Sorry about that.
> On Tue 14-05-19 16:35:36, Alexander Potapenko wrote:
> > When passed to an allocator (either pagealloc or SL[AOU]B),
> > __GFP_NO_AUTOINIT tells it to not initialize the requested memory if the
> > init_on_alloc boot option is enabled. This can be useful in the cases
> > newly allocated memory is going to be initialized by the caller right
> > away.
> >
> > __GFP_NO_AUTOINIT doesn't affect init_on_free behavior, except for SLOB,
> > where init_on_free implies init_on_alloc.
> >
> > __GFP_NO_AUTOINIT basically defeats the hardening against information
> > leaks provided by init_on_alloc, so one should use it with caution.
> >
> > This patch also adds __GFP_NO_AUTOINIT to alloc_pages() calls in SL[AOU]B.
> > Doing so is safe, because the heap allocators initialize the pages they
> > receive before passing memory to the callers.
>
> I still do not like the idea of a new gfp flag as explained in the
> previous email. People will simply use it incorectly or arbitrarily.
> We have that juicy experience from the past.

Just to preserve some context, here's the previous email:
https://patchwork.kernel.org/patch/10907595/
(plus the patch removing GFP_TEMPORARY for the curious ones:
https://lwn.net/Articles/729145/)

> Freeing a memory is an opt-in feature and the slab allocator can already
> tell many (with constructor or GFP_ZERO) do not need it.
Sorry, I didn't understand this piece. Could you please elaborate?

> So can we go without this gfp thing and see whether somebody actually
> finds a performance problem with the feature enabled and think about
> what can we do about it rather than add this maint. nightmare from the
> very beginning?

There were two reasons to introduce this flag initially.
The first was double initialization of pages allocated for SLUB.
However the benchmark results provided in this and the previous patch
don't show any noticeable difference - most certainly because the cost
of initializing the page is amortized.
The second one was to fine-tune hackbench, for which the slowdown
drops by a factor of 2.
But optimizing a mitigation for certain benchmarks is a questionable
measure, so maybe we could really go without it.

Kees, what do you think?
> --
> Michal Hocko
> SUSE Labs



-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v2 3/4] gfp: mm: introduce __GFP_NO_AUTOINIT
  2019-05-17 13:18       ` Alexander Potapenko
  (?)
@ 2019-05-17 13:25       ` Michal Hocko
  2019-05-17 13:37           ` Alexander Potapenko
  -1 siblings, 1 reply; 52+ messages in thread
From: Michal Hocko @ 2019-05-17 13:25 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Kees Cook, Andrew Morton, Christoph Lameter, Kernel Hardening,
	Masahiro Yamada, James Morris, Serge E. Hallyn, Nick Desaulniers,
	Kostya Serebryany, Dmitry Vyukov, Sandeep Patil, Laura Abbott,
	Randy Dunlap, Jann Horn, Mark Rutland, Souptick Joarder,
	Matthew Wilcox, Linux Memory Management List,
	linux-security-module

On Fri 17-05-19 15:18:19, Alexander Potapenko wrote:
> On Fri, May 17, 2019 at 2:59 PM Michal this flag Hocko
> <mhocko@kernel.org> wrote:
> >
> > [It would be great to keep people involved in the previous version in the
> > CC list]
> Yes, I've been trying to keep everyone in the loop, but your email
> fell through the cracks.
> Sorry about that.

No problem

> > On Tue 14-05-19 16:35:36, Alexander Potapenko wrote:
> > > When passed to an allocator (either pagealloc or SL[AOU]B),
> > > __GFP_NO_AUTOINIT tells it to not initialize the requested memory if the
> > > init_on_alloc boot option is enabled. This can be useful in the cases
> > > newly allocated memory is going to be initialized by the caller right
> > > away.
> > >
> > > __GFP_NO_AUTOINIT doesn't affect init_on_free behavior, except for SLOB,
> > > where init_on_free implies init_on_alloc.
> > >
> > > __GFP_NO_AUTOINIT basically defeats the hardening against information
> > > leaks provided by init_on_alloc, so one should use it with caution.
> > >
> > > This patch also adds __GFP_NO_AUTOINIT to alloc_pages() calls in SL[AOU]B.
> > > Doing so is safe, because the heap allocators initialize the pages they
> > > receive before passing memory to the callers.
> >
> > I still do not like the idea of a new gfp flag as explained in the
> > previous email. People will simply use it incorectly or arbitrarily.
> > We have that juicy experience from the past.
> 
> Just to preserve some context, here's the previous email:
> https://patchwork.kernel.org/patch/10907595/
> (plus the patch removing GFP_TEMPORARY for the curious ones:
> https://lwn.net/Articles/729145/)

Not only. GFP_REPEAT being another one and probably others I cannot
remember from the top of my head.

> > Freeing a memory is an opt-in feature and the slab allocator can already
> > tell many (with constructor or GFP_ZERO) do not need it.
> Sorry, I didn't understand this piece. Could you please elaborate?

The allocator can assume that caches with a constructor will initialize
the object so additional zeroying is not needed. GFP_ZERO should be self
explanatory.

> > So can we go without this gfp thing and see whether somebody actually
> > finds a performance problem with the feature enabled and think about
> > what can we do about it rather than add this maint. nightmare from the
> > very beginning?
> 
> There were two reasons to introduce this flag initially.
> The first was double initialization of pages allocated for SLUB.

Could you elaborate please?

> However the benchmark results provided in this and the previous patch
> don't show any noticeable difference - most certainly because the cost
> of initializing the page is amortized.

> The second one was to fine-tune hackbench, for which the slowdown
> drops by a factor of 2.
> But optimizing a mitigation for certain benchmarks is a questionable
> measure, so maybe we could really go without it.

Agreed. Over optimization based on an artificial workloads tend to be
dubious IMHO.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v2 3/4] gfp: mm: introduce __GFP_NO_AUTOINIT
  2019-05-17 13:25       ` Michal Hocko
@ 2019-05-17 13:37           ` Alexander Potapenko
  0 siblings, 0 replies; 52+ messages in thread
From: Alexander Potapenko @ 2019-05-17 13:37 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Kees Cook, Andrew Morton, Christoph Lameter, Kernel Hardening,
	Masahiro Yamada, James Morris, Serge E. Hallyn, Nick Desaulniers,
	Kostya Serebryany, Dmitry Vyukov, Sandeep Patil, Laura Abbott,
	Randy Dunlap, Jann Horn, Mark Rutland, Souptick Joarder,
	Matthew Wilcox, Linux Memory Management List,
	linux-security-module

On Fri, May 17, 2019 at 3:25 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Fri 17-05-19 15:18:19, Alexander Potapenko wrote:
> > On Fri, May 17, 2019 at 2:59 PM Michal this flag Hocko
> > <mhocko@kernel.org> wrote:
> > >
> > > [It would be great to keep people involved in the previous version in the
> > > CC list]
> > Yes, I've been trying to keep everyone in the loop, but your email
> > fell through the cracks.
> > Sorry about that.
>
> No problem
>
> > > On Tue 14-05-19 16:35:36, Alexander Potapenko wrote:
> > > > When passed to an allocator (either pagealloc or SL[AOU]B),
> > > > __GFP_NO_AUTOINIT tells it to not initialize the requested memory if the
> > > > init_on_alloc boot option is enabled. This can be useful in the cases
> > > > newly allocated memory is going to be initialized by the caller right
> > > > away.
> > > >
> > > > __GFP_NO_AUTOINIT doesn't affect init_on_free behavior, except for SLOB,
> > > > where init_on_free implies init_on_alloc.
> > > >
> > > > __GFP_NO_AUTOINIT basically defeats the hardening against information
> > > > leaks provided by init_on_alloc, so one should use it with caution.
> > > >
> > > > This patch also adds __GFP_NO_AUTOINIT to alloc_pages() calls in SL[AOU]B.
> > > > Doing so is safe, because the heap allocators initialize the pages they
> > > > receive before passing memory to the callers.
> > >
> > > I still do not like the idea of a new gfp flag as explained in the
> > > previous email. People will simply use it incorectly or arbitrarily.
> > > We have that juicy experience from the past.
> >
> > Just to preserve some context, here's the previous email:
> > https://patchwork.kernel.org/patch/10907595/
> > (plus the patch removing GFP_TEMPORARY for the curious ones:
> > https://lwn.net/Articles/729145/)
>
> Not only. GFP_REPEAT being another one and probably others I cannot
> remember from the top of my head.
>
> > > Freeing a memory is an opt-in feature and the slab allocator can already
> > > tell many (with constructor or GFP_ZERO) do not need it.
> > Sorry, I didn't understand this piece. Could you please elaborate?
>
> The allocator can assume that caches with a constructor will initialize
> the object so additional zeroying is not needed. GFP_ZERO should be self
> explanatory.
Ah, I see. We already do that, see the want_init_on_alloc()
implementation here: https://patchwork.kernel.org/patch/10943087/
> > > So can we go without this gfp thing and see whether somebody actually
> > > finds a performance problem with the feature enabled and think about
> > > what can we do about it rather than add this maint. nightmare from the
> > > very beginning?
> >
> > There were two reasons to introduce this flag initially.
> > The first was double initialization of pages allocated for SLUB.
>
> Could you elaborate please?
When the kernel allocates an object from SLUB, and SLUB happens to be
short on free pages, it requests some from the page allocator.
Those pages are initialized by the page allocator and split into
objects. Finally SLUB initializes one of the available objects and
returns it back to the kernel.
Therefore the object is initialized twice for the first time (when it
comes directly from the page allocator).
This cost is however amortized by SLUB reusing the object after it's been freed.

> > However the benchmark results provided in this and the previous patch
> > don't show any noticeable difference - most certainly because the cost
> > of initializing the page is amortized.
>
> > The second one was to fine-tune hackbench, for which the slowdown
> > drops by a factor of 2.
> > But optimizing a mitigation for certain benchmarks is a questionable
> > measure, so maybe we could really go without it.
>
> Agreed. Over optimization based on an artificial workloads tend to be
> dubious IMHO.
>
> --
> Michal Hocko
> SUSE Labs



-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v2 3/4] gfp: mm: introduce __GFP_NO_AUTOINIT
@ 2019-05-17 13:37           ` Alexander Potapenko
  0 siblings, 0 replies; 52+ messages in thread
From: Alexander Potapenko @ 2019-05-17 13:37 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Kees Cook, Andrew Morton, Christoph Lameter, Kernel Hardening,
	Masahiro Yamada, James Morris, Serge E. Hallyn, Nick Desaulniers,
	Kostya Serebryany, Dmitry Vyukov, Sandeep Patil, Laura Abbott,
	Randy Dunlap, Jann Horn, Mark Rutland, Souptick Joarder,
	Matthew Wilcox, Linux Memory Management List,
	linux-security-module

On Fri, May 17, 2019 at 3:25 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Fri 17-05-19 15:18:19, Alexander Potapenko wrote:
> > On Fri, May 17, 2019 at 2:59 PM Michal this flag Hocko
> > <mhocko@kernel.org> wrote:
> > >
> > > [It would be great to keep people involved in the previous version in the
> > > CC list]
> > Yes, I've been trying to keep everyone in the loop, but your email
> > fell through the cracks.
> > Sorry about that.
>
> No problem
>
> > > On Tue 14-05-19 16:35:36, Alexander Potapenko wrote:
> > > > When passed to an allocator (either pagealloc or SL[AOU]B),
> > > > __GFP_NO_AUTOINIT tells it to not initialize the requested memory if the
> > > > init_on_alloc boot option is enabled. This can be useful in the cases
> > > > newly allocated memory is going to be initialized by the caller right
> > > > away.
> > > >
> > > > __GFP_NO_AUTOINIT doesn't affect init_on_free behavior, except for SLOB,
> > > > where init_on_free implies init_on_alloc.
> > > >
> > > > __GFP_NO_AUTOINIT basically defeats the hardening against information
> > > > leaks provided by init_on_alloc, so one should use it with caution.
> > > >
> > > > This patch also adds __GFP_NO_AUTOINIT to alloc_pages() calls in SL[AOU]B.
> > > > Doing so is safe, because the heap allocators initialize the pages they
> > > > receive before passing memory to the callers.
> > >
> > > I still do not like the idea of a new gfp flag as explained in the
> > > previous email. People will simply use it incorectly or arbitrarily.
> > > We have that juicy experience from the past.
> >
> > Just to preserve some context, here's the previous email:
> > https://patchwork.kernel.org/patch/10907595/
> > (plus the patch removing GFP_TEMPORARY for the curious ones:
> > https://lwn.net/Articles/729145/)
>
> Not only. GFP_REPEAT being another one and probably others I cannot
> remember from the top of my head.
>
> > > Freeing a memory is an opt-in feature and the slab allocator can already
> > > tell many (with constructor or GFP_ZERO) do not need it.
> > Sorry, I didn't understand this piece. Could you please elaborate?
>
> The allocator can assume that caches with a constructor will initialize
> the object so additional zeroying is not needed. GFP_ZERO should be self
> explanatory.
Ah, I see. We already do that, see the want_init_on_alloc()
implementation here: https://patchwork.kernel.org/patch/10943087/
> > > So can we go without this gfp thing and see whether somebody actually
> > > finds a performance problem with the feature enabled and think about
> > > what can we do about it rather than add this maint. nightmare from the
> > > very beginning?
> >
> > There were two reasons to introduce this flag initially.
> > The first was double initialization of pages allocated for SLUB.
>
> Could you elaborate please?
When the kernel allocates an object from SLUB, and SLUB happens to be
short on free pages, it requests some from the page allocator.
Those pages are initialized by the page allocator and split into
objects. Finally SLUB initializes one of the available objects and
returns it back to the kernel.
Therefore the object is initialized twice for the first time (when it
comes directly from the page allocator).
This cost is however amortized by SLUB reusing the object after it's been freed.

> > However the benchmark results provided in this and the previous patch
> > don't show any noticeable difference - most certainly because the cost
> > of initializing the page is amortized.
>
> > The second one was to fine-tune hackbench, for which the slowdown
> > drops by a factor of 2.
> > But optimizing a mitigation for certain benchmarks is a questionable
> > measure, so maybe we could really go without it.
>
> Agreed. Over optimization based on an artificial workloads tend to be
> dubious IMHO.
>
> --
> Michal Hocko
> SUSE Labs



-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v2 4/4] net: apply __GFP_NO_AUTOINIT to AF_UNIX sk_buff allocations
  2019-05-17  8:49         ` Alexander Potapenko
@ 2019-05-17 13:50           ` Alexander Potapenko
  -1 siblings, 0 replies; 52+ messages in thread
From: Alexander Potapenko @ 2019-05-17 13:50 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andrew Morton, Christoph Lameter, Kernel Hardening,
	Masahiro Yamada, James Morris, Serge E. Hallyn, Nick Desaulniers,
	Kostya Serebryany, Dmitry Vyukov, Sandeep Patil, Laura Abbott,
	Randy Dunlap, Jann Horn, Mark Rutland,
	Linux Memory Management List, linux-security-module

On Fri, May 17, 2019 at 10:49 AM Alexander Potapenko <glider@google.com> wrote:
>
> On Fri, May 17, 2019 at 2:26 AM Kees Cook <keescook@chromium.org> wrote:
> >
> > On Thu, May 16, 2019 at 09:53:01AM -0700, Kees Cook wrote:
> > > On Tue, May 14, 2019 at 04:35:37PM +0200, Alexander Potapenko wrote:
> > > > Add sock_alloc_send_pskb_noinit(), which is similar to
> > > > sock_alloc_send_pskb(), but allocates with __GFP_NO_AUTOINIT.
> > > > This helps reduce the slowdown on hackbench in the init_on_alloc mode
> > > > from 6.84% to 3.45%.
> > >
> > > Out of curiosity, why the creation of the new function over adding a
> > > gfp flag argument to sock_alloc_send_pskb() and updating callers? (There
> > > are only 6 callers, and this change already updates 2 of those.)
> > >
> > > > Slowdown for the initialization features compared to init_on_free=0,
> > > > init_on_alloc=0:
> > > >
> > > > hackbench, init_on_free=1:  +7.71% sys time (st.err 0.45%)
> > > > hackbench, init_on_alloc=1: +3.45% sys time (st.err 0.86%)
> >
> > So I've run some of my own wall-clock timings of kernel builds (which
> > should be an pretty big "worst case" situation, and I see much smaller
> > performance changes:
> How many cores were you using? I suspect the numbers may vary a bit
> depending on that.
> > everything off
> >         Run times: 289.18 288.61 289.66 287.71 287.67
> >         Min: 287.67 Max: 289.66 Mean: 288.57 Std Dev: 0.79
> >                 baseline
> >
> > init_on_alloc=1
> >         Run times: 289.72 286.95 287.87 287.34 287.35
> >         Min: 286.95 Max: 289.72 Mean: 287.85 Std Dev: 0.98
> >                 0.25% faster (within the std dev noise)
> >
> > init_on_free=1
> >         Run times: 303.26 301.44 301.19 301.55 301.39
> >         Min: 301.19 Max: 303.26 Mean: 301.77 Std Dev: 0.75
> >                 4.57% slower
> >
> > init_on_free=1 with the PAX_MEMORY_SANITIZE slabs excluded:
> >         Run times: 299.19 299.85 298.95 298.23 298.64
> >         Min: 298.23 Max: 299.85 Mean: 298.97 Std Dev: 0.55
> >                 3.60% slower
> >
> > So the tuning certainly improved things by 1%. My perf numbers don't
> > show the 24% hit you were seeing at all, though.
> Note that 24% is the _sys_ time slowdown. The wall time slowdown seen
> in this case was 8.34%
I've collected more stats running QEMU with different numbers of cores.
The slowdown values of init_on_free compared to baseline are:
2 CPUs - 5.94% for wall time (20.08% for sys time)
6 CPUs - 7.43% for wall time (23.55% for sys time)
12 CPUs - 8.41% for wall time (24.25% for sys time)
24 CPUs - 9.49% for wall time (17.98% for sys time)

I'm building a defconfig of some fixed KMSAN tree with Clang, but that
shouldn't matter much.

> > > In the commit log it might be worth mentioning that this is only
> > > changing the init_on_alloc case (in case it's not already obvious to
> > > folks). Perhaps there needs to be a split of __GFP_NO_AUTOINIT into
> > > __GFP_NO_AUTO_ALLOC_INIT and __GFP_NO_AUTO_FREE_INIT? Right now
> > > __GFP_NO_AUTOINIT is only checked for init_on_alloc:
> >
> > I was obviously crazy here. :) GFP isn't present for free(), but a SLAB
> > flag works (as was done in PAX_MEMORY_SANITIZE). I'll send the patch I
> > used for the above timing test.
> >
> > --
> > Kees Cook
>
>
>
> --
> Alexander Potapenko
> Software Engineer
>
> Google Germany GmbH
> Erika-Mann-Straße, 33
> 80636 München
>
> Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
> Registergericht und -nummer: Hamburg, HRB 86891
> Sitz der Gesellschaft: Hamburg



-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v2 4/4] net: apply __GFP_NO_AUTOINIT to AF_UNIX sk_buff allocations
@ 2019-05-17 13:50           ` Alexander Potapenko
  0 siblings, 0 replies; 52+ messages in thread
From: Alexander Potapenko @ 2019-05-17 13:50 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andrew Morton, Christoph Lameter, Kernel Hardening,
	Masahiro Yamada, James Morris, Serge E. Hallyn, Nick Desaulniers,
	Kostya Serebryany, Dmitry Vyukov, Sandeep Patil, Laura Abbott,
	Randy Dunlap, Jann Horn, Mark Rutland,
	Linux Memory Management List, linux-security-module

On Fri, May 17, 2019 at 10:49 AM Alexander Potapenko <glider@google.com> wrote:
>
> On Fri, May 17, 2019 at 2:26 AM Kees Cook <keescook@chromium.org> wrote:
> >
> > On Thu, May 16, 2019 at 09:53:01AM -0700, Kees Cook wrote:
> > > On Tue, May 14, 2019 at 04:35:37PM +0200, Alexander Potapenko wrote:
> > > > Add sock_alloc_send_pskb_noinit(), which is similar to
> > > > sock_alloc_send_pskb(), but allocates with __GFP_NO_AUTOINIT.
> > > > This helps reduce the slowdown on hackbench in the init_on_alloc mode
> > > > from 6.84% to 3.45%.
> > >
> > > Out of curiosity, why the creation of the new function over adding a
> > > gfp flag argument to sock_alloc_send_pskb() and updating callers? (There
> > > are only 6 callers, and this change already updates 2 of those.)
> > >
> > > > Slowdown for the initialization features compared to init_on_free=0,
> > > > init_on_alloc=0:
> > > >
> > > > hackbench, init_on_free=1:  +7.71% sys time (st.err 0.45%)
> > > > hackbench, init_on_alloc=1: +3.45% sys time (st.err 0.86%)
> >
> > So I've run some of my own wall-clock timings of kernel builds (which
> > should be an pretty big "worst case" situation, and I see much smaller
> > performance changes:
> How many cores were you using? I suspect the numbers may vary a bit
> depending on that.
> > everything off
> >         Run times: 289.18 288.61 289.66 287.71 287.67
> >         Min: 287.67 Max: 289.66 Mean: 288.57 Std Dev: 0.79
> >                 baseline
> >
> > init_on_alloc=1
> >         Run times: 289.72 286.95 287.87 287.34 287.35
> >         Min: 286.95 Max: 289.72 Mean: 287.85 Std Dev: 0.98
> >                 0.25% faster (within the std dev noise)
> >
> > init_on_free=1
> >         Run times: 303.26 301.44 301.19 301.55 301.39
> >         Min: 301.19 Max: 303.26 Mean: 301.77 Std Dev: 0.75
> >                 4.57% slower
> >
> > init_on_free=1 with the PAX_MEMORY_SANITIZE slabs excluded:
> >         Run times: 299.19 299.85 298.95 298.23 298.64
> >         Min: 298.23 Max: 299.85 Mean: 298.97 Std Dev: 0.55
> >                 3.60% slower
> >
> > So the tuning certainly improved things by 1%. My perf numbers don't
> > show the 24% hit you were seeing at all, though.
> Note that 24% is the _sys_ time slowdown. The wall time slowdown seen
> in this case was 8.34%
I've collected more stats running QEMU with different numbers of cores.
The slowdown values of init_on_free compared to baseline are:
2 CPUs - 5.94% for wall time (20.08% for sys time)
6 CPUs - 7.43% for wall time (23.55% for sys time)
12 CPUs - 8.41% for wall time (24.25% for sys time)
24 CPUs - 9.49% for wall time (17.98% for sys time)

I'm building a defconfig of some fixed KMSAN tree with Clang, but that
shouldn't matter much.

> > > In the commit log it might be worth mentioning that this is only
> > > changing the init_on_alloc case (in case it's not already obvious to
> > > folks). Perhaps there needs to be a split of __GFP_NO_AUTOINIT into
> > > __GFP_NO_AUTO_ALLOC_INIT and __GFP_NO_AUTO_FREE_INIT? Right now
> > > __GFP_NO_AUTOINIT is only checked for init_on_alloc:
> >
> > I was obviously crazy here. :) GFP isn't present for free(), but a SLAB
> > flag works (as was done in PAX_MEMORY_SANITIZE). I'll send the patch I
> > used for the above timing test.
> >
> > --
> > Kees Cook
>
>
>
> --
> Alexander Potapenko
> Software Engineer
>
> Google Germany GmbH
> Erika-Mann-Straße, 33
> 80636 München
>
> Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
> Registergericht und -nummer: Hamburg, HRB 86891
> Sitz der Gesellschaft: Hamburg



-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v2 3/4] gfp: mm: introduce __GFP_NO_AUTOINIT
  2019-05-17 13:37           ` Alexander Potapenko
  (?)
@ 2019-05-17 14:01           ` Michal Hocko
  2019-05-17 16:27             ` Kees Cook
  -1 siblings, 1 reply; 52+ messages in thread
From: Michal Hocko @ 2019-05-17 14:01 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Kees Cook, Andrew Morton, Christoph Lameter, Kernel Hardening,
	Masahiro Yamada, James Morris, Serge E. Hallyn, Nick Desaulniers,
	Kostya Serebryany, Dmitry Vyukov, Sandeep Patil, Laura Abbott,
	Randy Dunlap, Jann Horn, Mark Rutland, Souptick Joarder,
	Matthew Wilcox, Linux Memory Management List,
	linux-security-module

On Fri 17-05-19 15:37:14, Alexander Potapenko wrote:
> On Fri, May 17, 2019 at 3:25 PM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > On Fri 17-05-19 15:18:19, Alexander Potapenko wrote:
> > > On Fri, May 17, 2019 at 2:59 PM Michal this flag Hocko
> > > <mhocko@kernel.org> wrote:
> > > >
> > > > [It would be great to keep people involved in the previous version in the
> > > > CC list]
> > > Yes, I've been trying to keep everyone in the loop, but your email
> > > fell through the cracks.
> > > Sorry about that.
> >
> > No problem
> >
> > > > On Tue 14-05-19 16:35:36, Alexander Potapenko wrote:
> > > > > When passed to an allocator (either pagealloc or SL[AOU]B),
> > > > > __GFP_NO_AUTOINIT tells it to not initialize the requested memory if the
> > > > > init_on_alloc boot option is enabled. This can be useful in the cases
> > > > > newly allocated memory is going to be initialized by the caller right
> > > > > away.
> > > > >
> > > > > __GFP_NO_AUTOINIT doesn't affect init_on_free behavior, except for SLOB,
> > > > > where init_on_free implies init_on_alloc.
> > > > >
> > > > > __GFP_NO_AUTOINIT basically defeats the hardening against information
> > > > > leaks provided by init_on_alloc, so one should use it with caution.
> > > > >
> > > > > This patch also adds __GFP_NO_AUTOINIT to alloc_pages() calls in SL[AOU]B.
> > > > > Doing so is safe, because the heap allocators initialize the pages they
> > > > > receive before passing memory to the callers.
> > > >
> > > > I still do not like the idea of a new gfp flag as explained in the
> > > > previous email. People will simply use it incorectly or arbitrarily.
> > > > We have that juicy experience from the past.
> > >
> > > Just to preserve some context, here's the previous email:
> > > https://patchwork.kernel.org/patch/10907595/
> > > (plus the patch removing GFP_TEMPORARY for the curious ones:
> > > https://lwn.net/Articles/729145/)
> >
> > Not only. GFP_REPEAT being another one and probably others I cannot
> > remember from the top of my head.
> >
> > > > Freeing a memory is an opt-in feature and the slab allocator can already
> > > > tell many (with constructor or GFP_ZERO) do not need it.
> > > Sorry, I didn't understand this piece. Could you please elaborate?
> >
> > The allocator can assume that caches with a constructor will initialize
> > the object so additional zeroying is not needed. GFP_ZERO should be self
> > explanatory.
> Ah, I see. We already do that, see the want_init_on_alloc()
> implementation here: https://patchwork.kernel.org/patch/10943087/
> > > > So can we go without this gfp thing and see whether somebody actually
> > > > finds a performance problem with the feature enabled and think about
> > > > what can we do about it rather than add this maint. nightmare from the
> > > > very beginning?
> > >
> > > There were two reasons to introduce this flag initially.
> > > The first was double initialization of pages allocated for SLUB.
> >
> > Could you elaborate please?
> When the kernel allocates an object from SLUB, and SLUB happens to be
> short on free pages, it requests some from the page allocator.
> Those pages are initialized by the page allocator

... when the feature is enabled ...

> and split into objects. Finally SLUB initializes one of the available
> objects and returns it back to the kernel.
> Therefore the object is initialized twice for the first time (when it
> comes directly from the page allocator).
> This cost is however amortized by SLUB reusing the object after it's been freed.

OK, I see what you mean now. Is there any way to special case the page
allocation for this feature? E.g. your implementation tries to make this
zeroying special but why cannot you simply do this


struct page *
____alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
							nodemask_t *nodemask)
{
	//current implementation
}

struct page *
__alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
							nodemask_t *nodemask)
{
	if (your_feature_enabled)
		gfp_mask |= __GFP_ZERO;
	return ____alloc_pages_nodemask(gfp_mask, order, preferred_nid,
					nodemask);
}

and use ____alloc_pages_nodemask from the slab or other internal
allocators?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v2 1/4] mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options
  2019-05-14 14:35   ` Alexander Potapenko
                     ` (2 preceding siblings ...)
  (?)
@ 2019-05-17 14:04   ` Michal Hocko
  2019-05-17 14:11       ` Alexander Potapenko
  -1 siblings, 1 reply; 52+ messages in thread
From: Michal Hocko @ 2019-05-17 14:04 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: akpm, cl, keescook, kernel-hardening, Masahiro Yamada,
	James Morris, Serge E. Hallyn, Nick Desaulniers,
	Kostya Serebryany, Dmitry Vyukov, Sandeep Patil, Laura Abbott,
	Randy Dunlap, Jann Horn, Mark Rutland, linux-mm,
	linux-security-module

On Tue 14-05-19 16:35:34, Alexander Potapenko wrote:
> The new options are needed to prevent possible information leaks and
> make control-flow bugs that depend on uninitialized values more
> deterministic.
> 
> init_on_alloc=1 makes the kernel initialize newly allocated pages and heap
> objects with zeroes. Initialization is done at allocation time at the
> places where checks for __GFP_ZERO are performed.
> 
> init_on_free=1 makes the kernel initialize freed pages and heap objects
> with zeroes upon their deletion. This helps to ensure sensitive data
> doesn't leak via use-after-free accesses.

Why do we need both? The later is more robust because even free memory
cannot be sniffed and the overhead might be shifted from the allocation
context (e.g. to RCU) but why cannot we stick to a single model?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v2 1/4] mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options
  2019-05-17 14:04   ` Michal Hocko
@ 2019-05-17 14:11       ` Alexander Potapenko
  0 siblings, 0 replies; 52+ messages in thread
From: Alexander Potapenko @ 2019-05-17 14:11 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Christoph Lameter, Kees Cook, Kernel Hardening,
	Masahiro Yamada, James Morris, Serge E. Hallyn, Nick Desaulniers,
	Kostya Serebryany, Dmitry Vyukov, Sandeep Patil, Laura Abbott,
	Randy Dunlap, Jann Horn, Mark Rutland,
	Linux Memory Management List, linux-security-module

On Fri, May 17, 2019 at 4:04 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Tue 14-05-19 16:35:34, Alexander Potapenko wrote:
> > The new options are needed to prevent possible information leaks and
> > make control-flow bugs that depend on uninitialized values more
> > deterministic.
> >
> > init_on_alloc=1 makes the kernel initialize newly allocated pages and heap
> > objects with zeroes. Initialization is done at allocation time at the
> > places where checks for __GFP_ZERO are performed.
> >
> > init_on_free=1 makes the kernel initialize freed pages and heap objects
> > with zeroes upon their deletion. This helps to ensure sensitive data
> > doesn't leak via use-after-free accesses.
>
> Why do we need both? The later is more robust because even free memory
> cannot be sniffed and the overhead might be shifted from the allocation
> context (e.g. to RCU) but why cannot we stick to a single model?
init_on_free appears to be slower because of cache effects. It's
several % in the best case vs. <1% for init_on_alloc.

> --
> Michal Hocko
> SUSE Labs



-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v2 1/4] mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options
@ 2019-05-17 14:11       ` Alexander Potapenko
  0 siblings, 0 replies; 52+ messages in thread
From: Alexander Potapenko @ 2019-05-17 14:11 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Christoph Lameter, Kees Cook, Kernel Hardening,
	Masahiro Yamada, James Morris, Serge E. Hallyn, Nick Desaulniers,
	Kostya Serebryany, Dmitry Vyukov, Sandeep Patil, Laura Abbott,
	Randy Dunlap, Jann Horn, Mark Rutland,
	Linux Memory Management List, linux-security-module

On Fri, May 17, 2019 at 4:04 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Tue 14-05-19 16:35:34, Alexander Potapenko wrote:
> > The new options are needed to prevent possible information leaks and
> > make control-flow bugs that depend on uninitialized values more
> > deterministic.
> >
> > init_on_alloc=1 makes the kernel initialize newly allocated pages and heap
> > objects with zeroes. Initialization is done at allocation time at the
> > places where checks for __GFP_ZERO are performed.
> >
> > init_on_free=1 makes the kernel initialize freed pages and heap objects
> > with zeroes upon their deletion. This helps to ensure sensitive data
> > doesn't leak via use-after-free accesses.
>
> Why do we need both? The later is more robust because even free memory
> cannot be sniffed and the overhead might be shifted from the allocation
> context (e.g. to RCU) but why cannot we stick to a single model?
init_on_free appears to be slower because of cache effects. It's
several % in the best case vs. <1% for init_on_alloc.

> --
> Michal Hocko
> SUSE Labs



-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v2 1/4] mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options
  2019-05-17 14:11       ` Alexander Potapenko
  (?)
@ 2019-05-17 14:20       ` Michal Hocko
  2019-05-17 16:36         ` Kees Cook
  -1 siblings, 1 reply; 52+ messages in thread
From: Michal Hocko @ 2019-05-17 14:20 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Andrew Morton, Christoph Lameter, Kees Cook, Kernel Hardening,
	Masahiro Yamada, James Morris, Serge E. Hallyn, Nick Desaulniers,
	Kostya Serebryany, Dmitry Vyukov, Sandeep Patil, Laura Abbott,
	Randy Dunlap, Jann Horn, Mark Rutland,
	Linux Memory Management List, linux-security-module

On Fri 17-05-19 16:11:32, Alexander Potapenko wrote:
> On Fri, May 17, 2019 at 4:04 PM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > On Tue 14-05-19 16:35:34, Alexander Potapenko wrote:
> > > The new options are needed to prevent possible information leaks and
> > > make control-flow bugs that depend on uninitialized values more
> > > deterministic.
> > >
> > > init_on_alloc=1 makes the kernel initialize newly allocated pages and heap
> > > objects with zeroes. Initialization is done at allocation time at the
> > > places where checks for __GFP_ZERO are performed.
> > >
> > > init_on_free=1 makes the kernel initialize freed pages and heap objects
> > > with zeroes upon their deletion. This helps to ensure sensitive data
> > > doesn't leak via use-after-free accesses.
> >
> > Why do we need both? The later is more robust because even free memory
> > cannot be sniffed and the overhead might be shifted from the allocation
> > context (e.g. to RCU) but why cannot we stick to a single model?
> init_on_free appears to be slower because of cache effects. It's
> several % in the best case vs. <1% for init_on_alloc.

This doesn't really explain why we need both.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v2 1/4] mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options
  2019-05-17  1:26   ` Kees Cook
@ 2019-05-17 14:38       ` Alexander Potapenko
  0 siblings, 0 replies; 52+ messages in thread
From: Alexander Potapenko @ 2019-05-17 14:38 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andrew Morton, Christoph Lameter, Kernel Hardening,
	Masahiro Yamada, James Morris, Serge E. Hallyn, Nick Desaulniers,
	Kostya Serebryany, Dmitry Vyukov, Sandeep Patil, Laura Abbott,
	Randy Dunlap, Jann Horn, Mark Rutland,
	Linux Memory Management List, linux-security-module

On Fri, May 17, 2019 at 3:26 AM Kees Cook <keescook@chromium.org> wrote:
>
> On Tue, May 14, 2019 at 04:35:34PM +0200, Alexander Potapenko wrote:
> > [...]
> > diff --git a/mm/slab.h b/mm/slab.h
> > index 43ac818b8592..24ae887359b8 100644
> > --- a/mm/slab.h
> > +++ b/mm/slab.h
> > @@ -524,4 +524,20 @@ static inline int cache_random_seq_create(struct kmem_cache *cachep,
> > [...]
> > +static inline bool slab_want_init_on_free(struct kmem_cache *c)
> > +{
> > +     if (static_branch_unlikely(&init_on_free))
> > +             return !(c->ctor);
>
> BTW, why is this checking for c->ctor here? Shouldn't it not matter for
> the free case?
It does matter, see e.g. the handling of __OBJECT_POISON in slub.c
If we just return true here, the kernel crashes.
> > +     else
> > +             return false;
> > +}
>
> --
> Kees Cook



-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v2 1/4] mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options
@ 2019-05-17 14:38       ` Alexander Potapenko
  0 siblings, 0 replies; 52+ messages in thread
From: Alexander Potapenko @ 2019-05-17 14:38 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andrew Morton, Christoph Lameter, Kernel Hardening,
	Masahiro Yamada, James Morris, Serge E. Hallyn, Nick Desaulniers,
	Kostya Serebryany, Dmitry Vyukov, Sandeep Patil, Laura Abbott,
	Randy Dunlap, Jann Horn, Mark Rutland,
	Linux Memory Management List, linux-security-module

On Fri, May 17, 2019 at 3:26 AM Kees Cook <keescook@chromium.org> wrote:
>
> On Tue, May 14, 2019 at 04:35:34PM +0200, Alexander Potapenko wrote:
> > [...]
> > diff --git a/mm/slab.h b/mm/slab.h
> > index 43ac818b8592..24ae887359b8 100644
> > --- a/mm/slab.h
> > +++ b/mm/slab.h
> > @@ -524,4 +524,20 @@ static inline int cache_random_seq_create(struct kmem_cache *cachep,
> > [...]
> > +static inline bool slab_want_init_on_free(struct kmem_cache *c)
> > +{
> > +     if (static_branch_unlikely(&init_on_free))
> > +             return !(c->ctor);
>
> BTW, why is this checking for c->ctor here? Shouldn't it not matter for
> the free case?
It does matter, see e.g. the handling of __OBJECT_POISON in slub.c
If we just return true here, the kernel crashes.
> > +     else
> > +             return false;
> > +}
>
> --
> Kees Cook



-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v2 2/4] lib: introduce test_meminit module
  2019-05-16  1:02   ` Kees Cook
@ 2019-05-17 15:51       ` Alexander Potapenko
  0 siblings, 0 replies; 52+ messages in thread
From: Alexander Potapenko @ 2019-05-17 15:51 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andrew Morton, Christoph Lameter, Kernel Hardening,
	Nick Desaulniers, Kostya Serebryany, Dmitry Vyukov,
	Sandeep Patil, Laura Abbott, Jann Horn,
	Linux Memory Management List, linux-security-module

On Thu, May 16, 2019 at 3:02 AM Kees Cook <keescook@chromium.org> wrote:
>
> On Tue, May 14, 2019 at 04:35:35PM +0200, Alexander Potapenko wrote:
> > Add tests for heap and pagealloc initialization.
> > These can be used to check init_on_alloc and init_on_free implementations
> > as well as other approaches to initialization.
>
> This is nice! Easy way to test the results. It might be helpful to show
> here what to expect when loading this module:
Do you want me to add the expected output to the patch description?
> with either init_on_alloc=1 or init_on_free=1, I happily see:
>
>         test_meminit: all 10 tests in test_pages passed
>         test_meminit: all 40 tests in test_kvmalloc passed
>         test_meminit: all 20 tests in test_kmemcache passed
>         test_meminit: all 70 tests passed!
>
> and without:
>
>         test_meminit: test_pages failed 10 out of 10 times
>         test_meminit: test_kvmalloc failed 40 out of 40 times
>         test_meminit: test_kmemcache failed 10 out of 20 times
>         test_meminit: failures: 60 out of 70
>
>
> >
> > Signed-off-by: Alexander Potapenko <glider@google.com>
>
> Reviewed-by: Kees Cook <keescook@chromium.org>
> Tested-by: Kees Cook <keescook@chromium.org>
>
> note below...
>
> > [...]
> > diff --git a/lib/test_meminit.c b/lib/test_meminit.c
> > new file mode 100644
> > index 000000000000..67d759498030
> > --- /dev/null
> > +++ b/lib/test_meminit.c
> > @@ -0,0 +1,205 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > [...]
> > +module_init(test_meminit_init);
>
> I get a warning at build about missing the license:
>
> WARNING: modpost: missing MODULE_LICENSE() in lib/test_meminit.o
>
> So, following the SPDX line, just add:
>
> MODULE_LICENSE("GPL");
Will do, thanks!
> --
> Kees Cook



-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v2 2/4] lib: introduce test_meminit module
@ 2019-05-17 15:51       ` Alexander Potapenko
  0 siblings, 0 replies; 52+ messages in thread
From: Alexander Potapenko @ 2019-05-17 15:51 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andrew Morton, Christoph Lameter, Kernel Hardening,
	Nick Desaulniers, Kostya Serebryany, Dmitry Vyukov,
	Sandeep Patil, Laura Abbott, Jann Horn,
	Linux Memory Management List, linux-security-module

On Thu, May 16, 2019 at 3:02 AM Kees Cook <keescook@chromium.org> wrote:
>
> On Tue, May 14, 2019 at 04:35:35PM +0200, Alexander Potapenko wrote:
> > Add tests for heap and pagealloc initialization.
> > These can be used to check init_on_alloc and init_on_free implementations
> > as well as other approaches to initialization.
>
> This is nice! Easy way to test the results. It might be helpful to show
> here what to expect when loading this module:
Do you want me to add the expected output to the patch description?
> with either init_on_alloc=1 or init_on_free=1, I happily see:
>
>         test_meminit: all 10 tests in test_pages passed
>         test_meminit: all 40 tests in test_kvmalloc passed
>         test_meminit: all 20 tests in test_kmemcache passed
>         test_meminit: all 70 tests passed!
>
> and without:
>
>         test_meminit: test_pages failed 10 out of 10 times
>         test_meminit: test_kvmalloc failed 40 out of 40 times
>         test_meminit: test_kmemcache failed 10 out of 20 times
>         test_meminit: failures: 60 out of 70
>
>
> >
> > Signed-off-by: Alexander Potapenko <glider@google.com>
>
> Reviewed-by: Kees Cook <keescook@chromium.org>
> Tested-by: Kees Cook <keescook@chromium.org>
>
> note below...
>
> > [...]
> > diff --git a/lib/test_meminit.c b/lib/test_meminit.c
> > new file mode 100644
> > index 000000000000..67d759498030
> > --- /dev/null
> > +++ b/lib/test_meminit.c
> > @@ -0,0 +1,205 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > [...]
> > +module_init(test_meminit_init);
>
> I get a warning at build about missing the license:
>
> WARNING: modpost: missing MODULE_LICENSE() in lib/test_meminit.o
>
> So, following the SPDX line, just add:
>
> MODULE_LICENSE("GPL");
Will do, thanks!
> --
> Kees Cook



-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 5/4] mm: Introduce SLAB_NO_FREE_INIT and mark excluded caches
  2019-05-17  8:34       ` Alexander Potapenko
  (?)
@ 2019-05-17 15:59       ` Kees Cook
  -1 siblings, 0 replies; 52+ messages in thread
From: Kees Cook @ 2019-05-17 15:59 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Andrew Morton, Christoph Lameter, Kernel Hardening,
	Masahiro Yamada, James Morris, Serge E. Hallyn, Nick Desaulniers,
	Kostya Serebryany, Dmitry Vyukov, Sandeep Patil, Laura Abbott,
	Randy Dunlap, Jann Horn, Mark Rutland,
	Linux Memory Management List, linux-security-module

On Fri, May 17, 2019 at 10:34:26AM +0200, Alexander Potapenko wrote:
> On Fri, May 17, 2019 at 2:50 AM Kees Cook <keescook@chromium.org> wrote:
> >
> > In order to improve the init_on_free performance, some frequently
> > freed caches with less sensitive contents can be excluded from the
> > init_on_free behavior.
> Did you see any notable performance improvement with this patch?
> A similar one gave me only 1-2% on the parallel Linux build.

Yup, that's in the other thread. I saw similar. But 1-2% on a 5% hit is
a lot. ;)

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v2 4/4] net: apply __GFP_NO_AUTOINIT to AF_UNIX sk_buff allocations
  2019-05-17  8:49         ` Alexander Potapenko
  (?)
  (?)
@ 2019-05-17 16:13         ` Kees Cook
  -1 siblings, 0 replies; 52+ messages in thread
From: Kees Cook @ 2019-05-17 16:13 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Andrew Morton, Christoph Lameter, Kernel Hardening,
	Masahiro Yamada, James Morris, Serge E. Hallyn, Nick Desaulniers,
	Kostya Serebryany, Dmitry Vyukov, Sandeep Patil, Laura Abbott,
	Randy Dunlap, Jann Horn, Mark Rutland,
	Linux Memory Management List, linux-security-module

On Fri, May 17, 2019 at 10:49:03AM +0200, Alexander Potapenko wrote:
> On Fri, May 17, 2019 at 2:26 AM Kees Cook <keescook@chromium.org> wrote:
> > On Thu, May 16, 2019 at 09:53:01AM -0700, Kees Cook wrote:
> > > On Tue, May 14, 2019 at 04:35:37PM +0200, Alexander Potapenko wrote:
> > > > Add sock_alloc_send_pskb_noinit(), which is similar to
> > > > sock_alloc_send_pskb(), but allocates with __GFP_NO_AUTOINIT.
> > > > This helps reduce the slowdown on hackbench in the init_on_alloc mode
> > > > from 6.84% to 3.45%.
> > >
> > > Out of curiosity, why the creation of the new function over adding a
> > > gfp flag argument to sock_alloc_send_pskb() and updating callers? (There
> > > are only 6 callers, and this change already updates 2 of those.)
> > >
> > > > Slowdown for the initialization features compared to init_on_free=0,
> > > > init_on_alloc=0:
> > > >
> > > > hackbench, init_on_free=1:  +7.71% sys time (st.err 0.45%)
> > > > hackbench, init_on_alloc=1: +3.45% sys time (st.err 0.86%)
> >
> > So I've run some of my own wall-clock timings of kernel builds (which
> > should be an pretty big "worst case" situation, and I see much smaller
> > performance changes:
> How many cores were you using? I suspect the numbers may vary a bit
> depending on that.

I was using 4.

> > init_on_alloc=1
> >         Run times: 289.72 286.95 287.87 287.34 287.35
> >         Min: 286.95 Max: 289.72 Mean: 287.85 Std Dev: 0.98
> >                 0.25% faster (within the std dev noise)
> >
> > init_on_free=1
> >         Run times: 303.26 301.44 301.19 301.55 301.39
> >         Min: 301.19 Max: 303.26 Mean: 301.77 Std Dev: 0.75
> >                 4.57% slower
> >
> > init_on_free=1 with the PAX_MEMORY_SANITIZE slabs excluded:
> >         Run times: 299.19 299.85 298.95 298.23 298.64
> >         Min: 298.23 Max: 299.85 Mean: 298.97 Std Dev: 0.55
> >                 3.60% slower
> >
> > So the tuning certainly improved things by 1%. My perf numbers don't
> > show the 24% hit you were seeing at all, though.
> Note that 24% is the _sys_ time slowdown. The wall time slowdown seen
> in this case was 8.34%

Ah! Gotcha. Yeah, seems the impact for init_on_free is pretty
variable. The init_on_alloc appears close to free, though.

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v2 3/4] gfp: mm: introduce __GFP_NO_AUTOINIT
  2019-05-17 14:01           ` Michal Hocko
@ 2019-05-17 16:27             ` Kees Cook
  2019-05-17 17:11               ` Michal Hocko
  0 siblings, 1 reply; 52+ messages in thread
From: Kees Cook @ 2019-05-17 16:27 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Alexander Potapenko, Andrew Morton, Christoph Lameter,
	Kernel Hardening, Masahiro Yamada, James Morris, Serge E. Hallyn,
	Nick Desaulniers, Kostya Serebryany, Dmitry Vyukov,
	Sandeep Patil, Laura Abbott, Randy Dunlap, Jann Horn,
	Mark Rutland, Souptick Joarder, Matthew Wilcox,
	Linux Memory Management List, linux-security-module

On Fri, May 17, 2019 at 04:01:08PM +0200, Michal Hocko wrote:
> On Fri 17-05-19 15:37:14, Alexander Potapenko wrote:
> > > > > Freeing a memory is an opt-in feature and the slab allocator can already
> > > > > tell many (with constructor or GFP_ZERO) do not need it.
> > > > Sorry, I didn't understand this piece. Could you please elaborate?
> > >
> > > The allocator can assume that caches with a constructor will initialize
> > > the object so additional zeroying is not needed. GFP_ZERO should be self
> > > explanatory.
> > Ah, I see. We already do that, see the want_init_on_alloc()
> > implementation here: https://patchwork.kernel.org/patch/10943087/
> > > > > So can we go without this gfp thing and see whether somebody actually
> > > > > finds a performance problem with the feature enabled and think about
> > > > > what can we do about it rather than add this maint. nightmare from the
> > > > > very beginning?
> > > >
> > > > There were two reasons to introduce this flag initially.
> > > > The first was double initialization of pages allocated for SLUB.
> > >
> > > Could you elaborate please?
> > When the kernel allocates an object from SLUB, and SLUB happens to be
> > short on free pages, it requests some from the page allocator.
> > Those pages are initialized by the page allocator
> 
> ... when the feature is enabled ...
> 
> > and split into objects. Finally SLUB initializes one of the available
> > objects and returns it back to the kernel.
> > Therefore the object is initialized twice for the first time (when it
> > comes directly from the page allocator).
> > This cost is however amortized by SLUB reusing the object after it's been freed.
> 
> OK, I see what you mean now. Is there any way to special case the page
> allocation for this feature? E.g. your implementation tries to make this
> zeroying special but why cannot you simply do this
> 
> 
> struct page *
> ____alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
> 							nodemask_t *nodemask)
> {
> 	//current implementation
> }
> 
> struct page *
> __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
> 							nodemask_t *nodemask)
> {
> 	if (your_feature_enabled)
> 		gfp_mask |= __GFP_ZERO;
> 	return ____alloc_pages_nodemask(gfp_mask, order, preferred_nid,
> 					nodemask);
> }
> 
> and use ____alloc_pages_nodemask from the slab or other internal
> allocators?

If an additional allocator function is preferred over a new GFP flag, then
I don't see any reason not to do this. (Though adding more "__"s seems
a bit unfriendly to code-documentation.) What might be better naming?

This would mean that the skb changes later in the series would use the
"no auto init" version of the allocator too, then.

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v2 1/4] mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options
  2019-05-17 14:20       ` Michal Hocko
@ 2019-05-17 16:36         ` Kees Cook
  2019-05-17 17:11           ` Michal Hocko
  0 siblings, 1 reply; 52+ messages in thread
From: Kees Cook @ 2019-05-17 16:36 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Alexander Potapenko, Andrew Morton, Christoph Lameter,
	Kernel Hardening, Masahiro Yamada, James Morris, Serge E. Hallyn,
	Nick Desaulniers, Kostya Serebryany, Dmitry Vyukov,
	Sandeep Patil, Laura Abbott, Randy Dunlap, Jann Horn,
	Mark Rutland, Linux Memory Management List,
	linux-security-module

On Fri, May 17, 2019 at 04:20:48PM +0200, Michal Hocko wrote:
> On Fri 17-05-19 16:11:32, Alexander Potapenko wrote:
> > On Fri, May 17, 2019 at 4:04 PM Michal Hocko <mhocko@kernel.org> wrote:
> > >
> > > On Tue 14-05-19 16:35:34, Alexander Potapenko wrote:
> > > > The new options are needed to prevent possible information leaks and
> > > > make control-flow bugs that depend on uninitialized values more
> > > > deterministic.
> > > >
> > > > init_on_alloc=1 makes the kernel initialize newly allocated pages and heap
> > > > objects with zeroes. Initialization is done at allocation time at the
> > > > places where checks for __GFP_ZERO are performed.
> > > >
> > > > init_on_free=1 makes the kernel initialize freed pages and heap objects
> > > > with zeroes upon their deletion. This helps to ensure sensitive data
> > > > doesn't leak via use-after-free accesses.
> > >
> > > Why do we need both? The later is more robust because even free memory
> > > cannot be sniffed and the overhead might be shifted from the allocation
> > > context (e.g. to RCU) but why cannot we stick to a single model?
> > init_on_free appears to be slower because of cache effects. It's
> > several % in the best case vs. <1% for init_on_alloc.
> 
> This doesn't really explain why we need both.

There are a couple reasons. The first is that once we have hardware with
memory tagging (e.g. arm64's MTE) we'll need both on_alloc and on_free
hooks to do change the tags. With MTE, zeroing comes for "free" with
tagging (though tagging is as slow as zeroing, so it's really the tagging
that is free...), so we'll need to re-use the init_on_free infrastructure.

The second reason is for very paranoid use-cases where in-memory
data lifetime is desired to be minimized. There are various arguments
for/against the realism of the associated threat models, but given that
we'll need the infrastructre for MTE anyway, and there are people who
want wipe-on-free behavior no matter what the performance cost, it seems
reasonable to include it in this series.

All that said, init_on_alloc looks desirable enough that distros will
likely build with it enabled by default (I hope), and the very paranoid
users will switch to (or additionally enable) init_on_free for their
systems.

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v2 2/4] lib: introduce test_meminit module
  2019-05-17 15:51       ` Alexander Potapenko
  (?)
@ 2019-05-17 16:37       ` Kees Cook
  -1 siblings, 0 replies; 52+ messages in thread
From: Kees Cook @ 2019-05-17 16:37 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Andrew Morton, Christoph Lameter, Kernel Hardening,
	Nick Desaulniers, Kostya Serebryany, Dmitry Vyukov,
	Sandeep Patil, Laura Abbott, Jann Horn,
	Linux Memory Management List, linux-security-module

On Fri, May 17, 2019 at 05:51:17PM +0200, Alexander Potapenko wrote:
> On Thu, May 16, 2019 at 3:02 AM Kees Cook <keescook@chromium.org> wrote:
> >
> > On Tue, May 14, 2019 at 04:35:35PM +0200, Alexander Potapenko wrote:
> > > Add tests for heap and pagealloc initialization.
> > > These can be used to check init_on_alloc and init_on_free implementations
> > > as well as other approaches to initialization.
> >
> > This is nice! Easy way to test the results. It might be helpful to show
> > here what to expect when loading this module:
>
> Do you want me to add the expected output to the patch description?

Yes, I think it's worth having, as a way to show people what to expect
when running the test, without having to actually enable, build, and
run it themselves.

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v2 3/4] gfp: mm: introduce __GFP_NO_AUTOINIT
  2019-05-17 16:27             ` Kees Cook
@ 2019-05-17 17:11               ` Michal Hocko
  2019-05-21 14:18                   ` Alexander Potapenko
  0 siblings, 1 reply; 52+ messages in thread
From: Michal Hocko @ 2019-05-17 17:11 UTC (permalink / raw)
  To: Kees Cook
  Cc: Alexander Potapenko, Andrew Morton, Christoph Lameter,
	Kernel Hardening, Masahiro Yamada, James Morris, Serge E. Hallyn,
	Nick Desaulniers, Kostya Serebryany, Dmitry Vyukov,
	Sandeep Patil, Laura Abbott, Randy Dunlap, Jann Horn,
	Mark Rutland, Souptick Joarder, Matthew Wilcox,
	Linux Memory Management List, linux-security-module

On Fri 17-05-19 09:27:54, Kees Cook wrote:
> On Fri, May 17, 2019 at 04:01:08PM +0200, Michal Hocko wrote:
> > On Fri 17-05-19 15:37:14, Alexander Potapenko wrote:
> > > > > > Freeing a memory is an opt-in feature and the slab allocator can already
> > > > > > tell many (with constructor or GFP_ZERO) do not need it.
> > > > > Sorry, I didn't understand this piece. Could you please elaborate?
> > > >
> > > > The allocator can assume that caches with a constructor will initialize
> > > > the object so additional zeroying is not needed. GFP_ZERO should be self
> > > > explanatory.
> > > Ah, I see. We already do that, see the want_init_on_alloc()
> > > implementation here: https://patchwork.kernel.org/patch/10943087/
> > > > > > So can we go without this gfp thing and see whether somebody actually
> > > > > > finds a performance problem with the feature enabled and think about
> > > > > > what can we do about it rather than add this maint. nightmare from the
> > > > > > very beginning?
> > > > >
> > > > > There were two reasons to introduce this flag initially.
> > > > > The first was double initialization of pages allocated for SLUB.
> > > >
> > > > Could you elaborate please?
> > > When the kernel allocates an object from SLUB, and SLUB happens to be
> > > short on free pages, it requests some from the page allocator.
> > > Those pages are initialized by the page allocator
> > 
> > ... when the feature is enabled ...
> > 
> > > and split into objects. Finally SLUB initializes one of the available
> > > objects and returns it back to the kernel.
> > > Therefore the object is initialized twice for the first time (when it
> > > comes directly from the page allocator).
> > > This cost is however amortized by SLUB reusing the object after it's been freed.
> > 
> > OK, I see what you mean now. Is there any way to special case the page
> > allocation for this feature? E.g. your implementation tries to make this
> > zeroying special but why cannot you simply do this
> > 
> > 
> > struct page *
> > ____alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
> > 							nodemask_t *nodemask)
> > {
> > 	//current implementation
> > }
> > 
> > struct page *
> > __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
> > 							nodemask_t *nodemask)
> > {
> > 	if (your_feature_enabled)
> > 		gfp_mask |= __GFP_ZERO;
> > 	return ____alloc_pages_nodemask(gfp_mask, order, preferred_nid,
> > 					nodemask);
> > }
> > 
> > and use ____alloc_pages_nodemask from the slab or other internal
> > allocators?
> 
> If an additional allocator function is preferred over a new GFP flag, then
> I don't see any reason not to do this. (Though adding more "__"s seems
> a bit unfriendly to code-documentation.) What might be better naming?

The naminig is the last thing I would be worried about. Let's focus on
the most simplistic implementation first. And means, can we really make
it as simple as above? At least on the page allocator level.

> This would mean that the skb changes later in the series would use the
> "no auto init" version of the allocator too, then.

No, this would be an internal function to MM. I would really like to
optimize once there are numbers from _real_ workloads to base those
optimizations.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v2 1/4] mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options
  2019-05-17 16:36         ` Kees Cook
@ 2019-05-17 17:11           ` Michal Hocko
  0 siblings, 0 replies; 52+ messages in thread
From: Michal Hocko @ 2019-05-17 17:11 UTC (permalink / raw)
  To: Kees Cook
  Cc: Alexander Potapenko, Andrew Morton, Christoph Lameter,
	Kernel Hardening, Masahiro Yamada, James Morris, Serge E. Hallyn,
	Nick Desaulniers, Kostya Serebryany, Dmitry Vyukov,
	Sandeep Patil, Laura Abbott, Randy Dunlap, Jann Horn,
	Mark Rutland, Linux Memory Management List,
	linux-security-module

On Fri 17-05-19 09:36:36, Kees Cook wrote:
> On Fri, May 17, 2019 at 04:20:48PM +0200, Michal Hocko wrote:
> > On Fri 17-05-19 16:11:32, Alexander Potapenko wrote:
> > > On Fri, May 17, 2019 at 4:04 PM Michal Hocko <mhocko@kernel.org> wrote:
> > > >
> > > > On Tue 14-05-19 16:35:34, Alexander Potapenko wrote:
> > > > > The new options are needed to prevent possible information leaks and
> > > > > make control-flow bugs that depend on uninitialized values more
> > > > > deterministic.
> > > > >
> > > > > init_on_alloc=1 makes the kernel initialize newly allocated pages and heap
> > > > > objects with zeroes. Initialization is done at allocation time at the
> > > > > places where checks for __GFP_ZERO are performed.
> > > > >
> > > > > init_on_free=1 makes the kernel initialize freed pages and heap objects
> > > > > with zeroes upon their deletion. This helps to ensure sensitive data
> > > > > doesn't leak via use-after-free accesses.
> > > >
> > > > Why do we need both? The later is more robust because even free memory
> > > > cannot be sniffed and the overhead might be shifted from the allocation
> > > > context (e.g. to RCU) but why cannot we stick to a single model?
> > > init_on_free appears to be slower because of cache effects. It's
> > > several % in the best case vs. <1% for init_on_alloc.
> > 
> > This doesn't really explain why we need both.
> 
> There are a couple reasons. The first is that once we have hardware with
> memory tagging (e.g. arm64's MTE) we'll need both on_alloc and on_free
> hooks to do change the tags. With MTE, zeroing comes for "free" with
> tagging (though tagging is as slow as zeroing, so it's really the tagging
> that is free...), so we'll need to re-use the init_on_free infrastructure.

I am not sure I follow, but ...
> 
> The second reason is for very paranoid use-cases where in-memory
> data lifetime is desired to be minimized. There are various arguments
> for/against the realism of the associated threat models, but given that
> we'll need the infrastructre for MTE anyway, and there are people who
> want wipe-on-free behavior no matter what the performance cost, it seems
> reasonable to include it in this series.
> 
> All that said, init_on_alloc looks desirable enough that distros will
> likely build with it enabled by default (I hope), and the very paranoid
> users will switch to (or additionally enable) init_on_free for their
> systems.

... this should all be part of the changelog.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 5/4] mm: Introduce SLAB_NO_FREE_INIT and mark excluded caches
  2019-05-17  0:50   ` [PATCH 5/4] mm: Introduce SLAB_NO_FREE_INIT and mark excluded caches Kees Cook
@ 2019-05-20  6:10       ` Mathias Krause
  2019-05-20  6:10       ` Mathias Krause
  1 sibling, 0 replies; 52+ messages in thread
From: Mathias Krause @ 2019-05-20  6:10 UTC (permalink / raw)
  To: Kees Cook
  Cc: Alexander Potapenko, Andrew Morton, Christoph Lameter,
	kernel-hardening, Masahiro Yamada, James Morris, Serge E. Hallyn,
	Nick Desaulniers, Kostya Serebryany, Dmitry Vyukov,
	Sandeep Patil, Laura Abbott, Randy Dunlap, Jann Horn,
	Mark Rutland, linux-mm, linux-security-module

Hi Kees,

On Fri, 17 May 2019 at 02:50, Kees Cook <keescook@chromium.org> wrote:
> In order to improve the init_on_free performance, some frequently
> freed caches with less sensitive contents can be excluded from the
> init_on_free behavior.
>
> This patch is modified from Brad Spengler/PaX Team's code in the
> last public patch of grsecurity/PaX based on my understanding of the
> code. Changes or omissions from the original code are mine and don't
> reflect the original grsecurity/PaX code.

you might want to give secunet credit for this one, as can be seen here:

  https://github.com/minipli/linux-grsec/commit/309d494e7a3f6533ca68fc8b3bd89fa76fd2c2df

However, please keep the "Changes or omissions from the original
code..." part as your version slightly differs.

Thanks,
Mathias

>
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---
>  fs/buffer.c          | 2 +-
>  fs/dcache.c          | 3 ++-
>  include/linux/slab.h | 3 +++
>  kernel/fork.c        | 6 ++++--
>  mm/rmap.c            | 5 +++--
>  mm/slab.h            | 5 +++--
>  net/core/skbuff.c    | 6 ++++--
>  7 files changed, 20 insertions(+), 10 deletions(-)
>
> diff --git a/fs/buffer.c b/fs/buffer.c
> index 0faa41fb4c88..04a85bd4cf2e 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -3457,7 +3457,7 @@ void __init buffer_init(void)
>         bh_cachep = kmem_cache_create("buffer_head",
>                         sizeof(struct buffer_head), 0,
>                                 (SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|
> -                               SLAB_MEM_SPREAD),
> +                               SLAB_MEM_SPREAD|SLAB_NO_FREE_INIT),
>                                 NULL);
>
>         /*
> diff --git a/fs/dcache.c b/fs/dcache.c
> index 8136bda27a1f..323b039accba 100644
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -3139,7 +3139,8 @@ void __init vfs_caches_init_early(void)
>  void __init vfs_caches_init(void)
>  {
>         names_cachep = kmem_cache_create_usercopy("names_cache", PATH_MAX, 0,
> -                       SLAB_HWCACHE_ALIGN|SLAB_PANIC, 0, PATH_MAX, NULL);
> +                       SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_NO_FREE_INIT, 0,
> +                       PATH_MAX, NULL);
>
>         dcache_init();
>         inode_init();
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index 9449b19c5f10..7eba9ad8830d 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -92,6 +92,9 @@
>  /* Avoid kmemleak tracing */
>  #define SLAB_NOLEAKTRACE       ((slab_flags_t __force)0x00800000U)
>
> +/* Exclude slab from zero-on-free when init_on_free is enabled */
> +#define SLAB_NO_FREE_INIT      ((slab_flags_t __force)0x01000000U)
> +
>  /* Fault injection mark */
>  #ifdef CONFIG_FAILSLAB
>  # define SLAB_FAILSLAB         ((slab_flags_t __force)0x02000000U)
> diff --git a/kernel/fork.c b/kernel/fork.c
> index b4cba953040a..9868585f5520 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -2550,11 +2550,13 @@ void __init proc_caches_init(void)
>
>         mm_cachep = kmem_cache_create_usercopy("mm_struct",
>                         mm_size, ARCH_MIN_MMSTRUCT_ALIGN,
> -                       SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_ACCOUNT,
> +                       SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_ACCOUNT|
> +                       SLAB_NO_FREE_INIT,
>                         offsetof(struct mm_struct, saved_auxv),
>                         sizeof_field(struct mm_struct, saved_auxv),
>                         NULL);
> -       vm_area_cachep = KMEM_CACHE(vm_area_struct, SLAB_PANIC|SLAB_ACCOUNT);
> +       vm_area_cachep = KMEM_CACHE(vm_area_struct, SLAB_PANIC|SLAB_ACCOUNT|
> +                                                   SLAB_NO_FREE_INIT);
>         mmap_init();
>         nsproxy_cache_init();
>  }
> diff --git a/mm/rmap.c b/mm/rmap.c
> index e5dfe2ae6b0d..b7b8013eeb0a 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -432,10 +432,11 @@ static void anon_vma_ctor(void *data)
>  void __init anon_vma_init(void)
>  {
>         anon_vma_cachep = kmem_cache_create("anon_vma", sizeof(struct anon_vma),
> -                       0, SLAB_TYPESAFE_BY_RCU|SLAB_PANIC|SLAB_ACCOUNT,
> +                       0, SLAB_TYPESAFE_BY_RCU|SLAB_PANIC|SLAB_ACCOUNT|
> +                       SLAB_NO_FREE_INIT,
>                         anon_vma_ctor);
>         anon_vma_chain_cachep = KMEM_CACHE(anon_vma_chain,
> -                       SLAB_PANIC|SLAB_ACCOUNT);
> +                       SLAB_PANIC|SLAB_ACCOUNT|SLAB_NO_FREE_INIT);
>  }
>
>  /*
> diff --git a/mm/slab.h b/mm/slab.h
> index 24ae887359b8..f95b4f03c57b 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -129,7 +129,8 @@ static inline slab_flags_t kmem_cache_flags(unsigned int object_size,
>  /* Legal flag mask for kmem_cache_create(), for various configurations */
>  #define SLAB_CORE_FLAGS (SLAB_HWCACHE_ALIGN | SLAB_CACHE_DMA | \
>                          SLAB_CACHE_DMA32 | SLAB_PANIC | \
> -                        SLAB_TYPESAFE_BY_RCU | SLAB_DEBUG_OBJECTS )
> +                        SLAB_TYPESAFE_BY_RCU | SLAB_DEBUG_OBJECTS | \
> +                        SLAB_NO_FREE_INIT)
>
>  #if defined(CONFIG_DEBUG_SLAB)
>  #define SLAB_DEBUG_FLAGS (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER)
> @@ -535,7 +536,7 @@ static inline bool slab_want_init_on_alloc(gfp_t flags, struct kmem_cache *c)
>  static inline bool slab_want_init_on_free(struct kmem_cache *c)
>  {
>         if (static_branch_unlikely(&init_on_free))
> -               return !(c->ctor);
> +               return !(c->ctor) && ((c->flags & SLAB_NO_FREE_INIT) == 0);
>         else
>                 return false;
>  }
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index e89be6282693..b65902d2c042 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -3981,14 +3981,16 @@ void __init skb_init(void)
>         skbuff_head_cache = kmem_cache_create_usercopy("skbuff_head_cache",
>                                               sizeof(struct sk_buff),
>                                               0,
> -                                             SLAB_HWCACHE_ALIGN|SLAB_PANIC,
> +                                             SLAB_HWCACHE_ALIGN|SLAB_PANIC|
> +                                             SLAB_NO_FREE_INIT,
>                                               offsetof(struct sk_buff, cb),
>                                               sizeof_field(struct sk_buff, cb),
>                                               NULL);
>         skbuff_fclone_cache = kmem_cache_create("skbuff_fclone_cache",
>                                                 sizeof(struct sk_buff_fclones),
>                                                 0,
> -                                               SLAB_HWCACHE_ALIGN|SLAB_PANIC,
> +                                               SLAB_HWCACHE_ALIGN|SLAB_PANIC|
> +                                               SLAB_NO_FREE_INIT,
>                                                 NULL);
>         skb_extensions_init();
>  }
> --
> 2.17.1
>
>
> --
> Kees Cook

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 5/4] mm: Introduce SLAB_NO_FREE_INIT and mark excluded caches
@ 2019-05-20  6:10       ` Mathias Krause
  0 siblings, 0 replies; 52+ messages in thread
From: Mathias Krause @ 2019-05-20  6:10 UTC (permalink / raw)
  To: Kees Cook
  Cc: Alexander Potapenko, Andrew Morton, Christoph Lameter,
	kernel-hardening, Masahiro Yamada, James Morris, Serge E. Hallyn,
	Nick Desaulniers, Kostya Serebryany, Dmitry Vyukov,
	Sandeep Patil, Laura Abbott, Randy Dunlap, Jann Horn,
	Mark Rutland, linux-mm, linux-security-module

Hi Kees,

On Fri, 17 May 2019 at 02:50, Kees Cook <keescook@chromium.org> wrote:
> In order to improve the init_on_free performance, some frequently
> freed caches with less sensitive contents can be excluded from the
> init_on_free behavior.
>
> This patch is modified from Brad Spengler/PaX Team's code in the
> last public patch of grsecurity/PaX based on my understanding of the
> code. Changes or omissions from the original code are mine and don't
> reflect the original grsecurity/PaX code.

you might want to give secunet credit for this one, as can be seen here:

  https://github.com/minipli/linux-grsec/commit/309d494e7a3f6533ca68fc8b3bd89fa76fd2c2df

However, please keep the "Changes or omissions from the original
code..." part as your version slightly differs.

Thanks,
Mathias

>
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---
>  fs/buffer.c          | 2 +-
>  fs/dcache.c          | 3 ++-
>  include/linux/slab.h | 3 +++
>  kernel/fork.c        | 6 ++++--
>  mm/rmap.c            | 5 +++--
>  mm/slab.h            | 5 +++--
>  net/core/skbuff.c    | 6 ++++--
>  7 files changed, 20 insertions(+), 10 deletions(-)
>
> diff --git a/fs/buffer.c b/fs/buffer.c
> index 0faa41fb4c88..04a85bd4cf2e 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -3457,7 +3457,7 @@ void __init buffer_init(void)
>         bh_cachep = kmem_cache_create("buffer_head",
>                         sizeof(struct buffer_head), 0,
>                                 (SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|
> -                               SLAB_MEM_SPREAD),
> +                               SLAB_MEM_SPREAD|SLAB_NO_FREE_INIT),
>                                 NULL);
>
>         /*
> diff --git a/fs/dcache.c b/fs/dcache.c
> index 8136bda27a1f..323b039accba 100644
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -3139,7 +3139,8 @@ void __init vfs_caches_init_early(void)
>  void __init vfs_caches_init(void)
>  {
>         names_cachep = kmem_cache_create_usercopy("names_cache", PATH_MAX, 0,
> -                       SLAB_HWCACHE_ALIGN|SLAB_PANIC, 0, PATH_MAX, NULL);
> +                       SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_NO_FREE_INIT, 0,
> +                       PATH_MAX, NULL);
>
>         dcache_init();
>         inode_init();
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index 9449b19c5f10..7eba9ad8830d 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -92,6 +92,9 @@
>  /* Avoid kmemleak tracing */
>  #define SLAB_NOLEAKTRACE       ((slab_flags_t __force)0x00800000U)
>
> +/* Exclude slab from zero-on-free when init_on_free is enabled */
> +#define SLAB_NO_FREE_INIT      ((slab_flags_t __force)0x01000000U)
> +
>  /* Fault injection mark */
>  #ifdef CONFIG_FAILSLAB
>  # define SLAB_FAILSLAB         ((slab_flags_t __force)0x02000000U)
> diff --git a/kernel/fork.c b/kernel/fork.c
> index b4cba953040a..9868585f5520 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -2550,11 +2550,13 @@ void __init proc_caches_init(void)
>
>         mm_cachep = kmem_cache_create_usercopy("mm_struct",
>                         mm_size, ARCH_MIN_MMSTRUCT_ALIGN,
> -                       SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_ACCOUNT,
> +                       SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_ACCOUNT|
> +                       SLAB_NO_FREE_INIT,
>                         offsetof(struct mm_struct, saved_auxv),
>                         sizeof_field(struct mm_struct, saved_auxv),
>                         NULL);
> -       vm_area_cachep = KMEM_CACHE(vm_area_struct, SLAB_PANIC|SLAB_ACCOUNT);
> +       vm_area_cachep = KMEM_CACHE(vm_area_struct, SLAB_PANIC|SLAB_ACCOUNT|
> +                                                   SLAB_NO_FREE_INIT);
>         mmap_init();
>         nsproxy_cache_init();
>  }
> diff --git a/mm/rmap.c b/mm/rmap.c
> index e5dfe2ae6b0d..b7b8013eeb0a 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -432,10 +432,11 @@ static void anon_vma_ctor(void *data)
>  void __init anon_vma_init(void)
>  {
>         anon_vma_cachep = kmem_cache_create("anon_vma", sizeof(struct anon_vma),
> -                       0, SLAB_TYPESAFE_BY_RCU|SLAB_PANIC|SLAB_ACCOUNT,
> +                       0, SLAB_TYPESAFE_BY_RCU|SLAB_PANIC|SLAB_ACCOUNT|
> +                       SLAB_NO_FREE_INIT,
>                         anon_vma_ctor);
>         anon_vma_chain_cachep = KMEM_CACHE(anon_vma_chain,
> -                       SLAB_PANIC|SLAB_ACCOUNT);
> +                       SLAB_PANIC|SLAB_ACCOUNT|SLAB_NO_FREE_INIT);
>  }
>
>  /*
> diff --git a/mm/slab.h b/mm/slab.h
> index 24ae887359b8..f95b4f03c57b 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -129,7 +129,8 @@ static inline slab_flags_t kmem_cache_flags(unsigned int object_size,
>  /* Legal flag mask for kmem_cache_create(), for various configurations */
>  #define SLAB_CORE_FLAGS (SLAB_HWCACHE_ALIGN | SLAB_CACHE_DMA | \
>                          SLAB_CACHE_DMA32 | SLAB_PANIC | \
> -                        SLAB_TYPESAFE_BY_RCU | SLAB_DEBUG_OBJECTS )
> +                        SLAB_TYPESAFE_BY_RCU | SLAB_DEBUG_OBJECTS | \
> +                        SLAB_NO_FREE_INIT)
>
>  #if defined(CONFIG_DEBUG_SLAB)
>  #define SLAB_DEBUG_FLAGS (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER)
> @@ -535,7 +536,7 @@ static inline bool slab_want_init_on_alloc(gfp_t flags, struct kmem_cache *c)
>  static inline bool slab_want_init_on_free(struct kmem_cache *c)
>  {
>         if (static_branch_unlikely(&init_on_free))
> -               return !(c->ctor);
> +               return !(c->ctor) && ((c->flags & SLAB_NO_FREE_INIT) == 0);
>         else
>                 return false;
>  }
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index e89be6282693..b65902d2c042 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -3981,14 +3981,16 @@ void __init skb_init(void)
>         skbuff_head_cache = kmem_cache_create_usercopy("skbuff_head_cache",
>                                               sizeof(struct sk_buff),
>                                               0,
> -                                             SLAB_HWCACHE_ALIGN|SLAB_PANIC,
> +                                             SLAB_HWCACHE_ALIGN|SLAB_PANIC|
> +                                             SLAB_NO_FREE_INIT,
>                                               offsetof(struct sk_buff, cb),
>                                               sizeof_field(struct sk_buff, cb),
>                                               NULL);
>         skbuff_fclone_cache = kmem_cache_create("skbuff_fclone_cache",
>                                                 sizeof(struct sk_buff_fclones),
>                                                 0,
> -                                               SLAB_HWCACHE_ALIGN|SLAB_PANIC,
> +                                               SLAB_HWCACHE_ALIGN|SLAB_PANIC|
> +                                               SLAB_NO_FREE_INIT,
>                                                 NULL);
>         skb_extensions_init();
>  }
> --
> 2.17.1
>
>
> --
> Kees Cook


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 5/4] mm: Introduce SLAB_NO_FREE_INIT and mark excluded caches
  2019-05-20  6:10       ` Mathias Krause
  (?)
@ 2019-05-20 16:12       ` Kees Cook
  -1 siblings, 0 replies; 52+ messages in thread
From: Kees Cook @ 2019-05-20 16:12 UTC (permalink / raw)
  To: Mathias Krause
  Cc: Alexander Potapenko, Andrew Morton, Christoph Lameter,
	kernel-hardening, Masahiro Yamada, James Morris, Serge E. Hallyn,
	Nick Desaulniers, Kostya Serebryany, Dmitry Vyukov,
	Sandeep Patil, Laura Abbott, Randy Dunlap, Jann Horn,
	Mark Rutland, linux-mm, linux-security-module

On Mon, May 20, 2019 at 08:10:19AM +0200, Mathias Krause wrote:
> Hi Kees,
> 
> On Fri, 17 May 2019 at 02:50, Kees Cook <keescook@chromium.org> wrote:
> > In order to improve the init_on_free performance, some frequently
> > freed caches with less sensitive contents can be excluded from the
> > init_on_free behavior.
> >
> > This patch is modified from Brad Spengler/PaX Team's code in the
> > last public patch of grsecurity/PaX based on my understanding of the
> > code. Changes or omissions from the original code are mine and don't
> > reflect the original grsecurity/PaX code.
> 
> you might want to give secunet credit for this one, as can be seen here:
> 
>   https://github.com/minipli/linux-grsec/commit/309d494e7a3f6533ca68fc8b3bd89fa76fd2c2df
> 
> However, please keep the "Changes or omissions from the original
> code..." part as your version slightly differs.

Ah-ha! Thanks for finding the specific commit; I'll adjust
attribution. Are you able to describe how you chose the various excluded
kmem caches? (I assume it wasn't just "highest numbers in the stats
reporting".) And why run the ctor after wipe? Doesn't that means you're
just running the ctor again at the next allocation time?

Thanks!

-Kees

> 
> Thanks,
> Mathias
> 
> >
> > Signed-off-by: Kees Cook <keescook@chromium.org>
> > ---
> >  fs/buffer.c          | 2 +-
> >  fs/dcache.c          | 3 ++-
> >  include/linux/slab.h | 3 +++
> >  kernel/fork.c        | 6 ++++--
> >  mm/rmap.c            | 5 +++--
> >  mm/slab.h            | 5 +++--
> >  net/core/skbuff.c    | 6 ++++--
> >  7 files changed, 20 insertions(+), 10 deletions(-)
> >
> > diff --git a/fs/buffer.c b/fs/buffer.c
> > index 0faa41fb4c88..04a85bd4cf2e 100644
> > --- a/fs/buffer.c
> > +++ b/fs/buffer.c
> > @@ -3457,7 +3457,7 @@ void __init buffer_init(void)
> >         bh_cachep = kmem_cache_create("buffer_head",
> >                         sizeof(struct buffer_head), 0,
> >                                 (SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|
> > -                               SLAB_MEM_SPREAD),
> > +                               SLAB_MEM_SPREAD|SLAB_NO_FREE_INIT),
> >                                 NULL);
> >
> >         /*
> > diff --git a/fs/dcache.c b/fs/dcache.c
> > index 8136bda27a1f..323b039accba 100644
> > --- a/fs/dcache.c
> > +++ b/fs/dcache.c
> > @@ -3139,7 +3139,8 @@ void __init vfs_caches_init_early(void)
> >  void __init vfs_caches_init(void)
> >  {
> >         names_cachep = kmem_cache_create_usercopy("names_cache", PATH_MAX, 0,
> > -                       SLAB_HWCACHE_ALIGN|SLAB_PANIC, 0, PATH_MAX, NULL);
> > +                       SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_NO_FREE_INIT, 0,
> > +                       PATH_MAX, NULL);
> >
> >         dcache_init();
> >         inode_init();
> > diff --git a/include/linux/slab.h b/include/linux/slab.h
> > index 9449b19c5f10..7eba9ad8830d 100644
> > --- a/include/linux/slab.h
> > +++ b/include/linux/slab.h
> > @@ -92,6 +92,9 @@
> >  /* Avoid kmemleak tracing */
> >  #define SLAB_NOLEAKTRACE       ((slab_flags_t __force)0x00800000U)
> >
> > +/* Exclude slab from zero-on-free when init_on_free is enabled */
> > +#define SLAB_NO_FREE_INIT      ((slab_flags_t __force)0x01000000U)
> > +
> >  /* Fault injection mark */
> >  #ifdef CONFIG_FAILSLAB
> >  # define SLAB_FAILSLAB         ((slab_flags_t __force)0x02000000U)
> > diff --git a/kernel/fork.c b/kernel/fork.c
> > index b4cba953040a..9868585f5520 100644
> > --- a/kernel/fork.c
> > +++ b/kernel/fork.c
> > @@ -2550,11 +2550,13 @@ void __init proc_caches_init(void)
> >
> >         mm_cachep = kmem_cache_create_usercopy("mm_struct",
> >                         mm_size, ARCH_MIN_MMSTRUCT_ALIGN,
> > -                       SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_ACCOUNT,
> > +                       SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_ACCOUNT|
> > +                       SLAB_NO_FREE_INIT,
> >                         offsetof(struct mm_struct, saved_auxv),
> >                         sizeof_field(struct mm_struct, saved_auxv),
> >                         NULL);
> > -       vm_area_cachep = KMEM_CACHE(vm_area_struct, SLAB_PANIC|SLAB_ACCOUNT);
> > +       vm_area_cachep = KMEM_CACHE(vm_area_struct, SLAB_PANIC|SLAB_ACCOUNT|
> > +                                                   SLAB_NO_FREE_INIT);
> >         mmap_init();
> >         nsproxy_cache_init();
> >  }
> > diff --git a/mm/rmap.c b/mm/rmap.c
> > index e5dfe2ae6b0d..b7b8013eeb0a 100644
> > --- a/mm/rmap.c
> > +++ b/mm/rmap.c
> > @@ -432,10 +432,11 @@ static void anon_vma_ctor(void *data)
> >  void __init anon_vma_init(void)
> >  {
> >         anon_vma_cachep = kmem_cache_create("anon_vma", sizeof(struct anon_vma),
> > -                       0, SLAB_TYPESAFE_BY_RCU|SLAB_PANIC|SLAB_ACCOUNT,
> > +                       0, SLAB_TYPESAFE_BY_RCU|SLAB_PANIC|SLAB_ACCOUNT|
> > +                       SLAB_NO_FREE_INIT,
> >                         anon_vma_ctor);
> >         anon_vma_chain_cachep = KMEM_CACHE(anon_vma_chain,
> > -                       SLAB_PANIC|SLAB_ACCOUNT);
> > +                       SLAB_PANIC|SLAB_ACCOUNT|SLAB_NO_FREE_INIT);
> >  }
> >
> >  /*
> > diff --git a/mm/slab.h b/mm/slab.h
> > index 24ae887359b8..f95b4f03c57b 100644
> > --- a/mm/slab.h
> > +++ b/mm/slab.h
> > @@ -129,7 +129,8 @@ static inline slab_flags_t kmem_cache_flags(unsigned int object_size,
> >  /* Legal flag mask for kmem_cache_create(), for various configurations */
> >  #define SLAB_CORE_FLAGS (SLAB_HWCACHE_ALIGN | SLAB_CACHE_DMA | \
> >                          SLAB_CACHE_DMA32 | SLAB_PANIC | \
> > -                        SLAB_TYPESAFE_BY_RCU | SLAB_DEBUG_OBJECTS )
> > +                        SLAB_TYPESAFE_BY_RCU | SLAB_DEBUG_OBJECTS | \
> > +                        SLAB_NO_FREE_INIT)
> >
> >  #if defined(CONFIG_DEBUG_SLAB)
> >  #define SLAB_DEBUG_FLAGS (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER)
> > @@ -535,7 +536,7 @@ static inline bool slab_want_init_on_alloc(gfp_t flags, struct kmem_cache *c)
> >  static inline bool slab_want_init_on_free(struct kmem_cache *c)
> >  {
> >         if (static_branch_unlikely(&init_on_free))
> > -               return !(c->ctor);
> > +               return !(c->ctor) && ((c->flags & SLAB_NO_FREE_INIT) == 0);
> >         else
> >                 return false;
> >  }
> > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > index e89be6282693..b65902d2c042 100644
> > --- a/net/core/skbuff.c
> > +++ b/net/core/skbuff.c
> > @@ -3981,14 +3981,16 @@ void __init skb_init(void)
> >         skbuff_head_cache = kmem_cache_create_usercopy("skbuff_head_cache",
> >                                               sizeof(struct sk_buff),
> >                                               0,
> > -                                             SLAB_HWCACHE_ALIGN|SLAB_PANIC,
> > +                                             SLAB_HWCACHE_ALIGN|SLAB_PANIC|
> > +                                             SLAB_NO_FREE_INIT,
> >                                               offsetof(struct sk_buff, cb),
> >                                               sizeof_field(struct sk_buff, cb),
> >                                               NULL);
> >         skbuff_fclone_cache = kmem_cache_create("skbuff_fclone_cache",
> >                                                 sizeof(struct sk_buff_fclones),
> >                                                 0,
> > -                                               SLAB_HWCACHE_ALIGN|SLAB_PANIC,
> > +                                               SLAB_HWCACHE_ALIGN|SLAB_PANIC|
> > +                                               SLAB_NO_FREE_INIT,
> >                                                 NULL);
> >         skb_extensions_init();
> >  }
> > --
> > 2.17.1
> >
> >
> > --
> > Kees Cook

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v2 3/4] gfp: mm: introduce __GFP_NO_AUTOINIT
  2019-05-17 17:11               ` Michal Hocko
@ 2019-05-21 14:18                   ` Alexander Potapenko
  0 siblings, 0 replies; 52+ messages in thread
From: Alexander Potapenko @ 2019-05-21 14:18 UTC (permalink / raw)
  To: Michal Hocko, Kees Cook
  Cc: Andrew Morton, Christoph Lameter, Kernel Hardening,
	Masahiro Yamada, James Morris, Serge E. Hallyn, Nick Desaulniers,
	Kostya Serebryany, Dmitry Vyukov, Sandeep Patil, Laura Abbott,
	Randy Dunlap, Jann Horn, Mark Rutland, Souptick Joarder,
	Matthew Wilcox, Linux Memory Management List,
	linux-security-module

On Fri, May 17, 2019 at 7:11 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Fri 17-05-19 09:27:54, Kees Cook wrote:
> > On Fri, May 17, 2019 at 04:01:08PM +0200, Michal Hocko wrote:
> > > On Fri 17-05-19 15:37:14, Alexander Potapenko wrote:
> > > > > > > Freeing a memory is an opt-in feature and the slab allocator can already
> > > > > > > tell many (with constructor or GFP_ZERO) do not need it.
> > > > > > Sorry, I didn't understand this piece. Could you please elaborate?
> > > > >
> > > > > The allocator can assume that caches with a constructor will initialize
> > > > > the object so additional zeroying is not needed. GFP_ZERO should be self
> > > > > explanatory.
> > > > Ah, I see. We already do that, see the want_init_on_alloc()
> > > > implementation here: https://patchwork.kernel.org/patch/10943087/
> > > > > > > So can we go without this gfp thing and see whether somebody actually
> > > > > > > finds a performance problem with the feature enabled and think about
> > > > > > > what can we do about it rather than add this maint. nightmare from the
> > > > > > > very beginning?
> > > > > >
> > > > > > There were two reasons to introduce this flag initially.
> > > > > > The first was double initialization of pages allocated for SLUB.
> > > > >
> > > > > Could you elaborate please?
> > > > When the kernel allocates an object from SLUB, and SLUB happens to be
> > > > short on free pages, it requests some from the page allocator.
> > > > Those pages are initialized by the page allocator
> > >
> > > ... when the feature is enabled ...
> > >
> > > > and split into objects. Finally SLUB initializes one of the available
> > > > objects and returns it back to the kernel.
> > > > Therefore the object is initialized twice for the first time (when it
> > > > comes directly from the page allocator).
> > > > This cost is however amortized by SLUB reusing the object after it's been freed.
> > >
> > > OK, I see what you mean now. Is there any way to special case the page
> > > allocation for this feature? E.g. your implementation tries to make this
> > > zeroying special but why cannot you simply do this
> > >
> > >
> > > struct page *
> > > ____alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
> > >                                                     nodemask_t *nodemask)
> > > {
> > >     //current implementation
> > > }
> > >
> > > struct page *
> > > __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
> > >                                                     nodemask_t *nodemask)
> > > {
> > >     if (your_feature_enabled)
> > >             gfp_mask |= __GFP_ZERO;
> > >     return ____alloc_pages_nodemask(gfp_mask, order, preferred_nid,
> > >                                     nodemask);
> > > }
> > >
> > > and use ____alloc_pages_nodemask from the slab or other internal
> > > allocators?
Given that calling alloc_pages() with __GFP_NO_AUTOINIT doesn't
visibly improve the chosen benchmarks,
and the next patch in the series ("net: apply __GFP_NO_AUTOINIT to
AF_UNIX sk_buff allocations") only improves hackbench,
shall we maybe drop both patches altogether?
> > If an additional allocator function is preferred over a new GFP flag, then
> > I don't see any reason not to do this. (Though adding more "__"s seems
> > a bit unfriendly to code-documentation.) What might be better naming?
>
> The naminig is the last thing I would be worried about. Let's focus on
> the most simplistic implementation first. And means, can we really make
> it as simple as above? At least on the page allocator level.
>
> > This would mean that the skb changes later in the series would use the
> > "no auto init" version of the allocator too, then.
>
> No, this would be an internal function to MM. I would really like to
> optimize once there are numbers from _real_ workloads to base those
> optimizations.
> --
> Michal Hocko
> SUSE Labs



-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v2 3/4] gfp: mm: introduce __GFP_NO_AUTOINIT
@ 2019-05-21 14:18                   ` Alexander Potapenko
  0 siblings, 0 replies; 52+ messages in thread
From: Alexander Potapenko @ 2019-05-21 14:18 UTC (permalink / raw)
  To: Michal Hocko, Kees Cook
  Cc: Andrew Morton, Christoph Lameter, Kernel Hardening,
	Masahiro Yamada, James Morris, Serge E. Hallyn, Nick Desaulniers,
	Kostya Serebryany, Dmitry Vyukov, Sandeep Patil, Laura Abbott,
	Randy Dunlap, Jann Horn, Mark Rutland, Souptick Joarder,
	Matthew Wilcox, Linux Memory Management List,
	linux-security-module

On Fri, May 17, 2019 at 7:11 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Fri 17-05-19 09:27:54, Kees Cook wrote:
> > On Fri, May 17, 2019 at 04:01:08PM +0200, Michal Hocko wrote:
> > > On Fri 17-05-19 15:37:14, Alexander Potapenko wrote:
> > > > > > > Freeing a memory is an opt-in feature and the slab allocator can already
> > > > > > > tell many (with constructor or GFP_ZERO) do not need it.
> > > > > > Sorry, I didn't understand this piece. Could you please elaborate?
> > > > >
> > > > > The allocator can assume that caches with a constructor will initialize
> > > > > the object so additional zeroying is not needed. GFP_ZERO should be self
> > > > > explanatory.
> > > > Ah, I see. We already do that, see the want_init_on_alloc()
> > > > implementation here: https://patchwork.kernel.org/patch/10943087/
> > > > > > > So can we go without this gfp thing and see whether somebody actually
> > > > > > > finds a performance problem with the feature enabled and think about
> > > > > > > what can we do about it rather than add this maint. nightmare from the
> > > > > > > very beginning?
> > > > > >
> > > > > > There were two reasons to introduce this flag initially.
> > > > > > The first was double initialization of pages allocated for SLUB.
> > > > >
> > > > > Could you elaborate please?
> > > > When the kernel allocates an object from SLUB, and SLUB happens to be
> > > > short on free pages, it requests some from the page allocator.
> > > > Those pages are initialized by the page allocator
> > >
> > > ... when the feature is enabled ...
> > >
> > > > and split into objects. Finally SLUB initializes one of the available
> > > > objects and returns it back to the kernel.
> > > > Therefore the object is initialized twice for the first time (when it
> > > > comes directly from the page allocator).
> > > > This cost is however amortized by SLUB reusing the object after it's been freed.
> > >
> > > OK, I see what you mean now. Is there any way to special case the page
> > > allocation for this feature? E.g. your implementation tries to make this
> > > zeroying special but why cannot you simply do this
> > >
> > >
> > > struct page *
> > > ____alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
> > >                                                     nodemask_t *nodemask)
> > > {
> > >     //current implementation
> > > }
> > >
> > > struct page *
> > > __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
> > >                                                     nodemask_t *nodemask)
> > > {
> > >     if (your_feature_enabled)
> > >             gfp_mask |= __GFP_ZERO;
> > >     return ____alloc_pages_nodemask(gfp_mask, order, preferred_nid,
> > >                                     nodemask);
> > > }
> > >
> > > and use ____alloc_pages_nodemask from the slab or other internal
> > > allocators?
Given that calling alloc_pages() with __GFP_NO_AUTOINIT doesn't
visibly improve the chosen benchmarks,
and the next patch in the series ("net: apply __GFP_NO_AUTOINIT to
AF_UNIX sk_buff allocations") only improves hackbench,
shall we maybe drop both patches altogether?
> > If an additional allocator function is preferred over a new GFP flag, then
> > I don't see any reason not to do this. (Though adding more "__"s seems
> > a bit unfriendly to code-documentation.) What might be better naming?
>
> The naminig is the last thing I would be worried about. Let's focus on
> the most simplistic implementation first. And means, can we really make
> it as simple as above? At least on the page allocator level.
>
> > This would mean that the skb changes later in the series would use the
> > "no auto init" version of the allocator too, then.
>
> No, this would be an internal function to MM. I would really like to
> optimize once there are numbers from _real_ workloads to base those
> optimizations.
> --
> Michal Hocko
> SUSE Labs



-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v2 3/4] gfp: mm: introduce __GFP_NO_AUTOINIT
  2019-05-21 14:18                   ` Alexander Potapenko
  (?)
@ 2019-05-21 14:25                   ` Michal Hocko
  -1 siblings, 0 replies; 52+ messages in thread
From: Michal Hocko @ 2019-05-21 14:25 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Kees Cook, Andrew Morton, Christoph Lameter, Kernel Hardening,
	Masahiro Yamada, James Morris, Serge E. Hallyn, Nick Desaulniers,
	Kostya Serebryany, Dmitry Vyukov, Sandeep Patil, Laura Abbott,
	Randy Dunlap, Jann Horn, Mark Rutland, Souptick Joarder,
	Matthew Wilcox, Linux Memory Management List,
	linux-security-module

On Tue 21-05-19 16:18:37, Alexander Potapenko wrote:
> On Fri, May 17, 2019 at 7:11 PM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > On Fri 17-05-19 09:27:54, Kees Cook wrote:
> > > On Fri, May 17, 2019 at 04:01:08PM +0200, Michal Hocko wrote:
> > > > On Fri 17-05-19 15:37:14, Alexander Potapenko wrote:
> > > > > > > > Freeing a memory is an opt-in feature and the slab allocator can already
> > > > > > > > tell many (with constructor or GFP_ZERO) do not need it.
> > > > > > > Sorry, I didn't understand this piece. Could you please elaborate?
> > > > > >
> > > > > > The allocator can assume that caches with a constructor will initialize
> > > > > > the object so additional zeroying is not needed. GFP_ZERO should be self
> > > > > > explanatory.
> > > > > Ah, I see. We already do that, see the want_init_on_alloc()
> > > > > implementation here: https://patchwork.kernel.org/patch/10943087/
> > > > > > > > So can we go without this gfp thing and see whether somebody actually
> > > > > > > > finds a performance problem with the feature enabled and think about
> > > > > > > > what can we do about it rather than add this maint. nightmare from the
> > > > > > > > very beginning?
> > > > > > >
> > > > > > > There were two reasons to introduce this flag initially.
> > > > > > > The first was double initialization of pages allocated for SLUB.
> > > > > >
> > > > > > Could you elaborate please?
> > > > > When the kernel allocates an object from SLUB, and SLUB happens to be
> > > > > short on free pages, it requests some from the page allocator.
> > > > > Those pages are initialized by the page allocator
> > > >
> > > > ... when the feature is enabled ...
> > > >
> > > > > and split into objects. Finally SLUB initializes one of the available
> > > > > objects and returns it back to the kernel.
> > > > > Therefore the object is initialized twice for the first time (when it
> > > > > comes directly from the page allocator).
> > > > > This cost is however amortized by SLUB reusing the object after it's been freed.
> > > >
> > > > OK, I see what you mean now. Is there any way to special case the page
> > > > allocation for this feature? E.g. your implementation tries to make this
> > > > zeroying special but why cannot you simply do this
> > > >
> > > >
> > > > struct page *
> > > > ____alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
> > > >                                                     nodemask_t *nodemask)
> > > > {
> > > >     //current implementation
> > > > }
> > > >
> > > > struct page *
> > > > __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
> > > >                                                     nodemask_t *nodemask)
> > > > {
> > > >     if (your_feature_enabled)
> > > >             gfp_mask |= __GFP_ZERO;
> > > >     return ____alloc_pages_nodemask(gfp_mask, order, preferred_nid,
> > > >                                     nodemask);
> > > > }
> > > >
> > > > and use ____alloc_pages_nodemask from the slab or other internal
> > > > allocators?
> Given that calling alloc_pages() with __GFP_NO_AUTOINIT doesn't
> visibly improve the chosen benchmarks,
> and the next patch in the series ("net: apply __GFP_NO_AUTOINIT to
> AF_UNIX sk_buff allocations") only improves hackbench,
> shall we maybe drop both patches altogether?

Ohh, by all means. I was suggesting the same few emails ago. The above
is just a hint on how to implement the feature on the page allocator
level rather than hooking into the prep_new_page and add another branch
to zero memory.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 52+ messages in thread

end of thread, other threads:[~2019-05-21 14:25 UTC | newest]

Thread overview: 52+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-14 14:35 [PATCH v2 0/4] RFC: add init_on_alloc/init_on_free boot options Alexander Potapenko
2019-05-14 14:35 ` [PATCH v2 1/4] mm: security: introduce init_on_alloc=1 and init_on_free=1 " Alexander Potapenko
2019-05-14 14:35   ` Alexander Potapenko
2019-05-16 16:19   ` Kees Cook
2019-05-16 16:42     ` Alexander Potapenko
2019-05-16 16:42       ` Alexander Potapenko
2019-05-16 17:03       ` Kees Cook
2019-05-17  1:26   ` Kees Cook
2019-05-17 14:38     ` Alexander Potapenko
2019-05-17 14:38       ` Alexander Potapenko
2019-05-17 14:04   ` Michal Hocko
2019-05-17 14:11     ` Alexander Potapenko
2019-05-17 14:11       ` Alexander Potapenko
2019-05-17 14:20       ` Michal Hocko
2019-05-17 16:36         ` Kees Cook
2019-05-17 17:11           ` Michal Hocko
2019-05-14 14:35 ` [PATCH v2 2/4] lib: introduce test_meminit module Alexander Potapenko
2019-05-14 14:35   ` Alexander Potapenko
2019-05-16  1:02   ` Kees Cook
2019-05-17 15:51     ` Alexander Potapenko
2019-05-17 15:51       ` Alexander Potapenko
2019-05-17 16:37       ` Kees Cook
2019-05-14 14:35 ` [PATCH v2 3/4] gfp: mm: introduce __GFP_NO_AUTOINIT Alexander Potapenko
2019-05-14 14:35   ` Alexander Potapenko
2019-05-17 12:59   ` Michal Hocko
2019-05-17 13:18     ` Alexander Potapenko
2019-05-17 13:18       ` Alexander Potapenko
2019-05-17 13:25       ` Michal Hocko
2019-05-17 13:37         ` Alexander Potapenko
2019-05-17 13:37           ` Alexander Potapenko
2019-05-17 14:01           ` Michal Hocko
2019-05-17 16:27             ` Kees Cook
2019-05-17 17:11               ` Michal Hocko
2019-05-21 14:18                 ` Alexander Potapenko
2019-05-21 14:18                   ` Alexander Potapenko
2019-05-21 14:25                   ` Michal Hocko
2019-05-14 14:35 ` [PATCH v2 4/4] net: apply __GFP_NO_AUTOINIT to AF_UNIX sk_buff allocations Alexander Potapenko
2019-05-14 14:35   ` Alexander Potapenko
2019-05-16 16:53   ` Kees Cook
2019-05-17  0:26     ` Kees Cook
2019-05-17  8:49       ` Alexander Potapenko
2019-05-17  8:49         ` Alexander Potapenko
2019-05-17 13:50         ` Alexander Potapenko
2019-05-17 13:50           ` Alexander Potapenko
2019-05-17 16:13         ` Kees Cook
2019-05-17  0:50   ` [PATCH 5/4] mm: Introduce SLAB_NO_FREE_INIT and mark excluded caches Kees Cook
2019-05-17  8:34     ` Alexander Potapenko
2019-05-17  8:34       ` Alexander Potapenko
2019-05-17 15:59       ` Kees Cook
2019-05-20  6:10     ` Mathias Krause
2019-05-20  6:10       ` Mathias Krause
2019-05-20 16:12       ` Kees Cook

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.