* [PATCH 0/3] RFC: add init_allocations=1 boot option
@ 2019-04-18 15:42 ` Alexander Potapenko
  0 siblings, 0 replies; 41+ messages in thread
From: Alexander Potapenko @ 2019-04-18 15:42 UTC (permalink / raw)
  To: akpm, cl, dvyukov, keescook, labbott
  Cc: linux-mm, linux-security-module, kernel-hardening

Following the recent discussions, here is another take at initializing
pages and heap objects with zeroes. This is needed to prevent possible
information leaks and to make control-flow bugs that depend on
uninitialized values more deterministic.

The patchset introduces a new boot option, init_allocations, which
makes the page allocator and SL[AOU]B initialize newly allocated memory
with zeroes. init_allocations=0 (the default) should add no overhead to
the allocation fast path, as the check is hidden behind a static branch
(no noticeable slowdown on hackbench).
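
Condensed from patch 1 (include/linux/mm.h; the early_param() plumbing
in mm/page_alloc.c that enables the key is omitted here), the fast-path
check looks like this:

	DECLARE_STATIC_KEY_FALSE(init_allocations);

	static inline bool want_init_memory(gfp_t flags)
	{
		/* Patched out entirely when init_allocations is not set. */
		if (static_branch_unlikely(&init_allocations))
			return true;
		return flags & __GFP_ZERO;
	}

Allocation sites then call want_init_memory() where they previously
checked for __GFP_ZERO.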

With only the first of the proposed patches applied, the slowdown numbers are:
 - 1.1% (stdev 0.2%) sys time slowdown building Linux kernel
 - 3.1% (stdev 0.3%) sys time slowdown on af_inet_loopback benchmark
 - 9.4% (stdev 0.5%) sys time slowdown on hackbench

The second patch introduces a GFP flag that allows disabling
initialization for certain allocations. The third patch is an example of
applying it to af_unix.c, which eliminates most of the hackbench slowdown.

Slowdown numbers for the whole patchset are:
 - 1.8% (stdev 0.8%) on kernel build
 - 6.5% (stdev 0.2%) on af_inet_loopback
 - 0.12% (stdev 0.6%) on hackbench


Alexander Potapenko (3):
  mm: security: introduce the init_allocations=1 boot option
  gfp: mm: introduce __GFP_NOINIT
  net: apply __GFP_NOINIT to AF_UNIX sk_buff allocations

 drivers/infiniband/core/uverbs_ioctl.c |  2 +-
 include/linux/gfp.h                    |  6 ++++-
 include/linux/mm.h                     |  8 +++++++
 include/linux/slab_def.h               |  1 +
 include/linux/slub_def.h               |  1 +
 include/net/sock.h                     |  5 +++++
 kernel/kexec_core.c                    |  4 ++--
 mm/dmapool.c                           |  2 +-
 mm/page_alloc.c                        | 18 ++++++++++++++-
 mm/slab.c                              | 14 ++++++------
 mm/slab.h                              |  1 +
 mm/slab_common.c                       | 15 +++++++++++++
 mm/slob.c                              |  3 ++-
 mm/slub.c                              |  9 ++++----
 net/core/sock.c                        | 31 +++++++++++++++++++++-----
 net/unix/af_unix.c                     | 13 ++++++-----
 16 files changed, 104 insertions(+), 29 deletions(-)

-- 
2.21.0.392.gf8f6787159e-goog


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 1/3] mm: security: introduce the init_allocations=1 boot option
  2019-04-18 15:42 ` Alexander Potapenko
@ 2019-04-18 15:42   ` Alexander Potapenko
  -1 siblings, 0 replies; 41+ messages in thread
From: Alexander Potapenko @ 2019-04-18 15:42 UTC (permalink / raw)
  To: akpm, cl, dvyukov, keescook, labbott
  Cc: linux-mm, linux-security-module, kernel-hardening

This option makes it possible to initialize newly allocated pages and
heap objects with zeroes. This is needed to prevent possible information
leaks and to make control-flow bugs that depend on uninitialized values
more deterministic.

Initialization is done at allocation time, at the places where checks
for __GFP_ZERO are performed. Slab caches with constructors are not
initialized, to preserve their semantics. To avoid the runtime cost of
checking cachep->ctor on every allocation, the memset() call is replaced
with a call to cachep->poison_fn, which zeroes the object for ordinary
caches and is a no-op for caches with constructors. The callback is only
invoked when the memory block needs to be initialized.
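
To make this concrete, the callback is selected once at cache creation
time, so the allocation fast path only makes an indirect call (condensed
from the mm/slab_common.c and mm/slab.c hunks below):

	/* mm/slab_common.c: chosen once, when the cache is created. */
	if (ctor)
		s->poison_fn = poison_dont;	/* no-op, keeps ctor semantics */
	else
		s->poison_fn = poison_zero;	/* memset(object, 0, c->object_size) */

	/* mm/slab.c (similarly in mm/slub.c): the per-allocation check. */
	if (unlikely(want_init_memory(flags)) && ptr)
		cachep->poison_fn(cachep, ptr);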

For kernel testing purposes, filling allocations with a nonzero pattern
would be more suitable, but that may require platform-specific code. To
have a simple baseline we have decided to start with zero-initialization.

No performance optimizations have been done yet to avoid double
initialization of memory regions.

Signed-off-by: Alexander Potapenko <glider@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: James Morris <jmorris@namei.org>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Sandeep Patil <sspatil@android.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Jann Horn <jannh@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: linux-mm@kvack.org
Cc: linux-security-module@vger.kernel.org
Cc: kernel-hardening@lists.openwall.com
---
 drivers/infiniband/core/uverbs_ioctl.c |  2 +-
 include/linux/mm.h                     |  8 ++++++++
 include/linux/slab_def.h               |  1 +
 include/linux/slub_def.h               |  1 +
 kernel/kexec_core.c                    |  2 +-
 mm/dmapool.c                           |  2 +-
 mm/page_alloc.c                        | 18 +++++++++++++++++-
 mm/slab.c                              | 12 ++++++------
 mm/slab.h                              |  1 +
 mm/slab_common.c                       | 15 +++++++++++++++
 mm/slob.c                              |  2 +-
 mm/slub.c                              |  8 ++++----
 net/core/sock.c                        |  2 +-
 13 files changed, 58 insertions(+), 16 deletions(-)

diff --git a/drivers/infiniband/core/uverbs_ioctl.c b/drivers/infiniband/core/uverbs_ioctl.c
index e1379949e663..f31234906be2 100644
--- a/drivers/infiniband/core/uverbs_ioctl.c
+++ b/drivers/infiniband/core/uverbs_ioctl.c
@@ -127,7 +127,7 @@ __malloc void *_uverbs_alloc(struct uverbs_attr_bundle *bundle, size_t size,
 	res = (void *)pbundle->internal_buffer + pbundle->internal_used;
 	pbundle->internal_used =
 		ALIGN(new_used, sizeof(*pbundle->internal_buffer));
-	if (flags & __GFP_ZERO)
+	if (want_init_memory(flags))
 		memset(res, 0, size);
 	return res;
 }
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 76769749b5a5..b38b71a5efaa 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2597,6 +2597,14 @@ static inline void kernel_poison_pages(struct page *page, int numpages,
 					int enable) { }
 #endif
 
+DECLARE_STATIC_KEY_FALSE(init_allocations);
+static inline bool want_init_memory(gfp_t flags)
+{
+	if (static_branch_unlikely(&init_allocations))
+		return true;
+	return flags & __GFP_ZERO;
+}
+
 #ifdef CONFIG_DEBUG_PAGEALLOC
 extern bool _debug_pagealloc_enabled;
 extern void __kernel_map_pages(struct page *page, int numpages, int enable);
diff --git a/include/linux/slab_def.h b/include/linux/slab_def.h
index 9a5eafb7145b..9dfe9eb639d7 100644
--- a/include/linux/slab_def.h
+++ b/include/linux/slab_def.h
@@ -37,6 +37,7 @@ struct kmem_cache {
 
 	/* constructor func */
 	void (*ctor)(void *obj);
+	void (*poison_fn)(struct kmem_cache *c, void *object);
 
 /* 4) cache creation/removal */
 	const char *name;
diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
index d2153789bd9f..afb928cb7c20 100644
--- a/include/linux/slub_def.h
+++ b/include/linux/slub_def.h
@@ -99,6 +99,7 @@ struct kmem_cache {
 	gfp_t allocflags;	/* gfp flags to use on each alloc */
 	int refcount;		/* Refcount for slab cache destroy */
 	void (*ctor)(void *);
+	void (*poison_fn)(struct kmem_cache *c, void *object);
 	unsigned int inuse;		/* Offset to metadata */
 	unsigned int align;		/* Alignment */
 	unsigned int red_left_pad;	/* Left redzone padding size */
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index d7140447be75..be84f5f95c97 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -315,7 +315,7 @@ static struct page *kimage_alloc_pages(gfp_t gfp_mask, unsigned int order)
 		arch_kexec_post_alloc_pages(page_address(pages), count,
 					    gfp_mask);
 
-		if (gfp_mask & __GFP_ZERO)
+		if (want_init_memory(gfp_mask))
 			for (i = 0; i < count; i++)
 				clear_highpage(pages + i);
 	}
diff --git a/mm/dmapool.c b/mm/dmapool.c
index 76a160083506..796e38160d39 100644
--- a/mm/dmapool.c
+++ b/mm/dmapool.c
@@ -381,7 +381,7 @@ void *dma_pool_alloc(struct dma_pool *pool, gfp_t mem_flags,
 #endif
 	spin_unlock_irqrestore(&pool->lock, flags);
 
-	if (mem_flags & __GFP_ZERO)
+	if (want_init_memory(mem_flags))
 		memset(retval, 0, pool->size);
 
 	return retval;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d96ca5bc555b..e2a21d866ac9 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -133,6 +133,22 @@ unsigned long totalcma_pages __read_mostly;
 
 int percpu_pagelist_fraction;
 gfp_t gfp_allowed_mask __read_mostly = GFP_BOOT_MASK;
+bool want_init_allocations __read_mostly;
+EXPORT_SYMBOL(want_init_allocations);
+DEFINE_STATIC_KEY_FALSE(init_allocations);
+
+static int __init early_init_allocations(char *buf)
+{
+	int ret;
+
+	if (!buf)
+		return -EINVAL;
+	ret = kstrtobool(buf, &want_init_allocations);
+	if (want_init_allocations)
+		static_branch_enable(&init_allocations);
+	return ret;
+}
+early_param("init_allocations", early_init_allocations);
 
 /*
  * A cached value of the page's pageblock's migratetype, used when the page is
@@ -2014,7 +2030,7 @@ static void prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags
 
 	post_alloc_hook(page, order, gfp_flags);
 
-	if (!free_pages_prezeroed() && (gfp_flags & __GFP_ZERO))
+	if (!free_pages_prezeroed() && want_init_memory(gfp_flags))
 		for (i = 0; i < (1 << order); i++)
 			clear_highpage(page + i);
 
diff --git a/mm/slab.c b/mm/slab.c
index 47a380a486ee..dcc5b73cf767 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3331,8 +3331,8 @@ slab_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid,
 	local_irq_restore(save_flags);
 	ptr = cache_alloc_debugcheck_after(cachep, flags, ptr, caller);
 
-	if (unlikely(flags & __GFP_ZERO) && ptr)
-		memset(ptr, 0, cachep->object_size);
+	if (unlikely(want_init_memory(flags)) && ptr)
+		cachep->poison_fn(cachep, ptr);
 
 	slab_post_alloc_hook(cachep, flags, 1, &ptr);
 	return ptr;
@@ -3388,8 +3388,8 @@ slab_alloc(struct kmem_cache *cachep, gfp_t flags, unsigned long caller)
 	objp = cache_alloc_debugcheck_after(cachep, flags, objp, caller);
 	prefetchw(objp);
 
-	if (unlikely(flags & __GFP_ZERO) && objp)
-		memset(objp, 0, cachep->object_size);
+	if (unlikely(want_init_memory(flags)) && objp)
+		cachep->poison_fn(cachep, objp);
 
 	slab_post_alloc_hook(cachep, flags, 1, &objp);
 	return objp;
@@ -3596,9 +3596,9 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
 	cache_alloc_debugcheck_after_bulk(s, flags, size, p, _RET_IP_);
 
 	/* Clear memory outside IRQ disabled section */
-	if (unlikely(flags & __GFP_ZERO))
+	if (unlikely(want_init_memory(flags)))
 		for (i = 0; i < size; i++)
-			memset(p[i], 0, s->object_size);
+			s->poison_fn(s, p[i]);
 
 	slab_post_alloc_hook(s, flags, size, p);
 	/* FIXME: Trace call missing. Christoph would like a bulk variant */
diff --git a/mm/slab.h b/mm/slab.h
index 43ac818b8592..3b541e8970ee 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -27,6 +27,7 @@ struct kmem_cache {
 	const char *name;	/* Slab name for sysfs */
 	int refcount;		/* Use counter */
 	void (*ctor)(void *);	/* Called on object slot creation */
+	void (*poison_fn)(struct kmem_cache *c, void *object);
 	struct list_head list;	/* List of all slab caches on the system */
 };
 
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 58251ba63e4a..37810114b2ea 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -360,6 +360,16 @@ struct kmem_cache *find_mergeable(unsigned int size, unsigned int align,
 	return NULL;
 }
 
+static void poison_zero(struct kmem_cache *c, void *object)
+{
+	memset(object, 0, c->object_size);
+}
+
+static void poison_dont(struct kmem_cache *c, void *object)
+{
+	/* Do nothing. Use for caches with constructors. */
+}
+
 static struct kmem_cache *create_cache(const char *name,
 		unsigned int object_size, unsigned int align,
 		slab_flags_t flags, unsigned int useroffset,
@@ -381,6 +391,10 @@ static struct kmem_cache *create_cache(const char *name,
 	s->size = s->object_size = object_size;
 	s->align = align;
 	s->ctor = ctor;
+	if (ctor)
+		s->poison_fn = poison_dont;
+	else
+		s->poison_fn = poison_zero;
 	s->useroffset = useroffset;
 	s->usersize = usersize;
 
@@ -974,6 +988,7 @@ void __init create_boot_cache(struct kmem_cache *s, const char *name,
 	s->align = calculate_alignment(flags, ARCH_KMALLOC_MINALIGN, size);
 	s->useroffset = useroffset;
 	s->usersize = usersize;
+	s->poison_fn = poison_zero;
 
 	slab_init_memcg_params(s);
 
diff --git a/mm/slob.c b/mm/slob.c
index 307c2c9feb44..18981a71e962 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -330,7 +330,7 @@ static void *slob_alloc(size_t size, gfp_t gfp, int align, int node)
 		BUG_ON(!b);
 		spin_unlock_irqrestore(&slob_lock, flags);
 	}
-	if (unlikely(gfp & __GFP_ZERO))
+	if (unlikely(want_init_memory(gfp)))
 		memset(b, 0, size);
 	return b;
 }
diff --git a/mm/slub.c b/mm/slub.c
index d30ede89f4a6..e4efb6575510 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2750,8 +2750,8 @@ static __always_inline void *slab_alloc_node(struct kmem_cache *s,
 		stat(s, ALLOC_FASTPATH);
 	}
 
-	if (unlikely(gfpflags & __GFP_ZERO) && object)
-		memset(object, 0, s->object_size);
+	if (unlikely(want_init_memory(gfpflags)) && object)
+		s->poison_fn(s, object);
 
 	slab_post_alloc_hook(s, gfpflags, 1, &object);
 
@@ -3172,11 +3172,11 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
 	local_irq_enable();
 
 	/* Clear memory outside IRQ disabled fastpath loop */
-	if (unlikely(flags & __GFP_ZERO)) {
+	if (unlikely(want_init_memory(flags))) {
 		int j;
 
 		for (j = 0; j < i; j++)
-			memset(p[j], 0, s->object_size);
+			s->poison_fn(s, p[j]);
 	}
 
 	/* memcg and kmem_cache debug support */
diff --git a/net/core/sock.c b/net/core/sock.c
index 782343bb925b..99b288a19b39 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1601,7 +1601,7 @@ static struct sock *sk_prot_alloc(struct proto *prot, gfp_t priority,
 		sk = kmem_cache_alloc(slab, priority & ~__GFP_ZERO);
 		if (!sk)
 			return sk;
-		if (priority & __GFP_ZERO)
+		if (want_init_memory(priority))
 			sk_prot_clear_nulls(sk, prot->obj_size);
 	} else
 		sk = kmalloc(prot->obj_size, priority);
-- 
2.21.0.392.gf8f6787159e-goog


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 2/3] gfp: mm: introduce __GFP_NOINIT
  2019-04-18 15:42 ` Alexander Potapenko
@ 2019-04-18 15:42   ` Alexander Potapenko
  -1 siblings, 0 replies; 41+ messages in thread
From: Alexander Potapenko @ 2019-04-18 15:42 UTC (permalink / raw)
  To: akpm, cl, dvyukov, keescook, labbott
  Cc: linux-mm, linux-security-module, kernel-hardening

When passed to an allocator (either the page allocator or SL[AOU]B),
__GFP_NOINIT tells it not to initialize the requested memory even when
the init_allocations boot option is enabled. This is useful when the
newly allocated memory is going to be fully initialized by the caller
right away.
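
As an illustration, a hypothetical caller that fully overwrites the
allocation right away could opt out of the automatic initialization like
this (buf, src and len are made up for the example and are not part of
this patch):

	/* The buffer is fully overwritten below, so skipping the
	 * automatic zero-initialization is safe here. */
	buf = kmalloc(len, GFP_KERNEL | __GFP_NOINIT);
	if (!buf)
		return -ENOMEM;
	memcpy(buf, src, len);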

__GFP_NOINIT defeats the hardening against information leaks provided by
the init_allocations feature for the marked allocations, so it should be
used with caution.

This patch also adds __GFP_NOINIT to alloc_pages() calls in SL[AOU]B.

Signed-off-by: Alexander Potapenko <glider@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: James Morris <jmorris@namei.org>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Sandeep Patil <sspatil@android.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Jann Horn <jannh@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: linux-mm@kvack.org
Cc: linux-security-module@vger.kernel.org
Cc: kernel-hardening@lists.openwall.com
---
 include/linux/gfp.h | 6 +++++-
 include/linux/mm.h  | 2 +-
 kernel/kexec_core.c | 2 +-
 mm/slab.c           | 2 +-
 mm/slob.c           | 1 +
 mm/slub.c           | 1 +
 6 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index fdab7de7490d..66d7f5604fe2 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -44,6 +44,7 @@ struct vm_area_struct;
 #else
 #define ___GFP_NOLOCKDEP	0
 #endif
+#define ___GFP_NOINIT		0x1000000u
 /* If the above are modified, __GFP_BITS_SHIFT may need updating */
 
 /*
@@ -208,16 +209,19 @@ struct vm_area_struct;
  * %__GFP_COMP address compound page metadata.
  *
  * %__GFP_ZERO returns a zeroed page on success.
+ *
+ * %__GFP_NOINIT requests non-initialized memory from the underlying allocator.
  */
 #define __GFP_NOWARN	((__force gfp_t)___GFP_NOWARN)
 #define __GFP_COMP	((__force gfp_t)___GFP_COMP)
 #define __GFP_ZERO	((__force gfp_t)___GFP_ZERO)
+#define __GFP_NOINIT	((__force gfp_t)___GFP_NOINIT)
 
 /* Disable lockdep for GFP context tracking */
 #define __GFP_NOLOCKDEP ((__force gfp_t)___GFP_NOLOCKDEP)
 
 /* Room for N __GFP_FOO bits */
-#define __GFP_BITS_SHIFT (23 + IS_ENABLED(CONFIG_LOCKDEP))
+#define __GFP_BITS_SHIFT (25)
 #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
 
 /**
diff --git a/include/linux/mm.h b/include/linux/mm.h
index b38b71a5efaa..8f03334a9033 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2601,7 +2601,7 @@ DECLARE_STATIC_KEY_FALSE(init_allocations);
 static inline bool want_init_memory(gfp_t flags)
 {
 	if (static_branch_unlikely(&init_allocations))
-		return true;
+		return !(flags & __GFP_NOINIT);
 	return flags & __GFP_ZERO;
 }
 
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index be84f5f95c97..f9d1f1236cd0 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -302,7 +302,7 @@ static struct page *kimage_alloc_pages(gfp_t gfp_mask, unsigned int order)
 {
 	struct page *pages;
 
-	pages = alloc_pages(gfp_mask & ~__GFP_ZERO, order);
+	pages = alloc_pages((gfp_mask & ~__GFP_ZERO) | __GFP_NOINIT, order);
 	if (pages) {
 		unsigned int count, i;
 
diff --git a/mm/slab.c b/mm/slab.c
index dcc5b73cf767..762cb0e7bcc1 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1393,7 +1393,7 @@ static struct page *kmem_getpages(struct kmem_cache *cachep, gfp_t flags,
 	struct page *page;
 	int nr_pages;
 
-	flags |= cachep->allocflags;
+	flags |= (cachep->allocflags | __GFP_NOINIT);
 
 	page = __alloc_pages_node(nodeid, flags, cachep->gfporder);
 	if (!page) {
diff --git a/mm/slob.c b/mm/slob.c
index 18981a71e962..867d2d68a693 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -192,6 +192,7 @@ static void *slob_new_pages(gfp_t gfp, int order, int node)
 {
 	void *page;
 
+	gfp |= __GFP_NOINIT;
 #ifdef CONFIG_NUMA
 	if (node != NUMA_NO_NODE)
 		page = __alloc_pages_node(node, gfp, order);
diff --git a/mm/slub.c b/mm/slub.c
index e4efb6575510..a79b4cb768a2 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1493,6 +1493,7 @@ static inline struct page *alloc_slab_page(struct kmem_cache *s,
 	struct page *page;
 	unsigned int order = oo_order(oo);
 
+	flags |= __GFP_NOINIT;
 	if (node == NUMA_NO_NODE)
 		page = alloc_pages(flags, order);
 	else
-- 
2.21.0.392.gf8f6787159e-goog


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 3/3] RFC: net: apply __GFP_NOINIT to AF_UNIX sk_buff allocations
  2019-04-18 15:42 ` Alexander Potapenko
@ 2019-04-18 15:42   ` Alexander Potapenko
  -1 siblings, 0 replies; 41+ messages in thread
From: Alexander Potapenko @ 2019-04-18 15:42 UTC (permalink / raw)
  To: akpm, cl, dvyukov, keescook, labbott
  Cc: linux-mm, linux-security-module, kernel-hardening

Add sock_alloc_send_pskb_noinit(), which is similar to
sock_alloc_send_pskb() but allocates the skb with __GFP_NOINIT.
This reduces the hackbench slowdown from 9% to 0.1%.

Signed-off-by: Alexander Potapenko <glider@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: James Morris <jmorris@namei.org>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Sandeep Patil <sspatil@android.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Jann Horn <jannh@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Eric Dumazet <edumazet@google.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: linux-mm@kvack.org
Cc: linux-security-module@vger.kernel.org
Cc: kernel-hardening@lists.openwall.com

---
 include/net/sock.h |  5 +++++
 net/core/sock.c    | 29 +++++++++++++++++++++++++----
 net/unix/af_unix.c | 13 +++++++------
 3 files changed, 37 insertions(+), 10 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 8de5ee258b93..37fcdda23884 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1612,6 +1612,11 @@ struct sk_buff *sock_alloc_send_skb(struct sock *sk, unsigned long size,
 struct sk_buff *sock_alloc_send_pskb(struct sock *sk, unsigned long header_len,
 				     unsigned long data_len, int noblock,
 				     int *errcode, int max_page_order);
+struct sk_buff *sock_alloc_send_pskb_noinit(struct sock *sk,
+					    unsigned long header_len,
+					    unsigned long data_len,
+					    int noblock, int *errcode,
+					    int max_page_order);
 void *sock_kmalloc(struct sock *sk, int size, gfp_t priority);
 void sock_kfree_s(struct sock *sk, void *mem, int size);
 void sock_kzfree_s(struct sock *sk, void *mem, int size);
diff --git a/net/core/sock.c b/net/core/sock.c
index 99b288a19b39..0a2af1e1fa1c 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2187,9 +2187,11 @@ static long sock_wait_for_wmem(struct sock *sk, long timeo)
  *	Generic send/receive buffer handlers
  */
 
-struct sk_buff *sock_alloc_send_pskb(struct sock *sk, unsigned long header_len,
-				     unsigned long data_len, int noblock,
-				     int *errcode, int max_page_order)
+struct sk_buff *sock_alloc_send_pskb_internal(struct sock *sk,
+					      unsigned long header_len,
+					      unsigned long data_len,
+					      int noblock, int *errcode,
+					      int max_page_order, gfp_t gfp)
 {
 	struct sk_buff *skb;
 	long timeo;
@@ -2218,7 +2220,7 @@ struct sk_buff *sock_alloc_send_pskb(struct sock *sk, unsigned long header_len,
 		timeo = sock_wait_for_wmem(sk, timeo);
 	}
 	skb = alloc_skb_with_frags(header_len, data_len, max_page_order,
-				   errcode, sk->sk_allocation);
+				   errcode, sk->sk_allocation | gfp);
 	if (skb)
 		skb_set_owner_w(skb, sk);
 	return skb;
@@ -2229,8 +2231,27 @@ struct sk_buff *sock_alloc_send_pskb(struct sock *sk, unsigned long header_len,
 	*errcode = err;
 	return NULL;
 }
+
+struct sk_buff *sock_alloc_send_pskb(struct sock *sk, unsigned long header_len,
+				     unsigned long data_len, int noblock,
+				     int *errcode, int max_page_order)
+{
+	return sock_alloc_send_pskb_internal(sk, header_len, data_len,
+		noblock, errcode, max_page_order, /*gfp*/0);
+}
 EXPORT_SYMBOL(sock_alloc_send_pskb);
 
+struct sk_buff *sock_alloc_send_pskb_noinit(struct sock *sk,
+					    unsigned long header_len,
+					    unsigned long data_len,
+					    int noblock, int *errcode,
+					    int max_page_order)
+{
+	return sock_alloc_send_pskb_internal(sk, header_len, data_len,
+		noblock, errcode, max_page_order, /*gfp*/__GFP_NOINIT);
+}
+EXPORT_SYMBOL(sock_alloc_send_pskb_noinit);
+
 struct sk_buff *sock_alloc_send_skb(struct sock *sk, unsigned long size,
 				    int noblock, int *errcode)
 {
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index ddb838a1b74c..9a45824c3c48 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1627,9 +1627,9 @@ static int unix_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
 		BUILD_BUG_ON(SKB_MAX_ALLOC < PAGE_SIZE);
 	}
 
-	skb = sock_alloc_send_pskb(sk, len - data_len, data_len,
-				   msg->msg_flags & MSG_DONTWAIT, &err,
-				   PAGE_ALLOC_COSTLY_ORDER);
+	skb = sock_alloc_send_pskb_noinit(sk, len - data_len, data_len,
+					  msg->msg_flags & MSG_DONTWAIT, &err,
+					  PAGE_ALLOC_COSTLY_ORDER);
 	if (skb == NULL)
 		goto out;
 
@@ -1824,9 +1824,10 @@ static int unix_stream_sendmsg(struct socket *sock, struct msghdr *msg,
 
 		data_len = min_t(size_t, size, PAGE_ALIGN(data_len));
 
-		skb = sock_alloc_send_pskb(sk, size - data_len, data_len,
-					   msg->msg_flags & MSG_DONTWAIT, &err,
-					   get_order(UNIX_SKB_FRAGS_SZ));
+		skb = sock_alloc_send_pskb_noinit(sk, size - data_len, data_len,
+						  msg->msg_flags & MSG_DONTWAIT,
+						  &err,
+						  get_order(UNIX_SKB_FRAGS_SZ));
 		if (!skb)
 			goto out_err;
 
-- 
2.21.0.392.gf8f6787159e-goog


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH 0/3] RFC: add init_allocations=1 boot option
  2019-04-18 15:42 ` Alexander Potapenko
@ 2019-04-18 15:44   ` Alexander Potapenko
  -1 siblings, 0 replies; 41+ messages in thread
From: Alexander Potapenko @ 2019-04-18 15:44 UTC (permalink / raw)
  To: Andrew Morton, Christoph Lameter, Dmitriy Vyukov, Kees Cook,
	Laura Abbott
  Cc: Linux Memory Management List, linux-security-module, Kernel Hardening

On Thu, Apr 18, 2019 at 5:42 PM Alexander Potapenko <glider@google.com> wrote:
>
> Following the recent discussions here's another take at initializing
> pages and heap objects with zeroes. This is needed to prevent possible
> information leaks and make the control-flow bugs that depend on
> uninitialized values more deterministic.
>
> The patchset introduces a new boot option, init_allocations, which
> makes page allocator and SL[AOU]B initialize newly allocated memory.
> init_allocations=0 doesn't (hopefully) add any overhead to the
> allocation fast path (no noticeable slowdown on hackbench).
>
> With only the first of the proposed patches the slowdown numbers are:
>  - 1.1% (stdev 0.2%) sys time slowdown building Linux kernel
>  - 3.1% (stdev 0.3%) sys time slowdown on af_inet_loopback benchmark
>  - 9.4% (stdev 0.5%) sys time slowdown on hackbench
>
> The second patch introduces a GFP flag that allows disabling
> initialization for certain allocations. The third patch is an example of
> applying it to af_unix.c, which helps hackbench greatly.
>
> Slowdown numbers for the whole patchset are:
>  - 1.8% (stdev 0.8%) on kernel build
>  - 6.5% (stdev 0.2%) on af_inet_loopback
>  - 0.12% (stdev 0.6%) on hackbench
>
>
> Alexander Potapenko (3):
>   mm: security: introduce the init_allocations=1 boot option
>   gfp: mm: introduce __GFP_NOINIT
>   net: apply __GFP_NOINIT to AF_UNIX sk_buff allocations
Oops, I was hoping git send-email would pull all the Cc: tags from the
patches and actually use them.
>  drivers/infiniband/core/uverbs_ioctl.c |  2 +-
>  include/linux/gfp.h                    |  6 ++++-
>  include/linux/mm.h                     |  8 +++++++
>  include/linux/slab_def.h               |  1 +
>  include/linux/slub_def.h               |  1 +
>  include/net/sock.h                     |  5 +++++
>  kernel/kexec_core.c                    |  4 ++--
>  mm/dmapool.c                           |  2 +-
>  mm/page_alloc.c                        | 18 ++++++++++++++-
>  mm/slab.c                              | 14 ++++++------
>  mm/slab.h                              |  1 +
>  mm/slab_common.c                       | 15 +++++++++++++
>  mm/slob.c                              |  3 ++-
>  mm/slub.c                              |  9 ++++----
>  net/core/sock.c                        | 31 +++++++++++++++++++++-----
>  net/unix/af_unix.c                     | 13 ++++++-----
>  16 files changed, 104 insertions(+), 29 deletions(-)
>
> --
> 2.21.0.392.gf8f6787159e-goog
>


-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/3] mm: security: introduce the init_allocations=1 boot option
  2019-04-18 15:42   ` Alexander Potapenko
  (?)
@ 2019-04-18 16:35   ` Dave Hansen
  2019-04-18 16:43       ` Alexander Potapenko
  2019-04-23  8:31     ` Michal Hocko
  -1 siblings, 2 replies; 41+ messages in thread
From: Dave Hansen @ 2019-04-18 16:35 UTC (permalink / raw)
  To: Alexander Potapenko, akpm, cl, dvyukov, keescook, labbott
  Cc: linux-mm, linux-security-module, kernel-hardening

On 4/18/19 8:42 AM, Alexander Potapenko wrote:
> This option adds the possibility to initialize newly allocated pages and
> heap objects with zeroes. This is needed to prevent possible information
> leaks and make the control-flow bugs that depend on uninitialized values
> more deterministic.

Isn't it better to do this at free time rather than allocation time?  If
doing it at free, you can't even have information leaks for pages that
are in the allocator.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/3] mm: security: introduce the init_allocations=1 boot option
  2019-04-18 16:35   ` Dave Hansen
@ 2019-04-18 16:43       ` Alexander Potapenko
  2019-04-23  8:31     ` Michal Hocko
  1 sibling, 0 replies; 41+ messages in thread
From: Alexander Potapenko @ 2019-04-18 16:43 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Andrew Morton, Christoph Lameter, Dmitriy Vyukov, Kees Cook,
	Laura Abbott, Linux Memory Management List,
	linux-security-module, Kernel Hardening

On Thu, Apr 18, 2019 at 6:35 PM Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 4/18/19 8:42 AM, Alexander Potapenko wrote:
> > This option adds the possibility to initialize newly allocated pages and
> > heap objects with zeroes. This is needed to prevent possible information
> > leaks and make the control-flow bugs that depend on uninitialized values
> > more deterministic.
>
> Isn't it better to do this at free time rather than allocation time?  If
> doing it at free, you can't even have information leaks for pages that
> are in the allocator.
I should have mentioned this in the patch description, as this
question gets asked every time I send a patch :)
If we want to avoid double initialization and take advantage of
__GFP_NOINIT (see the second and third patches in the series), we need
to initialize the memory at allocation time, because free() and
free_pages() don't accept GFP flags.
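
To illustrate (rough sketch of the idea, not the exact code in the series):

/*
 * Sketch only: the allocation path sees the caller's gfp flags, so it can
 * honor an opt-out such as __GFP_NOINIT.  kfree()/free_pages() take no
 * gfp argument, so a free-time policy could not be bypassed per call site.
 */
static inline bool want_init_memory(gfp_t flags)
{
        if (static_branch_unlikely(&init_allocations))
                return !(flags & __GFP_NOINIT); /* opt-out only works here */
        return flags & __GFP_ZERO;
}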



-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/3] mm: security: introduce the init_allocations=1 boot option
@ 2019-04-18 16:43       ` Alexander Potapenko
  0 siblings, 0 replies; 41+ messages in thread
From: Alexander Potapenko @ 2019-04-18 16:43 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Andrew Morton, Christoph Lameter, Dmitriy Vyukov, Kees Cook,
	Laura Abbott, Linux Memory Management List,
	linux-security-module, Kernel Hardening

On Thu, Apr 18, 2019 at 6:35 PM Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 4/18/19 8:42 AM, Alexander Potapenko wrote:
> > This option adds the possibility to initialize newly allocated pages and
> > heap objects with zeroes. This is needed to prevent possible information
> > leaks and make the control-flow bugs that depend on uninitialized values
> > more deterministic.
>
> Isn't it better to do this at free time rather than allocation time?  If
> doing it at free, you can't even have information leaks for pages that
> are in the allocator.
I should have mentioned this in the patch description, as this
question gets asked every time I send a patch :)
If we want to avoid double initialization and take advantage of
__GFP_NOINIT (see the second and third patches in the series), we need
to initialize the memory at allocation time, because free() and
free_pages() don't accept GFP flags.



-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/3] mm: security: introduce the init_allocations=1 boot option
  2019-04-18 16:43       ` Alexander Potapenko
@ 2019-04-18 16:50         ` Alexander Potapenko
  -1 siblings, 0 replies; 41+ messages in thread
From: Alexander Potapenko @ 2019-04-18 16:50 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Andrew Morton, Christoph Lameter, Dmitriy Vyukov, Kees Cook,
	Laura Abbott, Linux Memory Management List,
	linux-security-module, Kernel Hardening

On Thu, Apr 18, 2019 at 6:43 PM Alexander Potapenko <glider@google.com> wrote:
>
> On Thu, Apr 18, 2019 at 6:35 PM Dave Hansen <dave.hansen@intel.com> wrote:
> >
> > On 4/18/19 8:42 AM, Alexander Potapenko wrote:
> > > This option adds the possibility to initialize newly allocated pages and
> > > heap objects with zeroes. This is needed to prevent possible information
> > > leaks and make the control-flow bugs that depend on uninitialized values
> > > more deterministic.
> >
> > Isn't it better to do this at free time rather than allocation time?  If
> > doing it at free, you can't even have information leaks for pages that
> > are in the allocator.
> I should have mentioned this in the patch description, as this
> question is being asked every time I send a patch :)
> If we want to avoid double initialization and take advantage of
> __GFP_NOINIT (see the second and third patches in the series) we need
> to do initialize the memory at allocation time, because free() and
> free_pages() don't accept GFP flags.

On second thought, double zeroing on memory reclaim should be quite rare.
Most of the speedup we gain with __GFP_NOINIT comes from assuming
it's safe to not initialize memory that'll be overwritten anyway.
I'll need to check how e.g. hackbench behaves if we choose to zero
memory on free() (my guess is it'll be slower than with the
__GFP_NOINIT hack, albeit a little safer).
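
A free-time variant would look roughly like this (hypothetical sketch;
the hook point and the function name are made up for illustration):

/*
 * Hypothetical sketch of the zero-on-free alternative discussed above.
 * Something like this would have to be called from the page free path.
 */
static void zero_pages_on_free(struct page *page, unsigned int order)
{
        int i;

        if (!static_branch_unlikely(&init_allocations))
                return;
        for (i = 0; i < (1 << order); i++)
                clear_highpage(page + i);
}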
>
>
> --
> Alexander Potapenko
> Software Engineer
>
> Google Germany GmbH
> Erika-Mann-Straße, 33
> 80636 München
>
> Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
> Registergericht und -nummer: Hamburg, HRB 86891
> Sitz der Gesellschaft: Hamburg



-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/3] mm: security: introduce the init_allocations=1 boot option
@ 2019-04-18 16:50         ` Alexander Potapenko
  0 siblings, 0 replies; 41+ messages in thread
From: Alexander Potapenko @ 2019-04-18 16:50 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Andrew Morton, Christoph Lameter, Dmitriy Vyukov, Kees Cook,
	Laura Abbott, Linux Memory Management List,
	linux-security-module, Kernel Hardening

On Thu, Apr 18, 2019 at 6:43 PM Alexander Potapenko <glider@google.com> wrote:
>
> On Thu, Apr 18, 2019 at 6:35 PM Dave Hansen <dave.hansen@intel.com> wrote:
> >
> > On 4/18/19 8:42 AM, Alexander Potapenko wrote:
> > > This option adds the possibility to initialize newly allocated pages and
> > > heap objects with zeroes. This is needed to prevent possible information
> > > leaks and make the control-flow bugs that depend on uninitialized values
> > > more deterministic.
> >
> > Isn't it better to do this at free time rather than allocation time?  If
> > doing it at free, you can't even have information leaks for pages that
> > are in the allocator.
> I should have mentioned this in the patch description, as this
> question is being asked every time I send a patch :)
> If we want to avoid double initialization and take advantage of
> __GFP_NOINIT (see the second and third patches in the series) we need
> to do initialize the memory at allocation time, because free() and
> free_pages() don't accept GFP flags.

On second thought, double zeroing on memory reclaim should be quite rare.
Most of the speedup we gain with __GFP_NOINIT comes from assuming
it's safe to not initialize memory that'll be overwritten anyway.
I'll need to check how e.g. hackbench behaves if we choose to zero
memory on free() (my guess is it'll be slower than with the
__GFP_NOINIT hack, albeit a little safer).
>
>
> --
> Alexander Potapenko
> Software Engineer
>
> Google Germany GmbH
> Erika-Mann-Straße, 33
> 80636 München
>
> Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
> Registergericht und -nummer: Hamburg, HRB 86891
> Sitz der Gesellschaft: Hamburg



-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/3] gfp: mm: introduce __GFP_NOINIT
  2019-04-18 15:42   ` Alexander Potapenko
  (?)
@ 2019-04-18 16:52   ` Dave Hansen
  2019-04-23 19:14       ` Kees Cook
  -1 siblings, 1 reply; 41+ messages in thread
From: Dave Hansen @ 2019-04-18 16:52 UTC (permalink / raw)
  To: Alexander Potapenko, akpm, cl, dvyukov, keescook, labbott
  Cc: linux-mm, linux-security-module, kernel-hardening

On 4/18/19 8:42 AM, Alexander Potapenko wrote:
> __GFP_NOINIT basically defeats the hardening against information leaks
> provided by the init_allocations feature, so one should use it with
> caution.

Even more than that, shouldn't we try to use it only in places where
there is a demonstrated benefit, like performance data?

> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index be84f5f95c97..f9d1f1236cd0 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -302,7 +302,7 @@ static struct page *kimage_alloc_pages(gfp_t gfp_mask, unsigned int order)
>  {
>  	struct page *pages;
>  
> -	pages = alloc_pages(gfp_mask & ~__GFP_ZERO, order);
> +	pages = alloc_pages((gfp_mask & ~__GFP_ZERO) | __GFP_NOINIT, order);
>  	if (pages) {
>  		unsigned int count, i;

While this is probably not super security-sensitive, it's also not
performance sensitive.

> diff --git a/mm/slab.c b/mm/slab.c
> index dcc5b73cf767..762cb0e7bcc1 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -1393,7 +1393,7 @@ static struct page *kmem_getpages(struct kmem_cache *cachep, gfp_t flags,
>  	struct page *page;
>  	int nr_pages;
>  
> -	flags |= cachep->allocflags;
> +	flags |= (cachep->allocflags | __GFP_NOINIT);
>  
>  	page = __alloc_pages_node(nodeid, flags, cachep->gfporder);
>  	if (!page) {
> diff --git a/mm/slob.c b/mm/slob.c
> index 18981a71e962..867d2d68a693 100644
> --- a/mm/slob.c
> +++ b/mm/slob.c
> @@ -192,6 +192,7 @@ static void *slob_new_pages(gfp_t gfp, int order, int node)
>  {
>  	void *page;
>  
> +	gfp |= __GFP_NOINIT;
>  #ifdef CONFIG_NUMA
>  	if (node != NUMA_NO_NODE)
>  		page = __alloc_pages_node(node, gfp, order
> diff --git a/mm/slub.c b/mm/slub.c
> index e4efb6575510..a79b4cb768a2 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1493,6 +1493,7 @@ static inline struct page *alloc_slab_page(struct kmem_cache *s,
>  	struct page *page;
>  	unsigned int order = oo_order(oo);
>  
> +	flags |= __GFP_NOINIT;
>  	if (node == NUMA_NO_NODE)
>  		page = alloc_pages(flags, order);
>  	else
> 

These sl*b ones seem like a bad idea.  We already have rules that sl*b
allocations must be initialized by callers, and we have reasonably
frequent bugs where the rules are broken.

Setting __GFP_NOINIT might be reasonable to do, though, for slabs that
have a constructor.  We have much higher confidence that *those* are
going to get initialized properly.
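
Roughly (sketch only; the helper name is made up):

/* Sketch: only let ctor-backed caches skip page-allocator initialization. */
static inline gfp_t slab_backing_gfp(struct kmem_cache *s, gfp_t flags)
{
        if (s->ctor)
                flags |= __GFP_NOINIT;  /* objects get set up by the ctor */
        return flags;
}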

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 0/3] RFC: add init_allocations=1 boot option
  2019-04-18 15:42 ` Alexander Potapenko
                   ` (4 preceding siblings ...)
  (?)
@ 2019-04-18 22:07 ` Randy Dunlap
  -1 siblings, 0 replies; 41+ messages in thread
From: Randy Dunlap @ 2019-04-18 22:07 UTC (permalink / raw)
  To: Alexander Potapenko, akpm, cl, dvyukov, keescook, labbott
  Cc: linux-mm, linux-security-module, kernel-hardening

On 4/18/19 8:42 AM, Alexander Potapenko wrote:
> Following the recent discussions here's another take at initializing
> pages and heap objects with zeroes. This is needed to prevent possible
> information leaks and make the control-flow bugs that depend on
> uninitialized values more deterministic.
> 
> The patchset introduces a new boot option, init_allocations, which
> makes page allocator and SL[AOU]B initialize newly allocated memory.
> init_allocations=0 doesn't (hopefully) add any overhead to the
> allocation fast path (no noticeable slowdown on hackbench).
> 
> With only the the first of the proposed patches the slowdown numbers are:
>  - 1.1% (stdev 0.2%) sys time slowdown building Linux kernel
>  - 3.1% (stdev 0.3%) sys time slowdown on af_inet_loopback benchmark
>  - 9.4% (stdev 0.5%) sys time slowdown on hackbench
> 
> The second patch introduces a GFP flag that allows to disable
> initialization for certain allocations. The third page is an example of

                                              third patch

> applying it to af_unix.c, which helps hackbench greatly.
> 
> Slowdown numbers for the whole patchset are:
>  - 1.8% (stdev 0.8%) on kernel build
>  - 6.5% (stdev 0.2%) on af_inet_loopback
>  - 0.12% (stdev 0.6%) on hackbench
> 
> 
> Alexander Potapenko (3):
>   mm: security: introduce the init_allocations=1 boot option
>   gfp: mm: introduce __GFP_NOINIT
>   net: apply __GFP_NOINIT to AF_UNIX sk_buff allocations
> 
>  drivers/infiniband/core/uverbs_ioctl.c |  2 +-
>  include/linux/gfp.h                    |  6 ++++-
>  include/linux/mm.h                     |  8 +++++++
>  include/linux/slab_def.h               |  1 +
>  include/linux/slub_def.h               |  1 +
>  include/net/sock.h                     |  5 +++++
>  kernel/kexec_core.c                    |  4 ++--
>  mm/dmapool.c                           |  2 +-
>  mm/page_alloc.c                        | 18 ++++++++++++++-
>  mm/slab.c                              | 14 ++++++------
>  mm/slab.h                              |  1 +
>  mm/slab_common.c                       | 15 +++++++++++++
>  mm/slob.c                              |  3 ++-
>  mm/slub.c                              |  9 ++++----
>  net/core/sock.c                        | 31 +++++++++++++++++++++-----
>  net/unix/af_unix.c                     | 13 ++++++-----
>  16 files changed, 104 insertions(+), 29 deletions(-)
> 


-- 
~Randy

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/3] mm: security: introduce the init_allocations=1 boot option
  2019-04-18 15:42   ` Alexander Potapenko
  (?)
  (?)
@ 2019-04-18 22:08   ` Randy Dunlap
  -1 siblings, 0 replies; 41+ messages in thread
From: Randy Dunlap @ 2019-04-18 22:08 UTC (permalink / raw)
  To: Alexander Potapenko, akpm, cl, dvyukov, keescook, labbott
  Cc: linux-mm, linux-security-module, kernel-hardening

On 4/18/19 8:42 AM, Alexander Potapenko wrote:
> This option adds the possibility to initialize newly allocated pages and
> heap objects with zeroes. This is needed to prevent possible information
> leaks and make the control-flow bugs that depend on uninitialized values
> more deterministic.
> 
> Initialization is done at allocation time at the places where checks for
> __GFP_ZERO are performed. We don't initialize slab caches with
> constructors to preserve their semantics. To reduce runtime costs of
> checking cachep->ctor we replace a call to memset with a call to
> cachep->poison_fn, which is only executed if the memory block needs to
> be initialized.
> 
> For kernel testing purposes filling allocations with a nonzero pattern
> would be more suitable, but may require platform-specific code. To have
> a simple baseline we've decided to start with zero-initialization.
> 
> No performance optimizations are done at the moment to reduce double
> initialization of memory regions.
> 
> Signed-off-by: Alexander Potapenko <glider@google.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: James Morris <jmorris@namei.org>
> Cc: "Serge E. Hallyn" <serge@hallyn.com>
> Cc: Nick Desaulniers <ndesaulniers@google.com>
> Cc: Kostya Serebryany <kcc@google.com>
> Cc: Dmitry Vyukov <dvyukov@google.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Sandeep Patil <sspatil@android.com>
> Cc: Laura Abbott <labbott@redhat.com>
> Cc: Randy Dunlap <rdunlap@infradead.org>
> Cc: Jann Horn <jannh@google.com>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Qian Cai <cai@lca.pw>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: linux-mm@kvack.org
> Cc: linux-security-module@vger.kernel.org
> Cc: kernel-hardening@lists.openwall.com
> ---
>  drivers/infiniband/core/uverbs_ioctl.c |  2 +-
>  include/linux/mm.h                     |  8 ++++++++
>  include/linux/slab_def.h               |  1 +
>  include/linux/slub_def.h               |  1 +
>  kernel/kexec_core.c                    |  2 +-
>  mm/dmapool.c                           |  2 +-
>  mm/page_alloc.c                        | 18 +++++++++++++++++-
>  mm/slab.c                              | 12 ++++++------
>  mm/slab.h                              |  1 +
>  mm/slab_common.c                       | 15 +++++++++++++++
>  mm/slob.c                              |  2 +-
>  mm/slub.c                              |  8 ++++----
>  net/core/sock.c                        |  2 +-
>  13 files changed, 58 insertions(+), 16 deletions(-)
> 

Hi,
Please document init_allocations=N in Documentation/admin-guide/kernel-parameters.txt.

thanks.
-- 
~Randy

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/3] mm: security: introduce the init_allocations=1 boot option
  2019-04-18 16:35   ` Dave Hansen
  2019-04-18 16:43       ` Alexander Potapenko
@ 2019-04-23  8:31     ` Michal Hocko
  1 sibling, 0 replies; 41+ messages in thread
From: Michal Hocko @ 2019-04-23  8:31 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Alexander Potapenko, akpm, cl, dvyukov, keescook, labbott,
	linux-mm, linux-security-module, kernel-hardening

On Thu 18-04-19 09:35:32, Dave Hansen wrote:
> On 4/18/19 8:42 AM, Alexander Potapenko wrote:
> > This option adds the possibility to initialize newly allocated pages and
> > heap objects with zeroes. This is needed to prevent possible information
> > leaks and make the control-flow bugs that depend on uninitialized values
> > more deterministic.
> 
> Isn't it better to do this at free time rather than allocation time?  If
> doing it at free, you can't even have information leaks for pages that
> are in the allocator.

I would tend to agree here. The free path is usually less performance
sensitive than allocation. Those really hot paths tend to defer the work.

I am also worried that an opt-out gfp flag would tend to be used
incorrectly, as history has shown for others - e.g. __GFP_TEMPORARY.
So I would rather see this robust without fine tuning, unless there is a
real use case that would suffer from this, and we can think of
background scrubbing or something similar.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 0/3] RFC: add init_allocations=1 boot option
  2019-04-18 15:42 ` Alexander Potapenko
@ 2019-04-23 18:49   ` Kees Cook
  -1 siblings, 0 replies; 41+ messages in thread
From: Kees Cook @ 2019-04-23 18:49 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Andrew Morton, Christoph Lameter, Dmitry Vyukov, Laura Abbott,
	Linux-MM, linux-security-module, Kernel Hardening

On Thu, Apr 18, 2019 at 8:42 AM Alexander Potapenko <glider@google.com> wrote:
>
> Following the recent discussions here's another take at initializing
> pages and heap objects with zeroes. This is needed to prevent possible
> information leaks and make the control-flow bugs that depend on
> uninitialized values more deterministic.
>
> The patchset introduces a new boot option, init_allocations, which
> makes page allocator and SL[AOU]B initialize newly allocated memory.
> init_allocations=0 doesn't (hopefully) add any overhead to the
> allocation fast path (no noticeable slowdown on hackbench).

I continue to prefer having a way to do both at-allocation
initialization _and_ poison-on-free, so let's not redirect this to
doing it only at free time. We're going to need both hooks when doing
Memory Tagging, so let's just get them in place now. The security
benefits of tagging, IMO, easily justify a 1-2% performance hit. And
likely we'll see this improve with new hardware.

> With only the first of the proposed patches the slowdown numbers are:
>  - 1.1% (stdev 0.2%) sys time slowdown building Linux kernel
>  - 3.1% (stdev 0.3%) sys time slowdown on af_inet_loopback benchmark
>  - 9.4% (stdev 0.5%) sys time slowdown on hackbench
>
> The second patch introduces a GFP flag that allows disabling
> initialization for certain allocations. The third patch is an example of
> applying it to af_unix.c, which helps hackbench greatly.
>
> Slowdown numbers for the whole patchset are:
>  - 1.8% (stdev 0.8%) on kernel build
>  - 6.5% (stdev 0.2%) on af_inet_loopback

Any idea why these two went _up_?

>  - 0.12% (stdev 0.6%) on hackbench

Well that's quite an improvement. :)

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 0/3] RFC: add init_allocations=1 boot option
@ 2019-04-23 18:49   ` Kees Cook
  0 siblings, 0 replies; 41+ messages in thread
From: Kees Cook @ 2019-04-23 18:49 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Andrew Morton, Christoph Lameter, Dmitry Vyukov, Laura Abbott,
	Linux-MM, linux-security-module, Kernel Hardening

On Thu, Apr 18, 2019 at 8:42 AM Alexander Potapenko <glider@google.com> wrote:
>
> Following the recent discussions here's another take at initializing
> pages and heap objects with zeroes. This is needed to prevent possible
> information leaks and make the control-flow bugs that depend on
> uninitialized values more deterministic.
>
> The patchset introduces a new boot option, init_allocations, which
> makes page allocator and SL[AOU]B initialize newly allocated memory.
> init_allocations=0 doesn't (hopefully) add any overhead to the
> allocation fast path (no noticeable slowdown on hackbench).

I continue to prefer having a way to do both at-allocation
initialization _and_ poison-on-free, so let's not redirect this to
doing it only at free time. We're going to need both hooks when doing
Memory Tagging, so let's just get them in place now. The security
benefits of tagging, IMO, easily justify a 1-2% performance hit. And
likely we'll see this improve with new hardware.

> With only the first of the proposed patches the slowdown numbers are:
>  - 1.1% (stdev 0.2%) sys time slowdown building Linux kernel
>  - 3.1% (stdev 0.3%) sys time slowdown on af_inet_loopback benchmark
>  - 9.4% (stdev 0.5%) sys time slowdown on hackbench
>
> The second patch introduces a GFP flag that allows disabling
> initialization for certain allocations. The third patch is an example of
> applying it to af_unix.c, which helps hackbench greatly.
>
> Slowdown numbers for the whole patchset are:
>  - 1.8% (stdev 0.8%) on kernel build
>  - 6.5% (stdev 0.2%) on af_inet_loopback

Any idea why these two went _up_?

>  - 0.12% (stdev 0.6%) on hackbench

Well that's quite an improvement. :)

-- 
Kees Cook


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/3] mm: security: introduce the init_allocations=1 boot option
  2019-04-18 15:42   ` Alexander Potapenko
@ 2019-04-23 19:00     ` Kees Cook
  -1 siblings, 0 replies; 41+ messages in thread
From: Kees Cook @ 2019-04-23 19:00 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Andrew Morton, Christoph Lameter, Dmitry Vyukov, Laura Abbott,
	Linux-MM, linux-security-module, Kernel Hardening

On Thu, Apr 18, 2019 at 8:42 AM Alexander Potapenko <glider@google.com> wrote:
> This option adds the possibility to initialize newly allocated pages and
> heap objects with zeroes. This is needed to prevent possible information
> leaks and make the control-flow bugs that depend on uninitialized values
> more deterministic.
>
> Initialization is done at allocation time at the places where checks for
> __GFP_ZERO are performed. We don't initialize slab caches with
> constructors to preserve their semantics. To reduce runtime costs of
> checking cachep->ctor we replace a call to memset with a call to
> cachep->poison_fn, which is only executed if the memory block needs to
> be initialized.
>
> For kernel testing purposes filling allocations with a nonzero pattern
> would be more suitable, but may require platform-specific code. To have
> a simple baseline we've decided to start with zero-initialization.

The memory tagging future may be worth mentioning here too. We'll
always need an allocation-time hook. What we do in that hook (tag,
zero, poison) can be managed from there.

> No performance optimizations are done at the moment to reduce double
> initialization of memory regions.

Isn't this addressed in later patches?

>
> Signed-off-by: Alexander Potapenko <glider@google.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: James Morris <jmorris@namei.org>
> Cc: "Serge E. Hallyn" <serge@hallyn.com>
> Cc: Nick Desaulniers <ndesaulniers@google.com>
> Cc: Kostya Serebryany <kcc@google.com>
> Cc: Dmitry Vyukov <dvyukov@google.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Sandeep Patil <sspatil@android.com>
> Cc: Laura Abbott <labbott@redhat.com>
> Cc: Randy Dunlap <rdunlap@infradead.org>
> Cc: Jann Horn <jannh@google.com>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Qian Cai <cai@lca.pw>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: linux-mm@kvack.org
> Cc: linux-security-module@vger.kernel.org
> Cc: kernel-hardening@lists.openwall.com
> ---
>  drivers/infiniband/core/uverbs_ioctl.c |  2 +-
>  include/linux/mm.h                     |  8 ++++++++
>  include/linux/slab_def.h               |  1 +
>  include/linux/slub_def.h               |  1 +
>  kernel/kexec_core.c                    |  2 +-
>  mm/dmapool.c                           |  2 +-
>  mm/page_alloc.c                        | 18 +++++++++++++++++-
>  mm/slab.c                              | 12 ++++++------
>  mm/slab.h                              |  1 +
>  mm/slab_common.c                       | 15 +++++++++++++++
>  mm/slob.c                              |  2 +-
>  mm/slub.c                              |  8 ++++----
>  net/core/sock.c                        |  2 +-
>  13 files changed, 58 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/infiniband/core/uverbs_ioctl.c b/drivers/infiniband/core/uverbs_ioctl.c
> index e1379949e663..f31234906be2 100644
> --- a/drivers/infiniband/core/uverbs_ioctl.c
> +++ b/drivers/infiniband/core/uverbs_ioctl.c
> @@ -127,7 +127,7 @@ __malloc void *_uverbs_alloc(struct uverbs_attr_bundle *bundle, size_t size,
>         res = (void *)pbundle->internal_buffer + pbundle->internal_used;
>         pbundle->internal_used =
>                 ALIGN(new_used, sizeof(*pbundle->internal_buffer));
> -       if (flags & __GFP_ZERO)
> +       if (want_init_memory(flags))
>                 memset(res, 0, size);
>         return res;
>  }
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 76769749b5a5..b38b71a5efaa 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2597,6 +2597,14 @@ static inline void kernel_poison_pages(struct page *page, int numpages,
>                                         int enable) { }
>  #endif
>
> +DECLARE_STATIC_KEY_FALSE(init_allocations);

I'd like a CONFIG to control this default. We can keep the boot-time
option to change it, but I think a CONFIG is warranted here.
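
Roughly what I have in mind (sketch only; the CONFIG name is a placeholder):

/* Sketch: Kconfig picks the boot default; init_allocations= still overrides. */
#ifdef CONFIG_INIT_ALL_ALLOCATIONS
DEFINE_STATIC_KEY_TRUE(init_allocations);
#else
DEFINE_STATIC_KEY_FALSE(init_allocations);
#endif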

> +static inline bool want_init_memory(gfp_t flags)
> +{
> +       if (static_branch_unlikely(&init_allocations))
> +               return true;
> +       return flags & __GFP_ZERO;
> +}
> +
>  #ifdef CONFIG_DEBUG_PAGEALLOC
>  extern bool _debug_pagealloc_enabled;
>  extern void __kernel_map_pages(struct page *page, int numpages, int enable);
> diff --git a/include/linux/slab_def.h b/include/linux/slab_def.h
> index 9a5eafb7145b..9dfe9eb639d7 100644
> --- a/include/linux/slab_def.h
> +++ b/include/linux/slab_def.h
> @@ -37,6 +37,7 @@ struct kmem_cache {
>
>         /* constructor func */
>         void (*ctor)(void *obj);
> +       void (*poison_fn)(struct kmem_cache *c, void *object);
>
>  /* 4) cache creation/removal */
>         const char *name;
> diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
> index d2153789bd9f..afb928cb7c20 100644
> --- a/include/linux/slub_def.h
> +++ b/include/linux/slub_def.h
> @@ -99,6 +99,7 @@ struct kmem_cache {
>         gfp_t allocflags;       /* gfp flags to use on each alloc */
>         int refcount;           /* Refcount for slab cache destroy */
>         void (*ctor)(void *);
> +       void (*poison_fn)(struct kmem_cache *c, void *object);
>         unsigned int inuse;             /* Offset to metadata */
>         unsigned int align;             /* Alignment */
>         unsigned int red_left_pad;      /* Left redzone padding size */
> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index d7140447be75..be84f5f95c97 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -315,7 +315,7 @@ static struct page *kimage_alloc_pages(gfp_t gfp_mask, unsigned int order)
>                 arch_kexec_post_alloc_pages(page_address(pages), count,
>                                             gfp_mask);
>
> -               if (gfp_mask & __GFP_ZERO)
> +               if (want_init_memory(gfp_mask))
>                         for (i = 0; i < count; i++)
>                                 clear_highpage(pages + i);
>         }
> diff --git a/mm/dmapool.c b/mm/dmapool.c
> index 76a160083506..796e38160d39 100644
> --- a/mm/dmapool.c
> +++ b/mm/dmapool.c
> @@ -381,7 +381,7 @@ void *dma_pool_alloc(struct dma_pool *pool, gfp_t mem_flags,
>  #endif
>         spin_unlock_irqrestore(&pool->lock, flags);
>
> -       if (mem_flags & __GFP_ZERO)
> +       if (want_init_memory(mem_flags))
>                 memset(retval, 0, pool->size);
>
>         return retval;
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index d96ca5bc555b..e2a21d866ac9 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -133,6 +133,22 @@ unsigned long totalcma_pages __read_mostly;
>
>  int percpu_pagelist_fraction;
>  gfp_t gfp_allowed_mask __read_mostly = GFP_BOOT_MASK;
> +bool want_init_allocations __read_mostly;

This can be a stack variable in early_init_allocations() -- it's never
used again.

> +EXPORT_SYMBOL(want_init_allocations);
> +DEFINE_STATIC_KEY_FALSE(init_allocations);
> +
> +static int __init early_init_allocations(char *buf)
> +{
> +       int ret;
> +
> +       if (!buf)
> +               return -EINVAL;
> +       ret = kstrtobool(buf, &want_init_allocations);
> +       if (want_init_allocations)
> +               static_branch_enable(&init_allocations);

With the CONFIG, this should have a _disable on an "else" here...
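
i.e. something like this (sketch, also folding in the stack-variable
comment above; the exact shape is up to you):

static int __init early_init_allocations(char *buf)
{
        bool enable;
        int ret;

        if (!buf)
                return -EINVAL;
        ret = kstrtobool(buf, &enable);
        if (ret)
                return ret;
        if (enable)
                static_branch_enable(&init_allocations);
        else
                static_branch_disable(&init_allocations);
        return 0;
}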

> +       return ret;
> +}
> +early_param("init_allocations", early_init_allocations);

Does early_init_allocations() get called before things like
prep_new_page() are up and running to call want_init_memory()?

>
>  /*
>   * A cached value of the page's pageblock's migratetype, used when the page is
> @@ -2014,7 +2030,7 @@ static void prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags
>
>         post_alloc_hook(page, order, gfp_flags);
>
> -       if (!free_pages_prezeroed() && (gfp_flags & __GFP_ZERO))
> +       if (!free_pages_prezeroed() && want_init_memory(gfp_flags))
>                 for (i = 0; i < (1 << order); i++)
>                         clear_highpage(page + i);
>
> diff --git a/mm/slab.c b/mm/slab.c
> index 47a380a486ee..dcc5b73cf767 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -3331,8 +3331,8 @@ slab_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid,
>         local_irq_restore(save_flags);
>         ptr = cache_alloc_debugcheck_after(cachep, flags, ptr, caller);
>
> -       if (unlikely(flags & __GFP_ZERO) && ptr)
> -               memset(ptr, 0, cachep->object_size);
> +       if (unlikely(want_init_memory(flags)) && ptr)
> +               cachep->poison_fn(cachep, ptr);

So... this _must_ zero when __GFP_ZERO is present, so I'm not sure
"poison_fn" is the right name, and it likely needs to take the "flags"
argument.
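
E.g. for the ctor case, something like (sketch only):

/* Sketch: a ctor-backed cache must still honor an explicit __GFP_ZERO. */
static void poison_dont(struct kmem_cache *c, void *object, gfp_t flags)
{
        if (flags & __GFP_ZERO)
                memset(object, 0, c->object_size);
}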

>
>         slab_post_alloc_hook(cachep, flags, 1, &ptr);
>         return ptr;
> @@ -3388,8 +3388,8 @@ slab_alloc(struct kmem_cache *cachep, gfp_t flags, unsigned long caller)
>         objp = cache_alloc_debugcheck_after(cachep, flags, objp, caller);
>         prefetchw(objp);
>
> -       if (unlikely(flags & __GFP_ZERO) && objp)
> -               memset(objp, 0, cachep->object_size);
> +       if (unlikely(want_init_memory(flags)) && objp)
> +               cachep->poison_fn(cachep, objp);

Same.

>
>         slab_post_alloc_hook(cachep, flags, 1, &objp);
>         return objp;
> @@ -3596,9 +3596,9 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
>         cache_alloc_debugcheck_after_bulk(s, flags, size, p, _RET_IP_);
>
>         /* Clear memory outside IRQ disabled section */
> -       if (unlikely(flags & __GFP_ZERO))
> +       if (unlikely(want_init_memory(flags)))
>                 for (i = 0; i < size; i++)
> -                       memset(p[i], 0, s->object_size);
> +                       s->poison_fn(s, p[i]);

Same.

>
>         slab_post_alloc_hook(s, flags, size, p);
>         /* FIXME: Trace call missing. Christoph would like a bulk variant */
> diff --git a/mm/slab.h b/mm/slab.h
> index 43ac818b8592..3b541e8970ee 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -27,6 +27,7 @@ struct kmem_cache {
>         const char *name;       /* Slab name for sysfs */
>         int refcount;           /* Use counter */
>         void (*ctor)(void *);   /* Called on object slot creation */
> +       void (*poison_fn)(struct kmem_cache *c, void *object);

How about naming it just "initialize"?

>         struct list_head list;  /* List of all slab caches on the system */
>  };
>
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 58251ba63e4a..37810114b2ea 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -360,6 +360,16 @@ struct kmem_cache *find_mergeable(unsigned int size, unsigned int align,
>         return NULL;
>  }
>
> +static void poison_zero(struct kmem_cache *c, void *object)
> +{
> +       memset(object, 0, c->object_size);
> +}
> +
> +static void poison_dont(struct kmem_cache *c, void *object)
> +{
> +       /* Do nothing. Use for caches with constructors. */
> +}
> +
>  static struct kmem_cache *create_cache(const char *name,
>                 unsigned int object_size, unsigned int align,
>                 slab_flags_t flags, unsigned int useroffset,
> @@ -381,6 +391,10 @@ static struct kmem_cache *create_cache(const char *name,
>         s->size = s->object_size = object_size;
>         s->align = align;
>         s->ctor = ctor;
> +       if (ctor)
> +               s->poison_fn = poison_dont;
> +       else
> +               s->poison_fn = poison_zero;

As mentioned, we must still always zero when __GFP_ZERO is present.

>         s->useroffset = useroffset;
>         s->usersize = usersize;
>
> @@ -974,6 +988,7 @@ void __init create_boot_cache(struct kmem_cache *s, const char *name,
>         s->align = calculate_alignment(flags, ARCH_KMALLOC_MINALIGN, size);
>         s->useroffset = useroffset;
>         s->usersize = usersize;
> +       s->poison_fn = poison_zero;
>
>         slab_init_memcg_params(s);
>
> diff --git a/mm/slob.c b/mm/slob.c
> index 307c2c9feb44..18981a71e962 100644
> --- a/mm/slob.c
> +++ b/mm/slob.c
> @@ -330,7 +330,7 @@ static void *slob_alloc(size_t size, gfp_t gfp, int align, int node)
>                 BUG_ON(!b);
>                 spin_unlock_irqrestore(&slob_lock, flags);
>         }
> -       if (unlikely(gfp & __GFP_ZERO))
> +       if (unlikely(want_init_memory(gfp)))
>                 memset(b, 0, size);
>         return b;
>  }
> diff --git a/mm/slub.c b/mm/slub.c
> index d30ede89f4a6..e4efb6575510 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2750,8 +2750,8 @@ static __always_inline void *slab_alloc_node(struct kmem_cache *s,
>                 stat(s, ALLOC_FASTPATH);
>         }
>
> -       if (unlikely(gfpflags & __GFP_ZERO) && object)
> -               memset(object, 0, s->object_size);
> +       if (unlikely(want_init_memory(gfpflags)) && object)
> +               s->poison_fn(s, object);
>
>         slab_post_alloc_hook(s, gfpflags, 1, &object);
>
> @@ -3172,11 +3172,11 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
>         local_irq_enable();
>
>         /* Clear memory outside IRQ disabled fastpath loop */
> -       if (unlikely(flags & __GFP_ZERO)) {
> +       if (unlikely(want_init_memory(flags))) {
>                 int j;
>
>                 for (j = 0; j < i; j++)
> -                       memset(p[j], 0, s->object_size);
> +                       s->poison_fn(s, p[j]);
>         }
>
>         /* memcg and kmem_cache debug support */
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 782343bb925b..99b288a19b39 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -1601,7 +1601,7 @@ static struct sock *sk_prot_alloc(struct proto *prot, gfp_t priority,
>                 sk = kmem_cache_alloc(slab, priority & ~__GFP_ZERO);
>                 if (!sk)
>                         return sk;
> -               if (priority & __GFP_ZERO)
> +               if (want_init_memory(priority))
>                         sk_prot_clear_nulls(sk, prot->obj_size);
>         } else
>                 sk = kmalloc(prot->obj_size, priority);
> --
> 2.21.0.392.gf8f6787159e-goog
>

Looking good. :) Thanks for working on this!

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/3] mm: security: introduce the init_allocations=1 boot option
@ 2019-04-23 19:00     ` Kees Cook
  0 siblings, 0 replies; 41+ messages in thread
From: Kees Cook @ 2019-04-23 19:00 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Andrew Morton, Christoph Lameter, Dmitry Vyukov, Laura Abbott,
	Linux-MM, linux-security-module, Kernel Hardening

On Thu, Apr 18, 2019 at 8:42 AM Alexander Potapenko <glider@google.com> wrote:
> This option adds the possibility to initialize newly allocated pages and
> heap objects with zeroes. This is needed to prevent possible information
> leaks and make the control-flow bugs that depend on uninitialized values
> more deterministic.
>
> Initialization is done at allocation time at the places where checks for
> __GFP_ZERO are performed. We don't initialize slab caches with
> constructors to preserve their semantics. To reduce runtime costs of
> checking cachep->ctor we replace a call to memset with a call to
> cachep->poison_fn, which is only executed if the memory block needs to
> be initialized.
>
> For kernel testing purposes filling allocations with a nonzero pattern
> would be more suitable, but may require platform-specific code. To have
> a simple baseline we've decided to start with zero-initialization.

The memory tagging future may be worth mentioning here too. We'll
always need an allocation-time hook. What we do in that hook (tag,
zero, poison) can be managed from there.

> No performance optimizations are done at the moment to reduce double
> initialization of memory regions.

Isn't this addressed in later patches?

>
> Signed-off-by: Alexander Potapenko <glider@google.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: James Morris <jmorris@namei.org>
> Cc: "Serge E. Hallyn" <serge@hallyn.com>
> Cc: Nick Desaulniers <ndesaulniers@google.com>
> Cc: Kostya Serebryany <kcc@google.com>
> Cc: Dmitry Vyukov <dvyukov@google.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Sandeep Patil <sspatil@android.com>
> Cc: Laura Abbott <labbott@redhat.com>
> Cc: Randy Dunlap <rdunlap@infradead.org>
> Cc: Jann Horn <jannh@google.com>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Qian Cai <cai@lca.pw>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: linux-mm@kvack.org
> Cc: linux-security-module@vger.kernel.org
> Cc: kernel-hardening@lists.openwall.com
> ---
>  drivers/infiniband/core/uverbs_ioctl.c |  2 +-
>  include/linux/mm.h                     |  8 ++++++++
>  include/linux/slab_def.h               |  1 +
>  include/linux/slub_def.h               |  1 +
>  kernel/kexec_core.c                    |  2 +-
>  mm/dmapool.c                           |  2 +-
>  mm/page_alloc.c                        | 18 +++++++++++++++++-
>  mm/slab.c                              | 12 ++++++------
>  mm/slab.h                              |  1 +
>  mm/slab_common.c                       | 15 +++++++++++++++
>  mm/slob.c                              |  2 +-
>  mm/slub.c                              |  8 ++++----
>  net/core/sock.c                        |  2 +-
>  13 files changed, 58 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/infiniband/core/uverbs_ioctl.c b/drivers/infiniband/core/uverbs_ioctl.c
> index e1379949e663..f31234906be2 100644
> --- a/drivers/infiniband/core/uverbs_ioctl.c
> +++ b/drivers/infiniband/core/uverbs_ioctl.c
> @@ -127,7 +127,7 @@ __malloc void *_uverbs_alloc(struct uverbs_attr_bundle *bundle, size_t size,
>         res = (void *)pbundle->internal_buffer + pbundle->internal_used;
>         pbundle->internal_used =
>                 ALIGN(new_used, sizeof(*pbundle->internal_buffer));
> -       if (flags & __GFP_ZERO)
> +       if (want_init_memory(flags))
>                 memset(res, 0, size);
>         return res;
>  }
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 76769749b5a5..b38b71a5efaa 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2597,6 +2597,14 @@ static inline void kernel_poison_pages(struct page *page, int numpages,
>                                         int enable) { }
>  #endif
>
> +DECLARE_STATIC_KEY_FALSE(init_allocations);

I'd like a CONFIG to control this default. We can keep the boot-time
option to change it, but I think a CONFIG is warranted here.

> +static inline bool want_init_memory(gfp_t flags)
> +{
> +       if (static_branch_unlikely(&init_allocations))
> +               return true;
> +       return flags & __GFP_ZERO;
> +}
> +
>  #ifdef CONFIG_DEBUG_PAGEALLOC
>  extern bool _debug_pagealloc_enabled;
>  extern void __kernel_map_pages(struct page *page, int numpages, int enable);
> diff --git a/include/linux/slab_def.h b/include/linux/slab_def.h
> index 9a5eafb7145b..9dfe9eb639d7 100644
> --- a/include/linux/slab_def.h
> +++ b/include/linux/slab_def.h
> @@ -37,6 +37,7 @@ struct kmem_cache {
>
>         /* constructor func */
>         void (*ctor)(void *obj);
> +       void (*poison_fn)(struct kmem_cache *c, void *object);
>
>  /* 4) cache creation/removal */
>         const char *name;
> diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
> index d2153789bd9f..afb928cb7c20 100644
> --- a/include/linux/slub_def.h
> +++ b/include/linux/slub_def.h
> @@ -99,6 +99,7 @@ struct kmem_cache {
>         gfp_t allocflags;       /* gfp flags to use on each alloc */
>         int refcount;           /* Refcount for slab cache destroy */
>         void (*ctor)(void *);
> +       void (*poison_fn)(struct kmem_cache *c, void *object);
>         unsigned int inuse;             /* Offset to metadata */
>         unsigned int align;             /* Alignment */
>         unsigned int red_left_pad;      /* Left redzone padding size */
> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index d7140447be75..be84f5f95c97 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -315,7 +315,7 @@ static struct page *kimage_alloc_pages(gfp_t gfp_mask, unsigned int order)
>                 arch_kexec_post_alloc_pages(page_address(pages), count,
>                                             gfp_mask);
>
> -               if (gfp_mask & __GFP_ZERO)
> +               if (want_init_memory(gfp_mask))
>                         for (i = 0; i < count; i++)
>                                 clear_highpage(pages + i);
>         }
> diff --git a/mm/dmapool.c b/mm/dmapool.c
> index 76a160083506..796e38160d39 100644
> --- a/mm/dmapool.c
> +++ b/mm/dmapool.c
> @@ -381,7 +381,7 @@ void *dma_pool_alloc(struct dma_pool *pool, gfp_t mem_flags,
>  #endif
>         spin_unlock_irqrestore(&pool->lock, flags);
>
> -       if (mem_flags & __GFP_ZERO)
> +       if (want_init_memory(mem_flags))
>                 memset(retval, 0, pool->size);
>
>         return retval;
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index d96ca5bc555b..e2a21d866ac9 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -133,6 +133,22 @@ unsigned long totalcma_pages __read_mostly;
>
>  int percpu_pagelist_fraction;
>  gfp_t gfp_allowed_mask __read_mostly = GFP_BOOT_MASK;
> +bool want_init_allocations __read_mostly;

This can be a stack variable in early_init_allocations() -- it's never
used again.

> +EXPORT_SYMBOL(want_init_allocations);
> +DEFINE_STATIC_KEY_FALSE(init_allocations);
> +
> +static int __init early_init_allocations(char *buf)
> +{
> +       int ret;
> +
> +       if (!buf)
> +               return -EINVAL;
> +       ret = kstrtobool(buf, &want_init_allocations);
> +       if (want_init_allocations)
> +               static_branch_enable(&init_allocations);

With the CONFIG, this should have a _disable on an "else" here...

> +       return ret;
> +}
> +early_param("init_allocations", early_init_allocations);

Does early_init_allocations() get called before things like
prep_new_page() are up and running to call want_init_memory()?

>
>  /*
>   * A cached value of the page's pageblock's migratetype, used when the page is
> @@ -2014,7 +2030,7 @@ static void prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags
>
>         post_alloc_hook(page, order, gfp_flags);
>
> -       if (!free_pages_prezeroed() && (gfp_flags & __GFP_ZERO))
> +       if (!free_pages_prezeroed() && want_init_memory(gfp_flags))
>                 for (i = 0; i < (1 << order); i++)
>                         clear_highpage(page + i);
>
> diff --git a/mm/slab.c b/mm/slab.c
> index 47a380a486ee..dcc5b73cf767 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -3331,8 +3331,8 @@ slab_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid,
>         local_irq_restore(save_flags);
>         ptr = cache_alloc_debugcheck_after(cachep, flags, ptr, caller);
>
> -       if (unlikely(flags & __GFP_ZERO) && ptr)
> -               memset(ptr, 0, cachep->object_size);
> +       if (unlikely(want_init_memory(flags)) && ptr)
> +               cachep->poison_fn(cachep, ptr);

So... this _must_ zero when __GFP_ZERO is present, so I'm not sure
"poison_fn" is the right name, and it likely needs to take the "flags"
argument.

>
>         slab_post_alloc_hook(cachep, flags, 1, &ptr);
>         return ptr;
> @@ -3388,8 +3388,8 @@ slab_alloc(struct kmem_cache *cachep, gfp_t flags, unsigned long caller)
>         objp = cache_alloc_debugcheck_after(cachep, flags, objp, caller);
>         prefetchw(objp);
>
> -       if (unlikely(flags & __GFP_ZERO) && objp)
> -               memset(objp, 0, cachep->object_size);
> +       if (unlikely(want_init_memory(flags)) && objp)
> +               cachep->poison_fn(cachep, objp);

Same.

>
>         slab_post_alloc_hook(cachep, flags, 1, &objp);
>         return objp;
> @@ -3596,9 +3596,9 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
>         cache_alloc_debugcheck_after_bulk(s, flags, size, p, _RET_IP_);
>
>         /* Clear memory outside IRQ disabled section */
> -       if (unlikely(flags & __GFP_ZERO))
> +       if (unlikely(want_init_memory(flags)))
>                 for (i = 0; i < size; i++)
> -                       memset(p[i], 0, s->object_size);
> +                       s->poison_fn(s, p[i]);

Same.

>
>         slab_post_alloc_hook(s, flags, size, p);
>         /* FIXME: Trace call missing. Christoph would like a bulk variant */
> diff --git a/mm/slab.h b/mm/slab.h
> index 43ac818b8592..3b541e8970ee 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -27,6 +27,7 @@ struct kmem_cache {
>         const char *name;       /* Slab name for sysfs */
>         int refcount;           /* Use counter */
>         void (*ctor)(void *);   /* Called on object slot creation */
> +       void (*poison_fn)(struct kmem_cache *c, void *object);

How about naming it just "initialize"?

>         struct list_head list;  /* List of all slab caches on the system */
>  };
>
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 58251ba63e4a..37810114b2ea 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -360,6 +360,16 @@ struct kmem_cache *find_mergeable(unsigned int size, unsigned int align,
>         return NULL;
>  }
>
> +static void poison_zero(struct kmem_cache *c, void *object)
> +{
> +       memset(object, 0, c->object_size);
> +}
> +
> +static void poison_dont(struct kmem_cache *c, void *object)
> +{
> +       /* Do nothing. Use for caches with constructors. */
> +}
> +
>  static struct kmem_cache *create_cache(const char *name,
>                 unsigned int object_size, unsigned int align,
>                 slab_flags_t flags, unsigned int useroffset,
> @@ -381,6 +391,10 @@ static struct kmem_cache *create_cache(const char *name,
>         s->size = s->object_size = object_size;
>         s->align = align;
>         s->ctor = ctor;
> +       if (ctor)
> +               s->poison_fn = poison_dont;
> +       else
> +               s->poison_fn = poison_zero;

As mentioned, we must still always zero when __GFP_ZERO is present.

>         s->useroffset = useroffset;
>         s->usersize = usersize;
>
> @@ -974,6 +988,7 @@ void __init create_boot_cache(struct kmem_cache *s, const char *name,
>         s->align = calculate_alignment(flags, ARCH_KMALLOC_MINALIGN, size);
>         s->useroffset = useroffset;
>         s->usersize = usersize;
> +       s->poison_fn = poison_zero;
>
>         slab_init_memcg_params(s);
>
> diff --git a/mm/slob.c b/mm/slob.c
> index 307c2c9feb44..18981a71e962 100644
> --- a/mm/slob.c
> +++ b/mm/slob.c
> @@ -330,7 +330,7 @@ static void *slob_alloc(size_t size, gfp_t gfp, int align, int node)
>                 BUG_ON(!b);
>                 spin_unlock_irqrestore(&slob_lock, flags);
>         }
> -       if (unlikely(gfp & __GFP_ZERO))
> +       if (unlikely(want_init_memory(gfp)))
>                 memset(b, 0, size);
>         return b;
>  }
> diff --git a/mm/slub.c b/mm/slub.c
> index d30ede89f4a6..e4efb6575510 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2750,8 +2750,8 @@ static __always_inline void *slab_alloc_node(struct kmem_cache *s,
>                 stat(s, ALLOC_FASTPATH);
>         }
>
> -       if (unlikely(gfpflags & __GFP_ZERO) && object)
> -               memset(object, 0, s->object_size);
> +       if (unlikely(want_init_memory(gfpflags)) && object)
> +               s->poison_fn(s, object);
>
>         slab_post_alloc_hook(s, gfpflags, 1, &object);
>
> @@ -3172,11 +3172,11 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
>         local_irq_enable();
>
>         /* Clear memory outside IRQ disabled fastpath loop */
> -       if (unlikely(flags & __GFP_ZERO)) {
> +       if (unlikely(want_init_memory(flags))) {
>                 int j;
>
>                 for (j = 0; j < i; j++)
> -                       memset(p[j], 0, s->object_size);
> +                       s->poison_fn(s, p[j]);
>         }
>
>         /* memcg and kmem_cache debug support */
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 782343bb925b..99b288a19b39 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -1601,7 +1601,7 @@ static struct sock *sk_prot_alloc(struct proto *prot, gfp_t priority,
>                 sk = kmem_cache_alloc(slab, priority & ~__GFP_ZERO);
>                 if (!sk)
>                         return sk;
> -               if (priority & __GFP_ZERO)
> +               if (want_init_memory(priority))
>                         sk_prot_clear_nulls(sk, prot->obj_size);
>         } else
>                 sk = kmalloc(prot->obj_size, priority);
> --
> 2.21.0.392.gf8f6787159e-goog
>

Looking good. :) Thanks for working on this!

-- 
Kees Cook


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/3] gfp: mm: introduce __GFP_NOINIT
  2019-04-18 15:42   ` Alexander Potapenko
@ 2019-04-23 19:11     ` Kees Cook
  -1 siblings, 0 replies; 41+ messages in thread
From: Kees Cook @ 2019-04-23 19:11 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Andrew Morton, Christoph Lameter, Dmitry Vyukov, Laura Abbott,
	Linux-MM, linux-security-module, Kernel Hardening

On Thu, Apr 18, 2019 at 8:42 AM Alexander Potapenko <glider@google.com> wrote:
>
> When passed to an allocator (either pagealloc or SL[AOU]B), __GFP_NOINIT
> tells it to not initialize the requested memory if the init_allocations
> boot option is enabled. This can be useful in cases where the newly
> allocated memory is going to be initialized by the caller right away.

Maybe add "... as seen when the slab allocator needs to allocate new
pages from the page allocator." just to help clarify it here (instead
of only at the end of the commit log, where you mention it offhand).

>
> __GFP_NOINIT basically defeats the hardening against information leaks
> provided by the init_allocations feature, so one should use it with
> caution.
>
> This patch also adds __GFP_NOINIT to alloc_pages() calls in SL[AOU]B.
>
> Signed-off-by: Alexander Potapenko <glider@google.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
> Cc: James Morris <jmorris@namei.org>
> Cc: "Serge E. Hallyn" <serge@hallyn.com>
> Cc: Nick Desaulniers <ndesaulniers@google.com>
> Cc: Kostya Serebryany <kcc@google.com>
> Cc: Dmitry Vyukov <dvyukov@google.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Sandeep Patil <sspatil@android.com>
> Cc: Laura Abbott <labbott@redhat.com>
> Cc: Randy Dunlap <rdunlap@infradead.org>
> Cc: Jann Horn <jannh@google.com>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Qian Cai <cai@lca.pw>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: linux-mm@kvack.org
> Cc: linux-security-module@vger.kernel.org
> Cc: kernel-hardening@lists.openwall.com
> ---
>  include/linux/gfp.h | 6 +++++-
>  include/linux/mm.h  | 2 +-
>  kernel/kexec_core.c | 2 +-
>  mm/slab.c           | 2 +-
>  mm/slob.c           | 1 +
>  mm/slub.c           | 1 +
>  6 files changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index fdab7de7490d..66d7f5604fe2 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -44,6 +44,7 @@ struct vm_area_struct;
>  #else
>  #define ___GFP_NOLOCKDEP       0
>  #endif
> +#define ___GFP_NOINIT          0x1000000u
>  /* If the above are modified, __GFP_BITS_SHIFT may need updating */

I think you want to add NOINIT below GFP_ACCOUNT, update NOLOCKDEP and
then adjust __GFP_BITS_SHIFT differently, as noted below.

>
>  /*
> @@ -208,16 +209,19 @@ struct vm_area_struct;
>   * %__GFP_COMP address compound page metadata.
>   *
>   * %__GFP_ZERO returns a zeroed page on success.
> + *
> + * %__GFP_NOINIT requests non-initialized memory from the underlying allocator.
>   */
>  #define __GFP_NOWARN   ((__force gfp_t)___GFP_NOWARN)
>  #define __GFP_COMP     ((__force gfp_t)___GFP_COMP)
>  #define __GFP_ZERO     ((__force gfp_t)___GFP_ZERO)
> +#define __GFP_NOINIT   ((__force gfp_t)___GFP_NOINIT)
>
>  /* Disable lockdep for GFP context tracking */
>  #define __GFP_NOLOCKDEP ((__force gfp_t)___GFP_NOLOCKDEP)
>
>  /* Room for N __GFP_FOO bits */
> -#define __GFP_BITS_SHIFT (23 + IS_ENABLED(CONFIG_LOCKDEP))
> +#define __GFP_BITS_SHIFT (25)

This should just be 24 + ...   with the bit field added above NOLOCKDEP.
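
i.e. roughly (sketch; the exact bit values depend on the tree this lands on):

#define ___GFP_NOINIT		0x800000u
#ifdef CONFIG_LOCKDEP
#define ___GFP_NOLOCKDEP	0x1000000u
#else
#define ___GFP_NOLOCKDEP	0
#endif
/* Room for N __GFP_FOO bits */
#define __GFP_BITS_SHIFT	(24 + IS_ENABLED(CONFIG_LOCKDEP))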

>  #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
>
>  /**
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index b38b71a5efaa..8f03334a9033 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2601,7 +2601,7 @@ DECLARE_STATIC_KEY_FALSE(init_allocations);
>  static inline bool want_init_memory(gfp_t flags)
>  {
>         if (static_branch_unlikely(&init_allocations))
> -               return true;
> +               return !(flags & __GFP_NOINIT);
>         return flags & __GFP_ZERO;
>  }

You need to test for GFP_ZERO here too: return ((flags & __GFP_NOINIT
| __GFP_ZERO) == 0)
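
Or, spelled out as a sketch (assuming an explicit __GFP_ZERO should still
win over the opt-out):

static inline bool want_init_memory(gfp_t flags)
{
        if (flags & __GFP_ZERO)
                return true;    /* an explicit request always wins */
        if (static_branch_unlikely(&init_allocations))
                return !(flags & __GFP_NOINIT);
        return false;
}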

Also, I wonder, for the sake of readability, if this should be named
__GFP_NO_AUTOINIT ?

I'd also like to see each use of __GFP_NOINIT include a comment above
its use where the logic is explained for _why_ it's safe (or
reasonable) to use __GFP_NOINIT in each place.

>
> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index be84f5f95c97..f9d1f1236cd0 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -302,7 +302,7 @@ static struct page *kimage_alloc_pages(gfp_t gfp_mask, unsigned int order)
>  {
>         struct page *pages;
>
> -       pages = alloc_pages(gfp_mask & ~__GFP_ZERO, order);
> +       pages = alloc_pages((gfp_mask & ~__GFP_ZERO) | __GFP_NOINIT, order);
>         if (pages) {
>                 unsigned int count, i;
>
> diff --git a/mm/slab.c b/mm/slab.c
> index dcc5b73cf767..762cb0e7bcc1 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -1393,7 +1393,7 @@ static struct page *kmem_getpages(struct kmem_cache *cachep, gfp_t flags,
>         struct page *page;
>         int nr_pages;
>
> -       flags |= cachep->allocflags;
> +       flags |= (cachep->allocflags | __GFP_NOINIT);
>
>         page = __alloc_pages_node(nodeid, flags, cachep->gfporder);
>         if (!page) {
> diff --git a/mm/slob.c b/mm/slob.c
> index 18981a71e962..867d2d68a693 100644
> --- a/mm/slob.c
> +++ b/mm/slob.c
> @@ -192,6 +192,7 @@ static void *slob_new_pages(gfp_t gfp, int order, int node)
>  {
>         void *page;
>
> +       gfp |= __GFP_NOINIT;
>  #ifdef CONFIG_NUMA
>         if (node != NUMA_NO_NODE)
>                 page = __alloc_pages_node(node, gfp, order);
> diff --git a/mm/slub.c b/mm/slub.c
> index e4efb6575510..a79b4cb768a2 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1493,6 +1493,7 @@ static inline struct page *alloc_slab_page(struct kmem_cache *s,

What about kmalloc_large_node()?

>         struct page *page;
>         unsigned int order = oo_order(oo);
>
> +       flags |= __GFP_NOINIT;
>         if (node == NUMA_NO_NODE)
>                 page = alloc_pages(flags, order);
>         else

And just so I make sure I'm understanding this correctly: __GFP_NOINIT
is passed to the page allocator because we know each allocation from
the slab will get initialized at "sub allocation" time.

Looks good!

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/3] gfp: mm: introduce __GFP_NOINIT
  2019-04-18 16:52   ` Dave Hansen
@ 2019-04-23 19:14       ` Kees Cook
  0 siblings, 0 replies; 41+ messages in thread
From: Kees Cook @ 2019-04-23 19:14 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Alexander Potapenko, Andrew Morton, Christoph Lameter,
	Dmitry Vyukov, Laura Abbott, Linux-MM, linux-security-module,
	Kernel Hardening

On Thu, Apr 18, 2019 at 9:52 AM Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 4/18/19 8:42 AM, Alexander Potapenko wrote:
> > diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> > index be84f5f95c97..f9d1f1236cd0 100644
> > --- a/kernel/kexec_core.c
> > +++ b/kernel/kexec_core.c
> > @@ -302,7 +302,7 @@ static struct page *kimage_alloc_pages(gfp_t gfp_mask, unsigned int order)
> >  {
> >       struct page *pages;
> >
> > -     pages = alloc_pages(gfp_mask & ~__GFP_ZERO, order);
> > +     pages = alloc_pages((gfp_mask & ~__GFP_ZERO) | __GFP_NOINIT, order);
> >       if (pages) {
> >               unsigned int count, i;
>
> While this is probably not super security-sensitive, it's also not
> performance sensitive.

It is, however, a pretty clear case of "and then we immediately zero it".

> These sl*b ones seem like a bad idea.  We already have rules that sl*b
> allocations must be initialized by callers, and we have reasonably
> frequent bugs where the rules are broken.

Hm? No, this is saying that the page allocator can skip the auto-init
because the slab internals will be doing it later.

> Setting __GFP_NOINIT might be reasonable to do, though, for slabs that
> have a constructor.  We have much higher confidence that *those* are
> going to get initialized properly.

That's already handled in patch 1.

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 3/3] RFC: net: apply __GFP_NOINIT to AF_UNIX sk_buff allocations
  2019-04-18 15:42   ` Alexander Potapenko
@ 2019-04-23 19:17     ` Kees Cook
  -1 siblings, 0 replies; 41+ messages in thread
From: Kees Cook @ 2019-04-23 19:17 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Andrew Morton, Christoph Lameter, Dmitry Vyukov, Laura Abbott,
	Linux-MM, linux-security-module, Kernel Hardening

On Thu, Apr 18, 2019 at 8:42 AM Alexander Potapenko <glider@google.com> wrote:
>
> Add sock_alloc_send_pskb_noinit(), which is similar to
> sock_alloc_send_pskb(), but allocates with __GFP_NOINIT.
> This helps reduce the slowdown on hackbench from 9% to 0.1%.

I would include a detailed justification about why this is safe to do.
I imagine (but haven't looked) that the skb is immediately written to
after allocation, so this is basically avoiding a "double init". Is
that correct?
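
(For context, I assume the af_unix.c call sites just switch over to the
new helper, something roughly like the following; I'm guessing at the
exact arguments here:

	skb = sock_alloc_send_pskb_noinit(sk, size - data_len, data_len,
					  msg->msg_flags & MSG_DONTWAIT,
					  &err, get_order(UNIX_SKB_FRAGS_SIZE));

If so, the commit log should spell out why every byte of that skb gets
written before it can be read back.)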

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/3] mm: security: introduce the init_allocations=1 boot option
  2019-04-18 15:42   ` Alexander Potapenko
                     ` (3 preceding siblings ...)
  (?)
@ 2019-04-23 20:36   ` Dave Hansen
  -1 siblings, 0 replies; 41+ messages in thread
From: Dave Hansen @ 2019-04-23 20:36 UTC (permalink / raw)
  To: Alexander Potapenko, akpm, cl, dvyukov, keescook, labbott
  Cc: linux-mm, linux-security-module, kernel-hardening

On 4/18/19 8:42 AM, Alexander Potapenko wrote:
> +static void poison_dont(struct kmem_cache *c, void *object)
> +{
> +	/* Do nothing. Use for caches with constructors. */
> +}
> +
>  static struct kmem_cache *create_cache(const char *name,
>  		unsigned int object_size, unsigned int align,
>  		slab_flags_t flags, unsigned int useroffset,
> @@ -381,6 +391,10 @@ static struct kmem_cache *create_cache(const char *name,
>  	s->size = s->object_size = object_size;
>  	s->align = align;
>  	s->ctor = ctor;
> +	if (ctor)
> +		s->poison_fn = poison_dont;
> +	else
> +		s->poison_fn = poison_zero;
>  	s->useroffset = useroffset;
>  	s->usersize = usersize;
>  
> @@ -974,6 +988,7 @@ void __init create_boot_cache(struct kmem_cache *s, const char *name,
>  	s->align = calculate_alignment(flags, ARCH_KMALLOC_MINALIGN, size);
>  	s->useroffset = useroffset;
>  	s->usersize = usersize;
> +	s->poison_fn = poison_zero;

An empty indirect call is probably a pretty bad idea on systems with
retpoline.  Isn't this just a bool anyway for either calling poison_dont
or poison_zero?  Can it call anything else?
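
I.e., couldn't it just be something like this (a sketch; the field name
is made up):

	/* in struct kmem_cache, instead of the poison_fn pointer: */
	bool zero_on_alloc;	/* false for caches with a constructor */

	/* and at the allocation sites: */
	if (unlikely(want_init_memory(flags)) && ptr && cachep->zero_on_alloc)
		memset(ptr, 0, cachep->object_size);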

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/3] gfp: mm: introduce __GFP_NOINIT
  2019-04-23 19:14       ` Kees Cook
  (?)
@ 2019-04-23 20:40       ` Dave Hansen
  -1 siblings, 0 replies; 41+ messages in thread
From: Dave Hansen @ 2019-04-23 20:40 UTC (permalink / raw)
  To: Kees Cook
  Cc: Alexander Potapenko, Andrew Morton, Christoph Lameter,
	Dmitry Vyukov, Laura Abbott, Linux-MM, linux-security-module,
	Kernel Hardening

On 4/23/19 12:14 PM, Kees Cook wrote:
>> These sl*b ones seem like a bad idea.  We already have rules that sl*b
>> allocations must be initialized by callers, and we have reasonably
>> frequent bugs where the rules are broken.
> 
> Hm? No, this is saying that the page allocator can skip the auto-init
> because the slab internals will be doing it later.

Ahhh, got it.  I missed all the fun in patch 1.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/3] mm: security: introduce the init_allocations=1 boot option
  2019-04-23 19:00     ` Kees Cook
@ 2019-04-26 12:12       ` Alexander Potapenko
  -1 siblings, 0 replies; 41+ messages in thread
From: Alexander Potapenko @ 2019-04-26 12:12 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andrew Morton, Christoph Lameter, Dmitry Vyukov, Laura Abbott,
	Linux-MM, linux-security-module, Kernel Hardening

On Tue, Apr 23, 2019 at 9:00 PM Kees Cook <keescook@chromium.org> wrote:
>
> On Thu, Apr 18, 2019 at 8:42 AM Alexander Potapenko <glider@google.com> wrote:
> > This option adds the possibility to initialize newly allocated pages and
> > heap objects with zeroes. This is needed to prevent possible information
> > leaks and make the control-flow bugs that depend on uninitialized values
> > more deterministic.
> >
> > Initialization is done at allocation time at the places where checks for
> > __GFP_ZERO are performed. We don't initialize slab caches with
> > constructors to preserve their semantics. To reduce runtime costs of
> > checking cachep->ctor we replace a call to memset with a call to
> > cachep->poison_fn, which is only executed if the memory block needs to
> > be initialized.
> >
> > For kernel testing purposes filling allocations with a nonzero pattern
> > would be more suitable, but may require platform-specific code. To have
> > a simple baseline we've decided to start with zero-initialization.
>
> The memory tagging future may be worth mentioning here too. We'll
> always need an allocation-time hook. What we do in that hook (tag,
> zero, poison) can be managed from there.
Shall we factor out the allocation hook in this patch? Note that I'll
probably be dropping this poison_fn() stuff, as it should be enough to
just call memset() under a static branch.
> > No performance optimizations are done at the moment to reduce double
> > initialization of memory regions.
>
> Isn't this addressed in later patches?
Agreed.
> >
> > Signed-off-by: Alexander Potapenko <glider@google.com>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: James Morris <jmorris@namei.org>
> > Cc: "Serge E. Hallyn" <serge@hallyn.com>
> > Cc: Nick Desaulniers <ndesaulniers@google.com>
> > Cc: Kostya Serebryany <kcc@google.com>
> > Cc: Dmitry Vyukov <dvyukov@google.com>
> > Cc: Kees Cook <keescook@chromium.org>
> > Cc: Sandeep Patil <sspatil@android.com>
> > Cc: Laura Abbott <labbott@redhat.com>
> > Cc: Randy Dunlap <rdunlap@infradead.org>
> > Cc: Jann Horn <jannh@google.com>
> > Cc: Mark Rutland <mark.rutland@arm.com>
> > Cc: Qian Cai <cai@lca.pw>
> > Cc: Vlastimil Babka <vbabka@suse.cz>
> > Cc: linux-mm@kvack.org
> > Cc: linux-security-module@vger.kernel.org
> > Cc: kernel-hardening@lists.openwall.com
> > ---
> >  drivers/infiniband/core/uverbs_ioctl.c |  2 +-
> >  include/linux/mm.h                     |  8 ++++++++
> >  include/linux/slab_def.h               |  1 +
> >  include/linux/slub_def.h               |  1 +
> >  kernel/kexec_core.c                    |  2 +-
> >  mm/dmapool.c                           |  2 +-
> >  mm/page_alloc.c                        | 18 +++++++++++++++++-
> >  mm/slab.c                              | 12 ++++++------
> >  mm/slab.h                              |  1 +
> >  mm/slab_common.c                       | 15 +++++++++++++++
> >  mm/slob.c                              |  2 +-
> >  mm/slub.c                              |  8 ++++----
> >  net/core/sock.c                        |  2 +-
> >  13 files changed, 58 insertions(+), 16 deletions(-)
> >
> > diff --git a/drivers/infiniband/core/uverbs_ioctl.c b/drivers/infiniband/core/uverbs_ioctl.c
> > index e1379949e663..f31234906be2 100644
> > --- a/drivers/infiniband/core/uverbs_ioctl.c
> > +++ b/drivers/infiniband/core/uverbs_ioctl.c
> > @@ -127,7 +127,7 @@ __malloc void *_uverbs_alloc(struct uverbs_attr_bundle *bundle, size_t size,
> >         res = (void *)pbundle->internal_buffer + pbundle->internal_used;
> >         pbundle->internal_used =
> >                 ALIGN(new_used, sizeof(*pbundle->internal_buffer));
> > -       if (flags & __GFP_ZERO)
> > +       if (want_init_memory(flags))
> >                 memset(res, 0, size);
> >         return res;
> >  }
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 76769749b5a5..b38b71a5efaa 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -2597,6 +2597,14 @@ static inline void kernel_poison_pages(struct page *page, int numpages,
> >                                         int enable) { }
> >  #endif
> >
> > +DECLARE_STATIC_KEY_FALSE(init_allocations);
>
> I'd like a CONFIG to control this default. We can keep the boot-time
> option to change it, but I think a CONFIG is warranted here.
Ok, will do.
> > +static inline bool want_init_memory(gfp_t flags)
> > +{
> > +       if (static_branch_unlikely(&init_allocations))
> > +               return true;
> > +       return flags & __GFP_ZERO;
> > +}
> > +
> >  #ifdef CONFIG_DEBUG_PAGEALLOC
> >  extern bool _debug_pagealloc_enabled;
> >  extern void __kernel_map_pages(struct page *page, int numpages, int enable);
> > diff --git a/include/linux/slab_def.h b/include/linux/slab_def.h
> > index 9a5eafb7145b..9dfe9eb639d7 100644
> > --- a/include/linux/slab_def.h
> > +++ b/include/linux/slab_def.h
> > @@ -37,6 +37,7 @@ struct kmem_cache {
> >
> >         /* constructor func */
> >         void (*ctor)(void *obj);
> > +       void (*poison_fn)(struct kmem_cache *c, void *object);
> >
> >  /* 4) cache creation/removal */
> >         const char *name;
> > diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
> > index d2153789bd9f..afb928cb7c20 100644
> > --- a/include/linux/slub_def.h
> > +++ b/include/linux/slub_def.h
> > @@ -99,6 +99,7 @@ struct kmem_cache {
> >         gfp_t allocflags;       /* gfp flags to use on each alloc */
> >         int refcount;           /* Refcount for slab cache destroy */
> >         void (*ctor)(void *);
> > +       void (*poison_fn)(struct kmem_cache *c, void *object);
> >         unsigned int inuse;             /* Offset to metadata */
> >         unsigned int align;             /* Alignment */
> >         unsigned int red_left_pad;      /* Left redzone padding size */
> > diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> > index d7140447be75..be84f5f95c97 100644
> > --- a/kernel/kexec_core.c
> > +++ b/kernel/kexec_core.c
> > @@ -315,7 +315,7 @@ static struct page *kimage_alloc_pages(gfp_t gfp_mask, unsigned int order)
> >                 arch_kexec_post_alloc_pages(page_address(pages), count,
> >                                             gfp_mask);
> >
> > -               if (gfp_mask & __GFP_ZERO)
> > +               if (want_init_memory(gfp_mask))
> >                         for (i = 0; i < count; i++)
> >                                 clear_highpage(pages + i);
> >         }
> > diff --git a/mm/dmapool.c b/mm/dmapool.c
> > index 76a160083506..796e38160d39 100644
> > --- a/mm/dmapool.c
> > +++ b/mm/dmapool.c
> > @@ -381,7 +381,7 @@ void *dma_pool_alloc(struct dma_pool *pool, gfp_t mem_flags,
> >  #endif
> >         spin_unlock_irqrestore(&pool->lock, flags);
> >
> > -       if (mem_flags & __GFP_ZERO)
> > +       if (want_init_memory(mem_flags))
> >                 memset(retval, 0, pool->size);
> >
> >         return retval;
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index d96ca5bc555b..e2a21d866ac9 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -133,6 +133,22 @@ unsigned long totalcma_pages __read_mostly;
> >
> >  int percpu_pagelist_fraction;
> >  gfp_t gfp_allowed_mask __read_mostly = GFP_BOOT_MASK;
> > +bool want_init_allocations __read_mostly;
>
> This can be a stack variable in early_init_allocations() -- it's never
> used again.
Ack.
> > +EXPORT_SYMBOL(want_init_allocations);
> > +DEFINE_STATIC_KEY_FALSE(init_allocations);
> > +
> > +static int __init early_init_allocations(char *buf)
> > +{
> > +       int ret;
> > +
> > +       if (!buf)
> > +               return -EINVAL;
> > +       ret = kstrtobool(buf, &want_init_allocations);
> > +       if (want_init_allocations)
> > +               static_branch_enable(&init_allocations);
>
> With the CONFIG, this should have a _disable on an "else" here...
Ack.
> > +       return ret;
> > +}
> > +early_param("init_allocations", early_init_allocations);
>
> Does early_init_allocations() get called before things like
> prep_new_page() are up and running to call want_init_memory()?
Yes, IIUC early params are initialized before the memory subsystem is
initialized.
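For v2 I'd expect the parser to end up looking roughly like this
(a sketch; the name of the CONFIG option controlling the default is
still to be decided):

	static int __init early_init_allocations(char *buf)
	{
		bool enable = IS_ENABLED(CONFIG_INIT_ALLOCATIONS_DEFAULT_ON);
		int ret;

		if (!buf)
			return -EINVAL;
		ret = kstrtobool(buf, &enable);
		if (enable)
			static_branch_enable(&init_allocations);
		else
			static_branch_disable(&init_allocations);
		return ret;
	}
	early_param("init_allocations", early_init_allocations);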

> >
> >  /*
> >   * A cached value of the page's pageblock's migratetype, used when the page is
> > @@ -2014,7 +2030,7 @@ static void prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags
> >
> >         post_alloc_hook(page, order, gfp_flags);
> >
> > -       if (!free_pages_prezeroed() && (gfp_flags & __GFP_ZERO))
> > +       if (!free_pages_prezeroed() && want_init_memory(gfp_flags))
> >                 for (i = 0; i < (1 << order); i++)
> >                         clear_highpage(page + i);
> >
> > diff --git a/mm/slab.c b/mm/slab.c
> > index 47a380a486ee..dcc5b73cf767 100644
> > --- a/mm/slab.c
> > +++ b/mm/slab.c
> > @@ -3331,8 +3331,8 @@ slab_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid,
> >         local_irq_restore(save_flags);
> >         ptr = cache_alloc_debugcheck_after(cachep, flags, ptr, caller);
> >
> > -       if (unlikely(flags & __GFP_ZERO) && ptr)
> > -               memset(ptr, 0, cachep->object_size);
> > +       if (unlikely(want_init_memory(flags)) && ptr)
> > +               cachep->poison_fn(cachep, ptr);
>
> So... this _must_ zero when __GFP_ZERO is present, so I'm not sure
> "poison_fn" is the right name, and it likely needs to take the "flags"
> argument.
I'll rework this part, as it has been pointed out that an empty
indirect call still costs something.
We're basically choosing between two options:
 - flags & __GFP_ZERO when initialization is disabled;
 - cachep->ctor when it's enabled.
A static branch should be enough to switch between those without
introducing extra checks or indirections.
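
I.e., at the allocation sites, roughly (just a sketch of the plan):

	if (static_branch_unlikely(&init_allocations)) {
		if (ptr && !cachep->ctor)
			memset(ptr, 0, cachep->object_size);
	} else {
		if (unlikely(flags & __GFP_ZERO) && ptr)
			memset(ptr, 0, cachep->object_size);
	}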
>
> >
> >         slab_post_alloc_hook(cachep, flags, 1, &ptr);
> >         return ptr;
> > @@ -3388,8 +3388,8 @@ slab_alloc(struct kmem_cache *cachep, gfp_t flags, unsigned long caller)
> >         objp = cache_alloc_debugcheck_after(cachep, flags, objp, caller);
> >         prefetchw(objp);
> >
> > -       if (unlikely(flags & __GFP_ZERO) && objp)
> > -               memset(objp, 0, cachep->object_size);
> > +       if (unlikely(want_init_memory(flags)) && objp)
> > +               cachep->poison_fn(cachep, objp);
>
> Same.
>
> >
> >         slab_post_alloc_hook(cachep, flags, 1, &objp);
> >         return objp;
> > @@ -3596,9 +3596,9 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
> >         cache_alloc_debugcheck_after_bulk(s, flags, size, p, _RET_IP_);
> >
> >         /* Clear memory outside IRQ disabled section */
> > -       if (unlikely(flags & __GFP_ZERO))
> > +       if (unlikely(want_init_memory(flags)))
> >                 for (i = 0; i < size; i++)
> > -                       memset(p[i], 0, s->object_size);
> > +                       s->poison_fn(s, p[i]);
>
> Same.
>
> >
> >         slab_post_alloc_hook(s, flags, size, p);
> >         /* FIXME: Trace call missing. Christoph would like a bulk variant */
> > diff --git a/mm/slab.h b/mm/slab.h
> > index 43ac818b8592..3b541e8970ee 100644
> > --- a/mm/slab.h
> > +++ b/mm/slab.h
> > @@ -27,6 +27,7 @@ struct kmem_cache {
> >         const char *name;       /* Slab name for sysfs */
> >         int refcount;           /* Use counter */
> >         void (*ctor)(void *);   /* Called on object slot creation */
> > +       void (*poison_fn)(struct kmem_cache *c, void *object);
>
> How about naming it just "initialize"?
Ditto.
> >         struct list_head list;  /* List of all slab caches on the system */
> >  };
> >
> > diff --git a/mm/slab_common.c b/mm/slab_common.c
> > index 58251ba63e4a..37810114b2ea 100644
> > --- a/mm/slab_common.c
> > +++ b/mm/slab_common.c
> > @@ -360,6 +360,16 @@ struct kmem_cache *find_mergeable(unsigned int size, unsigned int align,
> >         return NULL;
> >  }
> >
> > +static void poison_zero(struct kmem_cache *c, void *object)
> > +{
> > +       memset(object, 0, c->object_size);
> > +}
> > +
> > +static void poison_dont(struct kmem_cache *c, void *object)
> > +{
> > +       /* Do nothing. Use for caches with constructors. */
> > +}
> > +
> >  static struct kmem_cache *create_cache(const char *name,
> >                 unsigned int object_size, unsigned int align,
> >                 slab_flags_t flags, unsigned int useroffset,
> > @@ -381,6 +391,10 @@ static struct kmem_cache *create_cache(const char *name,
> >         s->size = s->object_size = object_size;
> >         s->align = align;
> >         s->ctor = ctor;
> > +       if (ctor)
> > +               s->poison_fn = poison_dont;
> > +       else
> > +               s->poison_fn = poison_zero;
>
> As mentioned, we must still always zero when __GFP_ZERO is present.
This part will be gone, but in general __GFP_ZERO is incompatible with
constructors; see the check in new_slab_objects().
I don't think we need to support erroneous behaviour.

> >         s->useroffset = useroffset;
> >         s->usersize = usersize;
> >
> > @@ -974,6 +988,7 @@ void __init create_boot_cache(struct kmem_cache *s, const char *name,
> >         s->align = calculate_alignment(flags, ARCH_KMALLOC_MINALIGN, size);
> >         s->useroffset = useroffset;
> >         s->usersize = usersize;
> > +       s->poison_fn = poison_zero;
> >
> >         slab_init_memcg_params(s);
> >
> > diff --git a/mm/slob.c b/mm/slob.c
> > index 307c2c9feb44..18981a71e962 100644
> > --- a/mm/slob.c
> > +++ b/mm/slob.c
> > @@ -330,7 +330,7 @@ static void *slob_alloc(size_t size, gfp_t gfp, int align, int node)
> >                 BUG_ON(!b);
> >                 spin_unlock_irqrestore(&slob_lock, flags);
> >         }
> > -       if (unlikely(gfp & __GFP_ZERO))
> > +       if (unlikely(want_init_memory(gfp)))
> >                 memset(b, 0, size);
> >         return b;
> >  }
> > diff --git a/mm/slub.c b/mm/slub.c
> > index d30ede89f4a6..e4efb6575510 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -2750,8 +2750,8 @@ static __always_inline void *slab_alloc_node(struct kmem_cache *s,
> >                 stat(s, ALLOC_FASTPATH);
> >         }
> >
> > -       if (unlikely(gfpflags & __GFP_ZERO) && object)
> > -               memset(object, 0, s->object_size);
> > +       if (unlikely(want_init_memory(gfpflags)) && object)
> > +               s->poison_fn(s, object);
> >
> >         slab_post_alloc_hook(s, gfpflags, 1, &object);
> >
> > @@ -3172,11 +3172,11 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
> >         local_irq_enable();
> >
> >         /* Clear memory outside IRQ disabled fastpath loop */
> > -       if (unlikely(flags & __GFP_ZERO)) {
> > +       if (unlikely(want_init_memory(flags))) {
> >                 int j;
> >
> >                 for (j = 0; j < i; j++)
> > -                       memset(p[j], 0, s->object_size);
> > +                       s->poison_fn(s, p[j]);
> >         }
> >
> >         /* memcg and kmem_cache debug support */
> > diff --git a/net/core/sock.c b/net/core/sock.c
> > index 782343bb925b..99b288a19b39 100644
> > --- a/net/core/sock.c
> > +++ b/net/core/sock.c
> > @@ -1601,7 +1601,7 @@ static struct sock *sk_prot_alloc(struct proto *prot, gfp_t priority,
> >                 sk = kmem_cache_alloc(slab, priority & ~__GFP_ZERO);
> >                 if (!sk)
> >                         return sk;
> > -               if (priority & __GFP_ZERO)
> > +               if (want_init_memory(priority))
> >                         sk_prot_clear_nulls(sk, prot->obj_size);
> >         } else
> >                 sk = kmalloc(prot->obj_size, priority);
> > --
> > 2.21.0.392.gf8f6787159e-goog
> >
>
> Looking good. :) Thanks for working on this!
>
> --
> Kees Cook



-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Managing directors: Paul Manicle, Halimah DeLaine Prado
Registration court and number: Hamburg, HRB 86891
Registered office: Hamburg

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 0/3] RFC: add init_allocations=1 boot option
  2019-04-23 18:49   ` Kees Cook
@ 2019-04-26 12:39     ` Alexander Potapenko
  -1 siblings, 0 replies; 41+ messages in thread
From: Alexander Potapenko @ 2019-04-26 12:39 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andrew Morton, Christoph Lameter, Dmitry Vyukov, Laura Abbott,
	Linux-MM, linux-security-module, Kernel Hardening

On Tue, Apr 23, 2019 at 8:49 PM Kees Cook <keescook@chromium.org> wrote:
>
> On Thu, Apr 18, 2019 at 8:42 AM Alexander Potapenko <glider@google.com> wrote:
> >
> > Following the recent discussions here's another take at initializing
> > pages and heap objects with zeroes. This is needed to prevent possible
> > information leaks and make the control-flow bugs that depend on
> > uninitialized values more deterministic.
> >
> > The patchset introduces a new boot option, init_allocations, which
> > makes page allocator and SL[AOU]B initialize newly allocated memory.
> > init_allocations=0 doesn't (hopefully) add any overhead to the
> > allocation fast path (no noticeable slowdown on hackbench).
>
> I continue to prefer to have a way to both at-allocation
> initialization _and_ poison-on-free, so let's not redirect this to
> doing it only at free time.
There's a problem with poison-on-free.
By default SLUB stores the freelist pointer (not sure if it's the only
piece of data) in the memory chunk itself, so newly allocated memory
is dirty even though it has been zeroed out previously.
We could probably zero out the bits used by the allocator when
allocating the memory chunk, but that sounds hacky (though it does save
us 8 bytes on every allocation).
A cleaner solution would be to unconditionally relocate the free
pointer by short-circuiting
https://elixir.bootlin.com/linux/latest/source/mm/slub.c#L3531
Surprisingly, this doesn't work, because now the sizes of slub caches
are a bit off, and create_unique_id() in slub.c returns clashing sysfs
names.
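
(By "short-circuiting" I mean making the free-pointer relocation in
calculate_sizes() unconditional, roughly:

	/* always keep the freelist pointer outside the object payload */
	s->offset = size;
	size += sizeof(void *);

instead of doing it only for SLAB_TYPESAFE_BY_RCU/SLAB_POISON/ctor
caches. Sketch only.)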

> We're going to need both hooks when doing
> Memory Tagging, so let's just get it in place now. The security
> benefits on tagging, IMO, easily justify a 1-2% performance hit. And
> likely we'll see this improve with new hardware.
>
> > With only the the first of the proposed patches the slowdown numbers are:
> >  - 1.1% (stdev 0.2%) sys time slowdown building Linux kernel
> >  - 3.1% (stdev 0.3%) sys time slowdown on af_inet_loopback benchmark
> >  - 9.4% (stdev 0.5%) sys time slowdown on hackbench
> >
> > The second patch introduces a GFP flag that allows to disable
> > initialization for certain allocations. The third page is an example of
> > applying it to af_unix.c, which helps hackbench greatly.
> >
> > Slowdown numbers for the whole patchset are:
> >  - 1.8% (stdev 0.8%) on kernel build
> >  - 6.5% (stdev 0.2%) on af_inet_loopback
>
> Any idea why these two went _up_?
>
> >  - 0.12% (stdev 0.6%) on hackbench
>
> Well that's quite an improvement. :)
>
> --
> Kees Cook



-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Managing directors: Paul Manicle, Halimah DeLaine Prado
Registration court and number: Hamburg, HRB 86891
Registered office: Hamburg

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/3] mm: security: introduce the init_allocations=1 boot option
  2019-04-18 15:42   ` Alexander Potapenko
@ 2019-04-26 14:14     ` Christopher Lameter
  -1 siblings, 0 replies; 41+ messages in thread
From: Christopher Lameter @ 2019-04-26 14:14 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: akpm, dvyukov, keescook, labbott, linux-mm,
	linux-security-module, kernel-hardening

On Thu, 18 Apr 2019, Alexander Potapenko wrote:

> This option adds the possibility to initialize newly allocated pages and
> heap objects with zeroes. This is needed to prevent possible information
> leaks and make the control-flow bugs that depend on uninitialized values
> more deterministic.
>
> Initialization is done at allocation time at the places where checks for
> __GFP_ZERO are performed. We don't initialize slab caches with
> constructors to preserve their semantics. To reduce runtime costs of
> checking cachep->ctor we replace a call to memset with a call to
> cachep->poison_fn, which is only executed if the memory block needs to
> be initialized.

Just check for a ctor and then zero, or use whatever pattern? Why add a
new function?

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/3] mm: security: introduce the init_allocations=1 boot option
       [not found]   ` <alpine.DEB.2.21.1904260911570.8340@nuc-kabylake>
@ 2019-04-26 15:24       ` Christopher Lameter
  0 siblings, 0 replies; 41+ messages in thread
From: Christopher Lameter @ 2019-04-26 15:24 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: akpm, dvyukov, keescook, labbott, linux-mm,
	linux-security-module, kernel-hardening

Hmmmm.... Maybe it's better to zero on free? That way you don't need to
initialize the allocations. You could even check at allocation time whether
someone mucked with the object. This is a replication of some of the
inherent debugging facilities in the allocator though.
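
Roughly (sketch only, the helper name is invented):

/* at free time: wipe the object so the next user starts from zeroes */
static inline void slab_wipe_on_free(struct kmem_cache *s, void *object)
{
	if (want_init_on_free())		/* boot-option gate */
		memset(object, 0, s->object_size);
}

/* The allocation path can then skip its memset entirely, and a debug
 * mode could verify on allocation that the object is still all zeroes,
 * which catches writes to freed objects almost for free. */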


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/3] mm: security: introduce the init_allocations=1 boot option
  2019-04-26 15:24       ` Christopher Lameter
@ 2019-04-26 15:48         ` Alexander Potapenko
  -1 siblings, 0 replies; 41+ messages in thread
From: Alexander Potapenko @ 2019-04-26 15:48 UTC (permalink / raw)
  To: Christopher Lameter
  Cc: Andrew Morton, Dmitriy Vyukov, Kees Cook, Laura Abbott,
	Linux Memory Management List, linux-security-module,
	Kernel Hardening

On Fri, Apr 26, 2019 at 5:24 PM Christopher Lameter <cl@linux.com> wrote:
>
> Hmmmm.... Maybe it's better to zero on free? That way you don't need to
> initialize the allocations. You could even check at allocation time whether
> someone mucked with the object.
As mentioned somewhere in the neighboring threads, Kees asked for
options to initialize on both allocation and free, because we'll need
this functionality for memory tagging someday.
> This is a replication of some of the
> inherent debugging facilities in the allocator though.
I'm curious: how often do people use SL[AU]B poisoning and redzones these days?
I'd guess those have largely been superseded by KASAN already?
I agree it makes sense to reuse the existing debugging code, but on
the other hand we probably want to keep the initialization fast enough
to be used in production.
>
Re: poison_fn, it wasn't a good idea, so I'll be dropping it.
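
For the next version I'm leaning towards a pair of static keys set from
the boot option, so the disabled case stays close to free. Rough sketch
(helper and key names are not final):

DEFINE_STATIC_KEY_FALSE(init_on_alloc);
DEFINE_STATIC_KEY_FALSE(init_on_free);

static inline bool want_init_on_alloc(gfp_t flags)
{
	if (static_branch_unlikely(&init_on_alloc))
		return !(flags & __GFP_NOINIT);
	return flags & __GFP_ZERO;
}

static inline bool want_init_on_free(void)
{
	return static_branch_unlikely(&init_on_free);
}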

-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2019-04-26 15:48 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-18 15:42 [PATCH 0/3] RFC: add init_allocations=1 boot option Alexander Potapenko
2019-04-18 15:42 ` Alexander Potapenko
2019-04-18 15:42 ` [PATCH 1/3] mm: security: introduce the " Alexander Potapenko
2019-04-18 15:42   ` Alexander Potapenko
2019-04-18 16:35   ` Dave Hansen
2019-04-18 16:43     ` Alexander Potapenko
2019-04-18 16:43       ` Alexander Potapenko
2019-04-18 16:50       ` Alexander Potapenko
2019-04-18 16:50         ` Alexander Potapenko
2019-04-23  8:31     ` Michal Hocko
2019-04-18 22:08   ` Randy Dunlap
2019-04-23 19:00   ` Kees Cook
2019-04-23 19:00     ` Kees Cook
2019-04-26 12:12     ` Alexander Potapenko
2019-04-26 12:12       ` Alexander Potapenko
2019-04-23 20:36   ` Dave Hansen
2019-04-26 14:14   ` Christopher Lameter
2019-04-26 14:14     ` Christopher Lameter
     [not found]   ` <alpine.DEB.2.21.1904260911570.8340@nuc-kabylake>
2019-04-26 15:24     ` Christopher Lameter
2019-04-26 15:24       ` Christopher Lameter
2019-04-26 15:48       ` Alexander Potapenko
2019-04-26 15:48         ` Alexander Potapenko
2019-04-18 15:42 ` [PATCH 2/3] gfp: mm: introduce __GFP_NOINIT Alexander Potapenko
2019-04-18 15:42   ` Alexander Potapenko
2019-04-18 16:52   ` Dave Hansen
2019-04-23 19:14     ` Kees Cook
2019-04-23 19:14       ` Kees Cook
2019-04-23 20:40       ` Dave Hansen
2019-04-23 19:11   ` Kees Cook
2019-04-23 19:11     ` Kees Cook
2019-04-18 15:42 ` [PATCH 3/3] RFC: net: apply __GFP_NOINIT to AF_UNIX sk_buff allocations Alexander Potapenko
2019-04-18 15:42   ` Alexander Potapenko
2019-04-23 19:17   ` Kees Cook
2019-04-23 19:17     ` Kees Cook
2019-04-18 15:44 ` [PATCH 0/3] RFC: add init_allocations=1 boot option Alexander Potapenko
2019-04-18 15:44   ` Alexander Potapenko
2019-04-18 22:07 ` Randy Dunlap
2019-04-23 18:49 ` Kees Cook
2019-04-23 18:49   ` Kees Cook
2019-04-26 12:39   ` Alexander Potapenko
2019-04-26 12:39     ` Alexander Potapenko
