linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 06/38] usercopy: Prepare for usercopy whitelisting
       [not found] <1515636190-24061-1-git-send-email-keescook@chromium.org>
@ 2018-01-11  2:02 ` Kees Cook
  2018-01-11  2:02 ` [PATCH 07/38] usercopy: WARN() on slab cache usercopy region violations Kees Cook
  2018-01-11  2:02 ` [PATCH 09/38] usercopy: Mark kmalloc caches as usercopy caches Kees Cook
  2 siblings, 0 replies; 28+ messages in thread
From: Kees Cook @ 2018-01-11  2:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, David Windsor, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton, linux-mm, linux-xfs, Linus Torvalds,
	Alexander Viro, Andy Lutomirski, Christoph Hellwig,
	Christoph Lameter, David S. Miller, Laura Abbott, Mark Rutland,
	Martin K. Petersen, Paolo Bonzini, Christian Borntraeger,
	Christoffer Dall, Dave Kleikamp, Jan Kara, Luis de Bethencourt,
	Marc Zyngier, Rik van Riel, Matthew Garrett, linux-fsdevel,
	linux-arch, netdev, kernel-hardening

From: David Windsor <dave@nullcore.net>

This patch prepares the slab allocator to handle caches having annotations
(useroffset and usersize) defining usercopy regions.

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on
my understanding of the code. Changes or omissions from the original
code are mine and don't reflect the original grsecurity/PaX code.

Currently, hardened usercopy performs dynamic bounds checking on slab
cache objects. This is good, but still leaves a lot of kernel memory
available to be copied to/from userspace in the face of bugs. To further
restrict what memory is available for copying, this creates a way to
whitelist specific areas of a given slab cache object for copying to/from
userspace, allowing much finer granularity of access control. Slab caches
that are never exposed to userspace can declare no whitelist for their
objects, thereby keeping them unavailable to userspace via dynamic copy
operations. (Note, an implicit form of whitelisting is the use of constant
sizes in usercopy operations and get_user()/put_user(); these bypass
hardened usercopy checks since these sizes cannot change at runtime.)

To support this whitelist annotation, usercopy region offset and size
members are added to struct kmem_cache. The slab allocator receives a
new function, kmem_cache_create_usercopy(), that creates a new cache
with a usercopy region defined, suitable for declaring spans of fields
within the objects that get copied to/from userspace.

In this patch, the default kmem_cache_create() marks the entire allocation
as whitelisted, leaving it semantically unchanged. Once all fine-grained
whitelists have been added (in subsequent patches), this will be changed
to a usersize of 0, making caches created with kmem_cache_create() not
copyable to/from userspace.

After the entire usercopy whitelist series is applied, less than 15%
of the slab cache memory remains exposed to potential usercopy bugs
after a fresh boot:

Total Slab Memory:           48074720
Usercopyable Memory:          6367532  13.2%
         task_struct                    0.2%         4480/1630720
         RAW                            0.3%            300/96000
         RAWv6                          2.1%           1408/64768
         ext4_inode_cache               3.0%       269760/8740224
         dentry                        11.1%       585984/5273856
         mm_struct                     29.1%         54912/188448
         kmalloc-8                    100.0%          24576/24576
         kmalloc-16                   100.0%          28672/28672
         kmalloc-32                   100.0%          81920/81920
         kmalloc-192                  100.0%          96768/96768
         kmalloc-128                  100.0%        143360/143360
         names_cache                  100.0%        163840/163840
         kmalloc-64                   100.0%        167936/167936
         kmalloc-256                  100.0%        339968/339968
         kmalloc-512                  100.0%        350720/350720
         kmalloc-96                   100.0%        455616/455616
         kmalloc-8192                 100.0%        655360/655360
         kmalloc-1024                 100.0%        812032/812032
         kmalloc-4096                 100.0%        819200/819200
         kmalloc-2048                 100.0%      1310720/1310720

After some kernel build workloads, the percentage (mainly driven by
dentry and inode caches expanding) drops under 10%:

Total Slab Memory:           95516184
Usercopyable Memory:          8497452   8.8%
         task_struct                    0.2%         4000/1456000
         RAW                            0.3%            300/96000
         RAWv6                          2.1%           1408/64768
         ext4_inode_cache               3.0%     1217280/39439872
         dentry                        11.1%     1623200/14608800
         mm_struct                     29.1%         73216/251264
         kmalloc-8                    100.0%          24576/24576
         kmalloc-16                   100.0%          28672/28672
         kmalloc-32                   100.0%          94208/94208
         kmalloc-192                  100.0%          96768/96768
         kmalloc-128                  100.0%        143360/143360
         names_cache                  100.0%        163840/163840
         kmalloc-64                   100.0%        245760/245760
         kmalloc-256                  100.0%        339968/339968
         kmalloc-512                  100.0%        350720/350720
         kmalloc-96                   100.0%        563520/563520
         kmalloc-8192                 100.0%        655360/655360
         kmalloc-1024                 100.0%        794624/794624
         kmalloc-4096                 100.0%        819200/819200
         kmalloc-2048                 100.0%      1257472/1257472

Signed-off-by: David Windsor <dave@nullcore.net>
[kees: adjust commit log, split out a few extra kmalloc hunks]
[kees: add field names to function declarations]
[kees: convert BUGs to WARNs and fail closed]
[kees: add attack surface reduction analysis to commit log]
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
Cc: linux-xfs@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Christoph Lameter <cl@linux.com>
---
 include/linux/slab.h     | 27 +++++++++++++++++++++------
 include/linux/slab_def.h |  3 +++
 include/linux/slub_def.h |  3 +++
 mm/slab.c                |  2 +-
 mm/slab.h                |  5 ++++-
 mm/slab_common.c         | 46 ++++++++++++++++++++++++++++++++++++++--------
 mm/slub.c                | 11 +++++++++--
 7 files changed, 79 insertions(+), 18 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 2dbeccdcb76b..8bf14d9762ec 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -135,9 +135,13 @@ struct mem_cgroup;
 void __init kmem_cache_init(void);
 bool slab_is_available(void);
 
-struct kmem_cache *kmem_cache_create(const char *, size_t, size_t,
-			slab_flags_t,
-			void (*)(void *));
+struct kmem_cache *kmem_cache_create(const char *name, size_t size,
+			size_t align, slab_flags_t flags,
+			void (*ctor)(void *));
+struct kmem_cache *kmem_cache_create_usercopy(const char *name,
+			size_t size, size_t align, slab_flags_t flags,
+			size_t useroffset, size_t usersize,
+			void (*ctor)(void *));
 void kmem_cache_destroy(struct kmem_cache *);
 int kmem_cache_shrink(struct kmem_cache *);
 
@@ -153,9 +157,20 @@ void memcg_destroy_kmem_caches(struct mem_cgroup *);
  * f.e. add ____cacheline_aligned_in_smp to the struct declaration
  * then the objects will be properly aligned in SMP configurations.
  */
-#define KMEM_CACHE(__struct, __flags) kmem_cache_create(#__struct,\
-		sizeof(struct __struct), __alignof__(struct __struct),\
-		(__flags), NULL)
+#define KMEM_CACHE(__struct, __flags)					\
+		kmem_cache_create(#__struct, sizeof(struct __struct),	\
+			__alignof__(struct __struct), (__flags), NULL)
+
+/*
+ * To whitelist a single field for copying to/from usercopy, use this
+ * macro instead for KMEM_CACHE() above.
+ */
+#define KMEM_CACHE_USERCOPY(__struct, __flags, __field)			\
+		kmem_cache_create_usercopy(#__struct,			\
+			sizeof(struct __struct),			\
+			__alignof__(struct __struct), (__flags),	\
+			offsetof(struct __struct, __field),		\
+			sizeof_field(struct __struct, __field), NULL)
 
 /*
  * Common kmalloc functions provided by all allocators
diff --git a/include/linux/slab_def.h b/include/linux/slab_def.h
index 072e46e9e1d5..7385547c04b1 100644
--- a/include/linux/slab_def.h
+++ b/include/linux/slab_def.h
@@ -85,6 +85,9 @@ struct kmem_cache {
 	unsigned int *random_seq;
 #endif
 
+	size_t useroffset;		/* Usercopy region offset */
+	size_t usersize;		/* Usercopy region size */
+
 	struct kmem_cache_node *node[MAX_NUMNODES];
 };
 
diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
index 0adae162dc8f..8ad99c47b19c 100644
--- a/include/linux/slub_def.h
+++ b/include/linux/slub_def.h
@@ -135,6 +135,9 @@ struct kmem_cache {
 	struct kasan_cache kasan_info;
 #endif
 
+	size_t useroffset;		/* Usercopy region offset */
+	size_t usersize;		/* Usercopy region size */
+
 	struct kmem_cache_node *node[MAX_NUMNODES];
 };
 
diff --git a/mm/slab.c b/mm/slab.c
index b2beb2cc15e2..47acfe54e1ae 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1281,7 +1281,7 @@ void __init kmem_cache_init(void)
 	create_boot_cache(kmem_cache, "kmem_cache",
 		offsetof(struct kmem_cache, node) +
 				  nr_node_ids * sizeof(struct kmem_cache_node *),
-				  SLAB_HWCACHE_ALIGN);
+				  SLAB_HWCACHE_ALIGN, 0, 0);
 	list_add(&kmem_cache->list, &slab_caches);
 	slab_state = PARTIAL;
 
diff --git a/mm/slab.h b/mm/slab.h
index 7d29e69ac310..1897991df3fa 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -22,6 +22,8 @@ struct kmem_cache {
 	unsigned int size;	/* The aligned/padded/added on size  */
 	unsigned int align;	/* Alignment as calculated */
 	slab_flags_t flags;	/* Active flags on the slab */
+	size_t useroffset;	/* Usercopy region offset */
+	size_t usersize;	/* Usercopy region size */
 	const char *name;	/* Slab name for sysfs */
 	int refcount;		/* Use counter */
 	void (*ctor)(void *);	/* Called on object slot creation */
@@ -97,7 +99,8 @@ int __kmem_cache_create(struct kmem_cache *, slab_flags_t flags);
 extern struct kmem_cache *create_kmalloc_cache(const char *name, size_t size,
 			slab_flags_t flags);
 extern void create_boot_cache(struct kmem_cache *, const char *name,
-			size_t size, slab_flags_t flags);
+			size_t size, slab_flags_t flags, size_t useroffset,
+			size_t usersize);
 
 int slab_unmergeable(struct kmem_cache *s);
 struct kmem_cache *find_mergeable(size_t size, size_t align,
diff --git a/mm/slab_common.c b/mm/slab_common.c
index c8cb36774ba1..fc3e66bdce75 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -281,6 +281,9 @@ int slab_unmergeable(struct kmem_cache *s)
 	if (s->ctor)
 		return 1;
 
+	if (s->usersize)
+		return 1;
+
 	/*
 	 * We may have set a slab to be unmergeable during bootstrap.
 	 */
@@ -366,12 +369,16 @@ unsigned long calculate_alignment(slab_flags_t flags,
 
 static struct kmem_cache *create_cache(const char *name,
 		size_t object_size, size_t size, size_t align,
-		slab_flags_t flags, void (*ctor)(void *),
+		slab_flags_t flags, size_t useroffset,
+		size_t usersize, void (*ctor)(void *),
 		struct mem_cgroup *memcg, struct kmem_cache *root_cache)
 {
 	struct kmem_cache *s;
 	int err;
 
+	if (WARN_ON(useroffset + usersize > object_size))
+		useroffset = usersize = 0;
+
 	err = -ENOMEM;
 	s = kmem_cache_zalloc(kmem_cache, GFP_KERNEL);
 	if (!s)
@@ -382,6 +389,8 @@ static struct kmem_cache *create_cache(const char *name,
 	s->size = size;
 	s->align = align;
 	s->ctor = ctor;
+	s->useroffset = useroffset;
+	s->usersize = usersize;
 
 	err = init_memcg_params(s, memcg, root_cache);
 	if (err)
@@ -406,11 +415,13 @@ static struct kmem_cache *create_cache(const char *name,
 }
 
 /*
- * kmem_cache_create - Create a cache.
+ * kmem_cache_create_usercopy - Create a cache.
  * @name: A string which is used in /proc/slabinfo to identify this cache.
  * @size: The size of objects to be created in this cache.
  * @align: The required alignment for the objects.
  * @flags: SLAB flags
+ * @useroffset: Usercopy region offset
+ * @usersize: Usercopy region size
  * @ctor: A constructor for the objects.
  *
  * Returns a ptr to the cache on success, NULL on failure.
@@ -430,8 +441,9 @@ static struct kmem_cache *create_cache(const char *name,
  * as davem.
  */
 struct kmem_cache *
-kmem_cache_create(const char *name, size_t size, size_t align,
-		  slab_flags_t flags, void (*ctor)(void *))
+kmem_cache_create_usercopy(const char *name, size_t size, size_t align,
+		  slab_flags_t flags, size_t useroffset, size_t usersize,
+		  void (*ctor)(void *))
 {
 	struct kmem_cache *s = NULL;
 	const char *cache_name;
@@ -462,7 +474,13 @@ kmem_cache_create(const char *name, size_t size, size_t align,
 	 */
 	flags &= CACHE_CREATE_MASK;
 
-	s = __kmem_cache_alias(name, size, align, flags, ctor);
+	/* Fail closed on bad usersize of useroffset values. */
+	if (WARN_ON(!usersize && useroffset) ||
+	    WARN_ON(size < usersize || size - usersize < useroffset))
+		usersize = useroffset = 0;
+
+	if (!usersize)
+		s = __kmem_cache_alias(name, size, align, flags, ctor);
 	if (s)
 		goto out_unlock;
 
@@ -474,7 +492,7 @@ kmem_cache_create(const char *name, size_t size, size_t align,
 
 	s = create_cache(cache_name, size, size,
 			 calculate_alignment(flags, align, size),
-			 flags, ctor, NULL, NULL);
+			 flags, useroffset, usersize, ctor, NULL, NULL);
 	if (IS_ERR(s)) {
 		err = PTR_ERR(s);
 		kfree_const(cache_name);
@@ -500,6 +518,15 @@ kmem_cache_create(const char *name, size_t size, size_t align,
 	}
 	return s;
 }
+EXPORT_SYMBOL(kmem_cache_create_usercopy);
+
+struct kmem_cache *
+kmem_cache_create(const char *name, size_t size, size_t align,
+		slab_flags_t flags, void (*ctor)(void *))
+{
+	return kmem_cache_create_usercopy(name, size, align, flags, 0, size,
+					  ctor);
+}
 EXPORT_SYMBOL(kmem_cache_create);
 
 static void slab_caches_to_rcu_destroy_workfn(struct work_struct *work)
@@ -612,6 +639,7 @@ void memcg_create_kmem_cache(struct mem_cgroup *memcg,
 	s = create_cache(cache_name, root_cache->object_size,
 			 root_cache->size, root_cache->align,
 			 root_cache->flags & CACHE_CREATE_MASK,
+			 root_cache->useroffset, root_cache->usersize,
 			 root_cache->ctor, memcg, root_cache);
 	/*
 	 * If we could not create a memcg cache, do not complain, because
@@ -879,13 +907,15 @@ bool slab_is_available(void)
 #ifndef CONFIG_SLOB
 /* Create a cache during boot when no slab services are available yet */
 void __init create_boot_cache(struct kmem_cache *s, const char *name, size_t size,
-		slab_flags_t flags)
+		slab_flags_t flags, size_t useroffset, size_t usersize)
 {
 	int err;
 
 	s->name = name;
 	s->size = s->object_size = size;
 	s->align = calculate_alignment(flags, ARCH_KMALLOC_MINALIGN, size);
+	s->useroffset = useroffset;
+	s->usersize = usersize;
 
 	slab_init_memcg_params(s);
 
@@ -906,7 +936,7 @@ struct kmem_cache *__init create_kmalloc_cache(const char *name, size_t size,
 	if (!s)
 		panic("Out of memory when creating slab %s\n", name);
 
-	create_boot_cache(s, name, size, flags);
+	create_boot_cache(s, name, size, flags, 0, size);
 	list_add(&s->list, &slab_caches);
 	memcg_link_cache(s);
 	s->refcount = 1;
diff --git a/mm/slub.c b/mm/slub.c
index bcd22332300a..f40a57164dd6 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4183,7 +4183,7 @@ void __init kmem_cache_init(void)
 	kmem_cache = &boot_kmem_cache;
 
 	create_boot_cache(kmem_cache_node, "kmem_cache_node",
-		sizeof(struct kmem_cache_node), SLAB_HWCACHE_ALIGN);
+		sizeof(struct kmem_cache_node), SLAB_HWCACHE_ALIGN, 0, 0);
 
 	register_hotmemory_notifier(&slab_memory_callback_nb);
 
@@ -4193,7 +4193,7 @@ void __init kmem_cache_init(void)
 	create_boot_cache(kmem_cache, "kmem_cache",
 			offsetof(struct kmem_cache, node) +
 				nr_node_ids * sizeof(struct kmem_cache_node *),
-		       SLAB_HWCACHE_ALIGN);
+		       SLAB_HWCACHE_ALIGN, 0, 0);
 
 	kmem_cache = bootstrap(&boot_kmem_cache);
 
@@ -5063,6 +5063,12 @@ static ssize_t cache_dma_show(struct kmem_cache *s, char *buf)
 SLAB_ATTR_RO(cache_dma);
 #endif
 
+static ssize_t usersize_show(struct kmem_cache *s, char *buf)
+{
+	return sprintf(buf, "%zu\n", s->usersize);
+}
+SLAB_ATTR_RO(usersize);
+
 static ssize_t destroy_by_rcu_show(struct kmem_cache *s, char *buf)
 {
 	return sprintf(buf, "%d\n", !!(s->flags & SLAB_TYPESAFE_BY_RCU));
@@ -5437,6 +5443,7 @@ static struct attribute *slab_attrs[] = {
 #ifdef CONFIG_FAILSLAB
 	&failslab_attr.attr,
 #endif
+	&usersize_attr.attr,
 
 	NULL
 };
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 07/38] usercopy: WARN() on slab cache usercopy region violations
       [not found] <1515636190-24061-1-git-send-email-keescook@chromium.org>
  2018-01-11  2:02 ` [PATCH 06/38] usercopy: Prepare for usercopy whitelisting Kees Cook
@ 2018-01-11  2:02 ` Kees Cook
  2018-01-11  2:02 ` [PATCH 09/38] usercopy: Mark kmalloc caches as usercopy caches Kees Cook
  2 siblings, 0 replies; 28+ messages in thread
From: Kees Cook @ 2018-01-11  2:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, Christoph Lameter, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton, Laura Abbott, Ingo Molnar,
	Mark Rutland, linux-mm, linux-xfs, Linus Torvalds, David Windsor,
	Alexander Viro, Andy Lutomirski, Christoph Hellwig,
	David S. Miller, Martin K. Petersen, Paolo Bonzini,
	Christian Borntraeger, Christoffer Dall, Dave Kleikamp, Jan Kara,
	Luis de Bethencourt, Marc Zyngier, Rik van Riel, Matthew Garrett,
	linux-fsdevel, linux-arch, netdev, kernel-hardening

This patch adds checking of usercopy cache whitelisting, and is modified
from Brad Spengler/PaX Team's PAX_USERCOPY whitelisting code in the
last public patch of grsecurity/PaX based on my understanding of the
code. Changes or omissions from the original code are mine and don't
reflect the original grsecurity/PaX code.

The SLAB and SLUB allocators are modified to WARN() on all copy operations
in which the kernel heap memory being modified falls outside of the cache's
defined usercopy region.

Based on an earlier patch from David Windsor.

Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: linux-mm@kvack.org
Cc: linux-xfs@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 mm/slab.c     | 22 +++++++++++++++++++---
 mm/slab.h     |  2 ++
 mm/slub.c     | 23 +++++++++++++++++++----
 mm/usercopy.c | 21 ++++++++++++++++++---
 4 files changed, 58 insertions(+), 10 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index 47acfe54e1ae..1c02f6e94235 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -4392,7 +4392,9 @@ module_init(slab_proc_init);
 
 #ifdef CONFIG_HARDENED_USERCOPY
 /*
- * Rejects objects that are incorrectly sized.
+ * Rejects incorrectly sized objects and objects that are to be copied
+ * to/from userspace but do not fall entirely within the containing slab
+ * cache's usercopy region.
  *
  * Returns NULL if check passes, otherwise const char * to name of cache
  * to indicate an error.
@@ -4412,10 +4414,24 @@ void __check_heap_object(const void *ptr, unsigned long n, struct page *page,
 	/* Find offset within object. */
 	offset = ptr - index_to_obj(cachep, page, objnr) - obj_offset(cachep);
 
-	/* Allow address range falling entirely within object size. */
-	if (offset <= cachep->object_size && n <= cachep->object_size - offset)
+	/* Allow address range falling entirely within usercopy region. */
+	if (offset >= cachep->useroffset &&
+	    offset - cachep->useroffset <= cachep->usersize &&
+	    n <= cachep->useroffset - offset + cachep->usersize)
 		return;
 
+	/*
+	 * If the copy is still within the allocated object, produce
+	 * a warning instead of rejecting the copy. This is intended
+	 * to be a temporary method to find any missing usercopy
+	 * whitelists.
+	 */
+	if (offset <= cachep->object_size &&
+	    n <= cachep->object_size - offset) {
+		usercopy_warn("SLAB object", cachep->name, to_user, offset, n);
+		return;
+	}
+
 	usercopy_abort("SLAB object", cachep->name, to_user, offset, n);
 }
 #endif /* CONFIG_HARDENED_USERCOPY */
diff --git a/mm/slab.h b/mm/slab.h
index 1897991df3fa..b476c435680f 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -530,6 +530,8 @@ static inline void cache_random_seq_destroy(struct kmem_cache *cachep) { }
 #endif /* CONFIG_SLAB_FREELIST_RANDOM */
 
 #ifdef CONFIG_HARDENED_USERCOPY
+void usercopy_warn(const char *name, const char *detail, bool to_user,
+		   unsigned long offset, unsigned long len);
 void __noreturn usercopy_abort(const char *name, const char *detail,
 			       bool to_user, unsigned long offset,
 			       unsigned long len);
diff --git a/mm/slub.c b/mm/slub.c
index f40a57164dd6..6d9b1e7d3226 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3813,7 +3813,9 @@ EXPORT_SYMBOL(__kmalloc_node);
 
 #ifdef CONFIG_HARDENED_USERCOPY
 /*
- * Rejects objects that are incorrectly sized.
+ * Rejects incorrectly sized objects and objects that are to be copied
+ * to/from userspace but do not fall entirely within the containing slab
+ * cache's usercopy region.
  *
  * Returns NULL if check passes, otherwise const char * to name of cache
  * to indicate an error.
@@ -3827,7 +3829,6 @@ void __check_heap_object(const void *ptr, unsigned long n, struct page *page,
 
 	/* Find object and usable object size. */
 	s = page->slab_cache;
-	object_size = slab_ksize(s);
 
 	/* Reject impossible pointers. */
 	if (ptr < page_address(page))
@@ -3845,10 +3846,24 @@ void __check_heap_object(const void *ptr, unsigned long n, struct page *page,
 		offset -= s->red_left_pad;
 	}
 
-	/* Allow address range falling entirely within object size. */
-	if (offset <= object_size && n <= object_size - offset)
+	/* Allow address range falling entirely within usercopy region. */
+	if (offset >= s->useroffset &&
+	    offset - s->useroffset <= s->usersize &&
+	    n <= s->useroffset - offset + s->usersize)
 		return;
 
+	/*
+	 * If the copy is still within the allocated object, produce
+	 * a warning instead of rejecting the copy. This is intended
+	 * to be a temporary method to find any missing usercopy
+	 * whitelists.
+	 */
+	object_size = slab_ksize(s);
+	if (offset <= object_size && n <= object_size - offset) {
+		usercopy_warn("SLUB object", s->name, to_user, offset, n);
+		return;
+	}
+
 	usercopy_abort("SLUB object", s->name, to_user, offset, n);
 }
 #endif /* CONFIG_HARDENED_USERCOPY */
diff --git a/mm/usercopy.c b/mm/usercopy.c
index a562dd094ace..e9e9325f7638 100644
--- a/mm/usercopy.c
+++ b/mm/usercopy.c
@@ -59,13 +59,28 @@ static noinline int check_stack_object(const void *obj, unsigned long len)
 }
 
 /*
- * If this function is reached, then CONFIG_HARDENED_USERCOPY has found an
- * unexpected state during a copy_from_user() or copy_to_user() call.
+ * If these functions are reached, then CONFIG_HARDENED_USERCOPY has found
+ * an unexpected state during a copy_from_user() or copy_to_user() call.
  * There are several checks being performed on the buffer by the
  * __check_object_size() function. Normal stack buffer usage should never
  * trip the checks, and kernel text addressing will always trip the check.
- * For cache objects, copies must be within the object size.
+ * For cache objects, it is checking that only the whitelisted range of
+ * bytes for a given cache is being accessed (via the cache's usersize and
+ * useroffset fields). To adjust a cache whitelist, use the usercopy-aware
+ * kmem_cache_create_usercopy() function to create the cache (and
+ * carefully audit the whitelist range).
  */
+void usercopy_warn(const char *name, const char *detail, bool to_user,
+		   unsigned long offset, unsigned long len)
+{
+	WARN_ONCE(1, "Bad or missing usercopy whitelist? Kernel memory %s attempt detected %s %s%s%s%s (offset %lu, size %lu)!\n",
+		 to_user ? "exposure" : "overwrite",
+		 to_user ? "from" : "to",
+		 name ? : "unknown?!",
+		 detail ? " '" : "", detail ? : "", detail ? "'" : "",
+		 offset, len);
+}
+
 void __noreturn usercopy_abort(const char *name, const char *detail,
 			       bool to_user, unsigned long offset,
 			       unsigned long len)
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 09/38] usercopy: Mark kmalloc caches as usercopy caches
       [not found] <1515636190-24061-1-git-send-email-keescook@chromium.org>
  2018-01-11  2:02 ` [PATCH 06/38] usercopy: Prepare for usercopy whitelisting Kees Cook
  2018-01-11  2:02 ` [PATCH 07/38] usercopy: WARN() on slab cache usercopy region violations Kees Cook
@ 2018-01-11  2:02 ` Kees Cook
  2019-11-12  7:17   ` [kernel-hardening] " Jiri Slaby
  2 siblings, 1 reply; 28+ messages in thread
From: Kees Cook @ 2018-01-11  2:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, David Windsor, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton, linux-mm, linux-xfs, Linus Torvalds,
	Alexander Viro, Andy Lutomirski, Christoph Hellwig,
	Christoph Lameter, David S. Miller, Laura Abbott, Mark Rutland,
	Martin K. Petersen, Paolo Bonzini, Christian Borntraeger,
	Christoffer Dall, Dave Kleikamp, Jan Kara, Luis de Bethencourt,
	Marc Zyngier, Rik van Riel, Matthew Garrett, linux-fsdevel,
	linux-arch, netdev, kernel-hardening

From: David Windsor <dave@nullcore.net>

Mark the kmalloc slab caches as entirely whitelisted. These caches
are frequently used to fulfill kernel allocations that contain data
to be copied to/from userspace. Internal-only uses are also common,
but are scattered in the kernel. For now, mark all the kmalloc caches
as whitelisted.

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <dave@nullcore.net>
[kees: merged in moved kmalloc hunks, adjust commit log]
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
Cc: linux-xfs@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Christoph Lameter <cl@linux.com>
---
 mm/slab.c        |  3 ++-
 mm/slab.h        |  3 ++-
 mm/slab_common.c | 10 ++++++----
 3 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index b9b0df620bb9..dd367fe17a4e 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1291,7 +1291,8 @@ void __init kmem_cache_init(void)
 	 */
 	kmalloc_caches[INDEX_NODE] = create_kmalloc_cache(
 				kmalloc_info[INDEX_NODE].name,
-				kmalloc_size(INDEX_NODE), ARCH_KMALLOC_FLAGS);
+				kmalloc_size(INDEX_NODE), ARCH_KMALLOC_FLAGS,
+				0, kmalloc_size(INDEX_NODE));
 	slab_state = PARTIAL_NODE;
 	setup_kmalloc_cache_index_table();
 
diff --git a/mm/slab.h b/mm/slab.h
index b476c435680f..98c5bcc45d8b 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -97,7 +97,8 @@ struct kmem_cache *kmalloc_slab(size_t, gfp_t);
 int __kmem_cache_create(struct kmem_cache *, slab_flags_t flags);
 
 extern struct kmem_cache *create_kmalloc_cache(const char *name, size_t size,
-			slab_flags_t flags);
+			slab_flags_t flags, size_t useroffset,
+			size_t usersize);
 extern void create_boot_cache(struct kmem_cache *, const char *name,
 			size_t size, slab_flags_t flags, size_t useroffset,
 			size_t usersize);
diff --git a/mm/slab_common.c b/mm/slab_common.c
index a51f65408637..8ac2a6320a6c 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -937,14 +937,15 @@ void __init create_boot_cache(struct kmem_cache *s, const char *name, size_t siz
 }
 
 struct kmem_cache *__init create_kmalloc_cache(const char *name, size_t size,
-				slab_flags_t flags)
+				slab_flags_t flags, size_t useroffset,
+				size_t usersize)
 {
 	struct kmem_cache *s = kmem_cache_zalloc(kmem_cache, GFP_NOWAIT);
 
 	if (!s)
 		panic("Out of memory when creating slab %s\n", name);
 
-	create_boot_cache(s, name, size, flags, 0, size);
+	create_boot_cache(s, name, size, flags, useroffset, usersize);
 	list_add(&s->list, &slab_caches);
 	memcg_link_cache(s);
 	s->refcount = 1;
@@ -1098,7 +1099,8 @@ void __init setup_kmalloc_cache_index_table(void)
 static void __init new_kmalloc_cache(int idx, slab_flags_t flags)
 {
 	kmalloc_caches[idx] = create_kmalloc_cache(kmalloc_info[idx].name,
-					kmalloc_info[idx].size, flags);
+					kmalloc_info[idx].size, flags, 0,
+					kmalloc_info[idx].size);
 }
 
 /*
@@ -1139,7 +1141,7 @@ void __init create_kmalloc_caches(slab_flags_t flags)
 
 			BUG_ON(!n);
 			kmalloc_dma_caches[i] = create_kmalloc_cache(n,
-				size, SLAB_CACHE_DMA | flags);
+				size, SLAB_CACHE_DMA | flags, 0, 0);
 		}
 	}
 #endif
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [kernel-hardening] [PATCH 09/38] usercopy: Mark kmalloc caches as usercopy caches
  2018-01-11  2:02 ` [PATCH 09/38] usercopy: Mark kmalloc caches as usercopy caches Kees Cook
@ 2019-11-12  7:17   ` Jiri Slaby
  2019-11-12 21:21     ` Kees Cook
  0 siblings, 1 reply; 28+ messages in thread
From: Jiri Slaby @ 2019-11-12  7:17 UTC (permalink / raw)
  To: Kees Cook, linux-kernel
  Cc: David Windsor, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, linux-mm, linux-xfs, Linus Torvalds,
	Alexander Viro, Andy Lutomirski, Christoph Hellwig,
	Christoph Lameter, David S. Miller, Laura Abbott, Mark Rutland,
	Martin K. Petersen, Paolo Bonzini, Christian Borntraeger,
	Christoffer Dall, Dave Kleikamp, Jan Kara, Luis de Bethencourt,
	Marc Zyngier, Rik van Riel, Matthew Garrett, linux-fsdevel,
	linux-arch, netdev, kernel-hardening, Vlastimil Babka,
	Michal Kubecek

On 11. 01. 18, 3:02, Kees Cook wrote:
> From: David Windsor <dave@nullcore.net>
> 
> Mark the kmalloc slab caches as entirely whitelisted. These caches
> are frequently used to fulfill kernel allocations that contain data
> to be copied to/from userspace. Internal-only uses are also common,
> but are scattered in the kernel. For now, mark all the kmalloc caches
> as whitelisted.
> 
> This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
> whitelisting code in the last public patch of grsecurity/PaX based on my
> understanding of the code. Changes or omissions from the original code are
> mine and don't reflect the original grsecurity/PaX code.
> 
> Signed-off-by: David Windsor <dave@nullcore.net>
> [kees: merged in moved kmalloc hunks, adjust commit log]
> Cc: Pekka Enberg <penberg@kernel.org>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: linux-mm@kvack.org
> Cc: linux-xfs@vger.kernel.org
> Signed-off-by: Kees Cook <keescook@chromium.org>
> Acked-by: Christoph Lameter <cl@linux.com>
> ---
>  mm/slab.c        |  3 ++-
>  mm/slab.h        |  3 ++-
>  mm/slab_common.c | 10 ++++++----
>  3 files changed, 10 insertions(+), 6 deletions(-)
> 
> diff --git a/mm/slab.c b/mm/slab.c
> index b9b0df620bb9..dd367fe17a4e 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
...
> @@ -1098,7 +1099,8 @@ void __init setup_kmalloc_cache_index_table(void)
>  static void __init new_kmalloc_cache(int idx, slab_flags_t flags)
>  {
>  	kmalloc_caches[idx] = create_kmalloc_cache(kmalloc_info[idx].name,
> -					kmalloc_info[idx].size, flags);
> +					kmalloc_info[idx].size, flags, 0,
> +					kmalloc_info[idx].size);
>  }
>  
>  /*
> @@ -1139,7 +1141,7 @@ void __init create_kmalloc_caches(slab_flags_t flags)
>  
>  			BUG_ON(!n);
>  			kmalloc_dma_caches[i] = create_kmalloc_cache(n,
> -				size, SLAB_CACHE_DMA | flags);
> +				size, SLAB_CACHE_DMA | flags, 0, 0);

Hi,

was there any (undocumented) reason NOT to mark DMA caches as usercopy?

We are seeing this on s390x:

> usercopy: Kernel memory overwrite attempt detected to SLUB object
'dma-kmalloc-1k' (offset 0, size 11)!
> ------------[ cut here ]------------
> kernel BUG at mm/usercopy.c:99!

See:
https://bugzilla.suse.com/show_bug.cgi?id=1156053

This indeed fixes it:
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -1290,7 +1290,8 @@ void __init create_kmalloc_caches(slab_flags_t flags)
                        kmalloc_caches[KMALLOC_DMA][i] =
create_kmalloc_cache(
                                kmalloc_info[i].name[KMALLOC_DMA],
                                kmalloc_info[i].size,
-                               SLAB_CACHE_DMA | flags, 0, 0);
+                               SLAB_CACHE_DMA | flags, 0,
+                               kmalloc_info[i].size);
                }
        }
 #endif


thanks,
-- 
js
suse labs

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [kernel-hardening] [PATCH 09/38] usercopy: Mark kmalloc caches as usercopy caches
  2019-11-12  7:17   ` [kernel-hardening] " Jiri Slaby
@ 2019-11-12 21:21     ` Kees Cook
  2019-11-14 21:27       ` Kees Cook
  0 siblings, 1 reply; 28+ messages in thread
From: Kees Cook @ 2019-11-12 21:21 UTC (permalink / raw)
  To: Jiri Slaby, Alexander Viro
  Cc: linux-kernel, David Windsor, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton, linux-mm, linux-xfs, Linus Torvalds,
	Andy Lutomirski, Christoph Hellwig, Christoph Lameter,
	David S. Miller, Laura Abbott, Mark Rutland, Martin K. Petersen,
	Paolo Bonzini, Christian Borntraeger, Christoffer Dall,
	Dave Kleikamp, Jan Kara, Luis de Bethencourt, Marc Zyngier,
	Rik van Riel, Matthew Garrett, linux-fsdevel, linux-arch, netdev,
	kernel-hardening, Vlastimil Babka, Michal Kubecek

On Tue, Nov 12, 2019 at 08:17:57AM +0100, Jiri Slaby wrote:
> On 11. 01. 18, 3:02, Kees Cook wrote:
> > From: David Windsor <dave@nullcore.net>
> > 
> > Mark the kmalloc slab caches as entirely whitelisted. These caches
> > are frequently used to fulfill kernel allocations that contain data
> > to be copied to/from userspace. Internal-only uses are also common,
> > but are scattered in the kernel. For now, mark all the kmalloc caches
> > as whitelisted.
> > 
> > This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
> > whitelisting code in the last public patch of grsecurity/PaX based on my
> > understanding of the code. Changes or omissions from the original code are
> > mine and don't reflect the original grsecurity/PaX code.
> > 
> > Signed-off-by: David Windsor <dave@nullcore.net>
> > [kees: merged in moved kmalloc hunks, adjust commit log]
> > Cc: Pekka Enberg <penberg@kernel.org>
> > Cc: David Rientjes <rientjes@google.com>
> > Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: linux-mm@kvack.org
> > Cc: linux-xfs@vger.kernel.org
> > Signed-off-by: Kees Cook <keescook@chromium.org>
> > Acked-by: Christoph Lameter <cl@linux.com>
> > ---
> >  mm/slab.c        |  3 ++-
> >  mm/slab.h        |  3 ++-
> >  mm/slab_common.c | 10 ++++++----
> >  3 files changed, 10 insertions(+), 6 deletions(-)
> > 
> > diff --git a/mm/slab.c b/mm/slab.c
> > index b9b0df620bb9..dd367fe17a4e 100644
> > --- a/mm/slab.c
> > +++ b/mm/slab.c
> ...
> > @@ -1098,7 +1099,8 @@ void __init setup_kmalloc_cache_index_table(void)
> >  static void __init new_kmalloc_cache(int idx, slab_flags_t flags)
> >  {
> >  	kmalloc_caches[idx] = create_kmalloc_cache(kmalloc_info[idx].name,
> > -					kmalloc_info[idx].size, flags);
> > +					kmalloc_info[idx].size, flags, 0,
> > +					kmalloc_info[idx].size);
> >  }
> >  
> >  /*
> > @@ -1139,7 +1141,7 @@ void __init create_kmalloc_caches(slab_flags_t flags)
> >  
> >  			BUG_ON(!n);
> >  			kmalloc_dma_caches[i] = create_kmalloc_cache(n,
> > -				size, SLAB_CACHE_DMA | flags);
> > +				size, SLAB_CACHE_DMA | flags, 0, 0);
> 
> Hi,
> 
> was there any (undocumented) reason NOT to mark DMA caches as usercopy?
> 
> We are seeing this on s390x:
> 
> > usercopy: Kernel memory overwrite attempt detected to SLUB object
> 'dma-kmalloc-1k' (offset 0, size 11)!
> > ------------[ cut here ]------------
> > kernel BUG at mm/usercopy.c:99!

Interesting! I believe the rationale was that if the region is used for
DMA, allowing direct access to it from userspace could be prone to
races.

> See:
> https://bugzilla.suse.com/show_bug.cgi?id=1156053

For context from the bug, the trace is:

(<0000000000386c5a> usercopy_abort+0xa2/0xa8) 
 <000000000036097a> __check_heap_object+0x11a/0x120  
 <0000000000386b3a> __check_object_size+0x18a/0x208  
 <000000000079b4ba> skb_copy_datagram_from_iter+0x62/0x240  
 <000003ff804edd5c> iucv_sock_sendmsg+0x1fc/0x858 Ýaf_iucv¨  
 <0000000000785894> sock_sendmsg+0x54/0x90  
 <0000000000785944> sock_write_iter+0x74/0xa0  
 <000000000038a3f0> new_sync_write+0x110/0x180  
 <000000000038d42e> vfs_write+0xa6/0x1d0  
 <000000000038d748> ksys_write+0x60/0xe8  
 <000000000096a660> system_call+0xdc/0x2e0  

I know Al worked on fixing up usercopy checking for iters. I wonder if
there is redundant checking happening here? i.e. haven't iters already
done object size verifications, so they're not needed during iter copy
helpers?

> This indeed fixes it:
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -1290,7 +1290,8 @@ void __init create_kmalloc_caches(slab_flags_t flags)
>                         kmalloc_caches[KMALLOC_DMA][i] =
> create_kmalloc_cache(
>                                 kmalloc_info[i].name[KMALLOC_DMA],
>                                 kmalloc_info[i].size,
> -                               SLAB_CACHE_DMA | flags, 0, 0);
> +                               SLAB_CACHE_DMA | flags, 0,
> +                               kmalloc_info[i].size);
>                 }
>         }
>  #endif

How is iucv the only network protocol that has run into this? Do others
use a bounce buffer?

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [kernel-hardening] [PATCH 09/38] usercopy: Mark kmalloc caches as usercopy caches
  2019-11-12 21:21     ` Kees Cook
@ 2019-11-14 21:27       ` Kees Cook
  2020-01-23  8:14         ` Jiri Slaby
  0 siblings, 1 reply; 28+ messages in thread
From: Kees Cook @ 2019-11-14 21:27 UTC (permalink / raw)
  To: Jiri Slaby, Alexander Viro
  Cc: linux-kernel, David Windsor, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton, linux-mm, linux-xfs, Linus Torvalds,
	Andy Lutomirski, Christoph Hellwig, Christoph Lameter,
	David S. Miller, Laura Abbott, Mark Rutland, Martin K. Petersen,
	Paolo Bonzini, Christian Borntraeger, Christoffer Dall,
	Dave Kleikamp, Jan Kara, Luis de Bethencourt, Marc Zyngier,
	Rik van Riel, Matthew Garrett, linux-fsdevel, linux-arch, netdev,
	kernel-hardening, Vlastimil Babka, Michal Kubecek

On Tue, Nov 12, 2019 at 01:21:54PM -0800, Kees Cook wrote:
> How is iucv the only network protocol that has run into this? Do others
> use a bounce buffer?

Another solution would be to use a dedicated kmem cache (instead of the
shared kmalloc dma one)?

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [kernel-hardening] [PATCH 09/38] usercopy: Mark kmalloc caches as usercopy caches
  2019-11-14 21:27       ` Kees Cook
@ 2020-01-23  8:14         ` Jiri Slaby
  2020-01-27 23:19           ` Kees Cook
  0 siblings, 1 reply; 28+ messages in thread
From: Jiri Slaby @ 2020-01-23  8:14 UTC (permalink / raw)
  To: Kees Cook, Alexander Viro
  Cc: linux-kernel, David Windsor, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton, linux-mm, linux-xfs, Linus Torvalds,
	Andy Lutomirski, Christoph Hellwig, Christoph Lameter,
	David S. Miller, Laura Abbott, Mark Rutland, Martin K. Petersen,
	Paolo Bonzini, Christian Borntraeger, Christoffer Dall,
	Dave Kleikamp, Jan Kara, Luis de Bethencourt, Marc Zyngier,
	Rik van Riel, Matthew Garrett, linux-fsdevel, linux-arch, netdev,
	kernel-hardening, Vlastimil Babka, Michal Kubecek

On 14. 11. 19, 22:27, Kees Cook wrote:
> On Tue, Nov 12, 2019 at 01:21:54PM -0800, Kees Cook wrote:
>> How is iucv the only network protocol that has run into this? Do others
>> use a bounce buffer?
> 
> Another solution would be to use a dedicated kmem cache (instead of the
> shared kmalloc dma one)?

Has there been any conclusion to this thread yet? For the time being, we
disabled HARDENED_USERCOPY on s390...

https://lore.kernel.org/kernel-hardening/9519edb7-456a-a2fa-659e-3e5a1ff89466@suse.cz/

thanks,
-- 
js
suse labs

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [kernel-hardening] [PATCH 09/38] usercopy: Mark kmalloc caches as usercopy caches
  2020-01-23  8:14         ` Jiri Slaby
@ 2020-01-27 23:19           ` Kees Cook
  2020-01-28  7:58             ` Christian Borntraeger
  0 siblings, 1 reply; 28+ messages in thread
From: Kees Cook @ 2020-01-27 23:19 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: Alexander Viro, linux-kernel, David Windsor, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Andrew Morton, linux-mm, linux-xfs,
	Linus Torvalds, Andy Lutomirski, Christoph Hellwig,
	Christoph Lameter, David S. Miller, Laura Abbott, Mark Rutland,
	Martin K. Petersen, Paolo Bonzini, Christian Borntraeger,
	Christoffer Dall, Dave Kleikamp, Jan Kara, Luis de Bethencourt,
	Marc Zyngier, Rik van Riel, Matthew Garrett, linux-fsdevel,
	linux-arch, netdev, kernel-hardening, Vlastimil Babka,
	Michal Kubecek

On Thu, Jan 23, 2020 at 09:14:20AM +0100, Jiri Slaby wrote:
> On 14. 11. 19, 22:27, Kees Cook wrote:
> > On Tue, Nov 12, 2019 at 01:21:54PM -0800, Kees Cook wrote:
> >> How is iucv the only network protocol that has run into this? Do others
> >> use a bounce buffer?
> > 
> > Another solution would be to use a dedicated kmem cache (instead of the
> > shared kmalloc dma one)?
> 
> Has there been any conclusion to this thread yet? For the time being, we
> disabled HARDENED_USERCOPY on s390...
> 
> https://lore.kernel.org/kernel-hardening/9519edb7-456a-a2fa-659e-3e5a1ff89466@suse.cz/

I haven't heard anything new. What did people think of a separate kmem
cache?

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [kernel-hardening] [PATCH 09/38] usercopy: Mark kmalloc caches as usercopy caches
  2020-01-27 23:19           ` Kees Cook
@ 2020-01-28  7:58             ` Christian Borntraeger
  2020-01-28 23:01               ` Kees Cook
  0 siblings, 1 reply; 28+ messages in thread
From: Christian Borntraeger @ 2020-01-28  7:58 UTC (permalink / raw)
  To: Kees Cook, Jiri Slaby, Julian Wiedmann, Ursula Braun
  Cc: Alexander Viro, linux-kernel, David Windsor, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Andrew Morton, linux-mm, linux-xfs,
	Linus Torvalds, Andy Lutomirski, Christoph Hellwig,
	Christoph Lameter, David S. Miller, Laura Abbott, Mark Rutland,
	Martin K. Petersen, Paolo Bonzini, Christoffer Dall,
	Dave Kleikamp, Jan Kara, Luis de Bethencourt, Marc Zyngier,
	Rik van Riel, Matthew Garrett, linux-fsdevel, linux-arch, netdev,
	kernel-hardening, Vlastimil Babka, Michal Kubecek



On 28.01.20 00:19, Kees Cook wrote:
> On Thu, Jan 23, 2020 at 09:14:20AM +0100, Jiri Slaby wrote:
>> On 14. 11. 19, 22:27, Kees Cook wrote:
>>> On Tue, Nov 12, 2019 at 01:21:54PM -0800, Kees Cook wrote:
>>>> How is iucv the only network protocol that has run into this? Do others
>>>> use a bounce buffer?
>>>
>>> Another solution would be to use a dedicated kmem cache (instead of the
>>> shared kmalloc dma one)?
>>
>> Has there been any conclusion to this thread yet? For the time being, we
>> disabled HARDENED_USERCOPY on s390...
>>
>> https://lore.kernel.org/kernel-hardening/9519edb7-456a-a2fa-659e-3e5a1ff89466@suse.cz/
> 
> I haven't heard anything new. What did people think of a separate kmem
> cache?
> 

Adding Julian and Ursula. A separate kmem cache for iucv might be indeed
a solution for the user hardening issue.
On the other hand not marking the DMA caches still seems questionable.

For reference
https://bugzilla.suse.com/show_bug.cgi?id=1156053
the kernel hardening now triggers a warning.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [kernel-hardening] [PATCH 09/38] usercopy: Mark kmalloc caches as usercopy caches
  2020-01-28  7:58             ` Christian Borntraeger
@ 2020-01-28 23:01               ` Kees Cook
  2020-01-29  9:26                 ` Ursula Braun
  2020-01-29 16:43                 ` Christopher Lameter
  0 siblings, 2 replies; 28+ messages in thread
From: Kees Cook @ 2020-01-28 23:01 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Jiri Slaby, Julian Wiedmann, Ursula Braun, Alexander Viro,
	linux-kernel, David Windsor, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton, linux-mm, linux-xfs, Linus Torvalds,
	Andy Lutomirski, Christoph Hellwig, Christoph Lameter,
	David S. Miller, Laura Abbott, Mark Rutland, Martin K. Petersen,
	Paolo Bonzini, Christoffer Dall, Dave Kleikamp, Jan Kara,
	Luis de Bethencourt, Marc Zyngier, Rik van Riel, Matthew Garrett,
	linux-fsdevel, linux-arch, netdev, kernel-hardening,
	Vlastimil Babka, Michal Kubecek

On Tue, Jan 28, 2020 at 08:58:31AM +0100, Christian Borntraeger wrote:
> 
> 
> On 28.01.20 00:19, Kees Cook wrote:
> > On Thu, Jan 23, 2020 at 09:14:20AM +0100, Jiri Slaby wrote:
> >> On 14. 11. 19, 22:27, Kees Cook wrote:
> >>> On Tue, Nov 12, 2019 at 01:21:54PM -0800, Kees Cook wrote:
> >>>> How is iucv the only network protocol that has run into this? Do others
> >>>> use a bounce buffer?
> >>>
> >>> Another solution would be to use a dedicated kmem cache (instead of the
> >>> shared kmalloc dma one)?
> >>
> >> Has there been any conclusion to this thread yet? For the time being, we
> >> disabled HARDENED_USERCOPY on s390...
> >>
> >> https://lore.kernel.org/kernel-hardening/9519edb7-456a-a2fa-659e-3e5a1ff89466@suse.cz/
> > 
> > I haven't heard anything new. What did people think of a separate kmem
> > cache?
> > 
> 
> Adding Julian and Ursula. A separate kmem cache for iucv might be indeed
> a solution for the user hardening issue.

It should be very clean -- any existing kmallocs already have to be
"special" in the sense that they're marked with the DMA flag. So
converting these to a separate cache should be mostly mechanical.

> On the other hand not marking the DMA caches still seems questionable.

My understanding is that exposing DMA memory to userspace copies can
lead to unexpected results, especially for misbehaving hardware, so I'm
not convinced this is a generically bad hardening choice.

-Kees

> 
> For reference
> https://bugzilla.suse.com/show_bug.cgi?id=1156053
> the kernel hardening now triggers a warning.
> 

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [kernel-hardening] [PATCH 09/38] usercopy: Mark kmalloc caches as usercopy caches
  2020-01-28 23:01               ` Kees Cook
@ 2020-01-29  9:26                 ` Ursula Braun
  2020-01-29 16:43                 ` Christopher Lameter
  1 sibling, 0 replies; 28+ messages in thread
From: Ursula Braun @ 2020-01-29  9:26 UTC (permalink / raw)
  To: Kees Cook, Christian Borntraeger
  Cc: Jiri Slaby, Julian Wiedmann, Alexander Viro, linux-kernel,
	David Windsor, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, linux-mm, linux-xfs, Linus Torvalds,
	Andy Lutomirski, Christoph Hellwig, Christoph Lameter,
	David S. Miller, Laura Abbott, Mark Rutland, Martin K. Petersen,
	Paolo Bonzini, Christoffer Dall, Dave Kleikamp, Jan Kara,
	Luis de Bethencourt, Marc Zyngier, Rik van Riel, Matthew Garrett,
	linux-fsdevel, linux-arch, netdev, kernel-hardening,
	Vlastimil Babka, Michal Kubecek



On 1/29/20 12:01 AM, Kees Cook wrote:
> On Tue, Jan 28, 2020 at 08:58:31AM +0100, Christian Borntraeger wrote:
>>
>>
>> On 28.01.20 00:19, Kees Cook wrote:
>>> On Thu, Jan 23, 2020 at 09:14:20AM +0100, Jiri Slaby wrote:
>>>> On 14. 11. 19, 22:27, Kees Cook wrote:
>>>>> On Tue, Nov 12, 2019 at 01:21:54PM -0800, Kees Cook wrote:
>>>>>> How is iucv the only network protocol that has run into this? Do others
>>>>>> use a bounce buffer?
>>>>>
>>>>> Another solution would be to use a dedicated kmem cache (instead of the
>>>>> shared kmalloc dma one)?
>>>>
>>>> Has there been any conclusion to this thread yet? For the time being, we
>>>> disabled HARDENED_USERCOPY on s390...
>>>>
>>>> https://lore.kernel.org/kernel-hardening/9519edb7-456a-a2fa-659e-3e5a1ff89466@suse.cz/
>>>
>>> I haven't heard anything new. What did people think of a separate kmem
>>> cache?
>>>
>>
>> Adding Julian and Ursula. A separate kmem cache for iucv might be indeed
>> a solution for the user hardening issue.
> 
> It should be very clean -- any existing kmallocs already have to be
> "special" in the sense that they're marked with the DMA flag. So
> converting these to a separate cache should be mostly mechanical.
> 

Linux on System z can run within a guest hosted by the IBM mainframe operating system
z/VM. z/VM offers a transport called Inter-User Communications Vehicle (short IUCV).
It is limited to 4-byte-addresses when sending and receiving data.
One base transport for AF_IUCV sockets in the Linux kernel is this Inter-User
Communications Vehicle of z/VM. AF_IUCV sockets exist for s390 only. 

AF_IUCV sockets make use of the base socket layer, and work with sk_buffs for sending
and receiving data of variable length.
Storage for sk_buffs is allocated with __alloc_skb(), which invokes
   data = kmalloc_reserve(size, gfp_mask, node, &pfmemalloc);
For IUCV transport the "data"-address should fit into 4 bytes. That's the reason why
we work with GFP_DMA here.

kmem_caches manage memory of fixed size. This does not fit well for sk_buff memory
of variable length. Do you propose to add a kmem_cache solution for sk_buff memory here?

>> On the other hand not marking the DMA caches still seems questionable.
> 
> My understanding is that exposing DMA memory to userspace copies can
> lead to unexpected results, especially for misbehaving hardware, so I'm
> not convinced this is a generically bad hardening choice.
> 

We have not yet been reported a memory problem here. Do you have more details, if
this is really a problem for the s390 architecture?

Kind regards, Ursula 

> -Kees
> 
>>
>> For reference
>> https://bugzilla.suse.com/show_bug.cgi?id=1156053
>> the kernel hardening now triggers a warning.
>>
> 


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [kernel-hardening] [PATCH 09/38] usercopy: Mark kmalloc caches as usercopy caches
  2020-01-28 23:01               ` Kees Cook
  2020-01-29  9:26                 ` Ursula Braun
@ 2020-01-29 16:43                 ` Christopher Lameter
  2020-01-29 17:07                   ` Christian Borntraeger
  1 sibling, 1 reply; 28+ messages in thread
From: Christopher Lameter @ 2020-01-29 16:43 UTC (permalink / raw)
  To: Kees Cook
  Cc: Christian Borntraeger, Jiri Slaby, Julian Wiedmann, Ursula Braun,
	Alexander Viro, linux-kernel, David Windsor, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Andrew Morton, linux-mm, linux-xfs,
	Linus Torvalds, Andy Lutomirski, Christoph Hellwig,
	David S. Miller, Laura Abbott, Mark Rutland, Martin K. Petersen,
	Paolo Bonzini, Christoffer Dall, Dave Kleikamp, Jan Kara,
	Luis de Bethencourt, Marc Zyngier, Rik van Riel, Matthew Garrett,
	linux-fsdevel, linux-arch, netdev, kernel-hardening,
	Vlastimil Babka, Michal Kubecek

On Tue, 28 Jan 2020, Kees Cook wrote:

> > On the other hand not marking the DMA caches still seems questionable.
>
> My understanding is that exposing DMA memory to userspace copies can
> lead to unexpected results, especially for misbehaving hardware, so I'm
> not convinced this is a generically bad hardening choice.

"DMA" memory (and thus DMA caches) have nothing to do with DMA. Its a
legacy term. "DMA Memory" is memory limited to a certain
physical address boundary (old restrictions on certain devices only
supporting a limited number of address bits).

DMA can be done to NORMAL memory as well.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [kernel-hardening] [PATCH 09/38] usercopy: Mark kmalloc caches as usercopy caches
  2020-01-29 16:43                 ` Christopher Lameter
@ 2020-01-29 17:07                   ` Christian Borntraeger
  2020-01-29 17:09                     ` Christoph Hellwig
  0 siblings, 1 reply; 28+ messages in thread
From: Christian Borntraeger @ 2020-01-29 17:07 UTC (permalink / raw)
  To: Christopher Lameter, Kees Cook
  Cc: Jiri Slaby, Julian Wiedmann, Ursula Braun, Alexander Viro,
	linux-kernel, David Windsor, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton, linux-mm, linux-xfs, Linus Torvalds,
	Andy Lutomirski, Christoph Hellwig, David S. Miller,
	Laura Abbott, Mark Rutland, Martin K. Petersen, Paolo Bonzini,
	Christoffer Dall, Dave Kleikamp, Jan Kara, Luis de Bethencourt,
	Marc Zyngier, Rik van Riel, Matthew Garrett, linux-fsdevel,
	linux-arch, netdev, kernel-hardening, Vlastimil Babka,
	Michal Kubecek



On 29.01.20 17:43, Christopher Lameter wrote:
> On Tue, 28 Jan 2020, Kees Cook wrote:
> 
>>> On the other hand not marking the DMA caches still seems questionable.
>>
>> My understanding is that exposing DMA memory to userspace copies can
>> lead to unexpected results, especially for misbehaving hardware, so I'm
>> not convinced this is a generically bad hardening choice.
> 
> "DMA" memory (and thus DMA caches) have nothing to do with DMA. Its a
> legacy term. "DMA Memory" is memory limited to a certain
> physical address boundary (old restrictions on certain devices only
> supporting a limited number of address bits).
> 
> DMA can be done to NORMAL memory as well.

Exactly. 
I think iucv uses GFP_DMA because z/VM needs those buffers to reside below 2GB (which is ZONA_DMA for s390).


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [kernel-hardening] [PATCH 09/38] usercopy: Mark kmalloc caches as usercopy caches
  2020-01-29 17:07                   ` Christian Borntraeger
@ 2020-01-29 17:09                     ` Christoph Hellwig
  2020-01-29 17:19                       ` Christian Borntraeger
  0 siblings, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2020-01-29 17:09 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Christopher Lameter, Kees Cook, Jiri Slaby, Julian Wiedmann,
	Ursula Braun, Alexander Viro, linux-kernel, David Windsor,
	Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton,
	linux-mm, linux-xfs, Linus Torvalds, Andy Lutomirski,
	Christoph Hellwig, David S. Miller, Laura Abbott, Mark Rutland,
	Martin K. Petersen, Paolo Bonzini, Christoffer Dall,
	Dave Kleikamp, Jan Kara, Luis de Bethencourt, Marc Zyngier,
	Rik van Riel, Matthew Garrett, linux-fsdevel, linux-arch, netdev,
	kernel-hardening, Vlastimil Babka, Michal Kubecek

On Wed, Jan 29, 2020 at 06:07:14PM +0100, Christian Borntraeger wrote:
> > DMA can be done to NORMAL memory as well.
> 
> Exactly. 
> I think iucv uses GFP_DMA because z/VM needs those buffers to reside below 2GB (which is ZONA_DMA for s390).

The normal way to allocate memory with addressing limits would be to
use dma_alloc_coherent and friends.  Any chance to switch iucv over to
that?  Or is there no device associated with it?

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [kernel-hardening] [PATCH 09/38] usercopy: Mark kmalloc caches as usercopy caches
  2020-01-29 17:09                     ` Christoph Hellwig
@ 2020-01-29 17:19                       ` Christian Borntraeger
  2020-01-30 19:23                         ` Kees Cook
  2020-02-03 17:36                         ` Christoph Hellwig
  0 siblings, 2 replies; 28+ messages in thread
From: Christian Borntraeger @ 2020-01-29 17:19 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Christopher Lameter, Kees Cook, Jiri Slaby, Julian Wiedmann,
	Ursula Braun, Alexander Viro, linux-kernel, David Windsor,
	Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton,
	linux-mm, linux-xfs, Linus Torvalds, Andy Lutomirski,
	David S. Miller, Laura Abbott, Mark Rutland, Martin K. Petersen,
	Paolo Bonzini, Christoffer Dall, Dave Kleikamp, Jan Kara,
	Luis de Bethencourt, Marc Zyngier, Rik van Riel, Matthew Garrett,
	linux-fsdevel, linux-arch, netdev, kernel-hardening,
	Vlastimil Babka, Michal Kubecek



On 29.01.20 18:09, Christoph Hellwig wrote:
> On Wed, Jan 29, 2020 at 06:07:14PM +0100, Christian Borntraeger wrote:
>>> DMA can be done to NORMAL memory as well.
>>
>> Exactly. 
>> I think iucv uses GFP_DMA because z/VM needs those buffers to reside below 2GB (which is ZONA_DMA for s390).
> 
> The normal way to allocate memory with addressing limits would be to
> use dma_alloc_coherent and friends.  Any chance to switch iucv over to
> that?  Or is there no device associated with it?

There is not necessarily a device for that. It is a hypervisor interface (an
instruction that is interpreted by z/VM). We do have the netiucv driver that
creates a virtual nic, but there is also AF_IUCV which works without a device.

But back to the original question: If we mark kmalloc caches as usercopy caches,
we should do the same for DMA kmalloc caches. As outlined by Christoph, this has
nothing to do with device DMA.





^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [kernel-hardening] [PATCH 09/38] usercopy: Mark kmalloc caches as usercopy caches
  2020-01-29 17:19                       ` Christian Borntraeger
@ 2020-01-30 19:23                         ` Kees Cook
  2020-01-31 12:03                           ` Jann Horn
  2020-02-03 17:38                           ` Christoph Hellwig
  2020-02-03 17:36                         ` Christoph Hellwig
  1 sibling, 2 replies; 28+ messages in thread
From: Kees Cook @ 2020-01-30 19:23 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Christoph Hellwig, Christopher Lameter, Jiri Slaby,
	Julian Wiedmann, Ursula Braun, Alexander Viro, linux-kernel,
	David Windsor, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, linux-mm, linux-xfs, Linus Torvalds,
	Andy Lutomirski, David S. Miller, Laura Abbott, Mark Rutland,
	Martin K. Petersen, Paolo Bonzini, Christoffer Dall,
	Dave Kleikamp, Jan Kara, Luis de Bethencourt, Marc Zyngier,
	Rik van Riel, Matthew Garrett, linux-fsdevel, linux-arch, netdev,
	kernel-hardening, Vlastimil Babka, Michal Kubecek

On Wed, Jan 29, 2020 at 06:19:56PM +0100, Christian Borntraeger wrote:
> On 29.01.20 18:09, Christoph Hellwig wrote:
> > On Wed, Jan 29, 2020 at 06:07:14PM +0100, Christian Borntraeger wrote:
> >>> DMA can be done to NORMAL memory as well.
> >>
> >> Exactly. 
> >> I think iucv uses GFP_DMA because z/VM needs those buffers to reside below 2GB (which is ZONA_DMA for s390).
> > 
> > The normal way to allocate memory with addressing limits would be to
> > use dma_alloc_coherent and friends.  Any chance to switch iucv over to
> > that?  Or is there no device associated with it?
> 
> There is not necessarily a device for that. It is a hypervisor interface (an
> instruction that is interpreted by z/VM). We do have the netiucv driver that
> creates a virtual nic, but there is also AF_IUCV which works without a device.
> 
> But back to the original question: If we mark kmalloc caches as usercopy caches,
> we should do the same for DMA kmalloc caches. As outlined by Christoph, this has
> nothing to do with device DMA.

Hm, looks like it's allocated from the low 16MB. Seems like poor naming!
:) There seems to be a LOT of stuff using GFP_DMA, and it seems unlikely
those are all expecting low addresses?

Since this has only been a problem on s390, should just s390 gain the
weakening of the usercopy restriction?  Something like:


diff --git a/mm/slab_common.c b/mm/slab_common.c
index 1907cb2903c7..c5bbc141f20b 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -1303,7 +1303,9 @@ void __init create_kmalloc_caches(slab_flags_t flags)
 			kmalloc_caches[KMALLOC_DMA][i] = create_kmalloc_cache(
 				kmalloc_info[i].name[KMALLOC_DMA],
 				kmalloc_info[i].size,
-				SLAB_CACHE_DMA | flags, 0, 0);
+				SLAB_CACHE_DMA | flags, 0,
+				IS_ENABLED(CONFIG_S390) ?
+					kmalloc_info[i].size : 0);
 		}
 	}
 #endif



-- 
Kees Cook

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [kernel-hardening] [PATCH 09/38] usercopy: Mark kmalloc caches as usercopy caches
  2020-01-30 19:23                         ` Kees Cook
@ 2020-01-31 12:03                           ` Jann Horn
  2020-02-01 17:56                             ` Kees Cook
  2020-04-07  8:00                             ` Vlastimil Babka
  2020-02-03 17:38                           ` Christoph Hellwig
  1 sibling, 2 replies; 28+ messages in thread
From: Jann Horn @ 2020-01-31 12:03 UTC (permalink / raw)
  To: Kees Cook
  Cc: Christian Borntraeger, Christoph Hellwig, Christopher Lameter,
	Jiri Slaby, Julian Wiedmann, Ursula Braun, Alexander Viro,
	kernel list, David Windsor, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton, Linux-MM, linux-xfs, Linus Torvalds,
	Andy Lutomirski, David S. Miller, Laura Abbott, Mark Rutland,
	Martin K. Petersen, Paolo Bonzini, Christoffer Dall,
	Dave Kleikamp, Jan Kara, Luis de Bethencourt, Marc Zyngier,
	Rik van Riel, Matthew Garrett, linux-fsdevel, linux-arch,
	Network Development, Kernel Hardening, Vlastimil Babka,
	Michal Kubecek

On Thu, Jan 30, 2020 at 8:23 PM Kees Cook <keescook@chromium.org> wrote:
> On Wed, Jan 29, 2020 at 06:19:56PM +0100, Christian Borntraeger wrote:
> > On 29.01.20 18:09, Christoph Hellwig wrote:
> > > On Wed, Jan 29, 2020 at 06:07:14PM +0100, Christian Borntraeger wrote:
> > >>> DMA can be done to NORMAL memory as well.
> > >>
> > >> Exactly.
> > >> I think iucv uses GFP_DMA because z/VM needs those buffers to reside below 2GB (which is ZONA_DMA for s390).
> > >
> > > The normal way to allocate memory with addressing limits would be to
> > > use dma_alloc_coherent and friends.  Any chance to switch iucv over to
> > > that?  Or is there no device associated with it?
> >
> > There is not necessarily a device for that. It is a hypervisor interface (an
> > instruction that is interpreted by z/VM). We do have the netiucv driver that
> > creates a virtual nic, but there is also AF_IUCV which works without a device.
> >
> > But back to the original question: If we mark kmalloc caches as usercopy caches,
> > we should do the same for DMA kmalloc caches. As outlined by Christoph, this has
> > nothing to do with device DMA.
>
> Hm, looks like it's allocated from the low 16MB. Seems like poor naming!

The physical address limit of the DMA zone depends on the architecture
(and the kernel version); e.g. on Linux 4.4 on arm64 (which is used on
the Pixel 2), the DMA zone goes up to 4GiB. Later, arm64 started using
the DMA32 zone for that instead (as was already the case on e.g.
X86-64); but recently (commit 1a8e1cef7603), arm64 started using the
DMA zone again, but now for up to 1GiB. (AFAICS the DMA32 zone can't
be used with kmalloc at all, that only works with the DMA zone.)

> :) There seems to be a LOT of stuff using GFP_DMA, and it seems unlikely
> those are all expecting low addresses?

I think there's a bunch of (especially really old) hardware where the
hardware can only talk to low physical addresses, e.g. stuff that uses
the ISA bus.

However, there aren't *that* many users of GFP_DMA that actually cause
kmalloc allocations with GFP_DMA - many of the users of GFP_DMA
actually just pass that flag to dma_alloc_coherent()/dma_pool_alloc(),
where it is filtered away and the allocation ultimately doesn't go
through the slab allocator AFAICS, or they pass it directly to the
page allocator, where it has no effect on the usercopy stuff. Looking
on my workstation, there are zero objects allocated in dma-kmalloc-*
slabs:

/sys/kernel/slab# for name in dma-kmalloc-*; do echo "$name: $(cat
$name/objects)"; done
dma-kmalloc-128: 0
dma-kmalloc-16: 0
dma-kmalloc-192: 0
dma-kmalloc-1k: 0
dma-kmalloc-256: 0
dma-kmalloc-2k: 0
dma-kmalloc-32: 0
dma-kmalloc-4k: 0
dma-kmalloc-512: 0
dma-kmalloc-64: 0
dma-kmalloc-8: 0
dma-kmalloc-8k: 0
dma-kmalloc-96: 0

On a Pixel 2, there are a whole five objects allocated across the DMA
zone kmalloc caches:

walleye:/sys/kernel/slab # for name in dma-kmalloc-*; do echo "$name:
$(cat $name/objects)"; done
dma-kmalloc-1024: 0
dma-kmalloc-128: 0
dma-kmalloc-2048: 2
dma-kmalloc-256: 0
dma-kmalloc-4096: 3
dma-kmalloc-512: 0
dma-kmalloc-8192: 0

> Since this has only been a problem on s390, should just s390 gain the
> weakening of the usercopy restriction?  Something like:
>
>
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 1907cb2903c7..c5bbc141f20b 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -1303,7 +1303,9 @@ void __init create_kmalloc_caches(slab_flags_t flags)
>                         kmalloc_caches[KMALLOC_DMA][i] = create_kmalloc_cache(
>                                 kmalloc_info[i].name[KMALLOC_DMA],
>                                 kmalloc_info[i].size,
> -                               SLAB_CACHE_DMA | flags, 0, 0);
> +                               SLAB_CACHE_DMA | flags, 0,
> +                               IS_ENABLED(CONFIG_S390) ?
> +                                       kmalloc_info[i].size : 0);
>                 }
>         }
>  #endif

I think dma-kmalloc slabs should be handled the same way as normal
kmalloc slabs. When a dma-kmalloc allocation is freshly created, it is
just normal kernel memory - even if it might later be used for DMA -,
and it should be perfectly fine to copy_from_user() into such
allocations at that point, and to copy_to_user() out of them at the
end. If you look at the places where such allocations are created, you
can see things like kmemdup(), memcpy() and so on - all normal
operations that shouldn't conceptually be different from usercopy in
any relevant way.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [kernel-hardening] [PATCH 09/38] usercopy: Mark kmalloc caches as usercopy caches
  2020-01-31 12:03                           ` Jann Horn
@ 2020-02-01 17:56                             ` Kees Cook
  2020-02-01 19:27                               ` Jann Horn
  2020-02-03 17:20                               ` Christopher Lameter
  2020-04-07  8:00                             ` Vlastimil Babka
  1 sibling, 2 replies; 28+ messages in thread
From: Kees Cook @ 2020-02-01 17:56 UTC (permalink / raw)
  To: Jann Horn
  Cc: Christian Borntraeger, Christoph Hellwig, Christopher Lameter,
	Jiri Slaby, Julian Wiedmann, Ursula Braun, Alexander Viro,
	kernel list, David Windsor, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton, Linux-MM, linux-xfs, Linus Torvalds,
	Andy Lutomirski, David S. Miller, Laura Abbott, Mark Rutland,
	Martin K. Petersen, Paolo Bonzini, Christoffer Dall,
	Dave Kleikamp, Jan Kara, Luis de Bethencourt, Marc Zyngier,
	Rik van Riel, Matthew Garrett, linux-fsdevel, linux-arch,
	Network Development, Kernel Hardening, Vlastimil Babka,
	Michal Kubecek

On Fri, Jan 31, 2020 at 01:03:40PM +0100, Jann Horn wrote:
> I think dma-kmalloc slabs should be handled the same way as normal
> kmalloc slabs. When a dma-kmalloc allocation is freshly created, it is
> just normal kernel memory - even if it might later be used for DMA -,
> and it should be perfectly fine to copy_from_user() into such
> allocations at that point, and to copy_to_user() out of them at the
> end. If you look at the places where such allocations are created, you
> can see things like kmemdup(), memcpy() and so on - all normal
> operations that shouldn't conceptually be different from usercopy in
> any relevant way.

I can't find where the address limit for dma-kmalloc is implemented.

As to whitelisting all of dma-kmalloc -- I guess I can be talked into
it. It still seems like the memory used for direct hardware
communication shouldn't be exposed to userspace, but it we're dealing
with packet data, etc, then it makes sense not to have to have bounce
buffers, etc.

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [kernel-hardening] [PATCH 09/38] usercopy: Mark kmalloc caches as usercopy caches
  2020-02-01 17:56                             ` Kees Cook
@ 2020-02-01 19:27                               ` Jann Horn
  2020-02-03  7:46                                 ` Matthew Wilcox
  2020-02-03 17:20                               ` Christopher Lameter
  1 sibling, 1 reply; 28+ messages in thread
From: Jann Horn @ 2020-02-01 19:27 UTC (permalink / raw)
  To: Kees Cook
  Cc: Christian Borntraeger, Christoph Hellwig, Christopher Lameter,
	Jiri Slaby, Julian Wiedmann, Ursula Braun, Alexander Viro,
	kernel list, David Windsor, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton, Linux-MM, linux-xfs, Linus Torvalds,
	Andy Lutomirski, David S. Miller, Laura Abbott, Mark Rutland,
	Martin K. Petersen, Paolo Bonzini, Dave Kleikamp, Jan Kara,
	Marc Zyngier, Matthew Garrett, linux-fsdevel, linux-arch,
	Network Development, Kernel Hardening, Vlastimil Babka,
	Michal Kubecek

[pruned bogus addresses from recipient list]

On Sat, Feb 1, 2020 at 6:56 PM Kees Cook <keescook@chromium.org> wrote:
> On Fri, Jan 31, 2020 at 01:03:40PM +0100, Jann Horn wrote:
> > I think dma-kmalloc slabs should be handled the same way as normal
> > kmalloc slabs. When a dma-kmalloc allocation is freshly created, it is
> > just normal kernel memory - even if it might later be used for DMA -,
> > and it should be perfectly fine to copy_from_user() into such
> > allocations at that point, and to copy_to_user() out of them at the
> > end. If you look at the places where such allocations are created, you
> > can see things like kmemdup(), memcpy() and so on - all normal
> > operations that shouldn't conceptually be different from usercopy in
> > any relevant way.
>
> I can't find where the address limit for dma-kmalloc is implemented.

dma-kmalloc is a slab that uses GFP_DMA pages.

Things have changed a bit through the kernel versions, but in current
mainline, the zone limit for GFP_DMA is reported from arch code to
generic code via zone_dma_bits, from where it is used to decide which
zones should be used for allocations based on the address limit of a
given device:

kernel/dma/direct.c:
/*
 * Most architectures use ZONE_DMA for the first 16 Megabytes, but some use it
 * it for entirely different regions. In that case the arch code needs to
 * override the variable below for dma-direct to work properly.
 */
unsigned int zone_dma_bits __ro_after_init = 24;
[...]
static gfp_t __dma_direct_optimal_gfp_mask(struct device *dev, u64 dma_mask,
                u64 *phys_limit)
{
[...]
        /*
         * Optimistically try the zone that the physical address mask falls
         * into first.  If that returns memory that isn't actually addressable
         * we will fallback to the next lower zone and try again.
         *
         * Note that GFP_DMA32 and GFP_DMA are no ops without the corresponding
         * zones.
         */
        if (*phys_limit <= DMA_BIT_MASK(zone_dma_bits))
                return GFP_DMA;
        if (*phys_limit <= DMA_BIT_MASK(32))
                return GFP_DMA32;
        return 0;
}


There are only a few architectures that override the limit:

powerpc:
        /*
         * Allow 30-bit DMA for very limited Broadcom wifi chips on many
         * powerbooks.
         */
        if (IS_ENABLED(CONFIG_PPC32))
                zone_dma_bits = 30;
        else
                zone_dma_bits = 31;

s390:
        zone_dma_bits = 31;

and arm64:
#define ARM64_ZONE_DMA_BITS     30
[...]
        if (IS_ENABLED(CONFIG_ZONE_DMA)) {
                zone_dma_bits = ARM64_ZONE_DMA_BITS;
                arm64_dma_phys_limit = max_zone_phys(ARM64_ZONE_DMA_BITS);
        }


The actual categorization of page ranges into zones happens via
free_area_init_nodes() or free_area_init_node(); these are provided
with arrays of maximum physical addresses or zone sizes (depending on
which of them is called) by arch-specific code.
For arm64, the caller is zone_sizes_init(). X86 does it in zone_sizes_init().

> As to whitelisting all of dma-kmalloc -- I guess I can be talked into
> it. It still seems like the memory used for direct hardware
> communication shouldn't be exposed to userspace, but it we're dealing
> with packet data, etc, then it makes sense not to have to have bounce
> buffers, etc.

FWIW, as far as I understand, usercopy doesn't actually have any
effect on drivers that use the modern, proper APIs, since those don't
use the slab allocator at all - as I pointed out in my last mail, the
dma-kmalloc* slabs are used very rarely. (Which is good, because
putting objects from less-than-page-size slabs into iommu entries is a
terrible idea from a security and reliability perspective because it
gives the hardware access to completely unrelated memory.) Instead,
they get pages from the page allocator, and these pages may e.g. be
allocated from the DMA, DMA32 or NORMAL zones depending on the
restrictions imposed by hardware. So I think the usercopy restriction
only affects a few oddball drivers (like this s390 stuff), which is
why you're not seeing more bug reports caused by this.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [kernel-hardening] [PATCH 09/38] usercopy: Mark kmalloc caches as usercopy caches
  2020-02-01 19:27                               ` Jann Horn
@ 2020-02-03  7:46                                 ` Matthew Wilcox
  2020-02-03 17:41                                   ` Christoph Hellwig
  0 siblings, 1 reply; 28+ messages in thread
From: Matthew Wilcox @ 2020-02-03  7:46 UTC (permalink / raw)
  To: Jann Horn
  Cc: Kees Cook, Christian Borntraeger, Christoph Hellwig,
	Christopher Lameter, Jiri Slaby, Julian Wiedmann, Ursula Braun,
	Alexander Viro, kernel list, David Windsor, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Andrew Morton, Linux-MM, linux-xfs,
	Linus Torvalds, Andy Lutomirski, David S. Miller, Laura Abbott,
	Mark Rutland, Martin K. Petersen, Paolo Bonzini, Dave Kleikamp,
	Jan Kara, Marc Zyngier, Matthew Garrett, linux-fsdevel,
	linux-arch, Network Development, Kernel Hardening,
	Vlastimil Babka, Michal Kubecek

On Sat, Feb 01, 2020 at 08:27:49PM +0100, Jann Horn wrote:
> FWIW, as far as I understand, usercopy doesn't actually have any
> effect on drivers that use the modern, proper APIs, since those don't
> use the slab allocator at all - as I pointed out in my last mail, the
> dma-kmalloc* slabs are used very rarely. (Which is good, because
> putting objects from less-than-page-size slabs into iommu entries is a
> terrible idea from a security and reliability perspective because it
> gives the hardware access to completely unrelated memory.) Instead,
> they get pages from the page allocator, and these pages may e.g. be
> allocated from the DMA, DMA32 or NORMAL zones depending on the
> restrictions imposed by hardware. So I think the usercopy restriction
> only affects a few oddball drivers (like this s390 stuff), which is
> why you're not seeing more bug reports caused by this.

Getting pages from the page allocator is true for dma_alloc_coherent()
and friends.  But it's not true for streaming DMA mappings (dma_map_*)
for which the memory usually comes from kmalloc().  If this is something
we want to fix (and I have an awful feeling we're going to regret it
if we say "no, we trust the hardware"), we're going to have to come up
with a new memory allocation API for these cases.  Or bounce bugger the
memory for devices we don't trust.

The problem with the dma_map_* API is that memory might end up being
allocated once and then used multiple times by different drivers.  eg if
I allocate an NFS packet, it might get sent first to eth0, then (when the
route fails) sent to eth1.  Similarly in storage, a RAID-5 driver might
map the same memory several times to send to different disk controllers.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [kernel-hardening] [PATCH 09/38] usercopy: Mark kmalloc caches as usercopy caches
  2020-02-01 17:56                             ` Kees Cook
  2020-02-01 19:27                               ` Jann Horn
@ 2020-02-03 17:20                               ` Christopher Lameter
  1 sibling, 0 replies; 28+ messages in thread
From: Christopher Lameter @ 2020-02-03 17:20 UTC (permalink / raw)
  To: Kees Cook
  Cc: Jann Horn, Christian Borntraeger, Christoph Hellwig, Jiri Slaby,
	Julian Wiedmann, Ursula Braun, Alexander Viro, kernel list,
	David Windsor, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Linux-MM, linux-xfs, Linus Torvalds,
	Andy Lutomirski, David S. Miller, Laura Abbott, Mark Rutland,
	Martin K. Petersen, Paolo Bonzini, Christoffer Dall,
	Dave Kleikamp, Jan Kara, Luis de Bethencourt, Marc Zyngier,
	Rik van Riel, Matthew Garrett, linux-fsdevel, linux-arch,
	Network Development, Kernel Hardening, Vlastimil Babka,
	Michal Kubecek

On Sat, 1 Feb 2020, Kees Cook wrote:
>
> I can't find where the address limit for dma-kmalloc is implemented.

include/linux/mmzones.h

enum zone_type {
        /*
         * ZONE_DMA and ZONE_DMA32 are used when there are peripherals not able
         * to DMA to all of the addressable memory (ZONE_NORMAL).
         * On architectures where this area covers the whole 32 bit address
         * space ZONE_DMA32 is used. ZONE_DMA is left for the ones with smaller
         * DMA addressing constraints. This distinction is important as a 32bit
         * DMA mask is assumed when ZONE_DMA32 is defined. Some 64-bit
         * platforms may need both zones as they support peripherals with
         * different DMA addressing limitations.
         *
         * Some examples:
         *
         *  - i386 and x86_64 have a fixed 16M ZONE_DMA and ZONE_DMA32 for the
         *    rest of the lower 4G.
         *
         *  - arm only uses ZONE_DMA, the size, up to 4G, may vary depending on
         *    the specific device.
         *
         *  - arm64 has a fixed 1G ZONE_DMA and ZONE_DMA32 for the rest of the
         *    lower 4G.
         *
         *  - powerpc only uses ZONE_DMA, the size, up to 2G, may vary
         *    depending on the specific device.
         *
         *  - s390 uses ZONE_DMA fixed to the lower 2G.
         *
         *  - ia64 and riscv only use ZONE_DMA32.
         *
         *  - parisc uses neither.
         */
#ifdef CONFIG_ZONE_DMA
        ZONE_DMA,
#endif
#ifdef CONFIG_ZONE_DMA32
        ZONE_DMA32,
#endif
        /*
         * Normal addressable memory is in ZONE_NORMAL. DMA operations can
be
         * performed on pages in ZONE_NORMAL if the DMA devices support
         * transfers to all addressable memory.
         */
        ZONE_NORMAL,
#ifdef CONFIG_HIGHMEM
        /*
         * A memory area that is only addressable by the kernel through
         * mapping portions into its own address space. This is for example
         * used by i386 to allow the kernel to address the memory beyond
         * 900MB. The kernel will set up special mappings (page
         * table entries on i386) for each page that the kernel needs to
         * access.
         */
        ZONE_HIGHMEM,
#endif
        ZONE_MOVABLE,
#ifdef CONFIG_ZONE_DEVICE
        ZONE_DEVICE,
#endif
        __MAX_NR_ZONES

};


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [kernel-hardening] [PATCH 09/38] usercopy: Mark kmalloc caches as usercopy caches
  2020-01-29 17:19                       ` Christian Borntraeger
  2020-01-30 19:23                         ` Kees Cook
@ 2020-02-03 17:36                         ` Christoph Hellwig
  1 sibling, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2020-02-03 17:36 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Christoph Hellwig, Christopher Lameter, Kees Cook, Jiri Slaby,
	Julian Wiedmann, Ursula Braun, Alexander Viro, linux-kernel,
	David Windsor, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, linux-mm, linux-xfs, Linus Torvalds,
	Andy Lutomirski, David S. Miller, Laura Abbott, Mark Rutland,
	Martin K. Petersen, Paolo Bonzini, Christoffer Dall,
	Dave Kleikamp, Jan Kara, Luis de Bethencourt, Marc Zyngier,
	Rik van Riel, Matthew Garrett, linux-fsdevel, linux-arch, netdev,
	kernel-hardening, Vlastimil Babka, Michal Kubecek

On Wed, Jan 29, 2020 at 06:19:56PM +0100, Christian Borntraeger wrote:
> There is not necessarily a device for that. It is a hypervisor interface (an
> instruction that is interpreted by z/VM). We do have the netiucv driver that
> creates a virtual nic, but there is also AF_IUCV which works without a device.
> 
> But back to the original question: If we mark kmalloc caches as usercopy caches,
> we should do the same for DMA kmalloc caches. As outlined by Christoph, this has
> nothing to do with device DMA.

Oh well, s/390 with its weird mix of cpu and I/O again.  Everywhere else
where we have addressing limits we do treat that as a DMA address.

We've also had a bit of a lose plan to force ZONE_DMA as a public
interface out, as it is generally the wrong thing to do for drivers.
A ZONE_32 and/or ZONE_31 makes some sense as the backing for the
dma allocator, but it mostly shouldn't be exposed, especially not to
the slab allocator.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [kernel-hardening] [PATCH 09/38] usercopy: Mark kmalloc caches as usercopy caches
  2020-01-30 19:23                         ` Kees Cook
  2020-01-31 12:03                           ` Jann Horn
@ 2020-02-03 17:38                           ` Christoph Hellwig
  1 sibling, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2020-02-03 17:38 UTC (permalink / raw)
  To: Kees Cook
  Cc: Christian Borntraeger, Christoph Hellwig, Christopher Lameter,
	Jiri Slaby, Julian Wiedmann, Ursula Braun, Alexander Viro,
	linux-kernel, David Windsor, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton, linux-mm, linux-xfs, Linus Torvalds,
	Andy Lutomirski, David S. Miller, Laura Abbott, Mark Rutland,
	Martin K. Petersen, Paolo Bonzini, Christoffer Dall,
	Dave Kleikamp, Jan Kara, Luis de Bethencourt, Marc Zyngier,
	Rik van Riel, Matthew Garrett, linux-fsdevel, linux-arch, netdev,
	kernel-hardening, Vlastimil Babka, Michal Kubecek

On Thu, Jan 30, 2020 at 11:23:38AM -0800, Kees Cook wrote:
> Hm, looks like it's allocated from the low 16MB. Seems like poor naming!
> :) There seems to be a LOT of stuff using GFP_DMA, and it seems unlikely
> those are all expecting low addresses?

Most of that is either completely or partially bogus.  Besides the
weird S/390 stuff pretty much everything should be using the dma
allocators.  A couple really messed up drivers even pass GFP_DMA*
to the dma allocator, but my patches to fix that up seem to be stuck.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [kernel-hardening] [PATCH 09/38] usercopy: Mark kmalloc caches as usercopy caches
  2020-02-03  7:46                                 ` Matthew Wilcox
@ 2020-02-03 17:41                                   ` Christoph Hellwig
  0 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2020-02-03 17:41 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Jann Horn, Kees Cook, Christian Borntraeger, Christoph Hellwig,
	Christopher Lameter, Jiri Slaby, Julian Wiedmann, Ursula Braun,
	Alexander Viro, kernel list, David Windsor, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Andrew Morton, Linux-MM, linux-xfs,
	Linus Torvalds, Andy Lutomirski, David S. Miller, Laura Abbott,
	Mark Rutland, Martin K. Petersen, Paolo Bonzini, Dave Kleikamp,
	Jan Kara, Marc Zyngier, Matthew Garrett, linux-fsdevel,
	linux-arch, Network Development, Kernel Hardening,
	Vlastimil Babka, Michal Kubecek

On Sun, Feb 02, 2020 at 11:46:44PM -0800, Matthew Wilcox wrote:
> > gives the hardware access to completely unrelated memory.) Instead,
> > they get pages from the page allocator, and these pages may e.g. be
> > allocated from the DMA, DMA32 or NORMAL zones depending on the
> > restrictions imposed by hardware. So I think the usercopy restriction
> > only affects a few oddball drivers (like this s390 stuff), which is
> > why you're not seeing more bug reports caused by this.
> 
> Getting pages from the page allocator is true for dma_alloc_coherent()
> and friends.

dma_alloc_coherent also has a few other memory sources than the page
allocator..

> But it's not true for streaming DMA mappings (dma_map_*)
> for which the memory usually comes from kmalloc().  If this is something
> we want to fix (and I have an awful feeling we're going to regret it
> if we say "no, we trust the hardware"), we're going to have to come up
> with a new memory allocation API for these cases.  Or bounce bugger the
> memory for devices we don't trust.

There aren't too many places that use slab allocations for streaming
mappings, and then do usercopies into them.  But I vaguely remember
some usb code getting the annotations for that a while ago.

But if you don't trust your hardware you will need to use IOMMUs and
page aligned memory, or IOMMUs plus bounce buffering for the kmalloc
allocations (we've recently grown code to do that for the intel-iommu
driver, which needs to be lifted into the generic IOMMU code).

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [kernel-hardening] [PATCH 09/38] usercopy: Mark kmalloc caches as usercopy caches
  2020-01-31 12:03                           ` Jann Horn
  2020-02-01 17:56                             ` Kees Cook
@ 2020-04-07  8:00                             ` Vlastimil Babka
  2020-04-07 11:05                               ` Christian Borntraeger
  2020-04-20  7:53                               ` Jiri Slaby
  1 sibling, 2 replies; 28+ messages in thread
From: Vlastimil Babka @ 2020-04-07  8:00 UTC (permalink / raw)
  To: Jann Horn, Kees Cook
  Cc: Christian Borntraeger, Christoph Hellwig, Christopher Lameter,
	Jiri Slaby, Julian Wiedmann, Ursula Braun, Alexander Viro,
	kernel list, David Windsor, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton, Linux-MM, linux-xfs, Linus Torvalds,
	Andy Lutomirski, David S. Miller, Laura Abbott, Mark Rutland,
	Martin K. Petersen, Paolo Bonzini, Christoffer Dall,
	Dave Kleikamp, Jan Kara, Luis de Bethencourt, Marc Zyngier,
	Rik van Riel, Matthew Garrett, linux-fsdevel, linux-arch,
	Network Development, Kernel Hardening, Michal Kubecek

On 1/31/20 1:03 PM, Jann Horn wrote:

> I think dma-kmalloc slabs should be handled the same way as normal
> kmalloc slabs. When a dma-kmalloc allocation is freshly created, it is
> just normal kernel memory - even if it might later be used for DMA -,
> and it should be perfectly fine to copy_from_user() into such
> allocations at that point, and to copy_to_user() out of them at the
> end. If you look at the places where such allocations are created, you
> can see things like kmemdup(), memcpy() and so on - all normal
> operations that shouldn't conceptually be different from usercopy in
> any relevant way.
 
So, let's do that?

----8<----
From d5190e4e871689a530da3c3fd327be45a88f006a Mon Sep 17 00:00:00 2001
From: Vlastimil Babka <vbabka@suse.cz>
Date: Tue, 7 Apr 2020 09:58:00 +0200
Subject: [PATCH] usercopy: Mark dma-kmalloc caches as usercopy caches

We have seen a "usercopy: Kernel memory overwrite attempt detected to SLUB
object 'dma-kmalloc-1 k' (offset 0, size 11)!" error on s390x, as IUCV uses
kmalloc() with __GFP_DMA because of memory address restrictions.
The issue has been discussed [2] and it has been noted that if all the kmalloc
caches are marked as usercopy, there's little reason not to mark dma-kmalloc
caches too. The 'dma' part merely means that __GFP_DMA is used to restrict
memory address range.

As Jann Horn put it [3]:

"I think dma-kmalloc slabs should be handled the same way as normal
kmalloc slabs. When a dma-kmalloc allocation is freshly created, it is
just normal kernel memory - even if it might later be used for DMA -,
and it should be perfectly fine to copy_from_user() into such
allocations at that point, and to copy_to_user() out of them at the
end. If you look at the places where such allocations are created, you
can see things like kmemdup(), memcpy() and so on - all normal
operations that shouldn't conceptually be different from usercopy in
any relevant way."

Thus this patch marks the dma-kmalloc-* caches as usercopy.

[1] https://bugzilla.suse.com/show_bug.cgi?id=1156053
[2] https://lore.kernel.org/kernel-hardening/bfca96db-bbd0-d958-7732-76e36c667c68@suse.cz/
[3] https://lore.kernel.org/kernel-hardening/CAG48ez1a4waGk9kB0WLaSbs4muSoK0AYAVk8=XYaKj4_+6e6Hg@mail.gmail.com/

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/slab_common.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/slab_common.c b/mm/slab_common.c
index 5282f881d2f5..ae9486160594 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -1303,7 +1303,8 @@ void __init create_kmalloc_caches(slab_flags_t flags)
 			kmalloc_caches[KMALLOC_DMA][i] = create_kmalloc_cache(
 				kmalloc_info[i].name[KMALLOC_DMA],
 				kmalloc_info[i].size,
-				SLAB_CACHE_DMA | flags, 0, 0);
+				SLAB_CACHE_DMA | flags, 0,
+				kmalloc_info[i].size);
 		}
 	}
 #endif
-- 
2.26.0


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [kernel-hardening] [PATCH 09/38] usercopy: Mark kmalloc caches as usercopy caches
  2020-04-07  8:00                             ` Vlastimil Babka
@ 2020-04-07 11:05                               ` Christian Borntraeger
  2020-04-20  7:53                               ` Jiri Slaby
  1 sibling, 0 replies; 28+ messages in thread
From: Christian Borntraeger @ 2020-04-07 11:05 UTC (permalink / raw)
  To: Vlastimil Babka, Jann Horn, Kees Cook
  Cc: Christoph Hellwig, Christopher Lameter, Jiri Slaby,
	Julian Wiedmann, Ursula Braun, Alexander Viro, kernel list,
	David Windsor, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Linux-MM, linux-xfs, Linus Torvalds,
	Andy Lutomirski, David S. Miller, Laura Abbott, Mark Rutland,
	Martin K. Petersen, Paolo Bonzini, Christoffer Dall,
	Dave Kleikamp, Jan Kara, Luis de Bethencourt, Marc Zyngier,
	Rik van Riel, Matthew Garrett, linux-fsdevel, linux-arch,
	Network Development, Kernel Hardening, Michal Kubecek



On 07.04.20 10:00, Vlastimil Babka wrote:
> On 1/31/20 1:03 PM, Jann Horn wrote:
> 
>> I think dma-kmalloc slabs should be handled the same way as normal
>> kmalloc slabs. When a dma-kmalloc allocation is freshly created, it is
>> just normal kernel memory - even if it might later be used for DMA -,
>> and it should be perfectly fine to copy_from_user() into such
>> allocations at that point, and to copy_to_user() out of them at the
>> end. If you look at the places where such allocations are created, you
>> can see things like kmemdup(), memcpy() and so on - all normal
>> operations that shouldn't conceptually be different from usercopy in
>> any relevant way.
>  
> So, let's do that?
> 
> ----8<----
> From d5190e4e871689a530da3c3fd327be45a88f006a Mon Sep 17 00:00:00 2001
> From: Vlastimil Babka <vbabka@suse.cz>
> Date: Tue, 7 Apr 2020 09:58:00 +0200
> Subject: [PATCH] usercopy: Mark dma-kmalloc caches as usercopy caches
> 
> We have seen a "usercopy: Kernel memory overwrite attempt detected to SLUB
> object 'dma-kmalloc-1 k' (offset 0, size 11)!" error on s390x, as IUCV uses
> kmalloc() with __GFP_DMA because of memory address restrictions.
> The issue has been discussed [2] and it has been noted that if all the kmalloc
> caches are marked as usercopy, there's little reason not to mark dma-kmalloc
> caches too. The 'dma' part merely means that __GFP_DMA is used to restrict
> memory address range.
> 
> As Jann Horn put it [3]:
> 
> "I think dma-kmalloc slabs should be handled the same way as normal
> kmalloc slabs. When a dma-kmalloc allocation is freshly created, it is
> just normal kernel memory - even if it might later be used for DMA -,
> and it should be perfectly fine to copy_from_user() into such
> allocations at that point, and to copy_to_user() out of them at the
> end. If you look at the places where such allocations are created, you
> can see things like kmemdup(), memcpy() and so on - all normal
> operations that shouldn't conceptually be different from usercopy in
> any relevant way."
> 
> Thus this patch marks the dma-kmalloc-* caches as usercopy.
> 
> [1] https://bugzilla.suse.com/show_bug.cgi?id=1156053
> [2] https://lore.kernel.org/kernel-hardening/bfca96db-bbd0-d958-7732-76e36c667c68@suse.cz/
> [3] https://lore.kernel.org/kernel-hardening/CAG48ez1a4waGk9kB0WLaSbs4muSoK0AYAVk8=XYaKj4_+6e6Hg@mail.gmail.com/
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>

> ---
>  mm/slab_common.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 5282f881d2f5..ae9486160594 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -1303,7 +1303,8 @@ void __init create_kmalloc_caches(slab_flags_t flags)
>  			kmalloc_caches[KMALLOC_DMA][i] = create_kmalloc_cache(
>  				kmalloc_info[i].name[KMALLOC_DMA],
>  				kmalloc_info[i].size,
> -				SLAB_CACHE_DMA | flags, 0, 0);
> +				SLAB_CACHE_DMA | flags, 0,
> +				kmalloc_info[i].size);
>  		}
>  	}
>  #endif
> 


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [kernel-hardening] [PATCH 09/38] usercopy: Mark kmalloc caches as usercopy caches
  2020-04-07  8:00                             ` Vlastimil Babka
  2020-04-07 11:05                               ` Christian Borntraeger
@ 2020-04-20  7:53                               ` Jiri Slaby
  2020-04-20 17:43                                 ` Kees Cook
  1 sibling, 1 reply; 28+ messages in thread
From: Jiri Slaby @ 2020-04-20  7:53 UTC (permalink / raw)
  To: Vlastimil Babka, Jann Horn, Kees Cook
  Cc: Christian Borntraeger, Christoph Hellwig, Christopher Lameter,
	Julian Wiedmann, Ursula Braun, Alexander Viro, kernel list,
	David Windsor, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Linux-MM, linux-xfs, Linus Torvalds,
	Andy Lutomirski, David S. Miller, Laura Abbott, Mark Rutland,
	Martin K. Petersen, Paolo Bonzini, Christoffer Dall,
	Dave Kleikamp, Jan Kara, Luis de Bethencourt, Marc Zyngier,
	Rik van Riel, Matthew Garrett, linux-fsdevel, linux-arch,
	Network Development, Kernel Hardening, Michal Kubecek

On 07. 04. 20, 10:00, Vlastimil Babka wrote:
> From d5190e4e871689a530da3c3fd327be45a88f006a Mon Sep 17 00:00:00 2001
> From: Vlastimil Babka <vbabka@suse.cz>
> Date: Tue, 7 Apr 2020 09:58:00 +0200
> Subject: [PATCH] usercopy: Mark dma-kmalloc caches as usercopy caches
> 
> We have seen a "usercopy: Kernel memory overwrite attempt detected to SLUB
> object 'dma-kmalloc-1 k' (offset 0, size 11)!" error on s390x, as IUCV uses
> kmalloc() with __GFP_DMA because of memory address restrictions.
> The issue has been discussed [2] and it has been noted that if all the kmalloc
> caches are marked as usercopy, there's little reason not to mark dma-kmalloc
> caches too. The 'dma' part merely means that __GFP_DMA is used to restrict
> memory address range.
> 
> As Jann Horn put it [3]:
> 
> "I think dma-kmalloc slabs should be handled the same way as normal
> kmalloc slabs. When a dma-kmalloc allocation is freshly created, it is
> just normal kernel memory - even if it might later be used for DMA -,
> and it should be perfectly fine to copy_from_user() into such
> allocations at that point, and to copy_to_user() out of them at the
> end. If you look at the places where such allocations are created, you
> can see things like kmemdup(), memcpy() and so on - all normal
> operations that shouldn't conceptually be different from usercopy in
> any relevant way."
> 
> Thus this patch marks the dma-kmalloc-* caches as usercopy.
> 
> [1] https://bugzilla.suse.com/show_bug.cgi?id=1156053
> [2] https://lore.kernel.org/kernel-hardening/bfca96db-bbd0-d958-7732-76e36c667c68@suse.cz/
> [3] https://lore.kernel.org/kernel-hardening/CAG48ez1a4waGk9kB0WLaSbs4muSoK0AYAVk8=XYaKj4_+6e6Hg@mail.gmail.com/
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Friendly ping.

Acked-by: Jiri Slaby <jslaby@suse.cz>

> ---
>  mm/slab_common.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 5282f881d2f5..ae9486160594 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -1303,7 +1303,8 @@ void __init create_kmalloc_caches(slab_flags_t flags)
>  			kmalloc_caches[KMALLOC_DMA][i] = create_kmalloc_cache(
>  				kmalloc_info[i].name[KMALLOC_DMA],
>  				kmalloc_info[i].size,
> -				SLAB_CACHE_DMA | flags, 0, 0);
> +				SLAB_CACHE_DMA | flags, 0,
> +				kmalloc_info[i].size);
>  		}
>  	}
>  #endif
> 

thanks,
-- 
js
suse labs

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [kernel-hardening] [PATCH 09/38] usercopy: Mark kmalloc caches as usercopy caches
  2020-04-20  7:53                               ` Jiri Slaby
@ 2020-04-20 17:43                                 ` Kees Cook
  0 siblings, 0 replies; 28+ messages in thread
From: Kees Cook @ 2020-04-20 17:43 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: Vlastimil Babka, Jann Horn, Christian Borntraeger,
	Christoph Hellwig, Christopher Lameter, Julian Wiedmann,
	Ursula Braun, Alexander Viro, kernel list, David Windsor,
	Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton,
	Linux-MM, linux-xfs, Linus Torvalds, Andy Lutomirski,
	David S. Miller, Laura Abbott, Mark Rutland, Martin K. Petersen,
	Paolo Bonzini, Christoffer Dall, Dave Kleikamp, Jan Kara,
	Luis de Bethencourt, Marc Zyngier, Rik van Riel, Matthew Garrett,
	linux-fsdevel, linux-arch, Network Development, Kernel Hardening,
	Michal Kubecek

On Mon, Apr 20, 2020 at 09:53:20AM +0200, Jiri Slaby wrote:
> On 07. 04. 20, 10:00, Vlastimil Babka wrote:
> > From d5190e4e871689a530da3c3fd327be45a88f006a Mon Sep 17 00:00:00 2001
> > From: Vlastimil Babka <vbabka@suse.cz>
> > Date: Tue, 7 Apr 2020 09:58:00 +0200
> > Subject: [PATCH] usercopy: Mark dma-kmalloc caches as usercopy caches
> > 
> > We have seen a "usercopy: Kernel memory overwrite attempt detected to SLUB
> > object 'dma-kmalloc-1 k' (offset 0, size 11)!" error on s390x, as IUCV uses
> > kmalloc() with __GFP_DMA because of memory address restrictions.
> > The issue has been discussed [2] and it has been noted that if all the kmalloc
> > caches are marked as usercopy, there's little reason not to mark dma-kmalloc
> > caches too. The 'dma' part merely means that __GFP_DMA is used to restrict
> > memory address range.
> > 
> > As Jann Horn put it [3]:
> > 
> > "I think dma-kmalloc slabs should be handled the same way as normal
> > kmalloc slabs. When a dma-kmalloc allocation is freshly created, it is
> > just normal kernel memory - even if it might later be used for DMA -,
> > and it should be perfectly fine to copy_from_user() into such
> > allocations at that point, and to copy_to_user() out of them at the
> > end. If you look at the places where such allocations are created, you
> > can see things like kmemdup(), memcpy() and so on - all normal
> > operations that shouldn't conceptually be different from usercopy in
> > any relevant way."
> > 
> > Thus this patch marks the dma-kmalloc-* caches as usercopy.
> > 
> > [1] https://bugzilla.suse.com/show_bug.cgi?id=1156053
> > [2] https://lore.kernel.org/kernel-hardening/bfca96db-bbd0-d958-7732-76e36c667c68@suse.cz/
> > [3] https://lore.kernel.org/kernel-hardening/CAG48ez1a4waGk9kB0WLaSbs4muSoK0AYAVk8=XYaKj4_+6e6Hg@mail.gmail.com/
> > 
> > Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> 
> Friendly ping.
> 
> Acked-by: Jiri Slaby <jslaby@suse.cz>

Should this go via -mm?

-Kees

> 
> > ---
> >  mm/slab_common.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/mm/slab_common.c b/mm/slab_common.c
> > index 5282f881d2f5..ae9486160594 100644
> > --- a/mm/slab_common.c
> > +++ b/mm/slab_common.c
> > @@ -1303,7 +1303,8 @@ void __init create_kmalloc_caches(slab_flags_t flags)
> >  			kmalloc_caches[KMALLOC_DMA][i] = create_kmalloc_cache(
> >  				kmalloc_info[i].name[KMALLOC_DMA],
> >  				kmalloc_info[i].size,
> > -				SLAB_CACHE_DMA | flags, 0, 0);
> > +				SLAB_CACHE_DMA | flags, 0,
> > +				kmalloc_info[i].size);
> >  		}
> >  	}
> >  #endif
> > 
> 
> thanks,
> -- 
> js
> suse labs

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2020-04-20 17:43 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <1515636190-24061-1-git-send-email-keescook@chromium.org>
2018-01-11  2:02 ` [PATCH 06/38] usercopy: Prepare for usercopy whitelisting Kees Cook
2018-01-11  2:02 ` [PATCH 07/38] usercopy: WARN() on slab cache usercopy region violations Kees Cook
2018-01-11  2:02 ` [PATCH 09/38] usercopy: Mark kmalloc caches as usercopy caches Kees Cook
2019-11-12  7:17   ` [kernel-hardening] " Jiri Slaby
2019-11-12 21:21     ` Kees Cook
2019-11-14 21:27       ` Kees Cook
2020-01-23  8:14         ` Jiri Slaby
2020-01-27 23:19           ` Kees Cook
2020-01-28  7:58             ` Christian Borntraeger
2020-01-28 23:01               ` Kees Cook
2020-01-29  9:26                 ` Ursula Braun
2020-01-29 16:43                 ` Christopher Lameter
2020-01-29 17:07                   ` Christian Borntraeger
2020-01-29 17:09                     ` Christoph Hellwig
2020-01-29 17:19                       ` Christian Borntraeger
2020-01-30 19:23                         ` Kees Cook
2020-01-31 12:03                           ` Jann Horn
2020-02-01 17:56                             ` Kees Cook
2020-02-01 19:27                               ` Jann Horn
2020-02-03  7:46                                 ` Matthew Wilcox
2020-02-03 17:41                                   ` Christoph Hellwig
2020-02-03 17:20                               ` Christopher Lameter
2020-04-07  8:00                             ` Vlastimil Babka
2020-04-07 11:05                               ` Christian Borntraeger
2020-04-20  7:53                               ` Jiri Slaby
2020-04-20 17:43                                 ` Kees Cook
2020-02-03 17:38                           ` Christoph Hellwig
2020-02-03 17:36                         ` Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).