* [RFC v3] mm: ro protection for data allocated dynamically
@ 2017-05-19 10:38 ` Igor Stoppa
From: Igor Stoppa @ 2017-05-19 10:38 UTC
  To: mhocko, dave.hansen, labbott
  Cc: linux-mm, kernel-hardening, linux-kernel, Igor Stoppa

Not all dynamically allocated data needs to be altered frequently.
In some cases, it is written just once, at initialization.

This RFC aims to improve memory integrity by explicitly
write-protecting such data.

A reference implementation is provided.

During the previous 2 rounds, some concerns/questions were raised.
This iteration should address most of them, if not all.

Basic idea behind the implementation: on systems with an MMU, the MMU
supports associating various types of attributes with memory pages.

One of them is read-only.
The MMU will raise an exception upon any attempt to alter a read-only
page.
This mechanism is already in use for protecting kernel text and
constant data.
Relatively recently, it has also become possible to make statically
allocated data read-only, with the __ro_after_init annotation.
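
For reference, this is what the static case looks like (the annotation
is real, the variable name is made up for the example):

static struct my_params boot_params __ro_after_init;

Such a variable stays writable during boot and becomes read-only once
init completes.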

However, nothing is done for dynamically allocated variables.

The catch in reusing the same mechanism is that soon-to-be read-only
variables must be grouped into dedicated memory pages, without any RW
data falling in the same range.

This can be achieved with a dedicated allocator.

The proposed implementation allows the creation of memory pools.
Each pool can be treated independently of the others, allowing
fine-grained control over which data can be overwritten.

A pool is a kernel linked list, whose head contains a mutex used for
serializing access to the list; the elements are nodes, which provide
the memory actually used.

When a pool receives an allocation request for which it doesn't have
enough memory already available, it obtains a set of contiguous virtual
pages (node) that is large enough to cover the request being processed.
Such memory is likely to be significantly larger than what was required.
The slack is used to fulfill further allocation requests, provided
that they fit in the available space.

The pool ends up being a list of nodes, where each node contains a
request that, at the time it was received, could not be satisfied by
the existing nodes, plus other requests that happened to fit in the
slack. Within each node, requests are satisfied linearly, as if the
node were a small standalone pool.
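
To make the sizing concrete, here is a minimal userspace sketch of the
same arithmetic the allocator performs (the rounding macro and the word
size match the patch below; the 32-byte header is only a stand-in for
NODE_HEADER_SIZE on a 64-bit build):

#include <stdio.h>
#include <stdint.h>

#define PAGE_SIZE 4096UL
#define PAGE_MASK (~(PAGE_SIZE - 1))
/* Same rounding as the patch: any size, including 0, maps to >= 1 page. */
#define page_roundup(size) (((size) + !(size) - 1 + PAGE_SIZE) & PAGE_MASK)

typedef uint64_t align_t;

int main(void)
{
	unsigned long header = 32;   /* stand-in for NODE_HEADER_SIZE */
	unsigned long request = 100; /* bytes asked for by the caller */
	unsigned long words = (request + sizeof(align_t) - 1) / sizeof(align_t);
	unsigned long node = page_roundup(header + words * sizeof(align_t));

	/* Prints: words: 13, node size: 4096, slack: 3960 */
	printf("words: %lu, node size: %lu, slack: %lu\n",
	       words, node, node - header - words * sizeof(align_t));
	return 0;
}

A 100-byte request thus costs a full page up front, but leaves almost
4 KiB of slack for subsequent requests in the same pool.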

When it's time to seal/unseal a pool, each element (node) of the list is
visited and the range of pages it comprises is passed to set_memory_ro/rw.

Freeing memory is supported at pool level: if, for some reason, one or
more memory requests must be discarded at some point, they are simply
ignored. Upon pool teardown, the nodes are removed one by one and the
corresponding memory ranges are freed for good with vfree.

This approach avoids the extra complexity of tracking individual
allocations, yet it still allows pages to be claimed back when they are
no longer needed (e.g. on module unloading).
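
To make the intended lifecycle concrete, here is a minimal usage
sketch, based on the API from the patch below (my_config is a made-up
payload type). Note that the sketch unseals before destroying, since
smalloc_destroy() does not flip the pages back to RW by itself:

#include <linux/module.h>
#include <linux/init.h>
#include "smalloc.h"

struct my_config {		/* hypothetical payload */
	unsigned long flags;
	int mode;
};

static struct smalloc_pool *pool;
static struct my_config *cfg;

static int __init my_mod_init(void)
{
	pool = smalloc_create();
	if (!pool)
		return -ENOMEM;
	cfg = smalloc(sizeof(*cfg), pool); /* allowed while unsealed */
	if (!cfg) {
		smalloc_destroy(pool);
		return -ENOMEM;
	}
	cfg->flags = 1;
	cfg->mode = 42;
	smalloc_seal(pool);	/* from here on, writes to *cfg fault */
	return 0;
}

static void __exit my_mod_exit(void)
{
	smalloc_unseal(pool);	/* back to RW before freeing */
	smalloc_destroy(pool);	/* vfrees every node of the pool;
				 * NB: this version does not free the
				 * pool descriptor itself */
}

module_init(my_mod_init);
module_exit(my_mod_exit);
MODULE_LICENSE("GPL");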

The same design also supports isolation between different kernel modules:
each module can allocate one or more pools, to obtain the desired level of
granularity when managing portions of its data that need different handling.

The price of this flexibility is that some more slack is produced.
The exact amount depends on the sizes of the allocations performed and
on the order in which they arrive.

Modules that do not want/need all of this flexibility can use the default
global pool provided by the allocator.

This pool is intended to provide consistency with __ro_after_init and
therefore would be sealed at the same time.
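
In that case the call is simply (a NULL pool argument selects the
global pool, per the smalloc() implementation below; my_policy is a
made-up type):

static int __init my_policy_init(void)
{
	struct my_policy *policy = smalloc(sizeof(*policy), NULL);

	if (!policy)
		return -ENOMEM;
	/* fill in *policy before the global pool gets sealed */
	return 0;
}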

Some observations/questions:

* the backend of the memory allocation is vmalloc.
  Is there any better way? BPF uses module_alloc, but that does not
  seem to be exactly its purpose.

* because of the vmalloc backend, this is not suitable for cases where
  physically contiguous memory regions are really needed; however, the
  type of data that would use this interface is unlikely to require
  interaction with HW devices that could raise such a need.

* the allocator supports defining a preferred alignment (currently set
  to 8 bytes, using uint64_t) - is it useful/desirable?
  If yes, is it the correct granularity (global)? (See the sketch after
  this list.)

* to get the size of the padded header of a node, the current code uses
  __align(align_t) and it seems to work, but is it correct? (The sketch
  after this list shows the resulting numbers.)

* examples of uses for this new allocator:
  - LSM hooks
  - policy database of SELinux (several different structure types)
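
Regarding the two sizing questions above, this userspace sketch shows
how the numbers work out on a typical 64-bit build (node_header is a
stand-in mirroring NODE_HEADER from the patch, with struct list_head
modeled as two pointers):

#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

typedef uint64_t align_t;

struct node_header {
	struct { void *next, *prev; } list; /* models struct list_head */
	align_t *free;
	unsigned long available_words;
} __attribute__((aligned(sizeof(align_t))));

struct node {
	struct node_header hdr;
	align_t data[];
};

int main(void)
{
	/* A 20-byte request rounds up to 3 words, i.e. 24 bytes. */
	size_t words = (20 + sizeof(align_t) - 1) / sizeof(align_t);

	printf("padded header: %zu bytes\n", sizeof(struct node_header));
	printf("data starts at offset %zu\n", offsetof(struct node, data));
	printf("20-byte request reserves %zu bytes\n",
	       words * sizeof(align_t));
	return 0;
}

With 8-byte alignment, the padding is effectively a no-op here (the raw
header is already 32 bytes on a 64-bit build); a larger alignment would
grow both the header padding and the per-request round-up.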

Igor Stoppa (1):
  Sealable memory support

 mm/Makefile  |   2 +-
 mm/smalloc.c | 200 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 mm/smalloc.h |  61 ++++++++++++++++++
 3 files changed, 262 insertions(+), 1 deletion(-)
 create mode 100644 mm/smalloc.c
 create mode 100644 mm/smalloc.h

-- 
2.9.3

* [PATCH 1/1] Sealable memory support
  2017-05-19 10:38 ` Igor Stoppa
@ 2017-05-19 10:38   ` Igor Stoppa
From: Igor Stoppa @ 2017-05-19 10:38 UTC
  To: mhocko, dave.hansen, labbott
  Cc: linux-mm, kernel-hardening, linux-kernel, Igor Stoppa

Dynamically allocated variables can be made read-only after they have
been initialized, provided that they reside in memory pages devoid of
any RW data.

The implementation supplies the means to create independent pools of
memory, which can be individually sealed/unsealed and destroyed.

A global pool is made available for those kernel modules that do not
need to manage an independent pool.

Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com>
---
 mm/Makefile  |   2 +-
 mm/smalloc.c | 200 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 mm/smalloc.h |  61 ++++++++++++++++++
 3 files changed, 262 insertions(+), 1 deletion(-)
 create mode 100644 mm/smalloc.c
 create mode 100644 mm/smalloc.h

diff --git a/mm/Makefile b/mm/Makefile
index 026f6a8..737c42a 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -39,7 +39,7 @@ obj-y			:= filemap.o mempool.o oom_kill.o \
 			   mm_init.o mmu_context.o percpu.o slab_common.o \
 			   compaction.o vmacache.o swap_slots.o \
 			   interval_tree.o list_lru.o workingset.o \
-			   debug.o $(mmu-y)
+			   debug.o smalloc.o $(mmu-y)
 
 obj-y += init-mm.o
 
diff --git a/mm/smalloc.c b/mm/smalloc.c
new file mode 100644
index 0000000..fa04cc5
--- /dev/null
+++ b/mm/smalloc.c
@@ -0,0 +1,200 @@
+/*
+ * smalloc.c: Sealable Memory Allocator
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.stoppa@huawei.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; version 2
+ * of the License.
+ */
+
+#include <linux/module.h>
+#include <linux/printk.h>
+#include <linux/kobject.h>
+#include <linux/sysfs.h>
+#include <linux/init.h>
+#include <linux/fs.h>
+#include <linux/string.h>
+
+
+#include <linux/vmalloc.h>
+#include <asm/cacheflush.h>
+#include "smalloc.h"
+
+#define page_roundup(size) (((size) + !(size) - 1 + PAGE_SIZE) & PAGE_MASK)
+
+#define pages_nr(size) (page_roundup(size) / PAGE_SIZE)
+
+static struct smalloc_pool *global_pool;
+
+struct smalloc_node *__smalloc_create_node(unsigned long words)
+{
+	struct smalloc_node *node;
+	unsigned long size;
+
+	/* Calculate the size to ask from vmalloc, page aligned. */
+	size = page_roundup(NODE_HEADER_SIZE + words * sizeof(align_t));
+	node = vmalloc(size);
+	if (!node) {
+		pr_err("No memory for allocating smalloc node.");
+		return NULL;
+	}
+	/* Initialize the node.*/
+	INIT_LIST_HEAD(&node->list);
+	node->free = node->data;
+	node->available_words = (size - NODE_HEADER_SIZE) / sizeof(align_t);
+	return node;
+}
+
+static __always_inline
+void *node_alloc(struct smalloc_node *node, unsigned long words)
+{
+	register align_t *old_free = node->free;
+
+	node->available_words -= words;
+	node->free += words;
+	return old_free;
+}
+
+void *smalloc(unsigned long size, struct smalloc_pool *pool)
+{
+	struct list_head *pos;
+	struct smalloc_node *node;
+	void *ptr;
+	unsigned long words;
+
+	/* If no pool specified, use the global one. */
+	if (!pool)
+		pool = global_pool;
+
+	mutex_lock(&pool->lock);
+
+	/* If the pool is sealed, then return NULL. */
+	if (pool->seal == SMALLOC_SEALED) {
+		mutex_unlock(&pool->lock);
+		return NULL;
+	}
+
+	/* Calculate minimum number of words required. */
+	words = (size + sizeof(align_t) - 1) / sizeof(align_t);
+
+	/* Look for slot that is large enough, in the existing pool.*/
+	list_for_each(pos, &pool->list) {
+		node = list_entry(pos, struct smalloc_node, list);
+		if (node->available_words >= words) {
+			ptr = node_alloc(node, words);
+			mutex_unlock(&pool->lock);
+			return ptr;
+		}
+	}
+
+	/* No slot found, get a new chunk of virtual memory. */
+	node = __smalloc_create_node(words);
+	if (!node) {
+		mutex_unlock(&pool->lock);
+		return NULL;
+	}
+
+	list_add(&node->list, &pool->list);
+	ptr = node_alloc(node, words);
+	mutex_unlock(&pool->lock);
+	return ptr;
+}
+
+static __always_inline
+unsigned long get_node_size(struct smalloc_node *node)
+{
+	if (!node)
+		return 0;
+	return page_roundup((((void *)node->free) - (void *)node) +
+			    node->available_words * sizeof(align_t));
+}
+
+static __always_inline
+unsigned long get_node_pages_nr(struct smalloc_node *node)
+{
+	return pages_nr(get_node_size(node));
+}
+void smalloc_seal_set(enum seal_t seal, struct smalloc_pool *pool)
+{
+	struct list_head *pos;
+	struct smalloc_node *node;
+
+	if (!pool)
+		pool = global_pool;
+	mutex_lock(&pool->lock);
+	if (pool->seal == seal) {
+		mutex_unlock(&pool->lock);
+		return;
+	}
+	list_for_each(pos, &pool->list) {
+		node = list_entry(pos, struct smalloc_node, list);
+		if (seal == SMALLOC_SEALED)
+			set_memory_ro((unsigned long)node,
+				      get_node_pages_nr(node));
+		else if (seal == SMALLOC_UNSEALED)
+			set_memory_rw((unsigned long)node,
+				      get_node_pages_nr(node));
+	}
+	pool->seal = seal;
+	mutex_unlock(&pool->lock);
+}
+
+int smalloc_initialize(struct smalloc_pool *pool)
+{
+	if (!pool)
+		return -EINVAL;
+	INIT_LIST_HEAD(&pool->list);
+	pool->seal = SMALLOC_UNSEALED;
+	mutex_init(&pool->lock);
+	return 0;
+}
+
+struct smalloc_pool *smalloc_create(void)
+{
+	struct smalloc_pool *pool = vmalloc(sizeof(struct smalloc_pool));
+
+	if (!pool) {
+		pr_err("No memory for allocating pool.");
+		return NULL;
+	}
+	smalloc_initialize(pool);
+	return pool;
+}
+
+int smalloc_destroy(struct smalloc_pool *pool)
+{
+	struct list_head *pos, *q;
+	struct smalloc_node *node;
+
+	if (!pool)
+		return -EINVAL;
+	list_for_each_safe(pos, q, &pool->list) {
+		node = list_entry(pos, struct smalloc_node, list);
+		list_del(pos);
+		vfree(node);
+	}
+	return 0;
+}
+
+static int __init smalloc_init(void)
+{
+	global_pool = smalloc_create();
+	if (!global_pool) {
+		pr_err("Module smalloc initialization failed: no memory.\n");
+		return -ENOMEM;
+	}
+	pr_info("Module smalloc initialized successfully.\n");
+	return 0;
+}
+
+static void __exit smalloc_exit(void)
+{
+	pr_info("Module smalloc un initialized successfully.\n");
+}
+
+module_init(smalloc_init);
+module_exit(smalloc_exit);
+MODULE_LICENSE("GPL");
diff --git a/mm/smalloc.h b/mm/smalloc.h
new file mode 100644
index 0000000..344d962
--- /dev/null
+++ b/mm/smalloc.h
@@ -0,0 +1,61 @@
+/*
+ * smalloc.h: Header for Sealable Memory Allocator
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.stoppa@huawei.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; version 2
+ * of the License.
+ */
+
+#ifndef _SMALLOC_H
+#define _SMALLOC_H
+
+#include <linux/list.h>
+#include <linux/mutex.h>
+
+typedef uint64_t align_t;
+
+enum seal_t {
+	SMALLOC_UNSEALED,
+	SMALLOC_SEALED,
+};
+
+#define __SMALLOC_ALIGNED__ __aligned(sizeof(align_t))
+
+#define NODE_HEADER					\
+	struct {					\
+		__SMALLOC_ALIGNED__ struct {		\
+			struct list_head list;		\
+			align_t *free;			\
+			unsigned long available_words;	\
+		};					\
+	}
+
+#define NODE_HEADER_SIZE sizeof(NODE_HEADER)
+
+struct smalloc_pool {
+	struct list_head list;
+	struct mutex lock;
+	enum seal_t seal;
+};
+
+struct smalloc_node {
+	NODE_HEADER;
+	__SMALLOC_ALIGNED__ align_t data[];
+};
+
+#define smalloc_seal(pool) \
+	smalloc_seal_set(SMALLOC_SEALED, pool)
+
+#define smalloc_unseal(pool) \
+	smalloc_seal_set(SMALLOC_UNSEALED, pool)
+
+struct smalloc_pool *smalloc_create(void);
+int smalloc_destroy(struct smalloc_pool *pool);
+int smalloc_initialize(struct smalloc_pool *pool);
+void *smalloc(unsigned long size, struct smalloc_pool *pool);
+void smalloc_seal_set(enum seal_t seal, struct smalloc_pool *pool);
+#endif
-- 
2.9.3

* Re: [kernel-hardening] [PATCH 1/1] Sealable memory support
  2017-05-19 10:38   ` Igor Stoppa
@ 2017-05-20  8:51     ` Greg KH
From: Greg KH @ 2017-05-20  8:51 UTC
  To: Igor Stoppa
  Cc: mhocko, dave.hansen, labbott, linux-mm, kernel-hardening, linux-kernel

On Fri, May 19, 2017 at 01:38:11PM +0300, Igor Stoppa wrote:
> Dynamically allocated variables can be made read only,
> after they have been initialized, provided that they reside in memory
> pages devoid of any RW data.
> 
> The implementation supplies means to create independent pools of memory,
> which can be individually created, sealed/unsealed and destroyed.
> 
> A global pool is made available for those kernel modules that do not
> need to manage an independent pool.
> 
> Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com>
> ---
>  mm/Makefile  |   2 +-
>  mm/smalloc.c | 200 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  mm/smalloc.h |  61 ++++++++++++++++++
>  3 files changed, 262 insertions(+), 1 deletion(-)
>  create mode 100644 mm/smalloc.c
>  create mode 100644 mm/smalloc.h

This is really nice. Do you have a follow-on patch showing how any of
the kernel can be changed to use this new subsystem?  Without that, it
might be hard to get this approved (we don't like adding new APIs
without users).

thanks,

greg k-h

* [PATCH] LSM: Make security_hook_heads a local variable.
  2017-05-20  8:51     ` Greg KH
@ 2017-05-21 11:14       ` Tetsuo Handa
From: Tetsuo Handa @ 2017-05-21 11:14 UTC
  To: linux-security-module
  Cc: linux-mm, kernel-hardening, linux-kernel, Tetsuo Handa,
	Casey Schaufler, Greg KH, Igor Stoppa, James Morris, Kees Cook,
	Paul Moore, Stephen Smalley

A sealable memory allocator patch was proposed at
http://lkml.kernel.org/r/20170519103811.2183-1-igor.stoppa@huawei.com ,
and is waiting for a follow-on patch showing how any of the kernel
can be changed to use this new subsystem. So, here it is for LSM hooks.

The LSM hooks ("struct security_hook_heads security_hook_heads" and
"struct security_hook_list ...[]") will benefit from this allocator via
protection using set_memory_ro()/set_memory_rw(), and it will allow the
CONFIG_SECURITY_WRITABLE_HOOKS config option to be removed.

This means that these structures will be allocated at run time using
smalloc(), and therefore their addresses will be determined at run time
rather than at compile time.

However, the LSM_HOOK_INIT() macro currently depends on the address of
security_hook_heads being known at compile time, even though we already
initialize security_hook_heads as an array of "struct list_head".

Therefore, let's use an index (i.e. the relative offset from the start
of security_hook_heads) instead of the absolute address of
security_hook_heads, so that the LSM_HOOK_INIT() macro does not need to
know that address at compile time. Then, security_add_hooks() will be
able to allocate and copy "struct security_hook_list ...[]" using
smalloc().
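
The trick is easier to see in isolation. Here is a self-contained
userspace model of the same idea (the hook names are illustrative; the
real structure has many more members): the macro turns a member into a
compile-time index, and the consumer rebuilds the address from that
index at run time, so the structure itself can live anywhere, including
smalloc()ed memory.

#include <stdio.h>
#include <stddef.h>

/* Models struct list_head. */
struct head { struct head *next, *prev; };

/* Models struct security_hook_heads: just consecutive list heads. */
struct hook_heads {
	struct head file_open;
	struct head file_permission;
	struct head task_create;
};

/* Models the new LSM_HOOK_INIT(): member -> compile-time index. */
#define HOOK_IDX(member) \
	(offsetof(struct hook_heads, member) / sizeof(struct head))

int main(void)
{
	struct hook_heads heads; /* address known only at run time */
	struct head *list = (struct head *)&heads;
	size_t idx = HOOK_IDX(file_permission); /* 1 */

	/* Models security_add_hooks(): index back to address. */
	printf("idx = %zu, match: %d\n",
	       idx, &list[idx] == &heads.file_permission);
	return 0;
}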

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Kees Cook <keescook@chromium.org>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Casey Schaufler <casey@schaufler-ca.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Igor Stoppa <igor.stoppa@huawei.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
---
 include/linux/lsm_hooks.h |  6 +++---
 security/security.c       | 10 ++++++++--
 2 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index 080f34e..865c11d 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -1884,8 +1884,8 @@ struct security_hook_heads {
  */
 struct security_hook_list {
 	struct list_head		list;
-	struct list_head		*head;
 	union security_list_options	hook;
+	const unsigned int		idx;
 	char				*lsm;
 };
 
@@ -1896,9 +1896,9 @@ struct security_hook_list {
  * text involved.
  */
 #define LSM_HOOK_INIT(HEAD, HOOK) \
-	{ .head = &security_hook_heads.HEAD, .hook = { .HEAD = HOOK } }
+	{ .idx = offsetof(struct security_hook_heads, HEAD) / \
+		sizeof(struct list_head), .hook = { .HEAD = HOOK } }
 
-extern struct security_hook_heads security_hook_heads;
 extern char *lsm_names;
 
 extern void security_add_hooks(struct security_hook_list *hooks, int count,
diff --git a/security/security.c b/security/security.c
index 54b1e39..d6883ce 100644
--- a/security/security.c
+++ b/security/security.c
@@ -33,7 +33,7 @@
 /* Maximum number of letters for an LSM name string */
 #define SECURITY_NAME_MAX	10
 
-struct security_hook_heads security_hook_heads __lsm_ro_after_init;
+static struct security_hook_heads security_hook_heads __lsm_ro_after_init;
 char *lsm_names;
 /* Boot-time LSM user choice */
 static __initdata char chosen_lsm[SECURITY_NAME_MAX + 1] =
@@ -152,10 +152,16 @@ void __init security_add_hooks(struct security_hook_list *hooks, int count,
 				char *lsm)
 {
 	int i;
+	struct list_head *list = (struct list_head *) &security_hook_heads;
 
 	for (i = 0; i < count; i++) {
+		const unsigned int idx = hooks[i].idx;
+
+		if (WARN_ON(idx >= sizeof(security_hook_heads) /
+			    sizeof(struct list_head)))
+			continue;
 		hooks[i].lsm = lsm;
-		list_add_tail_rcu(&hooks[i].list, hooks[i].head);
+		list_add_tail_rcu(&hooks[i].list, &list[idx]);
 	}
 	if (lsm_append(lsm, &lsm_names) < 0)
 		panic("%s - Cannot get early memory.\n", __func__);
-- 
1.8.3.1

* Re: [PATCH] LSM: Make security_hook_heads a local variable.
  2017-05-21 11:14       ` Tetsuo Handa
  (?)
  (?)
@ 2017-05-22 14:03         ` Christoph Hellwig
  -1 siblings, 0 replies; 68+ messages in thread
From: Christoph Hellwig @ 2017-05-22 14:03 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: linux-security-module, linux-mm, kernel-hardening, linux-kernel,
	Casey Schaufler, Greg KH, Igor Stoppa, James Morris, Kees Cook,
	Paul Moore, Stephen Smalley

On Sun, May 21, 2017 at 08:14:05PM +0900, Tetsuo Handa wrote:
> A sealable memory allocator patch was proposed at
> http://lkml.kernel.org/r/20170519103811.2183-1-igor.stoppa@huawei.com ,
> and is waiting for a follow-on patch showing how any of the kernel
> can be changed to use this new subsystem. So, here it is for LSM hooks.
> 
> The LSM hooks ("struct security_hook_heads security_hook_heads" and
> "struct security_hook_list ...[]") will benefit from this allocator via
> protection using set_memory_ro()/set_memory_rw(), and it will remove
> CONFIG_SECURITY_WRITABLE_HOOKS config option.
> 
> This means that these structures will be allocated at run time using
> smalloc(), and therefore the address of these structures will be
> determined at run time rather than compile time.
> 
> But currently, LSM_HOOK_INIT() macro depends on the address of
> security_hook_heads being known at compile time. But we already
> initialize security_hook_heads as an array of "struct list_head".
> 
> Therefore, let's use index number (or relative offset from the head
> of security_hook_heads) instead of absolute address of
> security_hook_heads so that LSM_HOOK_INIT() macro does not need to
> know absolute address of security_hook_heads. Then, security_add_hooks()
> will be able to allocate and copy "struct security_hook_list ...[]" using
> smalloc().
> 
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Paul Moore <paul@paul-moore.com>
> Cc: Stephen Smalley <sds@tycho.nsa.gov>
> Cc: Casey Schaufler <casey@schaufler-ca.com>
> Cc: James Morris <james.l.morris@oracle.com>
> Cc: Igor Stoppa <igor.stoppa@huawei.com>
> Cc: Greg KH <gregkh@linuxfoundation.org>
> ---
>  include/linux/lsm_hooks.h |  6 +++---
>  security/security.c       | 10 ++++++++--
>  2 files changed, 11 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
> index 080f34e..865c11d 100644
> --- a/include/linux/lsm_hooks.h
> +++ b/include/linux/lsm_hooks.h
> @@ -1884,8 +1884,8 @@ struct security_hook_heads {
>   */
>  struct security_hook_list {
>  	struct list_head		list;
> -	struct list_head		*head;
>  	union security_list_options	hook;
> +	const unsigned int		idx;
>  	char				*lsm;
>  };
>  
> @@ -1896,9 +1896,9 @@ struct security_hook_list {
>   * text involved.
>   */
>  #define LSM_HOOK_INIT(HEAD, HOOK) \
> -	{ .head = &security_hook_heads.HEAD, .hook = { .HEAD = HOOK } }
> +	{ .idx = offsetof(struct security_hook_heads, HEAD) / \
> +		sizeof(struct list_head), .hook = { .HEAD = HOOK } }
>  
> -extern struct security_hook_heads security_hook_heads;
>  extern char *lsm_names;
>  
>  extern void security_add_hooks(struct security_hook_list *hooks, int count,
> diff --git a/security/security.c b/security/security.c
> index 54b1e39..d6883ce 100644
> --- a/security/security.c
> +++ b/security/security.c
> @@ -33,7 +33,7 @@
>  /* Maximum number of letters for an LSM name string */
>  #define SECURITY_NAME_MAX	10
>  
> -struct security_hook_heads security_hook_heads __lsm_ro_after_init;
> +static struct security_hook_heads security_hook_heads __lsm_ro_after_init;
>  char *lsm_names;
>  /* Boot-time LSM user choice */
>  static __initdata char chosen_lsm[SECURITY_NAME_MAX + 1] =
> @@ -152,10 +152,16 @@ void __init security_add_hooks(struct security_hook_list *hooks, int count,
>  				char *lsm)
>  {
>  	int i;
> +	struct list_head *list = (struct list_head *) &security_hook_heads;

Eww, struct casts.  This whole security_hook_heads scheme stinks,
even with the slight improvements from Tetsuo.  It has everything we
shouldn't do - function pointers in structures that are not hard
read-only, structure casts, etc.

What's the reason we can't just have good old const function tables?
Yeah, stackable LSMs make that a little harder, but they should not be
enabled by default anyway.  But even with those we can still chain
them together with a list with external linkage.

^ permalink raw reply	[flat|nested] 68+ messages in thread
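
For contrast, a rough standalone sketch of the "const function tables
chained with external linkage" shape suggested above. All names and
signatures here are invented for illustration; the point is that a
const table lives in .rodata, so it is hard read-only from boot on,
with no runtime sealing needed:

#include <stddef.h>

struct lsm_ops {
        const char *name;
        int (*file_permission)(void *file, int mask); /* simplified */
        const struct lsm_ops *next;  /* chain to the next LSM, if any */
};

static int example_file_permission(void *file, int mask)
{
        (void)file; (void)mask;
        return 0;                    /* 0 == allowed */
}

/* const => placed in .rodata by the compiler. */
const struct lsm_ops example_ops = {
        .name            = "example",
        .file_permission = example_file_permission,
        .next            = NULL,
};

/* Callers walk the chain and must skip hooks left NULL. */
int call_file_permission(const struct lsm_ops *ops, void *file, int mask)
{
        for (; ops; ops = ops->next) {
                if (ops->file_permission) {
                        int rc = ops->file_permission(file, mask);

                        if (rc)
                                return rc;
                }
        }
        return 0;
}

The NULL checks in the caller are exactly the per-call cost that the
reply below objects to.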

* Re: [PATCH] LSM: Make security_hook_heads a local variable.
  2017-05-22 14:03         ` Christoph Hellwig
  (?)
  (?)
@ 2017-05-22 15:09           ` Casey Schaufler
  -1 siblings, 0 replies; 68+ messages in thread
From: Casey Schaufler @ 2017-05-22 15:09 UTC (permalink / raw)
  To: Christoph Hellwig, Tetsuo Handa
  Cc: linux-security-module, linux-mm, kernel-hardening, linux-kernel,
	Greg KH, Igor Stoppa, James Morris, Kees Cook, Paul Moore,
	Stephen Smalley

On 5/22/2017 7:03 AM, Christoph Hellwig wrote:
> On Sun, May 21, 2017 at 08:14:05PM +0900, Tetsuo Handa wrote:
>> A sealable memory allocator patch was proposed at
>> http://lkml.kernel.org/r/20170519103811.2183-1-igor.stoppa@huawei.com ,
>> and is waiting for a follow-on patch showing how any of the kernel
>> can be changed to use this new subsystem. So, here it is for LSM hooks.
>>
>> The LSM hooks ("struct security_hook_heads security_hook_heads" and
>> "struct security_hook_list ...[]") will benefit from this allocator via
>> protection using set_memory_ro()/set_memory_rw(), and it will remove
>> CONFIG_SECURITY_WRITABLE_HOOKS config option.
>>
>> This means that these structures will be allocated at run time using
>> smalloc(), and therefore the address of these structures will be
>> determined at run time rather than compile time.
>>
>> But currently, LSM_HOOK_INIT() macro depends on the address of
>> security_hook_heads being known at compile time. But we already
>> initialize security_hook_heads as an array of "struct list_head".
>>
>> Therefore, let's use index number (or relative offset from the head
>> of security_hook_heads) instead of absolute address of
>> security_hook_heads so that LSM_HOOK_INIT() macro does not need to
>> know absolute address of security_hook_heads. Then, security_add_hooks()
>> will be able to allocate and copy "struct security_hook_list ...[]" using
>> smalloc().
>>
>> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
>> Cc: Kees Cook <keescook@chromium.org>
>> Cc: Paul Moore <paul@paul-moore.com>
>> Cc: Stephen Smalley <sds@tycho.nsa.gov>
>> Cc: Casey Schaufler <casey@schaufler-ca.com>
>> Cc: James Morris <james.l.morris@oracle.com>
>> Cc: Igor Stoppa <igor.stoppa@huawei.com>
>> Cc: Greg KH <gregkh@linuxfoundation.org>
>> ---
>>  include/linux/lsm_hooks.h |  6 +++---
>>  security/security.c       | 10 ++++++++--
>>  2 files changed, 11 insertions(+), 5 deletions(-)
>>
>> diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
>> index 080f34e..865c11d 100644
>> --- a/include/linux/lsm_hooks.h
>> +++ b/include/linux/lsm_hooks.h
>> @@ -1884,8 +1884,8 @@ struct security_hook_heads {
>>   */
>>  struct security_hook_list {
>>  	struct list_head		list;
>> -	struct list_head		*head;
>>  	union security_list_options	hook;
>> +	const unsigned int		idx;
>>  	char				*lsm;
>>  };
>>  
>> @@ -1896,9 +1896,9 @@ struct security_hook_list {
>>   * text involved.
>>   */
>>  #define LSM_HOOK_INIT(HEAD, HOOK) \
>> -	{ .head = &security_hook_heads.HEAD, .hook = { .HEAD = HOOK } }
>> +	{ .idx = offsetof(struct security_hook_heads, HEAD) / \
>> +		sizeof(struct list_head), .hook = { .HEAD = HOOK } }
>>  
>> -extern struct security_hook_heads security_hook_heads;
>>  extern char *lsm_names;
>>  
>>  extern void security_add_hooks(struct security_hook_list *hooks, int count,
>> diff --git a/security/security.c b/security/security.c
>> index 54b1e39..d6883ce 100644
>> --- a/security/security.c
>> +++ b/security/security.c
>> @@ -33,7 +33,7 @@
>>  /* Maximum number of letters for an LSM name string */
>>  #define SECURITY_NAME_MAX	10
>>  
>> -struct security_hook_heads security_hook_heads __lsm_ro_after_init;
>> +static struct security_hook_heads security_hook_heads __lsm_ro_after_init;
>>  char *lsm_names;
>>  /* Boot-time LSM user choice */
>>  static __initdata char chosen_lsm[SECURITY_NAME_MAX + 1] =
>> @@ -152,10 +152,16 @@ void __init security_add_hooks(struct security_hook_list *hooks, int count,
>>  				char *lsm)
>>  {
>>  	int i;
>> +	struct list_head *list = (struct list_head *) &security_hook_heads;
> Eww, struct casts.  This whole security_hook_heads scheme stinks,
> even with the slight improvements from Tetsuo.  It has everything we
> shouldn't do - function pointers in structures that are not hard
> read-only, structure casts, etc.
>
> What's the reason we can't just have good old const function tables?

The set of hooks used by most security modules is sparse.

> Yeah, stackable LSMs make that a little harder, but they should not be
> enabled by default anyway.

With the number of security modules queued up behind full stacking,
I can't say that I agree with your assertion.

> But even with those we can still chain
> them together with a list with external linkage.

I gave up that approach in 2012. Too many unnecessary calls to
null functions, and massive function vectors with a tiny number
of non-null entries. From a data structure standpoint, it was
just wrong. The list scheme is exactly right for the task at
hand.

^ permalink raw reply	[flat|nested] 68+ messages in thread
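
For reference, this is roughly how security/security.c of this era
dispatches through the per-hook lists (paraphrased, so details may
differ from the exact source): each LSM is linked only into the lists
for hooks it actually implements, so the walk makes no calls to NULL
or no-op functions.

#define call_int_hook(FUNC, IRC, ...) ({                        \
        int RC = IRC;                                           \
        do {                                                    \
                struct security_hook_list *P;                   \
                                                                \
                list_for_each_entry(P, &security_hook_heads.FUNC, list) { \
                        RC = P->hook.FUNC(__VA_ARGS__);         \
                        if (RC != 0)                            \
                                break;                          \
                }                                               \
        } while (0);                                            \
        RC;                                                     \
})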

* Re: [kernel-hardening] [PATCH 1/1] Sealable memory support
  2017-05-20  8:51     ` Greg KH
  (?)
@ 2017-05-22 19:45       ` Igor Stoppa
  -1 siblings, 0 replies; 68+ messages in thread
From: Igor Stoppa @ 2017-05-22 19:45 UTC (permalink / raw)
  To: Greg KH
  Cc: mhocko, dave.hansen, labbott, linux-mm, kernel-hardening, linux-kernel

On 20/05/17 11:51, Greg KH wrote:
> On Fri, May 19, 2017 at 01:38:11PM +0300, Igor Stoppa wrote:
>> Dynamically allocated variables can be made read only,

[...]

> This is really nice, do you have a follow-on patch showing how any of
> the kernel can be changed to use this new subsystem?

Yes, actually I started from the need to turn some data structures
read-only in both the LSM hooks and SELinux.

> Without that, it
> might be hard to get this approved (we don't like adding new apis
> without users.)

Yes, I just wanted to give an early preview of the current
implementation, since it is significantly different from what I
initially proposed. So I was looking for early feedback.

Right now, I'm fixing it up and adding some more structured debugging.

Then I'll re-submit it together with the LSM Hooks example.

---
thanks, igor

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH] LSM: Make security_hook_heads a local variable.
  2017-05-22 15:09           ` Casey Schaufler
  (?)
  (?)
@ 2017-05-22 19:50             ` Igor Stoppa
  -1 siblings, 0 replies; 68+ messages in thread
From: Igor Stoppa @ 2017-05-22 19:50 UTC (permalink / raw)
  To: Casey Schaufler, Christoph Hellwig, Tetsuo Handa
  Cc: linux-security-module, linux-mm, kernel-hardening, linux-kernel,
	Greg KH, James Morris, Kees Cook, Paul Moore, Stephen Smalley

On 22/05/17 18:09, Casey Schaufler wrote:
> On 5/22/2017 7:03 AM, Christoph Hellwig wrote:

[...]

>> But even with those we can still chain
>> them together with a list with external linkage.
> 
> I gave up that approach in 2012. Too many unnecessary calls to
> null functions, and massive function vectors with a tiny number
> of non-null entries. From a data structure standpoint, it was
> just wrong. The list scheme is exactly right for the task at
> hand.

I understand this as a green light for me to continue with the plan of
using the LSM hooks as an example of making dynamically allocated data
read-only, also using Tetsuo's patch (thanks, btw).

Is that correct?

---
thanks, igor

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH] LSM: Make security_hook_heads a local variable.
  2017-05-22 19:50             ` Igor Stoppa
  (?)
  (?)
@ 2017-05-22 20:32               ` Casey Schaufler
  -1 siblings, 0 replies; 68+ messages in thread
From: Casey Schaufler @ 2017-05-22 20:32 UTC (permalink / raw)
  To: Igor Stoppa, Christoph Hellwig, Tetsuo Handa
  Cc: linux-security-module, linux-mm, kernel-hardening, linux-kernel,
	Greg KH, James Morris, Kees Cook, Paul Moore, Stephen Smalley

On 5/22/2017 12:50 PM, Igor Stoppa wrote:
> On 22/05/17 18:09, Casey Schaufler wrote:
>> On 5/22/2017 7:03 AM, Christoph Hellwig wrote:
> [...]
>
>>> But even with those we can still chain
>>> them together with a list with external linkage.
>> I gave up that approach in 2012. Too many unnecessary calls to
>> null functions, and massive function vectors with a tiny number
>> of non-null entries. From a data structure standpoint, it was
>> just wrong. The list scheme is exactly right for the task at
>> hand.
> I understand this as a green light for me to continue with the plan of
> using the LSM hooks as an example of making dynamically allocated data
> read-only, also using Tetsuo's patch (thanks, btw).

I still don't like the assumption that a structure of
N elements is the same as an array of N elements.
Putting on my hardening hat, however, I like the
smalloc() solution for keeping the hook lists safe,
so I am willing to swallow my objection to using
offsets to address the existing exposure.

>
> Is that correct?
>
> ---
> thanks, igor

^ permalink raw reply	[flat|nested] 68+ messages in thread
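
The struct-versus-array concern could at least be made explicit at
build time. The following is a hypothetical standalone sketch, not
part of the posted patch; in kernel code the check would use
BUILD_BUG_ON() rather than C11 _Static_assert:

#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

/* Abbreviated stand-in for the much larger kernel struct. */
struct security_hook_heads {
        struct list_head binder_set_context_mgr;
        struct list_head binder_transaction;
        struct list_head audit_rule_free;
};

/*
 * Fails the build if padding ever makes the struct size a non-integer
 * multiple of list_head; it cannot catch a same-size non-list member,
 * so it documents the assumption rather than fully proving it.
 */
_Static_assert(sizeof(struct security_hook_heads) %
               sizeof(struct list_head) == 0,
               "security_hook_heads must look like an array of list_head");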

* Re: [PATCH] LSM: Make security_hook_heads a local variable.
  2017-05-22 20:32               ` Casey Schaufler
@ 2017-05-22 20:43                 ` Tetsuo Handa
  -1 siblings, 0 replies; 68+ messages in thread
From: Tetsuo Handa @ 2017-05-22 20:43 UTC (permalink / raw)
  To: casey, igor.stoppa, hch
  Cc: linux-security-module, linux-mm, kernel-hardening, linux-kernel,
	gregkh, james.l.morris, keescook, paul, sds

Casey Schaufler wrote:
> On 5/22/2017 12:50 PM, Igor Stoppa wrote:
> > On 22/05/17 18:09, Casey Schaufler wrote:
> >> On 5/22/2017 7:03 AM, Christoph Hellwig wrote:
> > [...]
> >
> >>> But even with those we can still chain
> >>> them together with a list with external linkage.
> >> I gave up that approach in 2012. Too many unnecessary calls to
> >> null functions, and massive function vectors with a tiny number
> >> of non-null entries. From a data structure standpoint, it was
> >> just wrong. The list scheme is exactly right for the task at
> >> hand.
> > I understand this as a green light for me to continue with the plan of
> > using LSM Hooks as an example for making dynamically allocated data
> > read-only, also using Tetsuo's patch (thanks, btw).
> 
> I still don't like the assumption that a structure of
> N elements is the same as an array
> of N elements.

I think we can use "enum" and call via index numbers while preserving
the current "union" for type-checking purposes.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 1/1] Sealable memory support
  2017-05-19 10:38   ` Igor Stoppa
@ 2017-05-22 21:38     ` Kees Cook
  -1 siblings, 0 replies; 68+ messages in thread
From: Kees Cook @ 2017-05-22 21:38 UTC (permalink / raw)
  To: Igor Stoppa
  Cc: Michal Hocko, Dave Hansen, Laura Abbott, Linux-MM,
	kernel-hardening, LKML, Daniel Micay

On Fri, May 19, 2017 at 3:38 AM, Igor Stoppa <igor.stoppa@huawei.com> wrote:
> Dynamically allocated variables can be made read only,
> after they have been initialized, provided that they reside in memory
> pages devoid of any RW data.
>
> The implementation supplies means to create independent pools of memory,
> which can be individually created, sealed/unsealed and destroyed.

This would be a welcome addition, thanks for posting it! I have a
bunch of feedback, here and below:

For the first bit of bikeshedding, should this really be called
seal/unseal? My mind is probably just broken from having read TPM
documentation, but this isn't really "sealing" as I'd understand it
(it's not tied to a credential, for example). It's "only" rw/ro.
Perhaps "protect/unprotect" or just simply "readonly/writable", and
call the base function "romalloc"?

This is fundamentally a heap allocator, with linked lists, etc. I'd
like to see as much attention as possible given to hardening it
against attacks, especially adding redzoning around the metadata at
least, and perhaps requiring that CONFIG_DEBUG_LIST be enabled. And as
part of that, I'd like hardened usercopy to grow knowledge of these
allocations so we can bounds-check objects. Right now, mm/usercopy.c
just looks at PageSlab(page) to decide if it should do slab checks. I
think adding a check for this type of object would be very important
there.
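
To make that concrete, a rough sketch of the kind of check meant here
(every identifier except PageSlab() is hypothetical; this is not the
actual mm/usercopy.c code):

	/* Called from the usercopy hardening path for a copy of
	 * n bytes starting at ptr. */
	static void check_smalloc_object(const char *ptr, unsigned long n)
	{
		/* hypothetical: map an address to its smalloc node */
		struct smalloc_node *node = smalloc_find_node(ptr);

		if (!node)
			return;	/* not a smalloc object, nothing to do */

		/* hypothetical: reject copies running past the end
		 * of the enclosing allocation */
		if (ptr + n > smalloc_object_end(node, ptr))
			usercopy_report_violation("smalloc", ptr, n);
	}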

The ro/rw granularity here is the _entire_ pool, not a specific
allocation (or page containing the allocation). I'm concerned that
makes this very open to race conditions where, especially in the
global pool, one thread can be trying to write to ro data in a pool
and another has made the pool writable.
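
Roughly this, as a sketch (my_obj, new_val() and EVIL are invented for
the example):

	/* CPU0: a legitimate updater of its own object */
	static void cpu0_update(struct my_obj *mine)
	{
		smalloc_unseal(NULL);	/* whole global pool now writable */
		mine->val = new_val();
		smalloc_seal(NULL);
	}

	/* CPU1: a write to an unrelated "read-only" object in the same
	 * pool, which succeeds whenever it lands in CPU0's window */
	static void cpu1_attack(struct my_obj *victim)
	{
		victim->val = EVIL;
	}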

> A global pool is made available for those kernel modules that do not
> need to manage an independent pool.
>
> Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com>
> ---
>  mm/Makefile  |   2 +-
>  mm/smalloc.c | 200 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  mm/smalloc.h |  61 ++++++++++++++++++
>  3 files changed, 262 insertions(+), 1 deletion(-)
>  create mode 100644 mm/smalloc.c
>  create mode 100644 mm/smalloc.h
>
> diff --git a/mm/Makefile b/mm/Makefile
> index 026f6a8..737c42a 100644
> --- a/mm/Makefile
> +++ b/mm/Makefile
> @@ -39,7 +39,7 @@ obj-y                 := filemap.o mempool.o oom_kill.o \
>                            mm_init.o mmu_context.o percpu.o slab_common.o \
>                            compaction.o vmacache.o swap_slots.o \
>                            interval_tree.o list_lru.o workingset.o \
> -                          debug.o $(mmu-y)
> +                          debug.o smalloc.o $(mmu-y)
>
>  obj-y += init-mm.o
>
> diff --git a/mm/smalloc.c b/mm/smalloc.c
> new file mode 100644
> index 0000000..fa04cc5
> --- /dev/null
> +++ b/mm/smalloc.c
> @@ -0,0 +1,200 @@
> +/*
> + * smalloc.c: Sealable Memory Allocator
> + *
> + * (C) Copyright 2017 Huawei Technologies Co. Ltd.
> + * Author: Igor Stoppa <igor.stoppa@huawei.com>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; version 2
> + * of the License.
> + */
> +
> +#include <linux/module.h>
> +#include <linux/printk.h>
> +#include <linux/kobject.h>
> +#include <linux/sysfs.h>
> +#include <linux/init.h>
> +#include <linux/fs.h>
> +#include <linux/string.h>
> +
> +
> +#include <linux/vmalloc.h>
> +#include <asm/cacheflush.h>
> +#include "smalloc.h"

Shouldn't this just be <linux/smalloc.h> ?

> +#define page_roundup(size) (((size) + !(size) - 1 + PAGE_SIZE) & PAGE_MASK)
> +
> +#define pages_nr(size) (page_roundup(size) / PAGE_SIZE)
> +
> +static struct smalloc_pool *global_pool;
> +
> +struct smalloc_node *__smalloc_create_node(unsigned long words)
> +{
> +       struct smalloc_node *node;
> +       unsigned long size;
> +
> +       /* Calculate the size to ask from vmalloc, page aligned. */
> +       size = page_roundup(NODE_HEADER_SIZE + words * sizeof(align_t));
> +       node = vmalloc(size);
> +       if (!node) {
> +               pr_err("No memory for allocating smalloc node.");
> +               return NULL;
> +       }
> +       /* Initialize the node.*/
> +       INIT_LIST_HEAD(&node->list);
> +       node->free = node->data;
> +       node->available_words = (size - NODE_HEADER_SIZE) / sizeof(align_t);
> +       return node;
> +}
> +
> +static __always_inline
> +void *node_alloc(struct smalloc_node *node, unsigned long words)
> +{
> +       register align_t *old_free = node->free;
> +
> +       node->available_words -= words;
> +       node->free += words;
> +       return old_free;
> +}
> +
> +void *smalloc(unsigned long size, struct smalloc_pool *pool)
> +{
> +       struct list_head *pos;
> +       struct smalloc_node *node;
> +       void *ptr;
> +       unsigned long words;
> +
> +       /* If no pool specified, use the global one. */
> +       if (!pool)
> +               pool = global_pool;

It should be impossible, but this should check for global_pool == NULL too, IMO.
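
Something along these lines (a sketch of the suggested check):

	if (!pool)
		pool = global_pool;
	if (WARN_ON(!pool))
		return NULL;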

> +       mutex_lock(&pool->lock);
> +
> +       /* If the pool is sealed, then return NULL. */
> +       if (pool->seal == SMALLOC_SEALED) {
> +               mutex_unlock(&pool->lock);
> +               return NULL;
> +       }
> +
> +       /* Calculate minimum number of words required. */
> +       words = (size + sizeof(align_t) - 1) / sizeof(align_t);
> +
> +       /* Look for slot that is large enough, in the existing pool.*/
> +       list_for_each(pos, &pool->list) {
> +               node = list_entry(pos, struct smalloc_node, list);
> +               if (node->available_words >= words) {
> +                       ptr = node_alloc(node, words);
> +                       mutex_unlock(&pool->lock);
> +                       return ptr;
> +               }
> +       }
> +
> +       /* No slot found, get a new chunk of virtual memory. */
> +       node = __smalloc_create_node(words);
> +       if (!node) {
> +               mutex_unlock(&pool->lock);
> +               return NULL;
> +       }
> +
> +       list_add(&node->list, &pool->list);
> +       ptr = node_alloc(node, words);
> +       mutex_unlock(&pool->lock);
> +       return ptr;
> +}
> +
> +static __always_inline
> +unsigned long get_node_size(struct smalloc_node *node)
> +{
> +       if (!node)
> +               return 0;
> +       return page_roundup((((void *)node->free) - (void *)node) +
> +                           node->available_words * sizeof(align_t));
> +}
> +
> +static __always_inline
> +unsigned long get_node_pages_nr(struct smalloc_node *node)
> +{
> +       return pages_nr(get_node_size(node));
> +}
> +void smalloc_seal_set(enum seal_t seal, struct smalloc_pool *pool)
> +{
> +       struct list_head *pos;
> +       struct smalloc_node *node;
> +
> +       if (!pool)
> +               pool = global_pool;
> +       mutex_lock(&pool->lock);
> +       if (pool->seal == seal) {
> +               mutex_unlock(&pool->lock);
> +               return;

I actually think this should be a BUG condition, since this means a
mismatched seal/unseal happened. The pool should never be left
writable at rest, a user should create a pool, write to it, seal. Any
updates should unseal, write, seal. To attempt an unseal and find it
already unsealed seems bad.
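
I.e., a sketch of the stricter behaviour:

	/* With a matched seal/unseal discipline, reaching here with
	 * the pool already in the requested state is a bug. */
	BUG_ON(pool->seal == seal);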

Finding a user for this would help clarify its protection properties,
too. (The LSM example is likely not the best starting point for that,
as it would depend on other changes that are under discussion.)

> +       }
> +       list_for_each(pos, &pool->list) {
> +               node = list_entry(pos, struct smalloc_node, list);
> +               if (seal == SMALLOC_SEALED)
> +                       set_memory_ro((unsigned long)node,
> +                                     get_node_pages_nr(node));
> +               else if (seal == SMALLOC_UNSEALED)
> +                       set_memory_rw((unsigned long)node,
> +                                     get_node_pages_nr(node));
> +       }
> +       pool->seal = seal;
> +       mutex_unlock(&pool->lock);
> +}
> +
> +int smalloc_initialize(struct smalloc_pool *pool)
> +{
> +       if (!pool)
> +               return -EINVAL;
> +       INIT_LIST_HEAD(&pool->list);
> +       pool->seal = SMALLOC_UNSEALED;
> +       mutex_init(&pool->lock);
> +       return 0;
> +}
> +
> +struct smalloc_pool *smalloc_create(void)
> +{
> +       struct smalloc_pool *pool = vmalloc(sizeof(struct smalloc_pool));
> +
> +       if (!pool) {
> +               pr_err("No memory for allocating pool.");

It might be handy to have pools named like they are for the slab allocator.
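
For instance (a sketch; the current API takes no name):

	struct smalloc_pool *smalloc_create(const char *name);

	pool = smalloc_create("selinux");

much like kmem_cache_create() names slab caches.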

> +               return NULL;
> +       }
> +       smalloc_initialize(pool);
> +       return pool;
> +}
> +
> +int smalloc_destroy(struct smalloc_pool *pool)
> +{
> +       struct list_head *pos, *q;
> +       struct smalloc_node *node;
> +
> +       if (!pool)
> +               return -EINVAL;
> +       list_for_each_safe(pos, q, &pool->list) {
> +               node = list_entry(pos, struct smalloc_node, list);
> +               list_del(pos);
> +               vfree(node);
> +       }
> +       return 0;
> +}
> +
> +static int __init smalloc_init(void)
> +{
> +       global_pool = smalloc_create();
> +       if (!global_pool) {
> +               pr_err("Module smalloc initialization failed: no memory.\n");
> +               return -ENOMEM;
> +       }
> +       pr_info("Module smalloc initialized successfully.\n");
> +       return 0;
> +}
> +
> +static void __exit smalloc_exit(void)
> +{
> +       pr_info("Module smalloc un initialized successfully.\n");

typo: space after "un"

> +}
> +
> +module_init(smalloc_init);
> +module_exit(smalloc_exit);
> +MODULE_LICENSE("GPL");
> diff --git a/mm/smalloc.h b/mm/smalloc.h
> new file mode 100644
> index 0000000..344d962
> --- /dev/null
> +++ b/mm/smalloc.h
> @@ -0,0 +1,61 @@
> +/*
> + * smalloc.h: Header for Sealable Memory Allocator
> + *
> + * (C) Copyright 2017 Huawei Technologies Co. Ltd.
> + * Author: Igor Stoppa <igor.stoppa@huawei.com>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; version 2
> + * of the License.
> + */
> +
> +#ifndef _SMALLOC_H
> +#define _SMALLOC_H
> +
> +#include <linux/list.h>
> +#include <linux/mutex.h>
> +
> +typedef uint64_t align_t;
> +
> +enum seal_t {
> +       SMALLOC_UNSEALED,
> +       SMALLOC_SEALED,
> +};
> +
> +#define __SMALLOC_ALIGNED__ __aligned(sizeof(align_t))
> +
> +#define NODE_HEADER                                    \
> +       struct {                                        \
> +               __SMALLOC_ALIGNED__ struct {            \
> +                       struct list_head list;          \
> +                       align_t *free;                  \
> +                       unsigned long available_words;  \
> +               };                                      \
> +       }
> +
> +#define NODE_HEADER_SIZE sizeof(NODE_HEADER)
> +
> +struct smalloc_pool {
> +       struct list_head list;
> +       struct mutex lock;
> +       enum seal_t seal;
> +};
> +
> +struct smalloc_node {
> +       NODE_HEADER;
> +       __SMALLOC_ALIGNED__ align_t data[];
> +};
> +
> +#define smalloc_seal(pool) \
> +       smalloc_seal_set(SMALLOC_SEALED, pool)
> +
> +#define smalloc_unseal(pool) \
> +       smalloc_seal_set(SMALLOC_UNSEALED, pool)
> +
> +struct smalloc_pool *smalloc_create(void);
> +int smalloc_destroy(struct smalloc_pool *pool);
> +int smalloc_initialize(struct smalloc_pool *pool);
> +void *smalloc(unsigned long size, struct smalloc_pool *pool);
> +void smalloc_seal_set(enum seal_t seal, struct smalloc_pool *pool);

I'd really like to see kernel-doc for the API functions (likely in the .c file).
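
For example, something like (wording illustrative):

	/**
	 * smalloc() - allocate memory from a sealable pool
	 * @size: number of bytes requested
	 * @pool: pool to allocate from; NULL selects the global pool
	 *
	 * Return: pointer to the new allocation, or NULL if the pool
	 * is sealed or no memory is available.
	 */
	void *smalloc(unsigned long size, struct smalloc_pool *pool);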

> +#endif
> --
> 2.9.3

Thanks again for working on this! If you can find examples of file
operations living in the heap, those would be great examples for using
this API (assuming the other properties can be improved).

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 1/1] Sealable memory support
  2017-05-22 21:38     ` Kees Cook
@ 2017-05-23  9:43       ` Igor Stoppa
  -1 siblings, 0 replies; 68+ messages in thread
From: Igor Stoppa @ 2017-05-23  9:43 UTC (permalink / raw)
  To: Kees Cook, Casey Schaufler
  Cc: Michal Hocko, Dave Hansen, Laura Abbott, Linux-MM,
	kernel-hardening, LKML, Daniel Micay, Laura Abbott, Greg KH,
	James Morris, Stephen Smalley

On 23/05/17 00:38, Kees Cook wrote:
> On Fri, May 19, 2017 at 3:38 AM, Igor Stoppa <igor.stoppa@huawei.com> wrote:

[...]

> For the first bit of bikeshedding, should this really be called
> seal/unseal? My mind is probably just broken from having read TPM
> documentation, but this isn't really "sealing" as I'd understand it
> (it's not tied to a credential, for example). It's "only" rw/ro.
> Perhaps "protect/unprotect" or just simply "readonly/writable", and
> call the base function "romalloc"?

I was not aware of the specific meaning of "seal" in this context.
The term was implicitly proposed by Michal Hocko while discussing
the mechanism, and I liked it more than what I was using initially:
"lockable".

tbh I like the sound of "smalloc" better than "romalloc"

But this is really the least of my worries :-P

> This is fundamentally a heap allocator, with linked lists, etc. I'd
> like to see as much attention as possible given to hardening it
> against attacks, especially adding redzoning around the metadata at
> least, and perhaps requiring that CONFIG_DEBUG_LIST be enabled.

My initial goal was to provide something that is useful without
affecting performance.

You seem to be pushing for a more extreme approach.
While I have nothing against it and I actually agree that it can be
useful, I would not make it mandatory.

More on this later.

> And as
> part of that, I'd like hardened usercopy to grow knowledge of these
> allocations so we can bounds-check objects. Right now, mm/usercopy.c
> just looks at PageSlab(page) to decide if it should do slab checks. I
> think adding a check for this type of object would be very important
> there.

I am not familiar with this and need to study it. However, I still
think that if there is a significant trade-off between performance
and resilience, it should be optional, for those who want it.

Maybe there could be a master toggle for these options, if it makes
sense to group them logically. How does this sound?

> The ro/rw granularity here is the _entire_ pool, not a specific
> allocation (or page containing the allocation). I'm concerned that
> makes this very open to race conditions where, especially in the
> global pool, one thread can be trying to write to ro data in a pool
> and another has made the pool writable.

I have the impression we are thinking of something different.
Close, but different enough.

First of all, how can using a mutex create races?
Do you mean with respect to other resources that might be held by
competing users of the pool?

That, imho, is a locking problem that cannot be solved here.
You can try to mitigate it, by reducing the chances it will happen, but
basically you are trying to make do with a user of the API that is not
implementing locking correctly.
I'd say that it's better to fix the user.

If you meant something else, I really didn't get it :-)

More about the frequency of access: you seem to expect very frequent
seal/unseal - or lock/unlock - operations, while I don't.

What I envision as the primary case for unlocking is the tear-down of
the entire pool, for example preceding the unloading of the module.

Think about __ro_after_init: once it is r/o, it stays r/o.

This also means that there would possibly be a busy transient, while
allocations are being made. That is true.
But only for shared pools. If a module uses one pool, I would expect the
initialization of the module to be mostly sequential.
So, no chance of races. Do you agree?

Furthermore, if a module _really_ wants to do parallel allocation, then
maybe it's simpler and cleaner to have one pool per "thread", or
whatever mechanism is used.


The global pool is mostly there to complete the offering of
__ro_after_init. If one wants to use it, it's available.
It saves the trouble of having one's own pool, but it means competing
with the rest of the kernel for locking.

If it seems a bad idea, it can be easily removed.

[...]

>> +#include "smalloc.h"
> 
> Shouldn't this just be <linux/smalloc.h> ?

yes, I have fixed it in the next revision

[...]

>> +       /* If no pool specified, use the global one. */
>> +       if (!pool)
>> +               pool = global_pool;
> 
> It should be impossible, but this should check for global_pool == NULL too, IMO.

I'm curious: why do you think it could happen?
I'd rather throw an exception, if I cannot initialize global_pool.

[...]

>> +       if (pool->seal == seal) {
>> +               mutex_unlock(&pool->lock);
>> +               return;
> 
> I actually think this should be a BUG condition, since this means a
> mismatched seal/unseal happened. The pool should never be left
> writable at rest, a user should create a pool, write to it, seal. 

On this I agree.

> Any
> updates should unseal, write, seal. To attempt an unseal and find it
> already unsealed seems bad.

But what you are describing seems exactly the case where there could be
a race.
EX: 2 writers try to unseal and add/change something.
They could have a collision both when trying to seal and unseal.

One could argue that there should be an atomic unseal-update-seal and
maybe unseal-allocate-seal.

I would understand the need for that - and maybe that's really how it
should be implemented, for doing what you seem to envision.

But that's where I think we have different use-cases in mind.

I'd rather not add extra locking to something that doesn't need it:
Allocate - write - seal - read, read, read, ... - unseal - destroy.
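
I.e., as a sketch (my_data is a made-up example structure, not part of
the patch):

	struct smalloc_pool *pool = smalloc_create();
	struct my_data *p = smalloc(sizeof(*p), pool);

	p->field = 42;			/* write while unsealed */
	smalloc_seal(pool);
	/* ... arbitrarily many reads of p->field ... */
	smalloc_unseal(pool);		/* only at tear down */
	smalloc_destroy(pool);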

What you describe is more complex and does need extra locking, but I
would advocate a different implementation, at least initially.
Later, if it makes sense, fold them into one.

> Finding a user for this would help clarify its protection properties,
> too. (The LSM example is likely not the best starting point for that,
> as it would depend on other changes that are under discussion.)

Well, I *have* to do the LSM - and SE Linux too.

That's where this started and why I'm here with this patch.

From what I could understand of the discussion, the debate about the
changes is settled (is it?).

Casey seemed to be ok with this.

If something else is needed, ok, I can do that too, but I didn't see any
specific reason why LSM should not be good enough as a first example.

If you can provide a specific reason why it's not suitable, I can
reconsider.

[...]

>> +       if (!pool) {
>> +               pr_err("No memory for allocating pool.");
> 
> It might be handy to have pools named like they are for the slab allocator.

I considered this.
The only reason I could come up with why it might be desirable is if the
same pool needs to be accessed from two or more places that do not share
the pointer. It doesn't seem particularly useful.
The only upside I can think of is that it would save a memory page, vs
the case of creating 2 separate pools.
It would also introduce locking cross-dependency.

Did I overlook some reason why it would be desirable?

[...]

>> +int smalloc_destroy(struct smalloc_pool *pool)
>> +{
>> +       struct list_head *pos, *q;
>> +       struct smalloc_node *node;
>> +
>> +       if (!pool)
>> +               return -EINVAL;

Locking was missing; I added it in the new version.
Also, I moved to goto-like error handling, since there were several
similar exit paths.

>> +       list_for_each_safe(pos, q, &pool->list) {
>> +               node = list_entry(pos, struct smalloc_node, list);

btw, here and in other places I have switched to list_for_each_entry

[...]

> typo: space after "un"

ok

[...]

>> +typedef uint64_t align_t;
>> +
>> +enum seal_t {
>> +       SMALLOC_UNSEALED,
>> +       SMALLOC_SEALED,
>> +};
>> +
>> +#define __SMALLOC_ALIGNED__ __aligned(sizeof(align_t))

How about the alignment? Is it a desirable feature?
Did I overlook some reason why it would not work?

>> +#define NODE_HEADER                                    \
>> +       struct {                                        \
>> +               __SMALLOC_ALIGNED__ struct {            \
>> +                       struct list_head list;          \
>> +                       align_t *free;                  \
>> +                       unsigned long available_words;  \
>> +               };                                      \
>> +       }

Does this look ok? ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

[...]

> I'd really like to see kernel-doc for the API functions (likely in the .c file).

Yes, I just see no point right now, since the API doesn't seem to be
agreed/finalized yet.

> Thanks again for working on this! If you can find examples of file
> operations living in the heap, those would be great examples for using
> this API (assuming the other properties can be improved).

As I explained above, I would prefer to continue with LSM, unless you
have some specific reason against it.

If you have some example in mind of file operations living on the
heap - which I suspect you do have :) - we could also discuss those,
and whether the locking needs to be modified, but from my perspective
the use case is significantly different and I wouldn't lump it in with this.
Of course I might be missing the point you are trying to make.
In that case, I'm afraid that further explanation is needed from you, as
I do not get it :-)


Thanks a lot for the review, this is exactly the sort of early feedback
I was hoping to receive.

--
igor

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 1/1] Sealable memory support
@ 2017-05-23  9:43       ` Igor Stoppa
  0 siblings, 0 replies; 68+ messages in thread
From: Igor Stoppa @ 2017-05-23  9:43 UTC (permalink / raw)
  To: Kees Cook, Casey Schaufler
  Cc: Michal Hocko, Dave Hansen, Laura Abbott, Linux-MM,
	kernel-hardening, LKML, Daniel Micay

On 23/05/17 00:38, Kees Cook wrote:
> On Fri, May 19, 2017 at 3:38 AM, Igor Stoppa <igor.stoppa@huawei.com> wrote:

[...]

> For the first bit of bikeshedding, should this really be called
> seal/unseal? My mind is probably just broken from having read TPM
> documentation, but this isn't really "sealing" as I'd understand it
> (it's not tied to a credential, for example). It's "only" rw/ro.
> Perhaps "protect/unprotect" or just simply "readonly/writable", and
> call the base function "romalloc"?

I was not aware of the specific meaning of "seal" in this context.
The term was implicitly proposed by Michal Hocko while discussing
the mechanism, and I liked it more than what I was using initially:
"lockable".

tbh I like the sound of "smalloc" better than "romalloc"

But this is really the least of my worries :-P

> This is fundamentally a heap allocator, with linked lists, etc. I'd
> like to see as much attention as possible given to hardening it
> against attacks, especially adding redzoning around the metadata at
> least, and perhaps requiring that CONFIG_DEBUG_LIST be enabled.

My initial goal was to provide something that is useful without
affecting performance.

You seem to be pushing for a more extreme approach.
While I have nothing against it and I actually agree that it can be
useful, I would not make it mandatory.

More on this later.

> And as
> part of that, I'd like hardened usercopy to grow knowledge of these
> allocations so we can bounds-check objects. Right now, mm/usercopy.c
> just looks at PageSlab(page) to decide if it should do slab checks. I
> think adding a check for this type of object would be very important
> there.

I am not familiar with this and I need to study it, however I still
think that if there is a significant trade-off in terms of performance
vs resilience, it should be optional, for those who want it.

Maybe there could be a master toggle for these options, if it makes
sense to group them logically. How does this sound?

> The ro/rw granularity here is the _entire_ pool, not a specific
> allocation (or page containing the allocation). I'm concerned that
> makes this very open to race conditions where, especially in the
> global pool, one thread can be trying to write to ro data in a pool
> and another has made the pool writable.

I have the impression we are thinking to something different.
Close, but different enough.

First of all, how using a mutex can create races?
Do you mean with respect of other resources that might be held by
competing users of the pool?

That, imho, is a locking problem that cannot be solved here.
You can try to mitigate it, by reducing the chances it will happen, but
basically you are trying to make do with an user of the API that is not
implementing locking correctly.
I'd say that it's better to fix the user.

If you meant something else, I really didn't get it :-)

More about the frequency of the access: you seem to expect very often
seal/unseal - or lock/unlock, while I don't.

What I envision as primary case for unlocking is the tear down of the
entire pool, for example preceding the unloading of the module.

Think about __ro_after_init : once it is r/o, it stays r/o.

Which also means that there would be possibly a busy transient, when
allocations are made. That is true.
But only for shared pools. If a module uses one pool, I would expect the
initialization of the module to be mostly sequential.
So, no chance of races. Do you agree?

Furthermore, if a module _really_ wants to do parallel allocation, then
maybe it's simpler and cleaner to have one pool per "thread" or whatever
is the mechanism used.


The global pool is mostly there for completing the offering of
__ro_after_init. If one wants ot use it, it's available.
It saves the trouble of having own pool, it means competing with the
rest of the kernel for locking.

If it seems a bad idea, it can be easily removed.

[...]

>> +#include "smalloc.h"
> 
> Shouldn't this just be <linux/smalloc.h> ?

yes, I have fixed it in the next revision

[...]

>> +       /* If no pool specified, use the global one. */
>> +       if (!pool)
>> +               pool = global_pool;
> 
> It should be impossible, but this should check for global_pool == NULL too, IMO.

I'm curious: why do you think it could happen?
I'd rather throw an exception, if I cannot initialize global_pool.

[...]

>> +       if (pool->seal == seal) {
>> +               mutex_unlock(&pool->lock);
>> +               return;
> 
> I actually think this should be a BUG condition, since this means a
> mismatched seal/unseal happened. The pool should never be left
> writable at rest, a user should create a pool, write to it, seal. 

On this I agree.

> Any
> updates should unseal, write, seal. To attempt an unseal and find it
> already unsealed seems bad.

But what you are describing seems exactly the case where there could be
a race.
EX: 2 writers try to unseal and add/change something.
They could have a collision both when trying to seal and unseal.

One could argue that there should be an atomic unseal-update-seal and
maybe unseal-allocate-seal.

I would understand the need for that - and maybe that's really how it
should be implemented, for doing what you seem to envision.

But that's where I think we have different use-cases in mind.

I'd rather not add extra locking to something that doesn't need it:
Allocate - write - seal - read, read, read, ... - unseal - destroy.

What you describe is more complex and does need extra logging, but I
would advocate a different implementation, at least initially.
Later, if it makes sense, fold them into one.

> Finding a user for this would help clarify its protection properties,
> too. (The LSM example is likely not the best starting point for that,
> as it would depend on other changes that are under discussion.)

Well, I *have* to do the LSM - and SE Linux too.

That's where this started and why I'm here with this patch.

>From what I could understand about the discussion, the debate about the
changes is settled (is it?).

Casey seemed to be ok with this.

If something else is needed, ok, I can do that too, but I didn't see any
specific reason why LSM should not be good enough as first example.

If you can provide a specific reason why it's not suitable, I can
reconsider.

[...]

>> +       if (!pool) {
>> +               pr_err("No memory for allocating pool.");
> 
> It might be handy to have pools named like they are for the slab allocator.

I considered this.
The only reason I could come up with why it might be desirable is if the
same pool needs to be accessed from two or more places that do not share
the pointer. It doesn't seem particularly useful.
The only upside I can think of is that it would save a memory page, vs
the case of creating 2 separate pools.
It would also introduce locking cross-dependency.

Did I overlook some reason why it would be desirable?

[...]

>> +int smalloc_destroy(struct smalloc_pool *pool)
>> +{
>> +       struct list_head *pos, *q;
>> +       struct smalloc_node *node;
>> +
>> +       if (!pool)
>> +               return -EINVAL;

locking was missing, I added it in the new version
also I moved to goto-like error handling, since there were several
similar exit paths.

>> +       list_for_each_safe(pos, q, &pool->list) {
>> +               node = list_entry(pos, struct smalloc_node, list);

btw, here and in other places I have switched to list_for_each_entry

[...]

> typo: space after "un"

ok

[...]

>> +typedef uint64_t align_t;
>> +
>> +enum seal_t {
>> +       SMALLOC_UNSEALED,
>> +       SMALLOC_SEALED,
>> +};
>> +
>> +#define __SMALLOC_ALIGNED__ __aligned(sizeof(align_t))

How about the alignment? Is it a desirable feature?
Did I overlook some reason why it would not work?

>> +#define NODE_HEADER                                    \
>> +       struct {                                        \
>> +               __SMALLOC_ALIGNED__ struct {            \
>> +                       struct list_head list;          \
>> +                       align_t *free;                  \
>> +                       unsigned long available_words;  \
>> +               };                                      \
>> +       }

Does this look ok? ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

[...]

> I'd really like to see kernel-doc for the API functions (likely in the .c file).

Yes, I just see no point right now, since the API doesn't seem to be
agreed/finalized yet.

> Thanks again for working on this! If you can find examples of file
> operations living in the heap, those would be great examples for using
> this API (assuming the other properties can be improved).

As I explained above, I would prefer to continue with LSM, unless you
have some specific reason against it.

If you have some example in mind about file operations living on the
heap - which I suspect you do have :) - we could discuss also about
those and if the locking needs to be modified, but from my perspective
the use case is significantly different and I wouldn't pile it up with this.
Of course I might be missing the point you are trying to make.
In that case, I'm afraid that further explanation is needed from you, as
I do not get it :-)


Thanks a lot for the review, this is exactly the sort of early feedback
I was hoping to receive.

--
igor

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [kernel-hardening] Re: [PATCH 1/1] Sealable memory support
@ 2017-05-23  9:43       ` Igor Stoppa
  0 siblings, 0 replies; 68+ messages in thread
From: Igor Stoppa @ 2017-05-23  9:43 UTC (permalink / raw)
  To: Kees Cook, Casey Schaufler
  Cc: Michal Hocko, Dave Hansen, Laura Abbott, Linux-MM,
	kernel-hardening, LKML, Daniel Micay

On 23/05/17 00:38, Kees Cook wrote:
> On Fri, May 19, 2017 at 3:38 AM, Igor Stoppa <igor.stoppa@huawei.com> wrote:

[...]

> For the first bit of bikeshedding, should this really be called
> seal/unseal? My mind is probably just broken from having read TPM
> documentation, but this isn't really "sealing" as I'd understand it
> (it's not tied to a credential, for example). It's "only" rw/ro.
> Perhaps "protect/unprotect" or just simply "readonly/writable", and
> call the base function "romalloc"?

I was not aware of the specific mean of "seal", in this context.
The term was implicitly proposed by Michal Hocko, while discussing about
the mechanism and I liked it more than what I was using initially:
"lockable".

tbh I like the sound of "smalloc" better than "romalloc"

But this is really the least of my worries :-P

> This is fundamentally a heap allocator, with linked lists, etc. I'd
> like to see as much attention as possible given to hardening it
> against attacks, especially adding redzoning around the metadata at
> least, and perhaps requiring that CONFIG_DEBUG_LIST be enabled.

My initial goal was to provide something that is useful without
affecting performance.

You seem to be pushing for a more extreme approach.
While I have nothing against it and I actually agree that it can be
useful, I would not make it mandatory.

More on this later.

> And as
> part of that, I'd like hardened usercopy to grow knowledge of these
> allocations so we can bounds-check objects. Right now, mm/usercopy.c
> just looks at PageSlab(page) to decide if it should do slab checks. I
> think adding a check for this type of object would be very important
> there.

I am not familiar with this and I need to study it, however I still
think that if there is a significant trade-off in terms of performance
vs resilience, it should be optional, for those who want it.

Maybe there could be a master toggle for these options, if it makes
sense to group them logically. How does this sound?

> The ro/rw granularity here is the _entire_ pool, not a specific
> allocation (or page containing the allocation). I'm concerned that
> makes this very open to race conditions where, especially in the
> global pool, one thread can be trying to write to ro data in a pool
> and another has made the pool writable.

I have the impression we are thinking to something different.
Close, but different enough.

First of all, how using a mutex can create races?
Do you mean with respect of other resources that might be held by
competing users of the pool?

That, imho, is a locking problem that cannot be solved here.
You can try to mitigate it, by reducing the chances it will happen, but
basically you are trying to make do with an user of the API that is not
implementing locking correctly.
I'd say that it's better to fix the user.

If you meant something else, I really didn't get it :-)

More about the frequency of the access: you seem to expect very often
seal/unseal - or lock/unlock, while I don't.

What I envision as primary case for unlocking is the tear down of the
entire pool, for example preceding the unloading of the module.

Think about __ro_after_init : once it is r/o, it stays r/o.

Which also means that there would be possibly a busy transient, when
allocations are made. That is true.
But only for shared pools. If a module uses one pool, I would expect the
initialization of the module to be mostly sequential.
So, no chance of races. Do you agree?

Furthermore, if a module _really_ wants to do parallel allocation, then
maybe it's simpler and cleaner to have one pool per "thread" or whatever
is the mechanism used.


The global pool is mostly there for completing the offering of
__ro_after_init. If one wants ot use it, it's available.
It saves the trouble of having own pool, it means competing with the
rest of the kernel for locking.

If it seems a bad idea, it can be easily removed.

[...]

>> +#include "smalloc.h"
> 
> Shouldn't this just be <linux/smalloc.h> ?

yes, I have fixed it in the next revision

[...]

>> +       /* If no pool specified, use the global one. */
>> +       if (!pool)
>> +               pool = global_pool;
> 
> It should be impossible, but this should check for global_pool == NULL too, IMO.

I'm curious: why do you think it could happen?
I'd rather throw an exception, if I cannot initialize global_pool.

[...]

>> +       if (pool->seal == seal) {
>> +               mutex_unlock(&pool->lock);
>> +               return;
> 
> I actually think this should be a BUG condition, since this means a
> mismatched seal/unseal happened. The pool should never be left
> writable at rest, a user should create a pool, write to it, seal. 

On this I agree.

> Any
> updates should unseal, write, seal. To attempt an unseal and find it
> already unsealed seems bad.

But what you are describing seems exactly the case where there could be
a race.
EX: 2 writers try to unseal and add/change something.
They could have a collision both when trying to seal and unseal.

One could argue that there should be an atomic unseal-update-seal and
maybe unseal-allocate-seal.

I would understand the need for that - and maybe that's really how it
should be implemented, for doing what you seem to envision.

But that's where I think we have different use-cases in mind.

I'd rather not add extra locking to something that doesn't need it:
Allocate - write - seal - read, read, read, ... - unseal - destroy.

What you describe is more complex and does need extra logging, but I
would advocate a different implementation, at least initially.
Later, if it makes sense, fold them into one.

> Finding a user for this would help clarify its protection properties,
> too. (The LSM example is likely not the best starting point for that,
> as it would depend on other changes that are under discussion.)

Well, I *have* to do the LSM - and SE Linux too.

That's where this started and why I'm here with this patch.

>From what I could understand about the discussion, the debate about the
changes is settled (is it?).

Casey seemed to be ok with this.

If something else is needed, ok, I can do that too, but I didn't see any
specific reason why LSM should not be good enough as first example.

If you can provide a specific reason why it's not suitable, I can
reconsider.

[...]

>> +       if (!pool) {
>> +               pr_err("No memory for allocating pool.");
> 
> It might be handy to have pools named like they are for the slab allocator.

I considered this.
The only reason I could come up with why it might be desirable is if the
same pool needs to be accessed from two or more places that do not share
the pointer. It doesn't seem particularly useful.
The only upside I can think of is that it would save a memory page, vs
the case of creating 2 separate pools.
It would also introduce locking cross-dependency.

Did I overlook some reason why it would be desirable?

[...]

>> +int smalloc_destroy(struct smalloc_pool *pool)
>> +{
>> +       struct list_head *pos, *q;
>> +       struct smalloc_node *node;
>> +
>> +       if (!pool)
>> +               return -EINVAL;

locking was missing, I added it in the new version
also I moved to goto-like error handling, since there were several
similar exit paths.

>> +       list_for_each_safe(pos, q, &pool->list) {
>> +               node = list_entry(pos, struct smalloc_node, list);

btw, here and in other places I have switched to list_for_each_entry

[...]

> typo: space after "un"

ok

[...]

>> +typedef uint64_t align_t;
>> +
>> +enum seal_t {
>> +       SMALLOC_UNSEALED,
>> +       SMALLOC_SEALED,
>> +};
>> +
>> +#define __SMALLOC_ALIGNED__ __aligned(sizeof(align_t))

How about the alignment? Is it a desirable feature?
Did I overlook some reason why it would not work?

>> +#define NODE_HEADER                                    \
>> +       struct {                                        \
>> +               __SMALLOC_ALIGNED__ struct {            \
>> +                       struct list_head list;          \
>> +                       align_t *free;                  \
>> +                       unsigned long available_words;  \
>> +               };                                      \
>> +       }

Does this look ok? ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

[...]

> I'd really like to see kernel-doc for the API functions (likely in the .c file).

Yes, I just see no point right now, since the API doesn't seem to be
agreed/finalized yet.

> Thanks again for working on this! If you can find examples of file
> operations living in the heap, those would be great examples for using
> this API (assuming the other properties can be improved).

As I explained above, I would prefer to continue with LSM, unless you
have some specific reason against it.

If you have some example in mind about file operations living on the
heap - which I suspect you do have :) - we could discuss also about
those and if the locking needs to be modified, but from my perspective
the use case is significantly different and I wouldn't pile it up with this.
Of course I might be missing the point you are trying to make.
In that case, I'm afraid that further explanation is needed from you, as
I do not get it :-)


Thanks a lot for the review, this is exactly the sort of early feedback
I was hoping to receive.

--
igor

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 1/1] Sealable memory support
  2017-05-23  9:43       ` Igor Stoppa
  (?)
@ 2017-05-23 20:11         ` Kees Cook
  -1 siblings, 0 replies; 68+ messages in thread
From: Kees Cook @ 2017-05-23 20:11 UTC (permalink / raw)
  To: Igor Stoppa
  Cc: Casey Schaufler, Michal Hocko, Dave Hansen, Laura Abbott,
	Linux-MM, kernel-hardening, LKML, Daniel Micay, Greg KH,
	James Morris, Stephen Smalley

On Tue, May 23, 2017 at 2:43 AM, Igor Stoppa <igor.stoppa@huawei.com> wrote:
> On 23/05/17 00:38, Kees Cook wrote:
>> On Fri, May 19, 2017 at 3:38 AM, Igor Stoppa <igor.stoppa@huawei.com> wrote:
>
> [...]
>
>> For the first bit of bikeshedding, should this really be called
>> seal/unseal? My mind is probably just broken from having read TPM
>> documentation, but this isn't really "sealing" as I'd understand it
>> (it's not tied to a credential, for example). It's "only" rw/ro.
>> Perhaps "protect/unprotect" or just simply "readonly/writable", and
>> call the base function "romalloc"?
>
> I was not aware of the specific mean of "seal", in this context.
> The term was implicitly proposed by Michal Hocko, while discussing about
> the mechanism and I liked it more than what I was using initially:
> "lockable".
>
> tbh I like the sound of "smalloc" better than "romalloc"
>
> But this is really the least of my worries :-P
>
>> This is fundamentally a heap allocator, with linked lists, etc. I'd
>> like to see as much attention as possible given to hardening it
>> against attacks, especially adding redzoning around the metadata at
>> least, and perhaps requiring that CONFIG_DEBUG_LIST be enabled.
>
> My initial goal was to provide something that is useful without
> affecting performance.
>
> You seem to be pushing for a more extreme approach.

I don't know about pushing, but it seemed like the API provided for
arbitrary unsealing, etc. More below...

> While I have nothing against it and I actually agree that it can be
> useful, I would not make it mandatory.
>
> More on this later.
>
>> And as
>> part of that, I'd like hardened usercopy to grow knowledge of these
>> allocations so we can bounds-check objects. Right now, mm/usercopy.c
>> just looks at PageSlab(page) to decide if it should do slab checks. I
>> think adding a check for this type of object would be very important
>> there.
>
> I am not familiar with this and I need to study it, however I still
> think that if there is a significant trade-off in terms of performance
> vs resilience, it should be optional, for those who want it.

I would want hardened usercopy support as a requirement for using
smalloc(). Without it, we're regressing the over-read protection that
already exists for slab objects, if kernel code switched from slab to
smalloc. It should be very similar to the existing slab checks. "Is
this a smalloc object? Have we read beyond the end of a given object?"
etc. The metadata is all there, except for an efficient way to mark a
page as a smalloc page, but I think that just requires a new Page$Name
bit test, as done for slab.
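
To give a feel for the shape I mean, mirroring the existing slab path
(PageSmalloc() and smalloc_object_size() are made-up names; this is a
sketch to illustrate, not a patch):

static const char *check_smalloc_object(const void *ptr, unsigned long n)
{
	struct page *page = virt_to_head_page(ptr);

	if (!PageSmalloc(page))	/* hypothetical page bit, like PageSlab */
		return NULL;

	/* reject copies that run past the recorded object size */
	if (n > smalloc_object_size(ptr))
		return "<smalloc>";

	return NULL;
}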

> Maybe there could be a master toggle for these options, if it makes
> sense to group them logically. How does this sound?
>
>> The ro/rw granularity here is the _entire_ pool, not a specific
>> allocation (or page containing the allocation). I'm concerned that
>> makes this very open to race conditions where, especially in the
>> global pool, one thread can be trying to write to ro data in a pool
>> and another has made the pool writable.
>
> I have the impression we are thinking to something different.
> Close, but different enough.
>
> First of all, how using a mutex can create races?
> Do you mean with respect of other resources that might be held by
> competing users of the pool?

I meant this:

CPU 1     CPU 2
create
alloc
write
seal
...
unseal
                write
write
seal

The CPU 2 write would be, for example, an attacker using a
vulnerability to attempt to write to memory in the sealed area. All it
would need to do to succeed would be to trigger an action in the
kernel that would do a "legitimate" write (which requires the unseal),
and race it. Unsealing should be CPU-local, if the API is going to
support this kind of access.

> That, imho, is a locking problem that cannot be solved here.
> You can try to mitigate it, by reducing the chances it will happen, but
> basically you are trying to make do with a user of the API that is not
> implementing locking correctly.
> I'd say that it's better to fix the user.
>
> If you meant something else, I really didn't get it :-)
>
> More about the frequency of the access: you seem to expect very often
> seal/unseal - or lock/unlock, while I don't.

I am more concerned about _any_ unseal after initial seal. And even
then, it'd be nice to keep things CPU-local. My concerns are related
to the write-rarely proposal (https://lkml.org/lkml/2017/3/29/704)
which is kind of like this, but focused on the .data section, not
dynamic memory. It has similar concerns about CPU-locality.
Additionally, even writing to memory and then making it read-only
later runs risks (see threads about BPF JIT races vs making things
read-only: https://patchwork.kernel.org/patch/9662653/ Alexei's NAK
doesn't change the risk this series is fixing: races with attacker
writes during assignment but before read-only marking).

So, while smalloc would hugely reduce the window an attacker has
available to change data contents, this API doesn't eliminate it. (To
eliminate it, there would need to be a CPU-local page permission view
that lets only the current CPU write to the page, and then restores it to
read-only to match the global read-only view.)
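
Conceptually, a legitimate write to sealed memory would then look
something like this (every helper here is hypothetical; it's only
meant to illustrate the CPU-local idea):

static void sealed_write(void *dst, const void *src, size_t len)
{
	void *alias;

	preempt_disable();			/* stay on this CPU */
	alias = map_writable_this_cpu(dst);	/* rw view only we can see */
	memcpy(alias, src, len);
	unmap_writable_this_cpu(alias);		/* back to global ro view */
	preempt_enable();
}

A racing write from another CPU would still hit the global read-only
mapping and fault.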

> What I envision as primary case for unlocking is the tear down of the
> entire pool, for example preceding the unloading of the module.
>
> Think about __ro_after_init : once it is r/o, it stays r/o.

Ah! In that case, sure. This isn't what the proposed API provided,
though, so let's adjust it to only perform the unseal at destroy time.
That makes it much saner, IMO. "Write once" dynamic allocations, or
"read-only after seal". woalloc? :P yay naming

> Which also means that there would be possibly a busy transient, when
> allocations are made. That is true.
> But only for shared pools. If a module uses one pool, I would expect the
> initialization of the module to be mostly sequential.
> So, no chance of races. Do you agree?

I think a shared global pool would need to be eliminated for true
write-once semantics.

> Furthermore, if a module _really_ wants to do parallel allocation, then
> maybe it's simpler and cleaner to have one pool per "thread" or whatever
> is the mechanism used.

Right.

> The global pool is mostly there for completing the offering of
> __ro_after_init. If one wants to use it, it's available.
> It saves the trouble of having own pool, it means competing with the
> rest of the kernel for locking.
>
> If it seems a bad idea, it can be easily removed.

I think they need to be explicit, yes.

> I'd rather not add extra locking to something that doesn't need it:
> Allocate - write - seal - read, read, read, ... - unseal - destroy.

Yup, I would be totally fine with this. It still has a race between
allocate and seal, but it's a huge improvement over the existing state
of the world where all dynamic memory is writable. :)

>> Finding a user for this would help clarify its protection properties,
>> too. (The LSM example is likely not the best starting point for that,
>> as it would depend on other changes that are under discussion.)
>
> Well, I *have* to do the LSM - and SE Linux too.
>
> That's where this started and why I'm here with this patch.

Ah, okay. Most of the LSM is happily covered by __ro_after_init. If we
could just drop the runtime disabling of SELinux, we'd be fine.

I'd like to see what you have in mind for LSM, since the only runtime
change would be the SELinux piece, and that would violate the
alloc-write-seal-read-read-destroy lifetime (since you'd need to
unseal-write-seal to change the state of SELinux during runtime).

> From what I could understand about the discussion, the debate about the
> changes is settled (is it?).
>
> Casey seemed to be ok with this.
>
> If something else is needed, ok, I can do that too, but I didn't see any
> specific reason why LSM should not be good enough as first example.
>
> If you can provide a specific reason why it's not suitable, I can
> reconsider.

Hopefully we're on the same page now? I love being able to work off a
patch example, so if you agree about the changes in lifetime
management, and you've got some examples of how/where to use smalloc,
I'd love to review those too. I think there are a LOT of places this
could get used.

> [...]
>
>>> +       if (!pool) {
>>> +               pr_err("No memory for allocating pool.");
>>
>> It might be handy to have pools named like they are for the slab allocator.
>
> I considered this.
> The only reason I could come up with why it might be desirable is if the
> same pool needs to be accessed from two or more places that do not share
> the pointer. It doesn't seem particularly useful.
> The only upside I can think of is that it would save a memory page, vs
> the case of creating 2 separate pools.
> It would also introduce locking cross-dependency.
>
> Did I overlook some reason why it would be desirable?

Hm, I just meant add a char[] to the metadata and pass it in during
create(). Then it's possible to report which smalloc cache is being
examined during hardened usercopy checks.
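
i.e. something like this (a sketch, guessing at your existing fields):

#define SMALLOC_NAME_LEN 32

struct smalloc_pool {
	struct list_head list;
	struct mutex lock;
	enum seal_t seal;
	char name[SMALLOC_NAME_LEN];	/* new: reported in diagnostics */
};

struct smalloc_pool *smalloc_create(const char *name);

Then a hardened usercopy report (or the pr_err() above) could name the
pool the offending object lives in, the way the slab checks report the
kmem_cache involved.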

>
> [...]
>
>>> +int smalloc_destroy(struct smalloc_pool *pool)
>>> +{
>>> +       struct list_head *pos, *q;
>>> +       struct smalloc_node *node;
>>> +
>>> +       if (!pool)
>>> +               return -EINVAL;
>
> locking was missing, I added it in the new version
> also I moved to goto-like error handling, since there were several
> similar exit paths.

It seems like smalloc pools could also be refcounted?
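
Something like this, perhaps (a sketch; __smalloc_destroy() stands in
for the existing teardown logic):

#include <linux/kref.h>

struct smalloc_pool {
	struct kref refcount;
	/* ... existing fields ... */
};

static void smalloc_pool_release(struct kref *kref)
{
	struct smalloc_pool *pool =
		container_of(kref, struct smalloc_pool, refcount);

	__smalloc_destroy(pool);	/* unseal, vfree nodes, free pool */
}

int smalloc_destroy(struct smalloc_pool *pool)
{
	if (!pool)
		return -EINVAL;
	kref_put(&pool->refcount, smalloc_pool_release);
	return 0;
}

Sharing a pool then becomes a kref_get(), and only the last
smalloc_destroy() actually tears it down.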

> [...]
>
>>> +typedef uint64_t align_t;
>>> +
>>> +enum seal_t {
>>> +       SMALLOC_UNSEALED,
>>> +       SMALLOC_SEALED,
>>> +};
>>> +
>>> +#define __SMALLOC_ALIGNED__ __aligned(sizeof(align_t))
>
> How about the alignment? Is it a desirable feature?
> Did I overlook some reason why it would not work?

I think alignment makes sense. It makes sure there aren't crazy
problems with structs allocated after the header.

>>> +#define NODE_HEADER                                    \
>>> +       struct {                                        \
>>> +               __SMALLOC_ALIGNED__ struct {            \
>>> +                       struct list_head list;          \
>>> +                       align_t *free;                  \
>>> +                       unsigned long available_words;  \
>>> +               };                                      \
>>> +       }
>
> Does this look ok? ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

It's probably a sufficient starting point, depending on how the API
shakes out. Without unseal-write-seal properties, I care much less
about redzoning, etc.

> [...]
>
>> I'd really like to see kernel-doc for the API functions (likely in the .c file).
>
> Yes, I just see no point right now, since the API doesn't seem to be
> agreed/finalized yet.

Sure thing. I guess I was looking for more docs because of my
questions/concerns about lifetime management, etc. :)

>> Thanks again for working on this! If you can find examples of file
>> operations living in the heap, those would be great examples for using
>> this API (assuming the other properties can be improved).
>
> As I explained above, I would prefer to continue with LSM, unless you
> have some specific reason against it.

I think I just lack the imagination for what you're trying to do that
isn't already covered by the existing __ro_after_init changes. :)

> If you have some example in mind about file operations living on the
> heap - which I suspect you do have :) - we could discuss also about
> those and if the locking needs to be modified, but from my perspective
> the use case is significantly different and I wouldn't pile it up with this.
> Of course I might be missing the point you are trying to make.
> In that case, I'm afraid that further explanation is needed from you, as
> I do not get it :-)

Well, a poor example would be struct sock, since it needs to be
regularly written to, but it has function pointers near the end which
have been a very common target for attackers. (Though this is less so
now that INET_DIAG no longer exposes the kernel addresses to allocated
struct socks.)

> Thanks a lot for the review, this is exactly the sort of early feedback
> I was hoping to receive.

Great! Gaining a ro_after_init-like API for dynamic memory would be
quite valuable, I think. And once we have a write-rarely
infrastructure for the .rodata segment, perhaps we can extend it to
the dynamic memory areas too (in the future).

Thanks!

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 1/1] Sealable memory support
  2017-05-23 20:11         ` Kees Cook
  (?)
@ 2017-05-24 17:45           ` Igor Stoppa
  -1 siblings, 0 replies; 68+ messages in thread
From: Igor Stoppa @ 2017-05-24 17:45 UTC (permalink / raw)
  To: Kees Cook
  Cc: Casey Schaufler, Michal Hocko, Dave Hansen, Laura Abbott,
	Linux-MM, kernel-hardening, LKML, Daniel Micay, Greg KH,
	James Morris, Stephen Smalley



On 23/05/17 23:11, Kees Cook wrote:
> On Tue, May 23, 2017 at 2:43 AM, Igor Stoppa <igor.stoppa@huawei.com> wrote:

[...]

> I would want hardened usercopy support as a requirement for using
> smalloc(). Without it, we're regressing the over-read protection that
> already exists for slab objects, if kernel code switched from slab to
> smalloc. It should be very similar to the existing slab checks. "Is
> this a smalloc object? Have we read beyond the end of a given object?"
> etc. The metadata is all there, except for an efficient way to mark a
> page as a smalloc page, but I think that just requires a new Page$Name
> bit test, as done for slab.

ok
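
A rough sketch of how I understand the page marking (PG_smalloc and
the helper below are made up, just mirroring what PG_slab does for
slab):

/* in include/linux/page-flags.h, next to the PG_slab machinery */
PAGEFLAG(Smalloc, smalloc, PF_NO_TAIL)

/* when a node (a run of vmalloc'd pages) is added to a pool */
static void smalloc_mark_pages(void *addr, unsigned long nr_pages)
{
        unsigned long i;

        for (i = 0; i < nr_pages; i++)
                SetPageSmalloc(vmalloc_to_page(addr + i * PAGE_SIZE));
}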

[...]

> I meant this:
> 
> CPU 1     CPU 2
> create
> alloc
> write
> seal
> ...
> unseal
>                 write
> write
> seal
> 
> The CPU 2 write would be, for example, an attacker using a
> vulnerability to attempt to write to memory in the sealed area. All it
> would need to do to succeed would be to trigger an action in the
> kernel that would do a "legitimate" write (which requires the unseal),
> and race it. Unsealing should be CPU-local, if the API is going to
> support this kind of access.

I see.
If CPU1 were to forcibly halt anything that could race with it, then
it would be sure that there was no interference.
A reactive approach could be, instead, to re-validate the content after
the sealing, assuming that it is possible.

[...]

> I am more concerned about _any_ unseal after initial seal. And even
> then, it'd be nice to keep things CPU-local. My concerns are related
> to the write-rarely proposal (https://lkml.org/lkml/2017/3/29/704)
> which is kind of like this, but focused on the .data section, not
> dynamic memory. It has similar concerns about CPU-locality.
> Additionally, even writing to memory and then making it read-only
> later runs risks (see threads about BPF JIT races vs making things
> read-only: https://patchwork.kernel.org/patch/9662653/ Alexei's NAK
> doesn't change the risk this series is fixing: races with attacker
> writes during assignment but before read-only marking).

If you are talking about an attacker, rather than protection against
accidental overwrites, how can hashing be enough?
Couldn't the attacker compromise that too?

> So, while smalloc would hugely reduce the window an attacker has
> available to change data contents, this API doesn't eliminate it. (To
> eliminate it, there would need to be a CPU-local page permission view
> that lets only the current CPU write to the page, and then restore it to
> read-only to match the global read-only view.)

That or, if one is ready to take the hit, freeze every other possible
attack vector. But I'm not sure this could be justifiable.

[...]

> Ah! In that case, sure. This isn't what the proposed API provided,
> though, so let's adjust it to only perform the unseal at destroy time.
> That makes it much saner, IMO. "Write once" dynamic allocations, or
> "read-only after seal". woalloc? :P yay naming

For now I'm still using smalloc.
Anything that is either [x]malloc or [yz]malloc is fine, lengthwise.
Other options might require some re-formatting.

[...]

> I think a shared global pool would need to be eliminated for true
> write-once semantics.

ok

[...]

>> I'd rather not add extra locking to something that doesn't need it:
>> Allocate - write - seal - read, read, read, ... - unseal - destroy.
> 
> Yup, I would be totally fine with this. It still has a race between
> allocate and seal, but it's a huge improvement over the existing state
> of the world where all dynamic memory is writable. :)

Great!


[...]

> Ah, okay. Most of the LSM is happily covered by __ro_after_init. If we
> could just drop the runtime disabling of SELinux, we'd be fine.

I am not sure I understand this point.
If the kernel is properly configured, the master toggle variable
disappears, right?
Or do you mean the disabling through modifications of the linked list of
the hooks?

[...]


> Hm, I just meant add a char[] to the metadata and pass it in during
> create(). Then it's possible to report which smalloc cache is being
> examined during hardened usercopy checks.

Ok, that is not a big deal.
Wrt this, I have spent some time writing a debug module, which
currently dumps a bunch of info about the various pools into a debugfs
entry. I could split it across multiple entries, using the label to
generate their names.
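
The per-pool entries would look something like this (sketch only;
smalloc_debugfs_dir is a made-up parent directory):

static int pool_stats_show(struct seq_file *s, void *unused)
{
        struct smalloc_pool *pool = s->private;
        struct smalloc_node *node;

        mutex_lock(&pool->lock);
        list_for_each_entry(node, &pool->list, list)
                seq_printf(s, "node %p: %lu words available\n",
                           node, node->available_words);
        mutex_unlock(&pool->lock);
        return 0;
}

static int pool_stats_open(struct inode *inode, struct file *file)
{
        return single_open(file, pool_stats_show, inode->i_private);
}

static const struct file_operations pool_stats_fops = {
        .open    = pool_stats_open,
        .read    = seq_read,
        .llseek  = seq_lseek,
        .release = single_release,
};

/* one read-only entry per pool, named after its label */
debugfs_create_file(pool->label, 0400, smalloc_debugfs_dir,
                    pool, &pool_stats_fops);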

[...]

> It seems like smalloc pools could also be refcounted?

I am not sure what you mean.
What do you want to count?
Number of pools? Nodes per pool? Allocations per node?

And what for?

At least in the case of tearing down a pool, when a module is unloaded,
nobody needs to free anything that was allocated with smalloc.
The teardown function will free the pages from each node.

Is this the place where you think there should be a check on the number
of pages freed?

[...]

>>>> +#define NODE_HEADER                                    \
>>>> +       struct {                                        \
>>>> +               __SMALLOC_ALIGNED__ struct {            \
>>>> +                       struct list_head list;          \
>>>> +                       align_t *free;                  \
>>>> +                       unsigned long available_words;  \
>>>> +               };                                      \
>>>> +       }
>>
>> Does this look ok? ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> It's probably a sufficient starting point, depending on how the API
> shakes out. Without unseal-write-seal properties, I care much less
> about redzoning, etc.

ok, but my question (I am not sure if it was clear) was about the use of
a macro for the nameless structure that contains the header.

[...]

> Well, a poor example would be struct sock, since it needs to be
> regularly written to, but it has function pointers near the end which
> have been a very common target for attackers. (Though this is less so
> now that INET_DIAG no longer exposes the kernel addresses to allocated
> struct socks.)

Ok, this could be the scope for a further set of patches, after this one
is done.



One more thing: how should I tie this allocator to the rest?
I have verified that it seems to work with both SLUB and SLAB.
Can I make it depend on either of them being enabled?

Should it be optionally enabled?
What to default to, if it's not enabled? vmalloc?

---
thanks, igor

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 1/1] Sealable memory support
  2017-05-24 17:45           ` Igor Stoppa
  (?)
@ 2017-05-28 18:23             ` Kees Cook
  -1 siblings, 0 replies; 68+ messages in thread
From: Kees Cook @ 2017-05-28 18:23 UTC (permalink / raw)
  To: Igor Stoppa
  Cc: Casey Schaufler, Michal Hocko, Dave Hansen, Laura Abbott,
	Linux-MM, kernel-hardening, LKML, Daniel Micay, Greg KH,
	James Morris, Stephen Smalley

On Wed, May 24, 2017 at 10:45 AM, Igor Stoppa <igor.stoppa@huawei.com> wrote:
> On 23/05/17 23:11, Kees Cook wrote:
>> On Tue, May 23, 2017 at 2:43 AM, Igor Stoppa <igor.stoppa@huawei.com> wrote:
>> I meant this:
>>
>> CPU 1     CPU 2
>> create
>> alloc
>> write
>> seal
>> ...
>> unseal
>>                 write
>> write
>> seal
>>
>> The CPU 2 write would be, for example, an attacker using a
>> vulnerability to attempt to write to memory in the sealed area. All it
>> would need to do to succeed would be to trigger an action in the
>> kernel that would do a "legitimate" write (which requires the unseal),
>> and race it. Unsealing should be CPU-local, if the API is going to
>> support this kind of access.
>
> I see.
> If CPU1 were to forcibly halt anything that could race with it, then
> it would be sure that there was no interference.

Correct. This is actually what ARM does for doing kernel memory
writing when poking stuff for kprobes, etc. It's rather dramatic,
though. :)

> A reactive approach could be, instead, to re-validate the content after
> the sealing, assuming that it is possible.

I would prefer to avoid this, as that allows an attacker to still have
made the changes (which could even result in them then disabling the
re-validation during the attack).

>> I am more concerned about _any_ unseal after initial seal. And even
>> then, it'd be nice to keep things CPU-local. My concerns are related
>> to the write-rarely proposal (https://lkml.org/lkml/2017/3/29/704)
>> which is kind of like this, but focused on the .data section, not
>> dynamic memory. It has similar concerns about CPU-locality.
>> Additionally, even writing to memory and then making it read-only
>> later runs risks (see threads about BPF JIT races vs making things
>> read-only: https://patchwork.kernel.org/patch/9662653/ Alexei's NAK
>> doesn't change the risk this series is fixing: races with attacker
>> writes during assignment but before read-only marking).
>
> If you are talking about an attacker, rather than protection against
> accidental overwrites, how can hashing be enough?
> Couldn't the attacker compromise that too?

In theory, yes, though the goal was to dedicate a register to the
hash, which would make it hard/impossible for an attacker to reach.
(The BPF JIT situation is just an example of this kind of race,
though. I'm still in favor of reducing the write window to init-time
from full run-time.)

>> So, while smalloc would hugely reduce the window an attacker has
>> available to change data contents, this API doesn't eliminate it. (To
>> eliminate it, there would need to be a CPU-local page permission view
>> that lets only the current CPU write to the page, and then restore it to
>> read-only to match the global read-only view.)
>
> That or, if one is ready to take the hit, freeze every other possible
> attack vector. But I'm not sure this could be justifiable.

I would expect other people would NAK using a "stop all other CPUs"
approach. Better to have the CPU-local writes.

>> Ah! In that case, sure. This isn't what the proposed API provided,
>> though, so let's adjust it to only perform the unseal at destroy time.
>> That makes it much saner, IMO. "Write once" dynamic allocations, or
>> "read-only after seal". woalloc? :P yay naming
>
> For now I'm still using smalloc.
> Anything that is either [x]malloc or [yz]malloc is fine, lengthwise.
> Other options might require some re-formatting.

Yeah, I don't have any better idea for names. :)

>> Ah, okay. Most of the LSM is happily covered by __ro_after_init. If we
>> could just drop the runtime disabling of SELinux, we'd be fine.
>
> I am not sure I understand this point.
> If the kernel is properly configured, the master toggle variable
> disappears, right?
> Or do you mean the disabling through modifications of the linked list of
> the hooks?

We might be talking past each other. Right now, the LSM is marked with
__ro_after_init, which will make all the list structures, entries, etc
read-only already. There is one case where this is not true, and
that's for CONFIG_SECURITY_WRITABLE_HOOKS for
CONFIG_SECURITY_SELINUX_DISABLE, which wants to do run-time removal of
SELinux. Are you talking about the SELinux policy installed during
early boot? Which things did you want to use smalloc() on?
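
For reference, the arrangement in include/linux/lsm_hooks.h is along
these lines - the hook lists stay writable only when the SELinux
runtime disable is configured in:

#ifdef CONFIG_SECURITY_WRITABLE_HOOKS
#define __lsm_ro_after_init
#else
#define __lsm_ro_after_init     __ro_after_init
#endif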

>> It seems like smalloc pools could also be refcounted?
>
> I am not sure what you mean.
> What do you want to count?
> Number of pools? Nodes per pool? Allocations per node?

I meant things that point into an smalloc() pool could keep a refcount
and when nothing was pointing into it, it could be destroyed. (i.e.
using refcount garbage collection instead of explicit destruction.)
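
i.e. a sketch of the idea, using the existing kref machinery (the
smalloc names are made up):

struct smalloc_pool {
        struct kref ref;
        /* ... list head, mutex, label, etc. ... */
};

static void smalloc_pool_release(struct kref *ref)
{
        struct smalloc_pool *pool =
                container_of(ref, struct smalloc_pool, ref);

        smalloc_destroy(pool);  /* unseal and vfree all the nodes */
}

/* each long-lived pointer into the pool takes a reference ... */
kref_get(&pool->ref);
/* ... and drops it when done; the last put tears the pool down */
kref_put(&pool->ref, smalloc_pool_release);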

> And what for?

It might be easier to reason about later if allocations get complex.
It's certainly not required for the first version of this.

> At least in the case of tearing down a pool, when a module is unloaded,
> nobody needs to free anything that was allocated with smalloc.
> The teardown function will free the pages from each node.

Right, yeah.

>>>>> +#define NODE_HEADER                                    \
>>>>> +       struct {                                        \
>>>>> +               __SMALLOC_ALIGNED__ struct {            \
>>>>> +                       struct list_head list;          \
>>>>> +                       align_t *free;                  \
>>>>> +                       unsigned long available_words;  \
>>>>> +               };                                      \
>>>>> +       }
>>>
>>> Does this look ok? ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>
>> It's probably a sufficient starting point, depending on how the API
>> shakes out. Without unseal-write-seal properties, I care much less
>> about redzoning, etc.
>
> ok, but my question (I am not sure if it was clear) was about the use of
> a macro for the nameless structure that contains the header.

I don't really have an opinion on this. It might be more readable with
a named structure?
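
For comparison, the same layout spelled out with a name (the data[]
member is just my guess at the node layout):

struct node_header {
        __SMALLOC_ALIGNED__ struct {
                struct list_head list;
                align_t *free;
                unsigned long available_words;
        };
};

struct smalloc_node {
        struct node_header header;
        __SMALLOC_ALIGNED__ align_t data[];
};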

> One more thing: how should I tie this allocator to the rest?
> I have verified that it seems to work with both SLUB and SLAB.
> Can I make it depend on either of them being enabled?

It seems totally unrelated. The only relationship I see would be
interaction with hardened usercopy. In a perfect world, none of the
smalloc pools would be allowed to be copied to/from userspace, which
would make integration really easy: if smalloc_pool(ptr) return NOPE;
:P
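
Something like this in the usercopy checks, in other words (the
smalloc side is hypothetical, but the existing heap checks in
mm/usercopy.c report a culprit name string the same way):

static const char *check_smalloc_object(const void *ptr)
{
        /* smalloc memory comes from vmalloc'd pages */
        if (is_vmalloc_addr(ptr) &&
            PageSmalloc(vmalloc_to_page(ptr)))
                return "<smalloc>";     /* never valid for usercopy */
        return NULL;
}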

> Should it be optionally enabled?
> What to default to, if it's not enabled? vmalloc?

I don't see any reason to make it optional.

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [kernel-hardening] Re: [PATCH 1/1] Sealable memory support
  2017-05-28 18:23             ` Kees Cook
  (?)
@ 2017-05-28 18:56               ` Boris Lukashev
  -1 siblings, 0 replies; 68+ messages in thread
From: Boris Lukashev @ 2017-05-28 18:56 UTC (permalink / raw)
  To: Kees Cook
  Cc: Igor Stoppa, Casey Schaufler, Michal Hocko, Dave Hansen,
	Laura Abbott, Linux-MM, kernel-hardening, LKML, Daniel Micay,
	Greg KH, James Morris, Stephen Smalley

One-time sealable memory makes the most sense from a defensive
perspective - red team reads this stuff, the races mentioned will be
implemented as described to win the day, and probably in other
innovative ways. If a gap is left in the implementation, without
explicit coverage by an adjacent function, it will be used no matter
how small the chances of it occurring in the real world are - grooming
systems to create unlikely conditions is fair play (look at
EternalBlue's SMB pool machinations).

However, out-of-tree modules will likely not appreciate this - third
party LSMs, the TPE module, SCST, etc. I don't want to get into the
"NIH" debate; they're real, functional components, used by real
companies often enough that they need to be considered members of the
ecosystem, even if not first-order members.

So what about a middle ground where CoW semantics are used to enforce
the state of these allocations as RO, but a strictly controlled
pathway is provided to read the RO data, copy and modify it, then
write and seal it into a new allocation (rough sketch below)?
Successful return from this process should permit the page table to
change the pointer to where the object now resides, and initiate
freeing of the original memory, so long as a refcount is kept for
accesses. That way, sealable memory is sealed, and any consumers
reading it will be using the original ptr to the original smalloc
region. Attackers who do manage to change the allocation by writing a
new one still have to figure out how to change the region used by an
existing consumer. New consumers will get hit by whatever they
changed, but the change will be tracked, and making it requires them
to gain execution control over the allocator itself (or to ROP-chain
something else into doing it, but that's a discussion on RAP/CFI).

CPU-local ops are great, if they don't halt the other cores. Stopping
all other CPUs is going to be DoA in HPC and other CPU-intensive
workloads - think what ZFS would do if its pipelines kept getting
halted by something running a lot of smallocs (they get
non-preemptible often enough, requiring waits on both sides of the
op), or how LIO would behave - iSCSI waits for no man or woman - or
their allocator strategies. I'm all for "security should be more of a
concern than performance - it's easier to build a faster car later if
you're still alive to do it," but keeping in mind who the consumers
are, I can easily see this functionality staying disabled in most
distributions and thus receiving much less testing and beatdown than
it should.

Lastly, my meager understanding is that PaX set the entire kernel as
RO and implemented writable access via pax_open/close. How were they
fighting race conditions, and what is the benefit of specific regions
being allocated this way, as opposed to the RO-all-the-things
approach which makes writes a specialized set of operations?
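
For the middle-ground pathway above, I'm picturing something shaped
roughly like this (every name below is hypothetical):

void *smalloc_update(struct smalloc_pool *pool, const void *old,
                     size_t size, void (*modify)(void *copy))
{
        void *replacement = smalloc(pool, size);

        if (!replacement)
                return NULL;
        memcpy(replacement, old, size); /* read the sealed original */
        modify(replacement);            /* change only the fresh copy */
        smalloc_seal_object(pool, replacement); /* seal before publishing */
        return replacement;             /* the caller swaps the pointer;
                                         * the old object is freed once
                                         * its refcount drops to zero */
}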

On Sun, May 28, 2017 at 2:23 PM, Kees Cook <keescook@google.com> wrote:
> On Wed, May 24, 2017 at 10:45 AM, Igor Stoppa <igor.stoppa@huawei.com> wrote:
>> On 23/05/17 23:11, Kees Cook wrote:
>>> On Tue, May 23, 2017 at 2:43 AM, Igor Stoppa <igor.stoppa@huawei.com> wrote:
>>> I meant this:
>>>
>>> CPU 1     CPU 2
>>> create
>>> alloc
>>> write
>>> seal
>>> ...
>>> unseal
>>>                 write
>>> write
>>> seal
>>>
>>> The CPU 2 write would be, for example, an attacker using a
>>> vulnerability to attempt to write to memory in the sealed area. All it
>>> would need to do to succeed would be to trigger an action in the
>>> kernel that would do a "legitimate" write (which requires the unseal),
>>> and race it. Unsealing should be CPU-local, if the API is going to
>>> support this kind of access.
>>
>> I see.
>> If CPU1 were to forcibly halt anything that could race with it, then
>> it would be sure that there was no interference.
>
> Correct. This is actually what ARM does for doing kernel memory
> writing when poking stuff for kprobes, etc. It's rather dramatic,
> though. :)
>
>> A reactive approach could be, instead, to re-validate the content after
>> the sealing, assuming that it is possible.
>
> I would prefer to avoid this, as that allows an attacker to still have
> made the changes (which could even result in them then disabling the
> re-validation during the attack).
>
>>> I am more concerned about _any_ unseal after initial seal. And even
>>> then, it'd be nice to keep things CPU-local. My concerns are related
>>> to the write-rarely proposal (https://lkml.org/lkml/2017/3/29/704)
>>> which is kind of like this, but focused on the .data section, not
>>> dynamic memory. It has similar concerns about CPU-locality.
>>> Additionally, even writing to memory and then making it read-only
>>> later runs risks (see threads about BPF JIT races vs making things
>>> read-only: https://patchwork.kernel.org/patch/9662653/ Alexei's NAK
>>> doesn't change the risk this series is fixing: races with attacker
>>> writes during assignment but before read-only marking).
>>
>> If you are talking about an attacker, rather than protection against
>> accidental overwrites, how can hashing be enough?
>> Couldn't the attacker compromise that too?
>
> In theory, yes, though the goal was to dedicate a register to the
> hash, which would make it hard/impossible for an attacker to reach.
> (The BPF JIT situation is just an example of this kind of race,
> though. I'm still in favor of reducing the write window to init-time
> from full run-time.)
>
>>> So, while smalloc would hugely reduce the window an attacker has
>>> available to change data contents, this API doesn't eliminate it. (To
>>> eliminate it, there would need to be a CPU-local page permission view
>>> that lets only the current CPU write to the page, and then restore it to
>>> read-only to match the global read-only view.)
>>
>> That or, if one is ready to take the hit, freeze every other possible
>> attack vector. But I'm not sure this could be justifiable.
>
> I would expect other people would NAK using a "stop all other CPUs"
> approach. Better to have the CPU-local writes.
>
>>> Ah! In that case, sure. This isn't what the proposed API provided,
>>> though, so let's adjust it to only perform the unseal at destroy time.
>>> That makes it much saner, IMO. "Write once" dynamic allocations, or
>>> "read-only after seal". woalloc? :P yay naming
>>
>> For now I'm still using smalloc.
>> Anything that is either [x]malloc or [yz]malloc is fine, lengthwise.
>> Other options might require some re-formatting.
>
> Yeah, I don't have any better idea for names. :)
>
>>> Ah, okay. Most of the LSM is happily covered by __ro_after_init. If we
>>> could just drop the runtime disabling of SELinux, we'd be fine.
>>
>> I am not sure I understand this point.
>> If the kernel is properly configured, the master toggle variable
>> disappears, right?
>> Or do you mean the disabling through modifications of the linked list of
>> the hooks?
>
> We might be talking past each other. Right now, the LSM is marked with
> __ro_after_init, which will make all the list structures, entries, etc
> read-only already. There is one case where this is not true, and
> that's for CONFIG_SECURITY_WRITABLE_HOOKS for
> CONFIG_SECURITY_SELINUX_DISABLE, which wants to do run-time removal of
> SELinux. Are you talking about the SELinux policy installed during
> early boot? Which things did you want to use smalloc() on?
>
>>> It seems like smalloc pools could also be refcounted?
>>
>> I am not sure what you mean.
>> What do you want to count?
>> Number of pools? Nodes per pool? Allocations per node?
>
> I meant things that point into an smalloc() pool could keep a refcount
> and when nothing was pointing into it, it could be destroyed. (i.e.
> using refcount garbage collection instead of explicit destruction.)
>
>> And what for?
>
> It might be easier to reason about later if allocations get complex.
> It's certainly not required for the first version of this.
>
>> At least in the case of tearing down a pool, when a module is unloaded,
>> nobody needs to free anything that was allocated with smalloc.
>> The teardown function will free the pages from each node.
>
> Right, yeah.
>
>>>>>> +#define NODE_HEADER                                    \
>>>>>> +       struct {                                        \
>>>>>> +               __SMALLOC_ALIGNED__ struct {            \
>>>>>> +                       struct list_head list;          \
>>>>>> +                       align_t *free;                  \
>>>>>> +                       unsigned long available_words;  \
>>>>>> +               };                                      \
>>>>>> +       }
>>>>
>>>> Does this look ok? ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>
>>> It's probably a sufficient starting point, depending on how the API
>>> shakes out. Without unseal-write-seal properties, I care much less
>>> about redzoning, etc.
>>
>> ok, but my question (I am not sure if it was clear) was about the use of
>> a macro for the nameless structure that contains the header.
>
> I don't really have an opinion on this. It might be more readable with
> a named structure?
>
>> One more thing: how should I tie this allocator to the rest?
>> I have verified that it seems to work with both SLUB and SLAB.
>> Can I make it depend on either of them being enabled?
>
> It seems totally unrelated. The only relationship I see would be
> interaction with hardened usercopy. In a perfect world, none of the
> smalloc pools would be allowed to be copied to/from userspace, which
> would make integration really easy: if smalloc_pool(ptr) return NOPE;
> :P
>
>> Should it be optionally enabled?
>> What to default to, if it's not enabled? vmalloc?
>
> I don't see any reason to make it optional.
>
> -Kees
>
> --
> Kees Cook
> Pixel Security



-- 
Boris Lukashev
Systems Architect
Semper Victus

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [kernel-hardening] Re: [PATCH 1/1] Sealable memory support
@ 2017-05-28 18:56               ` Boris Lukashev
  0 siblings, 0 replies; 68+ messages in thread
From: Boris Lukashev @ 2017-05-28 18:56 UTC (permalink / raw)
  To: Kees Cook
  Cc: Igor Stoppa, Casey Schaufler, Michal Hocko, Dave Hansen,
	Laura Abbott, Linux-MM, kernel-hardening, LKML, Daniel Micay,
	Greg KH, James Morris, Stephen Smalley

One-time sealable memory makes the most sense from a defensive
perspective - red team reads this stuff, the races mentioned will be
implemented as described to win the day, and probably in other
innovative ways. If a gap is left in the implementation, without
explicit coverage by an adjacent function, it will be used no matter
how small the chances of it occurring in the real world are - grooming
systems to create unlikely conditions is fair play (look at
eternalblue's SMB pool machinations).
However, out of tree modules will likely not appreciate this - third
party LSMs, tpe module, SCST, etc. I dont want to get into the "NIH"
debate, they're real functional components, used by real companies,
often enough that they need to be considered a member of the
ecosystem, even if not a first-order member.
So what about a middle ground where CoW semantics are used to enforce
the state of these allocations as RO, but provide a strictly
controlled pathway to read the RO data, copy and modify it, then write
and seal into a new allocation. Successful return from this process
should permit the page table to change the pointer to where the object
now resides, and initiate freeing of the original memory so long as a
refcount is kept for accesses. That way, sealable memory is sealed,
and any consumers reading it will be using the original ptr to the
original smalloc region. Attackers who do manage to change the
allocation by writing a new one still have to figure out how to change
the region used by an existing consumer. New consumers will get hit by
whatever they changed, but the change will be tracked and require them
to gain execution control over the allocator itself in order to affect
the change (or ROP chain something else into doing it, but thats's a
discussion on RAP/CFI).
CPU-local ops are great, if they dont halt the other cores. Stopping
all other CPUs is going to be DoA in HPC and other CPU intensive
workloads - think what ZFS would do if its pipelines kept getting
halted by something running a lot of smallocs (they get
non-preemptible often enough, requiring waits on both sides of the
op), how how LIO would behave - iSCSI waits for no man or woman, or
their allocator strategies. I'm all for "security should be more of a
concern than performance - its easier to build a faster car later if
you're still alive to do it," but keeping in mind who the consumers
are, i can easily see this functionality staying disabled in most
distributions and thus receiving much less testing and beatdown than
it should.
Lastly, my meager understanding is that PAX set the entire kernel as
RO, and implemented writeable access via pax_open/close. How were they
fighting against race conditions, and what is the benefit of specific
regions being allocated this way as opposed to the RO-all-the-things
approach which makes writes a specialized set of operations?

On Sun, May 28, 2017 at 2:23 PM, Kees Cook <keescook@google.com> wrote:
> On Wed, May 24, 2017 at 10:45 AM, Igor Stoppa <igor.stoppa@huawei.com> wrote:
>> On 23/05/17 23:11, Kees Cook wrote:
>>> On Tue, May 23, 2017 at 2:43 AM, Igor Stoppa <igor.stoppa@huawei.com> wrote:
>>> I meant this:
>>>
>>> CPU 1     CPU 2
>>> create
>>> alloc
>>> write
>>> seal
>>> ...
>>> unseal
>>>                 write
>>> write
>>> seal
>>>
>>> The CPU 2 write would be, for example, an attacker using a
>>> vulnerability to attempt to write to memory in the sealed area. All it
>>> would need to do to succeed would be to trigger an action in the
>>> kernel that would do a "legitimate" write (which requires the unseal),
>>> and race it. Unsealing should be CPU-local, if the API is going to
>>> support this kind of access.
>>
>> I see.
>> If the CPU1 were to forcibly halt anything that can race with it, then
>> it would be sure that there was no interference.
>
> Correct. This is actually what ARM does for doing kernel memory
> writing when poking stuff for kprobes, etc. It's rather dramatic,
> though. :)
>
>> A reactive approach could be, instead, to re-validate the content after
>> the sealing, assuming that it is possible.
>
> I would prefer to avoid this, as that allows an attacker to still have
> made the changes (which could even result in them then disabling the
> re-validation during the attack).
>
>>> I am more concerned about _any_ unseal after initial seal. And even
>>> then, it'd be nice to keep things CPU-local. My concerns are related
>>> to the write-rarely proposal (https://lkml.org/lkml/2017/3/29/704)
>>> which is kind of like this, but focused on the .data section, not
>>> dynamic memory. It has similar concerns about CPU-locality.
>>> Additionally, even writing to memory and then making it read-only
>>> later runs risks (see threads about BPF JIT races vs making things
>>> read-only: https://patchwork.kernel.org/patch/9662653/ Alexei's NAK
>>> doesn't change the risk this series is fixing: races with attacker
>>> writes during assignment but before read-only marking).
>>
>> If you are talking about an attacker, rather than protection against
>> accidental overwrites, how can hashing be enough?
>> Couldn't the attacker compromise that too?
>
> In theory, yes, though the goal was to dedicate a register to the
> hash, which would make it hard/impossible for an attacker to reach.
> (The BPF JIT situation is just an example of this kind of race,
> though. I'm still in favor of reducing the write window to init-time
> from full run-time.)
>
>>> So, while smalloc would hugely reduce the window an attacker has
>>> available to change data contents, this API doesn't eliminate it. (To
>>> eliminate it, there would need to be a CPU-local page permission view
>>> that let only the current CPU to the page, and then restore it to
>>> read-only to match the global read-only view.)
>>
>> That or, if one is ready to take the hit, freeze every other possible
>> attack vector. But I'm not sure this could be justifiable.
>
> I would expect other people would NAK using "stop all other CPUs"
> approach. Better to have the CPU-local writes.
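
(One way to approximate that CPU-local view is to write through a
temporary alias mapping of the target page, leaving the global
read-only mapping untouched - roughly what text_poke() does on x86.
Sketched with vmap() purely for illustration; a vmap() alias is in
fact visible to all CPUs while it exists, so a real implementation
would want per-CPU fixmap slots instead:)

static int write_via_alias(struct page *page, size_t offset,
                           const void *src, size_t len)
{
        /* writable alias; the RO mapping is never relaxed */
        void *alias = vmap(&page, 1, VM_MAP, PAGE_KERNEL);

        if (!alias)
                return -ENOMEM;
        memcpy(alias + offset, src, len);
        vunmap(alias);
        return 0;
}
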
>
>>> Ah! In that case, sure. This isn't what the proposed API provided,
>>> though, so let's adjust it to only perform the unseal at destroy time.
>>> That makes it much saner, IMO. "Write once" dynamic allocations, or
>>> "read-only after seal". woalloc? :P yay naming
>>
>> For now I'm still using smalloc.
>> Anything that is either [x]malloc or [yz]malloc is fine, lengthwise.
>> Other options might require some re-formatting.
>
> Yeah, I don't have any better idea for names. :)
>
>>> Ah, okay. Most of the LSM is happily covered by __ro_after_init. If we
>>> could just drop the runtime disabling of SELinux, we'd be fine.
>>
>> I am not sure I understand this point.
>> If the kernel is properly configured, the master toggle variable
>> disappears, right?
>> Or do you mean the disabling through modifications of the linked list of
>> the hooks?
>
> We might be talking past each other. Right now, the LSM is marked with
> __ro_after_init, which will make all the list structures, entries, etc
> read-only already. There is one case where this is not true, and
> that's for CONFIG_SECURITY_WRITABLE_HOOKS for
> CONFIG_SECURITY_SELINUX_DISABLE, which wants to do run-time removal of
> SELinux. Are you talking about the SELinux policy installed during
> early boot? Which things did you want to use smalloc() on?
>
>>> It seems like smalloc pools could also be refcounted?
>>
>> I am not sure what you mean.
>> What do you want to count?
>> Number of pools? Nodes per pool? Allocations per node?
>
> I meant things that point into an smalloc() pool could keep a refcount
> and when nothing was pointing into it, it could be destroyed. (i.e.
> using refcount garbage collection instead of explicit destruction.)
>
>> And what for?
>
> It might be easier to reason about later if allocations get complex.
> It's certainly not required for the first version of this.
>
>> At least in the case of tearing down a pool, when a module is unloaded,
>> nobody needs to free anything that was allocated with smalloc.
>> The teardown function will free the pages from each node.
>
> Right, yeah.
>
>>>>>> +#define NODE_HEADER                                    \
>>>>>> +       struct {                                        \
>>>>>> +               __SMALLOC_ALIGNED__ struct {            \
>>>>>> +                       struct list_head list;          \
>>>>>> +                       align_t *free;                  \
>>>>>> +                       unsigned long available_words;  \
>>>>>> +               };                                      \
>>>>>> +       }
>>>>
>>>> Does this look ok? ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>
>>> It's probably a sufficient starting point, depending on how the API
>>> shakes out. Without unseal-write-seal properties, I care much less
>>> about redzoning, etc.
>>
>> ok, but my question (I am not sure if it was clear) was about the use of
>> a macro for the nameless structure that contains the header.
>
> I don't really have an opinion on this. It might be more readable with
> a named structure?
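
(For comparison, a named-header variant could look like the sketch
below: same layout and alignment intent, one extra level of naming,
no macro. A named member also tends to be friendlier to older
compilers than an anonymous struct followed by a flexible array:)

struct smalloc_node_header {
        struct list_head list;
        align_t *free;
        unsigned long available_words;
};

struct smalloc_node {
        __SMALLOC_ALIGNED__ struct smalloc_node_header hdr;
        __SMALLOC_ALIGNED__ align_t data[];
};
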
>
>> One more thing: how should I tie this allocator to the rest?
>> I have verified that it seems to work with both SLUB and SLAB.
>> Can I make it depend on either of them being enabled?
>
> It seems totally unrelated. The only relationship I see would be
> interaction with hardened usercopy. In a perfect world, none of the
> smalloc pools would be allowed to be copied to/from userspace, which
> would make integration really easy: if smalloc_pool(ptr) return NOPE;
> :P
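
(In the hardened usercopy code of this era, where the heap checks
report a culprit name for an offending region, that check could be as
small as the sketch below; smalloc_pool_of() is a hypothetical lookup
over the pools' nodes:)

/* Sketch: sealed pools must never be copied to/from userspace. */
const char *check_smalloc_object(const void *ptr, unsigned long n)
{
        if (smalloc_pool_of(ptr) || smalloc_pool_of(ptr + n - 1))
                return "<smalloc>";     /* reported as the culprit */
        return NULL;                    /* not ours, carry on */
}
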
>
>> Should it be optionally enabled?
>> What to default to, if it's not enabled? vmalloc?
>
> I don't see any reason to make it optional.
>
> -Kees
>
> --
> Kees Cook
> Pixel Security



-- 
Boris Lukashev
Systems Architect
Semper Victus


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [kernel-hardening] Re: [PATCH 1/1] Sealable memory support
  2017-05-28 18:56               ` Boris Lukashev
  (?)
@ 2017-05-28 21:32                 ` Kees Cook
  -1 siblings, 0 replies; 68+ messages in thread
From: Kees Cook @ 2017-05-28 21:32 UTC (permalink / raw)
  To: Boris Lukashev
  Cc: Igor Stoppa, Casey Schaufler, Michal Hocko, Dave Hansen,
	Laura Abbott, Linux-MM, kernel-hardening, LKML, Daniel Micay,
	Greg KH, James Morris, Stephen Smalley

On Sun, May 28, 2017 at 11:56 AM, Boris Lukashev
<blukashev@sempervictus.com> wrote:
> So what about a middle ground where CoW semantics are used to enforce
> the state of these allocations as RO, but provide a strictly
> controlled pathway to read the RO data, copy and modify it, then write
> and seal into a new allocation. Successful return from this process
> should permit the page table to change the pointer to where the object
> now resides, and initiate freeing of the original memory so long as a
> refcount is kept for accesses. That way, sealable memory is sealed,
> and any consumers reading it will be using the original ptr to the
> original smalloc region. Attackers who do manage to change the

This could be another way to do it, yeah, and it helps that smalloc()
is built on vmalloc(). It'd require some careful design, but it could
be a way forward after this initial sealed-after-init version goes in.

> Lastly, my meager understanding is that PaX set the entire kernel as
> RO, and implemented writable access via pax_open/close. How were they
> fighting against race conditions, and what is the benefit of specific
> regions being allocated this way as opposed to the RO-all-the-things
> approach which makes writes a specialized set of operations?

My understanding is that PaX's KERNEXEC with the constification plugin
moves a substantial portion of the kernel's .data section
(effectively) into the .rodata section. It's not the "entire" kernel.
(Well, depending on how you count. The .text section is already
read-only upstream.) PaX, as far as I know, provided no dynamic memory
allocation protections, like smalloc() would provide.

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [kernel-hardening] Re: [PATCH 1/1] Sealable memory support
  2017-05-28 21:32                 ` Kees Cook
  (?)
@ 2017-05-29  6:04                   ` Boris Lukashev
  -1 siblings, 0 replies; 68+ messages in thread
From: Boris Lukashev @ 2017-05-29  6:04 UTC (permalink / raw)
  To: Kees Cook
  Cc: Igor Stoppa, Casey Schaufler, Michal Hocko, Dave Hansen,
	Laura Abbott, Linux-MM, kernel-hardening, LKML, Daniel Micay,
	Greg KH, James Morris, Stephen Smalley

If I understand the current direction for smalloc, it's to implement it
without the ability to "unseal," which has implications on how LSM
implementations and other users of these dynamic allocations handle
things. If it's implemented without a writable interface for modules
which need it, then switched to a CoW (or other
controlled/isolated/hardened) mechanism for creating writeable
segments in dynamic allocations, that would create another set of
breaking changes for the consumers adapting to the first set. In the
spirit of adopting things faster/smoother, maybe it makes sense to
permit the smalloc re-writing method suggested in the first
implementation, with an API which consumers can later use for a more
secure approach that reduces the attack surface without breaking
consumer interfaces (or internal semantics). A sysctl affecting the
behavior as ro-only or ro-and-sometimes-rw would give users the
flexibility to tune their environment per their needs, which should
also reduce potential conflict/complaints.
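
Mechanically, such a knob could be a single integer sysctl consulted
on the unseal path - a sketch with hypothetical names below; whether a
runtime toggle would survive review is of course the open question:

static int smalloc_allow_unseal;        /* 0 = ro-only, 1 = rw allowed */
static int zero;
static int one = 1;

static struct ctl_table smalloc_sysctls[] = {
        {
                .procname       = "smalloc_allow_unseal",
                .data           = &smalloc_allow_unseal,
                .maxlen         = sizeof(int),
                .mode           = 0644,
                .proc_handler   = proc_dointvec_minmax,
                .extra1         = &zero,
                .extra2         = &one,
        },
        { }
};

/* registered once, e.g. with register_sysctl("vm", smalloc_sysctls) */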

On Sun, May 28, 2017 at 5:32 PM, Kees Cook <keescook@google.com> wrote:
> On Sun, May 28, 2017 at 11:56 AM, Boris Lukashev
> <blukashev@sempervictus.com> wrote:
>> So what about a middle ground where CoW semantics are used to enforce
>> the state of these allocations as RO, but provide a strictly
>> controlled pathway to read the RO data, copy and modify it, then write
>> and seal into a new allocation. Successful return from this process
>> should permit the page table to change the pointer to where the object
>> now resides, and initiate freeing of the original memory so long as a
>> refcount is kept for accesses. That way, sealable memory is sealed,
>> and any consumers reading it will be using the original ptr to the
>> original smalloc region. Attackers who do manage to change the
>
> This could be another way to do it, yeah, and it helps that smalloc()
> is built on vmalloc(). It'd require some careful design, but it could
> be a way forward after this initial sealed-after-init version goes in.
>
>> Lastly, my meager understanding is that PaX set the entire kernel as
>> RO, and implemented writable access via pax_open/close. How were they
>> fighting against race conditions, and what is the benefit of specific
>> regions being allocated this way as opposed to the RO-all-the-things
>> approach which makes writes a specialized set of operations?
>
> My understanding is that PaX's KERNEXEC with the constification plugin
> moves a substantial portion of the kernel's .data section
> (effectively) into the .rodata section. It's not the "entire" kernel.
> (Well, depending on how you count. The .text section is already
> read-only upstream.) PaX, as far as I know, provided no dynamic memory
> allocation protections, like smalloc() would provide.
>
> -Kees
>
> --
> Kees Cook
> Pixel Security



-- 
Boris Lukashev
Systems Architect
Semper Victus

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 1/1] Sealable memory support
  2017-05-19 10:38   ` Igor Stoppa
  (?)
@ 2017-05-31 13:55     ` kbuild test robot
  -1 siblings, 0 replies; 68+ messages in thread
From: kbuild test robot @ 2017-05-31 13:55 UTC (permalink / raw)
  To: Igor Stoppa
  Cc: kbuild-all, mhocko, dave.hansen, labbott, linux-mm,
	kernel-hardening, linux-kernel, Igor Stoppa

[-- Attachment #1: Type: text/plain, Size: 1312 bytes --]

Hi Igor,

[auto build test ERROR on mmotm/master]
[also build test ERROR on v4.12-rc3 next-20170531]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Igor-Stoppa/Sealable-memory-support/20170522-163525
base:   git://git.cmpxchg.org/linux-mmotm.git master
config: x86_64-randconfig-v0-05311736 (attached as .config)
compiler: gcc-4.4 (Debian 4.4.7-8) 4.4.7
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All errors (new ones prefixed by >>):

   In file included from mm/smalloc.c:24:
>> mm/smalloc.h:47: error: flexible array member in otherwise empty struct
   mm/smalloc.c: In function 'smalloc_seal_set':
   mm/smalloc.c:135: error: implicit declaration of function 'set_memory_ro'
   mm/smalloc.c:138: error: implicit declaration of function 'set_memory_rw'

vim +47 mm/smalloc.h

    41		struct mutex lock;
    42		enum seal_t seal;
    43	};
    44	
    45	struct smalloc_node {
    46		NODE_HEADER;
  > 47		__SMALLOC_ALIGNED__ align_t data[];
    48	};
    49	
    50	#define smalloc_seal(pool) \
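
(Both errors suggest straightforward fixes: set_memory_ro/rw were
recently decoupled from <asm/cacheflush.h>, so mm/smalloc.c most
likely just needs an explicit

    #include <asm/set_memory.h>

and the flexible-array error comes from gcc 4.4 treating the anonymous
NODE_HEADER struct as leaving smalloc_node "otherwise empty"; giving
the header a named member, as discussed earlier in the thread, would
sidestep it.)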

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 25394 bytes --]

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 1/1] Sealable memory support
  2017-05-28 18:23             ` Kees Cook
  (?)
@ 2017-05-31 21:22               ` Igor Stoppa
  -1 siblings, 0 replies; 68+ messages in thread
From: Igor Stoppa @ 2017-05-31 21:22 UTC (permalink / raw)
  To: Kees Cook
  Cc: Casey Schaufler, Michal Hocko, Dave Hansen, Laura Abbott,
	Linux-MM, kernel-hardening, LKML, Daniel Micay, Greg KH,
	James Morris, Stephen Smalley

On 28/05/17 21:23, Kees Cook wrote:
> On Wed, May 24, 2017 at 10:45 AM, Igor Stoppa <igor.stoppa@huawei.com> wrote:

[...]

>> If CPU 1 were to forcibly halt anything that can race with it, then
>> it would be sure that there was no interference.
> 
> Correct. This is actually what ARM does for doing kernel memory
> writing when poking stuff for kprobes, etc. It's rather dramatic,
> though. :)

ok

>> A reactive approach could be, instead, to re-validate the content after
>> the sealing, assuming that it is possible.
> 
> I would prefer to avoid this, as that allows an attacker to still have
> made the changes (which could even result in them then disabling the
> re-validation during the attack).

ok


[...]

>> If you are talking about an attacker, rather than protection against
>> accidental overwrites, how can hashing be enough?
>> Couldn't the attacker compromise that too?
> 
> In theory, yes, though the goal was to dedicate a register to the
> hash, which would make it hard/impossible for an attacker to reach.
> (The BPF JIT situation is just an example of this kind of race,
> though. I'm still in favor of reducing the write window to init-time
> from full run-time.)

It looks like some compromise is needed between reliability and
performance ...

[...]

> I would expect other people would NAK using "stop all other CPUs"
> approach. Better to have the CPU-local writes.

ok, that could be a feature for the follow-up?

[...]

> Yeah, I don't have any better idea for names. :)

Still clinging to smalloc for now; how about pmalloc,
as in "protected malloc"?


> We might be talking past each other. Right now, the LSM is marked with
> __ro_after_init, which will make all the list structures, entries, etc
> read-only already. There is one case where this is not true, and
> that's for CONFIG_SECURITY_WRITABLE_HOOKS for
> CONFIG_SECURITY_SELINUX_DISABLE, which wants to do run-time removal of
> SELinux. 

On this I'll just shut up and provide the patch. It seems easier :-)

> Are you talking about the SELinux policy installed during
> early boot? Which things did you want to use smalloc() on?

The policy DB.
And of course the policy cache has to go, as in: it must be possible to
disable it completely, in favor of re-evaluating the policies, when needed.

I'm pretty sure the cache is absolutely needed in certain cases, but in
others I think the cost of re-evaluation is negligible in comparison to
the overall duration of the use cases affected.

So it should be ok to let the system integrator gauge speed (if any) vs
resilience, by enabling/disabling the SELinux policy cache.


>>> It seems like smalloc pools could also be refcounted?
> 
> I meant things that point into an smalloc() pool could keep a refcount
> and when nothing was pointing into it, it could be destroyed. (i.e.
> using refcount garbage collection instead of explicit destruction.)

ok, but it depends on the level of paranoia that we are talking about.

Counting smalloc vs sfree is easy.

Verifying that the free invocations are actually aligned with previous
allocations is a bit more onerous.

Validating that the free is done from the same entity that invoked the
smalloc might be hard.
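
For the simple count-and-match level, a kref on the pool would be the
obvious shape - a sketch, assuming a hypothetical ref field added to
the pool struct, with smalloc_destroy_pool() standing in for the
patch's teardown function:

static void pool_release(struct kref *ref)
{
        struct smalloc_pool *pool =
                container_of(ref, struct smalloc_pool, ref);

        smalloc_destroy_pool(pool);     /* frees every node's pages */
}

static inline void pool_get(struct smalloc_pool *pool)
{
        kref_get(&pool->ref);           /* taken by every smalloc() */
}

static inline void pool_put(struct smalloc_pool *pool)
{
        kref_put(&pool->ref, pool_release);     /* dropped by sfree() */
}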

>> And what for?
> 
> It might be easier to reason about later if allocations get complex.
> It's certainly not required for the first version of this.

good :-)
more clarifications are needed

[...]

> I don't really have an opinion on this. It might be more readable with
> a named structure?

Eventually I got rid of the macro.
The structure is intentionally unnamed to stress the grouping, without
adding a layer of indirection. But maybe it could be removed. I need to
test it.

>> One more thing: how should I tie this allocator to the rest?
>> I have verified that it seems to work with both SLUB and SLAB.
>> Can I make it depend on either of them being enabled?
> 
> It seems totally unrelated. The only relationship I see would be
> interaction with hardened usercopy. In a perfect world, none of the
> smalloc pools would be allowed to be copied to/from userspace, which
> would make integration really easy: if smalloc_pool(ptr) return NOPE;
> :P

I went down the harder path and implemented the check.

>> Should it be optionally enabled?
>> What to default to, if it's not enabled? vmalloc?
> 
> I don't see any reason to make it optional.

great \o/

So now I'm back to the LSM use case.
It shouldn't take too long.

--
thanks, igor

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 1/1] Sealable memory support
  2017-05-19 10:38   ` Igor Stoppa
  (?)
@ 2017-06-04  2:18     ` kbuild test robot
  -1 siblings, 0 replies; 68+ messages in thread
From: kbuild test robot @ 2017-06-04  2:18 UTC (permalink / raw)
  To: Igor Stoppa
  Cc: kbuild-all, mhocko, dave.hansen, labbott, linux-mm,
	kernel-hardening, linux-kernel, Igor Stoppa

[-- Attachment #1: Type: text/plain, Size: 1715 bytes --]

Hi Igor,

[auto build test ERROR on mmotm/master]
[also build test ERROR on v4.12-rc3 next-20170602]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Igor-Stoppa/Sealable-memory-support/20170522-163525
base:   git://git.cmpxchg.org/linux-mmotm.git master
config: i386-randconfig-n0-06032349 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   mm//smalloc.c: In function 'smalloc_seal_set':
>> mm//smalloc.c:135:4: error: implicit declaration of function 'set_memory_ro' [-Werror=implicit-function-declaration]
       set_memory_ro((unsigned long)node,
       ^~~~~~~~~~~~~
>> mm//smalloc.c:138:4: error: implicit declaration of function 'set_memory_rw' [-Werror=implicit-function-declaration]
       set_memory_rw((unsigned long)node,
       ^~~~~~~~~~~~~
   cc1: some warnings being treated as errors

vim +/set_memory_ro +135 mm//smalloc.c

   129			mutex_unlock(&pool->lock);
   130			return;
   131		}
   132		list_for_each(pos, &pool->list) {
   133			node = list_entry(pos, struct smalloc_node, list);
   134			if (seal == SMALLOC_SEALED)
 > 135				set_memory_ro((unsigned long)node,
   136					      get_node_pages_nr(node));
   137			else if (seal == SMALLOC_UNSEALED)
 > 138				set_memory_rw((unsigned long)node,
   139					      get_node_pages_nr(node));
   140		}
   141		pool->seal = seal;

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

end of thread, other threads:[~2017-06-04  2:19 UTC | newest]

Thread overview: 68+ messages
2017-05-19 10:38 [RFC v3]mm: ro protection for data allocated dynamically Igor Stoppa
2017-05-19 10:38 ` [PATCH 1/1] Sealable memory support Igor Stoppa
2017-05-20  8:51   ` [kernel-hardening] " Greg KH
2017-05-21 11:14     ` [PATCH] LSM: Make security_hook_heads a local variable Tetsuo Handa
2017-05-22 14:03       ` Christoph Hellwig
2017-05-22 15:09         ` Casey Schaufler
2017-05-22 19:50           ` Igor Stoppa
2017-05-22 20:32             ` Casey Schaufler
2017-05-22 20:43               ` Tetsuo Handa
2017-05-22 19:45     ` [kernel-hardening] [PATCH 1/1] Sealable memory support Igor Stoppa
2017-05-22 21:38   ` Kees Cook
2017-05-23  9:43     ` Igor Stoppa
2017-05-23 20:11       ` Kees Cook
2017-05-24 17:45         ` Igor Stoppa
2017-05-28 18:23           ` Kees Cook
2017-05-28 18:56             ` [kernel-hardening] " Boris Lukashev
2017-05-28 21:32               ` Kees Cook
2017-05-29  6:04                 ` Boris Lukashev
2017-05-31 21:22             ` Igor Stoppa
2017-05-31 13:55   ` kbuild test robot
2017-06-04  2:18   ` kbuild test robot
