[RFC PATCH v17 0/7] mm: security: ro protection for dynamic data

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [RFC PATCH v17 0/7] mm: security: ro protection for dynamic data
@ 2018-02-23 14:48 Igor Stoppa
  2018-02-23 14:48 ` [PATCH 1/7] genalloc: track beginning of allocations Igor Stoppa
                   ` (6 more replies)
  0 siblings, 7 replies; 32+ messages in thread
From: Igor Stoppa @ 2018-02-23 14:48 UTC (permalink / raw)
  To: david, willy, keescook, mhocko
  Cc: labbott, linux-security-module, linux-mm, linux-kernel,
	kernel-hardening, Igor Stoppa

This patch-set introduces the possibility of protecting memory that has
been allocated dynamically.

The memory is managed in pools: when a memory pool is turned into R/O,
all the memory that is part of it, will become R/O.

A R/O pool can be destroyed, to recover its memory, but it cannot be
turned back into R/W mode.

This is intentional. This feature is meant for data that doesn't need
further modifications after initialization.

However the data might need to be released, for example as part of module
unloading.
To do this, the memory must first be freed, then the pool can be destroyed.

An example is provided, in the form of self-testing.

Changes since v16:
[http://www.openwall.com/lists/kernel-hardening/2018/02/12/13]

* introduced lkdtm test for write protection of a variable
* added comments with rationale for use of BUG_ON() when running selftest
* converted self-tests to kernel naming patter "test_xxx.c"
* moved triggering of pmalloc self-testing to early phase of main.c
* improved summaries of genalloc and vmalloc patches
* removed example of optimization of find_vm_area because of possible bug
  in vmalloc_to_page:
[https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1617030.html]

Igor Stoppa (7):
  genalloc: track beginning of allocations
  genalloc: selftest
  struct page: add field for vm_struct
  Protectable Memory
  Pmalloc selftest
  lkdtm: crash on overwriting protected pmalloc var
  Documentation for Pmalloc

 Documentation/core-api/index.rst   |   1 +
 Documentation/core-api/pmalloc.rst | 114 +++++++
 drivers/misc/lkdtm.h               |   1 +
 drivers/misc/lkdtm_core.c          |   3 +
 drivers/misc/lkdtm_perms.c         |  28 ++
 include/linux/genalloc.h           |   7 +-
 include/linux/mm_types.h           |   1 +
 include/linux/pmalloc.h            | 242 ++++++++++++++
 include/linux/test_genalloc.h      |  26 ++
 include/linux/test_pmalloc.h       |  24 ++
 include/linux/vmalloc.h            |   1 +
 init/main.c                        |   4 +
 lib/Kconfig                        |  15 +
 lib/Makefile                       |   1 +
 lib/genalloc.c                     | 658 +++++++++++++++++++++++++++----------
 lib/test_genalloc.c                | 410 +++++++++++++++++++++++
 mm/Kconfig                         |  15 +
 mm/Makefile                        |   2 +
 mm/pmalloc.c                       | 499 ++++++++++++++++++++++++++++
 mm/test_pmalloc.c                  |  79 +++++
 mm/usercopy.c                      |  33 ++
 mm/vmalloc.c                       |   5 +
 22 files changed, 1999 insertions(+), 170 deletions(-)
 create mode 100644 Documentation/core-api/pmalloc.rst
 create mode 100644 include/linux/pmalloc.h
 create mode 100644 include/linux/test_genalloc.h
 create mode 100644 include/linux/test_pmalloc.h
 create mode 100644 lib/test_genalloc.c
 create mode 100644 mm/pmalloc.c
 create mode 100644 mm/test_pmalloc.c

-- 
2.14.1

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 1/7] genalloc: track beginning of allocations
  2018-02-23 14:48 [RFC PATCH v17 0/7] mm: security: ro protection for dynamic data Igor Stoppa
@ 2018-02-23 14:48 ` Igor Stoppa
  2018-02-23 22:28   ` J Freyensee
  2018-02-25  3:37   ` kbuild test robot
  2018-02-23 14:48 ` [PATCH 2/7] genalloc: selftest Igor Stoppa
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 32+ messages in thread
From: Igor Stoppa @ 2018-02-23 14:48 UTC (permalink / raw)
  To: david, willy, keescook, mhocko
  Cc: labbott, linux-security-module, linux-mm, linux-kernel,
	kernel-hardening, Igor Stoppa

The genalloc library is only capable of tracking if a certain unit of
allocation is in use or not.

It is not capable of discerning where the memory associated to an
allocation request begins and where it ends.

The reason is that units of allocations are tracked by using a bitmap,
where each bit represents that the unit is either allocated (1) or
available (0).

The user of the API must keep track of how much space was requested, if
it ever needs to be freed.

This can cause errors being undetected.
Examples:
* Only a subset of the memory provided to an allocation request is freed
* The memory from a subsequent allocation is freed
* The memory being freed doesn't start at the beginning of an
  allocation.

The bitmap is used because it allows to perform lockless read/write
access, where this is supported by hw through cmpxchg.
Similarly, it is possible to scan the bitmap for a sufficiently long
sequence of zeros, to identify zones available for allocation.

This patch doubles the space reserved in the bitmap for each allocation,
to track their beginning.

For details, see the documentation inside lib/genalloc.c

The primary effect of this patch is that code using the gen_alloc
library does not need anymore to keep track of the size of the
allocations it makes.

Prior to this patch, it was necessary to keep track of the size of the
allocation, so that it would be possible, later on, to know how much
space should be freed.

Now, users of the api can choose to etiher still specify explicitly the
size, or let the library determine it, by giving a value of 0.

However, even when the value is specified, the library still uses its on
understanding of the space associated with a certain allocation, to
confirm that they are consistent.

This verification provides also confirmation that the patch works correctly.

Eventually, the extra parameter (and the corresponding verification) could
be dropped, in favor of a simplified API.

Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com>
---
 include/linux/genalloc.h |   4 +-
 lib/genalloc.c           | 631 ++++++++++++++++++++++++++++++++++-------------
 2 files changed, 465 insertions(+), 170 deletions(-)

diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
index 872f930f1b06..dcaa33e74b1c 100644
--- a/include/linux/genalloc.h
+++ b/include/linux/genalloc.h
@@ -32,7 +32,7 @@
 
 #include <linux/types.h>
 #include <linux/spinlock_types.h>
-#include <linux/atomic.h>
+#include <linux/slab.h>
 
 struct device;
 struct device_node;
@@ -76,7 +76,7 @@ struct gen_pool_chunk {
 	phys_addr_t phys_addr;		/* physical starting address of memory chunk */
 	unsigned long start_addr;	/* start address of memory chunk */
 	unsigned long end_addr;		/* end address of memory chunk (inclusive) */
-	unsigned long bits[0];		/* bitmap for allocating memory chunk */
+	unsigned long entries[0];	/* bitmap for allocating memory chunk */
 };
 
 /*
diff --git a/lib/genalloc.c b/lib/genalloc.c
index ca06adc4f445..87f62f31b52f 100644
--- a/lib/genalloc.c
+++ b/lib/genalloc.c
@@ -26,6 +26,74 @@
  *
  * This source code is licensed under the GNU General Public License,
  * Version 2.  See the file COPYING for more details.
+ *
+ *
+ *
+ * Encoding of the bitmap tracking the allocations
+ * -----------------------------------------------
+ *
+ * The bitmap is composed of units of allocations.
+ *
+ * Each unit of allocation is represented using 2 consecutive bits.
+ *
+ * This makes it possible to encode, for each unit of allocation,
+ * information about:
+ *  - allocation status (busy/free)
+ *  - beginning of a sequennce of allocation units (first / successive)
+ *
+ *
+ * Dictionary of allocation units (msb to the left, lsb to the right):
+ *
+ * 11: first allocation unit in the allocation
+ * 10: any subsequent allocation unit (if any) in the allocation
+ * 00: available allocation unit
+ * 01: invalid
+ *
+ * Example, using the same notation as above - MSb.......LSb:
+ *
+ *  ...000010111100000010101011   <-- Read in this direction.
+ *     \__|\__|\|\____|\______|
+ *        |   | |     |       \___ 4 used allocation units
+ *        |   | |     \___________ 3 empty allocation units
+ *        |   | \_________________ 1 used allocation unit
+ *        |   \___________________ 2 used allocation units
+ *        \_______________________ 2 empty allocation units
+ *
+ * The encoding allows for lockless operations, such as:
+ * - search for a sufficiently large range of allocation units
+ * - reservation of a selected range of allocation units
+ * - release of a specific allocation
+ *
+ * The alignment at which to perform the research for sequence of empty
+ * allocation units (marked as zeros in the bitmap) is 2^1.
+ *
+ * This means that an allocation can start only at even places
+ * (bit 0, bit 2, etc.) in the bitmap.
+ *
+ * Therefore, the number of zeroes to look for must be twice the number
+ * of desired allocation units.
+ *
+ * When it's time to free the memory associated to an allocation request,
+ * it's a matter of checking if the corresponding allocation unit is
+ * really the beginning of an allocation (both bits are set to 1).
+ *
+ * Looking for the ending can also be performed locklessly.
+ * It's sufficient to identify the first mapped allocation unit
+ * that is represented either as free (00) or busy (11).
+ * Even if the allocation status should change in the meanwhile, it
+ * doesn't matter, since it can only transition between free (00) and
+ * first-allocated (11).
+ *
+ * The parameter indicating to the *_free() function the size of the
+ * space that should be freed can be either set to 0, for automated
+ * assessment, or it can be specified explicitly.
+ *
+ * In case it is specified explicitly, the value is verified agaisnt what
+ * the library is tracking internally.
+ *
+ * If ever needed, the bitmap could be extended, assigning larger amounts
+ * of bits to each allocation unit (the increase must follow powers of 2),
+ * to track other properties of the allocations.
  */
 
 #include <linux/slab.h>
@@ -36,118 +104,247 @@
 #include <linux/genalloc.h>
 #include <linux/of_device.h>
 
+#define ENTRY_ORDER 1UL
+#define ENTRY_MASK ((1UL << ((ENTRY_ORDER) + 1UL)) - 1UL)
+#define ENTRY_HEAD ENTRY_MASK
+#define ENTRY_UNUSED 0UL
+#define BITS_PER_ENTRY (1U << ENTRY_ORDER)
+#define BITS_DIV_ENTRIES(x) ((x) >> ENTRY_ORDER)
+#define ENTRIES_TO_BITS(x) ((x) << ENTRY_ORDER)
+#define BITS_DIV_LONGS(x) ((x) / BITS_PER_LONG)
+#define ENTRIES_DIV_LONGS(x) (BITS_DIV_LONGS(ENTRIES_TO_BITS(x)))
+
+#define ENTRIES_PER_LONG BITS_DIV_ENTRIES(BITS_PER_LONG)
+
+/* Binary pattern of 1010...1010 that spans one unsigned long. */
+#define MASK (~0UL / 3 * 2)
+
+/**
+ * get_bitmap_entry() - extracts the specified entry from the bitmap
+ * @map: pointer to a bitmap
+ * @entry_index: the index of the desired entry in the bitmap
+ *
+ * Return: The requested bitmap.
+ */
+static inline unsigned long get_bitmap_entry(unsigned long *map,
+					    int entry_index)
+{
+	return (map[ENTRIES_DIV_LONGS(entry_index)] >>
+		ENTRIES_TO_BITS(entry_index % ENTRIES_PER_LONG)) &
+		ENTRY_MASK;
+}
+
+
+/**
+ * mem_to_units() - convert references to memory into orders of allocation
+ * @size: amount in bytes
+ * @order: power of 2 represented by each entry in the bitmap
+ *
+ * Return: the number of units representing the size.
+ */
+static inline unsigned long mem_to_units(unsigned long size,
+					 unsigned long order)
+{
+	return (size + (1UL << order) - 1) >> order;
+}
+
+/**
+ * chunk_size() - dimension of a chunk of memory, in bytes
+ * @chunk: pointer to the struct describing the chunk
+ *
+ * Return: The size of the chunk, in bytes.
+ */
 static inline size_t chunk_size(const struct gen_pool_chunk *chunk)
 {
 	return chunk->end_addr - chunk->start_addr + 1;
 }
 
-static int set_bits_ll(unsigned long *addr, unsigned long mask_to_set)
+
+/**
+ * set_bits_ll() - based on value and mask, sets bits at address
+ * @addr: where to write
+ * @mask: filter to apply for the bits to alter
+ * @value: actual configuration of bits to store
+ *
+ * Return:
+ * * 0		- success
+ * * -EBUSY	- otherwise
+ */
+static int set_bits_ll(unsigned long *addr,
+		       unsigned long mask, unsigned long value)
 {
-	unsigned long val, nval;
+	unsigned long nval;
+	unsigned long present;
+	unsigned long target;
 
 	nval = *addr;
 	do {
-		val = nval;
-		if (val & mask_to_set)
+		present = nval;
+		if (present & mask)
 			return -EBUSY;
+		target =  present | value;
 		cpu_relax();
-	} while ((nval = cmpxchg(addr, val, val | mask_to_set)) != val);
-
+	} while ((nval = cmpxchg(addr, present, target)) != target);
 	return 0;
 }
 
-static int clear_bits_ll(unsigned long *addr, unsigned long mask_to_clear)
+
+/**
+ * clear_bits_ll() - based on value and mask, clears bits at address
+ * @addr: where to write
+ * @mask: filter to apply for the bits to alter
+ * @value: actual configuration of bits to clear
+ *
+ * Return:
+ * * 0		- success
+ * * -EBUSY	- otherwise
+ */
+static int clear_bits_ll(unsigned long *addr,
+			 unsigned long mask, unsigned long value)
 {
-	unsigned long val, nval;
+	unsigned long nval;
+	unsigned long present;
+	unsigned long target;
 
 	nval = *addr;
+	present = nval;
+	if (unlikely((present & mask) ^ value))
+		return -EBUSY;
 	do {
-		val = nval;
-		if ((val & mask_to_clear) != mask_to_clear)
+		present = nval;
+		if (unlikely((present & mask) ^ value))
 			return -EBUSY;
+		target =  present & ~mask;
 		cpu_relax();
-	} while ((nval = cmpxchg(addr, val, val & ~mask_to_clear)) != val);
-
+	} while ((nval = cmpxchg(addr, present, target)) != target);
 	return 0;
 }
 
-/*
- * bitmap_set_ll - set the specified number of bits at the specified position
+
+/**
+ * get_boundary() - verifies address, then measure length.
  * @map: pointer to a bitmap
- * @start: a bit position in @map
- * @nr: number of bits to set
+ * @start_entry: the index of the first entry in the bitmap
+ * @nentries: number of entries to alter
  *
- * Set @nr bits start from @start in @map lock-lessly. Several users
- * can set/clear the same bitmap simultaneously without lock. If two
- * users set the same bit, one user will return remain bits, otherwise
- * return 0.
+ * Return:
+ * * length of an allocation	- success
+ * * -EINVAL			- invalid parameters
  */
-static int bitmap_set_ll(unsigned long *map, int start, int nr)
+static int get_boundary(unsigned long *map, int start_entry, int nentries)
 {
-	unsigned long *p = map + BIT_WORD(start);
-	const int size = start + nr;
-	int bits_to_set = BITS_PER_LONG - (start % BITS_PER_LONG);
-	unsigned long mask_to_set = BITMAP_FIRST_WORD_MASK(start);
-
-	while (nr - bits_to_set >= 0) {
-		if (set_bits_ll(p, mask_to_set))
-			return nr;
-		nr -= bits_to_set;
-		bits_to_set = BITS_PER_LONG;
-		mask_to_set = ~0UL;
-		p++;
-	}
-	if (nr) {
-		mask_to_set &= BITMAP_LAST_WORD_MASK(size);
-		if (set_bits_ll(p, mask_to_set))
-			return nr;
-	}
+	int i;
+	unsigned long bitmap_entry;
 
-	return 0;
+
+	if (unlikely(get_bitmap_entry(map, start_entry) != ENTRY_HEAD))
+		return -EINVAL;
+	for (i = start_entry + 1; i < nentries; i++) {
+		bitmap_entry = get_bitmap_entry(map, i);
+		if (bitmap_entry == ENTRY_HEAD ||
+		    bitmap_entry == ENTRY_UNUSED)
+			return i;
+	}
+	return nentries - start_entry;
 }
 
+
+#define SET_BITS 1
+#define CLEAR_BITS 0
+
 /*
- * bitmap_clear_ll - clear the specified number of bits at the specified position
+ * alter_bitmap_ll() - set/clear the entries associated with an allocation
+ * @alteration: indicates if the bits selected should be set or cleared
  * @map: pointer to a bitmap
- * @start: a bit position in @map
- * @nr: number of bits to set
+ * @start: the index of the first entry in the bitmap
+ * @nentries: number of entries to alter
+ *
+ * The modification happens lock-lessly.
+ * Several users can write to the same map simultaneously, without lock.
+ * In case of mid-air conflict, when 2 or more writers try to alter the
+ * same word in the bitmap, only one will succeed and continue, the others
+ * will fail and receive as return value the amount of entries that were
+ * not written. Each failed writer is responsible to revert the changes
+ * it did to the bitmap.
+ * The lockless conflict resolution is implemented through cmpxchg.
+ * Success or failure is purely based on first come first served basis.
+ * The first writer that manages to gain write access to the target word
+ * of the bitmap wins. Whatever can affect the order and priority of execution
+ * of the writers can and will affect the result of the race.
  *
- * Clear @nr bits start from @start in @map lock-lessly. Several users
- * can set/clear the same bitmap simultaneously without lock. If two
- * users clear the same bit, one user will return remain bits,
- * otherwise return 0.
+ * Return:
+ * * 0			- success
+ * * remaining entries	- failure
  */
-static int bitmap_clear_ll(unsigned long *map, int start, int nr)
+static int alter_bitmap_ll(bool alteration, unsigned long *map,
+			   int start_entry, int nentries)
 {
-	unsigned long *p = map + BIT_WORD(start);
-	const int size = start + nr;
-	int bits_to_clear = BITS_PER_LONG - (start % BITS_PER_LONG);
-	unsigned long mask_to_clear = BITMAP_FIRST_WORD_MASK(start);
-
-	while (nr - bits_to_clear >= 0) {
-		if (clear_bits_ll(p, mask_to_clear))
-			return nr;
-		nr -= bits_to_clear;
-		bits_to_clear = BITS_PER_LONG;
-		mask_to_clear = ~0UL;
-		p++;
-	}
-	if (nr) {
-		mask_to_clear &= BITMAP_LAST_WORD_MASK(size);
-		if (clear_bits_ll(p, mask_to_clear))
-			return nr;
+	unsigned long start_bit;
+	unsigned long end_bit;
+	unsigned long mask;
+	unsigned long value;
+	int nbits;
+	int bits_to_write;
+	int index;
+	int (*action)(unsigned long *addr,
+		      unsigned long mask, unsigned long value);
+
+	action = (alteration == SET_BITS) ? set_bits_ll : clear_bits_ll;
+
+	/*
+	 * Prepare for writing the initial part of the allocation, from
+	 * starting entry, to the end of the UL bitmap element which
+	 * contains it. It might be larger than the actual allocation.
+	 */
+	start_bit = ENTRIES_TO_BITS(start_entry);
+	end_bit = ENTRIES_TO_BITS(start_entry + nentries);
+	nbits = ENTRIES_TO_BITS(nentries);
+	bits_to_write = BITS_PER_LONG - start_bit % BITS_PER_LONG;
+	mask = BITMAP_FIRST_WORD_MASK(start_bit);
+	/* Mark the beginning of the allocation. */
+	value = MASK | (1UL << (start_bit % BITS_PER_LONG));
+	index = BITS_DIV_LONGS(start_bit);
+
+	/*
+	 * Writes entries to the bitmap, as long as the reminder is
+	 * positive or zero.
+	 * Might be skipped if the entries to write do not reach the end
+	 * of a bitmap UL unit.
+	 */
+	while (nbits >= bits_to_write) {
+		if (action(map + index, mask, value & mask))
+			return BITS_DIV_ENTRIES(nbits);
+		nbits -= bits_to_write;
+		bits_to_write = BITS_PER_LONG;
+		mask = ~0UL;
+		value = MASK;
+		index++;
 	}
 
+	/* Takes care of the ending part of the entries to mark. */
+	if (nbits > 0) {
+		mask ^= BITMAP_FIRST_WORD_MASK((end_bit) % BITS_PER_LONG);
+		bits_to_write = nbits;
+		if (action(map + index, mask, value & mask))
+			return BITS_DIV_ENTRIES(nbits);
+	}
 	return 0;
 }
 
+
 /**
- * gen_pool_create - create a new special memory pool
- * @min_alloc_order: log base 2 of number of bytes each bitmap bit represents
- * @nid: node id of the node the pool structure should be allocated on, or -1
+ * gen_pool_create() - create a new special memory pool
+ * @min_alloc_order: log base 2 of number of bytes each bitmap entry
+ *		     represents
+ * @nid: node id of the node the pool structure should be allocated on,
+ *	 or -1
  *
- * Create a new special memory pool that can be used to manage special purpose
- * memory not managed by the regular kmalloc/kfree interface.
+ * Create a new special memory pool that can be used to manage special
+ * purpose memory not managed by the regular kmalloc/kfree interface.
+ *
+ * Return:
+ * * pointer to the pool	- success
+ * * NULL			- otherwise
  */
 struct gen_pool *gen_pool_create(int min_alloc_order, int nid)
 {
@@ -167,7 +364,7 @@ struct gen_pool *gen_pool_create(int min_alloc_order, int nid)
 EXPORT_SYMBOL(gen_pool_create);
 
 /**
- * gen_pool_add_virt - add a new chunk of special memory to the pool
+ * gen_pool_add_virt() - add a new chunk of special memory to the pool
  * @pool: pool to add new memory chunk to
  * @virt: virtual starting address of memory chunk to add to pool
  * @phys: physical starting address of memory chunk to add to pool
@@ -177,16 +374,20 @@ EXPORT_SYMBOL(gen_pool_create);
  *
  * Add a new chunk of special memory to the specified pool.
  *
- * Returns 0 on success or a -ve errno on failure.
+ * Return:
+ * * 0		- success
+ * * -ve errno	- failure
  */
-int gen_pool_add_virt(struct gen_pool *pool, unsigned long virt, phys_addr_t phys,
-		 size_t size, int nid)
+int gen_pool_add_virt(struct gen_pool *pool, unsigned long virt,
+		      phys_addr_t phys, size_t size, int nid)
 {
 	struct gen_pool_chunk *chunk;
-	int nbits = size >> pool->min_alloc_order;
-	int nbytes = sizeof(struct gen_pool_chunk) +
-				BITS_TO_LONGS(nbits) * sizeof(long);
+	int nentries;
+	int nbytes;
 
+	nentries = size >> pool->min_alloc_order;
+	nbytes = sizeof(struct gen_pool_chunk) +
+		 ENTRIES_DIV_LONGS(nentries) * sizeof(long);
 	chunk = kzalloc_node(nbytes, GFP_KERNEL, nid);
 	if (unlikely(chunk == NULL))
 		return -ENOMEM;
@@ -205,11 +406,13 @@ int gen_pool_add_virt(struct gen_pool *pool, unsigned long virt, phys_addr_t phy
 EXPORT_SYMBOL(gen_pool_add_virt);
 
 /**
- * gen_pool_virt_to_phys - return the physical address of memory
+ * gen_pool_virt_to_phys() - return the physical address of memory
  * @pool: pool to allocate from
  * @addr: starting address of memory
  *
- * Returns the physical address on success, or -1 on error.
+ * Return:
+ * * the physical address	- success
+ * * \-1			- error
  */
 phys_addr_t gen_pool_virt_to_phys(struct gen_pool *pool, unsigned long addr)
 {
@@ -230,7 +433,7 @@ phys_addr_t gen_pool_virt_to_phys(struct gen_pool *pool, unsigned long addr)
 EXPORT_SYMBOL(gen_pool_virt_to_phys);
 
 /**
- * gen_pool_destroy - destroy a special memory pool
+ * gen_pool_destroy() - destroy a special memory pool
  * @pool: pool to destroy
  *
  * Destroy the specified special memory pool. Verifies that there are no
@@ -248,7 +451,7 @@ void gen_pool_destroy(struct gen_pool *pool)
 		list_del(&chunk->next_chunk);
 
 		end_bit = chunk_size(chunk) >> order;
-		bit = find_next_bit(chunk->bits, end_bit, 0);
+		bit = find_next_bit(chunk->entries, end_bit, 0);
 		BUG_ON(bit < end_bit);
 
 		kfree(chunk);
@@ -259,7 +462,7 @@ void gen_pool_destroy(struct gen_pool *pool)
 EXPORT_SYMBOL(gen_pool_destroy);
 
 /**
- * gen_pool_alloc - allocate special memory from the pool
+ * gen_pool_alloc() - allocate special memory from the pool
  * @pool: pool to allocate from
  * @size: number of bytes to allocate from the pool
  *
@@ -267,6 +470,10 @@ EXPORT_SYMBOL(gen_pool_destroy);
  * Uses the pool allocation function (with first-fit algorithm by default).
  * Can not be used in NMI handler on architectures without
  * NMI-safe cmpxchg implementation.
+ *
+ * Return:
+ * * address of the memory allocated	- success
+ * * NULL				- error
  */
 unsigned long gen_pool_alloc(struct gen_pool *pool, size_t size)
 {
@@ -275,7 +482,7 @@ unsigned long gen_pool_alloc(struct gen_pool *pool, size_t size)
 EXPORT_SYMBOL(gen_pool_alloc);
 
 /**
- * gen_pool_alloc_algo - allocate special memory from the pool
+ * gen_pool_alloc_algo() - allocate special memory from the pool
  * @pool: pool to allocate from
  * @size: number of bytes to allocate from the pool
  * @algo: algorithm passed from caller
@@ -285,6 +492,10 @@ EXPORT_SYMBOL(gen_pool_alloc);
  * Uses the pool allocation function (with first-fit algorithm by default).
  * Can not be used in NMI handler on architectures without
  * NMI-safe cmpxchg implementation.
+ *
+ * Return:
+ * * address of the memory allocated	- success
+ * * NULL				- error
  */
 unsigned long gen_pool_alloc_algo(struct gen_pool *pool, size_t size,
 		genpool_algo_t algo, void *data)
@@ -292,7 +503,7 @@ unsigned long gen_pool_alloc_algo(struct gen_pool *pool, size_t size,
 	struct gen_pool_chunk *chunk;
 	unsigned long addr = 0;
 	int order = pool->min_alloc_order;
-	int nbits, start_bit, end_bit, remain;
+	int nentries, start_entry, end_entry, remain;
 
 #ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
 	BUG_ON(in_nmi());
@@ -301,29 +512,32 @@ unsigned long gen_pool_alloc_algo(struct gen_pool *pool, size_t size,
 	if (size == 0)
 		return 0;
 
-	nbits = (size + (1UL << order) - 1) >> order;
+	nentries = mem_to_units(size, order);
 	rcu_read_lock();
 	list_for_each_entry_rcu(chunk, &pool->chunks, next_chunk) {
 		if (size > atomic_long_read(&chunk->avail))
 			continue;
 
-		start_bit = 0;
-		end_bit = chunk_size(chunk) >> order;
+		start_entry = 0;
+		end_entry = chunk_size(chunk) >> order;
 retry:
-		start_bit = algo(chunk->bits, end_bit, start_bit,
-				 nbits, data, pool);
-		if (start_bit >= end_bit)
+		start_entry = algo(chunk->entries, end_entry, start_entry,
+				  nentries, data, pool);
+		if (start_entry >= end_entry)
 			continue;
-		remain = bitmap_set_ll(chunk->bits, start_bit, nbits);
+		remain = alter_bitmap_ll(SET_BITS, chunk->entries,
+					 start_entry, nentries);
 		if (remain) {
-			remain = bitmap_clear_ll(chunk->bits, start_bit,
-						 nbits - remain);
-			BUG_ON(remain);
+			remain = alter_bitmap_ll(CLEAR_BITS,
+						 chunk->entries,
+						 start_entry,
+						 nentries - remain);
 			goto retry;
 		}
 
-		addr = chunk->start_addr + ((unsigned long)start_bit << order);
-		size = nbits << order;
+		addr = chunk->start_addr +
+			((unsigned long)start_entry << order);
+		size = nentries << order;
 		atomic_long_sub(size, &chunk->avail);
 		break;
 	}
@@ -333,7 +547,7 @@ unsigned long gen_pool_alloc_algo(struct gen_pool *pool, size_t size,
 EXPORT_SYMBOL(gen_pool_alloc_algo);
 
 /**
- * gen_pool_dma_alloc - allocate special memory from the pool for DMA usage
+ * gen_pool_dma_alloc() - allocate special memory from the pool for DMA usage
  * @pool: pool to allocate from
  * @size: number of bytes to allocate from the pool
  * @dma: dma-view physical address return value.  Use NULL if unneeded.
@@ -342,6 +556,10 @@ EXPORT_SYMBOL(gen_pool_alloc_algo);
  * Uses the pool allocation function (with first-fit algorithm by default).
  * Can not be used in NMI handler on architectures without
  * NMI-safe cmpxchg implementation.
+ *
+ * Return:
+ * * address of the memory allocated	- success
+ * * NULL				- error
  */
 void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size, dma_addr_t *dma)
 {
@@ -362,10 +580,10 @@ void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size, dma_addr_t *dma)
 EXPORT_SYMBOL(gen_pool_dma_alloc);
 
 /**
- * gen_pool_free - free allocated special memory back to the pool
+ * gen_pool_free() - free allocated special memory back to the pool
  * @pool: pool to free to
  * @addr: starting address of memory to free back to pool
- * @size: size in bytes of memory to free
+ * @size: size in bytes of memory to free or 0, for auto-detection
  *
  * Free previously allocated special memory back to the specified
  * pool.  Can not be used in NMI handler on architectures without
@@ -375,22 +593,29 @@ void gen_pool_free(struct gen_pool *pool, unsigned long addr, size_t size)
 {
 	struct gen_pool_chunk *chunk;
 	int order = pool->min_alloc_order;
-	int start_bit, nbits, remain;
+	int start_entry, remaining_entries, nentries, remain;
+	int boundary;
 
 #ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
 	BUG_ON(in_nmi());
 #endif
 
-	nbits = (size + (1UL << order) - 1) >> order;
 	rcu_read_lock();
 	list_for_each_entry_rcu(chunk, &pool->chunks, next_chunk) {
 		if (addr >= chunk->start_addr && addr <= chunk->end_addr) {
 			BUG_ON(addr + size - 1 > chunk->end_addr);
-			start_bit = (addr - chunk->start_addr) >> order;
-			remain = bitmap_clear_ll(chunk->bits, start_bit, nbits);
+			start_entry = (addr - chunk->start_addr) >> order;
+			remaining_entries = (chunk->end_addr - addr) >> order;
+			boundary = get_boundary(chunk->entries, start_entry,
+						remaining_entries);
+			BUG_ON(boundary < 0);
+			nentries = boundary - start_entry;
+			BUG_ON(size &&
+			       (nentries != mem_to_units(size, order)));
+			remain = alter_bitmap_ll(CLEAR_BITS, chunk->entries,
+						 start_entry, nentries);
 			BUG_ON(remain);
-			size = nbits << order;
-			atomic_long_add(size, &chunk->avail);
+			atomic_long_add(nentries << order, &chunk->avail);
 			rcu_read_unlock();
 			return;
 		}
@@ -401,7 +626,7 @@ void gen_pool_free(struct gen_pool *pool, unsigned long addr, size_t size)
 EXPORT_SYMBOL(gen_pool_free);
 
 /**
- * gen_pool_for_each_chunk - call func for every chunk of generic memory pool
+ * gen_pool_for_each_chunk() - call func for every chunk of generic memory pool
  * @pool:	the generic memory pool
  * @func:	func to call
  * @data:	additional data used by @func
@@ -423,13 +648,16 @@ void gen_pool_for_each_chunk(struct gen_pool *pool,
 EXPORT_SYMBOL(gen_pool_for_each_chunk);
 
 /**
- * addr_in_gen_pool - checks if an address falls within the range of a pool
+ * addr_in_gen_pool() - checks if an address falls within the range of a pool
  * @pool:	the generic memory pool
  * @start:	start address
  * @size:	size of the region
  *
- * Check if the range of addresses falls within the specified pool. Returns
- * true if the entire range is contained in the pool and false otherwise.
+ * Check if the range of addresses falls within the specified pool.
+ *
+ * Return:
+ * * true	- the entire range is contained in the pool
+ * * false	- otherwise
  */
 bool addr_in_gen_pool(struct gen_pool *pool, unsigned long start,
 			size_t size)
@@ -452,10 +680,10 @@ bool addr_in_gen_pool(struct gen_pool *pool, unsigned long start,
 }
 
 /**
- * gen_pool_avail - get available free space of the pool
+ * gen_pool_avail() - get available free space of the pool
  * @pool: pool to get available free space
  *
- * Return available free space of the specified pool.
+ * Return: available free space of the specified pool.
  */
 size_t gen_pool_avail(struct gen_pool *pool)
 {
@@ -471,10 +699,10 @@ size_t gen_pool_avail(struct gen_pool *pool)
 EXPORT_SYMBOL_GPL(gen_pool_avail);
 
 /**
- * gen_pool_size - get size in bytes of memory managed by the pool
+ * gen_pool_size() - get size in bytes of memory managed by the pool
  * @pool: pool to get size
  *
- * Return size in bytes of memory managed by the pool.
+ * Return: size in bytes of memory managed by the pool.
  */
 size_t gen_pool_size(struct gen_pool *pool)
 {
@@ -490,7 +718,7 @@ size_t gen_pool_size(struct gen_pool *pool)
 EXPORT_SYMBOL_GPL(gen_pool_size);
 
 /**
- * gen_pool_set_algo - set the allocation algorithm
+ * gen_pool_set_algo() - set the allocation algorithm
  * @pool: pool to change allocation algorithm
  * @algo: custom algorithm function
  * @data: additional data used by @algo
@@ -514,32 +742,48 @@ void gen_pool_set_algo(struct gen_pool *pool, genpool_algo_t algo, void *data)
 EXPORT_SYMBOL(gen_pool_set_algo);
 
 /**
- * gen_pool_first_fit - find the first available region
+ * gen_pool_first_fit() - find the first available region
  * of memory matching the size requirement (no alignment constraint)
  * @map: The address to base the search on
- * @size: The bitmap size in bits
- * @start: The bitnumber to start searching at
- * @nr: The number of zeroed bits we're looking for
+ * @size: The number of allocation units in the bitmap
+ * @start: The allocation unit to start searching at
+ * @nr: The number of allocation units we're looking for
  * @data: additional data - unused
  * @pool: pool to find the fit region memory from
+ *
+ * Return:
+ * * index of the memory allocated	- sufficient space available
+ * * end of the range			- insufficient space
  */
 unsigned long gen_pool_first_fit(unsigned long *map, unsigned long size,
 		unsigned long start, unsigned int nr, void *data,
 		struct gen_pool *pool)
 {
-	return bitmap_find_next_zero_area(map, size, start, nr, 0);
+	unsigned long align_mask;
+	unsigned long bit_index;
+
+	align_mask = roundup_pow_of_two(BITS_PER_ENTRY) - 1;
+	bit_index = bitmap_find_next_zero_area(map, ENTRIES_TO_BITS(size),
+					       ENTRIES_TO_BITS(start),
+					       ENTRIES_TO_BITS(nr),
+					       align_mask);
+	return BITS_DIV_ENTRIES(bit_index);
 }
 EXPORT_SYMBOL(gen_pool_first_fit);
 
 /**
- * gen_pool_first_fit_align - find the first available region
+ * gen_pool_first_fit_align() - find the first available region
  * of memory matching the size requirement (alignment constraint)
  * @map: The address to base the search on
- * @size: The bitmap size in bits
- * @start: The bitnumber to start searching at
- * @nr: The number of zeroed bits we're looking for
+ * @size: The number of allocation units in the bitmap
+ * @start: The allocation unit to start searching at
+ * @nr: The number of allocation units we're looking for
  * @data: data for alignment
  * @pool: pool to get order from
+ *
+ * Return:
+ * * index of the memory allocated	- sufficient space available
+ * * end of the range			- insufficient space
  */
 unsigned long gen_pool_first_fit_align(unsigned long *map, unsigned long size,
 		unsigned long start, unsigned int nr, void *data,
@@ -547,23 +791,34 @@ unsigned long gen_pool_first_fit_align(unsigned long *map, unsigned long size,
 {
 	struct genpool_data_align *alignment;
 	unsigned long align_mask;
+	unsigned long bit_index;
 	int order;
 
 	alignment = data;
 	order = pool->min_alloc_order;
-	align_mask = ((alignment->align + (1UL << order) - 1) >> order) - 1;
-	return bitmap_find_next_zero_area(map, size, start, nr, align_mask);
+	align_mask = roundup_pow_of_two(
+			ENTRIES_TO_BITS(mem_to_units(alignment->align,
+						     order))) - 1;
+	bit_index = bitmap_find_next_zero_area(map, ENTRIES_TO_BITS(size),
+					       ENTRIES_TO_BITS(start),
+					       ENTRIES_TO_BITS(nr),
+					       align_mask);
+	return BITS_DIV_ENTRIES(bit_index);
 }
 EXPORT_SYMBOL(gen_pool_first_fit_align);
 
 /**
- * gen_pool_fixed_alloc - reserve a specific region
+ * gen_pool_fixed_alloc() - reserve a specific region
  * @map: The address to base the search on
- * @size: The bitmap size in bits
- * @start: The bitnumber to start searching at
- * @nr: The number of zeroed bits we're looking for
+ * @size: The number of allocation units in the bitmap
+ * @start: The allocation unit to start searching at
+ * @nr: The number of allocation units we're looking for
  * @data: data for alignment
  * @pool: pool to get order from
+ *
+ * Return:
+ * * index of the memory allocated	- sufficient space available
+ * * end of the range			- insufficient space
  */
 unsigned long gen_pool_fixed_alloc(unsigned long *map, unsigned long size,
 		unsigned long start, unsigned int nr, void *data,
@@ -571,82 +826,113 @@ unsigned long gen_pool_fixed_alloc(unsigned long *map, unsigned long size,
 {
 	struct genpool_data_fixed *fixed_data;
 	int order;
-	unsigned long offset_bit;
-	unsigned long start_bit;
+	unsigned long offset;
+	unsigned long align_mask;
+	unsigned long bit_index;
 
 	fixed_data = data;
 	order = pool->min_alloc_order;
-	offset_bit = fixed_data->offset >> order;
 	if (WARN_ON(fixed_data->offset & ((1UL << order) - 1)))
 		return size;
+	offset = fixed_data->offset >> order;
+	align_mask = roundup_pow_of_two(BITS_PER_ENTRY) - 1;
+	bit_index = bitmap_find_next_zero_area(map, ENTRIES_TO_BITS(size),
+					       ENTRIES_TO_BITS(start + offset),
+					       ENTRIES_TO_BITS(nr), align_mask);
+	if (bit_index != ENTRIES_TO_BITS(offset))
+		return size;
 
-	start_bit = bitmap_find_next_zero_area(map, size,
-			start + offset_bit, nr, 0);
-	if (start_bit != offset_bit)
-		start_bit = size;
-	return start_bit;
+	return BITS_DIV_ENTRIES(bit_index);
 }
 EXPORT_SYMBOL(gen_pool_fixed_alloc);
 
 /**
- * gen_pool_first_fit_order_align - find the first available region
+ * gen_pool_first_fit_order_align() - find the first available region
  * of memory matching the size requirement. The region will be aligned
  * to the order of the size specified.
  * @map: The address to base the search on
- * @size: The bitmap size in bits
- * @start: The bitnumber to start searching at
- * @nr: The number of zeroed bits we're looking for
+ * @size: The number of allocation units in the bitmap
+ * @start: The allocation unit to start searching at
+ * @nr: The number of allocation units we're looking for
  * @data: additional data - unused
  * @pool: pool to find the fit region memory from
+ *
+ * Return:
+ * * index of the memory allocated	- sufficient space available
+ * * end of the range			- insufficient space
  */
 unsigned long gen_pool_first_fit_order_align(unsigned long *map,
 		unsigned long size, unsigned long start,
 		unsigned int nr, void *data, struct gen_pool *pool)
 {
-	unsigned long align_mask = roundup_pow_of_two(nr) - 1;
-
-	return bitmap_find_next_zero_area(map, size, start, nr, align_mask);
+	unsigned long align_mask;
+	unsigned long bit_index;
+
+	align_mask = roundup_pow_of_two(ENTRIES_TO_BITS(nr)) - 1;
+	bit_index = bitmap_find_next_zero_area(map, ENTRIES_TO_BITS(size),
+					       ENTRIES_TO_BITS(start),
+					       ENTRIES_TO_BITS(nr),
+					       align_mask);
+	return BITS_DIV_ENTRIES(bit_index);
 }
 EXPORT_SYMBOL(gen_pool_first_fit_order_align);
 
 /**
- * gen_pool_best_fit - find the best fitting region of memory
- * macthing the size requirement (no alignment constraint)
+ * gen_pool_best_fit() - find the best fitting region of memory
+ * matching the size requirement (no alignment constraint)
  * @map: The address to base the search on
- * @size: The bitmap size in bits
- * @start: The bitnumber to start searching at
- * @nr: The number of zeroed bits we're looking for
+ * @size: The number of allocation units in the bitmap
+ * @start: The allocation unit to start searching at
+ * @nr: The number of allocation units we're looking for
  * @data: additional data - unused
  * @pool: pool to find the fit region memory from
  *
  * Iterate over the bitmap to find the smallest free region
  * which we can allocate the memory.
+ *
+ * Return:
+ * * index of the memory allocated	- sufficient space available
+ * * end of the range			- insufficient space
  */
 unsigned long gen_pool_best_fit(unsigned long *map, unsigned long size,
 		unsigned long start, unsigned int nr, void *data,
 		struct gen_pool *pool)
 {
-	unsigned long start_bit = size;
+	unsigned long start_bit = ENTRIES_TO_BITS(size);
 	unsigned long len = size + 1;
 	unsigned long index;
+	unsigned long align_mask;
+	unsigned long bit_index;
 
-	index = bitmap_find_next_zero_area(map, size, start, nr, 0);
+	align_mask = roundup_pow_of_two(BITS_PER_ENTRY) - 1;
+	bit_index = bitmap_find_next_zero_area(map, ENTRIES_TO_BITS(size),
+					       ENTRIES_TO_BITS(start),
+					       ENTRIES_TO_BITS(nr),
+					       align_mask);
+	index = BITS_DIV_ENTRIES(bit_index);
 
 	while (index < size) {
-		int next_bit = find_next_bit(map, size, index + nr);
-		if ((next_bit - index) < len) {
-			len = next_bit - index;
-			start_bit = index;
+		int next_bit;
+
+		next_bit = find_next_bit(map, ENTRIES_TO_BITS(size),
+					 ENTRIES_TO_BITS(index + nr));
+		if ((BITS_DIV_ENTRIES(next_bit) - index) < len) {
+			len = BITS_DIV_ENTRIES(next_bit) - index;
+			start_bit = ENTRIES_TO_BITS(index);
 			if (len == nr)
-				return start_bit;
+				return BITS_DIV_ENTRIES(start_bit);
 		}
-		index = bitmap_find_next_zero_area(map, size,
-						   next_bit + 1, nr, 0);
+		bit_index =
+			bitmap_find_next_zero_area(map,
+						   ENTRIES_TO_BITS(size),
+						   next_bit + 1,
+						   ENTRIES_TO_BITS(nr),
+						   align_mask);
+		index = BITS_DIV_ENTRIES(bit_index);
 	}
 
-	return start_bit;
+	return BITS_DIV_ENTRIES(start_bit);
 }
-EXPORT_SYMBOL(gen_pool_best_fit);
 
 static void devm_gen_pool_release(struct device *dev, void *res)
 {
@@ -668,11 +954,14 @@ static int devm_gen_pool_match(struct device *dev, void *res, void *data)
 }
 
 /**
- * gen_pool_get - Obtain the gen_pool (if any) for a device
+ * gen_pool_get() - Obtain the gen_pool (if any) for a device
  * @dev: device to retrieve the gen_pool from
- * @name: name of a gen_pool or NULL, identifies a particular gen_pool on device
+ * @name: name of a gen_pool or NULL, identifies a particular gen_pool
+ *	  on device
  *
- * Returns the gen_pool for the device if one is present, or NULL.
+ * Return:
+ * * the gen_pool for the device	- if it exists
+ * * NULL				- no pool exists for the device
  */
 struct gen_pool *gen_pool_get(struct device *dev, const char *name)
 {
@@ -687,7 +976,7 @@ struct gen_pool *gen_pool_get(struct device *dev, const char *name)
 EXPORT_SYMBOL_GPL(gen_pool_get);
 
 /**
- * devm_gen_pool_create - managed gen_pool_create
+ * devm_gen_pool_create() - managed gen_pool_create
  * @dev: device that provides the gen_pool
  * @min_alloc_order: log base 2 of number of bytes each bitmap bit represents
  * @nid: node selector for allocated gen_pool, %NUMA_NO_NODE for all nodes
@@ -696,6 +985,10 @@ EXPORT_SYMBOL_GPL(gen_pool_get);
  * Create a new special memory pool that can be used to manage special purpose
  * memory not managed by the regular kmalloc/kfree interface. The pool will be
  * automatically destroyed by the device management code.
+ *
+ * Return:
+ * * address of the pool	- success
+ * * NULL			- error
  */
 struct gen_pool *devm_gen_pool_create(struct device *dev, int min_alloc_order,
 				      int nid, const char *name)
@@ -738,14 +1031,16 @@ EXPORT_SYMBOL(devm_gen_pool_create);
 
 #ifdef CONFIG_OF
 /**
- * of_gen_pool_get - find a pool by phandle property
+ * of_gen_pool_get() - find a pool by phandle property
  * @np: device node
  * @propname: property name containing phandle(s)
  * @index: index into the phandle array
  *
- * Returns the pool that contains the chunk starting at the physical
- * address of the device tree node pointed at by the phandle property,
- * or NULL if not found.
+ * Return:
+ * * pool address	- it contains the chunk starting at the physical
+ *			  address of the device tree node pointed at by
+ *			  the phandle property
+ * * NULL		- otherwise
  */
 struct gen_pool *of_gen_pool_get(struct device_node *np,
 	const char *propname, int index)
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 2/7] genalloc: selftest
  2018-02-23 14:48 [RFC PATCH v17 0/7] mm: security: ro protection for dynamic data Igor Stoppa
  2018-02-23 14:48 ` [PATCH 1/7] genalloc: track beginning of allocations Igor Stoppa
@ 2018-02-23 14:48 ` Igor Stoppa
  2018-02-23 22:42   ` J Freyensee
  2018-02-23 14:48 ` [PATCH 3/7] struct page: add field for vm_struct Igor Stoppa
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 32+ messages in thread
From: Igor Stoppa @ 2018-02-23 14:48 UTC (permalink / raw)
  To: david, willy, keescook, mhocko
  Cc: labbott, linux-security-module, linux-mm, linux-kernel,
	kernel-hardening, Igor Stoppa

Introduce a set of macros for writing concise test cases for genalloc.

The test cases are meant to provide regression testing, when working on
new functionality for genalloc.

Primarily they are meant to confirm that the various allocation strategy
will continue to work as expected.

The execution of the self testing is controlled through a Kconfig option.

The testing takes place in the very early stages of main.c, to ensure
that failures in genalloc are caught before they can cause unexplained
erratic behavior in any of genalloc users.

Therefore, it would not be advisable to implement it as module.

Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com>
---
 include/linux/test_genalloc.h |  26 +++
 init/main.c                   |   2 +
 lib/Kconfig                   |  15 ++
 lib/Makefile                  |   1 +
 lib/test_genalloc.c           | 410 ++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 454 insertions(+)
 create mode 100644 include/linux/test_genalloc.h
 create mode 100644 lib/test_genalloc.c

diff --git a/include/linux/test_genalloc.h b/include/linux/test_genalloc.h
new file mode 100644
index 000000000000..cc45c6c859cf
--- /dev/null
+++ b/include/linux/test_genalloc.h
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * test_genalloc.h
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.stoppa@huawei.com>
+ */
+
+
+#ifndef __LINUX_TEST_GENALLOC_H
+#define __LINUX_TEST_GENALLOC_H
+
+
+#ifdef CONFIG_TEST_GENERIC_ALLOCATOR
+
+#include <linux/genalloc.h>
+
+void test_genalloc(void);
+
+#else
+
+static inline void test_genalloc(void){};
+
+#endif
+
+#endif
diff --git a/init/main.c b/init/main.c
index a8100b954839..bfccf1fd463c 100644
--- a/init/main.c
+++ b/init/main.c
@@ -89,6 +89,7 @@
 #include <linux/io.h>
 #include <linux/cache.h>
 #include <linux/rodata_test.h>
+#include <linux/test_genalloc.h>
 
 #include <asm/io.h>
 #include <asm/bugs.h>
@@ -660,6 +661,7 @@ asmlinkage __visible void __init start_kernel(void)
 	 */
 	mem_encrypt_init();
 
+	test_genalloc();
 #ifdef CONFIG_BLK_DEV_INITRD
 	if (initrd_start && !initrd_below_start_ok &&
 	    page_to_pfn(virt_to_page((void *)initrd_start)) < min_low_pfn) {
diff --git a/lib/Kconfig b/lib/Kconfig
index e96089499371..361514324d64 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -287,6 +287,21 @@ config DECOMPRESS_LZ4
 config GENERIC_ALLOCATOR
 	bool
 
+config TEST_GENERIC_ALLOCATOR
+	bool "genalloc tester"
+	default n
+	select GENERIC_ALLOCATOR
+	help
+	  Enable automated testing of the generic allocator.
+	  The testing is primarily for the tracking of allocated space.
+
+config TEST_GENERIC_ALLOCATOR_VERBOSE
+	bool "make the genalloc tester more verbose"
+	default n
+	select TEST_GENERIC_ALLOCATOR
+	help
+	  More information will be displayed during the self-testing.
+
 #
 # reed solomon support is select'ed if needed
 #
diff --git a/lib/Makefile b/lib/Makefile
index a90d4fcd748f..5b5ee8d8f6d6 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -108,6 +108,7 @@ obj-$(CONFIG_LIBCRC32C)	+= libcrc32c.o
 obj-$(CONFIG_CRC8)	+= crc8.o
 obj-$(CONFIG_XXHASH)	+= xxhash.o
 obj-$(CONFIG_GENERIC_ALLOCATOR) += genalloc.o
+obj-$(CONFIG_TEST_GENERIC_ALLOCATOR) += test_genalloc.o
 
 obj-$(CONFIG_842_COMPRESS) += 842/
 obj-$(CONFIG_842_DECOMPRESS) += 842/
diff --git a/lib/test_genalloc.c b/lib/test_genalloc.c
new file mode 100644
index 000000000000..12a61c9e7558
--- /dev/null
+++ b/lib/test_genalloc.c
@@ -0,0 +1,410 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * test_genalloc.c
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.stoppa@huawei.com>
+ */
+
+#include <linux/module.h>
+#include <linux/printk.h>
+#include <linux/init.h>
+#include <linux/vmalloc.h>
+#include <linux/string.h>
+#include <linux/debugfs.h>
+#include <linux/atomic.h>
+#include <linux/genalloc.h>
+
+#include <linux/test_genalloc.h>
+
+
+/*
+ * In case of failure of any of these tests, memory corruption is almost
+ * guarranteed; allowing the boot to continue means risking to corrupt
+ * also any filesystem/block device accessed write mode.
+ * Therefore, BUG_ON() is used, when testing.
+ */
+
+
+/*
+ * Keep the bitmap small, while including case of cross-ulong mapping.
+ * For simplicity, the test cases use only 1 chunk of memory.
+ */
+#define BITMAP_SIZE_C 16
+#define ALLOC_ORDER 0
+
+#define ULONG_SIZE (sizeof(unsigned long))
+#define BITMAP_SIZE_UL (BITMAP_SIZE_C / ULONG_SIZE)
+#define MIN_ALLOC_SIZE (1 << ALLOC_ORDER)
+#define ENTRIES (BITMAP_SIZE_C * 8)
+#define CHUNK_SIZE  (MIN_ALLOC_SIZE * ENTRIES)
+
+#ifndef CONFIG_TEST_GENERIC_ALLOCATOR_VERBOSE
+
+static inline void print_first_chunk_bitmap(struct gen_pool *pool) {}
+
+#else
+
+static void print_first_chunk_bitmap(struct gen_pool *pool)
+{
+	struct gen_pool_chunk *chunk;
+	char bitmap[BITMAP_SIZE_C * 2 + 1];
+	unsigned long i;
+	char *bm = bitmap;
+	char *entry;
+
+	if (unlikely(pool == NULL || pool->chunks.next == NULL))
+		return;
+
+	chunk = container_of(pool->chunks.next, struct gen_pool_chunk,
+			     next_chunk);
+	entry = (void *)chunk->entries;
+	for (i = 1; i <= BITMAP_SIZE_C; i++)
+		bm += snprintf(bm, 3, "%02hhx", entry[BITMAP_SIZE_C - i]);
+	*bm = '\0';
+	pr_notice("chunk: %p    bitmap: 0x%s\n", chunk, bitmap);
+
+}
+
+#endif
+
+enum test_commands {
+	CMD_ALLOCATOR,
+	CMD_ALLOCATE,
+	CMD_FLUSH,
+	CMD_FREE,
+	CMD_NUMBER,
+	CMD_END = CMD_NUMBER,
+};
+
+struct null_struct {
+	void *null;
+};
+
+struct test_allocator {
+	genpool_algo_t algo;
+	union {
+		struct genpool_data_align align;
+		struct genpool_data_fixed offset;
+		struct null_struct null;
+	} data;
+};
+
+struct test_action {
+	unsigned int location;
+	char pattern[BITMAP_SIZE_C];
+	unsigned int size;
+};
+
+
+struct test_command {
+	enum test_commands command;
+	union {
+		struct test_allocator allocator;
+		struct test_action action;
+	};
+};
+
+
+/*
+ * To pass an array literal as parameter to a macro, it must go through
+ * this one, first.
+ */
+#define ARR(...) __VA_ARGS__
+
+#define SET_DATA(parameter, value)	\
+	.parameter = {			\
+		.parameter = value,	\
+	}				\
+
+#define SET_ALLOCATOR(alloc, parameter, value)		\
+{							\
+	.command = CMD_ALLOCATOR,			\
+	.allocator = {					\
+		.algo = (alloc),			\
+		.data = {				\
+			SET_DATA(parameter, value),	\
+		},					\
+	}						\
+}
+
+#define ACTION_MEM(act, mem_size, mem_loc, match)	\
+{							\
+	.command = act,					\
+	.action = {					\
+		.size = (mem_size),			\
+		.location = (mem_loc),			\
+		.pattern = match,			\
+	},						\
+}
+
+#define ALLOCATE_MEM(mem_size, mem_loc, match)	\
+	ACTION_MEM(CMD_ALLOCATE, mem_size, mem_loc, ARR(match))
+
+#define FREE_MEM(mem_size, mem_loc, match)	\
+	ACTION_MEM(CMD_FREE, mem_size, mem_loc, ARR(match))
+
+#define FLUSH_MEM()		\
+{				\
+	.command = CMD_FLUSH,	\
+}
+
+#define END()			\
+{				\
+	.command = CMD_END,	\
+}
+
+static inline int compare_bitmaps(const struct gen_pool *pool,
+				   const char *reference)
+{
+	struct gen_pool_chunk *chunk;
+	char *bitmap;
+	unsigned int i;
+
+	chunk = container_of(pool->chunks.next, struct gen_pool_chunk,
+			     next_chunk);
+	bitmap = (char *)chunk->entries;
+
+	for (i = 0; i < BITMAP_SIZE_C; i++)
+		if (bitmap[i] != reference[i])
+			return -1;
+	return 0;
+}
+
+static void callback_set_allocator(struct gen_pool *pool,
+				   const struct test_command *cmd,
+				   unsigned long *locations)
+{
+	gen_pool_set_algo(pool, cmd->allocator.algo,
+			  (void *)&cmd->allocator.data);
+}
+
+static void callback_allocate(struct gen_pool *pool,
+			      const struct test_command *cmd,
+			      unsigned long *locations)
+{
+	const struct test_action *action = &cmd->action;
+
+	locations[action->location] = gen_pool_alloc(pool, action->size);
+	BUG_ON(!locations[action->location]);
+	print_first_chunk_bitmap(pool);
+	BUG_ON(compare_bitmaps(pool, action->pattern));
+}
+
+static void callback_flush(struct gen_pool *pool,
+			  const struct test_command *cmd,
+			  unsigned long *locations)
+{
+	unsigned int i;
+
+	for (i = 0; i < ENTRIES; i++)
+		if (locations[i]) {
+			gen_pool_free(pool, locations[i], 0);
+			locations[i] = 0;
+		}
+}
+
+static void callback_free(struct gen_pool *pool,
+			  const struct test_command *cmd,
+			  unsigned long *locations)
+{
+	const struct test_action *action = &cmd->action;
+
+	gen_pool_free(pool, locations[action->location], 0);
+	locations[action->location] = 0;
+	print_first_chunk_bitmap(pool);
+	BUG_ON(compare_bitmaps(pool, action->pattern));
+}
+
+static void (* const callbacks[CMD_NUMBER])(struct gen_pool *,
+					    const struct test_command *,
+					    unsigned long *) = {
+	[CMD_ALLOCATOR] = callback_set_allocator,
+	[CMD_ALLOCATE] = callback_allocate,
+	[CMD_FREE] = callback_free,
+	[CMD_FLUSH] = callback_flush,
+};
+
+static const struct test_command test_first_fit[] = {
+	SET_ALLOCATOR(gen_pool_first_fit, null, NULL),
+	ALLOCATE_MEM(3, 0, ARR({0x2b})),
+	ALLOCATE_MEM(2, 1, ARR({0xeb, 0x02})),
+	ALLOCATE_MEM(5, 2, ARR({0xeb, 0xae, 0x0a})),
+	FREE_MEM(2, 1,  ARR({0x2b, 0xac, 0x0a})),
+	ALLOCATE_MEM(1, 1, ARR({0xeb, 0xac, 0x0a})),
+	FREE_MEM(0, 2,  ARR({0xeb})),
+	FREE_MEM(0, 0,  ARR({0xc0})),
+	FREE_MEM(0, 1,	ARR({0x00})),
+	END(),
+};
+
+/*
+ * To make the test work for both 32bit and 64bit ulong sizes,
+ * allocate (8 / 2 * 4 - 1) = 15 bytes bytes, then 16, then 2.
+ * The first allocation prepares for the crossing of the 32bit ulong
+ * threshold. The following crosses the 32bit threshold and prepares for
+ * crossing the 64bit thresholds. The last is large enough (2 bytes) to
+ * cross the 64bit threshold.
+ * Then free the allocations in the order: 2nd, 1st, 3rd.
+ */
+static const struct test_command test_ulong_span[] = {
+	SET_ALLOCATOR(gen_pool_first_fit, null, NULL),
+	ALLOCATE_MEM(15, 0, ARR({0xab, 0xaa, 0xaa, 0x2a})),
+	ALLOCATE_MEM(16, 1, ARR({0xab, 0xaa, 0xaa, 0xea,
+				0xaa, 0xaa, 0xaa, 0x2a})),
+	ALLOCATE_MEM(2, 2, ARR({0xab, 0xaa, 0xaa, 0xea,
+			       0xaa, 0xaa, 0xaa, 0xea,
+			       0x02})),
+	FREE_MEM(0, 1, ARR({0xab, 0xaa, 0xaa, 0x2a,
+			   0x00, 0x00, 0x00, 0xc0,
+			   0x02})),
+	FREE_MEM(0, 0, ARR({0x00, 0x00, 0x00, 0x00,
+			   0x00, 0x00, 0x00, 0xc0,
+			   0x02})),
+	FREE_MEM(0, 2, ARR({0x00})),
+	END(),
+};
+
+/*
+ * Create progressively smaller allocations A B C D E.
+ * then free B and D.
+ * Then create new allocation that would fit in both of the gaps left by
+ * B and D. Verify that it uses the gap from B.
+ */
+static const struct test_command test_first_fit_gaps[] = {
+	SET_ALLOCATOR(gen_pool_first_fit, null, NULL),
+	ALLOCATE_MEM(10, 0, ARR({0xab, 0xaa, 0x0a})),
+	ALLOCATE_MEM(8, 1, ARR({0xab, 0xaa, 0xba, 0xaa,
+			       0x0a})),
+	ALLOCATE_MEM(6, 2, ARR({0xab, 0xaa, 0xba, 0xaa,
+			       0xba, 0xaa})),
+	ALLOCATE_MEM(4, 3, ARR({0xab, 0xaa, 0xba, 0xaa,
+			       0xba, 0xaa, 0xab})),
+	ALLOCATE_MEM(2, 4, ARR({0xab, 0xaa, 0xba, 0xaa,
+			       0xba, 0xaa, 0xab, 0x0b})),
+	FREE_MEM(0, 1, ARR({0xab, 0xaa, 0x0a, 0x00,
+			   0xb0, 0xaa, 0xab, 0x0b})),
+	FREE_MEM(0, 3, ARR({0xab, 0xaa, 0x0a, 0x00,
+			   0xb0, 0xaa, 0x00, 0x0b})),
+	ALLOCATE_MEM(3, 3, ARR({0xab, 0xaa, 0xba, 0x02,
+			       0xb0, 0xaa, 0x00, 0x0b})),
+	FLUSH_MEM(),
+	END(),
+};
+
+/* Test first fit align */
+static const struct test_command test_first_fit_align[] = {
+	SET_ALLOCATOR(gen_pool_first_fit_align, align, 4),
+	ALLOCATE_MEM(5, 0, ARR({0xab, 0x02})),
+	ALLOCATE_MEM(3, 1, ARR({0xab, 0x02, 0x2b})),
+	ALLOCATE_MEM(2, 2, ARR({0xab, 0x02, 0x2b, 0x0b})),
+	ALLOCATE_MEM(1, 3, ARR({0xab, 0x02, 0x2b, 0x0b, 0x03})),
+	FREE_MEM(0, 0, ARR({0x00, 0x00, 0x2b, 0x0b, 0x03})),
+	FREE_MEM(0, 2, ARR({0x00, 0x00, 0x2b, 0x00, 0x03})),
+	ALLOCATE_MEM(2, 0, ARR({0x0b, 0x00, 0x2b, 0x00, 0x03})),
+	FLUSH_MEM(),
+	END(),
+};
+
+
+/* Test fixed alloc */
+static const struct test_command test_fixed_data[] = {
+	SET_ALLOCATOR(gen_pool_fixed_alloc, offset, 1),
+	ALLOCATE_MEM(5, 0, ARR({0xac, 0x0a})),
+	SET_ALLOCATOR(gen_pool_fixed_alloc, offset, 8),
+	ALLOCATE_MEM(3, 1, ARR({0xac, 0x0a, 0x2b})),
+	SET_ALLOCATOR(gen_pool_fixed_alloc, offset, 6),
+	ALLOCATE_MEM(2, 2, ARR({0xac, 0xba, 0x2b})),
+	SET_ALLOCATOR(gen_pool_fixed_alloc, offset, 30),
+	ALLOCATE_MEM(40, 3, ARR({0xac, 0xba, 0x2b, 0x00,
+				0x00, 0x00, 0x00, 0xb0,
+				0xaa, 0xaa, 0xaa, 0xaa,
+				0xaa, 0xaa, 0xaa, 0xaa})),
+	FLUSH_MEM(),
+	END(),
+};
+
+
+/* Test first fit order align */
+static const struct test_command test_first_fit_order_align[] = {
+	SET_ALLOCATOR(gen_pool_first_fit_order_align, null, NULL),
+	ALLOCATE_MEM(5, 0, ARR({0xab, 0x02})),
+	ALLOCATE_MEM(3, 1, ARR({0xab, 0x02, 0x2b})),
+	ALLOCATE_MEM(2, 2, ARR({0xab, 0xb2, 0x2b})),
+	ALLOCATE_MEM(1, 3, ARR({0xab, 0xbe, 0x2b})),
+	ALLOCATE_MEM(1, 4, ARR({0xab, 0xbe, 0xeb})),
+	ALLOCATE_MEM(2, 5, ARR({0xab, 0xbe, 0xeb, 0x0b})),
+	FLUSH_MEM(),
+	END(),
+};
+
+
+/* 007 Test best fit */
+static const struct test_command test_best_fit[] = {
+	SET_ALLOCATOR(gen_pool_best_fit, null, NULL),
+	ALLOCATE_MEM(5, 0, ARR({0xab, 0x02})),
+	ALLOCATE_MEM(3, 1, ARR({0xab, 0xae})),
+	ALLOCATE_MEM(3, 2, ARR({0xab, 0xae, 0x2b})),
+	ALLOCATE_MEM(1, 3, ARR({0xab, 0xae, 0xeb})),
+	FREE_MEM(0, 0, ARR({0x00, 0xac, 0xeb})),
+	FREE_MEM(0, 2, ARR({0x00, 0xac, 0xc0})),
+	ALLOCATE_MEM(2, 0, ARR({0x00, 0xac, 0xcb})),
+	FLUSH_MEM(),
+	END(),
+};
+
+
+enum test_cases_indexes {
+	TEST_CASE_FIRST_FIT,
+	TEST_CASE_ULONG_SPAN,
+	TEST_CASE_FIRST_FIT_GAPS,
+	TEST_CASE_FIRST_FIT_ALIGN,
+	TEST_CASE_FIXED_DATA,
+	TEST_CASE_FIRST_FIT_ORDER_ALIGN,
+	TEST_CASE_BEST_FIT,
+	TEST_CASES_NUM,
+};
+
+static const struct test_command *test_cases[TEST_CASES_NUM] = {
+	[TEST_CASE_FIRST_FIT] = test_first_fit,
+	[TEST_CASE_ULONG_SPAN] = test_ulong_span,
+	[TEST_CASE_FIRST_FIT_GAPS] = test_first_fit_gaps,
+	[TEST_CASE_FIRST_FIT_ALIGN] = test_first_fit_align,
+	[TEST_CASE_FIXED_DATA] = test_fixed_data,
+	[TEST_CASE_FIRST_FIT_ORDER_ALIGN] = test_first_fit_order_align,
+	[TEST_CASE_BEST_FIT] = test_best_fit,
+};
+
+
+void test_genalloc(void)
+{
+	static struct gen_pool *pool;
+	unsigned long locations[ENTRIES];
+	char chunk[CHUNK_SIZE];
+	int retval;
+	unsigned int i;
+	const struct test_command *cmd;
+
+	pool = gen_pool_create(ALLOC_ORDER, -1);
+	if (unlikely(!pool)) {
+		pr_err("genalloc-selftest: no memory for pool.");
+		return;
+	}
+
+	retval = gen_pool_add_virt(pool, (unsigned long)chunk, 0,
+				   CHUNK_SIZE, -1);
+	if (unlikely(retval)) {
+		pr_err("genalloc-selftest: could not register chunk.");
+		goto destroy_pool;
+	}
+
+	memset(locations, 0, ENTRIES * sizeof(unsigned long));
+	for (i = 0; i < TEST_CASES_NUM; i++)
+		for (cmd = test_cases[i]; cmd->command < CMD_END; cmd++)
+			callbacks[cmd->command](pool, cmd, locations);
+	pr_notice("genalloc-selftest: executed successfully %d tests",
+		  TEST_CASES_NUM);
+
+destroy_pool:
+	gen_pool_destroy(pool);
+}
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 3/7] struct page: add field for vm_struct
  2018-02-23 14:48 [RFC PATCH v17 0/7] mm: security: ro protection for dynamic data Igor Stoppa
  2018-02-23 14:48 ` [PATCH 1/7] genalloc: track beginning of allocations Igor Stoppa
  2018-02-23 14:48 ` [PATCH 2/7] genalloc: selftest Igor Stoppa
@ 2018-02-23 14:48 ` Igor Stoppa
  2018-02-25  3:38   ` Matthew Wilcox
  2018-02-23 14:48 ` [PATCH 4/7] Protectable Memory Igor Stoppa
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 32+ messages in thread
From: Igor Stoppa @ 2018-02-23 14:48 UTC (permalink / raw)
  To: david, willy, keescook, mhocko
  Cc: labbott, linux-security-module, linux-mm, linux-kernel,
	kernel-hardening, Igor Stoppa

When a page is used for virtual memory, it is often necessary to obtain
a handler to the corresponding vm_struct, which refers to the virtually
continuous area generated when invoking vmalloc.

The struct page has a "mapping" field, which can be re-used, to store a
pointer to the parent area.

This will avoid more expensive searches, later on.

Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com>
---
 include/linux/mm_types.h | 1 +
 mm/vmalloc.c             | 5 +++++
 2 files changed, 6 insertions(+)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index fd1af6b9591d..c3a4825e10c0 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -84,6 +84,7 @@ struct page {
 		void *s_mem;			/* slab first object */
 		atomic_t compound_mapcount;	/* first tail page */
 		/* page_deferred_list().next	 -- second tail page */
+		struct vm_struct *area;
 	};
 
 	/* Second double word */
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 673942094328..14d99ed22397 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1536,6 +1536,7 @@ static void __vunmap(const void *addr, int deallocate_pages)
 			struct page *page = area->pages[i];
 
 			BUG_ON(!page);
+			page->area = NULL;
 			__free_pages(page, 0);
 		}
 
@@ -1744,6 +1745,7 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
 			const void *caller)
 {
 	struct vm_struct *area;
+	unsigned int i;
 	void *addr;
 	unsigned long real_size = size;
 
@@ -1769,6 +1771,9 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
 
 	kmemleak_vmalloc(area, size, gfp_mask);
 
+	for (i = 0; i < area->nr_pages; i++)
+		area->pages[i]->area = area;
+
 	return addr;
 
 fail:
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 4/7] Protectable Memory
  2018-02-23 14:48 [RFC PATCH v17 0/7] mm: security: ro protection for dynamic data Igor Stoppa
                   ` (2 preceding siblings ...)
  2018-02-23 14:48 ` [PATCH 3/7] struct page: add field for vm_struct Igor Stoppa
@ 2018-02-23 14:48 ` Igor Stoppa
  2018-02-24  0:10   ` J Freyensee
  2018-02-25  2:33   ` kbuild test robot
  2018-02-23 14:48 ` [PATCH 5/7] Pmalloc selftest Igor Stoppa
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 32+ messages in thread
From: Igor Stoppa @ 2018-02-23 14:48 UTC (permalink / raw)
  To: david, willy, keescook, mhocko
  Cc: labbott, linux-security-module, linux-mm, linux-kernel,
	kernel-hardening, Igor Stoppa

The MMU available in many systems running Linux can often provide R/O
protection to the memory pages it handles.

However, the MMU-based protection works efficiently only when said pages
contain exclusively data that will not need further modifications.

Statically allocated variables can be segregated into a dedicated
section, but this does not sit very well with dynamically allocated
ones.

Dynamic allocation does not provide, currently, any means for grouping
variables in memory pages that would contain exclusively data suitable
for conversion to read only access mode.

The allocator here provided (pmalloc - protectable memory allocator)
introduces the concept of pools of protectable memory.

A module can request a pool and then refer any allocation request to the
pool handler it has received.

Once all the chunks of memory associated to a specific pool are
initialized, the pool can be protected.

After this point, the pool can only be destroyed (it is up to the module
to avoid any further references to the memory from the pool, after
the destruction is invoked).

The latter case is mainly meant for releasing memory, when a module is
unloaded.

A module can have as many pools as needed, for example to support the
protection of data that is initialized in sufficiently distinct phases.

Since pmalloc memory is obtained from vmalloc, an attacker that has
gained access to the physical mapping, still has to identify where the
target of the attack is actually located.

At the same time, being also based on genalloc, pmalloc does not
generate as much trashing of the TLB as it would be caused by using
directly only vmalloc.

Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com>
---
 include/linux/genalloc.h |   3 +
 include/linux/pmalloc.h  | 242 +++++++++++++++++++++++
 include/linux/vmalloc.h  |   1 +
 lib/genalloc.c           |  27 +++
 mm/Kconfig               |   6 +
 mm/Makefile              |   1 +
 mm/pmalloc.c             | 499 +++++++++++++++++++++++++++++++++++++++++++++++
 mm/usercopy.c            |  33 ++++
 8 files changed, 812 insertions(+)
 create mode 100644 include/linux/pmalloc.h
 create mode 100644 mm/pmalloc.c

diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
index dcaa33e74b1c..b6c4cea9fbd8 100644
--- a/include/linux/genalloc.h
+++ b/include/linux/genalloc.h
@@ -121,6 +121,9 @@ extern unsigned long gen_pool_alloc_algo(struct gen_pool *, size_t,
 extern void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size,
 		dma_addr_t *dma);
 extern void gen_pool_free(struct gen_pool *, unsigned long, size_t);
+
+extern void gen_pool_flush_chunk(struct gen_pool *pool,
+				 struct gen_pool_chunk *chunk);
 extern void gen_pool_for_each_chunk(struct gen_pool *,
 	void (*)(struct gen_pool *, struct gen_pool_chunk *, void *), void *);
 extern size_t gen_pool_avail(struct gen_pool *);
diff --git a/include/linux/pmalloc.h b/include/linux/pmalloc.h
new file mode 100644
index 000000000000..afc2068d5545
--- /dev/null
+++ b/include/linux/pmalloc.h
@@ -0,0 +1,242 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * pmalloc.h: Header for Protectable Memory Allocator
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.stoppa@huawei.com>
+ */
+
+#ifndef _LINUX_PMALLOC_H
+#define _LINUX_PMALLOC_H
+
+
+#include <linux/genalloc.h>
+#include <linux/string.h>
+
+#define PMALLOC_DEFAULT_ALLOC_ORDER (-1)
+
+/*
+ * Library for dynamic allocation of pools of memory that can be,
+ * after initialization, marked as read-only.
+ *
+ * This is intended to complement __read_only_after_init, for those cases
+ * where either it is not possible to know the initialization value before
+ * init is completed, or the amount of data is variable and can be
+ * determined only at run-time.
+ *
+ * ***WARNING***
+ * The user of the API is expected to synchronize:
+ * 1) allocation,
+ * 2) writes to the allocated memory,
+ * 3) write protection of the pool,
+ * 4) freeing of the allocated memory, and
+ * 5) destruction of the pool.
+ *
+ * For a non-threaded scenario, this type of locking is not even required.
+ *
+ * Even if the library were to provide support for locking, point 2)
+ * would still depend on the user taking the lock.
+ */
+
+
+/**
+ * pmalloc_create_pool() - create a new protectable memory pool
+ * @name: the name of the pool, enforced to be unique
+ * @min_alloc_order: log2 of the minimum allocation size obtainable
+ *                   from the pool
+ *
+ * Creates a new (empty) memory pool for allocation of protectable
+ * memory. Memory will be allocated upon request (through pmalloc).
+ *
+ * Return:
+ * * pointer to the new pool	- success
+ * * NULL			- error
+ */
+struct gen_pool *pmalloc_create_pool(const char *name,
+					 int min_alloc_order);
+
+/**
+ * is_pmalloc_object() - validates the existence of an alleged object
+ * @ptr: address of the object
+ * @n: size of the object, in bytes
+ *
+ * Return:
+ * * 0		- the object does not belong to pmalloc
+ * * 1		- the object belongs to pmalloc
+ * * \-1	- the object overlaps pmalloc memory incorrectly
+ */
+int is_pmalloc_object(const void *ptr, const unsigned long n);
+
+/**
+ * pmalloc_prealloc() - tries to allocate a memory chunk of the requested size
+ * @pool: handle to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ *
+ * Prepares a chunk of the requested size.
+ * This is intended to both minimize latency in later memory requests and
+ * avoid sleeping during allocation.
+ * Memory allocated with prealloc is stored in one single chunk, as
+ * opposed to what is allocated on-demand when pmalloc runs out of free
+ * space already existing in the pool and has to invoke vmalloc.
+ * One additional advantage of pre-allocating larger chunks of memory is
+ * that the total slack tends to be smaller.
+ *
+ * Return:
+ * * true	- the vmalloc call was successful
+ * * false	- error
+ */
+bool pmalloc_prealloc(struct gen_pool *pool, size_t size);
+
+/**
+ * pmalloc() - allocate protectable memory from a pool
+ * @pool: handle to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ * @gfp: flags for page allocation
+ *
+ * Allocates memory from an unprotected pool. If the pool doesn't have
+ * enough memory, and the request did not include GFP_ATOMIC, an attempt
+ * is made to add a new chunk of memory to the pool
+ * (a multiple of PAGE_SIZE), in order to fit the new request.
+ * Otherwise, NULL is returned.
+ *
+ * Return:
+ * * pointer to the memory requested	- success
+ * * NULL				- either no memory available or
+ *					  pool already read-only
+ */
+void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp);
+
+
+/**
+ * pzalloc() - zero-initialized version of pmalloc
+ * @pool: handle to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ * @gfp: flags for page allocation
+ *
+ * Executes pmalloc, initializing the memory requested to 0,
+ * before returning the pointer to it.
+ *
+ * Return:
+ * * pointer to the memory requested	- success
+ * * NULL				- either no memory available or
+ *					  pool already read-only
+ */
+static inline void *pzalloc(struct gen_pool *pool, size_t size, gfp_t gfp)
+{
+	return pmalloc(pool, size, gfp | __GFP_ZERO);
+}
+
+/**
+ * pmalloc_array() - allocates an array according to the parameters
+ * @pool: handle to the pool to be used for memory allocation
+ * @n: number of elements in the array
+ * @size: amount of memory (in bytes) requested for each element
+ * @flags: flags for page allocation
+ *
+ * Executes pmalloc, if it has a chance to succeed.
+ *
+ * Return:
+ * * the pmalloc result	- success
+ * * NULL		- error
+ */
+static inline void *pmalloc_array(struct gen_pool *pool, size_t n,
+				  size_t size, gfp_t flags)
+{
+	if (unlikely(!(pool && n && size)))
+		return NULL;
+	return pmalloc(pool, n * size, flags);
+}
+
+/**
+ * pcalloc() - allocates a 0-initialized array according to the parameters
+ * @pool: handle to the pool to be used for memory allocation
+ * @n: number of elements in the array
+ * @size: amount of memory (in bytes) requested
+ * @flags: flags for page allocation
+ *
+ * Executes pmalloc_array, if it has a chance to succeed.
+ *
+ * Return:
+ * * the pmalloc result	- success
+ * * NULL		- error
+ */
+static inline void *pcalloc(struct gen_pool *pool, size_t n,
+			    size_t size, gfp_t flags)
+{
+	return pmalloc_array(pool, n, size, flags | __GFP_ZERO);
+}
+
+/**
+ * pstrdup() - duplicate a string, using pmalloc as allocator
+ * @pool: handle to the pool to be used for memory allocation
+ * @s: string to duplicate
+ * @gfp: flags for page allocation
+ *
+ * Generates a copy of the given string, allocating sufficient memory
+ * from the given pmalloc pool.
+ *
+ * Return:
+ * * pointer to the replica	- success
+ * * NULL			- error
+ */
+static inline char *pstrdup(struct gen_pool *pool, const char *s, gfp_t gfp)
+{
+	size_t len;
+	char *buf;
+
+	if (unlikely(pool == NULL || s == NULL))
+		return NULL;
+
+	len = strlen(s) + 1;
+	buf = pmalloc(pool, len, gfp);
+	if (likely(buf))
+		strncpy(buf, s, len);
+	return buf;
+}
+
+/**
+ * pmalloc_protect_pool() - turn a read/write pool read-only
+ * @pool: the pool to protect
+ *
+ * Write-protects all the memory chunks assigned to the pool.
+ * This prevents any further allocation.
+ *
+ * Return:
+ * * 0		- success
+ * * -EINVAL	- error
+ */
+int pmalloc_protect_pool(struct gen_pool *pool);
+
+/**
+ * pfree() - mark as unused memory that was previously in use
+ * @pool: handle to the pool to be used for memory allocation
+ * @addr: the beginning of the memory area to be freed
+ *
+ * The behavior of pfree is different, depending on the state of the
+ * protection.
+ * If the pool is not yet protected, the memory is marked as unused and
+ * will be available for further allocations.
+ * If the pool is already protected, the memory is marked as unused, but
+ * it will still be impossible to perform further allocation, because of
+ * the existing protection.
+ * The freed memory, in this case, will be truly released only when the
+ * pool is destroyed.
+ */
+static inline void pfree(struct gen_pool *pool, const void *addr)
+{
+	gen_pool_free(pool, (unsigned long)addr, 0);
+}
+
+/**
+ * pmalloc_destroy_pool() - destroys a pool and all the associated memory
+ * @pool: the pool to destroy
+ *
+ * All the memory that was allocated through pmalloc in the pool will be freed.
+ *
+ * Return:
+ * * 0		- success
+ * * -EINVAL	- error
+ */
+int pmalloc_destroy_pool(struct gen_pool *pool);
+
+#endif
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 1e5d8c392f15..116d280cca53 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -20,6 +20,7 @@ struct notifier_block;		/* in notifier.h */
 #define VM_UNINITIALIZED	0x00000020	/* vm_struct is not fully initialized */
 #define VM_NO_GUARD		0x00000040      /* don't add guard page */
 #define VM_KASAN		0x00000080      /* has allocated kasan shadow memory */
+#define VM_PMALLOC		0x00000100	/* pmalloc area - see docs */
 /* bits [20..32] reserved for arch specific ioremap internals */
 
 /*
diff --git a/lib/genalloc.c b/lib/genalloc.c
index 87f62f31b52f..24ed35035095 100644
--- a/lib/genalloc.c
+++ b/lib/genalloc.c
@@ -625,6 +625,33 @@ void gen_pool_free(struct gen_pool *pool, unsigned long addr, size_t size)
 }
 EXPORT_SYMBOL(gen_pool_free);
 
+
+/**
+ * gen_pool_flush_chunk() - drops all the allocations from a specific chunk
+ * @pool:	the generic memory pool
+ * @chunk:	The chunk to wipe clear.
+ *
+ * This is meant to be called only while destroying a pool. It's up to the
+ * caller to avoid races, but really, at this point the pool should have
+ * already been retired and it should have become unavailable for any other
+ * sort of operation.
+ */
+void gen_pool_flush_chunk(struct gen_pool *pool,
+			  struct gen_pool_chunk *chunk)
+{
+	size_t size;
+
+	if (unlikely(!(pool && chunk)))
+		return;
+
+	size = chunk->end_addr + 1 - chunk->start_addr;
+	memset(chunk->entries, 0,
+	       DIV_ROUND_UP(size >> pool->min_alloc_order * BITS_PER_ENTRY,
+			    BITS_PER_BYTE));
+	atomic_long_set(&chunk->avail, size);
+}
+
+
 /**
  * gen_pool_for_each_chunk() - call func for every chunk of generic memory pool
  * @pool:	the generic memory pool
diff --git a/mm/Kconfig b/mm/Kconfig
index c782e8fb7235..be578fbdce6d 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -760,3 +760,9 @@ config GUP_BENCHMARK
 	  performance of get_user_pages_fast().
 
 	  See tools/testing/selftests/vm/gup_benchmark.c
+
+config PROTECTABLE_MEMORY
+    bool
+    depends on ARCH_HAS_SET_MEMORY
+    select GENERIC_ALLOCATOR
+    default y
diff --git a/mm/Makefile b/mm/Makefile
index e669f02c5a54..959fdbdac118 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -65,6 +65,7 @@ obj-$(CONFIG_SPARSEMEM)	+= sparse.o
 obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o
 obj-$(CONFIG_SLOB) += slob.o
 obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o
+obj-$(CONFIG_PROTECTABLE_MEMORY) += pmalloc.o
 obj-$(CONFIG_KSM) += ksm.o
 obj-$(CONFIG_PAGE_POISONING) += page_poison.o
 obj-$(CONFIG_SLAB) += slab.o
diff --git a/mm/pmalloc.c b/mm/pmalloc.c
new file mode 100644
index 000000000000..abddba90a9f6
--- /dev/null
+++ b/mm/pmalloc.c
@@ -0,0 +1,499 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * pmalloc.c: Protectable Memory Allocator
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.stoppa@huawei.com>
+ */
+
+#include <linux/printk.h>
+#include <linux/init.h>
+#include <linux/mm.h>
+#include <linux/vmalloc.h>
+#include <linux/genalloc.h>
+#include <linux/kernel.h>
+#include <linux/log2.h>
+#include <linux/slab.h>
+#include <linux/device.h>
+#include <linux/atomic.h>
+#include <linux/rculist.h>
+#include <linux/set_memory.h>
+#include <asm/cacheflush.h>
+#include <asm/page.h>
+
+#include <linux/pmalloc.h>
+/*
+ * pmalloc_data contains the data specific to a pmalloc pool,
+ * in a format compatible with the design of gen_alloc.
+ * Some of the fields are used for exposing the corresponding parameter
+ * to userspace, through sysfs.
+ */
+struct pmalloc_data {
+	struct gen_pool *pool;  /* Link back to the associated pool. */
+	bool protected;     /* Status of the pool: RO or RW. */
+	struct kobj_attribute attr_protected; /* Sysfs attribute. */
+	struct kobj_attribute attr_avail;     /* Sysfs attribute. */
+	struct kobj_attribute attr_size;      /* Sysfs attribute. */
+	struct kobj_attribute attr_chunks;    /* Sysfs attribute. */
+	struct kobject *pool_kobject;
+	struct list_head node; /* list of pools */
+};
+
+static LIST_HEAD(pmalloc_final_list);
+static LIST_HEAD(pmalloc_tmp_list);
+static struct list_head *pmalloc_list = &pmalloc_tmp_list;
+static DEFINE_MUTEX(pmalloc_mutex);
+static struct kobject *pmalloc_kobject;
+
+static ssize_t pmalloc_pool_show_protected(struct kobject *dev,
+					   struct kobj_attribute *attr,
+					   char *buf)
+{
+	struct pmalloc_data *data;
+
+	data = container_of(attr, struct pmalloc_data, attr_protected);
+	if (data->protected)
+		return sprintf(buf, "protected\n");
+	else
+		return sprintf(buf, "unprotected\n");
+}
+
+static ssize_t pmalloc_pool_show_avail(struct kobject *dev,
+				       struct kobj_attribute *attr,
+				       char *buf)
+{
+	struct pmalloc_data *data;
+
+	data = container_of(attr, struct pmalloc_data, attr_avail);
+	return sprintf(buf, "%lu\n",
+		       (unsigned long)gen_pool_avail(data->pool));
+}
+
+static ssize_t pmalloc_pool_show_size(struct kobject *dev,
+				      struct kobj_attribute *attr,
+				      char *buf)
+{
+	struct pmalloc_data *data;
+
+	data = container_of(attr, struct pmalloc_data, attr_size);
+	return sprintf(buf, "%lu\n",
+		       (unsigned long)gen_pool_size(data->pool));
+}
+
+static void pool_chunk_number(struct gen_pool *pool,
+			      struct gen_pool_chunk *chunk, void *data)
+{
+	unsigned long *counter = data;
+
+	(*counter)++;
+}
+
+static ssize_t pmalloc_pool_show_chunks(struct kobject *dev,
+					struct kobj_attribute *attr,
+					char *buf)
+{
+	struct pmalloc_data *data;
+	unsigned long chunks_num = 0;
+
+	data = container_of(attr, struct pmalloc_data, attr_chunks);
+	gen_pool_for_each_chunk(data->pool, pool_chunk_number, &chunks_num);
+	return sprintf(buf, "%lu\n", chunks_num);
+}
+
+/* Exposes the pool and its attributes through sysfs. */
+static struct kobject *pmalloc_connect(struct pmalloc_data *data)
+{
+	const struct attribute *attrs[] = {
+		&data->attr_protected.attr,
+		&data->attr_avail.attr,
+		&data->attr_size.attr,
+		&data->attr_chunks.attr,
+		NULL
+	};
+	struct kobject *kobj;
+
+	kobj = kobject_create_and_add(data->pool->name, pmalloc_kobject);
+	if (unlikely(!kobj))
+		return NULL;
+
+	if (unlikely(sysfs_create_files(kobj, attrs) < 0)) {
+		kobject_put(kobj);
+		kobj = NULL;
+	}
+	return kobj;
+}
+
+/* Removes the pool and its attributes from sysfs. */
+static void pmalloc_disconnect(struct pmalloc_data *data,
+			       struct kobject *kobj)
+{
+	const struct attribute *attrs[] = {
+		&data->attr_protected.attr,
+		&data->attr_avail.attr,
+		&data->attr_size.attr,
+		&data->attr_chunks.attr,
+		NULL
+	};
+
+	sysfs_remove_files(kobj, attrs);
+	kobject_put(kobj);
+}
+
+/* Declares an attribute of the pool. */
+#define pmalloc_attr_init(data, attr_name) \
+do { \
+	sysfs_attr_init(&data->attr_##attr_name.attr); \
+	data->attr_##attr_name.attr.name = #attr_name; \
+	data->attr_##attr_name.attr.mode = VERIFY_OCTAL_PERMISSIONS(0400); \
+	data->attr_##attr_name.show = pmalloc_pool_show_##attr_name; \
+} while (0)
+
+struct gen_pool *pmalloc_create_pool(const char *name, int min_alloc_order)
+{
+	struct gen_pool *pool;
+	const char *pool_name;
+	struct pmalloc_data *data;
+
+	if (!name) {
+		WARN_ON(1);
+		return NULL;
+	}
+
+	if (min_alloc_order < 0)
+		min_alloc_order = ilog2(sizeof(unsigned long));
+
+	pool = gen_pool_create(min_alloc_order, NUMA_NO_NODE);
+	if (unlikely(!pool))
+		return NULL;
+
+	mutex_lock(&pmalloc_mutex);
+	list_for_each_entry(data, pmalloc_list, node)
+		if (!strcmp(name, data->pool->name))
+			goto same_name_err;
+
+	pool_name = kstrdup(name, GFP_KERNEL);
+	if (unlikely(!pool_name))
+		goto name_alloc_err;
+
+	data = kzalloc(sizeof(struct pmalloc_data), GFP_KERNEL);
+	if (unlikely(!data))
+		goto data_alloc_err;
+
+	data->protected = false;
+	data->pool = pool;
+	pmalloc_attr_init(data, protected);
+	pmalloc_attr_init(data, avail);
+	pmalloc_attr_init(data, size);
+	pmalloc_attr_init(data, chunks);
+	pool->data = data;
+	pool->name = pool_name;
+
+	list_add(&data->node, pmalloc_list);
+	if (pmalloc_list == &pmalloc_final_list)
+		data->pool_kobject = pmalloc_connect(data);
+	mutex_unlock(&pmalloc_mutex);
+	return pool;
+
+data_alloc_err:
+	kfree(pool_name);
+name_alloc_err:
+same_name_err:
+	mutex_unlock(&pmalloc_mutex);
+	gen_pool_destroy(pool);
+	return NULL;
+}
+
+static inline int check_alloc_params(struct gen_pool *pool, size_t req_size)
+{
+	struct pmalloc_data *data;
+
+	if (unlikely(!req_size || !pool))
+		return -1;
+
+	data = pool->data;
+
+	if (data == NULL)
+		return -1;
+
+	if (unlikely(data->protected)) {
+		WARN_ON(1);
+		return -1;
+	}
+	return 0;
+}
+
+
+static inline bool chunk_tagging(void *chunk, bool tag)
+{
+	struct vm_struct *area;
+	struct page *page;
+
+	if (!is_vmalloc_addr(chunk))
+		return false;
+
+	page = vmalloc_to_page(chunk);
+	if (unlikely(!page))
+		return false;
+
+	area = page->area;
+	if (tag)
+		area->flags |= VM_PMALLOC;
+	else
+		area->flags &= ~VM_PMALLOC;
+	return true;
+}
+
+
+static inline bool tag_chunk(void *chunk)
+{
+	return chunk_tagging(chunk, true);
+}
+
+
+static inline bool untag_chunk(void *chunk)
+{
+	return chunk_tagging(chunk, false);
+}
+
+enum {
+	INVALID_PMALLOC_OBJECT = -1,
+	NOT_PMALLOC_OBJECT = 0,
+	VALID_PMALLOC_OBJECT = 1,
+};
+
+int is_pmalloc_object(const void *ptr, const unsigned long n)
+{
+	struct vm_struct *area;
+	struct page *page;
+	unsigned long area_start;
+	unsigned long area_end;
+	unsigned long object_start;
+	unsigned long object_end;
+
+
+	/*
+	 * is_pmalloc_object gets called pretty late, so chances are high
+	 * that the object is indeed of vmalloc type
+	 */
+	if (unlikely(!is_vmalloc_addr(ptr)))
+		return NOT_PMALLOC_OBJECT;
+
+	page = vmalloc_to_page(ptr);
+	if (unlikely(!page))
+		return NOT_PMALLOC_OBJECT;
+
+	area = page->area;
+
+	if (likely(!(area->flags & VM_PMALLOC)))
+		return NOT_PMALLOC_OBJECT;
+
+	area_start = (unsigned long)area->addr;
+	area_end = area_start + area->nr_pages * PAGE_SIZE - 1;
+	object_start = (unsigned long)ptr;
+	object_end = object_start + n - 1;
+
+	if (likely((area_start <= object_start) &&
+		   (object_end <= area_end)))
+		return VALID_PMALLOC_OBJECT;
+	else
+		return INVALID_PMALLOC_OBJECT;
+}
+
+
+bool pmalloc_prealloc(struct gen_pool *pool, size_t size)
+{
+	void *chunk;
+	size_t chunk_size;
+	bool add_error;
+
+	if (check_alloc_params(pool, size))
+		return false;
+
+	/* Expand pool */
+	chunk_size = roundup(size, PAGE_SIZE);
+	chunk = vmalloc(chunk_size);
+	if (unlikely(chunk == NULL))
+		return false;
+
+	/* Locking is already done inside gen_pool_add */
+	add_error = gen_pool_add(pool, (unsigned long)chunk, chunk_size,
+				 NUMA_NO_NODE);
+	if (unlikely(add_error != 0))
+		goto abort;
+
+	return true;
+abort:
+	vfree_atomic(chunk);
+	return false;
+
+}
+
+void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp)
+{
+	void *chunk;
+	size_t chunk_size;
+	bool add_error;
+	unsigned long retval;
+
+	if (check_alloc_params(pool, size))
+		return NULL;
+
+retry_alloc_from_pool:
+	retval = gen_pool_alloc(pool, size);
+	if (retval)
+		goto return_allocation;
+
+	if (unlikely((gfp & __GFP_ATOMIC))) {
+		if (unlikely((gfp & __GFP_NOFAIL)))
+			goto retry_alloc_from_pool;
+		else
+			return NULL;
+	}
+
+	/* Expand pool */
+	chunk_size = roundup(size, PAGE_SIZE);
+	chunk = vmalloc(chunk_size);
+	if (unlikely(!chunk)) {
+		if (unlikely((gfp & __GFP_NOFAIL)))
+			goto retry_alloc_from_pool;
+		else
+			return NULL;
+	}
+	if (unlikely(!tag_chunk(chunk)))
+		goto free;
+
+	/* Locking is already done inside gen_pool_add */
+	add_error = gen_pool_add(pool, (unsigned long)chunk, chunk_size,
+				 NUMA_NO_NODE);
+	if (unlikely(add_error))
+		goto abort;
+
+	retval = gen_pool_alloc(pool, size);
+	if (retval) {
+return_allocation:
+		*(size_t *)retval = size;
+		if (gfp & __GFP_ZERO)
+			memset((void *)retval, 0, size);
+		return (void *)retval;
+	}
+	/*
+	 * Here there is no test for __GFP_NO_FAIL because, in case of
+	 * concurrent allocation, one thread might add a chunk to the
+	 * pool and this memory could be allocated by another thread,
+	 * before the first thread gets a chance to use it.
+	 * As long as vmalloc succeeds, it's ok to retry.
+	 */
+	goto retry_alloc_from_pool;
+abort:
+	untag_chunk(chunk);
+free:
+	vfree_atomic(chunk);
+	return NULL;
+}
+
+static void pmalloc_chunk_set_protection(struct gen_pool *pool,
+
+					 struct gen_pool_chunk *chunk,
+					 void *data)
+{
+	const bool *flag = data;
+	size_t chunk_size = chunk->end_addr + 1 - chunk->start_addr;
+	unsigned long pages = chunk_size / PAGE_SIZE;
+
+	BUG_ON(chunk_size & (PAGE_SIZE - 1));
+
+	if (*flag)
+		set_memory_ro(chunk->start_addr, pages);
+	else
+		set_memory_rw(chunk->start_addr, pages);
+}
+
+static int pmalloc_pool_set_protection(struct gen_pool *pool, bool protection)
+{
+	struct pmalloc_data *data;
+	struct gen_pool_chunk *chunk;
+
+	if (unlikely(!pool))
+		return -EINVAL;
+
+	data = pool->data;
+
+	if (unlikely(!data))
+		return -EINVAL;
+
+	if (unlikely(data->protected == protection)) {
+		WARN_ON(1);
+		return 0;
+	}
+
+	data->protected = protection;
+	list_for_each_entry(chunk, &(pool)->chunks, next_chunk)
+		pmalloc_chunk_set_protection(pool, chunk, &protection);
+	return 0;
+}
+
+int pmalloc_protect_pool(struct gen_pool *pool)
+{
+	return pmalloc_pool_set_protection(pool, true);
+}
+
+
+static void pmalloc_chunk_free(struct gen_pool *pool,
+			       struct gen_pool_chunk *chunk, void *data)
+{
+	untag_chunk(chunk);
+	gen_pool_flush_chunk(pool, chunk);
+	vfree_atomic((void *)chunk->start_addr);
+}
+
+
+int pmalloc_destroy_pool(struct gen_pool *pool)
+{
+	struct pmalloc_data *data;
+
+	if (unlikely(pool == NULL))
+		return -EINVAL;
+
+	data = pool->data;
+
+	if (unlikely(data == NULL))
+		return -EINVAL;
+
+	mutex_lock(&pmalloc_mutex);
+	list_del(&data->node);
+	mutex_unlock(&pmalloc_mutex);
+
+	if (likely(data->pool_kobject))
+		pmalloc_disconnect(data, data->pool_kobject);
+
+	pmalloc_pool_set_protection(pool, false);
+	gen_pool_for_each_chunk(pool, pmalloc_chunk_free, NULL);
+	gen_pool_destroy(pool);
+	kfree(data);
+	return 0;
+}
+
+/*
+ * When the sysfs is ready to receive registrations, connect all the
+ * pools previously created. Also enable further pools to be connected
+ * right away.
+ */
+static int __init pmalloc_late_init(void)
+{
+	struct pmalloc_data *data, *n;
+
+	pmalloc_kobject = kobject_create_and_add("pmalloc", kernel_kobj);
+
+	mutex_lock(&pmalloc_mutex);
+	pmalloc_list = &pmalloc_final_list;
+
+	if (likely(pmalloc_kobject != NULL)) {
+		list_for_each_entry_safe(data, n, &pmalloc_tmp_list, node) {
+			list_move(&data->node, &pmalloc_final_list);
+			pmalloc_connect(data);
+		}
+	}
+	mutex_unlock(&pmalloc_mutex);
+	return 0;
+}
+late_initcall(pmalloc_late_init);
diff --git a/mm/usercopy.c b/mm/usercopy.c
index e9e9325f7638..946ce051e296 100644
--- a/mm/usercopy.c
+++ b/mm/usercopy.c
@@ -240,6 +240,36 @@ static inline void check_heap_object(const void *ptr, unsigned long n,
 	}
 }
 
+#ifdef CONFIG_PROTECTABLE_MEMORY
+
+int is_pmalloc_object(const void *ptr, const unsigned long n);
+
+static void check_pmalloc_object(const void *ptr, unsigned long n,
+				 bool to_user)
+{
+	int retv;
+
+	retv = is_pmalloc_object(ptr, n);
+	if (unlikely(retv)) {
+		if (unlikely(!to_user))
+			usercopy_abort("pmalloc",
+				       "trying to write to pmalloc object",
+				       to_user, (const unsigned long)ptr, n);
+		if (retv < 0)
+			usercopy_abort("pmalloc",
+				       "invalid pmalloc object",
+				       to_user, (const unsigned long)ptr, n);
+	}
+}
+
+#else
+
+static void check_pmalloc_object(const void *ptr, unsigned long n,
+				 bool to_user)
+{
+}
+#endif
+
 /*
  * Validates that the given object is:
  * - not bogus address
@@ -277,5 +307,8 @@ void __check_object_size(const void *ptr, unsigned long n, bool to_user)
 
 	/* Check for object in kernel to avoid text exposure. */
 	check_kernel_text_object((const unsigned long)ptr, n, to_user);
+
+	/* Check if object is from a pmalloc chunk. */
+	check_pmalloc_object(ptr, n, to_user);
 }
 EXPORT_SYMBOL(__check_object_size);
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 5/7] Pmalloc selftest
  2018-02-23 14:48 [RFC PATCH v17 0/7] mm: security: ro protection for dynamic data Igor Stoppa
                   ` (3 preceding siblings ...)
  2018-02-23 14:48 ` [PATCH 4/7] Protectable Memory Igor Stoppa
@ 2018-02-23 14:48 ` Igor Stoppa
  2018-03-06 16:59   ` J Freyensee
  2018-02-23 14:48 ` [PATCH 6/7] lkdtm: crash on overwriting protected pmalloc var Igor Stoppa
  2018-02-23 14:48 ` [PATCH 7/7] Documentation for Pmalloc Igor Stoppa
  6 siblings, 1 reply; 32+ messages in thread
From: Igor Stoppa @ 2018-02-23 14:48 UTC (permalink / raw)
  To: david, willy, keescook, mhocko
  Cc: labbott, linux-security-module, linux-mm, linux-kernel,
	kernel-hardening, Igor Stoppa

Add basic self-test functionality for pmalloc.

The testing is introduced as early as possible, right after the main
dependency, genalloc, has passed successfully, so that it can help
diagnosing failures in pmalloc users.

Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com>
---
 include/linux/test_pmalloc.h | 24 ++++++++++++++
 init/main.c                  |  2 ++
 mm/Kconfig                   |  9 +++++
 mm/Makefile                  |  1 +
 mm/test_pmalloc.c            | 79 ++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 115 insertions(+)
 create mode 100644 include/linux/test_pmalloc.h
 create mode 100644 mm/test_pmalloc.c

diff --git a/include/linux/test_pmalloc.h b/include/linux/test_pmalloc.h
new file mode 100644
index 000000000000..c7e2e451c17c
--- /dev/null
+++ b/include/linux/test_pmalloc.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * test_pmalloc.h
+ *
+ * (C) Copyright 2018 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.stoppa@huawei.com>
+ */
+
+
+#ifndef __LINUX_TEST_PMALLOC_H
+#define __LINUX_TEST_PMALLOC_H
+
+
+#ifdef CONFIG_TEST_PROTECTABLE_MEMORY
+
+void test_pmalloc(void);
+
+#else
+
+static inline void test_pmalloc(void){};
+
+#endif
+
+#endif
diff --git a/init/main.c b/init/main.c
index bfccf1fd463c..a61e3a5fd321 100644
--- a/init/main.c
+++ b/init/main.c
@@ -90,6 +90,7 @@
 #include <linux/cache.h>
 #include <linux/rodata_test.h>
 #include <linux/test_genalloc.h>
+#include <linux/test_pmalloc.h>
 
 #include <asm/io.h>
 #include <asm/bugs.h>
@@ -662,6 +663,7 @@ asmlinkage __visible void __init start_kernel(void)
 	mem_encrypt_init();
 
 	test_genalloc();
+	test_pmalloc();
 #ifdef CONFIG_BLK_DEV_INITRD
 	if (initrd_start && !initrd_below_start_ok &&
 	    page_to_pfn(virt_to_page((void *)initrd_start)) < min_low_pfn) {
diff --git a/mm/Kconfig b/mm/Kconfig
index be578fbdce6d..81dcc5345631 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -766,3 +766,12 @@ config PROTECTABLE_MEMORY
     depends on ARCH_HAS_SET_MEMORY
     select GENERIC_ALLOCATOR
     default y
+
+config TEST_PROTECTABLE_MEMORY
+	bool "Run self test for pmalloc memory allocator"
+	depends on ARCH_HAS_SET_MEMORY
+	select PROTECTABLE_MEMORY
+	default n
+	help
+	  Tries to verify that pmalloc works correctly and that the memory
+	  is effectively protected.
diff --git a/mm/Makefile b/mm/Makefile
index 959fdbdac118..1de4be5fd0bc 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -66,6 +66,7 @@ obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o
 obj-$(CONFIG_SLOB) += slob.o
 obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o
 obj-$(CONFIG_PROTECTABLE_MEMORY) += pmalloc.o
+obj-$(CONFIG_TEST_PROTECTABLE_MEMORY) += test_pmalloc.o
 obj-$(CONFIG_KSM) += ksm.o
 obj-$(CONFIG_PAGE_POISONING) += page_poison.o
 obj-$(CONFIG_SLAB) += slab.o
diff --git a/mm/test_pmalloc.c b/mm/test_pmalloc.c
new file mode 100644
index 000000000000..5de21c462d3d
--- /dev/null
+++ b/mm/test_pmalloc.c
@@ -0,0 +1,79 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * test_pmalloc.c
+ *
+ * (C) Copyright 2018 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.stoppa@huawei.com>
+ */
+
+#include <linux/pmalloc.h>
+#include <linux/mm.h>
+#include <linux/test_pmalloc.h>
+
+#define SIZE_1 (PAGE_SIZE * 3)
+#define SIZE_2 1000
+
+#define validate_alloc(expected, variable, size)	\
+	pr_notice("must be " expected ": %s",		\
+		  is_pmalloc_object(variable, size) > 0 ? "ok" : "no")
+
+#define is_alloc_ok(variable, size)	\
+	validate_alloc("ok", variable, size)
+
+#define is_alloc_no(variable, size)	\
+	validate_alloc("no", variable, size)
+
+void test_pmalloc(void)
+{
+	struct gen_pool *pool_unprot;
+	struct gen_pool *pool_prot;
+	void *var_prot, *var_unprot, *var_vmall;
+
+	pr_notice("pmalloc-selftest");
+	pool_unprot = pmalloc_create_pool("unprotected", 0);
+	if (unlikely(!pool_unprot))
+		goto error;
+	pool_prot = pmalloc_create_pool("protected", 0);
+	if (unlikely(!(pool_prot)))
+		goto error_release;
+
+	var_unprot = pmalloc(pool_unprot,  SIZE_1 - 1, GFP_KERNEL);
+	var_prot = pmalloc(pool_prot,  SIZE_1, GFP_KERNEL);
+	*(int *)var_prot = 0;
+	var_vmall = vmalloc(SIZE_2);
+	is_alloc_ok(var_unprot, 10);
+	is_alloc_ok(var_unprot, SIZE_1);
+	is_alloc_ok(var_unprot, PAGE_SIZE);
+	is_alloc_no(var_unprot, SIZE_1 + 1);
+	is_alloc_no(var_vmall, 10);
+
+
+	pfree(pool_unprot, var_unprot);
+	vfree(var_vmall);
+
+	pmalloc_protect_pool(pool_prot);
+
+	/*
+	 * This will intentionally trigger a WARN because the pool being
+	 * destroyed is not protected, which is unusual and should happen
+	 * on error paths only, where probably other warnings are already
+	 * displayed.
+	 */
+	pr_notice("pmalloc-selftest:"
+		  " Expect WARN in pmalloc_pool_set_protection below.");
+	pmalloc_destroy_pool(pool_unprot);
+	pr_notice("pmalloc-selftest:"
+		  "Critical point for expected WARN passed.");
+
+	/* This must not cause WARNings */
+	pr_notice("pmalloc-selftest:"
+		  "Expect no WARN below.");
+	pmalloc_destroy_pool(pool_prot);
+	pr_notice("pmalloc-selftest:"
+		  "Critical point for unexpected WARN passed.");
+	return;
+error_release:
+	pmalloc_destroy_pool(pool_unprot);
+error:
+	WARN(true, "Unable to allocate memory for pmalloc selftest.");
+}
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 6/7] lkdtm: crash on overwriting protected pmalloc var
  2018-02-23 14:48 [RFC PATCH v17 0/7] mm: security: ro protection for dynamic data Igor Stoppa
                   ` (4 preceding siblings ...)
  2018-02-23 14:48 ` [PATCH 5/7] Pmalloc selftest Igor Stoppa
@ 2018-02-23 14:48 ` Igor Stoppa
  2018-02-25  3:46   ` kbuild test robot
  2018-03-06 17:05   ` J Freyensee
  2018-02-23 14:48 ` [PATCH 7/7] Documentation for Pmalloc Igor Stoppa
  6 siblings, 2 replies; 32+ messages in thread
From: Igor Stoppa @ 2018-02-23 14:48 UTC (permalink / raw)
  To: david, willy, keescook, mhocko
  Cc: labbott, linux-security-module, linux-mm, linux-kernel,
	kernel-hardening, Igor Stoppa

Verify that pmalloc read-only protection is in place: trying to
overwrite a protected variable will crash the kernel.

Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com>
---
 drivers/misc/lkdtm.h       |  1 +
 drivers/misc/lkdtm_core.c  |  3 +++
 drivers/misc/lkdtm_perms.c | 28 ++++++++++++++++++++++++++++
 3 files changed, 32 insertions(+)

diff --git a/drivers/misc/lkdtm.h b/drivers/misc/lkdtm.h
index 9e513dcfd809..dcda3ae76ceb 100644
--- a/drivers/misc/lkdtm.h
+++ b/drivers/misc/lkdtm.h
@@ -38,6 +38,7 @@ void lkdtm_READ_BUDDY_AFTER_FREE(void);
 void __init lkdtm_perms_init(void);
 void lkdtm_WRITE_RO(void);
 void lkdtm_WRITE_RO_AFTER_INIT(void);
+void lkdtm_WRITE_RO_PMALLOC(void);
 void lkdtm_WRITE_KERN(void);
 void lkdtm_EXEC_DATA(void);
 void lkdtm_EXEC_STACK(void);
diff --git a/drivers/misc/lkdtm_core.c b/drivers/misc/lkdtm_core.c
index 2154d1bfd18b..c9fd42bda6ee 100644
--- a/drivers/misc/lkdtm_core.c
+++ b/drivers/misc/lkdtm_core.c
@@ -155,6 +155,9 @@ static const struct crashtype crashtypes[] = {
 	CRASHTYPE(ACCESS_USERSPACE),
 	CRASHTYPE(WRITE_RO),
 	CRASHTYPE(WRITE_RO_AFTER_INIT),
+#ifdef CONFIG_PROTECTABLE_MEMORY
+	CRASHTYPE(WRITE_RO_PMALLOC),
+#endif
 	CRASHTYPE(WRITE_KERN),
 	CRASHTYPE(REFCOUNT_INC_OVERFLOW),
 	CRASHTYPE(REFCOUNT_ADD_OVERFLOW),
diff --git a/drivers/misc/lkdtm_perms.c b/drivers/misc/lkdtm_perms.c
index 53b85c9d16b8..0ac9023fd2b0 100644
--- a/drivers/misc/lkdtm_perms.c
+++ b/drivers/misc/lkdtm_perms.c
@@ -9,6 +9,7 @@
 #include <linux/vmalloc.h>
 #include <linux/mman.h>
 #include <linux/uaccess.h>
+#include <linux/pmalloc.h>
 #include <asm/cacheflush.h>
 
 /* Whether or not to fill the target memory area with do_nothing(). */
@@ -104,6 +105,33 @@ void lkdtm_WRITE_RO_AFTER_INIT(void)
 	*ptr ^= 0xabcd1234;
 }
 
+#ifdef CONFIG_PROTECTABLE_MEMORY
+void lkdtm_WRITE_RO_PMALLOC(void)
+{
+	struct gen_pool *pool;
+	int *i;
+
+	pool = pmalloc_create_pool("pool", 0);
+	if (unlikely(!pool)) {
+		pr_info("Failed preparing pool for pmalloc test.");
+		return;
+	}
+
+	i = (int *)pmalloc(pool, sizeof(int), GFP_KERNEL);
+	if (unlikely(!i)) {
+		pr_info("Failed allocating memory for pmalloc test.");
+		pmalloc_destroy_pool(pool);
+		return;
+	}
+
+	*i = INT_MAX;
+	pmalloc_protect_pool(pool);
+
+	pr_info("attempting bad pmalloc write at %p\n", i);
+	*i = 0;
+}
+#endif
+
 void lkdtm_WRITE_KERN(void)
 {
 	size_t size;
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 7/7] Documentation for Pmalloc
  2018-02-23 14:48 [RFC PATCH v17 0/7] mm: security: ro protection for dynamic data Igor Stoppa
                   ` (5 preceding siblings ...)
  2018-02-23 14:48 ` [PATCH 6/7] lkdtm: crash on overwriting protected pmalloc var Igor Stoppa
@ 2018-02-23 14:48 ` Igor Stoppa
  2018-02-24  0:26   ` J Freyensee
  6 siblings, 1 reply; 32+ messages in thread
From: Igor Stoppa @ 2018-02-23 14:48 UTC (permalink / raw)
  To: david, willy, keescook, mhocko
  Cc: labbott, linux-security-module, linux-mm, linux-kernel,
	kernel-hardening, Igor Stoppa

Detailed documentation about the protectable memory allocator.

Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com>
---
 Documentation/core-api/index.rst   |   1 +
 Documentation/core-api/pmalloc.rst | 114 +++++++++++++++++++++++++++++++++++++
 2 files changed, 115 insertions(+)
 create mode 100644 Documentation/core-api/pmalloc.rst

diff --git a/Documentation/core-api/index.rst b/Documentation/core-api/index.rst
index c670a8031786..8f5de42d6571 100644
--- a/Documentation/core-api/index.rst
+++ b/Documentation/core-api/index.rst
@@ -25,6 +25,7 @@ Core utilities
    genalloc
    errseq
    printk-formats
+   pmalloc
 
 Interfaces for kernel debugging
 ===============================
diff --git a/Documentation/core-api/pmalloc.rst b/Documentation/core-api/pmalloc.rst
new file mode 100644
index 000000000000..d9725870444e
--- /dev/null
+++ b/Documentation/core-api/pmalloc.rst
@@ -0,0 +1,114 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Protectable memory allocator
+============================
+
+Purpose
+-------
+
+The pmalloc library is meant to provide R/O status to data that, for some
+reason, could neither be declared as constant, nor could it take advantage
+of the qualifier __ro_after_init, but is write-once and read-only in spirit.
+It protects data from both accidental and malicious overwrites.
+
+Example: A policy that is loaded from userspace.
+
+
+Concept
+-------
+
+pmalloc builds on top of genalloc, using the same concept of memory pools.
+
+The value added by pmalloc is that now the memory contained in a pool can
+become R/O, for the rest of the life of the pool.
+
+Different kernel drivers and threads can use different pools, for finer
+control of what becomes R/O and when. And for improved lockless concurrency.
+
+
+Caveats
+-------
+
+- Memory freed while a pool is not yet protected will be reused.
+
+- Once a pool is protected, it's not possible to allocate any more memory
+  from it.
+
+- Memory "freed" from a protected pool indicates that such memory is not
+  in use anymore by the requester; however, it will not become available
+  for further use, until the pool is destroyed.
+
+- Before destroying a pool, all the memory allocated from it must be
+  released.
+
+- pmalloc does not provide locking support with respect to allocating vs
+  protecting an individual pool, for performance reasons.
+  It is recommended not to share the same pool between unrelated functions.
+  Should sharing be a necessity, the user of the shared pool is expected
+  to implement locking for that pool.
+
+- pmalloc uses genalloc to optimize the use of the space it allocates
+  through vmalloc. Some more TLB entries will be used, however less than
+  in the case of using vmalloc directly. The exact number depends on the
+  size of each allocation request and possible slack.
+
+- Considering that not much data is supposed to be dynamically allocated
+  and then marked as read-only, it shouldn't be an issue that the address
+  range for pmalloc is limited, on 32-bit systems.
+
+- Regarding SMP systems, the allocations are expected to happen mostly
+  during an initial transient, after which there should be no more need to
+  perform cross-processor synchronizations of page tables.
+
+- To facilitate the conversion of existing code to pmalloc pools, several
+  helper functions are provided, mirroring their kmalloc counterparts.
+
+
+Use
+---
+
+The typical sequence, when using pmalloc, is:
+
+1. create a pool
+
+.. kernel-doc:: include/linux/pmalloc.h
+   :functions: pmalloc_create_pool
+
+2. [optional] pre-allocate some memory in the pool
+
+.. kernel-doc:: include/linux/pmalloc.h
+   :functions: pmalloc_prealloc
+
+3. issue one or more allocation requests to the pool with locking as needed
+
+.. kernel-doc:: include/linux/pmalloc.h
+   :functions: pmalloc
+
+.. kernel-doc:: include/linux/pmalloc.h
+   :functions: pzalloc
+
+4. initialize the memory obtained with desired values
+
+5. [optional] iterate over points 3 & 4 as needed
+
+6. write-protect the pool
+
+.. kernel-doc:: include/linux/pmalloc.h
+   :functions: pmalloc_protect_pool
+
+7. use in read-only mode the handles obtained through the allocations
+
+8. [optional] release all the memory allocated
+
+.. kernel-doc:: include/linux/pmalloc.h
+   :functions: pfree
+
+9. [optional, but depends on point 8] destroy the pool
+
+.. kernel-doc:: include/linux/pmalloc.h
+   :functions: pmalloc_destroy_pool
+
+API
+---
+
+.. kernel-doc:: include/linux/pmalloc.h
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/7] genalloc: track beginning of allocations
  2018-02-23 14:48 ` [PATCH 1/7] genalloc: track beginning of allocations Igor Stoppa
@ 2018-02-23 22:28   ` J Freyensee
  2018-02-26 12:09     ` Igor Stoppa
  2018-02-25  3:37   ` kbuild test robot
  1 sibling, 1 reply; 32+ messages in thread
From: J Freyensee @ 2018-02-23 22:28 UTC (permalink / raw)
  To: Igor Stoppa, david, willy, keescook, mhocko
  Cc: labbott, linux-security-module, linux-mm, linux-kernel, kernel-hardening

some code snipping
.
.
.
> +/**
> + * get_bitmap_entry() - extracts the specified entry from the bitmap
> + * @map: pointer to a bitmap
> + * @entry_index: the index of the desired entry in the bitmap
> + *
> + * Return: The requested bitmap.
> + */
> +static inline unsigned long get_bitmap_entry(unsigned long *map,
> +					    int entry_index)
> +{

Apologies if this has been mentioned before, but since this function is 
expecting map to not be NULL, shouldn't something like:

WARN_ON(map == NULL);

be used to check the parameters for bad values?

BTW, excellent commenting :-).
> +	return (map[ENTRIES_DIV_LONGS(entry_index)] >>
> +		ENTRIES_TO_BITS(entry_index % ENTRIES_PER_LONG)) &
> +		ENTRY_MASK;
> +}
> +
> +
> +/**
> + * mem_to_units() - convert references to memory into orders of allocation
> + * @size: amount in bytes
> + * @order: power of 2 represented by each entry in the bitmap
> + *
> + * Return: the number of units representing the size.
> + */
> +static inline unsigned long mem_to_units(unsigned long size,
> +					 unsigned long order)
> +{
> +	return (size + (1UL << order) - 1) >> order;
> +}
> +
> +/**
> + * chunk_size() - dimension of a chunk of memory, in bytes
> + * @chunk: pointer to the struct describing the chunk
> + *
> + * Return: The size of the chunk, in bytes.
> + */
>   static inline size_t chunk_size(const struct gen_pool_chunk *chunk)
>   {
Same problem here, always expecting chunk to not but NULL.

>   	return chunk->end_addr - chunk->start_addr + 1;
>   }
>   
> -static int set_bits_ll(unsigned long *addr, unsigned long mask_to_set)
> +
> +/**
> + * set_bits_ll() - based on value and mask, sets bits at address
> + * @addr: where to write
> + * @mask: filter to apply for the bits to alter
> + * @value: actual configuration of bits to store
> + *
> + * Return:
> + * * 0		- success
> + * * -EBUSY	- otherwise
> + */
> +static int set_bits_ll(unsigned long *addr,
> +		       unsigned long mask, unsigned long value)
>   {
> -	unsigned long val, nval;
> +	unsigned long nval;
> +	unsigned long present;
> +	unsigned long target;
>   
>   	nval = *addr;

Same issue here with addr.
>   	do {
> -		val = nval;
> -		if (val & mask_to_set)
> +		present = nval;
> +		if (present & mask)
>   			return -EBUSY;
> +		target =  present | value;
>   		cpu_relax();
> -	} while ((nval = cmpxchg(addr, val, val | mask_to_set)) != val);
> -
> +	} while ((nval = cmpxchg(addr, present, target)) != target);
>   	return 0;
>   }
>   
> -static int clear_bits_ll(unsigned long *addr, unsigned long mask_to_clear)
> +
> +/**
> + * clear_bits_ll() - based on value and mask, clears bits at address
> + * @addr: where to write
> + * @mask: filter to apply for the bits to alter
> + * @value: actual configuration of bits to clear
> + *
> + * Return:
> + * * 0		- success
> + * * -EBUSY	- otherwise
> + */
> +static int clear_bits_ll(unsigned long *addr,
> +			 unsigned long mask, unsigned long value)

Dido here for addr too.
>   {
> -	unsigned long val, nval;
> +	unsigned long nval;
> +	unsigned long present;
> +	unsigned long target;
>   
>   	nval = *addr;
> +	present = nval;
> +	if (unlikely((present & mask) ^ value))
> +		return -EBUSY;
>   	do {
> -		val = nval;
> -		if ((val & mask_to_clear) != mask_to_clear)
> +		present = nval;
> +		if (unlikely((present & mask) ^ value))
>   			return -EBUSY;
> +		target =  present & ~mask;
>   		cpu_relax();
> -	} while ((nval = cmpxchg(addr, val, val & ~mask_to_clear)) != val);
> -
> +	} while ((nval = cmpxchg(addr, present, target)) != target);
>   	return 0;
>   }
>   
> -/*
> - * bitmap_set_ll - set the specified number of bits at the specified position
> +
> +/**
> + * get_boundary() - verifies address, then measure length.
>    * @map: pointer to a bitmap
> - * @start: a bit position in @map
> - * @nr: number of bits to set
> + * @start_entry: the index of the first entry in the bitmap
> + * @nentries: number of entries to alter
>    *
> - * Set @nr bits start from @start in @map lock-lessly. Several users
> - * can set/clear the same bitmap simultaneously without lock. If two
> - * users set the same bit, one user will return remain bits, otherwise
> - * return 0.
> + * Return:
> + * * length of an allocation	- success
> + * * -EINVAL			- invalid parameters
>    */
> -static int bitmap_set_ll(unsigned long *map, int start, int nr)
> +static int get_boundary(unsigned long *map, int start_entry, int nentries)
Same problem for map.

>   {
> -	unsigned long *p = map + BIT_WORD(start);
> -	const int size = start + nr;
> -	int bits_to_set = BITS_PER_LONG - (start % BITS_PER_LONG);
> -	unsigned long mask_to_set = BITMAP_FIRST_WORD_MASK(start);
> -
> -	while (nr - bits_to_set >= 0) {
> -		if (set_bits_ll(p, mask_to_set))
> -			return nr;
> -		nr -= bits_to_set;
> -		bits_to_set = BITS_PER_LONG;
> -		mask_to_set = ~0UL;
> -		p++;
> -	}
> -	if (nr) {
> -		mask_to_set &= BITMAP_LAST_WORD_MASK(size);
> -		if (set_bits_ll(p, mask_to_set))
> -			return nr;
> -	}
> +	int i;
> +	unsigned long bitmap_entry;
>   
> -	return 0;
> +
> +	if (unlikely(get_bitmap_entry(map, start_entry) != ENTRY_HEAD))j
Since get_bitmap_entry() is a new function, I'm not sure why the second 
parameter is defined as an 'int', looks like unsigned int would be 
appropriate.
> +		return -EINVAL;
> +	for (i = start_entry + 1; i < nentries; i++) {
It is being assumed nentries is not negative, which can be possible as 
the function parameter is defined as 'int'.
> +		bitmap_entry = get_bitmap_entry(map, i);
> +		if (bitmap_entry == ENTRY_HEAD ||
> +		    bitmap_entry == ENTRY_UNUSED)
> +			return i;
> +	}
> +	return nentries - start_entry;
>   }
>   
> +
> +#define SET_BITS 1
> +#define CLEAR_BITS 0
> +
>   /*
> - * bitmap_clear_ll - clear the specified number of bits at the specified position
> + * alter_bitmap_ll() - set/clear the entries associated with an allocation
> + * @alteration: indicates if the bits selected should be set or cleared
>    * @map: pointer to a bitmap
> - * @start: a bit position in @map
> - * @nr: number of bits to set
> + * @start: the index of the first entry in the bitmap
> + * @nentries: number of entries to alter
> + *
> + * The modification happens lock-lessly.
> + * Several users can write to the same map simultaneously, without lock.
> + * In case of mid-air conflict, when 2 or more writers try to alter the
> + * same word in the bitmap, only one will succeed and continue, the others
> + * will fail and receive as return value the amount of entries that were
> + * not written. Each failed writer is responsible to revert the changes
> + * it did to the bitmap.
> + * The lockless conflict resolution is implemented through cmpxchg.
> + * Success or failure is purely based on first come first served basis.
> + * The first writer that manages to gain write access to the target word
> + * of the bitmap wins. Whatever can affect the order and priority of execution
> + * of the writers can and will affect the result of the race.
>    *
> - * Clear @nr bits start from @start in @map lock-lessly. Several users
> - * can set/clear the same bitmap simultaneously without lock. If two
> - * users clear the same bit, one user will return remain bits,
> - * otherwise return 0.
> + * Return:
> + * * 0			- success
> + * * remaining entries	- failure
>    */
> -static int bitmap_clear_ll(unsigned long *map, int start, int nr)
> +static int alter_bitmap_ll(bool alteration, unsigned long *map,
> +			   int start_entry, int nentries)
>   {
map could benefit from a WARN_ON(map == NULL).
> -	unsigned long *p = map + BIT_WORD(start);
> -	const int size = start + nr;
> -	int bits_to_clear = BITS_PER_LONG - (start % BITS_PER_LONG);
> -	unsigned long mask_to_clear = BITMAP_FIRST_WORD_MASK(start);
> -
> -	while (nr - bits_to_clear >= 0) {
> -		if (clear_bits_ll(p, mask_to_clear))
> -			return nr;
> -		nr -= bits_to_clear;
> -		bits_to_clear = BITS_PER_LONG;
> -		mask_to_clear = ~0UL;
> -		p++;
> -	}
> -	if (nr) {
> -		mask_to_clear &= BITMAP_LAST_WORD_MASK(size);
> -		if (clear_bits_ll(p, mask_to_clear))
> -			return nr;
> +	unsigned long start_bit;
> +	unsigned long end_bit;
> +	unsigned long mask;
> +	unsigned long value;
> +	int nbits;
> +	int bits_to_write;
> +	int index;
> +	int (*action)(unsigned long *addr,
> +		      unsigned long mask, unsigned long value);
> +
> +	action = (alteration == SET_BITS) ? set_bits_ll : clear_bits_ll;
> +
> +	/*
> +	 * Prepare for writing the initial part of the allocation, from
> +	 * starting entry, to the end of the UL bitmap element which
> +	 * contains it. It might be larger than the actual allocation.
> +	 */
> +	start_bit = ENTRIES_TO_BITS(start_entry);
> +	end_bit = ENTRIES_TO_BITS(start_entry + nentries);
> +	nbits = ENTRIES_TO_BITS(nentries);

these statements won't make any sense if start_entry and nentries are 
negative values, which is possible based on the function definition 
alter_bitmap_ll().  Am I missing something that it's ok for these 
parameters to be negative?
> +	bits_to_write = BITS_PER_LONG - start_bit % BITS_PER_LONG;
> +	mask = BITMAP_FIRST_WORD_MASK(start_bit);
> +	/* Mark the beginning of the allocation. */
> +	value = MASK | (1UL << (start_bit % BITS_PER_LONG));
> +	index = BITS_DIV_LONGS(start_bit);
> +
> +	/*
> +	 * Writes entries to the bitmap, as long as the reminder is
> +	 * positive or zero.
> +	 * Might be skipped if the entries to write do not reach the end
> +	 * of a bitmap UL unit.
> +	 */
> +	while (nbits >= bits_to_write) {
> +		if (action(map + index, mask, value & mask))
> +			return BITS_DIV_ENTRIES(nbits);
> +		nbits -= bits_to_write;
> +		bits_to_write = BITS_PER_LONG;
> +		mask = ~0UL;
> +		value = MASK;
> +		index++;
>   	}
>   
> +	/* Takes care of the ending part of the entries to mark. */
> +	if (nbits > 0) {
> +		mask ^= BITMAP_FIRST_WORD_MASK((end_bit) % BITS_PER_LONG);
> +		bits_to_write = nbits;
> +		if (action(map + index, mask, value & mask))
> +			return BITS_DIV_ENTRIES(nbits);
> +	}
>   	return 0;
>   }
>   
> +
>   /**
> - * gen_pool_create - create a new special memory pool
> - * @min_alloc_order: log base 2 of number of bytes each bitmap bit represents
> - * @nid: node id of the node the pool structure should be allocated on, or -1
> + * gen_pool_create() - create a new special memory pool
> + * @min_alloc_order: log base 2 of number of bytes each bitmap entry
> + *		     represents
> + * @nid: node id of the node the pool structure should be allocated on,
> + *	 or -1
>    *
> - * Create a new special memory pool that can be used to manage special purpose
> - * memory not managed by the regular kmalloc/kfree interface.
> + * Create a new special memory pool that can be used to manage special
> + * purpose memory not managed by the regular kmalloc/kfree interface.
> + *
> + * Return:
> + * * pointer to the pool	- success
> + * * NULL			- otherwise
>    */
>   struct gen_pool *gen_pool_create(int min_alloc_order, int nid)
>   {
> @@ -167,7 +364,7 @@ struct gen_pool *gen_pool_create(int min_alloc_order, int nid)
>   EXPORT_SYMBOL(gen_pool_create);
>   
>   /**
> - * gen_pool_add_virt - add a new chunk of special memory to the pool
> + * gen_pool_add_virt() - add a new chunk of special memory to the pool
>    * @pool: pool to add new memory chunk to
>    * @virt: virtual starting address of memory chunk to add to pool
>    * @phys: physical starting address of memory chunk to add to pool
> @@ -177,16 +374,20 @@ EXPORT_SYMBOL(gen_pool_create);
>    *
>    * Add a new chunk of special memory to the specified pool.
>    *
> - * Returns 0 on success or a -ve errno on failure.
> + * Return:
> + * * 0		- success
> + * * -ve errno	- failure
>    */
> -int gen_pool_add_virt(struct gen_pool *pool, unsigned long virt, phys_addr_t phys,
> -		 size_t size, int nid)
> +int gen_pool_add_virt(struct gen_pool *pool, unsigned long virt,
> +		      phys_addr_t phys, size_t size, int nid)
>   {
WARN_ON(pool == NULL);
?
>   	struct gen_pool_chunk *chunk;
> -	int nbits = size >> pool->min_alloc_order;
> -	int nbytes = sizeof(struct gen_pool_chunk) +
> -				BITS_TO_LONGS(nbits) * sizeof(long);
> +	int nentries;
> +	int nbytes;
>   
> +	nentries = size >> pool->min_alloc_order;
> +	nbytes = sizeof(struct gen_pool_chunk) +
> +		 ENTRIES_DIV_LONGS(nentries) * sizeof(long);
>   	chunk = kzalloc_node(nbytes, GFP_KERNEL, nid);
>   	if (unlikely(chunk == NULL))
>   		return -ENOMEM;
> @@ -205,11 +406,13 @@ int gen_pool_add_virt(struct gen_pool *pool, unsigned long virt, phys_addr_t phy
>   EXPORT_SYMBOL(gen_pool_add_virt);
>   
>   /**
> - * gen_pool_virt_to_phys - return the physical address of memory
> + * gen_pool_virt_to_phys() - return the physical address of memory
>    * @pool: pool to allocate from
>    * @addr: starting address of memory
>    *
> - * Returns the physical address on success, or -1 on error.
> + * Return:
> + * * the physical address	- success
> + * * \-1			- error
>    */
>   phys_addr_t gen_pool_virt_to_phys(struct gen_pool *pool, unsigned long addr)
>   {
> @@ -230,7 +433,7 @@ phys_addr_t gen_pool_virt_to_phys(struct gen_pool *pool, unsigned long addr)
>   EXPORT_SYMBOL(gen_pool_virt_to_phys);
>   
>   /**
> - * gen_pool_destroy - destroy a special memory pool
> + * gen_pool_destroy() - destroy a special memory pool
>    * @pool: pool to destroy
>    *
>    * Destroy the specified special memory pool. Verifies that there are no
> @@ -248,7 +451,7 @@ void gen_pool_destroy(struct gen_pool *pool)
>   		list_del(&chunk->next_chunk);
>   
>   		end_bit = chunk_size(chunk) >> order;
> -		bit = find_next_bit(chunk->bits, end_bit, 0);
> +		bit = find_next_bit(chunk->entries, end_bit, 0);
>   		BUG_ON(bit < end_bit);
>   
>   		kfree(chunk);
> @@ -259,7 +462,7 @@ void gen_pool_destroy(struct gen_pool *pool)
>   EXPORT_SYMBOL(gen_pool_destroy);
>   
>   /**
> - * gen_pool_alloc - allocate special memory from the pool
> + * gen_pool_alloc() - allocate special memory from the pool
>    * @pool: pool to allocate from
>    * @size: number of bytes to allocate from the pool
>    *
> @@ -267,6 +470,10 @@ EXPORT_SYMBOL(gen_pool_destroy);
>    * Uses the pool allocation function (with first-fit algorithm by default).
>    * Can not be used in NMI handler on architectures without
>    * NMI-safe cmpxchg implementation.
> + *
> + * Return:
> + * * address of the memory allocated	- success
> + * * NULL				- error
>    */
>   unsigned long gen_pool_alloc(struct gen_pool *pool, size_t size)
>   {
> @@ -275,7 +482,7 @@ unsigned long gen_pool_alloc(struct gen_pool *pool, size_t size)
>   EXPORT_SYMBOL(gen_pool_alloc);
>   
>   /**
> - * gen_pool_alloc_algo - allocate special memory from the pool
> + * gen_pool_alloc_algo() - allocate special memory from the pool
>    * @pool: pool to allocate from
>    * @size: number of bytes to allocate from the pool
>    * @algo: algorithm passed from caller
> @@ -285,6 +492,10 @@ EXPORT_SYMBOL(gen_pool_alloc);
>    * Uses the pool allocation function (with first-fit algorithm by default).
>    * Can not be used in NMI handler on architectures without
>    * NMI-safe cmpxchg implementation.
> + *
> + * Return:
> + * * address of the memory allocated	- success
> + * * NULL				- error
>    */
>   unsigned long gen_pool_alloc_algo(struct gen_pool *pool, size_t size,
>   		genpool_algo_t algo, void *data)
> @@ -292,7 +503,7 @@ unsigned long gen_pool_alloc_algo(struct gen_pool *pool, size_t size,
>   	struct gen_pool_chunk *chunk;
>   	unsigned long addr = 0;
>   	int order = pool->min_alloc_order;
> -	int nbits, start_bit, end_bit, remain;
> +	int nentries, start_entry, end_entry, remain;
Be nicer to use "unsigned int", but it's not clear from this diff that 
this could work with other existing code.

>   
>   #ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
>   	BUG_ON(in_nmi());
> @@ -301,29 +512,32 @@ unsigned long gen_pool_alloc_algo(struct gen_pool *pool, size_t size,
>   	if (size == 0)
>   		return 0;
>   
> -	nbits = (size + (1UL << order) - 1) >> order;
> +	nentries = mem_to_units(size, order);
>   	rcu_read_lock();
>   	list_for_each_entry_rcu(chunk, &pool->chunks, next_chunk) {
>   		if (size > atomic_long_read(&chunk->avail))
>   			continue;
>   
> -		start_bit = 0;
> -		end_bit = chunk_size(chunk) >> order;
> +		start_entry = 0;
> +		end_entry = chunk_size(chunk) >> order;
>   retry:
> -		start_bit = algo(chunk->bits, end_bit, start_bit,
> -				 nbits, data, pool);
> -		if (start_bit >= end_bit)
> +		start_entry = algo(chunk->entries, end_entry, start_entry,
> +				  nentries, data, pool);
> +		if (start_entry >= end_entry)
>   			continue;
> -		remain = bitmap_set_ll(chunk->bits, start_bit, nbits);
> +		remain = alter_bitmap_ll(SET_BITS, chunk->entries,
> +					 start_entry, nentries);
>   		if (remain) {
> -			remain = bitmap_clear_ll(chunk->bits, start_bit,
> -						 nbits - remain);
> -			BUG_ON(remain);
> +			remain = alter_bitmap_ll(CLEAR_BITS,
> +						 chunk->entries,
> +						 start_entry,
> +						 nentries - remain);
>   			goto retry;
>   		}
>   
> -		addr = chunk->start_addr + ((unsigned long)start_bit << order);
> -		size = nbits << order;
> +		addr = chunk->start_addr +
> +			((unsigned long)start_entry << order);
> +		size = nentries << order;
>   		atomic_long_sub(size, &chunk->avail);
>   		break;
>   	}
> @@ -333,7 +547,7 @@ unsigned long gen_pool_alloc_algo(struct gen_pool *pool, size_t size,
>   EXPORT_SYMBOL(gen_pool_alloc_algo);
>   
>   /**
> - * gen_pool_dma_alloc - allocate special memory from the pool for DMA usage
> + * gen_pool_dma_alloc() - allocate special memory from the pool for DMA usage
>    * @pool: pool to allocate from
>    * @size: number of bytes to allocate from the pool
>    * @dma: dma-view physical address return value.  Use NULL if unneeded.
> @@ -342,6 +556,10 @@ EXPORT_SYMBOL(gen_pool_alloc_algo);
>    * Uses the pool allocation function (with first-fit algorithm by default).
>    * Can not be used in NMI handler on architectures without
>    * NMI-safe cmpxchg implementation.
> + *
> + * Return:
> + * * address of the memory allocated	- success
> + * * NULL				- error
>    */
>   void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size, dma_addr_t *dma)
>   {
> @@ -362,10 +580,10 @@ void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size, dma_addr_t *dma)
>   EXPORT_SYMBOL(gen_pool_dma_alloc);
>   
>   /**
> - * gen_pool_free - free allocated special memory back to the pool
> + * gen_pool_free() - free allocated special memory back to the pool
>    * @pool: pool to free to
>    * @addr: starting address of memory to free back to pool
> - * @size: size in bytes of memory to free
> + * @size: size in bytes of memory to free or 0, for auto-detection
>    *
>    * Free previously allocated special memory back to the specified
>    * pool.  Can not be used in NMI handler on architectures without
> @@ -375,22 +593,29 @@ void gen_pool_free(struct gen_pool *pool, unsigned long addr, size_t size)
>   {
>   	struct gen_pool_chunk *chunk;
>   	int order = pool->min_alloc_order;
> -	int start_bit, nbits, remain;
> +	int start_entry, remaining_entries, nentries, remain;
> +	int boundary;
>   
>   #ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
>   	BUG_ON(in_nmi());
>   #endif
>   
> -	nbits = (size + (1UL << order) - 1) >> order;
>   	rcu_read_lock();
>   	list_for_each_entry_rcu(chunk, &pool->chunks, next_chunk) {
>   		if (addr >= chunk->start_addr && addr <= chunk->end_addr) {
>   			BUG_ON(addr + size - 1 > chunk->end_addr);
> -			start_bit = (addr - chunk->start_addr) >> order;
> -			remain = bitmap_clear_ll(chunk->bits, start_bit, nbits);
> +			start_entry = (addr - chunk->start_addr) >> order;
> +			remaining_entries = (chunk->end_addr - addr) >> order;
> +			boundary = get_boundary(chunk->entries, start_entry,
> +						remaining_entries);
> +			BUG_ON(boundary < 0);

Do you really want to use BUG_ON()?  I've thought twice about using 
BUG_ON() based on Linus's wrath with BUG_ON() code causing an issue with 
the 4.8 release:

https://lkml.org/lkml/2016/10/4/1

Hence why I've been giving WARN_ON() suggestions throughout this review.

Thanks,
Jay

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/7] genalloc: selftest
  2018-02-23 14:48 ` [PATCH 2/7] genalloc: selftest Igor Stoppa
@ 2018-02-23 22:42   ` J Freyensee
  2018-02-26 12:11     ` Igor Stoppa
  0 siblings, 1 reply; 32+ messages in thread
From: J Freyensee @ 2018-02-23 22:42 UTC (permalink / raw)
  To: Igor Stoppa, david, willy, keescook, mhocko
  Cc: labbott, linux-security-module, linux-mm, linux-kernel, kernel-hardening


> +	locations[action->location] = gen_pool_alloc(pool, action->size);
> +	BUG_ON(!locations[action->location]);

Again, I'd think it through if you really want to use BUG_ON() or not:

https://lwn.net/Articles/13183/
https://lkml.org/lkml/2016/10/4/1

Thanks,
Jay

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 4/7] Protectable Memory
  2018-02-23 14:48 ` [PATCH 4/7] Protectable Memory Igor Stoppa
@ 2018-02-24  0:10   ` J Freyensee
  2018-02-26 14:28     ` Igor Stoppa
  2018-02-25  2:33   ` kbuild test robot
  1 sibling, 1 reply; 32+ messages in thread
From: J Freyensee @ 2018-02-24  0:10 UTC (permalink / raw)
  To: Igor Stoppa, david, willy, keescook, mhocko
  Cc: labbott, linux-security-module, linux-mm, linux-kernel, kernel-hardening

On 2/23/18 6:48 AM, Igor Stoppa wrote:

> The MMU available in many systems running Linux can often provide R/O
> protection to the memory pages it handles.
>
> However, the MMU-based protection works efficiently only when said pages
> contain exclusively data that will not need further modifications.
>
> Statically allocated variables can be segregated into a dedicated
> section, but this does not sit very well with dynamically allocated
> ones.
>
> Dynamic allocation does not provide, currently, any means for grouping
> variables in memory pages that would contain exclusively data suitable
> for conversion to read only access mode.
>
> The allocator here provided (pmalloc - protectable memory allocator)
> introduces the concept of pools of protectable memory.
>
> A module can request a pool and then refer any allocation request to the
> pool handler it has received.
>
> Once all the chunks of memory associated to a specific pool are
> initialized, the pool can be protected.
>
> After this point, the pool can only be destroyed (it is up to the module
> to avoid any further references to the memory from the pool, after
> the destruction is invoked).
>
> The latter case is mainly meant for releasing memory, when a module is
> unloaded.
>
> A module can have as many pools as needed, for example to support the
> protection of data that is initialized in sufficiently distinct phases.
>
> Since pmalloc memory is obtained from vmalloc, an attacker that has
> gained access to the physical mapping, still has to identify where the
> target of the attack is actually located.
>
> At the same time, being also based on genalloc, pmalloc does not
> generate as much trashing of the TLB as it would be caused by using
> directly only vmalloc.
>
> Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com>
> ---
>   include/linux/genalloc.h |   3 +
>   include/linux/pmalloc.h  | 242 +++++++++++++++++++++++
>   include/linux/vmalloc.h  |   1 +
>   lib/genalloc.c           |  27 +++
>   mm/Kconfig               |   6 +
>   mm/Makefile              |   1 +
>   mm/pmalloc.c             | 499 +++++++++++++++++++++++++++++++++++++++++++++++
>   mm/usercopy.c            |  33 ++++
>   8 files changed, 812 insertions(+)
>   create mode 100644 include/linux/pmalloc.h
>   create mode 100644 mm/pmalloc.c
>
> diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
> index dcaa33e74b1c..b6c4cea9fbd8 100644
> --- a/include/linux/genalloc.h
> +++ b/include/linux/genalloc.h
> @@ -121,6 +121,9 @@ extern unsigned long gen_pool_alloc_algo(struct gen_pool *, size_t,
>   extern void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size,
>   		dma_addr_t *dma);
>   extern void gen_pool_free(struct gen_pool *, unsigned long, size_t);
> +
> +extern void gen_pool_flush_chunk(struct gen_pool *pool,
> +				 struct gen_pool_chunk *chunk);
>   extern void gen_pool_for_each_chunk(struct gen_pool *,
>   	void (*)(struct gen_pool *, struct gen_pool_chunk *, void *), void *);
>   extern size_t gen_pool_avail(struct gen_pool *);
> diff --git a/include/linux/pmalloc.h b/include/linux/pmalloc.h
> new file mode 100644
> index 000000000000..afc2068d5545
> --- /dev/null
> +++ b/include/linux/pmalloc.h
> @@ -0,0 +1,242 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * pmalloc.h: Header for Protectable Memory Allocator
> + *
> + * (C) Copyright 2017 Huawei Technologies Co. Ltd.
> + * Author: Igor Stoppa <igor.stoppa@huawei.com>
> + */
> +
> +#ifndef _LINUX_PMALLOC_H
> +#define _LINUX_PMALLOC_H
> +
> +
> +#include <linux/genalloc.h>
> +#include <linux/string.h>
> +
> +#define PMALLOC_DEFAULT_ALLOC_ORDER (-1)
> +
> +/*
> + * Library for dynamic allocation of pools of memory that can be,
> + * after initialization, marked as read-only.
> + *
> + * This is intended to complement __read_only_after_init, for those cases
> + * where either it is not possible to know the initialization value before
> + * init is completed, or the amount of data is variable and can be
> + * determined only at run-time.
> + *
> + * ***WARNING***
> + * The user of the API is expected to synchronize:
> + * 1) allocation,
> + * 2) writes to the allocated memory,
> + * 3) write protection of the pool,
> + * 4) freeing of the allocated memory, and
> + * 5) destruction of the pool.
> + *
> + * For a non-threaded scenario, this type of locking is not even required.
> + *
> + * Even if the library were to provide support for locking, point 2)
> + * would still depend on the user taking the lock.
> + */
> +
> +
> +/**
> + * pmalloc_create_pool() - create a new protectable memory pool
> + * @name: the name of the pool, enforced to be unique
> + * @min_alloc_order: log2 of the minimum allocation size obtainable
> + *                   from the pool
> + *
> + * Creates a new (empty) memory pool for allocation of protectable
> + * memory. Memory will be allocated upon request (through pmalloc).
> + *
> + * Return:
> + * * pointer to the new pool	- success
> + * * NULL			- error
> + */
> +struct gen_pool *pmalloc_create_pool(const char *name,
> +					 int min_alloc_order);

Same comments as earlier.  If this is new API with new code being 
introduced into the kernel, can the variables be declared to avoid weird 
problems?  Like min_alloc_order being a negative value makes little 
sense (based on the description here), so can it be declared as size_t 
or unsigned int?
> +
> +/**
> + * is_pmalloc_object() - validates the existence of an alleged object
> + * @ptr: address of the object
> + * @n: size of the object, in bytes
> + *
> + * Return:
> + * * 0		- the object does not belong to pmalloc
> + * * 1		- the object belongs to pmalloc
> + * * \-1	- the object overlaps pmalloc memory incorrectly
> + */
> +int is_pmalloc_object(const void *ptr, const unsigned long n);
> +
> +/**
> + * pmalloc_prealloc() - tries to allocate a memory chunk of the requested size
> + * @pool: handle to the pool to be used for memory allocation
> + * @size: amount of memory (in bytes) requested
> + *
> + * Prepares a chunk of the requested size.
> + * This is intended to both minimize latency in later memory requests and
> + * avoid sleeping during allocation.
> + * Memory allocated with prealloc is stored in one single chunk, as
> + * opposed to what is allocated on-demand when pmalloc runs out of free
> + * space already existing in the pool and has to invoke vmalloc.
> + * One additional advantage of pre-allocating larger chunks of memory is
> + * that the total slack tends to be smaller.
> + *
> + * Return:
> + * * true	- the vmalloc call was successful
> + * * false	- error
> + */
> +bool pmalloc_prealloc(struct gen_pool *pool, size_t size);
> +
> +/**
> + * pmalloc() - allocate protectable memory from a pool
> + * @pool: handle to the pool to be used for memory allocation
> + * @size: amount of memory (in bytes) requested
> + * @gfp: flags for page allocation
> + *
> + * Allocates memory from an unprotected pool. If the pool doesn't have
> + * enough memory, and the request did not include GFP_ATOMIC, an attempt
> + * is made to add a new chunk of memory to the pool
> + * (a multiple of PAGE_SIZE), in order to fit the new request.
> + * Otherwise, NULL is returned.
> + *
> + * Return:
> + * * pointer to the memory requested	- success
> + * * NULL				- either no memory available or
> + *					  pool already read-only
> + */

I don't know if an errno value is being set, but setting a variable 
somewhere using EROFS or ENOMEM would more accurate diagnose those two 
NULL conditions.
> +void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp);
> +
> +
> +/**
> + * pzalloc() - zero-initialized version of pmalloc
> + * @pool: handle to the pool to be used for memory allocation
> + * @size: amount of memory (in bytes) requested
> + * @gfp: flags for page allocation
> + *
> + * Executes pmalloc, initializing the memory requested to 0,
> + * before returning the pointer to it.
> + *
> + * Return:
> + * * pointer to the memory requested	- success
> + * * NULL				- either no memory available or
> + *					  pool already read-only
> + */
Same comment here, though that inline function below looks highly 
optimized...
> +static inline void *pzalloc(struct gen_pool *pool, size_t size, gfp_t gfp)
> +{
> +	return pmalloc(pool, size, gfp | __GFP_ZERO);
> +}
> +
> +/**
> + * pmalloc_array() - allocates an array according to the parameters
> + * @pool: handle to the pool to be used for memory allocation
> + * @n: number of elements in the array
> + * @size: amount of memory (in bytes) requested for each element
> + * @flags: flags for page allocation
> + *
> + * Executes pmalloc, if it has a chance to succeed.
> + *
> + * Return:
> + * * the pmalloc result	- success
> + * * NULL		- error
> + */
> +static inline void *pmalloc_array(struct gen_pool *pool, size_t n,
> +				  size_t size, gfp_t flags)
> +{
> +	if (unlikely(!(pool && n && size)))
Has this code been run through sparse?  I know one thing sparse looks at 
is if NULL is being treated like a 0, and sparse does check cases when 0 
is being used in place for NULL for pointer checks, and I'm wondering if 
that line of code would pass.

> +		return NULL;
> +	return pmalloc(pool, n * size, flags);
> +}
> +
> +/**
> + * pcalloc() - allocates a 0-initialized array according to the parameters
> + * @pool: handle to the pool to be used for memory allocation
> + * @n: number of elements in the array
> + * @size: amount of memory (in bytes) requested
> + * @flags: flags for page allocation
> + *
> + * Executes pmalloc_array, if it has a chance to succeed.
> + *
> + * Return:
> + * * the pmalloc result	- success
> + * * NULL		- error
> + */
> +static inline void *pcalloc(struct gen_pool *pool, size_t n,
> +			    size_t size, gfp_t flags)
> +{
> +	return pmalloc_array(pool, n, size, flags | __GFP_ZERO);
> +}
> +
> +/**
> + * pstrdup() - duplicate a string, using pmalloc as allocator
> + * @pool: handle to the pool to be used for memory allocation
> + * @s: string to duplicate
> + * @gfp: flags for page allocation
> + *
> + * Generates a copy of the given string, allocating sufficient memory
> + * from the given pmalloc pool.
> + *
> + * Return:
> + * * pointer to the replica	- success
> + * * NULL			- error
> + */
> +static inline char *pstrdup(struct gen_pool *pool, const char *s, gfp_t gfp)
> +{
> +	size_t len;
> +	char *buf;
> +
> +	if (unlikely(pool == NULL || s == NULL))
Here, the right check is being done, so at the very least, I would make 
the last line I commented on the same as this one for code continuity.
> +		return NULL;
> +
> +	len = strlen(s) + 1;
> +	buf = pmalloc(pool, len, gfp);
> +	if (likely(buf))
> +		strncpy(buf, s, len);
> +	return buf;
> +}
> +
> +/**
> + * pmalloc_protect_pool() - turn a read/write pool read-only
> + * @pool: the pool to protect
> + *
> + * Write-protects all the memory chunks assigned to the pool.
> + * This prevents any further allocation.
> + *
> + * Return:
> + * * 0		- success
> + * * -EINVAL	- error
> + */
> +int pmalloc_protect_pool(struct gen_pool *pool);
> +
> +/**
> + * pfree() - mark as unused memory that was previously in use
> + * @pool: handle to the pool to be used for memory allocation
> + * @addr: the beginning of the memory area to be freed
> + *
> + * The behavior of pfree is different, depending on the state of the
> + * protection.
> + * If the pool is not yet protected, the memory is marked as unused and
> + * will be available for further allocations.
> + * If the pool is already protected, the memory is marked as unused, but
> + * it will still be impossible to perform further allocation, because of
> + * the existing protection.
> + * The freed memory, in this case, will be truly released only when the
> + * pool is destroyed.
> + */
> +static inline void pfree(struct gen_pool *pool, const void *addr)
> +{
> +	gen_pool_free(pool, (unsigned long)addr, 0);
> +}
> +
> +/**
> + * pmalloc_destroy_pool() - destroys a pool and all the associated memory
> + * @pool: the pool to destroy
> + *
> + * All the memory that was allocated through pmalloc in the pool will be freed.
> + *
> + * Return:
> + * * 0		- success
> + * * -EINVAL	- error
> + */
> +int pmalloc_destroy_pool(struct gen_pool *pool);
> +
> +#endif
> diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
> index 1e5d8c392f15..116d280cca53 100644
> --- a/include/linux/vmalloc.h
> +++ b/include/linux/vmalloc.h
> @@ -20,6 +20,7 @@ struct notifier_block;		/* in notifier.h */
>   #define VM_UNINITIALIZED	0x00000020	/* vm_struct is not fully initialized */
>   #define VM_NO_GUARD		0x00000040      /* don't add guard page */
>   #define VM_KASAN		0x00000080      /* has allocated kasan shadow memory */
> +#define VM_PMALLOC		0x00000100	/* pmalloc area - see docs */
>   /* bits [20..32] reserved for arch specific ioremap internals */
>   
>   /*
> diff --git a/lib/genalloc.c b/lib/genalloc.c
> index 87f62f31b52f..24ed35035095 100644
> --- a/lib/genalloc.c
> +++ b/lib/genalloc.c
> @@ -625,6 +625,33 @@ void gen_pool_free(struct gen_pool *pool, unsigned long addr, size_t size)
>   }
>   EXPORT_SYMBOL(gen_pool_free);
>   
> +
> +/**
> + * gen_pool_flush_chunk() - drops all the allocations from a specific chunk
> + * @pool:	the generic memory pool
> + * @chunk:	The chunk to wipe clear.
> + *
> + * This is meant to be called only while destroying a pool. It's up to the
> + * caller to avoid races, but really, at this point the pool should have
> + * already been retired and it should have become unavailable for any other
> + * sort of operation.
> + */
> +void gen_pool_flush_chunk(struct gen_pool *pool,
> +			  struct gen_pool_chunk *chunk)
> +{
> +	size_t size;
> +
> +	if (unlikely(!(pool && chunk)))

Please make this check the same as the last line I commented on, 
especially since it's the same struct being checked.

> +		return;
> +
> +	size = chunk->end_addr + 1 - chunk->start_addr;
> +	memset(chunk->entries, 0,
> +	       DIV_ROUND_UP(size >> pool->min_alloc_order * BITS_PER_ENTRY,
> +			    BITS_PER_BYTE));
> +	atomic_long_set(&chunk->avail, size);
> +}
> +
> +
>   /**
>    * gen_pool_for_each_chunk() - call func for every chunk of generic memory pool
>    * @pool:	the generic memory pool
> diff --git a/mm/Kconfig b/mm/Kconfig
> index c782e8fb7235..be578fbdce6d 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -760,3 +760,9 @@ config GUP_BENCHMARK
>   	  performance of get_user_pages_fast().
>   
>   	  See tools/testing/selftests/vm/gup_benchmark.c
> +
> +config PROTECTABLE_MEMORY
> +    bool
> +    depends on ARCH_HAS_SET_MEMORY
> +    select GENERIC_ALLOCATOR
> +    default y
> diff --git a/mm/Makefile b/mm/Makefile
> index e669f02c5a54..959fdbdac118 100644
> --- a/mm/Makefile
> +++ b/mm/Makefile
> @@ -65,6 +65,7 @@ obj-$(CONFIG_SPARSEMEM)	+= sparse.o
>   obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o
>   obj-$(CONFIG_SLOB) += slob.o
>   obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o
> +obj-$(CONFIG_PROTECTABLE_MEMORY) += pmalloc.o
>   obj-$(CONFIG_KSM) += ksm.o
>   obj-$(CONFIG_PAGE_POISONING) += page_poison.o
>   obj-$(CONFIG_SLAB) += slab.o
> diff --git a/mm/pmalloc.c b/mm/pmalloc.c
> new file mode 100644
> index 000000000000..abddba90a9f6
> --- /dev/null
> +++ b/mm/pmalloc.c
> @@ -0,0 +1,499 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * pmalloc.c: Protectable Memory Allocator
> + *
> + * (C) Copyright 2017 Huawei Technologies Co. Ltd.
> + * Author: Igor Stoppa <igor.stoppa@huawei.com>
> + */
> +
> +#include <linux/printk.h>
> +#include <linux/init.h>
> +#include <linux/mm.h>
> +#include <linux/vmalloc.h>
> +#include <linux/genalloc.h>
> +#include <linux/kernel.h>
> +#include <linux/log2.h>
> +#include <linux/slab.h>
> +#include <linux/device.h>
> +#include <linux/atomic.h>
> +#include <linux/rculist.h>
> +#include <linux/set_memory.h>
> +#include <asm/cacheflush.h>
> +#include <asm/page.h>
> +
> +#include <linux/pmalloc.h>
> +/*
> + * pmalloc_data contains the data specific to a pmalloc pool,
> + * in a format compatible with the design of gen_alloc.
> + * Some of the fields are used for exposing the corresponding parameter
> + * to userspace, through sysfs.
> + */
> +struct pmalloc_data {
> +	struct gen_pool *pool;  /* Link back to the associated pool. */
> +	bool protected;     /* Status of the pool: RO or RW. */
> +	struct kobj_attribute attr_protected; /* Sysfs attribute. */
> +	struct kobj_attribute attr_avail;     /* Sysfs attribute. */
> +	struct kobj_attribute attr_size;      /* Sysfs attribute. */
> +	struct kobj_attribute attr_chunks;    /* Sysfs attribute. */
> +	struct kobject *pool_kobject;
> +	struct list_head node; /* list of pools */
> +};
> +
> +static LIST_HEAD(pmalloc_final_list);
> +static LIST_HEAD(pmalloc_tmp_list);
> +static struct list_head *pmalloc_list = &pmalloc_tmp_list;
> +static DEFINE_MUTEX(pmalloc_mutex);
> +static struct kobject *pmalloc_kobject;
> +
> +static ssize_t pmalloc_pool_show_protected(struct kobject *dev,
> +					   struct kobj_attribute *attr,
> +					   char *buf)
> +{
> +	struct pmalloc_data *data;
> +
> +	data = container_of(attr, struct pmalloc_data, attr_protected);
> +	if (data->protected)
> +		return sprintf(buf, "protected\n");
> +	else
> +		return sprintf(buf, "unprotected\n");
> +}
> +
> +static ssize_t pmalloc_pool_show_avail(struct kobject *dev,
> +				       struct kobj_attribute *attr,
> +				       char *buf)
> +{
> +	struct pmalloc_data *data;
> +
> +	data = container_of(attr, struct pmalloc_data, attr_avail);
> +	return sprintf(buf, "%lu\n",
> +		       (unsigned long)gen_pool_avail(data->pool));
> +}
> +
> +static ssize_t pmalloc_pool_show_size(struct kobject *dev,
> +				      struct kobj_attribute *attr,
> +				      char *buf)
> +{
> +	struct pmalloc_data *data;
> +
> +	data = container_of(attr, struct pmalloc_data, attr_size);
> +	return sprintf(buf, "%lu\n",
> +		       (unsigned long)gen_pool_size(data->pool));
> +}
> +
> +static void pool_chunk_number(struct gen_pool *pool,
> +			      struct gen_pool_chunk *chunk, void *data)
> +{
> +	unsigned long *counter = data;
> +
> +	(*counter)++;
> +}
> +
> +static ssize_t pmalloc_pool_show_chunks(struct kobject *dev,
> +					struct kobj_attribute *attr,
> +					char *buf)
> +{
> +	struct pmalloc_data *data;
> +	unsigned long chunks_num = 0;
> +
> +	data = container_of(attr, struct pmalloc_data, attr_chunks);
> +	gen_pool_for_each_chunk(data->pool, pool_chunk_number, &chunks_num);
> +	return sprintf(buf, "%lu\n", chunks_num);
> +}
> +
> +/* Exposes the pool and its attributes through sysfs. */
> +static struct kobject *pmalloc_connect(struct pmalloc_data *data)
> +{
> +	const struct attribute *attrs[] = {
> +		&data->attr_protected.attr,
> +		&data->attr_avail.attr,
> +		&data->attr_size.attr,
> +		&data->attr_chunks.attr,
> +		NULL
> +	};
> +	struct kobject *kobj;
> +
> +	kobj = kobject_create_and_add(data->pool->name, pmalloc_kobject);
> +	if (unlikely(!kobj))
> +		return NULL;
> +
> +	if (unlikely(sysfs_create_files(kobj, attrs) < 0)) {
> +		kobject_put(kobj);
> +		kobj = NULL;
> +	}
> +	return kobj;
> +}
> +
> +/* Removes the pool and its attributes from sysfs. */
> +static void pmalloc_disconnect(struct pmalloc_data *data,
> +			       struct kobject *kobj)
> +{
> +	const struct attribute *attrs[] = {
> +		&data->attr_protected.attr,
> +		&data->attr_avail.attr,
> +		&data->attr_size.attr,
> +		&data->attr_chunks.attr,
> +		NULL
> +	};
> +
> +	sysfs_remove_files(kobj, attrs);
> +	kobject_put(kobj);
> +}
> +
> +/* Declares an attribute of the pool. */
> +#define pmalloc_attr_init(data, attr_name) \
> +do { \
> +	sysfs_attr_init(&data->attr_##attr_name.attr); \
> +	data->attr_##attr_name.attr.name = #attr_name; \
> +	data->attr_##attr_name.attr.mode = VERIFY_OCTAL_PERMISSIONS(0400); \
> +	data->attr_##attr_name.show = pmalloc_pool_show_##attr_name; \
> +} while (0)
> +
> +struct gen_pool *pmalloc_create_pool(const char *name, int min_alloc_order)
> +{
> +	struct gen_pool *pool;
> +	const char *pool_name;
> +	struct pmalloc_data *data;
> +
> +	if (!name) {
> +		WARN_ON(1);
??  Maybe the name check should be in WARN_ON()?
> +		return NULL;
> +	}
> +
> +	if (min_alloc_order < 0)
> +		min_alloc_order = ilog2(sizeof(unsigned long));
> +
> +	pool = gen_pool_create(min_alloc_order, NUMA_NO_NODE);
> +	if (unlikely(!pool))
> +		return NULL;
> +
> +	mutex_lock(&pmalloc_mutex);
> +	list_for_each_entry(data, pmalloc_list, node)
> +		if (!strcmp(name, data->pool->name))
> +			goto same_name_err;
> +
> +	pool_name = kstrdup(name, GFP_KERNEL);
> +	if (unlikely(!pool_name))
> +		goto name_alloc_err;
> +
> +	data = kzalloc(sizeof(struct pmalloc_data), GFP_KERNEL);
> +	if (unlikely(!data))
> +		goto data_alloc_err;
> +
> +	data->protected = false;
> +	data->pool = pool;
> +	pmalloc_attr_init(data, protected);
> +	pmalloc_attr_init(data, avail);
> +	pmalloc_attr_init(data, size);
> +	pmalloc_attr_init(data, chunks);
> +	pool->data = data;
> +	pool->name = pool_name;
> +
> +	list_add(&data->node, pmalloc_list);
> +	if (pmalloc_list == &pmalloc_final_list)
> +		data->pool_kobject = pmalloc_connect(data);
> +	mutex_unlock(&pmalloc_mutex);
> +	return pool;
> +
> +data_alloc_err:
> +	kfree(pool_name);
> +name_alloc_err:
> +same_name_err:
> +	mutex_unlock(&pmalloc_mutex);
> +	gen_pool_destroy(pool);
> +	return NULL;
> +}
> +
> +static inline int check_alloc_params(struct gen_pool *pool, size_t req_size)
> +{
> +	struct pmalloc_data *data;
> +
> +	if (unlikely(!req_size || !pool))
same unlikely() check problem mentioned before.
> +		return -1;
Can we use an errno value instead for better diagnosibility?
> +
> +	data = pool->data;
> +
> +	if (data == NULL)
> +		return -1;
Same here (ENOMEM or ENXIO comes to mind).
> +
> +	if (unlikely(data->protected)) {
> +		WARN_ON(1);
Maybe re-write this with the check inside WARN_ON()?
> +		return -1;

Same here, how about a different errno value for this case?
> +	}
> +	return 0;
> +}
> +
> +
> +static inline bool chunk_tagging(void *chunk, bool tag)
> +{
> +	struct vm_struct *area;
> +	struct page *page;
> +
> +	if (!is_vmalloc_addr(chunk))
> +		return false;
> +
> +	page = vmalloc_to_page(chunk);
> +	if (unlikely(!page))
> +		return false;
> +
> +	area = page->area;
> +	if (tag)
> +		area->flags |= VM_PMALLOC;
> +	else
> +		area->flags &= ~VM_PMALLOC;
> +	return true;
> +}
> +
> +
> +static inline bool tag_chunk(void *chunk)
> +{
> +	return chunk_tagging(chunk, true);
> +}
> +
> +
> +static inline bool untag_chunk(void *chunk)
> +{
> +	return chunk_tagging(chunk, false);
> +}
> +
> +enum {
> +	INVALID_PMALLOC_OBJECT = -1,
> +	NOT_PMALLOC_OBJECT = 0,
> +	VALID_PMALLOC_OBJECT = 1,
> +};
> +
> +int is_pmalloc_object(const void *ptr, const unsigned long n)
> +{
> +	struct vm_struct *area;
> +	struct page *page;
> +	unsigned long area_start;
> +	unsigned long area_end;
> +	unsigned long object_start;
> +	unsigned long object_end;
> +
> +
> +	/*
> +	 * is_pmalloc_object gets called pretty late, so chances are high
> +	 * that the object is indeed of vmalloc type
> +	 */
> +	if (unlikely(!is_vmalloc_addr(ptr)))
> +		return NOT_PMALLOC_OBJECT;
> +
> +	page = vmalloc_to_page(ptr);
> +	if (unlikely(!page))
> +		return NOT_PMALLOC_OBJECT;
> +
> +	area = page->area;
> +
> +	if (likely(!(area->flags & VM_PMALLOC)))
> +		return NOT_PMALLOC_OBJECT;
> +
> +	area_start = (unsigned long)area->addr;
> +	area_end = area_start + area->nr_pages * PAGE_SIZE - 1;
> +	object_start = (unsigned long)ptr;
> +	object_end = object_start + n - 1;
> +
> +	if (likely((area_start <= object_start) &&
> +		   (object_end <= area_end)))
> +		return VALID_PMALLOC_OBJECT;
> +	else
> +		return INVALID_PMALLOC_OBJECT;
> +}
> +
> +
> +bool pmalloc_prealloc(struct gen_pool *pool, size_t size)
> +{
> +	void *chunk;
> +	size_t chunk_size;
> +	bool add_error;
> +
> +	if (check_alloc_params(pool, size))
> +		return false;
> +
> +	/* Expand pool */
> +	chunk_size = roundup(size, PAGE_SIZE);
> +	chunk = vmalloc(chunk_size);
> +	if (unlikely(chunk == NULL))
> +		return false;
> +
> +	/* Locking is already done inside gen_pool_add */
> +	add_error = gen_pool_add(pool, (unsigned long)chunk, chunk_size,
> +				 NUMA_NO_NODE);
> +	if (unlikely(add_error != 0))
> +		goto abort;
> +
> +	return true;
> +abort:
> +	vfree_atomic(chunk);
> +	return false;
> +
> +}
> +
> +void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp)
> +{
> +	void *chunk;
> +	size_t chunk_size;
> +	bool add_error;
> +	unsigned long retval;
> +
> +	if (check_alloc_params(pool, size))
> +		return NULL;
> +
> +retry_alloc_from_pool:
> +	retval = gen_pool_alloc(pool, size);
> +	if (retval)
> +		goto return_allocation;
> +
> +	if (unlikely((gfp & __GFP_ATOMIC))) {
> +		if (unlikely((gfp & __GFP_NOFAIL)))
> +			goto retry_alloc_from_pool;
> +		else
> +			return NULL;
> +	}
> +
> +	/* Expand pool */
> +	chunk_size = roundup(size, PAGE_SIZE);
> +	chunk = vmalloc(chunk_size);
> +	if (unlikely(!chunk)) {
> +		if (unlikely((gfp & __GFP_NOFAIL)))
> +			goto retry_alloc_from_pool;
> +		else
> +			return NULL;
> +	}
> +	if (unlikely(!tag_chunk(chunk)))
> +		goto free;
> +
> +	/* Locking is already done inside gen_pool_add */
> +	add_error = gen_pool_add(pool, (unsigned long)chunk, chunk_size,
> +				 NUMA_NO_NODE);
> +	if (unlikely(add_error))
> +		goto abort;
> +
> +	retval = gen_pool_alloc(pool, size);
> +	if (retval) {
> +return_allocation:
> +		*(size_t *)retval = size;
> +		if (gfp & __GFP_ZERO)
> +			memset((void *)retval, 0, size);
> +		return (void *)retval;
> +	}
> +	/*
> +	 * Here there is no test for __GFP_NO_FAIL because, in case of
> +	 * concurrent allocation, one thread might add a chunk to the
> +	 * pool and this memory could be allocated by another thread,
> +	 * before the first thread gets a chance to use it.
> +	 * As long as vmalloc succeeds, it's ok to retry.
> +	 */
> +	goto retry_alloc_from_pool;
> +abort:
> +	untag_chunk(chunk);
> +free:
> +	vfree_atomic(chunk);
> +	return NULL;
> +}
> +
> +static void pmalloc_chunk_set_protection(struct gen_pool *pool,
> +
> +					 struct gen_pool_chunk *chunk,
> +					 void *data)
> +{
> +	const bool *flag = data;
> +	size_t chunk_size = chunk->end_addr + 1 - chunk->start_addr;
> +	unsigned long pages = chunk_size / PAGE_SIZE;
> +
> +	BUG_ON(chunk_size & (PAGE_SIZE - 1));
Re-think WARN_ON() for BUG_ON()?  And also check chunk as well, as it's 
being used below?
> +
> +	if (*flag)
> +		set_memory_ro(chunk->start_addr, pages);
> +	else
> +		set_memory_rw(chunk->start_addr, pages);
> +}
> +
> +static int pmalloc_pool_set_protection(struct gen_pool *pool, bool protection)
> +{
> +	struct pmalloc_data *data;
> +	struct gen_pool_chunk *chunk;
> +
> +	if (unlikely(!pool))
> +		return -EINVAL;
This is example of what I'd perfer seeing in check_alloc_params().
> +
> +	data = pool->data;
> +
> +	if (unlikely(!data))
> +		return -EINVAL;
ENXIO or EIO or ENOMEM sound better?
> +
> +	if (unlikely(data->protected == protection)) {
> +		WARN_ON(1);
Better to put the check inside WARN_ON, me thinks...
> +		return 0;
> +	}
> +
> +	data->protected = protection;
> +	list_for_each_entry(chunk, &(pool)->chunks, next_chunk)
> +		pmalloc_chunk_set_protection(pool, chunk, &protection);
> +	return 0;
> +}
> +
> +int pmalloc_protect_pool(struct gen_pool *pool)
> +{
> +	return pmalloc_pool_set_protection(pool, true);
Is pool == NULL being checked somewhere, similar to previous functions 
in this patch?
> +}
> +
> +
> +static void pmalloc_chunk_free(struct gen_pool *pool,
> +			       struct gen_pool_chunk *chunk, void *data)
> +{
Wat is 'data' being used for? Looks unused.  Should parameters be 
checked, like other ones?
> +	untag_chunk(chunk);
> +	gen_pool_flush_chunk(pool, chunk);
> +	vfree_atomic((void *)chunk->start_addr);
> +}
> +
> +
> +int pmalloc_destroy_pool(struct gen_pool *pool)
> +{
> +	struct pmalloc_data *data;
> +
> +	if (unlikely(pool == NULL))
> +		return -EINVAL;
> +
> +	data = pool->data;
> +
> +	if (unlikely(data == NULL))
> +		return -EINVAL;

I'd use a different errno value since you already used it for pool.
> +
> +	mutex_lock(&pmalloc_mutex);
> +	list_del(&data->node);
> +	mutex_unlock(&pmalloc_mutex);
> +
> +	if (likely(data->pool_kobject))
> +		pmalloc_disconnect(data, data->pool_kobject);
> +
> +	pmalloc_pool_set_protection(pool, false);
> +	gen_pool_for_each_chunk(pool, pmalloc_chunk_free, NULL);
> +	gen_pool_destroy(pool);
> +	kfree(data);
Does data need to be set to NULL in this case, as data is a member of 
pool (pool->data)?  I'm worried about dangling pointer scenarios which 
probably isn't good for security modules?
> +	return 0;
> +}
> +
> +/*
> + * When the sysfs is ready to receive registrations, connect all the
> + * pools previously created. Also enable further pools to be connected
> + * right away.
> + */
> +static int __init pmalloc_late_init(void)
> +{
> +	struct pmalloc_data *data, *n;
> +
> +	pmalloc_kobject = kobject_create_and_add("pmalloc", kernel_kobj);
> +
> +	mutex_lock(&pmalloc_mutex);
> +	pmalloc_list = &pmalloc_final_list;
> +
> +	if (likely(pmalloc_kobject != NULL)) {
> +		list_for_each_entry_safe(data, n, &pmalloc_tmp_list, node) {
> +			list_move(&data->node, &pmalloc_final_list);
> +			pmalloc_connect(data);
> +		}
> +	}
It would be nice to have the init() return an error value in case of 
failure.

Thanks,
Jay

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 7/7] Documentation for Pmalloc
  2018-02-23 14:48 ` [PATCH 7/7] Documentation for Pmalloc Igor Stoppa
@ 2018-02-24  0:26   ` J Freyensee
  2018-02-26 15:39     ` Igor Stoppa
  0 siblings, 1 reply; 32+ messages in thread
From: J Freyensee @ 2018-02-24  0:26 UTC (permalink / raw)
  To: Igor Stoppa, david, willy, keescook, mhocko
  Cc: labbott, linux-security-module, linux-mm, linux-kernel, kernel-hardening



On 2/23/18 6:48 AM, Igor Stoppa wrote:
> Detailed documentation about the protectable memory allocator.
>
> Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com>
> ---
>   Documentation/core-api/index.rst   |   1 +
>   Documentation/core-api/pmalloc.rst | 114 +++++++++++++++++++++++++++++++++++++
>   2 files changed, 115 insertions(+)
>   create mode 100644 Documentation/core-api/pmalloc.rst
>
> diff --git a/Documentation/core-api/index.rst b/Documentation/core-api/index.rst
> index c670a8031786..8f5de42d6571 100644
> --- a/Documentation/core-api/index.rst
> +++ b/Documentation/core-api/index.rst
> @@ -25,6 +25,7 @@ Core utilities
>      genalloc
>      errseq
>      printk-formats
> +   pmalloc
>   
>   Interfaces for kernel debugging
>   ===============================
> diff --git a/Documentation/core-api/pmalloc.rst b/Documentation/core-api/pmalloc.rst
> new file mode 100644
> index 000000000000..d9725870444e
> --- /dev/null
> +++ b/Documentation/core-api/pmalloc.rst
> @@ -0,0 +1,114 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +Protectable memory allocator
> +============================
> +
> +Purpose
> +-------
> +
> +The pmalloc library is meant to provide R/O status to data that, for some
> +reason, could neither be declared as constant, nor could it take advantage
> +of the qualifier __ro_after_init, but is write-once and read-only in spirit.
> +It protects data from both accidental and malicious overwrites.
> +
> +Example: A policy that is loaded from userspace.
> +
> +
> +Concept
> +-------
> +
> +pmalloc builds on top of genalloc, using the same concept of memory pools.
> +
> +The value added by pmalloc is that now the memory contained in a pool can
> +become R/O, for the rest of the life of the pool.
> +
> +Different kernel drivers and threads can use different pools, for finer
> +control of what becomes R/O and when. And for improved lockless concurrency.
> +
> +
> +Caveats
> +-------
> +
> +- Memory freed while a pool is not yet protected will be reused.
> +
> +- Once a pool is protected, it's not possible to allocate any more memory
> +  from it.
> +
> +- Memory "freed" from a protected pool indicates that such memory is not
> +  in use anymore by the requester; however, it will not become available
> +  for further use, until the pool is destroyed.
> +
> +- Before destroying a pool, all the memory allocated from it must be
> +  released.

Is that true?  pmalloc_destroy_pool() has:

.
.
+    pmalloc_pool_set_protection(pool, false);
+    gen_pool_for_each_chunk(pool, pmalloc_chunk_free, NULL);
+    gen_pool_destroy(pool);
+    kfree(data);

which to me looks like is the opposite, the data (ie, "memory") is being 
released first, then the pool is destroyed.



> +
> +- pmalloc does not provide locking support with respect to allocating vs
> +  protecting an individual pool, for performance reasons.

What is the recommendation to using locks then, as the computing 
real-world mainly operates in multi-threaded/process world?  Maybe show 
an example of an issue that occur if locks aren't used and give a coding 
example.

> +  It is recommended not to share the same pool between unrelated functions.
> +  Should sharing be a necessity, the user of the shared pool is expected
> +  to implement locking for that pool.
> +
> +- pmalloc uses genalloc to optimize the use of the space it allocates
> +  through vmalloc. Some more TLB entries will be used, however less than
> +  in the case of using vmalloc directly. The exact number depends on the
> +  size of each allocation request and possible slack.
> +
> +- Considering that not much data is supposed to be dynamically allocated
> +  and then marked as read-only, it shouldn't be an issue that the address
> +  range for pmalloc is limited, on 32-bit systems.

Why is 32-bit systems mentioned and not 64-bit?  Is there a problem with 
64-bit here?

Thanks,
Jay

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 4/7] Protectable Memory
  2018-02-23 14:48 ` [PATCH 4/7] Protectable Memory Igor Stoppa
  2018-02-24  0:10   ` J Freyensee
@ 2018-02-25  2:33   ` kbuild test robot
  1 sibling, 0 replies; 32+ messages in thread
From: kbuild test robot @ 2018-02-25  2:33 UTC (permalink / raw)
  To: Igor Stoppa
  Cc: kbuild-all, david, willy, keescook, mhocko, labbott,
	linux-security-module, linux-mm, linux-kernel, kernel-hardening,
	Igor Stoppa

[-- Attachment #1: Type: text/plain, Size: 1264 bytes --]

Hi Igor,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.16-rc2]
[cannot apply to next-20180223]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Igor-Stoppa/genalloc-track-beginning-of-allocations/20180225-081601
config: arm-allnoconfig (attached as .config)
compiler: arm-linux-gnueabi-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=arm 

All errors (new ones prefixed by >>):

   mm/pmalloc.o: In function `pmalloc_prealloc':
>> pmalloc.c:(.text+0x280): undefined reference to `vfree_atomic'
   mm/pmalloc.o: In function `pmalloc':
   pmalloc.c:(.text+0x2c4): undefined reference to `vfree_atomic'
   mm/pmalloc.o: In function `pmalloc_chunk_free':
   pmalloc.c:(.text+0x9e): undefined reference to `vfree_atomic'

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 6071 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/7] genalloc: track beginning of allocations
  2018-02-23 14:48 ` [PATCH 1/7] genalloc: track beginning of allocations Igor Stoppa
  2018-02-23 22:28   ` J Freyensee
@ 2018-02-25  3:37   ` kbuild test robot
  1 sibling, 0 replies; 32+ messages in thread
From: kbuild test robot @ 2018-02-25  3:37 UTC (permalink / raw)
  To: Igor Stoppa
  Cc: kbuild-all, david, willy, keescook, mhocko, labbott,
	linux-security-module, linux-mm, linux-kernel, kernel-hardening,
	Igor Stoppa

[-- Attachment #1: Type: text/plain, Size: 1282 bytes --]

Hi Igor,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.16-rc2 next-20180223]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Igor-Stoppa/genalloc-track-beginning-of-allocations/20180225-081601
config: ia64-allmodconfig (attached as .config)
compiler: ia64-linux-gcc (GCC) 7.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=ia64 

Note: the linux-review/Igor-Stoppa/genalloc-track-beginning-of-allocations/20180225-081601 HEAD 9c42e8278ff35263f8381461592f6737e4b7b129 builds fine.
      It only hurts bisectibility.

All errors (new ones prefixed by >>):

>> ERROR: "gen_pool_best_fit" [drivers/tee/tee.ko] undefined!
   ERROR: "ia64_delay_loop" [drivers/spi/spi-thunderx.ko] undefined!
   ERROR: "ia64_delay_loop" [drivers/net/phy/mdio-cavium.ko] undefined!

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 50234 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/7] struct page: add field for vm_struct
  2018-02-23 14:48 ` [PATCH 3/7] struct page: add field for vm_struct Igor Stoppa
@ 2018-02-25  3:38   ` Matthew Wilcox
  2018-02-26 16:37     ` Igor Stoppa
  0 siblings, 1 reply; 32+ messages in thread
From: Matthew Wilcox @ 2018-02-25  3:38 UTC (permalink / raw)
  To: Igor Stoppa
  Cc: david, keescook, mhocko, labbott, linux-security-module,
	linux-mm, linux-kernel, kernel-hardening

On Fri, Feb 23, 2018 at 04:48:03PM +0200, Igor Stoppa wrote:
> @@ -1769,6 +1771,9 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
>  
>  	kmemleak_vmalloc(area, size, gfp_mask);
>  
> +	for (i = 0; i < area->nr_pages; i++)
> +		area->pages[i]->area = area;
> +
>  	return addr;
>  
>  fail:

IMO, this is the wrong place to initialise the page->area.  It should be
done in __vmalloc_area_node() like so:

                        area->nr_pages = i;
                        goto fail;
                }
+		page->area = area;
                area->pages[i] = page;
                if (gfpflags_allow_blocking(gfp_mask))
                        cond_resched();

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 6/7] lkdtm: crash on overwriting protected pmalloc var
  2018-02-23 14:48 ` [PATCH 6/7] lkdtm: crash on overwriting protected pmalloc var Igor Stoppa
@ 2018-02-25  3:46   ` kbuild test robot
  2018-03-06 17:05   ` J Freyensee
  1 sibling, 0 replies; 32+ messages in thread
From: kbuild test robot @ 2018-02-25  3:46 UTC (permalink / raw)
  To: Igor Stoppa
  Cc: kbuild-all, david, willy, keescook, mhocko, labbott,
	linux-security-module, linux-mm, linux-kernel, kernel-hardening,
	Igor Stoppa

[-- Attachment #1: Type: text/plain, Size: 1021 bytes --]

Hi Igor,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.16-rc2]
[cannot apply to next-20180223]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Igor-Stoppa/genalloc-track-beginning-of-allocations/20180225-081601
config: x86_64-rhel (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All errors (new ones prefixed by >>):

>> ERROR: "pmalloc" [drivers/misc/lkdtm.ko] undefined!
>> ERROR: "pmalloc_create_pool" [drivers/misc/lkdtm.ko] undefined!
>> ERROR: "pmalloc_destroy_pool" [drivers/misc/lkdtm.ko] undefined!
>> ERROR: "pmalloc_protect_pool" [drivers/misc/lkdtm.ko] undefined!

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 40858 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/7] genalloc: track beginning of allocations
  2018-02-23 22:28   ` J Freyensee
@ 2018-02-26 12:09     ` Igor Stoppa
  2018-02-26 17:32       ` J Freyensee
  0 siblings, 1 reply; 32+ messages in thread
From: Igor Stoppa @ 2018-02-26 12:09 UTC (permalink / raw)
  To: J Freyensee, david, willy, keescook, mhocko
  Cc: labbott, linux-security-module, linux-mm, linux-kernel, kernel-hardening

Hello,
and thanks for the reviews, my replies inlined below.

On 24/02/18 00:28, J Freyensee wrote:
> some code snipping
> .
> .
> .
>> +/**
>> + * get_bitmap_entry() - extracts the specified entry from the bitmap
>> + * @map: pointer to a bitmap
>> + * @entry_index: the index of the desired entry in the bitmap
>> + *
>> + * Return: The requested bitmap.
>> + */
>> +static inline unsigned long get_bitmap_entry(unsigned long *map,
>> +					    int entry_index)
>> +{
> 
> Apologies if this has been mentioned before, but since this function is 
> expecting map to not be NULL, shouldn't something like:
> 
> WARN_ON(map == NULL);
> 
> be used to check the parameters for bad values?

TBH I do not know.
Actually I'd rather ask back: when is it preferred to do (and not do)
parameter sanitation?

I was thinking that doing it at API level is the right balance.
This function, for example, is not part of the API, it's used only
internally in this file.

Is it assuming too much that the function will be used correctly, inside
the module it belongs to?

And even at API level, I'd tend to say that if there are chances that
the data received is corrupted, then it should be sanitized, but otherwise,
why adding overhead?

Unless you expect some form of memory corruption. Is that what you have
in mind?

[...]

>>   static inline size_t chunk_size(const struct gen_pool_chunk *chunk)
>>   {
> Same problem here, always expecting chunk to not but NULL.

What would be the case that makes it not NULL?
There are already tests in place when the memory is allocated.

If I really have to pick a place where to do the test, it's at API
level, where the user of the API might fail to notice that the creation
of a pool failed and try to get memory from a non-existing pool.
That is the only scenario I can think of, where bogus data would be
received.

> 
>>   	return chunk->end_addr - chunk->start_addr + 1;
>>   }
>>   
>> -static int set_bits_ll(unsigned long *addr, unsigned long mask_to_set)
>> +
>> +/**
>> + * set_bits_ll() - based on value and mask, sets bits at address
>> + * @addr: where to write
>> + * @mask: filter to apply for the bits to alter
>> + * @value: actual configuration of bits to store
>> + *
>> + * Return:
>> + * * 0		- success
>> + * * -EBUSY	- otherwise
>> + */
>> +static int set_bits_ll(unsigned long *addr,
>> +		       unsigned long mask, unsigned long value)
>>   {
>> -	unsigned long val, nval;
>> +	unsigned long nval;
>> +	unsigned long present;
>> +	unsigned long target;
>>   
>>   	nval = *addr;
> 
> Same issue here with addr.

Again, I am more leaning toward believing that the user of the API might
forget to check for errors, and pass a NULL pointer as pool, than to
believe something like this would happen.

This is an address obtained from data managed automatically by the library.

Can you please explain why you think it would be NULL?
I'll skip further similar comment.

[...]

>> +	/*
>> +	 * Prepare for writing the initial part of the allocation, from
>> +	 * starting entry, to the end of the UL bitmap element which
>> +	 * contains it. It might be larger than the actual allocation.
>> +	 */
>> +	start_bit = ENTRIES_TO_BITS(start_entry);
>> +	end_bit = ENTRIES_TO_BITS(start_entry + nentries);
>> +	nbits = ENTRIES_TO_BITS(nentries);
> 
> these statements won't make any sense if start_entry and nentries are 
> negative values, which is possible based on the function definition 
> alter_bitmap_ll().  Am I missing something that it's ok for these 
> parameters to be negative?

This patch is extending the handling of the bitmap, it's not trying to
rewrite genalloc, thus it tries to not alter parts which are unrelated.
Like the type of parameters passed.

What you are suggesting is a further cleanup of genalloc.
I'm not against it, but it's unrelated to this patchset.

Incidentally, nobody really seems to be maintaining genalloc, so I'm
hesitant in adding more changes, when there isn't a dedicated maintainer
to say Yes/No.

>> +	bits_to_write = BITS_PER_LONG - start_bit % BITS_PER_LONG;
>> +	mask = BITMAP_FIRST_WORD_MASK(start_bit);
>> +	/* Mark the beginning of the allocation. */
>> +	value = MASK | (1UL << (start_bit % BITS_PER_LONG));
>> +	index = BITS_DIV_LONGS(start_bit);
>> +
>> +	/*
>> +	 * Writes entries to the bitmap, as long as the reminder is
>> +	 * positive or zero.
>> +	 * Might be skipped if the entries to write do not reach the end
>> +	 * of a bitmap UL unit.
>> +	 */
>> +	while (nbits >= bits_to_write) {
>> +		if (action(map + index, mask, value & mask))
>> +			return BITS_DIV_ENTRIES(nbits);
>> +		nbits -= bits_to_write;
>> +		bits_to_write = BITS_PER_LONG;
>> +		mask = ~0UL;
>> +		value = MASK;
>> +		index++;
>>   	}
>>   
>> +	/* Takes care of the ending part of the entries to mark. */
>> +	if (nbits > 0) {
>> +		mask ^= BITMAP_FIRST_WORD_MASK((end_bit) % BITS_PER_LONG);
>> +		bits_to_write = nbits;
>> +		if (action(map + index, mask, value & mask))
>> +			return BITS_DIV_ENTRIES(nbits);
>> +	}
>>   	return 0;
>>   }
>>   
>> +
>>   /**
>> - * gen_pool_create - create a new special memory pool
>> - * @min_alloc_order: log base 2 of number of bytes each bitmap bit represents
>> - * @nid: node id of the node the pool structure should be allocated on, or -1
>> + * gen_pool_create() - create a new special memory pool
>> + * @min_alloc_order: log base 2 of number of bytes each bitmap entry
>> + *		     represents
>> + * @nid: node id of the node the pool structure should be allocated on,
>> + *	 or -1
>>    *
>> - * Create a new special memory pool that can be used to manage special purpose
>> - * memory not managed by the regular kmalloc/kfree interface.
>> + * Create a new special memory pool that can be used to manage special
>> + * purpose memory not managed by the regular kmalloc/kfree interface.
>> + *
>> + * Return:
>> + * * pointer to the pool	- success
>> + * * NULL			- otherwise
>>    */
>>   struct gen_pool *gen_pool_create(int min_alloc_order, int nid)
>>   {
>> @@ -167,7 +364,7 @@ struct gen_pool *gen_pool_create(int min_alloc_order, int nid)
>>   EXPORT_SYMBOL(gen_pool_create);
>>   
>>   /**
>> - * gen_pool_add_virt - add a new chunk of special memory to the pool
>> + * gen_pool_add_virt() - add a new chunk of special memory to the pool
>>    * @pool: pool to add new memory chunk to
>>    * @virt: virtual starting address of memory chunk to add to pool
>>    * @phys: physical starting address of memory chunk to add to pool
>> @@ -177,16 +374,20 @@ EXPORT_SYMBOL(gen_pool_create);
>>    *
>>    * Add a new chunk of special memory to the specified pool.
>>    *
>> - * Returns 0 on success or a -ve errno on failure.
>> + * Return:
>> + * * 0		- success
>> + * * -ve errno	- failure
>>    */
>> -int gen_pool_add_virt(struct gen_pool *pool, unsigned long virt, phys_addr_t phys,
>> -		 size_t size, int nid)
>> +int gen_pool_add_virt(struct gen_pool *pool, unsigned long virt,
>> +		      phys_addr_t phys, size_t size, int nid)
>>   {
> WARN_ON(pool == NULL);
> ?
>>   	struct gen_pool_chunk *chunk;
>> -	int nbits = size >> pool->min_alloc_order;
>> -	int nbytes = sizeof(struct gen_pool_chunk) +
>> -				BITS_TO_LONGS(nbits) * sizeof(long);
>> +	int nentries;
>> +	int nbytes;
>>   
>> +	nentries = size >> pool->min_alloc_order;
>> +	nbytes = sizeof(struct gen_pool_chunk) +
>> +		 ENTRIES_DIV_LONGS(nentries) * sizeof(long);
>>   	chunk = kzalloc_node(nbytes, GFP_KERNEL, nid);
>>   	if (unlikely(chunk == NULL))
>>   		return -ENOMEM;
>> @@ -205,11 +406,13 @@ int gen_pool_add_virt(struct gen_pool *pool, unsigned long virt, phys_addr_t phy
>>   EXPORT_SYMBOL(gen_pool_add_virt);
>>   
>>   /**
>> - * gen_pool_virt_to_phys - return the physical address of memory
>> + * gen_pool_virt_to_phys() - return the physical address of memory
>>    * @pool: pool to allocate from
>>    * @addr: starting address of memory
>>    *
>> - * Returns the physical address on success, or -1 on error.
>> + * Return:
>> + * * the physical address	- success
>> + * * \-1			- error
>>    */
>>   phys_addr_t gen_pool_virt_to_phys(struct gen_pool *pool, unsigned long addr)
>>   {
>> @@ -230,7 +433,7 @@ phys_addr_t gen_pool_virt_to_phys(struct gen_pool *pool, unsigned long addr)
>>   EXPORT_SYMBOL(gen_pool_virt_to_phys);
>>   
>>   /**
>> - * gen_pool_destroy - destroy a special memory pool
>> + * gen_pool_destroy() - destroy a special memory pool
>>    * @pool: pool to destroy
>>    *
>>    * Destroy the specified special memory pool. Verifies that there are no
>> @@ -248,7 +451,7 @@ void gen_pool_destroy(struct gen_pool *pool)
>>   		list_del(&chunk->next_chunk);
>>   
>>   		end_bit = chunk_size(chunk) >> order;
>> -		bit = find_next_bit(chunk->bits, end_bit, 0);
>> +		bit = find_next_bit(chunk->entries, end_bit, 0);
>>   		BUG_ON(bit < end_bit);
>>   
>>   		kfree(chunk);
>> @@ -259,7 +462,7 @@ void gen_pool_destroy(struct gen_pool *pool)
>>   EXPORT_SYMBOL(gen_pool_destroy);
>>   
>>   /**
>> - * gen_pool_alloc - allocate special memory from the pool
>> + * gen_pool_alloc() - allocate special memory from the pool
>>    * @pool: pool to allocate from
>>    * @size: number of bytes to allocate from the pool
>>    *
>> @@ -267,6 +470,10 @@ EXPORT_SYMBOL(gen_pool_destroy);
>>    * Uses the pool allocation function (with first-fit algorithm by default).
>>    * Can not be used in NMI handler on architectures without
>>    * NMI-safe cmpxchg implementation.
>> + *
>> + * Return:
>> + * * address of the memory allocated	- success
>> + * * NULL				- error
>>    */
>>   unsigned long gen_pool_alloc(struct gen_pool *pool, size_t size)
>>   {
>> @@ -275,7 +482,7 @@ unsigned long gen_pool_alloc(struct gen_pool *pool, size_t size)
>>   EXPORT_SYMBOL(gen_pool_alloc);
>>   
>>   /**
>> - * gen_pool_alloc_algo - allocate special memory from the pool
>> + * gen_pool_alloc_algo() - allocate special memory from the pool
>>    * @pool: pool to allocate from
>>    * @size: number of bytes to allocate from the pool
>>    * @algo: algorithm passed from caller
>> @@ -285,6 +492,10 @@ EXPORT_SYMBOL(gen_pool_alloc);
>>    * Uses the pool allocation function (with first-fit algorithm by default).
>>    * Can not be used in NMI handler on architectures without
>>    * NMI-safe cmpxchg implementation.
>> + *
>> + * Return:
>> + * * address of the memory allocated	- success
>> + * * NULL				- error
>>    */
>>   unsigned long gen_pool_alloc_algo(struct gen_pool *pool, size_t size,
>>   		genpool_algo_t algo, void *data)
>> @@ -292,7 +503,7 @@ unsigned long gen_pool_alloc_algo(struct gen_pool *pool, size_t size,
>>   	struct gen_pool_chunk *chunk;
>>   	unsigned long addr = 0;
>>   	int order = pool->min_alloc_order;
>> -	int nbits, start_bit, end_bit, remain;
>> +	int nentries, start_entry, end_entry, remain;
> Be nicer to use "unsigned int", but it's not clear from this diff that 
> this could work with other existing code.
> 
>>   
>>   #ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
>>   	BUG_ON(in_nmi());
>> @@ -301,29 +512,32 @@ unsigned long gen_pool_alloc_algo(struct gen_pool *pool, size_t size,
>>   	if (size == 0)
>>   		return 0;
>>   
>> -	nbits = (size + (1UL << order) - 1) >> order;
>> +	nentries = mem_to_units(size, order);
>>   	rcu_read_lock();
>>   	list_for_each_entry_rcu(chunk, &pool->chunks, next_chunk) {
>>   		if (size > atomic_long_read(&chunk->avail))
>>   			continue;
>>   
>> -		start_bit = 0;
>> -		end_bit = chunk_size(chunk) >> order;
>> +		start_entry = 0;
>> +		end_entry = chunk_size(chunk) >> order;
>>   retry:
>> -		start_bit = algo(chunk->bits, end_bit, start_bit,
>> -				 nbits, data, pool);
>> -		if (start_bit >= end_bit)
>> +		start_entry = algo(chunk->entries, end_entry, start_entry,
>> +				  nentries, data, pool);
>> +		if (start_entry >= end_entry)
>>   			continue;
>> -		remain = bitmap_set_ll(chunk->bits, start_bit, nbits);
>> +		remain = alter_bitmap_ll(SET_BITS, chunk->entries,
>> +					 start_entry, nentries);
>>   		if (remain) {
>> -			remain = bitmap_clear_ll(chunk->bits, start_bit,
>> -						 nbits - remain);
>> -			BUG_ON(remain);
>> +			remain = alter_bitmap_ll(CLEAR_BITS,
>> +						 chunk->entries,
>> +						 start_entry,
>> +						 nentries - remain);
>>   			goto retry;
>>   		}
>>   
>> -		addr = chunk->start_addr + ((unsigned long)start_bit << order);
>> -		size = nbits << order;
>> +		addr = chunk->start_addr +
>> +			((unsigned long)start_entry << order);
>> +		size = nentries << order;
>>   		atomic_long_sub(size, &chunk->avail);
>>   		break;
>>   	}
>> @@ -333,7 +547,7 @@ unsigned long gen_pool_alloc_algo(struct gen_pool *pool, size_t size,
>>   EXPORT_SYMBOL(gen_pool_alloc_algo);
>>   
>>   /**
>> - * gen_pool_dma_alloc - allocate special memory from the pool for DMA usage
>> + * gen_pool_dma_alloc() - allocate special memory from the pool for DMA usage
>>    * @pool: pool to allocate from
>>    * @size: number of bytes to allocate from the pool
>>    * @dma: dma-view physical address return value.  Use NULL if unneeded.
>> @@ -342,6 +556,10 @@ EXPORT_SYMBOL(gen_pool_alloc_algo);
>>    * Uses the pool allocation function (with first-fit algorithm by default).
>>    * Can not be used in NMI handler on architectures without
>>    * NMI-safe cmpxchg implementation.
>> + *
>> + * Return:
>> + * * address of the memory allocated	- success
>> + * * NULL				- error
>>    */
>>   void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size, dma_addr_t *dma)
>>   {
>> @@ -362,10 +580,10 @@ void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size, dma_addr_t *dma)
>>   EXPORT_SYMBOL(gen_pool_dma_alloc);
>>   
>>   /**
>> - * gen_pool_free - free allocated special memory back to the pool
>> + * gen_pool_free() - free allocated special memory back to the pool
>>    * @pool: pool to free to
>>    * @addr: starting address of memory to free back to pool
>> - * @size: size in bytes of memory to free
>> + * @size: size in bytes of memory to free or 0, for auto-detection
>>    *
>>    * Free previously allocated special memory back to the specified
>>    * pool.  Can not be used in NMI handler on architectures without
>> @@ -375,22 +593,29 @@ void gen_pool_free(struct gen_pool *pool, unsigned long addr, size_t size)
>>   {
>>   	struct gen_pool_chunk *chunk;
>>   	int order = pool->min_alloc_order;
>> -	int start_bit, nbits, remain;
>> +	int start_entry, remaining_entries, nentries, remain;
>> +	int boundary;
>>   
>>   #ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
>>   	BUG_ON(in_nmi());
>>   #endif
>>   
>> -	nbits = (size + (1UL << order) - 1) >> order;
>>   	rcu_read_lock();
>>   	list_for_each_entry_rcu(chunk, &pool->chunks, next_chunk) {
>>   		if (addr >= chunk->start_addr && addr <= chunk->end_addr) {
>>   			BUG_ON(addr + size - 1 > chunk->end_addr);
>> -			start_bit = (addr - chunk->start_addr) >> order;
>> -			remain = bitmap_clear_ll(chunk->bits, start_bit, nbits);
>> +			start_entry = (addr - chunk->start_addr) >> order;
>> +			remaining_entries = (chunk->end_addr - addr) >> order;
>> +			boundary = get_boundary(chunk->entries, start_entry,
>> +						remaining_entries);
>> +			BUG_ON(boundary < 0);
> 
> Do you really want to use BUG_ON()?  I've thought twice about using 
> BUG_ON() based on Linus's wrath with BUG_ON() code causing an issue with 
> the 4.8 release:
> 
> https://lkml.org/lkml/2016/10/4/1
> 
> Hence why I've been giving WARN_ON() suggestions throughout this review.

Oh, I thought I had added explanations here too, but I did it only for
pmalloc :-(

Thanks for spotting this.

To answer the question, do I really want to do it?
Maybe not here, I wrote this before introducing the self test.
But in the self test probably yes. The self test is optional and the
idea is to prevent cases where there could be corruption of permanent
storage.

I have read Linus' comments, but what is still not clear to me is:
is there _any_ case where BUG_ON() is acceptable?

I'd think that avoiding potential data corruption would be  ok, but if
the answer is that, no, there is no acceptable case for using it, then
why is it still available?


--
igor

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/7] genalloc: selftest
  2018-02-23 22:42   ` J Freyensee
@ 2018-02-26 12:11     ` Igor Stoppa
  2018-02-26 17:46       ` J Freyensee
  0 siblings, 1 reply; 32+ messages in thread
From: Igor Stoppa @ 2018-02-26 12:11 UTC (permalink / raw)
  To: J Freyensee, david, willy, keescook, mhocko
  Cc: labbott, linux-security-module, linux-mm, linux-kernel, kernel-hardening



On 24/02/18 00:42, J Freyensee wrote:
> 
>> +	locations[action->location] = gen_pool_alloc(pool, action->size);
>> +	BUG_ON(!locations[action->location]);
> 
> Again, I'd think it through if you really want to use BUG_ON() or not:
> 
> https://lwn.net/Articles/13183/
> https://lkml.org/lkml/2016/10/4/1

Is it acceptable to display only a WARNing, in case of risking damaging
a mounted filesystem?

--
igor

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 4/7] Protectable Memory
  2018-02-24  0:10   ` J Freyensee
@ 2018-02-26 14:28     ` Igor Stoppa
  2018-02-26 18:25       ` J Freyensee
  0 siblings, 1 reply; 32+ messages in thread
From: Igor Stoppa @ 2018-02-26 14:28 UTC (permalink / raw)
  To: J Freyensee, david, willy, keescook, mhocko
  Cc: labbott, linux-security-module, linux-mm, linux-kernel, kernel-hardening



On 24/02/18 02:10, J Freyensee wrote:
> On 2/23/18 6:48 AM, Igor Stoppa wrote:

[...]

>> +struct gen_pool *pmalloc_create_pool(const char *name,
>> +					 int min_alloc_order);
> 
> Same comments as earlier.  If this is new API with new code being 
> introduced into the kernel, can the variables be declared to avoid weird 
> problems?  Like min_alloc_order being a negative value makes little 
> sense (based on the description here), so can it be declared as size_t 
> or unsigned int?

in this case, yes, but I see it as different case

[...]

>> + * * NULL				- either no memory available or
>> + *					  pool already read-only
>> + */
> 
> I don't know if an errno value is being set, but setting a variable 
> somewhere using EROFS or ENOMEM would more accurate diagnose those two 
> NULL conditions.

I expect that the latter is highly unlikely to happen, because the user
of the API controls if the pool is locked or not.

I think it shouldn't come as a surprise to the one who locked the pool,
if the pool is locked.

If the pool is used with concurrent users, attention should be paid to
not lock it before ever user is happy (this is where the user of the API
has to provide own locking.)

Since the information if the pool is already protected is actually
present in the pmalloc_data structure associated with the pool, I was
tempted to make it available through the API, but that seemed wrong.

The user of the API should be very aware of the state of the pool, since
the user is the one who sets it. Why would it have to be read back?

>> +void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp);
>> +
>> +
>> +/**
>> + * pzalloc() - zero-initialized version of pmalloc
>> + * @pool: handle to the pool to be used for memory allocation
>> + * @size: amount of memory (in bytes) requested
>> + * @gfp: flags for page allocation
>> + *
>> + * Executes pmalloc, initializing the memory requested to 0,
>> + * before returning the pointer to it.
>> + *
>> + * Return:
>> + * * pointer to the memory requested	- success
>> + * * NULL				- either no memory available or
>> + *					  pool already read-only
>> + */
> Same comment here, though that inline function below looks highly 
> optimized...

The same applies here.
I'm not very fond of the idea of returning the status elsewhere and in a
way that is not intrinsically connected with the action that has
determined the change of state.

AFAIK *alloc functions return either the memory requested or NULL.
I wonder how realistic this case is.

[...]

>> +	if (unlikely(!(pool && n && size)))
> Has this code been run through sparse?

I use "make C=1 W=1"

> I know one thing sparse looks at 
> is if NULL is being treated like a 0, and sparse does check cases when 0 
> is being used in place for NULL for pointer checks, and I'm wondering if 
> that line of code would pass.

It's a logical AND: wouldn't NULL translate to false, rather 0?
I can add an explicit check against NULL, it's probably more readable
too, but I don't think that the current construct treats NULL as 0.

[...]

>> +	if (unlikely(pool == NULL || s == NULL))
> Here, the right check is being done, so at the very least, I would make 
> the last line I commented on the same as this one for code continuity.

ok

[...]

>> +	if (unlikely(!(pool && chunk)))
> 
> Please make this check the same as the last line I commented on, 
> especially since it's the same struct being checked.

yes

[...]

>> +	if (!name) {
>> +		WARN_ON(1);
> ??  Maybe the name check should be in WARN_ON()?

true :-(

[...]

>> +	if (unlikely(!req_size || !pool))
> same unlikely() check problem mentioned before.
>> +		return -1;
> Can we use an errno value instead for better diagnosibility?
>> +
>> +	data = pool->data;
>> +
>> +	if (data == NULL)
>> +		return -1;
> Same here (ENOMEM or ENXIO comes to mind).
>> +
>> +	if (unlikely(data->protected)) {
>> +		WARN_ON(1);
> Maybe re-write this with the check inside WARN_ON()?
>> +		return -1;
> 
> Same here, how about a different errno value for this case?

yes, to all of the above

[...]

>> +static void pmalloc_chunk_set_protection(struct gen_pool *pool,
>> +
>> +					 struct gen_pool_chunk *chunk,
>> +					 void *data)
>> +{
>> +	const bool *flag = data;
>> +	size_t chunk_size = chunk->end_addr + 1 - chunk->start_addr;
>> +	unsigned long pages = chunk_size / PAGE_SIZE;
>> +
>> +	BUG_ON(chunk_size & (PAGE_SIZE - 1));
> Re-think WARN_ON() for BUG_ON()?  And also check chunk as well, as it's 
> being used below?

ok

>> +
>> +	if (*flag)
>> +		set_memory_ro(chunk->start_addr, pages);
>> +	else
>> +		set_memory_rw(chunk->start_addr, pages);
>> +}
>> +
>> +static int pmalloc_pool_set_protection(struct gen_pool *pool, bool protection)
>> +{
>> +	struct pmalloc_data *data;
>> +	struct gen_pool_chunk *chunk;
>> +
>> +	if (unlikely(!pool))
>> +		return -EINVAL;
> This is example of what I'd perfer seeing in check_alloc_params().

yes

>> +
>> +	data = pool->data;
>> +
>> +	if (unlikely(!data))
>> +		return -EINVAL;
> ENXIO or EIO or ENOMEM sound better?

Why? At least based on he description from errno-base.h, EINVAL seemed
the most appropriate:

#define	EIO		 5	/* I/O error */
#define	ENXIO		 6	/* No such device or address */
#define	ENOMEM		12	/* Out of memory */

#define	EINVAL		22	/* Invalid argument */

If I was really pressed to change it, I'd rather pick:

#define	EFAULT		14	/* Bad address */


>> +
>> +	if (unlikely(data->protected == protection)) {
>> +		WARN_ON(1);
> Better to put the check inside WARN_ON, me thinks...

yes, I have no idea why I wrote it like that  :-(

>> +		return 0;
>> +	}
>> +
>> +	data->protected = protection;
>> +	list_for_each_entry(chunk, &(pool)->chunks, next_chunk)
>> +		pmalloc_chunk_set_protection(pool, chunk, &protection);
>> +	return 0;
>> +}
>> +
>> +int pmalloc_protect_pool(struct gen_pool *pool)
>> +{
>> +	return pmalloc_pool_set_protection(pool, true);
> Is pool == NULL being checked somewhere, similar to previous functions 
> in this patch?

right.

>> +}
>> +
>> +
>> +static void pmalloc_chunk_free(struct gen_pool *pool,
>> +			       struct gen_pool_chunk *chunk, void *data)
>> +{
> Wat is 'data' being used for? Looks unused.  Should parameters be 
> checked, like other ones?


This is the iterator that is passed to genalloc
genalloc defines the format for the iterator, because it *will* want to
pass the pointer to the opaque data structure.

>> +	untag_chunk(chunk);
>> +	gen_pool_flush_chunk(pool, chunk);
>> +	vfree_atomic((void *)chunk->start_addr);
>> +}
>> +
>> +
>> +int pmalloc_destroy_pool(struct gen_pool *pool)
>> +{
>> +	struct pmalloc_data *data;
>> +
>> +	if (unlikely(pool == NULL))
>> +		return -EINVAL;
>> +
>> +	data = pool->data;
>> +
>> +	if (unlikely(data == NULL))
>> +		return -EINVAL;
> 
> I'd use a different errno value since you already used it for pool.

Thinking more about this, how about collapsing them?
I do need to check for the value of data, before I dereference it,
however the causes are very different:

* wrong pool pointer - definitely possible
* corrupted data structure (data member overwritten) - highly unlikely

>> +
>> +	mutex_lock(&pmalloc_mutex);
>> +	list_del(&data->node);
>> +	mutex_unlock(&pmalloc_mutex);
>> +
>> +	if (likely(data->pool_kobject))
>> +		pmalloc_disconnect(data, data->pool_kobject);
>> +
>> +	pmalloc_pool_set_protection(pool, false);
>> +	gen_pool_for_each_chunk(pool, pmalloc_chunk_free, NULL);
>> +	gen_pool_destroy(pool);
>> +	kfree(data);
> Does data need to be set to NULL in this case, as data is a member of 
> pool (pool->data)?  I'm worried about dangling pointer scenarios which 
> probably isn't good for security modules?

The pool was destroyed in the previous step.
There is nothing left that can be set to NULL.
Unless we are expecting an use-after-free of the pool structure.
But everything it was referring to is gone as well.

If we really want to go after this case, then basically everything that
has been allocated should be poisoned before being freed.

Isn't it a bit too much?

>> +	return 0;
>> +}
>> +
>> +/*
>> + * When the sysfs is ready to receive registrations, connect all the
>> + * pools previously created. Also enable further pools to be connected
>> + * right away.
>> + */
>> +static int __init pmalloc_late_init(void)
>> +{
>> +	struct pmalloc_data *data, *n;
>> +
>> +	pmalloc_kobject = kobject_create_and_add("pmalloc", kernel_kobj);
>> +
>> +	mutex_lock(&pmalloc_mutex);
>> +	pmalloc_list = &pmalloc_final_list;
>> +
>> +	if (likely(pmalloc_kobject != NULL)) {
>> +		list_for_each_entry_safe(data, n, &pmalloc_tmp_list, node) {
>> +			list_move(&data->node, &pmalloc_final_list);
>> +			pmalloc_connect(data);
>> +		}
>> +	}
> It would be nice to have the init() return an error value in case of 
> failure.

ok

--
igor

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 7/7] Documentation for Pmalloc
  2018-02-24  0:26   ` J Freyensee
@ 2018-02-26 15:39     ` Igor Stoppa
  2018-02-26 18:32       ` J Freyensee
  0 siblings, 1 reply; 32+ messages in thread
From: Igor Stoppa @ 2018-02-26 15:39 UTC (permalink / raw)
  To: J Freyensee, david, willy, keescook, mhocko
  Cc: labbott, linux-security-module, linux-mm, linux-kernel, kernel-hardening



On 24/02/18 02:26, J Freyensee wrote:
> 
> 
> On 2/23/18 6:48 AM, Igor Stoppa wrote:

[...]

>> +- Before destroying a pool, all the memory allocated from it must be
>> +  released.
> 
> Is that true?  pmalloc_destroy_pool() has:
> 
> .
> .
> +    pmalloc_pool_set_protection(pool, false);
> +    gen_pool_for_each_chunk(pool, pmalloc_chunk_free, NULL);
> +    gen_pool_destroy(pool);
> +    kfree(data);
> 
> which to me looks like is the opposite, the data (ie, "memory") is being 
> released first, then the pool is destroyed.

well, this is embarrassing ... yes I had this prototype code, because I
was wondering if it wouldn't make more sense to tear down the pool as
fast as possible. It slipped in, apparently.

I'm actually tempted to leave it in and fix the comment.

[...]

>> +
>> +- pmalloc does not provide locking support with respect to allocating vs
>> +  protecting an individual pool, for performance reasons.
> 
> What is the recommendation to using locks then, as the computing 
> real-world mainly operates in multi-threaded/process world? 

How common are multi-threaded allocations of write-once memory?
Here we are talking exclusively about the part of the memory life-cycle
where it is allocated (from pmalloc).

> Maybe show 
> an example of an issue that occur if locks aren't used and give a coding 
> example.

An example of how to use a mutex to access a shared resource? :-O

This part below, under your question, was supposed to be the answer :-(

>> +  It is recommended not to share the same pool between unrelated functions.
>> +  Should sharing be a necessity, the user of the shared pool is expected
>> +  to implement locking for that pool.

[...]

>> +- pmalloc uses genalloc to optimize the use of the space it allocates
>> +  through vmalloc. Some more TLB entries will be used, however less than
>> +  in the case of using vmalloc directly. The exact number depends on the
>> +  size of each allocation request and possible slack.
>> +
>> +- Considering that not much data is supposed to be dynamically allocated
>> +  and then marked as read-only, it shouldn't be an issue that the address
>> +  range for pmalloc is limited, on 32-bit systems.
> 
> Why is 32-bit systems mentioned and not 64-bit?

Because, as written, on 32 bit system the vmalloc range is relatively
small, so one might wonder if there are enough addresses.

>  Is there a problem with 64-bit here?

Quite the opposite.
I thought it was clear, but obviously it isn't, I'll reword this.

-igor

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/7] struct page: add field for vm_struct
  2018-02-25  3:38   ` Matthew Wilcox
@ 2018-02-26 16:37     ` Igor Stoppa
  0 siblings, 0 replies; 32+ messages in thread
From: Igor Stoppa @ 2018-02-26 16:37 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: david, keescook, mhocko, labbott, linux-security-module,
	linux-mm, linux-kernel, kernel-hardening



On 25/02/18 05:38, Matthew Wilcox wrote:
> On Fri, Feb 23, 2018 at 04:48:03PM +0200, Igor Stoppa wrote:
>> @@ -1769,6 +1771,9 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
>>  
>>  	kmemleak_vmalloc(area, size, gfp_mask);
>>  
>> +	for (i = 0; i < area->nr_pages; i++)
>> +		area->pages[i]->area = area;
>> +
>>  	return addr;
>>  
>>  fail:
> 
> IMO, this is the wrong place to initialise the page->area.  It should be
> done in __vmalloc_area_node() like so:
> 
>                         area->nr_pages = i;
>                         goto fail;
>                 }
> +		page->area = area;
>                 area->pages[i] = page;
>                 if (gfpflags_allow_blocking(gfp_mask))
>                         cond_resched();
> 


ok

--
igor

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/7] genalloc: track beginning of allocations
  2018-02-26 12:09     ` Igor Stoppa
@ 2018-02-26 17:32       ` J Freyensee
  2018-02-26 18:44         ` Igor Stoppa
  0 siblings, 1 reply; 32+ messages in thread
From: J Freyensee @ 2018-02-26 17:32 UTC (permalink / raw)
  To: Igor Stoppa, david, willy, keescook, mhocko
  Cc: labbott, linux-security-module, linux-mm, linux-kernel, kernel-hardening

My replies also inlined.

On 2/26/18 4:09 AM, Igor Stoppa wrote:
> Hello,
> and thanks for the reviews, my replies inlined below.
>
> On 24/02/18 00:28, J Freyensee wrote:
>> some code snipping
>> .
>> .
>> .
>>> +/**
>>> + * get_bitmap_entry() - extracts the specified entry from the bitmap
>>> + * @map: pointer to a bitmap
>>> + * @entry_index: the index of the desired entry in the bitmap
>>> + *
>>> + * Return: The requested bitmap.
>>> + */
>>> +static inline unsigned long get_bitmap_entry(unsigned long *map,
>>> +					    int entry_index)
>>> +{
>> Apologies if this has been mentioned before, but since this function is
>> expecting map to not be NULL, shouldn't something like:
>>
>> WARN_ON(map == NULL);
>>
>> be used to check the parameters for bad values?
> TBH I do not know.
> Actually I'd rather ask back: when is it preferred to do (and not do)
> parameter sanitation?
>
> I was thinking that doing it at API level is the right balance.
> This function, for example, is not part of the API, it's used only
> internally in this file.


I agree.  But some of the code looks API'like to me, partly because of 
all the function header documentation, which thank you for that, but I 
wasn't sure where you drew your "API line" where the checks would be.

>
> Is it assuming too much that the function will be used correctly, inside
> the module it belongs to?
>
> And even at API level, I'd tend to say that if there are chances that
> the data received is corrupted, then it should be sanitized, but otherwise,
> why adding overhead?

It's good secure coding practice to check your parameters, you are 
adding code to a security module after all ;-).

If it's brand-new code entering the kernel, it's better to err on the 
side of having the extra checks and have a maintainer tell you to remove 
it than the other way around- especially since this code is part of the 
LSM solution.  What's worse- a tad bit of overhead catching a 
corner-case scenario that can be more easily fixed or something not 
caught that makes the kernel unstable?

>
> Unless you expect some form of memory corruption. Is that what you have
> in mind?
>
> [...]
>
>>>    static inline size_t chunk_size(const struct gen_pool_chunk *chunk)
>>>    {
>> Same problem here, always expecting chunk to not but NULL.
> What would be the case that makes it not NULL?
> There are already tests in place when the memory is allocated.
>
> If I really have to pick a place where to do the test, it's at API
> level,

I agree, and if that is the case, I'm fine.

> where the user of the API might fail to notice that the creation
> of a pool failed and try to get memory from a non-existing pool.
> That is the only scenario I can think of, where bogus data would be
> received.
>
>>>    	return chunk->end_addr - chunk->start_addr + 1;
>>>    }
>>>    
>>> -static int set_bits_ll(unsigned long *addr, unsigned long mask_to_set)
>>> +
>>> +/**
>>> + * set_bits_ll() - based on value and mask, sets bits at address
>>> + * @addr: where to write
>>> + * @mask: filter to apply for the bits to alter
>>> + * @value: actual configuration of bits to store
>>> + *
>>> + * Return:
>>> + * * 0		- success
>>> + * * -EBUSY	- otherwise
>>> + */
>>> +static int set_bits_ll(unsigned long *addr,
>>> +		       unsigned long mask, unsigned long value)
>>>    {
>>> -	unsigned long val, nval;
>>> +	unsigned long nval;
>>> +	unsigned long present;
>>> +	unsigned long target;
>>>    
>>>    	nval = *addr;
>> Same issue here with addr.
> Again, I am more leaning toward believing that the user of the API might
> forget to check for errors,

Same in agreement, so if that is the case, I'm ok.  It was a little hard 
to tell what is exactly your API is.  I'm used to reviewing kernel code 
where important API-like functions were heavily documented, and inner 
routines were not...so seeing the function documentation (which is a 
good thing :-)) made me think this was some sort of new API code I was 
looking at.

> and pass a NULL pointer as pool, than to
> believe something like this would happen.
>
> This is an address obtained from data managed automatically by the library.
>
> Can you please explain why you think it would be NULL?

Why would it be NULL?  I don't know, I'm not intimately familiar with 
the code; but I default to implementing code defensively.  But I'll turn 
the question around on you- why would it NOT be NULL?  Are you sure this 
will never be NULL?  Are you going to trust the library that it always 
provides a good address?  You should add to your function header 
documentation why addr will NOT be NULL.


> I'll skip further similar comment.
>
> [...]
>
>>> +	/*
>>> +	 * Prepare for writing the initial part of the allocation, from
>>> +	 * starting entry, to the end of the UL bitmap element which
>>> +	 * contains it. It might be larger than the actual allocation.
>>> +	 */
>>> +	start_bit = ENTRIES_TO_BITS(start_entry);
>>> +	end_bit = ENTRIES_TO_BITS(start_entry + nentries);
>>> +	nbits = ENTRIES_TO_BITS(nentries);
>> these statements won't make any sense if start_entry and nentries are
>> negative values, which is possible based on the function definition
>> alter_bitmap_ll().  Am I missing something that it's ok for these
>> parameters to be negative?
> This patch is extending the handling of the bitmap, it's not trying to
> rewrite genalloc, thus it tries to not alter parts which are unrelated.
> Like the type of parameters passed.
>
> What you are suggesting is a further cleanup of genalloc.
> I'm not against it, but it's unrelated to this patchset.

OK, very reasonable.  Then I would think this would be a case to add a 
check for negative values in the function parameters start_entry and 
nentries as it's possible (though maybe not realistic) to have negative 
values supplied, especially if there is currently no active maintainer 
for genalloc().  Since you are fitting new code to genalloc's behavior 
and this is a security module, I'll err on the side of checking the 
parameters for bad values, or document in your function header comments 
why it is expected for these parameters to never have negative values.
>
> Incidentally, nobody really seems to be maintaining genalloc, so I'm
> hesitant in adding more changes, when there isn't a dedicated maintainer
> to say Yes/No.

Very understandable.  I think this is another reason it's good to have 
more checks around your new function parameters- be good to be sure your 
code behaves to the expectations of genalloc().

>
>>> +	bits_to_write = BITS_PER_LONG - start_bit % BITS_PER_LONG;
>>> +	mask = BITMAP_FIRST_WORD_MASK(start_bit);
>>> +	/* Mark the beginning of the allocation. */
>>> +	value = MASK | (1UL << (start_bit % BITS_PER_LONG));
>>> +	index = BITS_DIV_LONGS(start_bit);
>>> +
>>> +	/*
>>> +	 * Writes entries to the bitmap, as long as the reminder is
>>> +	 * positive or zero.
>>> +	 * Might be skipped if the entries to write do not reach the end
>>> +	 * of a bitmap UL unit.
>>> +	 */
>>> +	while (nbits >= bits_to_write) {
>>> +		if (action(map + index, mask, value & mask))
>>> +			return BITS_DIV_ENTRIES(nbits);
>>> +		nbits -= bits_to_write;
>>> +		bits_to_write = BITS_PER_LONG;
>>> +		mask = ~0UL;
>>> +		value = MASK;
>>> +		index++;
>>>    	}
>>>    
>>> +	/* Takes care of the ending part of the entries to mark. */
>>> +	if (nbits > 0) {
>>> +		mask ^= BITMAP_FIRST_WORD_MASK((end_bit) % BITS_PER_LONG);
>>> +		bits_to_write = nbits;
>>> +		if (action(map + index, mask, value & mask))
>>> +			return BITS_DIV_ENTRIES(nbits);
>>> +	}
>>>    	return 0;
>>>    }
>>>    
>>> +
>>>    /**
>>> - * gen_pool_create - create a new special memory pool
>>> - * @min_alloc_order: log base 2 of number of bytes each bitmap bit represents
>>> - * @nid: node id of the node the pool structure should be allocated on, or -1
>>> + * gen_pool_create() - create a new special memory pool
>>> + * @min_alloc_order: log base 2 of number of bytes each bitmap entry
>>> + *		     represents
>>> + * @nid: node id of the node the pool structure should be allocated on,
>>> + *	 or -1
>>>     *
>>> - * Create a new special memory pool that can be used to manage special purpose
>>> - * memory not managed by the regular kmalloc/kfree interface.
>>> + * Create a new special memory pool that can be used to manage special
>>> + * purpose memory not managed by the regular kmalloc/kfree interface.
>>> + *
>>> + * Return:
>>> + * * pointer to the pool	- success
>>> + * * NULL			- otherwise
>>>     */
>>>    struct gen_pool *gen_pool_create(int min_alloc_order, int nid)
>>>    {
>>> @@ -167,7 +364,7 @@ struct gen_pool *gen_pool_create(int min_alloc_order, int nid)
>>>    EXPORT_SYMBOL(gen_pool_create);
>>>    
>>>    /**
>>> - * gen_pool_add_virt - add a new chunk of special memory to the pool
>>> + * gen_pool_add_virt() - add a new chunk of special memory to the pool
>>>     * @pool: pool to add new memory chunk to
>>>     * @virt: virtual starting address of memory chunk to add to pool
>>>     * @phys: physical starting address of memory chunk to add to pool
>>> @@ -177,16 +374,20 @@ EXPORT_SYMBOL(gen_pool_create);
>>>     *
>>>     * Add a new chunk of special memory to the specified pool.
>>>     *
>>> - * Returns 0 on success or a -ve errno on failure.
>>> + * Return:
>>> + * * 0		- success
>>> + * * -ve errno	- failure
>>>     */
>>> -int gen_pool_add_virt(struct gen_pool *pool, unsigned long virt, phys_addr_t phys,
>>> -		 size_t size, int nid)
>>> +int gen_pool_add_virt(struct gen_pool *pool, unsigned long virt,
>>> +		      phys_addr_t phys, size_t size, int nid)
>>>    {
>> WARN_ON(pool == NULL);
>> ?
>>>    	struct gen_pool_chunk *chunk;
>>> -	int nbits = size >> pool->min_alloc_order;
>>> -	int nbytes = sizeof(struct gen_pool_chunk) +
>>> -				BITS_TO_LONGS(nbits) * sizeof(long);
>>> +	int nentries;
>>> +	int nbytes;
>>>    
>>> +	nentries = size >> pool->min_alloc_order;
>>> +	nbytes = sizeof(struct gen_pool_chunk) +
>>> +		 ENTRIES_DIV_LONGS(nentries) * sizeof(long);
>>>    	chunk = kzalloc_node(nbytes, GFP_KERNEL, nid);
>>>    	if (unlikely(chunk == NULL))
>>>    		return -ENOMEM;
>>> @@ -205,11 +406,13 @@ int gen_pool_add_virt(struct gen_pool *pool, unsigned long virt, phys_addr_t phy
>>>    EXPORT_SYMBOL(gen_pool_add_virt);
>>>    
>>>    /**
>>> - * gen_pool_virt_to_phys - return the physical address of memory
>>> + * gen_pool_virt_to_phys() - return the physical address of memory
>>>     * @pool: pool to allocate from
>>>     * @addr: starting address of memory
>>>     *
>>> - * Returns the physical address on success, or -1 on error.
>>> + * Return:
>>> + * * the physical address	- success
>>> + * * \-1			- error
>>>     */
>>>    phys_addr_t gen_pool_virt_to_phys(struct gen_pool *pool, unsigned long addr)
>>>    {
>>> @@ -230,7 +433,7 @@ phys_addr_t gen_pool_virt_to_phys(struct gen_pool *pool, unsigned long addr)
>>>    EXPORT_SYMBOL(gen_pool_virt_to_phys);
>>>    
>>>    /**
>>> - * gen_pool_destroy - destroy a special memory pool
>>> + * gen_pool_destroy() - destroy a special memory pool
>>>     * @pool: pool to destroy
>>>     *
>>>     * Destroy the specified special memory pool. Verifies that there are no
>>> @@ -248,7 +451,7 @@ void gen_pool_destroy(struct gen_pool *pool)
>>>    		list_del(&chunk->next_chunk);
>>>    
>>>    		end_bit = chunk_size(chunk) >> order;
>>> -		bit = find_next_bit(chunk->bits, end_bit, 0);
>>> +		bit = find_next_bit(chunk->entries, end_bit, 0);
>>>    		BUG_ON(bit < end_bit);
>>>    
>>>    		kfree(chunk);
>>> @@ -259,7 +462,7 @@ void gen_pool_destroy(struct gen_pool *pool)
>>>    EXPORT_SYMBOL(gen_pool_destroy);
>>>    
>>>    /**
>>> - * gen_pool_alloc - allocate special memory from the pool
>>> + * gen_pool_alloc() - allocate special memory from the pool
>>>     * @pool: pool to allocate from
>>>     * @size: number of bytes to allocate from the pool
>>>     *
>>> @@ -267,6 +470,10 @@ EXPORT_SYMBOL(gen_pool_destroy);
>>>     * Uses the pool allocation function (with first-fit algorithm by default).
>>>     * Can not be used in NMI handler on architectures without
>>>     * NMI-safe cmpxchg implementation.
>>> + *
>>> + * Return:
>>> + * * address of the memory allocated	- success
>>> + * * NULL				- error
>>>     */
>>>    unsigned long gen_pool_alloc(struct gen_pool *pool, size_t size)
>>>    {
>>> @@ -275,7 +482,7 @@ unsigned long gen_pool_alloc(struct gen_pool *pool, size_t size)
>>>    EXPORT_SYMBOL(gen_pool_alloc);
>>>    
>>>    /**
>>> - * gen_pool_alloc_algo - allocate special memory from the pool
>>> + * gen_pool_alloc_algo() - allocate special memory from the pool
>>>     * @pool: pool to allocate from
>>>     * @size: number of bytes to allocate from the pool
>>>     * @algo: algorithm passed from caller
>>> @@ -285,6 +492,10 @@ EXPORT_SYMBOL(gen_pool_alloc);
>>>     * Uses the pool allocation function (with first-fit algorithm by default).
>>>     * Can not be used in NMI handler on architectures without
>>>     * NMI-safe cmpxchg implementation.
>>> + *
>>> + * Return:
>>> + * * address of the memory allocated	- success
>>> + * * NULL				- error
>>>     */
>>>    unsigned long gen_pool_alloc_algo(struct gen_pool *pool, size_t size,
>>>    		genpool_algo_t algo, void *data)
>>> @@ -292,7 +503,7 @@ unsigned long gen_pool_alloc_algo(struct gen_pool *pool, size_t size,
>>>    	struct gen_pool_chunk *chunk;
>>>    	unsigned long addr = 0;
>>>    	int order = pool->min_alloc_order;
>>> -	int nbits, start_bit, end_bit, remain;
>>> +	int nentries, start_entry, end_entry, remain;
>> Be nicer to use "unsigned int", but it's not clear from this diff that
>> this could work with other existing code.
>>
>>>    
>>>    #ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
>>>    	BUG_ON(in_nmi());
>>> @@ -301,29 +512,32 @@ unsigned long gen_pool_alloc_algo(struct gen_pool *pool, size_t size,
>>>    	if (size == 0)
>>>    		return 0;
>>>    
>>> -	nbits = (size + (1UL << order) - 1) >> order;
>>> +	nentries = mem_to_units(size, order);
>>>    	rcu_read_lock();
>>>    	list_for_each_entry_rcu(chunk, &pool->chunks, next_chunk) {
>>>    		if (size > atomic_long_read(&chunk->avail))
>>>    			continue;
>>>    
>>> -		start_bit = 0;
>>> -		end_bit = chunk_size(chunk) >> order;
>>> +		start_entry = 0;
>>> +		end_entry = chunk_size(chunk) >> order;
>>>    retry:
>>> -		start_bit = algo(chunk->bits, end_bit, start_bit,
>>> -				 nbits, data, pool);
>>> -		if (start_bit >= end_bit)
>>> +		start_entry = algo(chunk->entries, end_entry, start_entry,
>>> +				  nentries, data, pool);
>>> +		if (start_entry >= end_entry)
>>>    			continue;
>>> -		remain = bitmap_set_ll(chunk->bits, start_bit, nbits);
>>> +		remain = alter_bitmap_ll(SET_BITS, chunk->entries,
>>> +					 start_entry, nentries);
>>>    		if (remain) {
>>> -			remain = bitmap_clear_ll(chunk->bits, start_bit,
>>> -						 nbits - remain);
>>> -			BUG_ON(remain);
>>> +			remain = alter_bitmap_ll(CLEAR_BITS,
>>> +						 chunk->entries,
>>> +						 start_entry,
>>> +						 nentries - remain);
>>>    			goto retry;
>>>    		}
>>>    
>>> -		addr = chunk->start_addr + ((unsigned long)start_bit << order);
>>> -		size = nbits << order;
>>> +		addr = chunk->start_addr +
>>> +			((unsigned long)start_entry << order);
>>> +		size = nentries << order;
>>>    		atomic_long_sub(size, &chunk->avail);
>>>    		break;
>>>    	}
>>> @@ -333,7 +547,7 @@ unsigned long gen_pool_alloc_algo(struct gen_pool *pool, size_t size,
>>>    EXPORT_SYMBOL(gen_pool_alloc_algo);
>>>    
>>>    /**
>>> - * gen_pool_dma_alloc - allocate special memory from the pool for DMA usage
>>> + * gen_pool_dma_alloc() - allocate special memory from the pool for DMA usage
>>>     * @pool: pool to allocate from
>>>     * @size: number of bytes to allocate from the pool
>>>     * @dma: dma-view physical address return value.  Use NULL if unneeded.
>>> @@ -342,6 +556,10 @@ EXPORT_SYMBOL(gen_pool_alloc_algo);
>>>     * Uses the pool allocation function (with first-fit algorithm by default).
>>>     * Can not be used in NMI handler on architectures without
>>>     * NMI-safe cmpxchg implementation.
>>> + *
>>> + * Return:
>>> + * * address of the memory allocated	- success
>>> + * * NULL				- error
>>>     */
>>>    void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size, dma_addr_t *dma)
>>>    {
>>> @@ -362,10 +580,10 @@ void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size, dma_addr_t *dma)
>>>    EXPORT_SYMBOL(gen_pool_dma_alloc);
>>>    
>>>    /**
>>> - * gen_pool_free - free allocated special memory back to the pool
>>> + * gen_pool_free() - free allocated special memory back to the pool
>>>     * @pool: pool to free to
>>>     * @addr: starting address of memory to free back to pool
>>> - * @size: size in bytes of memory to free
>>> + * @size: size in bytes of memory to free or 0, for auto-detection
>>>     *
>>>     * Free previously allocated special memory back to the specified
>>>     * pool.  Can not be used in NMI handler on architectures without
>>> @@ -375,22 +593,29 @@ void gen_pool_free(struct gen_pool *pool, unsigned long addr, size_t size)
>>>    {
>>>    	struct gen_pool_chunk *chunk;
>>>    	int order = pool->min_alloc_order;
>>> -	int start_bit, nbits, remain;
>>> +	int start_entry, remaining_entries, nentries, remain;
>>> +	int boundary;
>>>    
>>>    #ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
>>>    	BUG_ON(in_nmi());
>>>    #endif
>>>    
>>> -	nbits = (size + (1UL << order) - 1) >> order;
>>>    	rcu_read_lock();
>>>    	list_for_each_entry_rcu(chunk, &pool->chunks, next_chunk) {
>>>    		if (addr >= chunk->start_addr && addr <= chunk->end_addr) {
>>>    			BUG_ON(addr + size - 1 > chunk->end_addr);
>>> -			start_bit = (addr - chunk->start_addr) >> order;
>>> -			remain = bitmap_clear_ll(chunk->bits, start_bit, nbits);
>>> +			start_entry = (addr - chunk->start_addr) >> order;
>>> +			remaining_entries = (chunk->end_addr - addr) >> order;
>>> +			boundary = get_boundary(chunk->entries, start_entry,
>>> +						remaining_entries);
>>> +			BUG_ON(boundary < 0);
>> Do you really want to use BUG_ON()?  I've thought twice about using
>> BUG_ON() based on Linus's wrath with BUG_ON() code causing an issue with
>> the 4.8 release:
>>
>> https://lkml.org/lkml/2016/10/4/1
>>
>> Hence why I've been giving WARN_ON() suggestions throughout this review.
> Oh, I thought I had added explanations here too, but I did it only for
> pmalloc :-(
>
> Thanks for spotting this.
>
> To answer the question, do I really want to do it?
> Maybe not here, I wrote this before introducing the self test.
> But in the self test probably yes. The self test is optional and the
> idea is to prevent cases where there could be corruption of permanent
> storage.

Yah, it would be much better to have this in self-tests.


>
> I have read Linus' comments, but what is still not clear to me is:
> is there _any_ case where BUG_ON() is acceptable?

Probably not :-P.

Thanks,
Jay

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/7] genalloc: selftest
  2018-02-26 12:11     ` Igor Stoppa
@ 2018-02-26 17:46       ` J Freyensee
  2018-02-26 18:00         ` Igor Stoppa
  0 siblings, 1 reply; 32+ messages in thread
From: J Freyensee @ 2018-02-26 17:46 UTC (permalink / raw)
  To: Igor Stoppa, david, willy, keescook, mhocko
  Cc: labbott, linux-security-module, linux-mm, linux-kernel, kernel-hardening



On 2/26/18 4:11 AM, Igor Stoppa wrote:
>
> On 24/02/18 00:42, J Freyensee wrote:
>>> +	locations[action->location] = gen_pool_alloc(pool, action->size);
>>> +	BUG_ON(!locations[action->location]);
>> Again, I'd think it through if you really want to use BUG_ON() or not:
>>
>> https://lwn.net/Articles/13183/
>> https://lkml.org/lkml/2016/10/4/1
> Is it acceptable to display only a WARNing, in case of risking damaging
> a mounted filesystem?

That's a good question.  Based upon those articles, 'yes'.  But it seems 
like a 'darned-if-you-do, darned-if-you-don't' question as couldn't you 
also corrupt a mounted filesystem by crashing the kernel, yes/no?

If you really want a system crash, maybe just do a panic() like 
filesystems also use?
>
> --
> igor

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/7] genalloc: selftest
  2018-02-26 17:46       ` J Freyensee
@ 2018-02-26 18:00         ` Igor Stoppa
  2018-02-26 19:12           ` Matthew Wilcox
  0 siblings, 1 reply; 32+ messages in thread
From: Igor Stoppa @ 2018-02-26 18:00 UTC (permalink / raw)
  To: J Freyensee, david, willy, keescook, mhocko
  Cc: labbott, linux-security-module, linux-mm, linux-kernel, kernel-hardening



On 26/02/18 19:46, J Freyensee wrote:
> 
> 
> On 2/26/18 4:11 AM, Igor Stoppa wrote:
>>
>> On 24/02/18 00:42, J Freyensee wrote:
>>>> +	locations[action->location] = gen_pool_alloc(pool, action->size);
>>>> +	BUG_ON(!locations[action->location]);
>>> Again, I'd think it through if you really want to use BUG_ON() or not:
>>>
>>> https://lwn.net/Articles/13183/
>>> https://lkml.org/lkml/2016/10/4/1
>> Is it acceptable to display only a WARNing, in case of risking damaging
>> a mounted filesystem?
> 
> That's a good question.  Based upon those articles, 'yes'.  But it seems 
> like a 'darned-if-you-do, darned-if-you-don't' question as couldn't you 
> also corrupt a mounted filesystem by crashing the kernel, yes/no?

The idea is to do it very early in the boot phase, before early init,
when the kernel has not gotten even close to any storage device.

> If you really want a system crash, maybe just do a panic() like 
> filesystems also use?

ok, if that's a more acceptable way to halt the kernel, I do not mind.

--
igor

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 4/7] Protectable Memory
  2018-02-26 14:28     ` Igor Stoppa
@ 2018-02-26 18:25       ` J Freyensee
  0 siblings, 0 replies; 32+ messages in thread
From: J Freyensee @ 2018-02-26 18:25 UTC (permalink / raw)
  To: Igor Stoppa, david, willy, keescook, mhocko
  Cc: labbott, linux-security-module, linux-mm, linux-kernel, kernel-hardening

inlined responses.


On 2/26/18 6:28 AM, Igor Stoppa wrote:
>
> On 24/02/18 02:10, J Freyensee wrote:
>> On 2/23/18 6:48 AM, Igor Stoppa wrote:
> [...]
>
>>> +struct gen_pool *pmalloc_create_pool(const char *name,
>>> +					 int min_alloc_order);
>> Same comments as earlier.  If this is new API with new code being
>> introduced into the kernel, can the variables be declared to avoid weird
>> problems?  Like min_alloc_order being a negative value makes little
>> sense (based on the description here), so can it be declared as size_t
>> or unsigned int?
> in this case, yes, but I see it as different case

OK.

>
> [...]
>
>>> + * * NULL				- either no memory available or
>>> + *					  pool already read-only
>>> + */
>> I don't know if an errno value is being set, but setting a variable
>> somewhere using EROFS or ENOMEM would more accurate diagnose those two
>> NULL conditions.
> I expect that the latter is highly unlikely to happen, because the user
> of the API controls if the pool is locked or not.
>
> I think it shouldn't come as a surprise to the one who locked the pool,
> if the pool is locked.
>
> If the pool is used with concurrent users, attention should be paid to
> not lock it before ever user is happy (this is where the user of the API
> has to provide own locking.)
>
> Since the information if the pool is already protected is actually
> present in the pmalloc_data structure associated with the pool, I was
> tempted to make it available through the API, but that seemed wrong.
>
> The user of the API should be very aware of the state of the pool, since
> the user is the one who sets it. Why would it have to be read back?

OK, sounds good :-).

>
>>> +void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp);
>>> +
>>> +
>>> +/**
>>> + * pzalloc() - zero-initialized version of pmalloc
>>> + * @pool: handle to the pool to be used for memory allocation
>>> + * @size: amount of memory (in bytes) requested
>>> + * @gfp: flags for page allocation
>>> + *
>>> + * Executes pmalloc, initializing the memory requested to 0,
>>> + * before returning the pointer to it.
>>> + *
>>> + * Return:
>>> + * * pointer to the memory requested	- success
>>> + * * NULL				- either no memory available or
>>> + *					  pool already read-only
>>> + */
>> Same comment here, though that inline function below looks highly
>> optimized...
> The same applies here.
> I'm not very fond of the idea of returning the status elsewhere and in a
> way that is not intrinsically connected with the action that has
> determined the change of state.
>
> AFAIK *alloc functions return either the memory requested or NULL.
> I wonder how realistic this case is.

OK.

>
> [...]
>
>>> +	if (unlikely(!(pool && n && size)))
>> Has this code been run through sparse?
> I use "make C=1 W=1"

OK.

>
>> I know one thing sparse looks at
>> is if NULL is being treated like a 0, and sparse does check cases when 0
>> is being used in place for NULL for pointer checks, and I'm wondering if
>> that line of code would pass.
> It's a logical AND: wouldn't NULL translate to false, rather 0?
> I can add an explicit check against NULL, it's probably more readable
> too, but I don't think that the current construct treats NULL as 0.

If it passes sparse, I think it's fine.

>
> [...]
>
>>> +	if (unlikely(pool == NULL || s == NULL))
>> Here, the right check is being done, so at the very least, I would make
>> the last line I commented on the same as this one for code continuity.
> ok
>
> [...]
>
>>> +	if (unlikely(!(pool && chunk)))
>> Please make this check the same as the last line I commented on,
>> especially since it's the same struct being checked.
> yes

OK :-)


>
> [...]
>
>>> +	if (!name) {
>>> +		WARN_ON(1);
>> ??  Maybe the name check should be in WARN_ON()?
> true :-(

OK.


>
> [...]
>
>>> +	if (unlikely(!req_size || !pool))
>> same unlikely() check problem mentioned before.
>>> +		return -1;
>> Can we use an errno value instead for better diagnosibility?
>>> +
>>> +	data = pool->data;
>>> +
>>> +	if (data == NULL)
>>> +		return -1;
>> Same here (ENOMEM or ENXIO comes to mind).
>>> +
>>> +	if (unlikely(data->protected)) {
>>> +		WARN_ON(1);
>> Maybe re-write this with the check inside WARN_ON()?
>>> +		return -1;
>> Same here, how about a different errno value for this case?
> yes, to all of the above

OK.

>
> [...]
>
>>> +static void pmalloc_chunk_set_protection(struct gen_pool *pool,
>>> +
>>> +					 struct gen_pool_chunk *chunk,
>>> +					 void *data)
>>> +{
>>> +	const bool *flag = data;
>>> +	size_t chunk_size = chunk->end_addr + 1 - chunk->start_addr;
>>> +	unsigned long pages = chunk_size / PAGE_SIZE;
>>> +
>>> +	BUG_ON(chunk_size & (PAGE_SIZE - 1));
>> Re-think WARN_ON() for BUG_ON()?  And also check chunk as well, as it's
>> being used below?
> ok
>
>>> +
>>> +	if (*flag)
>>> +		set_memory_ro(chunk->start_addr, pages);
>>> +	else
>>> +		set_memory_rw(chunk->start_addr, pages);
>>> +}
>>> +
>>> +static int pmalloc_pool_set_protection(struct gen_pool *pool, bool protection)
>>> +{
>>> +	struct pmalloc_data *data;
>>> +	struct gen_pool_chunk *chunk;
>>> +
>>> +	if (unlikely(!pool))
>>> +		return -EINVAL;
>> This is example of what I'd perfer seeing in check_alloc_params().
> yes
>
>>> +
>>> +	data = pool->data;
>>> +
>>> +	if (unlikely(!data))
>>> +		return -EINVAL;
>> ENXIO or EIO or ENOMEM sound better?
> Why? At least based on he description from errno-base.h, EINVAL seemed
> the most appropriate:
>
> #define	EIO		 5	/* I/O error */
> #define	ENXIO		 6	/* No such device or address */
> #define	ENOMEM		12	/* Out of memory */
>
> #define	EINVAL		22	/* Invalid argument */
>
> If I was really pressed to change it, I'd rather pick:
>
> #define	EFAULT		14	/* Bad address */
>

OK, very reasonable.


>>> +
>>> +	if (unlikely(data->protected == protection)) {
>>> +		WARN_ON(1);
>> Better to put the check inside WARN_ON, me thinks...
> yes, I have no idea why I wrote it like that  :-(

No worries :-).


>
>>> +		return 0;
>>> +	}
>>> +
>>> +	data->protected = protection;
>>> +	list_for_each_entry(chunk, &(pool)->chunks, next_chunk)
>>> +		pmalloc_chunk_set_protection(pool, chunk, &protection);
>>> +	return 0;
>>> +}
>>> +
>>> +int pmalloc_protect_pool(struct gen_pool *pool)
>>> +{
>>> +	return pmalloc_pool_set_protection(pool, true);
>> Is pool == NULL being checked somewhere, similar to previous functions
>> in this patch?
> right.

OK.


>
>>> +}
>>> +
>>> +
>>> +static void pmalloc_chunk_free(struct gen_pool *pool,
>>> +			       struct gen_pool_chunk *chunk, void *data)
>>> +{
>> Wat is 'data' being used for? Looks unused.  Should parameters be
>> checked, like other ones?
>
> This is the iterator that is passed to genalloc
> genalloc defines the format for the iterator, because it *will* want to
> pass the pointer to the opaque data structure.

OK.


>
>>> +	untag_chunk(chunk);
>>> +	gen_pool_flush_chunk(pool, chunk);
>>> +	vfree_atomic((void *)chunk->start_addr);
>>> +}
>>> +
>>> +
>>> +int pmalloc_destroy_pool(struct gen_pool *pool)
>>> +{
>>> +	struct pmalloc_data *data;
>>> +
>>> +	if (unlikely(pool == NULL))
>>> +		return -EINVAL;
>>> +
>>> +	data = pool->data;
>>> +
>>> +	if (unlikely(data == NULL))
>>> +		return -EINVAL;
>> I'd use a different errno value since you already used it for pool.
> Thinking more about this, how about collapsing them?
> I do need to check for the value of data, before I dereference it,
> however the causes are very different:
>
> * wrong pool pointer - definitely possible
> * corrupted data structure (data member overwritten) - highly unlikely


That sounds good.


>>> +
>>> +	mutex_lock(&pmalloc_mutex);
>>> +	list_del(&data->node);
>>> +	mutex_unlock(&pmalloc_mutex);
>>> +
>>> +	if (likely(data->pool_kobject))
>>> +		pmalloc_disconnect(data, data->pool_kobject);
>>> +
>>> +	pmalloc_pool_set_protection(pool, false);
>>> +	gen_pool_for_each_chunk(pool, pmalloc_chunk_free, NULL);
>>> +	gen_pool_destroy(pool);
>>> +	kfree(data);
>> Does data need to be set to NULL in this case, as data is a member of
>> pool (pool->data)?  I'm worried about dangling pointer scenarios which
>> probably isn't good for security modules?
> The pool was destroyed in the previous step.
> There is nothing left that can be set to NULL.
> Unless we are expecting an use-after-free of the pool structure.
> But everything it was referring to is gone as well.
>
> If we really want to go after this case, then basically everything that
> has been allocated should be poisoned before being freed.
>
> Isn't it a bit too much?

I'd ask one of the maintainers (James or Serge) if it's too much, this 
is the Linux Security Module after all.  But it also could be a kernel 
hardening thing to do?

>
>>> +	return 0;
>>> +}
>>> +
>>> +/*
>>> + * When the sysfs is ready to receive registrations, connect all the
>>> + * pools previously created. Also enable further pools to be connected
>>> + * right away.
>>> + */
>>> +static int __init pmalloc_late_init(void)
>>> +{
>>> +	struct pmalloc_data *data, *n;
>>> +
>>> +	pmalloc_kobject = kobject_create_and_add("pmalloc", kernel_kobj);
>>> +
>>> +	mutex_lock(&pmalloc_mutex);
>>> +	pmalloc_list = &pmalloc_final_list;
>>> +
>>> +	if (likely(pmalloc_kobject != NULL)) {
>>> +		list_for_each_entry_safe(data, n, &pmalloc_tmp_list, node) {
>>> +			list_move(&data->node, &pmalloc_final_list);
>>> +			pmalloc_connect(data);
>>> +		}
>>> +	}
>> It would be nice to have the init() return an error value in case of
>> failure.
> ok
>
> --
> igor

Thanks,
Jay

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 7/7] Documentation for Pmalloc
  2018-02-26 15:39     ` Igor Stoppa
@ 2018-02-26 18:32       ` J Freyensee
  0 siblings, 0 replies; 32+ messages in thread
From: J Freyensee @ 2018-02-26 18:32 UTC (permalink / raw)
  To: Igor Stoppa, david, willy, keescook, mhocko
  Cc: labbott, linux-security-module, linux-mm, linux-kernel, kernel-hardening

[...]

On 2/26/18 7:39 AM, Igor Stoppa wrote:
>
> On 24/02/18 02:26, J Freyensee wrote:
>>
>> On 2/23/18 6:48 AM, Igor Stoppa wrote:
> [...]
>
>>> +- Before destroying a pool, all the memory allocated from it must be
>>> +  released.
>> Is that true?  pmalloc_destroy_pool() has:
>>
>> .
>> .
>> +    pmalloc_pool_set_protection(pool, false);
>> +    gen_pool_for_each_chunk(pool, pmalloc_chunk_free, NULL);
>> +    gen_pool_destroy(pool);
>> +    kfree(data);
>>
>> which to me looks like is the opposite, the data (ie, "memory") is being
>> released first, then the pool is destroyed.
> well, this is embarrassing ... yes I had this prototype code, because I
> was wondering if it wouldn't make more sense to tear down the pool as
> fast as possible. It slipped in, apparently.
>
> I'm actually tempted to leave it in and fix the comment.

Sure, one or the other.

>
> [...]
>
>>> +
>>> +- pmalloc does not provide locking support with respect to allocating vs
>>> +  protecting an individual pool, for performance reasons.
>> What is the recommendation to using locks then, as the computing
>> real-world mainly operates in multi-threaded/process world?
> How common are multi-threaded allocations of write-once memory?
> Here we are talking exclusively about the part of the memory life-cycle
> where it is allocated (from pmalloc).

Yah, that's true, good point.

>
>> Maybe show
>> an example of an issue that occur if locks aren't used and give a coding
>> example.
> An example of how to use a mutex to access a shared resource? :-O
>
> This part below, under your question, was supposed to be the answer :-(
>
>>> +  It is recommended not to share the same pool between unrelated functions.
>>> +  Should sharing be a necessity, the user of the shared pool is expected
>>> +  to implement locking for that pool.

My bad, I was suggesting a code sample, if there was a simple code 
sample to provide (like 5-10 lines?).  If it's a lot of code to write, 
no bother.

> [...]
>
>>> +- pmalloc uses genalloc to optimize the use of the space it allocates
>>> +  through vmalloc. Some more TLB entries will be used, however less than
>>> +  in the case of using vmalloc directly. The exact number depends on the
>>> +  size of each allocation request and possible slack.
>>> +
>>> +- Considering that not much data is supposed to be dynamically allocated
>>> +  and then marked as read-only, it shouldn't be an issue that the address
>>> +  range for pmalloc is limited, on 32-bit systems.
>> Why is 32-bit systems mentioned and not 64-bit?
> Because, as written, on 32 bit system the vmalloc range is relatively
> small, so one might wonder if there are enough addresses.
>
>>    Is there a problem with 64-bit here?
> Quite the opposite.
> I thought it was clear, but obviously it isn't, I'll reword this.

Sounds good, thank you,
Jay

>
> -igor
>
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/7] genalloc: track beginning of allocations
  2018-02-26 17:32       ` J Freyensee
@ 2018-02-26 18:44         ` Igor Stoppa
  0 siblings, 0 replies; 32+ messages in thread
From: Igor Stoppa @ 2018-02-26 18:44 UTC (permalink / raw)
  To: J Freyensee, david, willy, keescook, mhocko
  Cc: labbott, linux-security-module, linux-mm, linux-kernel, kernel-hardening



On 26/02/18 19:32, J Freyensee wrote:
> My replies also inlined.
> 
> On 2/26/18 4:09 AM, Igor Stoppa wrote:

[...]

> But some of the code looks API'like to me, partly because of 
> all the function header documentation, which thank you for that, but I 
> wasn't sure where you drew your "API line" where the checks would be.

static and, even more, inlined static functions are not API, if found in
the .c file.


>> Is it assuming too much that the function will be used correctly, inside
>> the module it belongs to?
>>
>> And even at API level, I'd tend to say that if there are chances that
>> the data received is corrupted, then it should be sanitized, but otherwise,
>> why adding overhead?
> 
> It's good secure coding practice to check your parameters, you are 
> adding code to a security module after all ;-).

genalloc is not a security module :-P

it seems to be used in various places and for different purposes, also
depending on the architecture

For this reason I'm reluctant to add overhead.

> If it's brand-new code entering the kernel, it's better to err on the 
> side of having the extra checks and have a maintainer tell you to remove 
> it than the other way around- especially since this code is part of the 
> LSM solution.  What's worse- a tad bit of overhead catching a 
> corner-case scenario that can be more easily fixed or something not 
> caught that makes the kernel unstable?

ok, fair enough

[...]

>> If I really have to pick a place where to do the test, it's at API
>> level,
> 
> I agree, and if that is the case, I'm fine.

so I'll make sue that the API does sanitation

>> where the user of the API might fail to notice that the creation
>> of a pool failed and try to get memory from a non-existing pool.
>> That is the only scenario I can think of, where bogus data would be
>> received.
>>
>>>>    	return chunk->end_addr - chunk->start_addr + 1;
>>>>    }
>>>>    
>>>> -static int set_bits_ll(unsigned long *addr, unsigned long mask_to_set)
>>>> +
>>>> +/**
>>>> + * set_bits_ll() - based on value and mask, sets bits at address
>>>> + * @addr: where to write
>>>> + * @mask: filter to apply for the bits to alter
>>>> + * @value: actual configuration of bits to store
>>>> + *
>>>> + * Return:
>>>> + * * 0		- success
>>>> + * * -EBUSY	- otherwise
>>>> + */
>>>> +static int set_bits_ll(unsigned long *addr,
>>>> +		       unsigned long mask, unsigned long value)
>>>>    {
>>>> -	unsigned long val, nval;
>>>> +	unsigned long nval;
>>>> +	unsigned long present;
>>>> +	unsigned long target;
>>>>    
>>>>    	nval = *addr;
>>> Same issue here with addr.
>> Again, I am more leaning toward believing that the user of the API might
>> forget to check for errors,
> 
> Same in agreement, so if that is the case, I'm ok.  It was a little hard 
> to tell what is exactly your API is.  I'm used to reviewing kernel code 
> where important API-like functions were heavily documented, and inner 
> routines were not...so seeing the function documentation (which is a 
> good thing :-)) made me think this was some sort of new API code I was 
> looking at.
it's static, therefore no API

> 
>> and pass a NULL pointer as pool, than to
>> believe something like this would happen.
>>
>> This is an address obtained from data managed automatically by the library.
>>
>> Can you please explain why you think it would be NULL?
> 
> Why would it be NULL?  I don't know, I'm not intimately familiar with 
> the code; but I default to implementing code defensively.  But I'll turn 
> the question around on you- why would it NOT be NULL?  Are you sure this 
> will never be NULL?  Are you going to trust the library that it always 
> provides a good address?  You should add to your function header 
> documentation why addr will NOT be NULL.

ok, I can add the explanation
which is: the corresponding memory is allocated when a pool is created.
should the allocation fail, the pool creation will fail consequently

The only cases which can cause this to be NULL within a pool are:
* accidental corruption
* attacker tampering with kernel memory

However they are both quite unlikely:
* accidental corruption should not happen so easily and, in case it
happens, it's likely to plow also some surrounding memory.
* this is just metadata, supposed to be useful mostly before the pool is
write-protected. If an attacker is capable of altering arbitrary kernel
data memory, there are far better targets.

[...]

>>>> +	/*
>>>> +	 * Prepare for writing the initial part of the allocation, from
>>>> +	 * starting entry, to the end of the UL bitmap element which
>>>> +	 * contains it. It might be larger than the actual allocation.
>>>> +	 */
>>>> +	start_bit = ENTRIES_TO_BITS(start_entry);
>>>> +	end_bit = ENTRIES_TO_BITS(start_entry + nentries);
>>>> +	nbits = ENTRIES_TO_BITS(nentries);
>>> these statements won't make any sense if start_entry and nentries are
>>> negative values, which is possible based on the function definition
>>> alter_bitmap_ll().  Am I missing something that it's ok for these
>>> parameters to be negative?
>> This patch is extending the handling of the bitmap, it's not trying to
>> rewrite genalloc, thus it tries to not alter parts which are unrelated.
>> Like the type of parameters passed.
>>
>> What you are suggesting is a further cleanup of genalloc.
>> I'm not against it, but it's unrelated to this patchset.
> 
> OK, very reasonable.  Then I would think this would be a case to add a 
> check for negative values in the function parameters start_entry and 
> nentries as it's possible (though maybe not realistic) to have negative 
> values supplied, especially if there is currently no active maintainer 
> for genalloc().  Since you are fitting new code to genalloc's behavior 
> and this is a security module, I'll err on the side of checking the 
> parameters for bad values, or document in your function header comments 
> why it is expected for these parameters to never have negative values.

I'll figure out which alternative I dislike the least :-)

Probably just fix the data types in a separate patch.
This patch for genalloc has generated various comments which are
actually more about the original implementation than what I'm adding.


--
igor

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/7] genalloc: selftest
  2018-02-26 18:00         ` Igor Stoppa
@ 2018-02-26 19:12           ` Matthew Wilcox
  2018-02-26 19:26             ` Igor Stoppa
  0 siblings, 1 reply; 32+ messages in thread
From: Matthew Wilcox @ 2018-02-26 19:12 UTC (permalink / raw)
  To: Igor Stoppa
  Cc: J Freyensee, david, keescook, mhocko, labbott,
	linux-security-module, linux-mm, linux-kernel, kernel-hardening

On Mon, Feb 26, 2018 at 08:00:26PM +0200, Igor Stoppa wrote:
> On 26/02/18 19:46, J Freyensee wrote:
> > That's a good question.  Based upon those articles, 'yes'.  But it seems 
> > like a 'darned-if-you-do, darned-if-you-don't' question as couldn't you 
> > also corrupt a mounted filesystem by crashing the kernel, yes/no?
> 
> The idea is to do it very early in the boot phase, before early init,
> when the kernel has not gotten even close to any storage device.
> 
> > If you really want a system crash, maybe just do a panic() like 
> > filesystems also use?
> 
> ok, if that's a more acceptable way to halt the kernel, I do not mind.

panic() halts the kernel
BUG_ON() kills the thread
WARN_ON() just prints messages

Now, if we're at boot time and we're still executing code from the init
thread, killing init is equivalent to halting the kernel.

The question is, what is appropriate for test modules?  I would say
WARN_ON is not appropriate because people ignore warnings.  BUG_ON is
reasonable for development.  panic() is probably not.

Also, calling BUG_ON while holding a lock is not a good idea; if anything
needs to acquire that lock to shut down in a reasonable fashion, it's
going to hang.

And there's no need to do something like BUG_ON(!foo); foo->wibble = 1;
Dereferencing a NULL pointer already produces a nice informative splat.
In general, we assume other parts of the kernel are sane and if they pass
us a NULL pool, it's no good returning -EINVAL, we may as well just oops
and let somebody else debug it.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/7] genalloc: selftest
  2018-02-26 19:12           ` Matthew Wilcox
@ 2018-02-26 19:26             ` Igor Stoppa
  0 siblings, 0 replies; 32+ messages in thread
From: Igor Stoppa @ 2018-02-26 19:26 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: J Freyensee, david, keescook, mhocko, labbott,
	linux-security-module, linux-mm, linux-kernel, kernel-hardening

On 26/02/18 21:12, Matthew Wilcox wrote:
[...]

> panic() halts the kernel
> BUG_ON() kills the thread
> WARN_ON() just prints messages
> 
> Now, if we're at boot time and we're still executing code from the init
> thread, killing init is equivalent to halting the kernel.
> 
> The question is, what is appropriate for test modules?  I would say
> WARN_ON is not appropriate because people ignore warnings.  BUG_ON is
> reasonable for development.  panic() is probably not.

Ok, so I can leave WARN_ON() in the libraries, and keep the more
restrictive BUG_ON() for the self test, which is optional for both
genalloc and pmalloc.

> Also, calling BUG_ON while holding a lock is not a good idea; if anything
> needs to acquire that lock to shut down in a reasonable fashion, it's
> going to hang.
> 
> And there's no need to do something like BUG_ON(!foo); foo->wibble = 1;
> Dereferencing a NULL pointer already produces a nice informative splat.
> In general, we assume other parts of the kernel are sane and if they pass
> us a NULL pool, it's no good returning -EINVAL, we may as well just oops
> and let somebody else debug it.

Great, that makes the code even simpler.

--
igor

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 5/7] Pmalloc selftest
  2018-02-23 14:48 ` [PATCH 5/7] Pmalloc selftest Igor Stoppa
@ 2018-03-06 16:59   ` J Freyensee
  0 siblings, 0 replies; 32+ messages in thread
From: J Freyensee @ 2018-03-06 16:59 UTC (permalink / raw)
  To: Igor Stoppa, david, willy, keescook, mhocko
  Cc: labbott, linux-security-module, linux-mm, linux-kernel, kernel-hardening

Looks good,

Reviewed-by: Jay Freyensee <why2jjj.linux@gmail.com>


On 2/23/18 6:48 AM, Igor Stoppa wrote:
> Add basic self-test functionality for pmalloc.
>
> The testing is introduced as early as possible, right after the main
> dependency, genalloc, has passed successfully, so that it can help
> diagnosing failures in pmalloc users.
>
> Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com>
> ---
>   include/linux/test_pmalloc.h | 24 ++++++++++++++
>   init/main.c                  |  2 ++
>   mm/Kconfig                   |  9 +++++
>   mm/Makefile                  |  1 +
>   mm/test_pmalloc.c            | 79 ++++++++++++++++++++++++++++++++++++++++++++
>   5 files changed, 115 insertions(+)
>   create mode 100644 include/linux/test_pmalloc.h
>   create mode 100644 mm/test_pmalloc.c
>
> diff --git a/include/linux/test_pmalloc.h b/include/linux/test_pmalloc.h
> new file mode 100644
> index 000000000000..c7e2e451c17c
> --- /dev/null
> +++ b/include/linux/test_pmalloc.h
> @@ -0,0 +1,24 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * test_pmalloc.h
> + *
> + * (C) Copyright 2018 Huawei Technologies Co. Ltd.
> + * Author: Igor Stoppa <igor.stoppa@huawei.com>
> + */
> +
> +
> +#ifndef __LINUX_TEST_PMALLOC_H
> +#define __LINUX_TEST_PMALLOC_H
> +
> +
> +#ifdef CONFIG_TEST_PROTECTABLE_MEMORY
> +
> +void test_pmalloc(void);
> +
> +#else
> +
> +static inline void test_pmalloc(void){};
> +
> +#endif
> +
> +#endif
> diff --git a/init/main.c b/init/main.c
> index bfccf1fd463c..a61e3a5fd321 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -90,6 +90,7 @@
>   #include <linux/cache.h>
>   #include <linux/rodata_test.h>
>   #include <linux/test_genalloc.h>
> +#include <linux/test_pmalloc.h>
>   
>   #include <asm/io.h>
>   #include <asm/bugs.h>
> @@ -662,6 +663,7 @@ asmlinkage __visible void __init start_kernel(void)
>   	mem_encrypt_init();
>   
>   	test_genalloc();
> +	test_pmalloc();
>   #ifdef CONFIG_BLK_DEV_INITRD
>   	if (initrd_start && !initrd_below_start_ok &&
>   	    page_to_pfn(virt_to_page((void *)initrd_start)) < min_low_pfn) {
> diff --git a/mm/Kconfig b/mm/Kconfig
> index be578fbdce6d..81dcc5345631 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -766,3 +766,12 @@ config PROTECTABLE_MEMORY
>       depends on ARCH_HAS_SET_MEMORY
>       select GENERIC_ALLOCATOR
>       default y
> +
> +config TEST_PROTECTABLE_MEMORY
> +	bool "Run self test for pmalloc memory allocator"
> +	depends on ARCH_HAS_SET_MEMORY
> +	select PROTECTABLE_MEMORY
> +	default n
> +	help
> +	  Tries to verify that pmalloc works correctly and that the memory
> +	  is effectively protected.
> diff --git a/mm/Makefile b/mm/Makefile
> index 959fdbdac118..1de4be5fd0bc 100644
> --- a/mm/Makefile
> +++ b/mm/Makefile
> @@ -66,6 +66,7 @@ obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o
>   obj-$(CONFIG_SLOB) += slob.o
>   obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o
>   obj-$(CONFIG_PROTECTABLE_MEMORY) += pmalloc.o
> +obj-$(CONFIG_TEST_PROTECTABLE_MEMORY) += test_pmalloc.o
>   obj-$(CONFIG_KSM) += ksm.o
>   obj-$(CONFIG_PAGE_POISONING) += page_poison.o
>   obj-$(CONFIG_SLAB) += slab.o
> diff --git a/mm/test_pmalloc.c b/mm/test_pmalloc.c
> new file mode 100644
> index 000000000000..5de21c462d3d
> --- /dev/null
> +++ b/mm/test_pmalloc.c
> @@ -0,0 +1,79 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * test_pmalloc.c
> + *
> + * (C) Copyright 2018 Huawei Technologies Co. Ltd.
> + * Author: Igor Stoppa <igor.stoppa@huawei.com>
> + */
> +
> +#include <linux/pmalloc.h>
> +#include <linux/mm.h>
> +#include <linux/test_pmalloc.h>
> +
> +#define SIZE_1 (PAGE_SIZE * 3)
> +#define SIZE_2 1000
> +
> +#define validate_alloc(expected, variable, size)	\
> +	pr_notice("must be " expected ": %s",		\
> +		  is_pmalloc_object(variable, size) > 0 ? "ok" : "no")
> +
> +#define is_alloc_ok(variable, size)	\
> +	validate_alloc("ok", variable, size)
> +
> +#define is_alloc_no(variable, size)	\
> +	validate_alloc("no", variable, size)
> +
> +void test_pmalloc(void)
> +{
> +	struct gen_pool *pool_unprot;
> +	struct gen_pool *pool_prot;
> +	void *var_prot, *var_unprot, *var_vmall;
> +
> +	pr_notice("pmalloc-selftest");
> +	pool_unprot = pmalloc_create_pool("unprotected", 0);
> +	if (unlikely(!pool_unprot))
> +		goto error;
> +	pool_prot = pmalloc_create_pool("protected", 0);
> +	if (unlikely(!(pool_prot)))
> +		goto error_release;
> +
> +	var_unprot = pmalloc(pool_unprot,  SIZE_1 - 1, GFP_KERNEL);
> +	var_prot = pmalloc(pool_prot,  SIZE_1, GFP_KERNEL);
> +	*(int *)var_prot = 0;
> +	var_vmall = vmalloc(SIZE_2);
> +	is_alloc_ok(var_unprot, 10);
> +	is_alloc_ok(var_unprot, SIZE_1);
> +	is_alloc_ok(var_unprot, PAGE_SIZE);
> +	is_alloc_no(var_unprot, SIZE_1 + 1);
> +	is_alloc_no(var_vmall, 10);
> +
> +
> +	pfree(pool_unprot, var_unprot);
> +	vfree(var_vmall);
> +
> +	pmalloc_protect_pool(pool_prot);
> +
> +	/*
> +	 * This will intentionally trigger a WARN because the pool being
> +	 * destroyed is not protected, which is unusual and should happen
> +	 * on error paths only, where probably other warnings are already
> +	 * displayed.
> +	 */
> +	pr_notice("pmalloc-selftest:"
> +		  " Expect WARN in pmalloc_pool_set_protection below.");
> +	pmalloc_destroy_pool(pool_unprot);
> +	pr_notice("pmalloc-selftest:"
> +		  "Critical point for expected WARN passed.");
> +
> +	/* This must not cause WARNings */
> +	pr_notice("pmalloc-selftest:"
> +		  "Expect no WARN below.");
> +	pmalloc_destroy_pool(pool_prot);
> +	pr_notice("pmalloc-selftest:"
> +		  "Critical point for unexpected WARN passed.");
> +	return;
> +error_release:
> +	pmalloc_destroy_pool(pool_unprot);
> +error:
> +	WARN(true, "Unable to allocate memory for pmalloc selftest.");
> +}

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 6/7] lkdtm: crash on overwriting protected pmalloc var
  2018-02-23 14:48 ` [PATCH 6/7] lkdtm: crash on overwriting protected pmalloc var Igor Stoppa
  2018-02-25  3:46   ` kbuild test robot
@ 2018-03-06 17:05   ` J Freyensee
  2018-03-06 17:08     ` J Freyensee
  1 sibling, 1 reply; 32+ messages in thread
From: J Freyensee @ 2018-03-06 17:05 UTC (permalink / raw)
  To: Igor Stoppa, david, willy, keescook, mhocko
  Cc: labbott, linux-security-module, linux-mm, linux-kernel, kernel-hardening


>   
> +#ifdef CONFIG_PROTECTABLE_MEMORY
> +void lkdtm_WRITE_RO_PMALLOC(void)
> +{
> +	struct gen_pool *pool;
> +	int *i;
> +
> +	pool = pmalloc_create_pool("pool", 0);
> +	if (unlikely(!pool)) {
> +		pr_info("Failed preparing pool for pmalloc test.");
> +		return;
> +	}
> +
> +	i = (int *)pmalloc(pool, sizeof(int), GFP_KERNEL);
> +	if (unlikely(!i)) {
> +		pr_info("Failed allocating memory for pmalloc test.");
> +		pmalloc_destroy_pool(pool);
> +		return;
> +	}
> +
> +	*i = INT_MAX;
> +	pmalloc_protect_pool(pool);
> +
> +	pr_info("attempting bad pmalloc write at %p\n", i);
> +	*i = 0;

Seems harmless, but I don't get why *i local variable needs to be set to 
0 at the end of this function.


Otherwise,

Reviewed-by: Jay Freyensee <why2jjj.linux@gmail.com>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 6/7] lkdtm: crash on overwriting protected pmalloc var
  2018-03-06 17:05   ` J Freyensee
@ 2018-03-06 17:08     ` J Freyensee
  0 siblings, 0 replies; 32+ messages in thread
From: J Freyensee @ 2018-03-06 17:08 UTC (permalink / raw)
  To: Igor Stoppa, david, willy, keescook, mhocko
  Cc: labbott, linux-security-module, linux-mm, linux-kernel, kernel-hardening



On 3/6/18 9:05 AM, J Freyensee wrote:
>
>>   +#ifdef CONFIG_PROTECTABLE_MEMORY
>> +void lkdtm_WRITE_RO_PMALLOC(void)
>> +{
>> +    struct gen_pool *pool;
>> +    int *i;
>> +
>> +    pool = pmalloc_create_pool("pool", 0);
>> +    if (unlikely(!pool)) {
>> +        pr_info("Failed preparing pool for pmalloc test.");
>> +        return;
>> +    }
>> +
>> +    i = (int *)pmalloc(pool, sizeof(int), GFP_KERNEL);
>> +    if (unlikely(!i)) {
>> +        pr_info("Failed allocating memory for pmalloc test.");
>> +        pmalloc_destroy_pool(pool);
>> +        return;
>> +    }
>> +
>> +    *i = INT_MAX;
>> +    pmalloc_protect_pool(pool);
>> +
>> +    pr_info("attempting bad pmalloc write at %p\n", i);
>> +    *i = 0;
>

Opps, disregard this, this is the last series of this patch series, not 
the most recent one :-(.



> Seems harmless, but I don't get why *i local variable needs to be set 
> to 0 at the end of this function.
>
>
> Otherwise,
>
> Reviewed-by: Jay Freyensee <why2jjj.linux@gmail.com>
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2018-03-06 17:08 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-02-23 14:48 [RFC PATCH v17 0/7] mm: security: ro protection for dynamic data Igor Stoppa
2018-02-23 14:48 ` [PATCH 1/7] genalloc: track beginning of allocations Igor Stoppa
2018-02-23 22:28   ` J Freyensee
2018-02-26 12:09     ` Igor Stoppa
2018-02-26 17:32       ` J Freyensee
2018-02-26 18:44         ` Igor Stoppa
2018-02-25  3:37   ` kbuild test robot
2018-02-23 14:48 ` [PATCH 2/7] genalloc: selftest Igor Stoppa
2018-02-23 22:42   ` J Freyensee
2018-02-26 12:11     ` Igor Stoppa
2018-02-26 17:46       ` J Freyensee
2018-02-26 18:00         ` Igor Stoppa
2018-02-26 19:12           ` Matthew Wilcox
2018-02-26 19:26             ` Igor Stoppa
2018-02-23 14:48 ` [PATCH 3/7] struct page: add field for vm_struct Igor Stoppa
2018-02-25  3:38   ` Matthew Wilcox
2018-02-26 16:37     ` Igor Stoppa
2018-02-23 14:48 ` [PATCH 4/7] Protectable Memory Igor Stoppa
2018-02-24  0:10   ` J Freyensee
2018-02-26 14:28     ` Igor Stoppa
2018-02-26 18:25       ` J Freyensee
2018-02-25  2:33   ` kbuild test robot
2018-02-23 14:48 ` [PATCH 5/7] Pmalloc selftest Igor Stoppa
2018-03-06 16:59   ` J Freyensee
2018-02-23 14:48 ` [PATCH 6/7] lkdtm: crash on overwriting protected pmalloc var Igor Stoppa
2018-02-25  3:46   ` kbuild test robot
2018-03-06 17:05   ` J Freyensee
2018-03-06 17:08     ` J Freyensee
2018-02-23 14:48 ` [PATCH 7/7] Documentation for Pmalloc Igor Stoppa
2018-02-24  0:26   ` J Freyensee
2018-02-26 15:39     ` Igor Stoppa
2018-02-26 18:32       ` J Freyensee

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).