* [RFC PATCH v15 0/6] mm: security: ro protection for dynamic data
@ 2018-02-11  3:19 ` Igor Stoppa
  0 siblings, 0 replies; 146+ messages in thread
From: Igor Stoppa @ 2018-02-11  3:19 UTC (permalink / raw)
  To: willy, rdunlap, corbet, keescook, mhocko, labbott, jglisse, hch
  Cc: cl, linux-security-module, linux-mm, linux-kernel,
	kernel-hardening, Igor Stoppa

This patch-set makes it possible to write-protect memory that has
been allocated dynamically.

The memory is managed in pools: when a memory pool is made R/O,
all the memory belonging to it becomes R/O.

An R/O pool can be destroyed to reclaim its memory, but it cannot be
turned back into R/W mode.

This is intentional: the feature is meant for data that doesn't need
further modifications after initialization.

However, the data might still need to be released, for example when a
module is unloaded.
To do this, the memory must first be freed; then the pool can be destroyed.
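
As a rough illustration of this lifecycle (allocate and initialize, write-protect, then release), here is a user-space analogy built on POSIX mmap()/mprotect(). The ro_pool_* names are hypothetical and this is not the pmalloc API added by this series; the one-way transition is modeled simply by not offering any "unprotect" operation.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical user-space analogue of a protectable pool (not pmalloc). */
struct ro_pool {
	unsigned char *base;	/* page-aligned backing area */
	size_t size;		/* size of the area, rounded up to whole pages */
	size_t used;		/* bump-allocation offset */
	bool is_ro;		/* set once, never cleared */
};

static int ro_pool_init(struct ro_pool *pool, size_t size)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	void *p;

	size = (size + page - 1) / page * page;
	p = mmap(NULL, size, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;
	pool->base = p;
	pool->size = size;
	pool->used = 0;
	pool->is_ro = false;
	return 0;
}

/* Allocation is only possible while the pool is still writable. */
static void *ro_pool_alloc(struct ro_pool *pool, size_t n)
{
	void *p;

	n = (n + 7) & ~(size_t)7;	/* keep 8-byte alignment */
	if (pool->is_ro || n > pool->size - pool->used)
		return NULL;
	p = pool->base + pool->used;
	pool->used += n;
	return p;
}

/* One-way transition: there is deliberately no way back to R/W. */
static int ro_pool_protect(struct ro_pool *pool)
{
	if (mprotect(pool->base, pool->size, PROT_READ))
		return -1;
	pool->is_ro = true;
	return 0;
}

/* Releases the whole area; any pointer into it becomes invalid. */
static int ro_pool_destroy(struct ro_pool *pool)
{
	return munmap(pool->base, pool->size);
}
```

Unlike the kernel feature, this sketch releases everything at once in ro_pool_destroy() instead of freeing the individual allocations first. Any write through a pointer into the pool after ro_pool_protect() would fault (SIGSEGV), while reads keep working.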

An example is provided in the form of a self-test.

Changes since v14:
[http://www.openwall.com/lists/kernel-hardening/2018/02/04/2]

- fix various warnings from sparse
- fix multiline comments
- fix naming of header guards
- fix compilation of individual patches, for bisect
- split genalloc documentation about bitmap for allocation
- fix comment headers to match the kernel-doc format for the "Return:" field
- fix variable naming according to coding guidelines
- fix wrong default value for pmalloc Kconfig option
- refresh integration of pmalloc with hardened usercopy
- remove unnecessary include that was causing compilation failures
- change license of pmalloc documentation from GPL 2.0 to CC-BY-SA-4.0

Igor Stoppa (6):
  genalloc: track beginning of allocations
  genalloc: selftest
  struct page: add field for vm_struct
  Protectable Memory
  Pmalloc: self-test
  Documentation for Pmalloc

 Documentation/core-api/index.rst   |   1 +
 Documentation/core-api/pmalloc.rst | 114 ++++++++
 include/linux/genalloc-selftest.h  |  26 ++
 include/linux/genalloc.h           |   7 +-
 include/linux/mm_types.h           |   1 +
 include/linux/pmalloc.h            | 222 +++++++++++++++
 include/linux/vmalloc.h            |   1 +
 init/main.c                        |   2 +
 lib/Kconfig                        |  15 +
 lib/Makefile                       |   1 +
 lib/genalloc-selftest.c            | 400 ++++++++++++++++++++++++++
 lib/genalloc.c                     | 554 +++++++++++++++++++++++++++----------
 mm/Kconfig                         |  15 +
 mm/Makefile                        |   2 +
 mm/pmalloc-selftest.c              |  63 +++++
 mm/pmalloc-selftest.h              |  24 ++
 mm/pmalloc.c                       | 499 +++++++++++++++++++++++++++++++++
 mm/usercopy.c                      |  33 +++
 mm/vmalloc.c                       |  18 +-
 19 files changed, 1852 insertions(+), 146 deletions(-)
 create mode 100644 Documentation/core-api/pmalloc.rst
 create mode 100644 include/linux/genalloc-selftest.h
 create mode 100644 include/linux/pmalloc.h
 create mode 100644 lib/genalloc-selftest.c
 create mode 100644 mm/pmalloc-selftest.c
 create mode 100644 mm/pmalloc-selftest.h
 create mode 100644 mm/pmalloc.c

-- 
2.14.1

* [PATCH 1/6] genalloc: track beginning of allocations
@ 2018-02-11  3:19   ` Igor Stoppa
  -1 siblings, 0 replies; 146+ messages in thread
From: Igor Stoppa @ 2018-02-11  3:19 UTC (permalink / raw)
  To: willy, rdunlap, corbet, keescook, mhocko, labbott, jglisse, hch
  Cc: cl, linux-security-module, linux-mm, linux-kernel,
	kernel-hardening, Igor Stoppa

The genalloc library is only capable of tracking whether a certain unit of
allocation is in use or not.

It cannot tell where the memory associated with an allocation request
begins and where it ends.

The reason is that units of allocation are tracked using a bitmap, where
each bit indicates whether the unit is allocated (1) or available (0).

Users of the API must therefore keep track of how much space was
requested, if it ever needs to be freed.

This can let errors go undetected.
Examples:
* Only a subset of the memory provided to an allocation request is freed
* The memory from a subsequent allocation is freed
* The memory being freed doesn't start at the beginning of an
  allocation.
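
The following minimal model shows why such errors go unnoticed. It uses hypothetical names and one bool standing in for each bit: with a single bit per unit, a free request that straddles two allocations, or starts in the middle of one, looks perfectly valid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: one flag per allocation unit, as in a 1-bit bitmap. */
#define NUNITS 16
static bool busy[NUNITS];

/* Marks n units starting at start as used; fails if any is already used. */
static int alloc_units(size_t start, size_t n)
{
	if (start > NUNITS || n > NUNITS - start)
		return -1;
	for (size_t i = start; i < start + n; i++)
		if (busy[i])
			return -1;
	for (size_t i = start; i < start + n; i++)
		busy[i] = true;
	return 0;
}

/*
 * Frees n units starting at start. It can only check that every unit is
 * busy: it cannot tell whether [start, start + n) is one whole allocation.
 */
static int free_units(size_t start, size_t n)
{
	if (start > NUNITS || n > NUNITS - start)
		return -1;
	for (size_t i = start; i < start + n; i++)
		if (!busy[i])
			return -1;
	for (size_t i = start; i < start + n; i++)
		busy[i] = false;
	return 0;
}
```

For example, after alloc_units(0, 4) and alloc_units(4, 4), the call free_units(2, 4) succeeds even though it releases half of each allocation.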

The bitmap is used because it allows lockless read/write access,
where the hardware supports it through cmpxchg.
Similarly, it is possible to scan the bitmap for a sufficiently long
sequence of zeros, to identify zones available for allocation.
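
A lockless claim of bits within one word can be sketched with C11 atomics as below. claim_bits() is a hypothetical helper written in the spirit of set_bits_ll(), not kernel code. A compare-and-exchange succeeds only when the value found in memory equals the expected old value, and the loop condition relies on exactly that.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/*
 * Hypothetical helper: atomically sets the bits in mask, failing with
 * -EBUSY if any of them is already set in *word.
 */
static int claim_bits(_Atomic unsigned long *word, unsigned long mask)
{
	unsigned long old = atomic_load(word);

	do {
		if (old & mask)
			return -EBUSY;
		/* On failure, old is refreshed with the value now in memory. */
	} while (!atomic_compare_exchange_weak(word, &old, old | mask));
	return 0;
}
```

Two threads racing for overlapping masks cannot both succeed: the loser either observes the winner's bits in its refreshed copy of old, or retries until it does.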

This patch doubles the space reserved in the bitmap for each allocation
unit, to track where each allocation begins.

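To make the 2-bit encoding concrete, here is a simplified, single-threaded model with hypothetical names, using plain stores instead of cmpxchg. It packs 32 entries into one uint64_t, marks the first unit of an allocation as 11 and the following ones as 10, and shows how a free request can both auto-detect the length and reject an address that is not the beginning of an allocation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the 2-bit encoding; not thread-safe. */
#define ENTRY_HEAD 0x3u		/* 11: first unit of an allocation */
#define ENTRY_BODY 0x2u		/* 10: subsequent unit of an allocation */
#define ENTRY_FREE 0x0u		/* 00: available unit */
#define NENTRIES 32		/* 32 entries * 2 bits = one uint64_t */

static uint64_t map;

static unsigned get_entry(unsigned i)
{
	return (unsigned)(map >> (2 * i)) & 0x3u;
}

static void set_entry(unsigned i, unsigned v)
{
	map &= ~((uint64_t)0x3 << (2 * i));
	map |= (uint64_t)v << (2 * i);
}

/* Marks n >= 1 free entries starting at start as one allocation. */
static int mark_alloc(unsigned start, unsigned n)
{
	if (n == 0 || start + n > NENTRIES)
		return -1;
	for (unsigned i = start; i < start + n; i++)
		if (get_entry(i) != ENTRY_FREE)
			return -1;
	set_entry(start, ENTRY_HEAD);
	for (unsigned i = start + 1; i < start + n; i++)
		set_entry(i, ENTRY_BODY);
	return 0;
}

/*
 * Frees the allocation beginning at start. n == 0 means "auto-detect the
 * length"; otherwise n must match the tracked length. Returns the number
 * of entries freed, or -1 on any mismatch.
 */
static int mark_free(unsigned start, unsigned n)
{
	unsigned end = start + 1;

	if (start >= NENTRIES || get_entry(start) != ENTRY_HEAD)
		return -1;
	/* The allocation ends at the first entry that is free or a new head. */
	while (end < NENTRIES && get_entry(end) == ENTRY_BODY)
		end++;
	if (n != 0 && n != end - start)
		return -1;
	for (unsigned i = start; i < end; i++)
		set_entry(i, ENTRY_FREE);
	return (int)(end - start);
}
```

With two adjacent allocations of 4 and 2 entries, mark_free(2, 0) is rejected because entry 2 is 10 rather than 11, mark_free(0, 3) is rejected because the tracked length is 4, and mark_free(0, 0) frees exactly 4 entries.
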
For details, see the documentation inside lib/genalloc.c.

Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com>
---
 include/linux/genalloc.h |   4 +-
 lib/genalloc.c           | 527 ++++++++++++++++++++++++++++++++++-------------
 2 files changed, 390 insertions(+), 141 deletions(-)

diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
index 872f930f1b06..dcaa33e74b1c 100644
--- a/include/linux/genalloc.h
+++ b/include/linux/genalloc.h
@@ -32,7 +32,7 @@
 
 #include <linux/types.h>
 #include <linux/spinlock_types.h>
-#include <linux/atomic.h>
+#include <linux/slab.h>
 
 struct device;
 struct device_node;
@@ -76,7 +76,7 @@ struct gen_pool_chunk {
 	phys_addr_t phys_addr;		/* physical starting address of memory chunk */
 	unsigned long start_addr;	/* start address of memory chunk */
 	unsigned long end_addr;		/* end address of memory chunk (inclusive) */
-	unsigned long bits[0];		/* bitmap for allocating memory chunk */
+	unsigned long entries[0];	/* bitmap for allocating memory chunk */
 };
 
 /*
diff --git a/lib/genalloc.c b/lib/genalloc.c
index ca06adc4f445..044347163acb 100644
--- a/lib/genalloc.c
+++ b/lib/genalloc.c
@@ -26,6 +26,74 @@
  *
  * This source code is licensed under the GNU General Public License,
  * Version 2.  See the file COPYING for more details.
+ *
+ *
+ *
+ * Encoding of the bitmap tracking the allocations
+ * -----------------------------------------------
+ *
+ * The bitmap is composed of units of allocations.
+ *
+ * Each unit of allocation is represented using 2 consecutive bits.
+ *
+ * This makes it possible to encode, for each unit of allocation,
+ * information about:
+ *  - allocation status (busy/free)
+ *  - beginning of a sequence of allocation units (first / successive)
+ *
+ *
+ * Dictionary of allocation units (msb to the left, lsb to the right):
+ *
+ * 11: first allocation unit in the allocation
+ * 10: any subsequent allocation unit (if any) in the allocation
+ * 00: available allocation unit
+ * 01: invalid
+ *
+ * Example, using the same notation as above - MSb.......LSb:
+ *
+ *  ...000010111100000010101011   <-- Read in this direction.
+ *     \__|\__|\|\____|\______|
+ *        |   | |     |       \___ 4 used allocation units
+ *        |   | |     \___________ 3 empty allocation units
+ *        |   | \_________________ 1 used allocation unit
+ *        |   \___________________ 2 used allocation units
+ *        \_______________________ 2 empty allocation units
+ *
+ * The encoding allows for lockless operations, such as:
+ * - search for a sufficiently large range of allocation units
+ * - reservation of a selected range of allocation units
+ * - release of a specific allocation
+ *
+ * The search for sequences of empty allocation units (marked as zeros
+ * in the bitmap) is performed with an alignment of 2^1 bits.
+ *
+ * This means that an allocation can start only at even places
+ * (bit 0, bit 2, etc.) in the bitmap.
+ *
+ * Therefore, the number of zeroes to look for must be twice the number
+ * of desired allocation units.
+ *
+ * When it's time to free the memory associated with an allocation request,
+ * it's a matter of checking if the corresponding allocation unit is
+ * really the beginning of an allocation (both bits are set to 1).
+ *
+ * Looking for the end of an allocation can also be performed locklessly.
+ * It is sufficient to find the first allocation unit, after the first one,
+ * that is represented either as free (00) or as a new beginning (11).
+ * Even if the allocation status should change in the meantime, it
+ * doesn't matter, since it can only transition between free (00) and
+ * first-allocated (11).
+ *
+ * The parameter indicating to the *_free() function the size of the
+ * space that should be freed can be either set to 0, for automated
+ * assessment, or it can be specified explicitly.
+ *
+ * In case it is specified explicitly, the value is verified against what
+ * the library is tracking internally.
+ *
+ * If ever needed, the bitmap could be extended, assigning larger amounts
+ * of bits to each allocation unit (the increase must follow powers of 2),
+ * to track other properties of the allocations.
  */
 
 #include <linux/slab.h>
@@ -36,118 +104,230 @@
 #include <linux/genalloc.h>
 #include <linux/of_device.h>
 
+#define ENTRY_ORDER 1UL
+#define ENTRY_MASK ((1UL << ((ENTRY_ORDER) + 1UL)) - 1UL)
+#define ENTRY_HEAD ENTRY_MASK
+#define ENTRY_UNUSED 0UL
+#define BITS_PER_ENTRY (1U << ENTRY_ORDER)
+#define BITS_DIV_ENTRIES(x) ((x) >> ENTRY_ORDER)
+#define ENTRIES_TO_BITS(x) ((x) << ENTRY_ORDER)
+#define BITS_DIV_LONGS(x) ((x) / BITS_PER_LONG)
+#define ENTRIES_DIV_LONGS(x) (BITS_DIV_LONGS(ENTRIES_TO_BITS(x)))
+
+#define ENTRIES_PER_LONG BITS_DIV_ENTRIES(BITS_PER_LONG)
+
+/* Binary pattern of 1010...1010 that spans one unsigned long. */
+#define MASK (~0UL / 3 * 2)
+
+/**
+ * get_bitmap_entry - extracts the specified entry from the bitmap
+ * @map: pointer to a bitmap
+ * @entry_index: the index of the desired entry in the bitmap
+ *
+ * Return: the requested entry.
+ */
+static inline unsigned long get_bitmap_entry(unsigned long *map,
+					    int entry_index)
+{
+	return (map[ENTRIES_DIV_LONGS(entry_index)] >>
+		ENTRIES_TO_BITS(entry_index % ENTRIES_PER_LONG)) &
+		ENTRY_MASK;
+}
+
+
+/**
+ * mem_to_units - convert a size in bytes into units of allocation
+ * @size: amount in bytes
+ * @order: power of 2 represented by each entry in the bitmap
+ *
+ * Return: the number of units needed to hold @size, rounded up.
+ */
+static inline unsigned long mem_to_units(unsigned long size,
+					 unsigned long order)
+{
+	return (size + (1UL << order) - 1) >> order;
+}
+
+/**
+ * chunk_size - dimension of a chunk of memory, in bytes
+ * @chunk: pointer to the struct describing the chunk
+ *
+ * Return: The size of the chunk, in bytes.
+ */
 static inline size_t chunk_size(const struct gen_pool_chunk *chunk)
 {
 	return chunk->end_addr - chunk->start_addr + 1;
 }
 
-static int set_bits_ll(unsigned long *addr, unsigned long mask_to_set)
+
+/**
+ * set_bits_ll - locklessly sets, at @addr, the bits of @value selected
+ * by @mask, provided that none of the bits selected by @mask is set yet.
+ * @addr: where to write
+ * @mask: filter to apply for the bits to alter
+ * @value: actual configuration of bits to store
+ *
+ * Return: 0 upon success, -EBUSY otherwise
+ */
+static int set_bits_ll(unsigned long *addr,
+		       unsigned long mask, unsigned long value)
 {
-	unsigned long val, nval;
+	unsigned long nval;
+	unsigned long present;
+	unsigned long target;
 
 	nval = *addr;
 	do {
-		val = nval;
-		if (val & mask_to_set)
+		present = nval;
+		if (present & mask)
 			return -EBUSY;
+		target = present | value;
 		cpu_relax();
-	} while ((nval = cmpxchg(addr, val, val | mask_to_set)) != val);
-
+	} while ((nval = cmpxchg(addr, present, target)) != present);
 	return 0;
 }
 
-static int clear_bits_ll(unsigned long *addr, unsigned long mask_to_clear)
+
+/**
+ * clear_bits_ll - locklessly clears, at @addr, the bits selected by
+ * @mask, provided that their current configuration matches @value.
+ * @addr: where to write
+ * @mask: filter to apply for the bits to alter
+ * @value: expected configuration of the bits to clear
+ *
+ * Return: 0 upon success, -EBUSY otherwise
+ */
+static int clear_bits_ll(unsigned long *addr,
+			 unsigned long mask, unsigned long value)
 {
-	unsigned long val, nval;
+	unsigned long nval;
+	unsigned long present;
+	unsigned long target;
 
 	nval = *addr;
+	present = nval;
+	if (unlikely((present & mask) ^ value))
+		return -EBUSY;
 	do {
-		val = nval;
-		if ((val & mask_to_clear) != mask_to_clear)
+		present = nval;
+		if (unlikely((present & mask) ^ value))
 			return -EBUSY;
+		target = present & ~mask;
 		cpu_relax();
-	} while ((nval = cmpxchg(addr, val, val & ~mask_to_clear)) != val);
-
+	} while ((nval = cmpxchg(addr, present, target)) != present);
 	return 0;
 }
 
-/*
- * bitmap_set_ll - set the specified number of bits at the specified position
+
+/**
+ * get_boundary - verify that an allocation really starts
+ * at the given entry, then find where it ends.
  * @map: pointer to a bitmap
- * @start: a bit position in @map
- * @nr: number of bits to set
+ * @start_entry: the index of the first entry in the bitmap
+ * @nentries: number of entries to scan, starting from @start_entry
  *
- * Set @nr bits start from @start in @map lock-lessly. Several users
- * can set/clear the same bitmap simultaneously without lock. If two
- * users set the same bit, one user will return remain bits, otherwise
- * return 0.
+ * Return: the index of the first entry past the allocation, otherwise
+ * -EINVAL if the parameters do not refer to a correct allocation.
  */
-static int bitmap_set_ll(unsigned long *map, int start, int nr)
+static int get_boundary(unsigned long *map, int start_entry, int nentries)
 {
-	unsigned long *p = map + BIT_WORD(start);
-	const int size = start + nr;
-	int bits_to_set = BITS_PER_LONG - (start % BITS_PER_LONG);
-	unsigned long mask_to_set = BITMAP_FIRST_WORD_MASK(start);
-
-	while (nr - bits_to_set >= 0) {
-		if (set_bits_ll(p, mask_to_set))
-			return nr;
-		nr -= bits_to_set;
-		bits_to_set = BITS_PER_LONG;
-		mask_to_set = ~0UL;
-		p++;
-	}
-	if (nr) {
-		mask_to_set &= BITMAP_LAST_WORD_MASK(size);
-		if (set_bits_ll(p, mask_to_set))
-			return nr;
-	}
+	int i;
+	unsigned long bitmap_entry;
 
-	return 0;
+
+	if (unlikely(get_bitmap_entry(map, start_entry) != ENTRY_HEAD))
+		return -EINVAL;
+	for (i = start_entry + 1; i < start_entry + nentries; i++) {
+		bitmap_entry = get_bitmap_entry(map, i);
+		if (bitmap_entry == ENTRY_HEAD ||
+		    bitmap_entry == ENTRY_UNUSED)
+			return i;
+	}
+	return start_entry + nentries;
 }
 
+
+#define SET_BITS 1
+#define CLEAR_BITS 0
+
 /*
- * bitmap_clear_ll - clear the specified number of bits at the specified position
+ * alter_bitmap_ll - set or clear the entries associated with an allocation
+ * @alteration: indicates if the bits selected should be set or cleared
  * @map: pointer to a bitmap
- * @start: a bit position in @map
- * @nr: number of bits to set
+ * @start: the index of the first entry in the bitmap
+ * @nentries: number of entries to alter
+ *
+ * The modification happens lock-lessly.
+ * Several users can write to the same map simultaneously, without lock.
  *
- * Clear @nr bits start from @start in @map lock-lessly. Several users
- * can set/clear the same bitmap simultaneously without lock. If two
- * users clear the same bit, one user will return remain bits,
- * otherwise return 0.
+ * Return: 0 on success. If two users alter the same bits, one of them
+ * gets 0 and the other gets the number of entries it did not alter.
  */
-static int bitmap_clear_ll(unsigned long *map, int start, int nr)
+static int alter_bitmap_ll(bool alteration, unsigned long *map,
+			   int start_entry, int nentries)
 {
-	unsigned long *p = map + BIT_WORD(start);
-	const int size = start + nr;
-	int bits_to_clear = BITS_PER_LONG - (start % BITS_PER_LONG);
-	unsigned long mask_to_clear = BITMAP_FIRST_WORD_MASK(start);
-
-	while (nr - bits_to_clear >= 0) {
-		if (clear_bits_ll(p, mask_to_clear))
-			return nr;
-		nr -= bits_to_clear;
-		bits_to_clear = BITS_PER_LONG;
-		mask_to_clear = ~0UL;
-		p++;
-	}
-	if (nr) {
-		mask_to_clear &= BITMAP_LAST_WORD_MASK(size);
-		if (clear_bits_ll(p, mask_to_clear))
-			return nr;
+	unsigned long start_bit;
+	unsigned long end_bit;
+	unsigned long mask;
+	unsigned long value;
+	int nbits;
+	int bits_to_write;
+	int index;
+	int (*action)(unsigned long *addr,
+		      unsigned long mask, unsigned long value);
+
+	action = (alteration == SET_BITS) ? set_bits_ll : clear_bits_ll;
+
+	/*
+	 * Prepare for writing the initial part of the allocation, from
+	 * starting entry, to the end of the UL bitmap element which
+	 * contains it. It might be larger than the actual allocation.
+	 */
+	start_bit = ENTRIES_TO_BITS(start_entry);
+	end_bit = ENTRIES_TO_BITS(start_entry + nentries);
+	nbits = ENTRIES_TO_BITS(nentries);
+	bits_to_write = BITS_PER_LONG - start_bit % BITS_PER_LONG;
+	mask = BITMAP_FIRST_WORD_MASK(start_bit);
+	/* Mark the beginning of the allocation. */
+	value = MASK | (1UL << (start_bit % BITS_PER_LONG));
+	index = BITS_DIV_LONGS(start_bit);
+
+	/*
+	 * Write entries to the bitmap, as long as the remaining bits
+	 * reach the end of the current UL element.
+	 * Might be skipped if the entries to write do not reach the end
+	 * of a bitmap UL unit.
+	 */
+	while (nbits >= bits_to_write) {
+		if (action(map + index, mask, value & mask))
+			return BITS_DIV_ENTRIES(nbits);
+		nbits -= bits_to_write;
+		bits_to_write = BITS_PER_LONG;
+		mask = ~0UL;
+		value = MASK;
+		index++;
 	}
 
+	/* Takes care of the ending part of the entries to mark. */
+	if (nbits > 0) {
+		mask ^= BITMAP_FIRST_WORD_MASK((end_bit) % BITS_PER_LONG);
+		bits_to_write = nbits;
+		if (action(map + index, mask, value & mask))
+			return BITS_DIV_ENTRIES(nbits);
+	}
 	return 0;
 }
 
+
 /**
  * gen_pool_create - create a new special memory pool
- * @min_alloc_order: log base 2 of number of bytes each bitmap bit represents
+ * @min_alloc_order: log base 2 of number of bytes each bitmap entry represents
  * @nid: node id of the node the pool structure should be allocated on, or -1
  *
  * Create a new special memory pool that can be used to manage special purpose
  * memory not managed by the regular kmalloc/kfree interface.
+ *
+ * Return: pointer to the pool, if successful, NULL otherwise
  */
 struct gen_pool *gen_pool_create(int min_alloc_order, int nid)
 {
@@ -177,16 +357,18 @@ EXPORT_SYMBOL(gen_pool_create);
  *
  * Add a new chunk of special memory to the specified pool.
  *
- * Returns 0 on success or a -ve errno on failure.
+ * Return: 0 on success or a -ve errno on failure.
  */
 int gen_pool_add_virt(struct gen_pool *pool, unsigned long virt, phys_addr_t phys,
 		 size_t size, int nid)
 {
 	struct gen_pool_chunk *chunk;
-	int nbits = size >> pool->min_alloc_order;
-	int nbytes = sizeof(struct gen_pool_chunk) +
-				BITS_TO_LONGS(nbits) * sizeof(long);
+	int nentries;
+	int nbytes;
 
+	nentries = size >> pool->min_alloc_order;
+	nbytes = sizeof(struct gen_pool_chunk) +
+		 BITS_TO_LONGS(ENTRIES_TO_BITS(nentries)) * sizeof(long);
 	chunk = kzalloc_node(nbytes, GFP_KERNEL, nid);
 	if (unlikely(chunk == NULL))
 		return -ENOMEM;
@@ -209,7 +391,7 @@ EXPORT_SYMBOL(gen_pool_add_virt);
  * @pool: pool to allocate from
  * @addr: starting address of memory
  *
- * Returns the physical address on success, or -1 on error.
+ * Return: the physical address on success, or -1 on error.
  */
 phys_addr_t gen_pool_virt_to_phys(struct gen_pool *pool, unsigned long addr)
 {
@@ -248,7 +430,7 @@ void gen_pool_destroy(struct gen_pool *pool)
 		list_del(&chunk->next_chunk);
 
 		end_bit = chunk_size(chunk) >> order;
-		bit = find_next_bit(chunk->bits, end_bit, 0);
+		bit = find_next_bit(chunk->entries, end_bit, 0);
 		BUG_ON(bit < end_bit);
 
 		kfree(chunk);
@@ -267,6 +449,8 @@ EXPORT_SYMBOL(gen_pool_destroy);
  * Uses the pool allocation function (with first-fit algorithm by default).
  * Can not be used in NMI handler on architectures without
  * NMI-safe cmpxchg implementation.
+ *
+ * Return: address of the memory allocated, otherwise NULL
  */
 unsigned long gen_pool_alloc(struct gen_pool *pool, size_t size)
 {
@@ -285,6 +469,8 @@ EXPORT_SYMBOL(gen_pool_alloc);
  * Uses the pool allocation function (with first-fit algorithm by default).
  * Can not be used in NMI handler on architectures without
  * NMI-safe cmpxchg implementation.
+ *
+ * Return: address of the memory allocated, otherwise NULL
  */
 unsigned long gen_pool_alloc_algo(struct gen_pool *pool, size_t size,
 		genpool_algo_t algo, void *data)
@@ -292,7 +478,7 @@ unsigned long gen_pool_alloc_algo(struct gen_pool *pool, size_t size,
 	struct gen_pool_chunk *chunk;
 	unsigned long addr = 0;
 	int order = pool->min_alloc_order;
-	int nbits, start_bit, end_bit, remain;
+	int nentries, start_entry, end_entry, remain;
 
 #ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
 	BUG_ON(in_nmi());
@@ -301,29 +487,32 @@ unsigned long gen_pool_alloc_algo(struct gen_pool *pool, size_t size,
 	if (size == 0)
 		return 0;
 
-	nbits = (size + (1UL << order) - 1) >> order;
+	nentries = mem_to_units(size, order);
 	rcu_read_lock();
 	list_for_each_entry_rcu(chunk, &pool->chunks, next_chunk) {
 		if (size > atomic_long_read(&chunk->avail))
 			continue;
 
-		start_bit = 0;
-		end_bit = chunk_size(chunk) >> order;
+		start_entry = 0;
+		end_entry = chunk_size(chunk) >> order;
 retry:
-		start_bit = algo(chunk->bits, end_bit, start_bit,
-				 nbits, data, pool);
-		if (start_bit >= end_bit)
+		start_entry = algo(chunk->entries, end_entry, start_entry,
+				  nentries, data, pool);
+		if (start_entry >= end_entry)
 			continue;
-		remain = bitmap_set_ll(chunk->bits, start_bit, nbits);
+		remain = alter_bitmap_ll(SET_BITS, chunk->entries,
+					 start_entry, nentries);
 		if (remain) {
-			remain = bitmap_clear_ll(chunk->bits, start_bit,
-						 nbits - remain);
-			BUG_ON(remain);
+			remain = alter_bitmap_ll(CLEAR_BITS,
+						 chunk->entries,
+						 start_entry,
+						 nentries - remain);
 			goto retry;
 		}
 
-		addr = chunk->start_addr + ((unsigned long)start_bit << order);
-		size = nbits << order;
+		addr = chunk->start_addr +
+			((unsigned long)start_entry << order);
+		size = nentries << order;
 		atomic_long_sub(size, &chunk->avail);
 		break;
 	}
@@ -342,6 +531,8 @@ EXPORT_SYMBOL(gen_pool_alloc_algo);
  * Uses the pool allocation function (with first-fit algorithm by default).
  * Can not be used in NMI handler on architectures without
  * NMI-safe cmpxchg implementation.
+ *
+ * Return: address of the memory allocated, otherwise NULL
  */
 void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size, dma_addr_t *dma)
 {
@@ -365,7 +556,7 @@ EXPORT_SYMBOL(gen_pool_dma_alloc);
  * gen_pool_free - free allocated special memory back to the pool
  * @pool: pool to free to
  * @addr: starting address of memory to free back to pool
- * @size: size in bytes of memory to free
+ * @size: size in bytes of memory to free, or 0 for auto-detection
  *
  * Free previously allocated special memory back to the specified
  * pool.  Can not be used in NMI handler on architectures without
@@ -375,22 +566,29 @@ void gen_pool_free(struct gen_pool *pool, unsigned long addr, size_t size)
 {
 	struct gen_pool_chunk *chunk;
 	int order = pool->min_alloc_order;
-	int start_bit, nbits, remain;
+	int start_entry, remaining_entries, nentries, remain;
+	int boundary;
 
 #ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
 	BUG_ON(in_nmi());
 #endif
 
-	nbits = (size + (1UL << order) - 1) >> order;
 	rcu_read_lock();
 	list_for_each_entry_rcu(chunk, &pool->chunks, next_chunk) {
 		if (addr >= chunk->start_addr && addr <= chunk->end_addr) {
 			BUG_ON(addr + size - 1 > chunk->end_addr);
-			start_bit = (addr - chunk->start_addr) >> order;
-			remain = bitmap_clear_ll(chunk->bits, start_bit, nbits);
+			start_entry = (addr - chunk->start_addr) >> order;
+			remaining_entries = (chunk->end_addr - addr) >> order;
+			boundary = get_boundary(chunk->entries, start_entry,
+						remaining_entries);
+			BUG_ON(boundary < 0);
+			nentries = boundary - start_entry;
+			BUG_ON(size &&
+			       (nentries != mem_to_units(size, order)));
+			remain = alter_bitmap_ll(CLEAR_BITS, chunk->entries,
+						 start_entry, nentries);
 			BUG_ON(remain);
-			size = nbits << order;
-			atomic_long_add(size, &chunk->avail);
+			atomic_long_add(nentries << order, &chunk->avail);
 			rcu_read_unlock();
 			return;
 		}
@@ -428,8 +626,9 @@ EXPORT_SYMBOL(gen_pool_for_each_chunk);
  * @start:	start address
  * @size:	size of the region
  *
- * Check if the range of addresses falls within the specified pool. Returns
- * true if the entire range is contained in the pool and false otherwise.
+ * Check if the range of addresses falls within the specified pool.
+ *
+ * Return: true if the entire range is contained in the pool, false otherwise.
  */
 bool addr_in_gen_pool(struct gen_pool *pool, unsigned long start,
 			size_t size)
@@ -455,7 +654,7 @@ bool addr_in_gen_pool(struct gen_pool *pool, unsigned long start,
  * gen_pool_avail - get available free space of the pool
  * @pool: pool to get available free space
  *
- * Return available free space of the specified pool.
+ * Return: available free space of the specified pool.
  */
 size_t gen_pool_avail(struct gen_pool *pool)
 {
@@ -474,7 +673,7 @@ EXPORT_SYMBOL_GPL(gen_pool_avail);
  * gen_pool_size - get size in bytes of memory managed by the pool
  * @pool: pool to get size
  *
- * Return size in bytes of memory managed by the pool.
+ * Return: size in bytes of memory managed by the pool.
  */
 size_t gen_pool_size(struct gen_pool *pool)
 {
@@ -517,17 +716,27 @@ EXPORT_SYMBOL(gen_pool_set_algo);
  * gen_pool_first_fit - find the first available region
  * of memory matching the size requirement (no alignment constraint)
  * @map: The address to base the search on
- * @size: The bitmap size in bits
- * @start: The bitnumber to start searching at
- * @nr: The number of zeroed bits we're looking for
+ * @size: The number of allocation units in the bitmap
+ * @start: The allocation unit to start searching at
+ * @nr: The number of allocation units we're looking for
  * @data: additional data - unused
  * @pool: pool to find the fit region memory from
+ *
+ * Return: index of the memory allocated, otherwise the end of the range
  */
 unsigned long gen_pool_first_fit(unsigned long *map, unsigned long size,
 		unsigned long start, unsigned int nr, void *data,
 		struct gen_pool *pool)
 {
-	return bitmap_find_next_zero_area(map, size, start, nr, 0);
+	unsigned long align_mask;
+	unsigned long bit_index;
+
+	align_mask = roundup_pow_of_two(BITS_PER_ENTRY) - 1;
+	bit_index = bitmap_find_next_zero_area(map, ENTRIES_TO_BITS(size),
+					       ENTRIES_TO_BITS(start),
+					       ENTRIES_TO_BITS(nr),
+					       align_mask);
+	return BITS_DIV_ENTRIES(bit_index);
 }
 EXPORT_SYMBOL(gen_pool_first_fit);
 
@@ -535,11 +744,13 @@ EXPORT_SYMBOL(gen_pool_first_fit);
  * gen_pool_first_fit_align - find the first available region
  * of memory matching the size requirement (alignment constraint)
  * @map: The address to base the search on
- * @size: The bitmap size in bits
- * @start: The bitnumber to start searching at
- * @nr: The number of zeroed bits we're looking for
+ * @size: The number of allocation units in the bitmap
+ * @start: The allocation unit to start searching at
+ * @nr: The number of allocation units we're looking for
  * @data: data for alignment
  * @pool: pool to get order from
+ *
+ * Return: index of the memory allocated, otherwise the end of the range
  */
 unsigned long gen_pool_first_fit_align(unsigned long *map, unsigned long size,
 		unsigned long start, unsigned int nr, void *data,
@@ -547,23 +758,32 @@ unsigned long gen_pool_first_fit_align(unsigned long *map, unsigned long size,
 {
 	struct genpool_data_align *alignment;
 	unsigned long align_mask;
+	unsigned long bit_index;
 	int order;
 
 	alignment = data;
 	order = pool->min_alloc_order;
-	align_mask = ((alignment->align + (1UL << order) - 1) >> order) - 1;
-	return bitmap_find_next_zero_area(map, size, start, nr, align_mask);
+	align_mask = roundup_pow_of_two(
+			ENTRIES_TO_BITS(mem_to_units(alignment->align,
+						     order))) - 1;
+	bit_index = bitmap_find_next_zero_area(map, ENTRIES_TO_BITS(size),
+					       ENTRIES_TO_BITS(start),
+					       ENTRIES_TO_BITS(nr),
+					       align_mask);
+	return BITS_DIV_ENTRIES(bit_index);
 }
 EXPORT_SYMBOL(gen_pool_first_fit_align);
 
 /**
  * gen_pool_fixed_alloc - reserve a specific region
  * @map: The address to base the search on
- * @size: The bitmap size in bits
- * @start: The bitnumber to start searching at
- * @nr: The number of zeroed bits we're looking for
+ * @size: The number of allocation units in the bitmap
+ * @start: The allocation unit to start searching at
+ * @nr: The number of allocation units we're looking for
  * @data: data for alignment
  * @pool: pool to get order from
+ *
+ * Return: index of the memory allocated, otherwise the end of the range
  */
 unsigned long gen_pool_fixed_alloc(unsigned long *map, unsigned long size,
 		unsigned long start, unsigned int nr, void *data,
@@ -571,20 +791,23 @@ unsigned long gen_pool_fixed_alloc(unsigned long *map, unsigned long size,
 {
 	struct genpool_data_fixed *fixed_data;
 	int order;
-	unsigned long offset_bit;
-	unsigned long start_bit;
+	unsigned long offset;
+	unsigned long align_mask;
+	unsigned long bit_index;
 
 	fixed_data = data;
 	order = pool->min_alloc_order;
-	offset_bit = fixed_data->offset >> order;
 	if (WARN_ON(fixed_data->offset & ((1UL << order) - 1)))
 		return size;
+	offset = fixed_data->offset >> order;
+	align_mask = roundup_pow_of_two(BITS_PER_ENTRY) - 1;
+	bit_index = bitmap_find_next_zero_area(map, ENTRIES_TO_BITS(size),
+					       ENTRIES_TO_BITS(start + offset),
+					       ENTRIES_TO_BITS(nr), align_mask);
+	if (bit_index != ENTRIES_TO_BITS(offset))
+		return size;
 
-	start_bit = bitmap_find_next_zero_area(map, size,
-			start + offset_bit, nr, 0);
-	if (start_bit != offset_bit)
-		start_bit = size;
-	return start_bit;
+	return BITS_DIV_ENTRIES(bit_index);
 }
 EXPORT_SYMBOL(gen_pool_fixed_alloc);
 
@@ -593,60 +816,84 @@ EXPORT_SYMBOL(gen_pool_fixed_alloc);
  * of memory matching the size requirement. The region will be aligned
  * to the order of the size specified.
  * @map: The address to base the search on
- * @size: The bitmap size in bits
- * @start: The bitnumber to start searching at
- * @nr: The number of zeroed bits we're looking for
+ * @size: The number of allocation units in the bitmap
+ * @start: The allocation unit to start searching at
+ * @nr: The number of allocation units we're looking for
  * @data: additional data - unused
  * @pool: pool to find the fit region memory from
+ *
+ * Return: index of the memory allocated, otherwise the end of the range
  */
 unsigned long gen_pool_first_fit_order_align(unsigned long *map,
 		unsigned long size, unsigned long start,
 		unsigned int nr, void *data, struct gen_pool *pool)
 {
-	unsigned long align_mask = roundup_pow_of_two(nr) - 1;
-
-	return bitmap_find_next_zero_area(map, size, start, nr, align_mask);
+	unsigned long align_mask;
+	unsigned long bit_index;
+
+	align_mask = roundup_pow_of_two(ENTRIES_TO_BITS(nr)) - 1;
+	bit_index = bitmap_find_next_zero_area(map, ENTRIES_TO_BITS(size),
+					       ENTRIES_TO_BITS(start),
+					       ENTRIES_TO_BITS(nr),
+					       align_mask);
+	return BITS_DIV_ENTRIES(bit_index);
 }
 EXPORT_SYMBOL(gen_pool_first_fit_order_align);
 
 /**
  * gen_pool_best_fit - find the best fitting region of memory
- * macthing the size requirement (no alignment constraint)
+ * matching the size requirement (no alignment constraint)
  * @map: The address to base the search on
- * @size: The bitmap size in bits
- * @start: The bitnumber to start searching at
- * @nr: The number of zeroed bits we're looking for
+ * @size: The number of allocation units in the bitmap
+ * @start: The allocation unit to start searching at
+ * @nr: The number of allocation units we're looking for
  * @data: additional data - unused
  * @pool: pool to find the fit region memory from
  *
  * Iterate over the bitmap to find the smallest free region
  * which we can allocate the memory.
+ *
+ * Return: index of the memory allocated, otherwise the end of the range
  */
 unsigned long gen_pool_best_fit(unsigned long *map, unsigned long size,
 		unsigned long start, unsigned int nr, void *data,
 		struct gen_pool *pool)
 {
-	unsigned long start_bit = size;
+	unsigned long start_bit = ENTRIES_TO_BITS(size);
 	unsigned long len = size + 1;
 	unsigned long index;
+	unsigned long align_mask;
+	unsigned long bit_index;
 
-	index = bitmap_find_next_zero_area(map, size, start, nr, 0);
+	align_mask = roundup_pow_of_two(BITS_PER_ENTRY) - 1;
+	bit_index = bitmap_find_next_zero_area(map, ENTRIES_TO_BITS(size),
+					       ENTRIES_TO_BITS(start),
+					       ENTRIES_TO_BITS(nr),
+					       align_mask);
+	index = BITS_DIV_ENTRIES(bit_index);
 
 	while (index < size) {
-		int next_bit = find_next_bit(map, size, index + nr);
-		if ((next_bit - index) < len) {
-			len = next_bit - index;
-			start_bit = index;
+		int next_bit;
+
+		next_bit = find_next_bit(map, ENTRIES_TO_BITS(size),
+					 ENTRIES_TO_BITS(index + nr));
+		if ((BITS_DIV_ENTRIES(next_bit) - index) < len) {
+			len = BITS_DIV_ENTRIES(next_bit) - index;
+			start_bit = ENTRIES_TO_BITS(index);
 			if (len == nr)
-				return start_bit;
+				return BITS_DIV_ENTRIES(start_bit);
 		}
-		index = bitmap_find_next_zero_area(map, size,
-						   next_bit + 1, nr, 0);
+		bit_index =
+			bitmap_find_next_zero_area(map,
+						   ENTRIES_TO_BITS(size),
+						   next_bit + 1,
+						   ENTRIES_TO_BITS(nr),
+						   align_mask);
+		index = BITS_DIV_ENTRIES(bit_index);
 	}
 
-	return start_bit;
+	return BITS_DIV_ENTRIES(start_bit);
 }
-EXPORT_SYMBOL(gen_pool_best_fit);
 
 static void devm_gen_pool_release(struct device *dev, void *res)
 {
@@ -672,7 +919,7 @@ static int devm_gen_pool_match(struct device *dev, void *res, void *data)
  * @dev: device to retrieve the gen_pool from
  * @name: name of a gen_pool or NULL, identifies a particular gen_pool on device
  *
- * Returns the gen_pool for the device if one is present, or NULL.
+ * Return: the gen_pool for the device if one is present, or NULL.
  */
 struct gen_pool *gen_pool_get(struct device *dev, const char *name)
 {
@@ -696,6 +943,8 @@ EXPORT_SYMBOL_GPL(gen_pool_get);
  * Create a new special memory pool that can be used to manage special purpose
  * memory not managed by the regular kmalloc/kfree interface. The pool will be
  * automatically destroyed by the device management code.
+ *
+ * Return: the address of the pool, if successful, otherwise NULL
  */
 struct gen_pool *devm_gen_pool_create(struct device *dev, int min_alloc_order,
 				      int nid, const char *name)
@@ -743,7 +992,7 @@ EXPORT_SYMBOL(devm_gen_pool_create);
  * @propname: property name containing phandle(s)
  * @index: index into the phandle array
  *
- * Returns the pool that contains the chunk starting at the physical
+ * Return: the pool that contains the chunk starting at the physical
  * address of the device tree node pointed at by the phandle property,
  * or NULL if not found.
  */
-- 
2.14.1


* [PATCH 1/6] genalloc: track beginning of allocations
@ 2018-02-11  3:19   ` Igor Stoppa
  0 siblings, 0 replies; 146+ messages in thread
From: Igor Stoppa @ 2018-02-11  3:19 UTC (permalink / raw)
  To: linux-security-module

The genalloc library can only track whether a given unit of
allocation is in use or not.

It cannot tell where the memory associated with an allocation
request begins and where it ends.

The reason is that units of allocation are tracked with a bitmap,
where each bit marks the unit as either allocated (1) or
available (0).

The user of the API must therefore keep track of how much space was
requested, in case it ever needs to be freed.

This can let errors go undetected, for example:
* Only a subset of the memory provided to an allocation request is freed.
* The memory from a subsequent allocation is freed.
* The memory being freed doesn't start at the beginning of an
  allocation.
The bitmap is used because it allows lockless read/write access,
where the hardware supports it through cmpxchg.
Similarly, the bitmap can be scanned for a sufficiently long
sequence of zeros, to identify zones available for allocation.

This patch doubles the space reserved in the bitmap for each unit of
allocation, to track where each allocation begins.

For details, see the documentation inside lib/genalloc.c

Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com>
---
 include/linux/genalloc.h |   4 +-
 lib/genalloc.c           | 527 ++++++++++++++++++++++++++++++++++-------------
 2 files changed, 390 insertions(+), 141 deletions(-)

diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
index 872f930f1b06..dcaa33e74b1c 100644
--- a/include/linux/genalloc.h
+++ b/include/linux/genalloc.h
@@ -32,7 +32,7 @@
 
 #include <linux/types.h>
 #include <linux/spinlock_types.h>
-#include <linux/atomic.h>
+#include <linux/slab.h>
 
 struct device;
 struct device_node;
@@ -76,7 +76,7 @@ struct gen_pool_chunk {
 	phys_addr_t phys_addr;		/* physical starting address of memory chunk */
 	unsigned long start_addr;	/* start address of memory chunk */
 	unsigned long end_addr;		/* end address of memory chunk (inclusive) */
-	unsigned long bits[0];		/* bitmap for allocating memory chunk */
+	unsigned long entries[0];	/* bitmap for allocating memory chunk */
 };
 
 /*
diff --git a/lib/genalloc.c b/lib/genalloc.c
index ca06adc4f445..044347163acb 100644
--- a/lib/genalloc.c
+++ b/lib/genalloc.c
@@ -26,6 +26,74 @@
  *
  * This source code is licensed under the GNU General Public License,
  * Version 2.  See the file COPYING for more details.
+ *
+ *
+ *
+ * Encoding of the bitmap tracking the allocations
+ * -----------------------------------------------
+ *
+ * The bitmap is composed of units of allocation.
+ *
+ * Each unit of allocation is represented using 2 consecutive bits.
+ *
+ * This makes it possible to encode, for each unit of allocation,
+ * information about:
+ *  - allocation status (busy/free)
+ *  - beginning of a sequence of allocation units (first / successive)
+ *
+ *
+ * Dictionary of allocation units (msb to the left, lsb to the right):
+ *
+ * 11: first allocation unit in the allocation
+ * 10: any subsequent allocation unit (if any) in the allocation
+ * 00: available allocation unit
+ * 01: invalid
+ *
+ * Example, using the same notation as above - MSb.......LSb:
+ *
+ *  ...000010111100000010101011   <-- Read in this direction.
+ *     \__|\__|\|\____|\______|
+ *        |   | |     |       \___ 4 used allocation units
+ *        |   | |     \___________ 3 empty allocation units
+ *        |   | \_________________ 1 used allocation unit
+ *        |   \___________________ 2 used allocation units
+ *        \_______________________ 2 empty allocation units
+ *
+ * The encoding allows for lockless operations, such as:
+ * - search for a sufficiently large range of allocation units
+ * - reservation of a selected range of allocation units
+ * - release of a specific allocation
+ *
+ * The alignment at which to search for a sequence of empty
+ * allocation units (marked as zeros in the bitmap) is 2^1.
+ *
+ * This means that an allocation can start only at even places
+ * (bit 0, bit 2, etc.) in the bitmap.
+ *
+ * Therefore, the number of zeroes to look for must be twice the number
+ * of desired allocation units.
+ *
+ * When it's time to free the memory associated with an allocation request,
+ * it's a matter of checking if the corresponding allocation unit is
+ * really the beginning of an allocation (both bits are set to 1).
+ *
+ * Looking for the end can also be performed locklessly.
+ * It's sufficient to identify the first following allocation unit
+ * that is marked either as free (00) or as the first unit of another
+ * allocation (11).
+ * Even if the allocation status should change in the meantime, it
+ * doesn't matter, since it can only transition between free (00) and
+ * first-allocated (11).
+ *
+ * The parameter indicating to the *_free() function the size of the
+ * space that should be freed can be either set to 0, for automated
+ * assessment, or it can be specified explicitly.
+ *
+ * In case it is specified explicitly, the value is verified against what
+ * the library is tracking internally.
+ *
+ * If ever needed, the bitmap could be extended, assigning larger amounts
+ * of bits to each allocation unit (the increase must follow powers of 2),
+ * to track other properties of the allocations.
  */
 
 #include <linux/slab.h>
@@ -36,118 +104,230 @@
 #include <linux/genalloc.h>
 #include <linux/of_device.h>
 
+#define ENTRY_ORDER 1UL
+#define ENTRY_MASK ((1UL << ((ENTRY_ORDER) + 1UL)) - 1UL)
+#define ENTRY_HEAD ENTRY_MASK
+#define ENTRY_UNUSED 0UL
+#define BITS_PER_ENTRY (1U << ENTRY_ORDER)
+#define BITS_DIV_ENTRIES(x) ((x) >> ENTRY_ORDER)
+#define ENTRIES_TO_BITS(x) ((x) << ENTRY_ORDER)
+#define BITS_DIV_LONGS(x) ((x) / BITS_PER_LONG)
+#define ENTRIES_DIV_LONGS(x) (BITS_DIV_LONGS(ENTRIES_TO_BITS(x)))
+
+#define ENTRIES_PER_LONG BITS_DIV_ENTRIES(BITS_PER_LONG)
+
+/* Binary pattern of 1010...1010 that spans one unsigned long. */
+#define MASK (~0UL / 3 * 2)
+
+/**
+ * get_bitmap_entry - extracts the specified entry from the bitmap
+ * @map: pointer to a bitmap
+ * @entry_index: the index of the desired entry in the bitmap
+ *
+ * Return: The requested entry.
+ */
+static inline unsigned long get_bitmap_entry(unsigned long *map,
+					    int entry_index)
+{
+	return (map[ENTRIES_DIV_LONGS(entry_index)] >>
+		ENTRIES_TO_BITS(entry_index % ENTRIES_PER_LONG)) &
+		ENTRY_MASK;
+}
+
+
+/**
+ * mem_to_units - convert an amount of memory into units of allocation
+ * @size: amount in bytes
+ * @order: power of 2 represented by each entry in the bitmap
+ *
+ * Return: the number of units needed to cover @size, rounded up.
+ */
+static inline unsigned long mem_to_units(unsigned long size,
+					 unsigned long order)
+{
+	return (size + (1UL << order) - 1) >> order;
+}
+
+/**
+ * chunk_size - dimension of a chunk of memory, in bytes
+ * @chunk: pointer to the struct describing the chunk
+ *
+ * Return: The size of the chunk, in bytes.
+ */
 static inline size_t chunk_size(const struct gen_pool_chunk *chunk)
 {
 	return chunk->end_addr - chunk->start_addr + 1;
 }
 
-static int set_bits_ll(unsigned long *addr, unsigned long mask_to_set)
+
+/**
+ * set_bits_ll - according to the mask, sets the bits specified by
+ * value, at the address specified.
+ * @addr: where to write
+ * @mask: filter to apply for the bits to alter
+ * @value: actual configuration of bits to store
+ *
+ * Return: 0 upon success, -EBUSY otherwise
+ */
+static int set_bits_ll(unsigned long *addr,
+		       unsigned long mask, unsigned long value)
 {
-	unsigned long val, nval;
+	unsigned long nval;
+	unsigned long present;
+	unsigned long target;
 
 	nval = *addr;
 	do {
-		val = nval;
-		if (val & mask_to_set)
+		present = nval;
+		if (present & mask)
 			return -EBUSY;
+		target = present | value;
 		cpu_relax();
-	} while ((nval = cmpxchg(addr, val, val | mask_to_set)) != val);
-
+	} while ((nval = cmpxchg(addr, present, target)) != present);
 	return 0;
 }
 
-static int clear_bits_ll(unsigned long *addr, unsigned long mask_to_clear)
+
+/**
+ * clear_bits_ll - according to the mask, clears the bits specified by
+ * value, at the address specified.
+ * @addr: where to write
+ * @mask: filter to apply for the bits to alter
+ * @value: actual configuration of bits to clear
+ *
+ * Return: 0 upon success, -EBUSY otherwise
+ */
+static int clear_bits_ll(unsigned long *addr,
+			 unsigned long mask, unsigned long value)
 {
-	unsigned long val, nval;
+	unsigned long nval;
+	unsigned long present;
+	unsigned long target;
 
 	nval = *addr;
+	present = nval;
+	if (unlikely((present & mask) ^ value))
+		return -EBUSY;
 	do {
-		val = nval;
-		if ((val & mask_to_clear) != mask_to_clear)
+		present = nval;
+		if (unlikely((present & mask) ^ value))
 			return -EBUSY;
+		target = present & ~mask;
 		cpu_relax();
-	} while ((nval = cmpxchg(addr, val, val & ~mask_to_clear)) != val);
-
+	} while ((nval = cmpxchg(addr, present, target)) != present);
 	return 0;
 }
 
-/*
- * bitmap_set_ll - set the specified number of bits at the specified position
+
+/**
+ * get_boundary - verify that an allocation actually
+ * starts at the given address, then measure its length.
  * @map: pointer to a bitmap
- * @start: a bit position in @map
- * @nr: number of bits to set
+ * @start_entry: the index of the first entry in the bitmap
+ * @nentries: number of entries to scan
  *
- * Set @nr bits start from @start in @map lock-lessly. Several users
- * can set/clear the same bitmap simultaneously without lock. If two
- * users set the same bit, one user will return remain bits, otherwise
- * return 0.
+ * Return: the length of an allocation, otherwise -EINVAL if the
+ * parameters do not refer to a correct allocation.
  */
-static int bitmap_set_ll(unsigned long *map, int start, int nr)
+static int get_boundary(unsigned long *map, int start_entry, int nentries)
 {
-	unsigned long *p = map + BIT_WORD(start);
-	const int size = start + nr;
-	int bits_to_set = BITS_PER_LONG - (start % BITS_PER_LONG);
-	unsigned long mask_to_set = BITMAP_FIRST_WORD_MASK(start);
-
-	while (nr - bits_to_set >= 0) {
-		if (set_bits_ll(p, mask_to_set))
-			return nr;
-		nr -= bits_to_set;
-		bits_to_set = BITS_PER_LONG;
-		mask_to_set = ~0UL;
-		p++;
-	}
-	if (nr) {
-		mask_to_set &= BITMAP_LAST_WORD_MASK(size);
-		if (set_bits_ll(p, mask_to_set))
-			return nr;
-	}
+	int i;
+	unsigned long bitmap_entry;
 
-	return 0;
+
+	if (unlikely(get_bitmap_entry(map, start_entry) != ENTRY_HEAD))
+		return -EINVAL;
+	for (i = start_entry + 1; i < nentries; i++) {
+		bitmap_entry = get_bitmap_entry(map, i);
+		if (bitmap_entry == ENTRY_HEAD ||
+		    bitmap_entry == ENTRY_UNUSED)
+			return i;
+	}
+	return nentries - start_entry;
 }
 
+
+#define SET_BITS 1
+#define CLEAR_BITS 0
+
 /*
- * bitmap_clear_ll - clear the specified number of bits at the specified position
+ * alter_bitmap_ll - set or clear the entries associated with an allocation
+ * @alteration: indicates if the bits selected should be set or cleared
  * @map: pointer to a bitmap
- * @start: a bit position in @map
- * @nr: number of bits to set
+ * @start: the index of the first entry in the bitmap
+ * @nentries: number of entries to alter
+ *
+ * The modification happens lock-lessly.
+ * Several users can write to the same map simultaneously, without lock.
  *
- * Clear @nr bits start from @start in @map lock-lessly. Several users
- * can set/clear the same bitmap simultaneously without lock. If two
- * users clear the same bit, one user will return remain bits,
- * otherwise return 0.
+ * Return: 0 on success. If two users alter the same bit, one of them
+ * gets back the number of entries not yet altered, the other gets 0.
  */
-static int bitmap_clear_ll(unsigned long *map, int start, int nr)
+static int alter_bitmap_ll(bool alteration, unsigned long *map,
+			   int start_entry, int nentries)
 {
-	unsigned long *p = map + BIT_WORD(start);
-	const int size = start + nr;
-	int bits_to_clear = BITS_PER_LONG - (start % BITS_PER_LONG);
-	unsigned long mask_to_clear = BITMAP_FIRST_WORD_MASK(start);
-
-	while (nr - bits_to_clear >= 0) {
-		if (clear_bits_ll(p, mask_to_clear))
-			return nr;
-		nr -= bits_to_clear;
-		bits_to_clear = BITS_PER_LONG;
-		mask_to_clear = ~0UL;
-		p++;
-	}
-	if (nr) {
-		mask_to_clear &= BITMAP_LAST_WORD_MASK(size);
-		if (clear_bits_ll(p, mask_to_clear))
-			return nr;
+	unsigned long start_bit;
+	unsigned long end_bit;
+	unsigned long mask;
+	unsigned long value;
+	int nbits;
+	int bits_to_write;
+	int index;
+	int (*action)(unsigned long *addr,
+		      unsigned long mask, unsigned long value);
+
+	action = (alteration == SET_BITS) ? set_bits_ll : clear_bits_ll;
+
+	/*
+	 * Prepare for writing the initial part of the allocation, from
+	 * starting entry, to the end of the UL bitmap element which
+	 * contains it. It might be larger than the actual allocation.
+	 */
+	start_bit = ENTRIES_TO_BITS(start_entry);
+	end_bit = ENTRIES_TO_BITS(start_entry + nentries);
+	nbits = ENTRIES_TO_BITS(nentries);
+	bits_to_write = BITS_PER_LONG - start_bit % BITS_PER_LONG;
+	mask = BITMAP_FIRST_WORD_MASK(start_bit);
+	/* Mark the beginning of the allocation. */
+	value = MASK | (1UL << (start_bit % BITS_PER_LONG));
+	index = BITS_DIV_LONGS(start_bit);
+
+	/*
+	 * Writes entries to the bitmap, as long as the remainder is
+	 * positive or zero.
+	 * Might be skipped if the entries to write do not reach the end
+	 * of a bitmap UL unit.
+	 */
+	while (nbits >= bits_to_write) {
+		if (action(map + index, mask, value & mask))
+			return BITS_DIV_ENTRIES(nbits);
+		nbits -= bits_to_write;
+		bits_to_write = BITS_PER_LONG;
+		mask = ~0UL;
+		value = MASK;
+		index++;
 	}
 
+	/* Takes care of the ending part of the entries to mark. */
+	if (nbits > 0) {
+		mask ^= BITMAP_FIRST_WORD_MASK((end_bit) % BITS_PER_LONG);
+		bits_to_write = nbits;
+		if (action(map + index, mask, value & mask))
+			return BITS_DIV_ENTRIES(nbits);
+	}
 	return 0;
 }
 
+
 /**
  * gen_pool_create - create a new special memory pool
- * @min_alloc_order: log base 2 of number of bytes each bitmap bit represents
+ * @min_alloc_order: log base 2 of number of bytes each bitmap entry represents
  * @nid: node id of the node the pool structure should be allocated on, or -1
  *
  * Create a new special memory pool that can be used to manage special purpose
  * memory not managed by the regular kmalloc/kfree interface.
+ *
+ * Return: pointer to the pool, if successful, NULL otherwise
  */
 struct gen_pool *gen_pool_create(int min_alloc_order, int nid)
 {
@@ -177,16 +357,18 @@ EXPORT_SYMBOL(gen_pool_create);
  *
  * Add a new chunk of special memory to the specified pool.
  *
- * Returns 0 on success or a -ve errno on failure.
+ * Return: 0 on success or a -ve errno on failure.
  */
 int gen_pool_add_virt(struct gen_pool *pool, unsigned long virt, phys_addr_t phys,
 		 size_t size, int nid)
 {
 	struct gen_pool_chunk *chunk;
-	int nbits = size >> pool->min_alloc_order;
-	int nbytes = sizeof(struct gen_pool_chunk) +
-				BITS_TO_LONGS(nbits) * sizeof(long);
+	int nentries;
+	int nbytes;
 
+	nentries = size >> pool->min_alloc_order;
+	nbytes = sizeof(struct gen_pool_chunk) +
+		 BITS_TO_LONGS(ENTRIES_TO_BITS(nentries)) * sizeof(long);
 	chunk = kzalloc_node(nbytes, GFP_KERNEL, nid);
 	if (unlikely(chunk == NULL))
 		return -ENOMEM;
@@ -209,7 +391,7 @@ EXPORT_SYMBOL(gen_pool_add_virt);
  * @pool: pool to allocate from
  * @addr: starting address of memory
  *
- * Returns the physical address on success, or -1 on error.
+ * Return: the physical address on success, or -1 on error.
  */
 phys_addr_t gen_pool_virt_to_phys(struct gen_pool *pool, unsigned long addr)
 {
@@ -248,7 +430,7 @@ void gen_pool_destroy(struct gen_pool *pool)
 		list_del(&chunk->next_chunk);
 
 		end_bit = chunk_size(chunk) >> order;
-		bit = find_next_bit(chunk->bits, end_bit, 0);
+		bit = find_next_bit(chunk->entries, end_bit, 0);
 		BUG_ON(bit < end_bit);
 
 		kfree(chunk);
@@ -267,6 +449,8 @@ EXPORT_SYMBOL(gen_pool_destroy);
  * Uses the pool allocation function (with first-fit algorithm by default).
  * Can not be used in NMI handler on architectures without
  * NMI-safe cmpxchg implementation.
+ *
+ * Return: address of the memory allocated, otherwise NULL
  */
 unsigned long gen_pool_alloc(struct gen_pool *pool, size_t size)
 {
@@ -285,6 +469,8 @@ EXPORT_SYMBOL(gen_pool_alloc);
  * Uses the pool allocation function (with first-fit algorithm by default).
  * Can not be used in NMI handler on architectures without
  * NMI-safe cmpxchg implementation.
+ *
+ * Return: address of the memory allocated, otherwise NULL
  */
 unsigned long gen_pool_alloc_algo(struct gen_pool *pool, size_t size,
 		genpool_algo_t algo, void *data)
@@ -292,7 +478,7 @@ unsigned long gen_pool_alloc_algo(struct gen_pool *pool, size_t size,
 	struct gen_pool_chunk *chunk;
 	unsigned long addr = 0;
 	int order = pool->min_alloc_order;
-	int nbits, start_bit, end_bit, remain;
+	int nentries, start_entry, end_entry, remain;
 
 #ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
 	BUG_ON(in_nmi());
@@ -301,29 +487,32 @@ unsigned long gen_pool_alloc_algo(struct gen_pool *pool, size_t size,
 	if (size == 0)
 		return 0;
 
-	nbits = (size + (1UL << order) - 1) >> order;
+	nentries = mem_to_units(size, order);
 	rcu_read_lock();
 	list_for_each_entry_rcu(chunk, &pool->chunks, next_chunk) {
 		if (size > atomic_long_read(&chunk->avail))
 			continue;
 
-		start_bit = 0;
-		end_bit = chunk_size(chunk) >> order;
+		start_entry = 0;
+		end_entry = chunk_size(chunk) >> order;
 retry:
-		start_bit = algo(chunk->bits, end_bit, start_bit,
-				 nbits, data, pool);
-		if (start_bit >= end_bit)
+		start_entry = algo(chunk->entries, end_entry, start_entry,
+				  nentries, data, pool);
+		if (start_entry >= end_entry)
 			continue;
-		remain = bitmap_set_ll(chunk->bits, start_bit, nbits);
+		remain = alter_bitmap_ll(SET_BITS, chunk->entries,
+					 start_entry, nentries);
 		if (remain) {
-			remain = bitmap_clear_ll(chunk->bits, start_bit,
-						 nbits - remain);
-			BUG_ON(remain);
+			remain = alter_bitmap_ll(CLEAR_BITS,
+						 chunk->entries,
+						 start_entry,
+						 nentries - remain);
 			goto retry;
 		}
 
-		addr = chunk->start_addr + ((unsigned long)start_bit << order);
-		size = nbits << order;
+		addr = chunk->start_addr +
+			((unsigned long)start_entry << order);
+		size = nentries << order;
 		atomic_long_sub(size, &chunk->avail);
 		break;
 	}
@@ -342,6 +531,8 @@ EXPORT_SYMBOL(gen_pool_alloc_algo);
  * Uses the pool allocation function (with first-fit algorithm by default).
  * Can not be used in NMI handler on architectures without
  * NMI-safe cmpxchg implementation.
+ *
+ * Return: address of the memory allocated, otherwise NULL
  */
 void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size, dma_addr_t *dma)
 {
@@ -365,7 +556,7 @@ EXPORT_SYMBOL(gen_pool_dma_alloc);
  * gen_pool_free - free allocated special memory back to the pool
  * @pool: pool to free to
  * @addr: starting address of memory to free back to pool
- * @size: size in bytes of memory to free
+ * @size: size in bytes of memory to free, or 0 to detect it automatically
  *
  * Free previously allocated special memory back to the specified
  * pool.  Can not be used in NMI handler on architectures without
@@ -375,22 +566,29 @@ void gen_pool_free(struct gen_pool *pool, unsigned long addr, size_t size)
 {
 	struct gen_pool_chunk *chunk;
 	int order = pool->min_alloc_order;
-	int start_bit, nbits, remain;
+	int start_entry, remaining_entries, nentries, remain;
+	int boundary;
 
 #ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
 	BUG_ON(in_nmi());
 #endif
 
-	nbits = (size + (1UL << order) - 1) >> order;
 	rcu_read_lock();
 	list_for_each_entry_rcu(chunk, &pool->chunks, next_chunk) {
 		if (addr >= chunk->start_addr && addr <= chunk->end_addr) {
 			BUG_ON(addr + size - 1 > chunk->end_addr);
-			start_bit = (addr - chunk->start_addr) >> order;
-			remain = bitmap_clear_ll(chunk->bits, start_bit, nbits);
+			start_entry = (addr - chunk->start_addr) >> order;
+			remaining_entries = (chunk->end_addr - addr + 1) >> order;
+			boundary = get_boundary(chunk->entries, start_entry,
+						remaining_entries);
+			BUG_ON(boundary < 0);
+			nentries = boundary - start_entry;
+			BUG_ON(size &&
+			       (nentries != mem_to_units(size, order)));
+			remain = alter_bitmap_ll(CLEAR_BITS, chunk->entries,
+						 start_entry, nentries);
 			BUG_ON(remain);
-			size = nbits << order;
-			atomic_long_add(size, &chunk->avail);
+			atomic_long_add(nentries << order, &chunk->avail);
 			rcu_read_unlock();
 			return;
 		}
@@ -428,8 +626,9 @@ EXPORT_SYMBOL(gen_pool_for_each_chunk);
  * @start:	start address
  * @size:	size of the region
  *
- * Check if the range of addresses falls within the specified pool. Returns
- * true if the entire range is contained in the pool and false otherwise.
+ * Check if the range of addresses falls within the specified pool.
+ *
+ * Return: true if the entire range is contained in the pool, false otherwise.
  */
 bool addr_in_gen_pool(struct gen_pool *pool, unsigned long start,
 			size_t size)
@@ -455,7 +654,7 @@ bool addr_in_gen_pool(struct gen_pool *pool, unsigned long start,
  * gen_pool_avail - get available free space of the pool
  * @pool: pool to get available free space
  *
- * Return available free space of the specified pool.
+ * Return: available free space of the specified pool.
  */
 size_t gen_pool_avail(struct gen_pool *pool)
 {
@@ -474,7 +673,7 @@ EXPORT_SYMBOL_GPL(gen_pool_avail);
  * gen_pool_size - get size in bytes of memory managed by the pool
  * @pool: pool to get size
  *
- * Return size in bytes of memory managed by the pool.
+ * Return: size in bytes of memory managed by the pool.
  */
 size_t gen_pool_size(struct gen_pool *pool)
 {
@@ -517,17 +716,27 @@ EXPORT_SYMBOL(gen_pool_set_algo);
  * gen_pool_first_fit - find the first available region
  * of memory matching the size requirement (no alignment constraint)
  * @map: The address to base the search on
- * @size: The bitmap size in bits
- * @start: The bitnumber to start searching at
- * @nr: The number of zeroed bits we're looking for
+ * @size: The number of allocation units in the bitmap
+ * @start: The allocation unit to start searching at
+ * @nr: The number of allocation units we're looking for
  * @data: additional data - unused
  * @pool: pool to find the fit region memory from
+ *
+ * Return: index of the memory allocated, otherwise the end of the range
  */
 unsigned long gen_pool_first_fit(unsigned long *map, unsigned long size,
 		unsigned long start, unsigned int nr, void *data,
 		struct gen_pool *pool)
 {
-	return bitmap_find_next_zero_area(map, size, start, nr, 0);
+	unsigned long align_mask;
+	unsigned long bit_index;
+
+	align_mask = roundup_pow_of_two(BITS_PER_ENTRY) - 1;
+	bit_index = bitmap_find_next_zero_area(map, ENTRIES_TO_BITS(size),
+					       ENTRIES_TO_BITS(start),
+					       ENTRIES_TO_BITS(nr),
+					       align_mask);
+	return BITS_DIV_ENTRIES(bit_index);
 }
 EXPORT_SYMBOL(gen_pool_first_fit);
 
@@ -535,11 +744,13 @@ EXPORT_SYMBOL(gen_pool_first_fit);
  * gen_pool_first_fit_align - find the first available region
  * of memory matching the size requirement (alignment constraint)
  * @map: The address to base the search on
- * @size: The bitmap size in bits
- * @start: The bitnumber to start searching at
- * @nr: The number of zeroed bits we're looking for
+ * @size: The number of allocation units in the bitmap
+ * @start: The allocation unit to start searching at
+ * @nr: The number of allocation units we're looking for
  * @data: data for alignment
  * @pool: pool to get order from
+ *
+ * Return: index of the memory allocated, otherwise the end of the range
  */
 unsigned long gen_pool_first_fit_align(unsigned long *map, unsigned long size,
 		unsigned long start, unsigned int nr, void *data,
@@ -547,23 +758,32 @@ unsigned long gen_pool_first_fit_align(unsigned long *map, unsigned long size,
 {
 	struct genpool_data_align *alignment;
 	unsigned long align_mask;
+	unsigned long bit_index;
 	int order;
 
 	alignment = data;
 	order = pool->min_alloc_order;
-	align_mask = ((alignment->align + (1UL << order) - 1) >> order) - 1;
-	return bitmap_find_next_zero_area(map, size, start, nr, align_mask);
+	align_mask = roundup_pow_of_two(
+			ENTRIES_TO_BITS(mem_to_units(alignment->align,
+						     order))) - 1;
+	bit_index = bitmap_find_next_zero_area(map, ENTRIES_TO_BITS(size),
+					       ENTRIES_TO_BITS(start),
+					       ENTRIES_TO_BITS(nr),
+					       align_mask);
+	return BITS_DIV_ENTRIES(bit_index);
 }
 EXPORT_SYMBOL(gen_pool_first_fit_align);
 
 /**
  * gen_pool_fixed_alloc - reserve a specific region
  * @map: The address to base the search on
- * @size: The bitmap size in bits
- * @start: The bitnumber to start searching at
- * @nr: The number of zeroed bits we're looking for
+ * @size: The number of allocation units in the bitmap
+ * @start: The allocation unit to start searching at
+ * @nr: The number of allocation units we're looking for
  * @data: data for alignment
  * @pool: pool to get order from
+ *
+ * Return: index of the memory allocated, otherwise the end of the range
  */
 unsigned long gen_pool_fixed_alloc(unsigned long *map, unsigned long size,
 		unsigned long start, unsigned int nr, void *data,
@@ -571,20 +791,23 @@ unsigned long gen_pool_fixed_alloc(unsigned long *map, unsigned long size,
 {
 	struct genpool_data_fixed *fixed_data;
 	int order;
-	unsigned long offset_bit;
-	unsigned long start_bit;
+	unsigned long offset;
+	unsigned long align_mask;
+	unsigned long bit_index;
 
 	fixed_data = data;
 	order = pool->min_alloc_order;
-	offset_bit = fixed_data->offset >> order;
 	if (WARN_ON(fixed_data->offset & ((1UL << order) - 1)))
 		return size;
+	offset = fixed_data->offset >> order;
+	align_mask = roundup_pow_of_two(BITS_PER_ENTRY) - 1;
+	bit_index = bitmap_find_next_zero_area(map, ENTRIES_TO_BITS(size),
+					       ENTRIES_TO_BITS(start + offset),
+					       ENTRIES_TO_BITS(nr), align_mask);
+	if (bit_index != ENTRIES_TO_BITS(offset))
+		return size;
 
-	start_bit = bitmap_find_next_zero_area(map, size,
-			start + offset_bit, nr, 0);
-	if (start_bit != offset_bit)
-		start_bit = size;
-	return start_bit;
+	return BITS_DIV_ENTRIES(bit_index);
 }
 EXPORT_SYMBOL(gen_pool_fixed_alloc);
 
@@ -593,60 +816,84 @@ EXPORT_SYMBOL(gen_pool_fixed_alloc);
  * of memory matching the size requirement. The region will be aligned
  * to the order of the size specified.
  * @map: The address to base the search on
- * @size: The bitmap size in bits
- * @start: The bitnumber to start searching at
- * @nr: The number of zeroed bits we're looking for
+ * @size: The number of allocation units in the bitmap
+ * @start: The allocation unit to start searching at
+ * @nr: The number of allocation units we're looking for
  * @data: additional data - unused
  * @pool: pool to find the fit region memory from
+ *
+ * Return: index of the memory allocated, otherwise the end of the range
  */
 unsigned long gen_pool_first_fit_order_align(unsigned long *map,
 		unsigned long size, unsigned long start,
 		unsigned int nr, void *data, struct gen_pool *pool)
 {
-	unsigned long align_mask = roundup_pow_of_two(nr) - 1;
-
-	return bitmap_find_next_zero_area(map, size, start, nr, align_mask);
+	unsigned long align_mask;
+	unsigned long bit_index;
+
+	align_mask = roundup_pow_of_two(ENTRIES_TO_BITS(nr)) - 1;
+	bit_index = bitmap_find_next_zero_area(map, ENTRIES_TO_BITS(size),
+					       ENTRIES_TO_BITS(start),
+					       ENTRIES_TO_BITS(nr),
+					       align_mask);
+	return BITS_DIV_ENTRIES(bit_index);
 }
 EXPORT_SYMBOL(gen_pool_first_fit_order_align);
 
 /**
  * gen_pool_best_fit - find the best fitting region of memory
- * macthing the size requirement (no alignment constraint)
+ * matching the size requirement (no alignment constraint)
  * @map: The address to base the search on
- * @size: The bitmap size in bits
- * @start: The bitnumber to start searching at
- * @nr: The number of zeroed bits we're looking for
+ * @size: The number of allocation units in the bitmap
+ * @start: The allocation unit to start searching at
+ * @nr: The number of allocation units we're looking for
  * @data: additional data - unused
  * @pool: pool to find the fit region memory from
  *
  * Iterate over the bitmap to find the smallest free region
  * which we can allocate the memory.
+ *
+ * Return: index of the memory allocated, otherwise the end of the range
  */
 unsigned long gen_pool_best_fit(unsigned long *map, unsigned long size,
 		unsigned long start, unsigned int nr, void *data,
 		struct gen_pool *pool)
 {
-	unsigned long start_bit = size;
+	unsigned long start_bit = ENTRIES_TO_BITS(size);
 	unsigned long len = size + 1;
 	unsigned long index;
+	unsigned long align_mask;
+	unsigned long bit_index;
 
-	index = bitmap_find_next_zero_area(map, size, start, nr, 0);
+	align_mask = roundup_pow_of_two(BITS_PER_ENTRY) - 1;
+	bit_index = bitmap_find_next_zero_area(map, ENTRIES_TO_BITS(size),
+					       ENTRIES_TO_BITS(start),
+					       ENTRIES_TO_BITS(nr),
+					       align_mask);
+	index = BITS_DIV_ENTRIES(bit_index);
 
 	while (index < size) {
-		int next_bit = find_next_bit(map, size, index + nr);
-		if ((next_bit - index) < len) {
-			len = next_bit - index;
-			start_bit = index;
+		int next_bit;
+
+		next_bit = find_next_bit(map, ENTRIES_TO_BITS(size),
+					 ENTRIES_TO_BITS(index + nr));
+		if ((BITS_DIV_ENTRIES(next_bit) - index) < len) {
+			len = BITS_DIV_ENTRIES(next_bit) - index;
+			start_bit = ENTRIES_TO_BITS(index);
 			if (len == nr)
-				return start_bit;
+				return BITS_DIV_ENTRIES(start_bit);
 		}
-		index = bitmap_find_next_zero_area(map, size,
-						   next_bit + 1, nr, 0);
+		bit_index =
+			bitmap_find_next_zero_area(map,
+						   ENTRIES_TO_BITS(size),
+						   next_bit + 1,
+						   ENTRIES_TO_BITS(nr),
+						   align_mask);
+		index = BITS_DIV_ENTRIES(bit_index);
 	}
 
-	return start_bit;
+	return BITS_DIV_ENTRIES(start_bit);
 }
-EXPORT_SYMBOL(gen_pool_best_fit);
 
 static void devm_gen_pool_release(struct device *dev, void *res)
 {
@@ -672,7 +919,7 @@ static int devm_gen_pool_match(struct device *dev, void *res, void *data)
  * @dev: device to retrieve the gen_pool from
  * @name: name of a gen_pool or NULL, identifies a particular gen_pool on device
  *
- * Returns the gen_pool for the device if one is present, or NULL.
+ * Return: the gen_pool for the device if one is present, or NULL.
  */
 struct gen_pool *gen_pool_get(struct device *dev, const char *name)
 {
@@ -696,6 +943,8 @@ EXPORT_SYMBOL_GPL(gen_pool_get);
  * Create a new special memory pool that can be used to manage special purpose
  * memory not managed by the regular kmalloc/kfree interface. The pool will be
  * automatically destroyed by the device management code.
+ *
+ * Return: the address of the pool, if successful, otherwise NULL
  */
 struct gen_pool *devm_gen_pool_create(struct device *dev, int min_alloc_order,
 				      int nid, const char *name)
@@ -743,7 +992,7 @@ EXPORT_SYMBOL(devm_gen_pool_create);
  * @propname: property name containing phandle(s)
  * @index: index into the phandle array
  *
- * Returns the pool that contains the chunk starting at the physical
+ * Return: the pool that contains the chunk starting at the physical
  * address of the device tree node pointed at by the phandle property,
  * or NULL if not found.
  */
-- 
2.14.1



* [PATCH 1/6] genalloc: track beginning of allocations
@ 2018-02-11  3:19   ` Igor Stoppa
  0 siblings, 0 replies; 146+ messages in thread
From: Igor Stoppa @ 2018-02-11  3:19 UTC (permalink / raw)
  To: willy, rdunlap, corbet, keescook, mhocko, labbott, jglisse, hch
  Cc: cl, linux-security-module, linux-mm, linux-kernel,
	kernel-hardening, Igor Stoppa

The genalloc library is only capable of tracking if a certain unit of
allocation is in use or not.

It cannot tell where the memory associated with an allocation request
begins and where it ends.

The reason is that units of allocation are tracked with a bitmap, where
each bit indicates whether the unit is allocated (1) or available (0).

The user of the API must therefore keep track of how much space was
requested, in case it ever needs to be freed.

This allows errors to go undetected, for example:
* Only a subset of the memory provided to an allocation request is freed
* The memory from a subsequent allocation is freed
* The memory being freed doesn't start at the beginning of an
  allocation.
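To make the problem concrete, here is a minimal userspace model of the one-bit-per-unit scheme. The names toy_alloc() and toy_free() are hypothetical and are not part of the genalloc API, and a single 64-bit word stands in for the bitmap. Two adjacent allocations leave one undistinguished run of set bits, so a free request that straddles them cannot be rejected:

```c
#include <stdint.h>

/*
 * Hypothetical model of a one-bit-per-unit allocation bitmap:
 * bit i set means unit i is allocated. One 64-bit word holds 64 units;
 * callers must keep n < 64 and start + n <= 64.
 */
static uint64_t toy_range_mask(unsigned int start, unsigned int n)
{
	return ((UINT64_C(1) << n) - 1) << start;
}

static void toy_alloc(uint64_t *map, unsigned int start, unsigned int n)
{
	*map |= toy_range_mask(start, n);
}

/*
 * The only consistency check this encoding allows is "were all these
 * units allocated?". It cannot tell whether @start begins an allocation
 * or whether @n matches the size that was originally requested.
 */
static int toy_free(uint64_t *map, unsigned int start, unsigned int n)
{
	uint64_t mask = toy_range_mask(start, n);

	if ((*map & mask) != mask)
		return -1;	/* some unit in the range was already free */
	*map &= ~mask;
	return 0;
}
```

With allocation A at units 0-1 and allocation B at units 2-4, toy_free(&map, 1, 2) returns 0 and leaves both allocations half-freed.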

The bitmap is used because it allows lockless read/write access, where
the hardware supports it through cmpxchg.
It also makes it possible to scan the bitmap for a sufficiently long
sequence of zeros, to identify zones available for allocation.
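The lockless update pattern can be sketched in userspace with C11 atomics, where atomic_compare_exchange_weak() plays the role of the kernel's cmpxchg(). This only illustrates the technique under that assumption; it is not the genalloc code, and sketch_set_bits() and sketch_clear_bits() are hypothetical names. Each loop retries only while the word changed underneath it, and gives up as soon as any targeted bit is in the wrong state:

```c
#include <errno.h>
#include <stdatomic.h>

/*
 * Hypothetical userspace sketch of lockless bitmap updates.
 * atomic_compare_exchange_weak() stands in for the kernel's cmpxchg():
 * it stores the new value only if the word still holds 'old', and
 * otherwise reloads 'old' with the current contents of the word.
 */

/* Set all bits in @mask, failing if any of them is already set. */
static int sketch_set_bits(_Atomic unsigned long *word, unsigned long mask)
{
	unsigned long old = atomic_load(word);

	do {
		if (old & mask)
			return -EBUSY;	/* part of the range is taken */
	} while (!atomic_compare_exchange_weak(word, &old, old | mask));
	return 0;
}

/* Clear all bits in @mask, failing unless all of them are set. */
static int sketch_clear_bits(_Atomic unsigned long *word, unsigned long mask)
{
	unsigned long old = atomic_load(word);

	do {
		if ((old & mask) != mask)
			return -EBUSY;	/* range is not fully allocated */
	} while (!atomic_compare_exchange_weak(word, &old, old & ~mask));
	return 0;
}
```

Because the state check is repeated on every attempt against the freshly observed value, two users racing for overlapping bits cannot both succeed.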

This patch doubles the space reserved in the bitmap for each allocation
unit (from one bit to two), so that the beginning of each allocation can
be tracked.
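As a worked example of the cost: a 4096-byte chunk with 16-byte units (order 4) has 256 units, which need 512 bits of bitmap instead of 256, that is 8 unsigned longs instead of 4 on a 64-bit machine. The sketch below uses hypothetical helper names to compute the bitmap size. It rounds up to whole longs, as the kernel's BITS_TO_LONGS() does:

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helpers mirroring the patch's two bits per allocation unit. */
#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_BITS_PER_ENTRY 2

/* Number of unsigned longs needed for @nbits bits, rounded up. */
static size_t sketch_bits_to_longs(size_t nbits)
{
	return (nbits + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG;
}

/* Bitmap size, in unsigned longs, for a chunk of 2^@order-byte units. */
static size_t sketch_bitmap_longs(size_t chunk_bytes, unsigned int order)
{
	size_t nentries = chunk_bytes >> order;

	return sketch_bits_to_longs(nentries * SKETCH_BITS_PER_ENTRY);
}
```

Rounding up matters when the number of units is not a multiple of the units per long. For example, 100 units need 200 bits, which is 4 longs on a 64-bit machine, whereas truncating division would give only 3.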

For details, see the documentation inside lib/genalloc.c
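The two-bit encoding can be modelled in a few lines, along with a way to recover the length of an allocation from it. This is a single-threaded sketch under simplifying assumptions: one 64-bit word, so at most 32 entries, and the hypothetical names enc_mark() and enc_length(). The patch itself spreads the entries over an array of unsigned long and updates them with cmpxchg:

```c
#include <stdint.h>

/* Hypothetical model: entry i occupies bits 2i (lsb) and 2i+1 (msb). */
#define ENC_FREE 0x0u	/* 00: available allocation unit */
#define ENC_TAIL 0x2u	/* 10: subsequent unit of an allocation */
#define ENC_HEAD 0x3u	/* 11: first unit of an allocation */

static unsigned int enc_get(uint64_t map, unsigned int entry)
{
	return (unsigned int)(map >> (2 * entry)) & 0x3u;
}

/* Mark @n >= 1 entries from @start: one head followed by n - 1 tails. */
static uint64_t enc_mark(uint64_t map, unsigned int start, unsigned int n)
{
	map |= (uint64_t)ENC_HEAD << (2 * start);
	for (unsigned int i = start + 1; i < start + n; i++)
		map |= (uint64_t)ENC_TAIL << (2 * i);
	return map;
}

/*
 * Length of the allocation whose head is @start, scanning entries below
 * the absolute bound @nentries; -1 if @start is not a head.
 */
static int enc_length(uint64_t map, unsigned int start, unsigned int nentries)
{
	unsigned int i;

	if (enc_get(map, start) != ENC_HEAD)
		return -1;
	for (i = start + 1; i < nentries; i++)
		if (enc_get(map, i) != ENC_TAIL)
			break;	/* a free entry or the next head ends it */
	return (int)(i - start);
}
```

Marking 4 units at entry 0, 1 unit at entry 7 and 2 units at entry 8 reproduces the bit pattern ...000010111100000010101011 (0x0BC0AB), which is the example in the lib/genalloc.c comment. enc_length() then returns 4, 1 and 2 for those heads, and -1 for any entry that is not a head.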

Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com>
---
 include/linux/genalloc.h |   4 +-
 lib/genalloc.c           | 527 ++++++++++++++++++++++++++++++++++-------------
 2 files changed, 390 insertions(+), 141 deletions(-)

diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
index 872f930f1b06..dcaa33e74b1c 100644
--- a/include/linux/genalloc.h
+++ b/include/linux/genalloc.h
@@ -32,7 +32,7 @@
 
 #include <linux/types.h>
 #include <linux/spinlock_types.h>
-#include <linux/atomic.h>
+#include <linux/slab.h>
 
 struct device;
 struct device_node;
@@ -76,7 +76,7 @@ struct gen_pool_chunk {
 	phys_addr_t phys_addr;		/* physical starting address of memory chunk */
 	unsigned long start_addr;	/* start address of memory chunk */
 	unsigned long end_addr;		/* end address of memory chunk (inclusive) */
-	unsigned long bits[0];		/* bitmap for allocating memory chunk */
+	unsigned long entries[0];	/* bitmap for allocating memory chunk */
 };
 
 /*
diff --git a/lib/genalloc.c b/lib/genalloc.c
index ca06adc4f445..044347163acb 100644
--- a/lib/genalloc.c
+++ b/lib/genalloc.c
@@ -26,6 +26,74 @@
  *
  * This source code is licensed under the GNU General Public License,
  * Version 2.  See the file COPYING for more details.
+ *
+ *
+ *
+ * Encoding of the bitmap tracking the allocations
+ * -----------------------------------------------
+ *
+ * The bitmap is composed of units of allocation.
+ *
+ * Each unit of allocation is represented using 2 consecutive bits.
+ *
+ * This makes it possible to encode, for each unit of allocation,
+ * information about:
+ *  - allocation status (busy/free)
+ *  - beginning of a sequence of allocation units (first / successive)
+ *
+ *
+ * Dictionary of allocation units (msb to the left, lsb to the right):
+ *
+ * 11: first allocation unit in the allocation
+ * 10: any subsequent allocation unit (if any) in the allocation
+ * 00: available allocation unit
+ * 01: invalid
+ *
+ * Example, using the same notation as above - MSb.......LSb:
+ *
+ *  ...000010111100000010101011   <-- Read in this direction.
+ *     \__|\__|\|\____|\______|
+ *        |   | |     |       \___ 4 used allocation units
+ *        |   | |     \___________ 3 empty allocation units
+ *        |   | \_________________ 1 used allocation unit
+ *        |   \___________________ 2 used allocation units
+ *        \_______________________ 2 empty allocation units
+ *
+ * The encoding allows for lockless operations, such as:
+ * - search for a sufficiently large range of allocation units
+ * - reservation of a selected range of allocation units
+ * - release of a specific allocation
+ *
+ * The search for a sequence of empty allocation units (marked as zeros
+ * in the bitmap) is performed with an alignment of 2^1 bits.
+ *
+ * This means that an allocation can start only at even bit positions
+ * (bit 0, bit 2, etc.) in the bitmap.
+ *
+ * Therefore, the number of zeroes to look for must be twice the number
+ * of desired allocation units.
+ *
+ * When the memory associated with an allocation request is freed, the
+ * first step is to check that the corresponding allocation unit is
+ * really the beginning of an allocation (both bits are set to 1).
+ *
+ * Finding the end of the allocation can also be done locklessly.
+ * It is sufficient to find the first following allocation unit that is
+ * marked either as free (00) or as the head of an allocation (11).
+ * Even if the status of that unit changes in the meantime, it does not
+ * matter, since it can only transition between free (00) and
+ * first-allocated (11).
+ *
+ * The parameter indicating to the *_free() function the size of the
+ * space that should be freed can be either set to 0, for automated
+ * assessment, or it can be specified explicitly.
+ *
+ * If it is specified explicitly, the value is verified against what
+ * the library is tracking internally.
+ *
+ * If ever needed, the bitmap could be extended, assigning larger amounts
+ * of bits to each allocation unit (the increase must follow powers of 2),
+ * to track other properties of the allocations.
  */
 
 #include <linux/slab.h>
@@ -36,118 +104,230 @@
 #include <linux/genalloc.h>
 #include <linux/of_device.h>
 
+#define ENTRY_ORDER 1UL
+#define ENTRY_MASK ((1UL << BITS_PER_ENTRY) - 1UL)
+#define ENTRY_HEAD ENTRY_MASK
+#define ENTRY_UNUSED 0UL
+#define BITS_PER_ENTRY (1U << ENTRY_ORDER)
+#define BITS_DIV_ENTRIES(x) ((x) >> ENTRY_ORDER)
+#define ENTRIES_TO_BITS(x) ((x) << ENTRY_ORDER)
+#define BITS_DIV_LONGS(x) ((x) / BITS_PER_LONG)
+#define ENTRIES_DIV_LONGS(x) (BITS_DIV_LONGS(ENTRIES_TO_BITS(x)))
+
+#define ENTRIES_PER_LONG BITS_DIV_ENTRIES(BITS_PER_LONG)
+
+/* Binary pattern of 1010...1010 that spans one unsigned long. */
+#define MASK (~0UL / 3 * 2)
+
+/**
+ * get_bitmap_entry - extracts the specified entry from the bitmap
+ * @map: pointer to a bitmap
+ * @entry_index: the index of the desired entry in the bitmap
+ *
+ * Return: the two-bit value of the requested entry.
+ */
+static inline unsigned long get_bitmap_entry(unsigned long *map,
+					    int entry_index)
+{
+	return (map[ENTRIES_DIV_LONGS(entry_index)] >>
+		ENTRIES_TO_BITS(entry_index % ENTRIES_PER_LONG)) &
+		ENTRY_MASK;
+}
+
+
+/**
+ * mem_to_units - convert a size in bytes to allocation units
+ * @size: amount in bytes
+ * @order: power of 2 represented by each entry in the bitmap
+ *
+ * Return: the number of allocation units needed for @size, rounded up.
+ */
+static inline unsigned long mem_to_units(unsigned long size,
+					 unsigned long order)
+{
+	return (size + (1UL << order) - 1) >> order;
+}
+
+/**
+ * chunk_size - dimension of a chunk of memory, in bytes
+ * @chunk: pointer to the struct describing the chunk
+ *
+ * Return: The size of the chunk, in bytes.
+ */
 static inline size_t chunk_size(const struct gen_pool_chunk *chunk)
 {
 	return chunk->end_addr - chunk->start_addr + 1;
 }
 
-static int set_bits_ll(unsigned long *addr, unsigned long mask_to_set)
+
+/**
+ * set_bits_ll - according to the mask, sets the bits specified by
+ * value, at the address specified.
+ * @addr: where to write
+ * @mask: filter to apply for the bits to alter
+ * @value: actual configuration of bits to store
+ *
+ * Return: 0 upon success, -EBUSY otherwise
+ */
+static int set_bits_ll(unsigned long *addr,
+		       unsigned long mask, unsigned long value)
 {
-	unsigned long val, nval;
+	unsigned long nval;
+	unsigned long present;
+	unsigned long target;
 
 	nval = *addr;
 	do {
-		val = nval;
-		if (val & mask_to_set)
+		present = nval;
+		if (present & mask)
 			return -EBUSY;
+		target = present | value;
 		cpu_relax();
-	} while ((nval = cmpxchg(addr, val, val | mask_to_set)) != val);
-
+	} while ((nval = cmpxchg(addr, present, target)) != present);
 	return 0;
 }
 
-static int clear_bits_ll(unsigned long *addr, unsigned long mask_to_clear)
+
+/**
+ * clear_bits_ll - according to the mask, clears the bits specified by
+ * value, at the address specified.
+ * @addr: where to write
+ * @mask: filter to apply for the bits to alter
+ * @value: expected configuration of the bits to clear
+ *
+ * Return: 0 upon success, -EBUSY otherwise
+ */
+static int clear_bits_ll(unsigned long *addr,
+			 unsigned long mask, unsigned long value)
 {
-	unsigned long val, nval;
+	unsigned long nval;
+	unsigned long present;
+	unsigned long target;
 
 	nval = *addr;
+	present = nval;
+	if (unlikely((present & mask) ^ value))
+		return -EBUSY;
 	do {
-		val = nval;
-		if ((val & mask_to_clear) != mask_to_clear)
+		present = nval;
+		if (unlikely((present & mask) ^ value))
 			return -EBUSY;
+		target = present & ~mask;
 		cpu_relax();
-	} while ((nval = cmpxchg(addr, val, val & ~mask_to_clear)) != val);
-
+	} while ((nval = cmpxchg(addr, present, target)) != present);
 	return 0;
 }
 
-/*
- * bitmap_set_ll - set the specified number of bits at the specified position
+
+/**
+ * get_boundary - verify that an allocation starts at the given
+ * entry, then find where it ends.
  * @map: pointer to a bitmap
- * @start: a bit position in @map
- * @nr: number of bits to set
+ * @start_entry: the index of the first entry in the bitmap
+ * @nentries: number of entries from @start_entry to the end of the chunk
  *
- * Set @nr bits start from @start in @map lock-lessly. Several users
- * can set/clear the same bitmap simultaneously without lock. If two
- * users set the same bit, one user will return remain bits, otherwise
- * return 0.
+ * Return: the index of the first entry past the end of the allocation,
+ * otherwise -EINVAL if @start_entry is not the head of an allocation.
  */
-static int bitmap_set_ll(unsigned long *map, int start, int nr)
+static int get_boundary(unsigned long *map, int start_entry, int nentries)
 {
-	unsigned long *p = map + BIT_WORD(start);
-	const int size = start + nr;
-	int bits_to_set = BITS_PER_LONG - (start % BITS_PER_LONG);
-	unsigned long mask_to_set = BITMAP_FIRST_WORD_MASK(start);
-
-	while (nr - bits_to_set >= 0) {
-		if (set_bits_ll(p, mask_to_set))
-			return nr;
-		nr -= bits_to_set;
-		bits_to_set = BITS_PER_LONG;
-		mask_to_set = ~0UL;
-		p++;
-	}
-	if (nr) {
-		mask_to_set &= BITMAP_LAST_WORD_MASK(size);
-		if (set_bits_ll(p, mask_to_set))
-			return nr;
-	}
+	int i;
+	unsigned long bitmap_entry;
 
-	return 0;
+
+	if (unlikely(get_bitmap_entry(map, start_entry) != ENTRY_HEAD))
+		return -EINVAL;
+	for (i = start_entry + 1; i < start_entry + nentries; i++) {
+		bitmap_entry = get_bitmap_entry(map, i);
+		if (bitmap_entry == ENTRY_HEAD ||
+		    bitmap_entry == ENTRY_UNUSED)
+			return i;
+	}
+	return start_entry + nentries;
 }
 
+
+#define SET_BITS 1
+#define CLEAR_BITS 0
+
 /*
- * bitmap_clear_ll - clear the specified number of bits at the specified position
+ * alter_bitmap_ll - set or clear the entries associated with an allocation
+ * @alteration: indicates if the bits selected should be set or cleared
  * @map: pointer to a bitmap
- * @start: a bit position in @map
- * @nr: number of bits to set
+ * @start_entry: the index of the first entry in the bitmap
+ * @nentries: number of entries to alter
+ *
+ * The modification happens lock-lessly.
+ * Several users can write to the same map simultaneously, without locking.
  *
- * Clear @nr bits start from @start in @map lock-lessly. Several users
- * can set/clear the same bitmap simultaneously without lock. If two
- * users clear the same bit, one user will return remain bits,
- * otherwise return 0.
+ * Return: 0 on success, otherwise the number of entries left unaltered
+ * (e.g. because another user concurrently altered some of the same bits).
  */
-static int bitmap_clear_ll(unsigned long *map, int start, int nr)
+static int alter_bitmap_ll(bool alteration, unsigned long *map,
+			   int start_entry, int nentries)
 {
-	unsigned long *p = map + BIT_WORD(start);
-	const int size = start + nr;
-	int bits_to_clear = BITS_PER_LONG - (start % BITS_PER_LONG);
-	unsigned long mask_to_clear = BITMAP_FIRST_WORD_MASK(start);
-
-	while (nr - bits_to_clear >= 0) {
-		if (clear_bits_ll(p, mask_to_clear))
-			return nr;
-		nr -= bits_to_clear;
-		bits_to_clear = BITS_PER_LONG;
-		mask_to_clear = ~0UL;
-		p++;
-	}
-	if (nr) {
-		mask_to_clear &= BITMAP_LAST_WORD_MASK(size);
-		if (clear_bits_ll(p, mask_to_clear))
-			return nr;
+	unsigned long start_bit;
+	unsigned long end_bit;
+	unsigned long mask;
+	unsigned long value;
+	int nbits;
+	int bits_to_write;
+	int index;
+	int (*action)(unsigned long *addr,
+		      unsigned long mask, unsigned long value);
+
+	action = (alteration == SET_BITS) ? set_bits_ll : clear_bits_ll;
+
+	/*
+	 * Prepare for writing the initial part of the allocation, from
+	 * starting entry, to the end of the UL bitmap element which
+	 * contains it. It might be larger than the actual allocation.
+	 */
+	start_bit = ENTRIES_TO_BITS(start_entry);
+	end_bit = ENTRIES_TO_BITS(start_entry + nentries);
+	nbits = ENTRIES_TO_BITS(nentries);
+	bits_to_write = BITS_PER_LONG - start_bit % BITS_PER_LONG;
+	mask = BITMAP_FIRST_WORD_MASK(start_bit);
+	/* Mark the beginning of the allocation. */
+	value = MASK | (1UL << (start_bit % BITS_PER_LONG));
+	index = BITS_DIV_LONGS(start_bit);
+
+	/*
+	 * Write the entries one unsigned long at a time, as long as the
+	 * remaining bits reach at least the end of the current word.
+	 * Skipped entirely if the entries to write do not reach the end
+	 * of the first unsigned long.
+	 */
+	while (nbits >= bits_to_write) {
+		if (action(map + index, mask, value & mask))
+			return BITS_DIV_ENTRIES(nbits);
+		nbits -= bits_to_write;
+		bits_to_write = BITS_PER_LONG;
+		mask = ~0UL;
+		value = MASK;
+		index++;
 	}
 
+	/* Takes care of the ending part of the entries to mark. */
+	if (nbits > 0) {
+		mask ^= BITMAP_FIRST_WORD_MASK((end_bit) % BITS_PER_LONG);
+		bits_to_write = nbits;
+		if (action(map + index, mask, value & mask))
+			return BITS_DIV_ENTRIES(nbits);
+	}
 	return 0;
 }
 
+
 /**
  * gen_pool_create - create a new special memory pool
- * @min_alloc_order: log base 2 of number of bytes each bitmap bit represents
+ * @min_alloc_order: log base 2 of number of bytes each bitmap entry represents
  * @nid: node id of the node the pool structure should be allocated on, or -1
  *
  * Create a new special memory pool that can be used to manage special purpose
  * memory not managed by the regular kmalloc/kfree interface.
+ *
+ * Return: pointer to the pool, if successful, NULL otherwise
  */
 struct gen_pool *gen_pool_create(int min_alloc_order, int nid)
 {
@@ -177,16 +357,18 @@ EXPORT_SYMBOL(gen_pool_create);
  *
  * Add a new chunk of special memory to the specified pool.
  *
- * Returns 0 on success or a -ve errno on failure.
+ * Return: 0 on success or a -ve errno on failure.
  */
 int gen_pool_add_virt(struct gen_pool *pool, unsigned long virt, phys_addr_t phys,
 		 size_t size, int nid)
 {
 	struct gen_pool_chunk *chunk;
-	int nbits = size >> pool->min_alloc_order;
-	int nbytes = sizeof(struct gen_pool_chunk) +
-				BITS_TO_LONGS(nbits) * sizeof(long);
+	int nentries;
+	int nbytes;
 
+	nentries = size >> pool->min_alloc_order;
+	nbytes = sizeof(struct gen_pool_chunk) +
+		 BITS_TO_LONGS(ENTRIES_TO_BITS(nentries)) * sizeof(long);
 	chunk = kzalloc_node(nbytes, GFP_KERNEL, nid);
 	if (unlikely(chunk == NULL))
 		return -ENOMEM;
@@ -209,7 +391,7 @@ EXPORT_SYMBOL(gen_pool_add_virt);
  * @pool: pool to allocate from
  * @addr: starting address of memory
  *
- * Returns the physical address on success, or -1 on error.
+ * Return: the physical address on success, or -1 on error.
  */
 phys_addr_t gen_pool_virt_to_phys(struct gen_pool *pool, unsigned long addr)
 {
@@ -248,7 +430,7 @@ void gen_pool_destroy(struct gen_pool *pool)
 		list_del(&chunk->next_chunk);
 
 		end_bit = chunk_size(chunk) >> order;
-		bit = find_next_bit(chunk->bits, end_bit, 0);
+		bit = find_next_bit(chunk->entries, end_bit, 0);
 		BUG_ON(bit < end_bit);
 
 		kfree(chunk);
@@ -267,6 +449,8 @@ EXPORT_SYMBOL(gen_pool_destroy);
  * Uses the pool allocation function (with first-fit algorithm by default).
  * Can not be used in NMI handler on architectures without
  * NMI-safe cmpxchg implementation.
+ *
+ * Return: address of the memory allocated, otherwise NULL
  */
 unsigned long gen_pool_alloc(struct gen_pool *pool, size_t size)
 {
@@ -285,6 +469,8 @@ EXPORT_SYMBOL(gen_pool_alloc);
  * Uses the pool allocation function (with first-fit algorithm by default).
  * Can not be used in NMI handler on architectures without
  * NMI-safe cmpxchg implementation.
+ *
+ * Return: address of the memory allocated, otherwise NULL
  */
 unsigned long gen_pool_alloc_algo(struct gen_pool *pool, size_t size,
 		genpool_algo_t algo, void *data)
@@ -292,7 +478,7 @@ unsigned long gen_pool_alloc_algo(struct gen_pool *pool, size_t size,
 	struct gen_pool_chunk *chunk;
 	unsigned long addr = 0;
 	int order = pool->min_alloc_order;
-	int nbits, start_bit, end_bit, remain;
+	int nentries, start_entry, end_entry, remain;
 
 #ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
 	BUG_ON(in_nmi());
@@ -301,29 +487,32 @@ unsigned long gen_pool_alloc_algo(struct gen_pool *pool, size_t size,
 	if (size == 0)
 		return 0;
 
-	nbits = (size + (1UL << order) - 1) >> order;
+	nentries = mem_to_units(size, order);
 	rcu_read_lock();
 	list_for_each_entry_rcu(chunk, &pool->chunks, next_chunk) {
 		if (size > atomic_long_read(&chunk->avail))
 			continue;
 
-		start_bit = 0;
-		end_bit = chunk_size(chunk) >> order;
+		start_entry = 0;
+		end_entry = chunk_size(chunk) >> order;
 retry:
-		start_bit = algo(chunk->bits, end_bit, start_bit,
-				 nbits, data, pool);
-		if (start_bit >= end_bit)
+		start_entry = algo(chunk->entries, end_entry, start_entry,
+				  nentries, data, pool);
+		if (start_entry >= end_entry)
 			continue;
-		remain = bitmap_set_ll(chunk->bits, start_bit, nbits);
+		remain = alter_bitmap_ll(SET_BITS, chunk->entries,
+					 start_entry, nentries);
 		if (remain) {
-			remain = bitmap_clear_ll(chunk->bits, start_bit,
-						 nbits - remain);
-			BUG_ON(remain);
+			remain = alter_bitmap_ll(CLEAR_BITS,
+						 chunk->entries,
+						 start_entry,
+						 nentries - remain);
 			goto retry;
 		}
 
-		addr = chunk->start_addr + ((unsigned long)start_bit << order);
-		size = nbits << order;
+		addr = chunk->start_addr +
+			((unsigned long)start_entry << order);
+		size = nentries << order;
 		atomic_long_sub(size, &chunk->avail);
 		break;
 	}
@@ -342,6 +531,8 @@ EXPORT_SYMBOL(gen_pool_alloc_algo);
  * Uses the pool allocation function (with first-fit algorithm by default).
  * Can not be used in NMI handler on architectures without
  * NMI-safe cmpxchg implementation.
+ *
+ * Return: address of the memory allocated, otherwise NULL
  */
 void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size, dma_addr_t *dma)
 {
@@ -365,7 +556,7 @@ EXPORT_SYMBOL(gen_pool_dma_alloc);
  * gen_pool_free - free allocated special memory back to the pool
  * @pool: pool to free to
  * @addr: starting address of memory to free back to pool
- * @size: size in bytes of memory to free
+ * @size: size in bytes of memory to free or 0, for auto-detection
  *
  * Free previously allocated special memory back to the specified
  * pool.  Can not be used in NMI handler on architectures without
@@ -375,22 +566,29 @@ void gen_pool_free(struct gen_pool *pool, unsigned long addr, size_t size)
 {
 	struct gen_pool_chunk *chunk;
 	int order = pool->min_alloc_order;
-	int start_bit, nbits, remain;
+	int start_entry, remaining_entries, nentries, remain;
+	int boundary;
 
 #ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
 	BUG_ON(in_nmi());
 #endif
 
-	nbits = (size + (1UL << order) - 1) >> order;
 	rcu_read_lock();
 	list_for_each_entry_rcu(chunk, &pool->chunks, next_chunk) {
 		if (addr >= chunk->start_addr && addr <= chunk->end_addr) {
 			BUG_ON(addr + size - 1 > chunk->end_addr);
-			start_bit = (addr - chunk->start_addr) >> order;
-			remain = bitmap_clear_ll(chunk->bits, start_bit, nbits);
+			start_entry = (addr - chunk->start_addr) >> order;
+			remaining_entries = (chunk->end_addr - addr + 1) >> order;
+			boundary = get_boundary(chunk->entries, start_entry,
+						remaining_entries);
+			BUG_ON(boundary < 0);
+			nentries = boundary - start_entry;
+			BUG_ON(size &&
+			       (nentries != mem_to_units(size, order)));
+			remain = alter_bitmap_ll(CLEAR_BITS, chunk->entries,
+						 start_entry, nentries);
 			BUG_ON(remain);
-			size = nbits << order;
-			atomic_long_add(size, &chunk->avail);
+			atomic_long_add(nentries << order, &chunk->avail);
 			rcu_read_unlock();
 			return;
 		}
@@ -428,8 +626,9 @@ EXPORT_SYMBOL(gen_pool_for_each_chunk);
  * @start:	start address
  * @size:	size of the region
  *
- * Check if the range of addresses falls within the specified pool. Returns
- * true if the entire range is contained in the pool and false otherwise.
+ * Check if the range of addresses falls within the specified pool.
+ *
+ * Return: true if the entire range is contained in the pool, false otherwise.
  */
 bool addr_in_gen_pool(struct gen_pool *pool, unsigned long start,
 			size_t size)
@@ -455,7 +654,7 @@ bool addr_in_gen_pool(struct gen_pool *pool, unsigned long start,
  * gen_pool_avail - get available free space of the pool
  * @pool: pool to get available free space
  *
- * Return available free space of the specified pool.
+ * Return: available free space of the specified pool.
  */
 size_t gen_pool_avail(struct gen_pool *pool)
 {
@@ -474,7 +673,7 @@ EXPORT_SYMBOL_GPL(gen_pool_avail);
  * gen_pool_size - get size in bytes of memory managed by the pool
  * @pool: pool to get size
  *
- * Return size in bytes of memory managed by the pool.
+ * Return: size in bytes of memory managed by the pool.
  */
 size_t gen_pool_size(struct gen_pool *pool)
 {
@@ -517,17 +716,27 @@ EXPORT_SYMBOL(gen_pool_set_algo);
  * gen_pool_first_fit - find the first available region
  * of memory matching the size requirement (no alignment constraint)
  * @map: The address to base the search on
- * @size: The bitmap size in bits
- * @start: The bitnumber to start searching at
- * @nr: The number of zeroed bits we're looking for
+ * @size: The number of allocation units in the bitmap
+ * @start: The allocation unit to start searching at
+ * @nr: The number of allocation units we're looking for
  * @data: additional data - unused
  * @pool: pool to find the fit region memory from
+ *
+ * Return: index of the first suitable unit, or a value >= @size if none
  */
 unsigned long gen_pool_first_fit(unsigned long *map, unsigned long size,
 		unsigned long start, unsigned int nr, void *data,
 		struct gen_pool *pool)
 {
-	return bitmap_find_next_zero_area(map, size, start, nr, 0);
+	unsigned long align_mask;
+	unsigned long bit_index;
+
+	align_mask = roundup_pow_of_two(BITS_PER_ENTRY) - 1;
+	bit_index = bitmap_find_next_zero_area(map, ENTRIES_TO_BITS(size),
+					       ENTRIES_TO_BITS(start),
+					       ENTRIES_TO_BITS(nr),
+					       align_mask);
+	return BITS_DIV_ENTRIES(bit_index);
 }
 EXPORT_SYMBOL(gen_pool_first_fit);
 
@@ -535,11 +744,13 @@ EXPORT_SYMBOL(gen_pool_first_fit);
  * gen_pool_first_fit_align - find the first available region
  * of memory matching the size requirement (alignment constraint)
  * @map: The address to base the search on
- * @size: The bitmap size in bits
- * @start: The bitnumber to start searching at
- * @nr: The number of zeroed bits we're looking for
+ * @size: The number of allocation units in the bitmap
+ * @start: The allocation unit to start searching at
+ * @nr: The number of allocation units we're looking for
  * @data: data for alignment
  * @pool: pool to get order from
+ *
+ * Return: index of the first suitable unit, or a value >= @size if none
  */
 unsigned long gen_pool_first_fit_align(unsigned long *map, unsigned long size,
 		unsigned long start, unsigned int nr, void *data,
@@ -547,23 +758,32 @@ unsigned long gen_pool_first_fit_align(unsigned long *map, unsigned long size,
 {
 	struct genpool_data_align *alignment;
 	unsigned long align_mask;
+	unsigned long bit_index;
 	int order;
 
 	alignment = data;
 	order = pool->min_alloc_order;
-	align_mask = ((alignment->align + (1UL << order) - 1) >> order) - 1;
-	return bitmap_find_next_zero_area(map, size, start, nr, align_mask);
+	align_mask = roundup_pow_of_two(
+			ENTRIES_TO_BITS(mem_to_units(alignment->align,
+						     order))) - 1;
+	bit_index = bitmap_find_next_zero_area(map, ENTRIES_TO_BITS(size),
+					       ENTRIES_TO_BITS(start),
+					       ENTRIES_TO_BITS(nr),
+					       align_mask);
+	return BITS_DIV_ENTRIES(bit_index);
 }
 EXPORT_SYMBOL(gen_pool_first_fit_align);
 
 /**
  * gen_pool_fixed_alloc - reserve a specific region
  * @map: The address to base the search on
- * @size: The bitmap size in bits
- * @start: The bitnumber to start searching at
- * @nr: The number of zeroed bits we're looking for
+ * @size: The number of allocation units in the bitmap
+ * @start: The allocation unit to start searching at
+ * @nr: The number of allocation units we're looking for
  * @data: data for alignment
  * @pool: pool to get order from
+ *
+ * Return: index of the requested region if it is free, otherwise @size
  */
 unsigned long gen_pool_fixed_alloc(unsigned long *map, unsigned long size,
 		unsigned long start, unsigned int nr, void *data,
@@ -571,20 +791,23 @@ unsigned long gen_pool_fixed_alloc(unsigned long *map, unsigned long size,
 {
 	struct genpool_data_fixed *fixed_data;
 	int order;
-	unsigned long offset_bit;
-	unsigned long start_bit;
+	unsigned long offset;
+	unsigned long align_mask;
+	unsigned long bit_index;
 
 	fixed_data = data;
 	order = pool->min_alloc_order;
-	offset_bit = fixed_data->offset >> order;
 	if (WARN_ON(fixed_data->offset & ((1UL << order) - 1)))
 		return size;
+	offset = fixed_data->offset >> order;
+	align_mask = roundup_pow_of_two(BITS_PER_ENTRY) - 1;
+	bit_index = bitmap_find_next_zero_area(map, ENTRIES_TO_BITS(size),
+					       ENTRIES_TO_BITS(start + offset),
+					       ENTRIES_TO_BITS(nr), align_mask);
+	if (bit_index != ENTRIES_TO_BITS(offset))
+		return size;
 
-	start_bit = bitmap_find_next_zero_area(map, size,
-			start + offset_bit, nr, 0);
-	if (start_bit != offset_bit)
-		start_bit = size;
-	return start_bit;
+	return BITS_DIV_ENTRIES(bit_index);
 }
 EXPORT_SYMBOL(gen_pool_fixed_alloc);
 
@@ -593,60 +816,84 @@ EXPORT_SYMBOL(gen_pool_fixed_alloc);
  * of memory matching the size requirement. The region will be aligned
  * to the order of the size specified.
  * @map: The address to base the search on
- * @size: The bitmap size in bits
- * @start: The bitnumber to start searching at
- * @nr: The number of zeroed bits we're looking for
+ * @size: The number of allocation units in the bitmap
+ * @start: The allocation unit to start searching at
+ * @nr: The number of allocation units we're looking for
  * @data: additional data - unused
  * @pool: pool to find the fit region memory from
+ *
+ * Return: index of the first suitable unit, or a value >= @size if none
  */
 unsigned long gen_pool_first_fit_order_align(unsigned long *map,
 		unsigned long size, unsigned long start,
 		unsigned int nr, void *data, struct gen_pool *pool)
 {
-	unsigned long align_mask = roundup_pow_of_two(nr) - 1;
-
-	return bitmap_find_next_zero_area(map, size, start, nr, align_mask);
+	unsigned long align_mask;
+	unsigned long bit_index;
+
+	align_mask = roundup_pow_of_two(ENTRIES_TO_BITS(nr)) - 1;
+	bit_index = bitmap_find_next_zero_area(map, ENTRIES_TO_BITS(size),
+					       ENTRIES_TO_BITS(start),
+					       ENTRIES_TO_BITS(nr),
+					       align_mask);
+	return BITS_DIV_ENTRIES(bit_index);
 }
 EXPORT_SYMBOL(gen_pool_first_fit_order_align);
 
 /**
  * gen_pool_best_fit - find the best fitting region of memory
- * macthing the size requirement (no alignment constraint)
+ * matching the size requirement (no alignment constraint)
  * @map: The address to base the search on
- * @size: The bitmap size in bits
- * @start: The bitnumber to start searching at
- * @nr: The number of zeroed bits we're looking for
+ * @size: The number of allocation units in the bitmap
+ * @start: The allocation unit to start searching at
+ * @nr: The number of allocation units we're looking for
  * @data: additional data - unused
  * @pool: pool to find the fit region memory from
  *
  * Iterate over the bitmap to find the smallest free region
  * which we can allocate the memory.
+ *
+ * Return: index of the best fitting unit, or @size if none is found
  */
 unsigned long gen_pool_best_fit(unsigned long *map, unsigned long size,
 		unsigned long start, unsigned int nr, void *data,
 		struct gen_pool *pool)
 {
-	unsigned long start_bit = size;
+	unsigned long start_bit = ENTRIES_TO_BITS(size);
 	unsigned long len = size + 1;
 	unsigned long index;
+	unsigned long align_mask;
+	unsigned long bit_index;
 
-	index = bitmap_find_next_zero_area(map, size, start, nr, 0);
+	align_mask = roundup_pow_of_two(BITS_PER_ENTRY) - 1;
+	bit_index = bitmap_find_next_zero_area(map, ENTRIES_TO_BITS(size),
+					       ENTRIES_TO_BITS(start),
+					       ENTRIES_TO_BITS(nr),
+					       align_mask);
+	index = BITS_DIV_ENTRIES(bit_index);
 
 	while (index < size) {
-		int next_bit = find_next_bit(map, size, index + nr);
-		if ((next_bit - index) < len) {
-			len = next_bit - index;
-			start_bit = index;
+		int next_bit;
+
+		next_bit = find_next_bit(map, ENTRIES_TO_BITS(size),
+					 ENTRIES_TO_BITS(index + nr));
+		if ((BITS_DIV_ENTRIES(next_bit) - index) < len) {
+			len = BITS_DIV_ENTRIES(next_bit) - index;
+			start_bit = ENTRIES_TO_BITS(index);
 			if (len == nr)
-				return start_bit;
+				return BITS_DIV_ENTRIES(start_bit);
 		}
-		index = bitmap_find_next_zero_area(map, size,
-						   next_bit + 1, nr, 0);
+		bit_index =
+			bitmap_find_next_zero_area(map,
+						   ENTRIES_TO_BITS(size),
+						   next_bit + 1,
+						   ENTRIES_TO_BITS(nr),
+						   align_mask);
+		index = BITS_DIV_ENTRIES(bit_index);
 	}
 
-	return start_bit;
+	return BITS_DIV_ENTRIES(start_bit);
 }
-EXPORT_SYMBOL(gen_pool_best_fit);
 
 static void devm_gen_pool_release(struct device *dev, void *res)
 {
@@ -672,7 +919,7 @@ static int devm_gen_pool_match(struct device *dev, void *res, void *data)
  * @dev: device to retrieve the gen_pool from
  * @name: name of a gen_pool or NULL, identifies a particular gen_pool on device
  *
- * Returns the gen_pool for the device if one is present, or NULL.
+ * Return: the gen_pool for the device if one is present, or NULL.
  */
 struct gen_pool *gen_pool_get(struct device *dev, const char *name)
 {
@@ -696,6 +943,8 @@ EXPORT_SYMBOL_GPL(gen_pool_get);
  * Create a new special memory pool that can be used to manage special purpose
  * memory not managed by the regular kmalloc/kfree interface. The pool will be
  * automatically destroyed by the device management code.
+ *
+ * Return: the address of the pool, if successful, otherwise NULL
  */
 struct gen_pool *devm_gen_pool_create(struct device *dev, int min_alloc_order,
 				      int nid, const char *name)
@@ -743,7 +992,7 @@ EXPORT_SYMBOL(devm_gen_pool_create);
  * @propname: property name containing phandle(s)
  * @index: index into the phandle array
  *
- * Returns the pool that contains the chunk starting at the physical
+ * Return: the pool that contains the chunk starting at the physical
  * address of the device tree node pointed at by the phandle property,
  * or NULL if not found.
  */
-- 
2.14.1

* [PATCH 1/6] genalloc: track beginning of allocations
From: Igor Stoppa @ 2018-02-11  3:19 UTC
  To: willy, rdunlap, corbet, keescook, mhocko, labbott, jglisse, hch
  Cc: cl, linux-security-module, linux-mm, linux-kernel,
	kernel-hardening, Igor Stoppa

The genalloc library can only track whether a given unit of allocation
is in use or not.

It cannot tell where the memory associated with an allocation request
begins and where it ends.

The reason is that units of allocation are tracked using a bitmap, where
each bit records whether the corresponding unit is allocated (1) or
available (0).

The user of the API must therefore keep track of how much space was
requested, in case it ever needs to be freed.

This can let errors go undetected.
Examples:
* Only a subset of the memory provided to an allocation request is freed
* The memory from a subsequent allocation is freed
* The memory being freed doesn't start at the beginning of an
  allocation.
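To make the limitation concrete, here is a minimal userspace model of the one-bit-per-unit scheme. `old_mark()` is a hypothetical helper, not part of genalloc, and a single 64-bit word stands in for the chunk bitmap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the one-bit-per-unit scheme: bit i of *map is 1
 * when allocation unit i is busy and 0 when it is free. Nothing in the
 * map records where an allocation begins or ends.
 */
static void old_mark(uint64_t *map, int start, int nr, bool busy)
{
	assert(start >= 0 && nr >= 0 && start + nr <= 64);
	for (int i = start; i < start + nr; i++) {
		if (busy)
			*map |= UINT64_C(1) << i;
		else
			*map &= ~(UINT64_C(1) << i);
	}
}
```

In this model, allocations of 3 and 2 units at units 0 and 3 leave the map at 0x1f, exactly as a single 5-unit allocation would, and freeing the first one with a size of 4 units silently clears the first unit of the second.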

The bitmap is used because it allows lockless read/write access, where
the hardware supports it through cmpxchg.
Similarly, it is possible to scan the bitmap for a sufficiently long
sequence of zeros, to identify zones available for allocation.
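The lockless update can be sketched in userspace with C11 atomics, where atomic_compare_exchange_weak() plays the role of the kernel's cmpxchg(). This is an illustrative model of a set_bits_ll()-style loop, not the kernel code, and `claim_bits()` is a hypothetical name.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical userspace analogue of a set_bits_ll()-style update:
 * atomically set every bit in @mask, or fail without touching the word
 * if any of them is already set. Returns 0 on success, -1 if busy.
 */
static int claim_bits(_Atomic uint64_t *word, uint64_t mask)
{
	uint64_t present = atomic_load(word);

	assert(mask != 0);
	do {
		if (present & mask)
			return -1;	/* part of the range is already taken */
		/*
		 * The exchange succeeds only if *word still equals @present;
		 * otherwise @present is refreshed and the check is repeated.
		 */
	} while (!atomic_compare_exchange_weak(word, &present,
					       present | mask));
	return 0;
}
```

Because the exchange only succeeds against the snapshot that was checked, a concurrent claim on overlapping bits is detected on the retry instead of being overwritten.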

This patch doubles the space reserved in the bitmap for each allocation
unit (2 bits instead of 1), to track where each allocation begins.

For details, see the documentation inside lib/genalloc.c
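A minimal, single-threaded sketch of the 2-bit encoding follows. The helper names and the 64-bit word size are assumptions made for illustration; the kernel code additionally performs its updates atomically with cmpxchg. `alloc_length()` models the idea of scanning forward from a head entry until a free (00) or head (11) entry is found.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of the 2-bit-per-unit encoding:
 *   11 = first unit of an allocation, 10 = subsequent unit,
 *   00 = free unit, 01 = invalid.
 * Not atomic; for illustration only.
 */
#define ENT_HEAD 0x3u
#define ENT_BODY 0x2u
#define ENT_FREE 0x0u
#define ENTRIES_PER_WORD 32	/* 64 bits / 2 bits per entry */

static unsigned int get_entry(const uint64_t *map, int i)
{
	return (unsigned int)((map[i / ENTRIES_PER_WORD] >>
			       (2 * (i % ENTRIES_PER_WORD))) & 0x3u);
}

static void set_entry(uint64_t *map, int i, unsigned int v)
{
	int shift = 2 * (i % ENTRIES_PER_WORD);

	map[i / ENTRIES_PER_WORD] &= ~(UINT64_C(0x3) << shift);
	map[i / ENTRIES_PER_WORD] |= (uint64_t)v << shift;
}

/* Record an allocation of @n units starting at unit @start. */
static void mark_alloc(uint64_t *map, int start, int n)
{
	assert(n > 0);
	set_entry(map, start, ENT_HEAD);
	for (int i = start + 1; i < start + n; i++)
		set_entry(map, i, ENT_BODY);
}

/*
 * Return the number of units in the allocation that begins at @start,
 * scanning no further than unit @limit (exclusive), or -1 if @start is
 * not the head of an allocation.
 */
static int alloc_length(const uint64_t *map, int start, int limit)
{
	int i;

	if (get_entry(map, start) != ENT_HEAD)
		return -1;
	for (i = start + 1; i < limit; i++) {
		unsigned int e = get_entry(map, i);

		if (e == ENT_HEAD || e == ENT_FREE)
			break;
	}
	return i - start;
}
```

Marking allocations of 4, 1 and 2 units at units 0, 7 and 8 yields 0xBC0AB in the first word, which is the 000010111100000010101011 pattern used as an example in the lib/genalloc.c documentation.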

Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com>
---
 include/linux/genalloc.h |   4 +-
 lib/genalloc.c           | 527 ++++++++++++++++++++++++++++++++++-------------
 2 files changed, 390 insertions(+), 141 deletions(-)

diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
index 872f930f1b06..dcaa33e74b1c 100644
--- a/include/linux/genalloc.h
+++ b/include/linux/genalloc.h
@@ -32,7 +32,7 @@
 
 #include <linux/types.h>
 #include <linux/spinlock_types.h>
-#include <linux/atomic.h>
+#include <linux/slab.h>
 
 struct device;
 struct device_node;
@@ -76,7 +76,7 @@ struct gen_pool_chunk {
 	phys_addr_t phys_addr;		/* physical starting address of memory chunk */
 	unsigned long start_addr;	/* start address of memory chunk */
 	unsigned long end_addr;		/* end address of memory chunk (inclusive) */
-	unsigned long bits[0];		/* bitmap for allocating memory chunk */
+	unsigned long entries[0];	/* bitmap for allocating memory chunk */
 };
 
 /*
diff --git a/lib/genalloc.c b/lib/genalloc.c
index ca06adc4f445..044347163acb 100644
--- a/lib/genalloc.c
+++ b/lib/genalloc.c
@@ -26,6 +26,74 @@
  *
  * This source code is licensed under the GNU General Public License,
  * Version 2.  See the file COPYING for more details.
+ *
+ *
+ *
+ * Encoding of the bitmap tracking the allocations
+ * -----------------------------------------------
+ *
+ * The bitmap is divided into allocation units.
+ *
+ * Each unit of allocation is represented using 2 consecutive bits.
+ *
+ * This makes it possible to encode, for each unit of allocation,
+ * information about:
+ *  - allocation status (busy/free)
+ *  - beginning of a sequence of allocation units (first / successive)
+ *
+ *
+ * Dictionary of allocation units (msb to the left, lsb to the right):
+ *
+ * 11: first allocation unit in the allocation
+ * 10: any subsequent allocation unit (if any) in the allocation
+ * 00: available allocation unit
+ * 01: invalid
+ *
+ * Example, using the same notation as above - MSb.......LSb:
+ *
+ *  ...000010111100000010101011   <-- Read in this direction.
+ *     \__|\__|\|\____|\______|
+ *        |   | |     |       \___ 4 used allocation units
+ *        |   | |     \___________ 3 empty allocation units
+ *        |   | \_________________ 1 used allocation unit
+ *        |   \___________________ 2 used allocation units
+ *        \_______________________ 2 empty allocation units
+ *
+ * The encoding allows for lockless operations, such as:
+ * - search for a sufficiently large range of allocation units
+ * - reservation of a selected range of allocation units
+ * - release of a specific allocation
+ *
+ * The search for a sequence of empty allocation units (marked as zeros
+ * in the bitmap) is performed with an alignment of 2^1 bits.
+ *
+ * This means that an allocation can start only at even places
+ * (bit 0, bit 2, etc.) in the bitmap.
+ *
+ * Therefore, the number of zeroes to look for must be twice the number
+ * of desired allocation units.
+ *
+ * When it's time to free the memory associated with an allocation
+ * request, it is enough to check that the corresponding allocation unit
+ * really is the beginning of an allocation (both bits are set to 1).
+ *
+ * Looking for the end of an allocation can also be performed locklessly.
+ * It is sufficient to find the first allocation unit after the head
+ * that is either free (00) or the head of another allocation (11).
+ * Even if the allocation status should change in the meantime, it
+ * doesn't matter, since it can only transition between free (00) and
+ * first-allocated (11).
+ *
+ * The parameter indicating to the *_free() function the size of the
+ * space that should be freed can be either set to 0, for automated
+ * assessment, or it can be specified explicitly.
+ *
+ * In case it is specified explicitly, the value is verified against what
+ * the library is tracking internally.
+ *
+ * If ever needed, the bitmap could be extended, assigning larger amounts
+ * of bits to each allocation unit (the increase must follow powers of 2),
+ * to track other properties of the allocations.
  */
 
 #include <linux/slab.h>
@@ -36,118 +104,230 @@
 #include <linux/genalloc.h>
 #include <linux/of_device.h>
 
+#define ENTRY_ORDER 1UL
+#define ENTRY_MASK ((1UL << ((ENTRY_ORDER) + 1UL)) - 1UL)
+#define ENTRY_HEAD ENTRY_MASK
+#define ENTRY_UNUSED 0UL
+#define BITS_PER_ENTRY (1U << ENTRY_ORDER)
+#define BITS_DIV_ENTRIES(x) ((x) >> ENTRY_ORDER)
+#define ENTRIES_TO_BITS(x) ((x) << ENTRY_ORDER)
+#define BITS_DIV_LONGS(x) ((x) / BITS_PER_LONG)
+#define ENTRIES_DIV_LONGS(x) (BITS_DIV_LONGS(ENTRIES_TO_BITS(x)))
+
+#define ENTRIES_PER_LONG BITS_DIV_ENTRIES(BITS_PER_LONG)
+
+/* Binary pattern of 1010...1010 that spans one unsigned long. */
+#define MASK (~0UL / 3 * 2)
+
+/**
+ * get_bitmap_entry - extracts the specified entry from the bitmap
+ * @map: pointer to a bitmap
+ * @entry_index: the index of the desired entry in the bitmap
+ *
+ * Return: the value of the requested entry.
+ */
+static inline unsigned long get_bitmap_entry(unsigned long *map,
+					    int entry_index)
+{
+	return (map[ENTRIES_DIV_LONGS(entry_index)] >>
+		ENTRIES_TO_BITS(entry_index % ENTRIES_PER_LONG)) &
+		ENTRY_MASK;
+}
+
+
+/**
+ * mem_to_units - convert a size in bytes into a number of allocation units
+ * @size: amount in bytes
+ * @order: log base 2 of the number of bytes in each allocation unit
+ *
+ * Return: the number of allocation units needed for @size, rounded up.
+ */
+static inline unsigned long mem_to_units(unsigned long size,
+					 unsigned long order)
+{
+	return (size + (1UL << order) - 1) >> order;
+}
+
+/**
+ * chunk_size - dimension of a chunk of memory, in bytes
+ * @chunk: pointer to the struct describing the chunk
+ *
+ * Return: The size of the chunk, in bytes.
+ */
 static inline size_t chunk_size(const struct gen_pool_chunk *chunk)
 {
 	return chunk->end_addr - chunk->start_addr + 1;
 }
 
-static int set_bits_ll(unsigned long *addr, unsigned long mask_to_set)
+
+/**
+ * set_bits_ll - according to the mask, sets the bits specified by
+ * value, at the address specified.
+ * @addr: where to write
+ * @mask: filter to apply for the bits to alter
+ * @value: actual configuration of bits to store
+ *
+ * Return: 0 upon success, -EBUSY otherwise
+ */
+static int set_bits_ll(unsigned long *addr,
+		       unsigned long mask, unsigned long value)
 {
-	unsigned long val, nval;
+	unsigned long nval;
+	unsigned long present;
+	unsigned long target;
 
 	nval = *addr;
 	do {
-		val = nval;
-		if (val & mask_to_set)
+		present = nval;
+		if (present & mask)
 			return -EBUSY;
+		target = present | value;
 		cpu_relax();
-	} while ((nval = cmpxchg(addr, val, val | mask_to_set)) != val);
-
+	} while ((nval = cmpxchg(addr, present, target)) != present);
 	return 0;
 }
 
-static int clear_bits_ll(unsigned long *addr, unsigned long mask_to_clear)
+
+/**
+ * clear_bits_ll - if the bits selected by mask match value, clears them
+ * at the address specified.
+ * @addr: where to write
+ * @mask: filter to apply for the bits to alter
+ * @value: expected current configuration of the bits to clear
+ *
+ * Return: 0 upon success, -EBUSY otherwise
+ */
+static int clear_bits_ll(unsigned long *addr,
+			 unsigned long mask, unsigned long value)
 {
-	unsigned long val, nval;
+	unsigned long nval;
+	unsigned long present;
+	unsigned long target;
 
 	nval = *addr;
+	present = nval;
+	if (unlikely((present & mask) ^ value))
+		return -EBUSY;
 	do {
-		val = nval;
-		if ((val & mask_to_clear) != mask_to_clear)
+		present = nval;
+		if (unlikely((present & mask) ^ value))
 			return -EBUSY;
+		target = present & ~mask;
 		cpu_relax();
-	} while ((nval = cmpxchg(addr, val, val & ~mask_to_clear)) != val);
-
+	} while ((nval = cmpxchg(addr, present, target)) != present);
 	return 0;
 }
 
-/*
- * bitmap_set_ll - set the specified number of bits at the specified position
+
+/**
+ * get_boundary - verify that an allocation actually starts at the
+ * given entry, then find the entry where it ends.
  * @map: pointer to a bitmap
- * @start: a bit position in @map
- * @nr: number of bits to set
+ * @start_entry: the index of the first entry in the bitmap
+ * @nentries: maximum number of entries to scan
  *
- * Set @nr bits start from @start in @map lock-lessly. Several users
- * can set/clear the same bitmap simultaneously without lock. If two
- * users set the same bit, one user will return remain bits, otherwise
- * return 0.
+ * Return: the index of the first entry past the end of the allocation,
+ * otherwise -EINVAL if the parameters do not refer to an allocation.
  */
-static int bitmap_set_ll(unsigned long *map, int start, int nr)
+static int get_boundary(unsigned long *map, int start_entry, int nentries)
 {
-	unsigned long *p = map + BIT_WORD(start);
-	const int size = start + nr;
-	int bits_to_set = BITS_PER_LONG - (start % BITS_PER_LONG);
-	unsigned long mask_to_set = BITMAP_FIRST_WORD_MASK(start);
-
-	while (nr - bits_to_set >= 0) {
-		if (set_bits_ll(p, mask_to_set))
-			return nr;
-		nr -= bits_to_set;
-		bits_to_set = BITS_PER_LONG;
-		mask_to_set = ~0UL;
-		p++;
-	}
-	if (nr) {
-		mask_to_set &= BITMAP_LAST_WORD_MASK(size);
-		if (set_bits_ll(p, mask_to_set))
-			return nr;
-	}
+	int i;
+	unsigned long bitmap_entry;
 
-	return 0;
+
+	if (unlikely(get_bitmap_entry(map, start_entry) != ENTRY_HEAD))
+		return -EINVAL;
+	for (i = start_entry + 1; i < start_entry + nentries; i++) {
+		bitmap_entry = get_bitmap_entry(map, i);
+		if (bitmap_entry == ENTRY_HEAD ||
+		    bitmap_entry == ENTRY_UNUSED)
+			return i;
+	}
+	return start_entry + nentries;
 }
 
+
+#define SET_BITS 1
+#define CLEAR_BITS 0
+
 /*
- * bitmap_clear_ll - clear the specified number of bits at the specified position
+ * alter_bitmap_ll - set or clear the entries associated with an allocation
+ * @alteration: indicates if the bits selected should be set or cleared
  * @map: pointer to a bitmap
- * @start: a bit position in @map
- * @nr: number of bits to set
+ * @start_entry: the index of the first entry in the bitmap
+ * @nentries: number of entries to alter
+ *
+ * The modification happens lock-lessly.
+ * Several users can write to the same map simultaneously, without lock.
  *
- * Clear @nr bits start from @start in @map lock-lessly. Several users
- * can set/clear the same bitmap simultaneously without lock. If two
- * users clear the same bit, one user will return remain bits,
- * otherwise return 0.
+ * Return: 0 on success; if a concurrent user has altered one of the
+ * same bits, the number of entries that were not altered.
  */
-static int bitmap_clear_ll(unsigned long *map, int start, int nr)
+static int alter_bitmap_ll(bool alteration, unsigned long *map,
+			   int start_entry, int nentries)
 {
-	unsigned long *p = map + BIT_WORD(start);
-	const int size = start + nr;
-	int bits_to_clear = BITS_PER_LONG - (start % BITS_PER_LONG);
-	unsigned long mask_to_clear = BITMAP_FIRST_WORD_MASK(start);
-
-	while (nr - bits_to_clear >= 0) {
-		if (clear_bits_ll(p, mask_to_clear))
-			return nr;
-		nr -= bits_to_clear;
-		bits_to_clear = BITS_PER_LONG;
-		mask_to_clear = ~0UL;
-		p++;
-	}
-	if (nr) {
-		mask_to_clear &= BITMAP_LAST_WORD_MASK(size);
-		if (clear_bits_ll(p, mask_to_clear))
-			return nr;
+	unsigned long start_bit;
+	unsigned long end_bit;
+	unsigned long mask;
+	unsigned long value;
+	int nbits;
+	int bits_to_write;
+	int index;
+	int (*action)(unsigned long *addr,
+		      unsigned long mask, unsigned long value);
+
+	action = (alteration == SET_BITS) ? set_bits_ll : clear_bits_ll;
+
+	/*
+	 * Prepare for writing the initial part of the allocation, from
+	 * starting entry, to the end of the UL bitmap element which
+	 * contains it. It might be larger than the actual allocation.
+	 */
+	start_bit = ENTRIES_TO_BITS(start_entry);
+	end_bit = ENTRIES_TO_BITS(start_entry + nentries);
+	nbits = ENTRIES_TO_BITS(nentries);
+	bits_to_write = BITS_PER_LONG - start_bit % BITS_PER_LONG;
+	mask = BITMAP_FIRST_WORD_MASK(start_bit);
+	/* Mark the beginning of the allocation. */
+	value = MASK | (1UL << (start_bit % BITS_PER_LONG));
+	index = BITS_DIV_LONGS(start_bit);
+
+	/*
+	 * Write entries to the bitmap, one unsigned long at a time, as
+	 * long as the remaining bits cover at least the rest of the word.
+	 * Might be skipped if the entries to write do not reach the end
+	 * of a bitmap UL unit.
+	 */
+	while (nbits >= bits_to_write) {
+		if (action(map + index, mask, value & mask))
+			return BITS_DIV_ENTRIES(nbits);
+		nbits -= bits_to_write;
+		bits_to_write = BITS_PER_LONG;
+		mask = ~0UL;
+		value = MASK;
+		index++;
 	}
 
+	/* Takes care of the ending part of the entries to mark. */
+	if (nbits > 0) {
+		mask ^= BITMAP_FIRST_WORD_MASK((end_bit) % BITS_PER_LONG);
+		bits_to_write = nbits;
+		if (action(map + index, mask, value & mask))
+			return BITS_DIV_ENTRIES(nbits);
+	}
 	return 0;
 }
 
+
 /**
  * gen_pool_create - create a new special memory pool
- * @min_alloc_order: log base 2 of number of bytes each bitmap bit represents
+ * @min_alloc_order: log base 2 of number of bytes each bitmap entry represents
  * @nid: node id of the node the pool structure should be allocated on, or -1
  *
  * Create a new special memory pool that can be used to manage special purpose
  * memory not managed by the regular kmalloc/kfree interface.
+ *
+ * Return: pointer to the pool, if successful, NULL otherwise
  */
 struct gen_pool *gen_pool_create(int min_alloc_order, int nid)
 {
@@ -177,16 +357,18 @@ EXPORT_SYMBOL(gen_pool_create);
  *
  * Add a new chunk of special memory to the specified pool.
  *
- * Returns 0 on success or a -ve errno on failure.
+ * Return: 0 on success or a -ve errno on failure.
  */
 int gen_pool_add_virt(struct gen_pool *pool, unsigned long virt, phys_addr_t phys,
 		 size_t size, int nid)
 {
 	struct gen_pool_chunk *chunk;
-	int nbits = size >> pool->min_alloc_order;
-	int nbytes = sizeof(struct gen_pool_chunk) +
-				BITS_TO_LONGS(nbits) * sizeof(long);
+	int nentries;
+	int nbytes;
 
+	nentries = size >> pool->min_alloc_order;
+	nbytes = sizeof(struct gen_pool_chunk) +
+		 BITS_TO_LONGS(ENTRIES_TO_BITS(nentries)) * sizeof(long);
 	chunk = kzalloc_node(nbytes, GFP_KERNEL, nid);
 	if (unlikely(chunk == NULL))
 		return -ENOMEM;
@@ -209,7 +391,7 @@ EXPORT_SYMBOL(gen_pool_add_virt);
  * @pool: pool to allocate from
  * @addr: starting address of memory
  *
- * Returns the physical address on success, or -1 on error.
+ * Return: the physical address on success, or -1 on error.
  */
 phys_addr_t gen_pool_virt_to_phys(struct gen_pool *pool, unsigned long addr)
 {
@@ -248,7 +430,7 @@ void gen_pool_destroy(struct gen_pool *pool)
 		list_del(&chunk->next_chunk);
 
 		end_bit = chunk_size(chunk) >> order;
-		bit = find_next_bit(chunk->bits, end_bit, 0);
+		bit = find_next_bit(chunk->entries, end_bit, 0);
 		BUG_ON(bit < end_bit);
 
 		kfree(chunk);
@@ -267,6 +449,8 @@ EXPORT_SYMBOL(gen_pool_destroy);
  * Uses the pool allocation function (with first-fit algorithm by default).
  * Can not be used in NMI handler on architectures without
  * NMI-safe cmpxchg implementation.
+ *
+ * Return: address of the memory allocated, otherwise NULL
  */
 unsigned long gen_pool_alloc(struct gen_pool *pool, size_t size)
 {
@@ -285,6 +469,8 @@ EXPORT_SYMBOL(gen_pool_alloc);
  * Uses the pool allocation function (with first-fit algorithm by default).
  * Can not be used in NMI handler on architectures without
  * NMI-safe cmpxchg implementation.
+ *
+ * Return: address of the memory allocated, otherwise NULL
  */
 unsigned long gen_pool_alloc_algo(struct gen_pool *pool, size_t size,
 		genpool_algo_t algo, void *data)
@@ -292,7 +478,7 @@ unsigned long gen_pool_alloc_algo(struct gen_pool *pool, size_t size,
 	struct gen_pool_chunk *chunk;
 	unsigned long addr = 0;
 	int order = pool->min_alloc_order;
-	int nbits, start_bit, end_bit, remain;
+	int nentries, start_entry, end_entry, remain;
 
 #ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
 	BUG_ON(in_nmi());
@@ -301,29 +487,32 @@ unsigned long gen_pool_alloc_algo(struct gen_pool *pool, size_t size,
 	if (size == 0)
 		return 0;
 
-	nbits = (size + (1UL << order) - 1) >> order;
+	nentries = mem_to_units(size, order);
 	rcu_read_lock();
 	list_for_each_entry_rcu(chunk, &pool->chunks, next_chunk) {
 		if (size > atomic_long_read(&chunk->avail))
 			continue;
 
-		start_bit = 0;
-		end_bit = chunk_size(chunk) >> order;
+		start_entry = 0;
+		end_entry = chunk_size(chunk) >> order;
 retry:
-		start_bit = algo(chunk->bits, end_bit, start_bit,
-				 nbits, data, pool);
-		if (start_bit >= end_bit)
+		start_entry = algo(chunk->entries, end_entry, start_entry,
+				  nentries, data, pool);
+		if (start_entry >= end_entry)
 			continue;
-		remain = bitmap_set_ll(chunk->bits, start_bit, nbits);
+		remain = alter_bitmap_ll(SET_BITS, chunk->entries,
+					 start_entry, nentries);
 		if (remain) {
-			remain = bitmap_clear_ll(chunk->bits, start_bit,
-						 nbits - remain);
-			BUG_ON(remain);
+			remain = alter_bitmap_ll(CLEAR_BITS,
+						 chunk->entries,
+						 start_entry,
+						 nentries - remain);
 			goto retry;
 		}
 
-		addr = chunk->start_addr + ((unsigned long)start_bit << order);
-		size = nbits << order;
+		addr = chunk->start_addr +
+			((unsigned long)start_entry << order);
+		size = nentries << order;
 		atomic_long_sub(size, &chunk->avail);
 		break;
 	}
@@ -342,6 +531,8 @@ EXPORT_SYMBOL(gen_pool_alloc_algo);
  * Uses the pool allocation function (with first-fit algorithm by default).
  * Can not be used in NMI handler on architectures without
  * NMI-safe cmpxchg implementation.
+ *
+ * Return: address of the memory allocated, otherwise NULL
  */
 void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size, dma_addr_t *dma)
 {
@@ -365,7 +556,7 @@ EXPORT_SYMBOL(gen_pool_dma_alloc);
  * gen_pool_free - free allocated special memory back to the pool
  * @pool: pool to free to
  * @addr: starting address of memory to free back to pool
- * @size: size in bytes of memory to free
+ * @size: size in bytes of memory to free, or 0 to auto-detect it
  *
  * Free previously allocated special memory back to the specified
  * pool.  Can not be used in NMI handler on architectures without
@@ -375,22 +566,29 @@ void gen_pool_free(struct gen_pool *pool, unsigned long addr, size_t size)
 {
 	struct gen_pool_chunk *chunk;
 	int order = pool->min_alloc_order;
-	int start_bit, nbits, remain;
+	int start_entry, remaining_entries, nentries, remain;
+	int boundary;
 
 #ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
 	BUG_ON(in_nmi());
 #endif
 
-	nbits = (size + (1UL << order) - 1) >> order;
 	rcu_read_lock();
 	list_for_each_entry_rcu(chunk, &pool->chunks, next_chunk) {
 		if (addr >= chunk->start_addr && addr <= chunk->end_addr) {
 			BUG_ON(addr + size - 1 > chunk->end_addr);
-			start_bit = (addr - chunk->start_addr) >> order;
-			remain = bitmap_clear_ll(chunk->bits, start_bit, nbits);
+			start_entry = (addr - chunk->start_addr) >> order;
+			remaining_entries = (chunk->end_addr - addr) >> order;
+			boundary = get_boundary(chunk->entries, start_entry,
+						remaining_entries);
+			BUG_ON(boundary < 0);
+			nentries = boundary - start_entry;
+			BUG_ON(size &&
+			       (nentries != mem_to_units(size, order)));
+			remain = alter_bitmap_ll(CLEAR_BITS, chunk->entries,
+						 start_entry, nentries);
 			BUG_ON(remain);
-			size = nbits << order;
-			atomic_long_add(size, &chunk->avail);
+			atomic_long_add(nentries << order, &chunk->avail);
 			rcu_read_unlock();
 			return;
 		}
@@ -428,8 +626,9 @@ EXPORT_SYMBOL(gen_pool_for_each_chunk);
  * @start:	start address
  * @size:	size of the region
  *
- * Check if the range of addresses falls within the specified pool. Returns
- * true if the entire range is contained in the pool and false otherwise.
+ * Check if the range of addresses falls within the specified pool.
+ *
+ * Return: true if the entire range is contained in the pool, false otherwise.
  */
 bool addr_in_gen_pool(struct gen_pool *pool, unsigned long start,
 			size_t size)
@@ -455,7 +654,7 @@ bool addr_in_gen_pool(struct gen_pool *pool, unsigned long start,
  * gen_pool_avail - get available free space of the pool
  * @pool: pool to get available free space
  *
- * Return available free space of the specified pool.
+ * Return: available free space of the specified pool.
  */
 size_t gen_pool_avail(struct gen_pool *pool)
 {
@@ -474,7 +673,7 @@ EXPORT_SYMBOL_GPL(gen_pool_avail);
  * gen_pool_size - get size in bytes of memory managed by the pool
  * @pool: pool to get size
  *
- * Return size in bytes of memory managed by the pool.
+ * Return: size in bytes of memory managed by the pool.
  */
 size_t gen_pool_size(struct gen_pool *pool)
 {
@@ -517,17 +716,27 @@ EXPORT_SYMBOL(gen_pool_set_algo);
  * gen_pool_first_fit - find the first available region
  * of memory matching the size requirement (no alignment constraint)
  * @map: The address to base the search on
- * @size: The bitmap size in bits
- * @start: The bitnumber to start searching at
- * @nr: The number of zeroed bits we're looking for
+ * @size: The number of allocation units in the bitmap
+ * @start: The allocation unit to start searching at
+ * @nr: The number of allocation units we're looking for
  * @data: additional data - unused
  * @pool: pool to find the fit region memory from
+ *
+ * Return: index of the memory allocated, otherwise the end of the range
  */
 unsigned long gen_pool_first_fit(unsigned long *map, unsigned long size,
 		unsigned long start, unsigned int nr, void *data,
 		struct gen_pool *pool)
 {
-	return bitmap_find_next_zero_area(map, size, start, nr, 0);
+	unsigned long align_mask;
+	unsigned long bit_index;
+
+	align_mask = roundup_pow_of_two(BITS_PER_ENTRY) - 1;
+	bit_index = bitmap_find_next_zero_area(map, ENTRIES_TO_BITS(size),
+					       ENTRIES_TO_BITS(start),
+					       ENTRIES_TO_BITS(nr),
+					       align_mask);
+	return BITS_DIV_ENTRIES(bit_index);
 }
 EXPORT_SYMBOL(gen_pool_first_fit);
 
@@ -535,11 +744,13 @@ EXPORT_SYMBOL(gen_pool_first_fit);
  * gen_pool_first_fit_align - find the first available region
  * of memory matching the size requirement (alignment constraint)
  * @map: The address to base the search on
- * @size: The bitmap size in bits
- * @start: The bitnumber to start searching at
- * @nr: The number of zeroed bits we're looking for
+ * @size: The number of allocation units in the bitmap
+ * @start: The allocation unit to start searching at
+ * @nr: The number of allocation units we're looking for
  * @data: data for alignment
  * @pool: pool to get order from
+ *
+ * Return: index of the memory allocated, otherwise the end of the range
  */
 unsigned long gen_pool_first_fit_align(unsigned long *map, unsigned long size,
 		unsigned long start, unsigned int nr, void *data,
@@ -547,23 +758,32 @@ unsigned long gen_pool_first_fit_align(unsigned long *map, unsigned long size,
 {
 	struct genpool_data_align *alignment;
 	unsigned long align_mask;
+	unsigned long bit_index;
 	int order;
 
 	alignment = data;
 	order = pool->min_alloc_order;
-	align_mask = ((alignment->align + (1UL << order) - 1) >> order) - 1;
-	return bitmap_find_next_zero_area(map, size, start, nr, align_mask);
+	align_mask = roundup_pow_of_two(
+			ENTRIES_TO_BITS(mem_to_units(alignment->align,
+						     order))) - 1;
+	bit_index = bitmap_find_next_zero_area(map, ENTRIES_TO_BITS(size),
+					       ENTRIES_TO_BITS(start),
+					       ENTRIES_TO_BITS(nr),
+					       align_mask);
+	return BITS_DIV_ENTRIES(bit_index);
 }
 EXPORT_SYMBOL(gen_pool_first_fit_align);
 
 /**
  * gen_pool_fixed_alloc - reserve a specific region
  * @map: The address to base the search on
- * @size: The bitmap size in bits
- * @start: The bitnumber to start searching at
- * @nr: The number of zeroed bits we're looking for
+ * @size: The number of allocation units in the bitmap
+ * @start: The allocation unit to start searching at
+ * @nr: The number of allocation units we're looking for
  * @data: data for alignment
  * @pool: pool to get order from
+ *
+ * Return: index of the memory allocated, otherwise the end of the range
  */
 unsigned long gen_pool_fixed_alloc(unsigned long *map, unsigned long size,
 		unsigned long start, unsigned int nr, void *data,
@@ -571,20 +791,23 @@ unsigned long gen_pool_fixed_alloc(unsigned long *map, unsigned long size,
 {
 	struct genpool_data_fixed *fixed_data;
 	int order;
-	unsigned long offset_bit;
-	unsigned long start_bit;
+	unsigned long offset;
+	unsigned long align_mask;
+	unsigned long bit_index;
 
 	fixed_data = data;
 	order = pool->min_alloc_order;
-	offset_bit = fixed_data->offset >> order;
 	if (WARN_ON(fixed_data->offset & ((1UL << order) - 1)))
 		return size;
+	offset = fixed_data->offset >> order;
+	align_mask = roundup_pow_of_two(BITS_PER_ENTRY) - 1;
+	bit_index = bitmap_find_next_zero_area(map, ENTRIES_TO_BITS(size),
+					       ENTRIES_TO_BITS(start + offset),
+					       ENTRIES_TO_BITS(nr), align_mask);
+	if (bit_index != ENTRIES_TO_BITS(offset))
+		return size;
 
-	start_bit = bitmap_find_next_zero_area(map, size,
-			start + offset_bit, nr, 0);
-	if (start_bit != offset_bit)
-		start_bit = size;
-	return start_bit;
+	return BITS_DIV_ENTRIES(bit_index);
 }
 EXPORT_SYMBOL(gen_pool_fixed_alloc);
 
@@ -593,60 +816,84 @@ EXPORT_SYMBOL(gen_pool_fixed_alloc);
  * of memory matching the size requirement. The region will be aligned
  * to the order of the size specified.
  * @map: The address to base the search on
- * @size: The bitmap size in bits
- * @start: The bitnumber to start searching at
- * @nr: The number of zeroed bits we're looking for
+ * @size: The number of allocation units in the bitmap
+ * @start: The allocation unit to start searching at
+ * @nr: The number of allocation units we're looking for
  * @data: additional data - unused
  * @pool: pool to find the fit region memory from
+ *
+ * Return: index of the memory allocated, otherwise the end of the range
  */
 unsigned long gen_pool_first_fit_order_align(unsigned long *map,
 		unsigned long size, unsigned long start,
 		unsigned int nr, void *data, struct gen_pool *pool)
 {
-	unsigned long align_mask = roundup_pow_of_two(nr) - 1;
-
-	return bitmap_find_next_zero_area(map, size, start, nr, align_mask);
+	unsigned long align_mask;
+	unsigned long bit_index;
+
+	align_mask = roundup_pow_of_two(ENTRIES_TO_BITS(nr)) - 1;
+	bit_index = bitmap_find_next_zero_area(map, ENTRIES_TO_BITS(size),
+					       ENTRIES_TO_BITS(start),
+					       ENTRIES_TO_BITS(nr),
+					       align_mask);
+	return BITS_DIV_ENTRIES(bit_index);
 }
 EXPORT_SYMBOL(gen_pool_first_fit_order_align);
 
 /**
  * gen_pool_best_fit - find the best fitting region of memory
- * macthing the size requirement (no alignment constraint)
+ * matching the size requirement (no alignment constraint)
  * @map: The address to base the search on
- * @size: The bitmap size in bits
- * @start: The bitnumber to start searching at
- * @nr: The number of zeroed bits we're looking for
+ * @size: The number of allocation units in the bitmap
+ * @start: The allocation unit to start searching at
+ * @nr: The number of allocation units we're looking for
  * @data: additional data - unused
  * @pool: pool to find the fit region memory from
  *
  * Iterate over the bitmap to find the smallest free region
  * which we can allocate the memory.
+ *
+ * Return: index of the memory allocated, otherwise the end of the range
  */
 unsigned long gen_pool_best_fit(unsigned long *map, unsigned long size,
 		unsigned long start, unsigned int nr, void *data,
 		struct gen_pool *pool)
 {
-	unsigned long start_bit = size;
+	unsigned long start_bit = ENTRIES_TO_BITS(size);
 	unsigned long len = size + 1;
 	unsigned long index;
+	unsigned long align_mask;
+	unsigned long bit_index;
 
-	index = bitmap_find_next_zero_area(map, size, start, nr, 0);
+	align_mask = roundup_pow_of_two(BITS_PER_ENTRY) - 1;
+	bit_index = bitmap_find_next_zero_area(map, ENTRIES_TO_BITS(size),
+					       ENTRIES_TO_BITS(start),
+					       ENTRIES_TO_BITS(nr),
+					       align_mask);
+	index = BITS_DIV_ENTRIES(bit_index);
 
 	while (index < size) {
-		int next_bit = find_next_bit(map, size, index + nr);
-		if ((next_bit - index) < len) {
-			len = next_bit - index;
-			start_bit = index;
+		int next_bit;
+
+		next_bit = find_next_bit(map, ENTRIES_TO_BITS(size),
+					 ENTRIES_TO_BITS(index + nr));
+		if ((BITS_DIV_ENTRIES(next_bit) - index) < len) {
+			len = BITS_DIV_ENTRIES(next_bit) - index;
+			start_bit = ENTRIES_TO_BITS(index);
 			if (len == nr)
-				return start_bit;
+				return BITS_DIV_ENTRIES(start_bit);
 		}
-		index = bitmap_find_next_zero_area(map, size,
-						   next_bit + 1, nr, 0);
+		bit_index =
+			bitmap_find_next_zero_area(map,
+						   ENTRIES_TO_BITS(size),
+						   next_bit + 1,
+						   ENTRIES_TO_BITS(nr),
+						   align_mask);
+		index = BITS_DIV_ENTRIES(bit_index);
 	}
 
-	return start_bit;
+	return BITS_DIV_ENTRIES(start_bit);
 }
-EXPORT_SYMBOL(gen_pool_best_fit);
 
 static void devm_gen_pool_release(struct device *dev, void *res)
 {
@@ -672,7 +919,7 @@ static int devm_gen_pool_match(struct device *dev, void *res, void *data)
  * @dev: device to retrieve the gen_pool from
  * @name: name of a gen_pool or NULL, identifies a particular gen_pool on device
  *
- * Returns the gen_pool for the device if one is present, or NULL.
+ * Return: the gen_pool for the device if one is present, or NULL.
  */
 struct gen_pool *gen_pool_get(struct device *dev, const char *name)
 {
@@ -696,6 +943,8 @@ EXPORT_SYMBOL_GPL(gen_pool_get);
  * Create a new special memory pool that can be used to manage special purpose
  * memory not managed by the regular kmalloc/kfree interface. The pool will be
  * automatically destroyed by the device management code.
+ *
+ * Return: the address of the pool, if successful, otherwise NULL
  */
 struct gen_pool *devm_gen_pool_create(struct device *dev, int min_alloc_order,
 				      int nid, const char *name)
@@ -743,7 +992,7 @@ EXPORT_SYMBOL(devm_gen_pool_create);
  * @propname: property name containing phandle(s)
  * @index: index into the phandle array
  *
- * Returns the pool that contains the chunk starting at the physical
+ * Return: the pool that contains the chunk starting at the physical
  * address of the device tree node pointed at by the phandle property,
  * or NULL if not found.
  */
-- 
2.14.1
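
With this patch, the search callbacks receive sizes and offsets in allocation units. They convert these to bit positions before calling bitmap_find_next_zero_area(), using an alignment mask that keeps each result on a unit boundary. The userspace sketch below models that conversion. It also models the size auto-detection that gen_pool_free() performs when called with a size of 0.

The sketch assumes two bits per unit: 0b11 for the first unit of an allocation and 0b10 for each following unit, as the selftest patterns in patch 2/6 suggest. find_zero_area() is a simplified stand-in for the kernel helper, not a copy of it. allocation_length() only approximates the role of get_boundary(), whose body is not shown here. All helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Assumption: 2 bits per allocation unit (0b11 = first unit of an
 * allocation, 0b10 = following unit, 0b00 = free). These macros mirror the
 * names used in the patch but are redefined here for a userspace model.
 */
#define BITS_PER_ENTRY 2UL
#define ENTRIES_TO_BITS(e) ((e) * BITS_PER_ENTRY)
#define BITS_DIV_ENTRIES(b) ((b) / BITS_PER_ENTRY)
#define ENTRY_HEAD 0x3u
#define ENTRY_BODY 0x2u

static bool bit_is_set(const unsigned char *map, unsigned long bit)
{
	return (map[bit / 8] >> (bit % 8)) & 1;
}

/* Simplified stand-in for bitmap_find_next_zero_area(). */
static unsigned long find_zero_area(const unsigned char *map,
				    unsigned long size, unsigned long start,
				    unsigned long nr, unsigned long align_mask)
{
	for (;;) {
		unsigned long index = start, end, i;

		while (index < size && bit_is_set(map, index))
			index++;
		/* Round up to a multiple of (align_mask + 1). */
		index = (index + align_mask) & ~align_mask;
		end = index + nr;
		if (end > size)
			return end;	/* no fit: result lies past the end */
		for (i = index; i < end && !bit_is_set(map, i); i++)
			;
		if (i == end)
			return index;
		start = i + 1;	/* retry after the busy bit */
	}
}

/* First fit in allocation units: convert to bits, search, convert back. */
static unsigned long first_fit_entries(const unsigned char *map,
				       unsigned long size, unsigned long start,
				       unsigned long nr)
{
	unsigned long bit = find_zero_area(map, ENTRIES_TO_BITS(size),
					   ENTRIES_TO_BITS(start),
					   ENTRIES_TO_BITS(nr),
					   BITS_PER_ENTRY - 1);

	return BITS_DIV_ENTRIES(bit);
}

static unsigned int entry_value(const unsigned char *map, unsigned long entry)
{
	return (map[entry / 4] >> ((entry % 4) * 2)) & 0x3;
}

/* Units used by the allocation whose head is at @start (size-0 free). */
static unsigned long allocation_length(const unsigned char *map,
				       unsigned long start, unsigned long size)
{
	unsigned long end = start + 1;

	assert(entry_value(map, start) == ENTRY_HEAD);
	while (end < size && entry_value(map, end) == ENTRY_BODY)
		end++;
	return end - start;
}
```

Take a 16-unit map whose units 0-2 are taken (first byte 0x2b). first_fit_entries(map, 16, 0, 2) returns 3. allocation_length(map, 0, 16) returns 3, the length a size-0 free would use.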


* [PATCH 2/6] genalloc: selftest
From: Igor Stoppa @ 2018-02-11  3:19 UTC (permalink / raw)
  To: willy, rdunlap, corbet, keescook, mhocko, labbott, jglisse, hch
  Cc: cl, linux-security-module, linux-mm, linux-kernel,
	kernel-hardening, Igor Stoppa

Introduce a set of macros for writing concise test cases for genalloc.

The test cases provide regression testing for anyone working on new
functionality for genalloc.

Their main purpose is to confirm that the various allocation strategies
continue to work as expected.

Self-testing is enabled through the Kconfig option
CONFIG_GENERIC_ALLOCATOR_SELFTEST.
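
The selftest is table driven. Each test case is a const array of command structs, built with designated-initializer macros and terminated by END(). genalloc_selftest() dispatches every row through the callbacks[] array, which is indexed by the command. A minimal userspace sketch of the same pattern follows. It tracks a plain int instead of a gen_pool, and every name in it (ADD(), SUB(), run_case() and so on) is hypothetical.

```c
#include <assert.h>

/*
 * Hypothetical userspace model of the selftest's table-driven layout:
 * the state is a plain int instead of a gen_pool.
 */
enum demo_commands {
	CMD_ADD,
	CMD_SUB,
	CMD_NUMBER,
	CMD_END = CMD_NUMBER,	/* terminator, never dispatched */
};

struct demo_step {
	enum demo_commands command;
	int amount;
	int expected;	/* state required after this step */
};

/* Designated-initializer macros keep each table row on one line. */
#define ADD(n, exp) { .command = CMD_ADD, .amount = (n), .expected = (exp) }
#define SUB(n, exp) { .command = CMD_SUB, .amount = (n), .expected = (exp) }
#define END()       { .command = CMD_END }

static void do_add(int *state, const struct demo_step *s) { *state += s->amount; }
static void do_sub(int *state, const struct demo_step *s) { *state -= s->amount; }

/* Dispatch array indexed by command, like callbacks[] in the selftest. */
static void (* const handlers[CMD_NUMBER])(int *, const struct demo_step *) = {
	[CMD_ADD] = do_add,
	[CMD_SUB] = do_sub,
};

static const struct demo_step demo_case[] = {
	ADD(3, 3),
	ADD(2, 5),
	SUB(4, 1),
	END(),
};

/* Run one table until END(), checking the state after every step. */
static int run_case(const struct demo_step *table, int *state)
{
	const struct demo_step *s;
	int executed = 0;

	for (s = table; s->command < CMD_END; s++) {
		handlers[s->command](state, s);
		assert(*state == s->expected);
		executed++;
	}
	return executed;
}
```

Adding a case then means adding a table, not new control flow. The selftest applies the same idea with ALLOCATE_MEM(), FREE_MEM() and FLUSH_MEM() rows, which it checks against expected bitmap patterns.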

Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com>
---
 include/linux/genalloc-selftest.h |  26 +++
 init/main.c                       |   2 +
 lib/Kconfig                       |  15 ++
 lib/Makefile                      |   1 +
 lib/genalloc-selftest.c           | 400 ++++++++++++++++++++++++++++++++++++++
 5 files changed, 444 insertions(+)
 create mode 100644 include/linux/genalloc-selftest.h
 create mode 100644 lib/genalloc-selftest.c

diff --git a/include/linux/genalloc-selftest.h b/include/linux/genalloc-selftest.h
new file mode 100644
index 000000000000..d8b9ae56a54e
--- /dev/null
+++ b/include/linux/genalloc-selftest.h
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * genalloc-selftest.h
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.stoppa@huawei.com>
+ */
+
+
+#ifndef __LINUX_GENALLOC_SELFTEST_H
+#define __LINUX_GENALLOC_SELFTEST_H
+
+
+#ifdef CONFIG_GENERIC_ALLOCATOR_SELFTEST
+
+#include <linux/genalloc.h>
+
+void genalloc_selftest(void);
+
+#else
+
+static inline void genalloc_selftest(void) {}
+
+#endif
+
+#endif
diff --git a/init/main.c b/init/main.c
index a8100b954839..fb844aa3eb8c 100644
--- a/init/main.c
+++ b/init/main.c
@@ -89,6 +89,7 @@
 #include <linux/io.h>
 #include <linux/cache.h>
 #include <linux/rodata_test.h>
+#include <linux/genalloc-selftest.h>
 
 #include <asm/io.h>
 #include <asm/bugs.h>
@@ -660,6 +661,7 @@ asmlinkage __visible void __init start_kernel(void)
 	 */
 	mem_encrypt_init();
 
+	genalloc_selftest();
 #ifdef CONFIG_BLK_DEV_INITRD
 	if (initrd_start && !initrd_below_start_ok &&
 	    page_to_pfn(virt_to_page((void *)initrd_start)) < min_low_pfn) {
diff --git a/lib/Kconfig b/lib/Kconfig
index e96089499371..0d526c004e81 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -287,6 +287,21 @@ config DECOMPRESS_LZ4
 config GENERIC_ALLOCATOR
 	bool
 
+config GENERIC_ALLOCATOR_SELFTEST
+	bool "genalloc tester"
+	default n
+	select GENERIC_ALLOCATOR
+	help
+	  Enable automated boot-time testing of the generic allocator.
+	  The tests mainly verify how allocated space is tracked.
+
+config GENERIC_ALLOCATOR_SELFTEST_VERBOSE
+	bool "make the genalloc tester more verbose"
+	default n
+	select GENERIC_ALLOCATOR_SELFTEST
+	help
+	  Print the first chunk's bitmap after each allocation and free.
+
 #
 # reed solomon support is select'ed if needed
 #
diff --git a/lib/Makefile b/lib/Makefile
index a90d4fcd748f..fadb30abde08 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -108,6 +108,7 @@ obj-$(CONFIG_LIBCRC32C)	+= libcrc32c.o
 obj-$(CONFIG_CRC8)	+= crc8.o
 obj-$(CONFIG_XXHASH)	+= xxhash.o
 obj-$(CONFIG_GENERIC_ALLOCATOR) += genalloc.o
+obj-$(CONFIG_GENERIC_ALLOCATOR_SELFTEST) += genalloc-selftest.o
 
 obj-$(CONFIG_842_COMPRESS) += 842/
 obj-$(CONFIG_842_DECOMPRESS) += 842/
diff --git a/lib/genalloc-selftest.c b/lib/genalloc-selftest.c
new file mode 100644
index 000000000000..420ed067940b
--- /dev/null
+++ b/lib/genalloc-selftest.c
@@ -0,0 +1,400 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * genalloc-selftest.c
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.stoppa@huawei.com>
+ */
+
+#include <linux/module.h>
+#include <linux/printk.h>
+#include <linux/init.h>
+#include <linux/vmalloc.h>
+#include <linux/string.h>
+#include <linux/debugfs.h>
+#include <linux/atomic.h>
+#include <linux/genalloc.h>
+
+#include <linux/genalloc-selftest.h>
+/*
+ * Keep the bitmap small, while still covering allocations across ulongs.
+ * For simplicity, the test cases use only 1 chunk of memory.
+ */
+#define BITMAP_SIZE_C 16
+#define ALLOC_ORDER 0
+
+#define ULONG_SIZE (sizeof(unsigned long))
+#define BITMAP_SIZE_UL (BITMAP_SIZE_C / ULONG_SIZE)
+#define MIN_ALLOC_SIZE (1 << ALLOC_ORDER)
+#define ENTRIES (BITMAP_SIZE_C * 8)
+#define CHUNK_SIZE  (MIN_ALLOC_SIZE * ENTRIES)
+
+#ifndef CONFIG_GENERIC_ALLOCATOR_SELFTEST_VERBOSE
+
+static inline void print_first_chunk_bitmap(struct gen_pool *pool) {}
+
+#else
+
+static void print_first_chunk_bitmap(struct gen_pool *pool)
+{
+	struct gen_pool_chunk *chunk;
+	char bitmap[BITMAP_SIZE_C * 2 + 1];
+	unsigned long i;
+	char *bm = bitmap;
+	char *entry;
+
+	if (unlikely(pool == NULL || pool->chunks.next == NULL))
+		return;
+
+	chunk = container_of(pool->chunks.next, struct gen_pool_chunk,
+			     next_chunk);
+	entry = (void *)chunk->entries;
+	for (i = 1; i <= BITMAP_SIZE_C; i++)
+		bm += snprintf(bm, 3, "%02hhx", entry[BITMAP_SIZE_C - i]);
+	*bm = '\0';
+	pr_notice("chunk: %p    bitmap: 0x%s\n", chunk, bitmap);
+
+}
+
+#endif
+
+enum test_commands {
+	CMD_ALLOCATOR,
+	CMD_ALLOCATE,
+	CMD_FLUSH,
+	CMD_FREE,
+	CMD_NUMBER,
+	CMD_END = CMD_NUMBER,
+};
+
+struct null_struct {
+	void *null;
+};
+
+struct test_allocator {
+	genpool_algo_t algo;
+	union {
+		struct genpool_data_align align;
+		struct genpool_data_fixed offset;
+		struct null_struct null;
+	} data;
+};
+
+struct test_action {
+	unsigned int location;
+	char pattern[BITMAP_SIZE_C];
+	unsigned int size;
+};
+
+
+struct test_command {
+	enum test_commands command;
+	union {
+		struct test_allocator allocator;
+		struct test_action action;
+	};
+};
+
+
+/*
+ * An array literal passed as a macro argument must be wrapped in ARR(),
+ * otherwise the commas inside its braces would separate macro arguments.
+ */
+#define ARR(...) __VA_ARGS__
+
+#define SET_DATA(parameter, value)	\
+	.parameter = {			\
+		.parameter = value,	\
+	}
+
+#define SET_ALLOCATOR(alloc, parameter, value)		\
+{							\
+	.command = CMD_ALLOCATOR,			\
+	.allocator = {					\
+		.algo = (alloc),			\
+		.data = {				\
+			SET_DATA(parameter, value),	\
+		},					\
+	}						\
+}
+
+#define ACTION_MEM(act, mem_size, mem_loc, match)	\
+{							\
+	.command = act,					\
+	.action = {					\
+		.size = (mem_size),			\
+		.location = (mem_loc),			\
+		.pattern = match,			\
+	},						\
+}
+
+#define ALLOCATE_MEM(mem_size, mem_loc, match)	\
+	ACTION_MEM(CMD_ALLOCATE, mem_size, mem_loc, ARR(match))
+
+#define FREE_MEM(mem_size, mem_loc, match)	\
+	ACTION_MEM(CMD_FREE, mem_size, mem_loc, ARR(match))
+
+#define FLUSH_MEM()		\
+{				\
+	.command = CMD_FLUSH,	\
+}
+
+#define END()			\
+{				\
+	.command = CMD_END,	\
+}
+
+static inline int compare_bitmaps(const struct gen_pool *pool,
+				   const char *reference)
+{
+	struct gen_pool_chunk *chunk;
+	char *bitmap;
+	unsigned int i;
+
+	chunk = container_of(pool->chunks.next, struct gen_pool_chunk,
+			     next_chunk);
+	bitmap = (char *)chunk->entries;
+
+	for (i = 0; i < BITMAP_SIZE_C; i++)
+		if (bitmap[i] != reference[i])
+			return -1;
+	return 0;
+}
+
+static void callback_set_allocator(struct gen_pool *pool,
+				   const struct test_command *cmd,
+				   unsigned long *locations)
+{
+	gen_pool_set_algo(pool, cmd->allocator.algo,
+			  (void *)&cmd->allocator.data);
+}
+
+static void callback_allocate(struct gen_pool *pool,
+			      const struct test_command *cmd,
+			      unsigned long *locations)
+{
+	const struct test_action *action = &cmd->action;
+
+	locations[action->location] = gen_pool_alloc(pool, action->size);
+	BUG_ON(!locations[action->location]);
+	print_first_chunk_bitmap(pool);
+	BUG_ON(compare_bitmaps(pool, action->pattern));
+}
+
+static void callback_flush(struct gen_pool *pool,
+			  const struct test_command *cmd,
+			  unsigned long *locations)
+{
+	unsigned int i;
+
+	for (i = 0; i < ENTRIES; i++)
+		if (locations[i]) {
+			gen_pool_free(pool, locations[i], 0);
+			locations[i] = 0;
+		}
+}
+
+static void callback_free(struct gen_pool *pool,
+			  const struct test_command *cmd,
+			  unsigned long *locations)
+{
+	const struct test_action *action = &cmd->action;
+
+	gen_pool_free(pool, locations[action->location], 0);
+	locations[action->location] = 0;
+	print_first_chunk_bitmap(pool);
+	BUG_ON(compare_bitmaps(pool, action->pattern));
+}
+
+static void (* const callbacks[CMD_NUMBER])(struct gen_pool *,
+					    const struct test_command *,
+					    unsigned long *) = {
+	[CMD_ALLOCATOR] = callback_set_allocator,
+	[CMD_ALLOCATE] = callback_allocate,
+	[CMD_FREE] = callback_free,
+	[CMD_FLUSH] = callback_flush,
+};
+
+static const struct test_command test_first_fit[] = {
+	SET_ALLOCATOR(gen_pool_first_fit, null, NULL),
+	ALLOCATE_MEM(3, 0, ARR({0x2b})),
+	ALLOCATE_MEM(2, 1, ARR({0xeb, 0x02})),
+	ALLOCATE_MEM(5, 2, ARR({0xeb, 0xae, 0x0a})),
+	FREE_MEM(2, 1,  ARR({0x2b, 0xac, 0x0a})),
+	ALLOCATE_MEM(1, 1, ARR({0xeb, 0xac, 0x0a})),
+	FREE_MEM(0, 2,  ARR({0xeb})),
+	FREE_MEM(0, 0,  ARR({0xc0})),
+	FREE_MEM(0, 1,	ARR({0x00})),
+	END(),
+};
+
+/*
+ * To make the test work with both 32-bit and 64-bit ulong sizes,
+ * allocate (8 / 2 * 4 - 1) = 15 bytes, then 16, then 2.
+ * The first allocation stops one entry short of the 32-bit ulong
+ * boundary. The second crosses the 32-bit boundary and stops one entry
+ * short of the 64-bit boundary. The last (2 bytes) is just large enough
+ * to cross the 64-bit boundary.
+ * Then free the allocations in the order: 2nd, 1st, 3rd.
+ */
+static const struct test_command test_ulong_span[] = {
+	SET_ALLOCATOR(gen_pool_first_fit, null, NULL),
+	ALLOCATE_MEM(15, 0, ARR({0xab, 0xaa, 0xaa, 0x2a})),
+	ALLOCATE_MEM(16, 1, ARR({0xab, 0xaa, 0xaa, 0xea,
+				0xaa, 0xaa, 0xaa, 0x2a})),
+	ALLOCATE_MEM(2, 2, ARR({0xab, 0xaa, 0xaa, 0xea,
+			       0xaa, 0xaa, 0xaa, 0xea,
+			       0x02})),
+	FREE_MEM(0, 1, ARR({0xab, 0xaa, 0xaa, 0x2a,
+			   0x00, 0x00, 0x00, 0xc0,
+			   0x02})),
+	FREE_MEM(0, 0, ARR({0x00, 0x00, 0x00, 0x00,
+			   0x00, 0x00, 0x00, 0xc0,
+			   0x02})),
+	FREE_MEM(0, 2, ARR({0x00})),
+	END(),
+};
+
+/*
+ * Create progressively smaller allocations A, B, C, D, E,
+ * then free B and D.
+ * Then create a new allocation that would fit in either of the gaps left
+ * by B and D, and verify that it uses the gap left by B.
+ */
+static const struct test_command test_first_fit_gaps[] = {
+	SET_ALLOCATOR(gen_pool_first_fit, null, NULL),
+	ALLOCATE_MEM(10, 0, ARR({0xab, 0xaa, 0x0a})),
+	ALLOCATE_MEM(8, 1, ARR({0xab, 0xaa, 0xba, 0xaa,
+			       0x0a})),
+	ALLOCATE_MEM(6, 2, ARR({0xab, 0xaa, 0xba, 0xaa,
+			       0xba, 0xaa})),
+	ALLOCATE_MEM(4, 3, ARR({0xab, 0xaa, 0xba, 0xaa,
+			       0xba, 0xaa, 0xab})),
+	ALLOCATE_MEM(2, 4, ARR({0xab, 0xaa, 0xba, 0xaa,
+			       0xba, 0xaa, 0xab, 0x0b})),
+	FREE_MEM(0, 1, ARR({0xab, 0xaa, 0x0a, 0x00,
+			   0xb0, 0xaa, 0xab, 0x0b})),
+	FREE_MEM(0, 3, ARR({0xab, 0xaa, 0x0a, 0x00,
+			   0xb0, 0xaa, 0x00, 0x0b})),
+	ALLOCATE_MEM(3, 3, ARR({0xab, 0xaa, 0xba, 0x02,
+			       0xb0, 0xaa, 0x00, 0x0b})),
+	FLUSH_MEM(),
+	END(),
+};
+
+/* Test first fit align */
+static const struct test_command test_first_fit_align[] = {
+	SET_ALLOCATOR(gen_pool_first_fit_align, align, 4),
+	ALLOCATE_MEM(5, 0, ARR({0xab, 0x02})),
+	ALLOCATE_MEM(3, 1, ARR({0xab, 0x02, 0x2b})),
+	ALLOCATE_MEM(2, 2, ARR({0xab, 0x02, 0x2b, 0x0b})),
+	ALLOCATE_MEM(1, 3, ARR({0xab, 0x02, 0x2b, 0x0b, 0x03})),
+	FREE_MEM(0, 0, ARR({0x00, 0x00, 0x2b, 0x0b, 0x03})),
+	FREE_MEM(0, 2, ARR({0x00, 0x00, 0x2b, 0x00, 0x03})),
+	ALLOCATE_MEM(2, 0, ARR({0x0b, 0x00, 0x2b, 0x00, 0x03})),
+	FLUSH_MEM(),
+	END(),
+};
+
+
+/* Test fixed alloc */
+static const struct test_command test_fixed_data[] = {
+	SET_ALLOCATOR(gen_pool_fixed_alloc, offset, 1),
+	ALLOCATE_MEM(5, 0, ARR({0xac, 0x0a})),
+	SET_ALLOCATOR(gen_pool_fixed_alloc, offset, 8),
+	ALLOCATE_MEM(3, 1, ARR({0xac, 0x0a, 0x2b})),
+	SET_ALLOCATOR(gen_pool_fixed_alloc, offset, 6),
+	ALLOCATE_MEM(2, 2, ARR({0xac, 0xba, 0x2b})),
+	SET_ALLOCATOR(gen_pool_fixed_alloc, offset, 30),
+	ALLOCATE_MEM(40, 3, ARR({0xac, 0xba, 0x2b, 0x00,
+				0x00, 0x00, 0x00, 0xb0,
+				0xaa, 0xaa, 0xaa, 0xaa,
+				0xaa, 0xaa, 0xaa, 0xaa})),
+	FLUSH_MEM(),
+	END(),
+};
+
+
+/* Test first fit order align */
+static const struct test_command test_first_fit_order_align[] = {
+	SET_ALLOCATOR(gen_pool_first_fit_order_align, null, NULL),
+	ALLOCATE_MEM(5, 0, ARR({0xab, 0x02})),
+	ALLOCATE_MEM(3, 1, ARR({0xab, 0x02, 0x2b})),
+	ALLOCATE_MEM(2, 2, ARR({0xab, 0xb2, 0x2b})),
+	ALLOCATE_MEM(1, 3, ARR({0xab, 0xbe, 0x2b})),
+	ALLOCATE_MEM(1, 4, ARR({0xab, 0xbe, 0xeb})),
+	ALLOCATE_MEM(2, 5, ARR({0xab, 0xbe, 0xeb, 0x0b})),
+	FLUSH_MEM(),
+	END(),
+};
+
+
+/* Test best fit */
+static const struct test_command test_best_fit[] = {
+	SET_ALLOCATOR(gen_pool_best_fit, null, NULL),
+	ALLOCATE_MEM(5, 0, ARR({0xab, 0x02})),
+	ALLOCATE_MEM(3, 1, ARR({0xab, 0xae})),
+	ALLOCATE_MEM(3, 2, ARR({0xab, 0xae, 0x2b})),
+	ALLOCATE_MEM(1, 3, ARR({0xab, 0xae, 0xeb})),
+	FREE_MEM(0, 0, ARR({0x00, 0xac, 0xeb})),
+	FREE_MEM(0, 2, ARR({0x00, 0xac, 0xc0})),
+	ALLOCATE_MEM(2, 0, ARR({0x00, 0xac, 0xcb})),
+	FLUSH_MEM(),
+	END(),
+};
+
+
+enum test_cases_indexes {
+	TEST_CASE_FIRST_FIT,
+	TEST_CASE_ULONG_SPAN,
+	TEST_CASE_FIRST_FIT_GAPS,
+	TEST_CASE_FIRST_FIT_ALIGN,
+	TEST_CASE_FIXED_DATA,
+	TEST_CASE_FIRST_FIT_ORDER_ALIGN,
+	TEST_CASE_BEST_FIT,
+	TEST_CASES_NUM,
+};
+
+static const struct test_command *test_cases[TEST_CASES_NUM] = {
+	[TEST_CASE_FIRST_FIT] = test_first_fit,
+	[TEST_CASE_ULONG_SPAN] = test_ulong_span,
+	[TEST_CASE_FIRST_FIT_GAPS] = test_first_fit_gaps,
+	[TEST_CASE_FIRST_FIT_ALIGN] = test_first_fit_align,
+	[TEST_CASE_FIXED_DATA] = test_fixed_data,
+	[TEST_CASE_FIRST_FIT_ORDER_ALIGN] = test_first_fit_order_align,
+	[TEST_CASE_BEST_FIT] = test_best_fit,
+};
+
+
+void genalloc_selftest(void)
+{
+	static struct gen_pool *pool;
+	unsigned long locations[ENTRIES];
+	char chunk[CHUNK_SIZE];
+	int retval;
+	unsigned int i;
+	const struct test_command *cmd;
+
+	pool = gen_pool_create(ALLOC_ORDER, -1);
+	if (unlikely(!pool)) {
+		pr_err("genalloc-selftest: no memory for pool.\n");
+		return;
+	}
+
+	retval = gen_pool_add_virt(pool, (unsigned long)chunk, 0,
+				   CHUNK_SIZE, -1);
+	if (unlikely(retval)) {
+		pr_err("genalloc-selftest: could not register chunk.\n");
+		goto destroy_pool;
+	}
+
+	memset(locations, 0, ENTRIES * sizeof(unsigned long));
+	for (i = 0; i < TEST_CASES_NUM; i++)
+		for (cmd = test_cases[i]; cmd->command < CMD_END; cmd++)
+			callbacks[cmd->command](pool, cmd, locations);
+	pr_notice("genalloc-selftest: %d tests executed successfully\n",
+		  TEST_CASES_NUM);
+
+destroy_pool:
+	gen_pool_destroy(pool);
+}
-- 
2.14.1
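
The expected patterns in the test tables can be derived by hand. Each allocation unit takes two bits, filled from the least significant bit of the lowest byte upward: 0b11 marks the first unit of an allocation, 0b10 each following unit, and 0b00 a free unit. For example, the first 3-unit allocation in test_first_fit gives 0b00101011 = 0x2b.

The helpers below compute those bytes, which may help when adding new cases. Their names are hypothetical and not part of the patch. compare_bitmaps() reads the bitmap through a char pointer, so this byte order is the one a little-endian machine shows.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helpers for deriving selftest patterns by hand.
 * Assumption: 2 bits per allocation unit, least significant bits first,
 * 0b11 for the first unit of an allocation, 0b10 for the following ones,
 * 0b00 for free units (the layout a little-endian machine shows).
 */
#define PATTERN_BYTES 16	/* matches BITMAP_SIZE_C in the selftest */
#define ENTRY_FREE 0x0u
#define ENTRY_BODY 0x2u
#define ENTRY_HEAD 0x3u

static void set_entry(unsigned char *map, unsigned int entry,
		      unsigned int value)
{
	unsigned int shift = (entry % 4) * 2;	/* four units per byte */

	map[entry / 4] &= (unsigned char)~(0x3u << shift);
	map[entry / 4] |= (unsigned char)(value << shift);
}

static void mark_allocation(unsigned char *map, unsigned int start,
			    unsigned int nentries)
{
	unsigned int i;

	for (i = 0; i < nentries; i++)
		set_entry(map, start + i, i ? ENTRY_BODY : ENTRY_HEAD);
}

static void clear_allocation(unsigned char *map, unsigned int start,
			     unsigned int nentries)
{
	unsigned int i;

	for (i = 0; i < nentries; i++)
		set_entry(map, start + i, ENTRY_FREE);
}
```

Replaying the first four steps of test_first_fit with these helpers (allocate 3, 2 and 5 units, then free the 2-unit block) gives 0x2b, then 0xeb 0x02, then 0xeb 0xae 0x0a, then 0x2b 0xac 0x0a. These match the table.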


* [PATCH 2/6] genalloc: selftest
From: Igor Stoppa @ 2018-02-11  3:19 UTC (permalink / raw)
  To: linux-security-module

Introduce a set of macros for writing concise test cases for genalloc.

The test cases provide regression testing for anyone working on new
functionality for genalloc.

Their main purpose is to confirm that the various allocation strategies
continue to work as expected.

Self-testing is enabled through the Kconfig option
CONFIG_GENERIC_ALLOCATOR_SELFTEST.

Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com>
---
 include/linux/genalloc-selftest.h |  26 +++
 init/main.c                       |   2 +
 lib/Kconfig                       |  15 ++
 lib/Makefile                      |   1 +
 lib/genalloc-selftest.c           | 400 ++++++++++++++++++++++++++++++++++++++
 5 files changed, 444 insertions(+)
 create mode 100644 include/linux/genalloc-selftest.h
 create mode 100644 lib/genalloc-selftest.c

diff --git a/include/linux/genalloc-selftest.h b/include/linux/genalloc-selftest.h
new file mode 100644
index 000000000000..d8b9ae56a54e
--- /dev/null
+++ b/include/linux/genalloc-selftest.h
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * genalloc-selftest.h
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.stoppa@huawei.com>
+ */
+
+
+#ifndef __LINUX_GENALLOC_SELFTEST_H
+#define __LINUX_GENALLOC_SELFTEST_H
+
+
+#ifdef CONFIG_GENERIC_ALLOCATOR_SELFTEST
+
+#include <linux/genalloc.h>
+
+void genalloc_selftest(void);
+
+#else
+
+static inline void genalloc_selftest(void) {}
+
+#endif
+
+#endif
diff --git a/init/main.c b/init/main.c
index a8100b954839..fb844aa3eb8c 100644
--- a/init/main.c
+++ b/init/main.c
@@ -89,6 +89,7 @@
 #include <linux/io.h>
 #include <linux/cache.h>
 #include <linux/rodata_test.h>
+#include <linux/genalloc-selftest.h>
 
 #include <asm/io.h>
 #include <asm/bugs.h>
@@ -660,6 +661,7 @@ asmlinkage __visible void __init start_kernel(void)
 	 */
 	mem_encrypt_init();
 
+	genalloc_selftest();
 #ifdef CONFIG_BLK_DEV_INITRD
 	if (initrd_start && !initrd_below_start_ok &&
 	    page_to_pfn(virt_to_page((void *)initrd_start)) < min_low_pfn) {
diff --git a/lib/Kconfig b/lib/Kconfig
index e96089499371..0d526c004e81 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -287,6 +287,21 @@ config DECOMPRESS_LZ4
 config GENERIC_ALLOCATOR
 	bool
 
+config GENERIC_ALLOCATOR_SELFTEST
+	bool "genalloc tester"
+	default n
+	select GENERIC_ALLOCATOR
+	help
+	  Enable automated boot-time testing of the generic allocator.
+	  The tests mainly verify how allocated space is tracked.
+
+config GENERIC_ALLOCATOR_SELFTEST_VERBOSE
+	bool "make the genalloc tester more verbose"
+	default n
+	select GENERIC_ALLOCATOR_SELFTEST
+	help
+	  Print the first chunk's bitmap after each allocation and free.
+
 #
 # reed solomon support is select'ed if needed
 #
diff --git a/lib/Makefile b/lib/Makefile
index a90d4fcd748f..fadb30abde08 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -108,6 +108,7 @@ obj-$(CONFIG_LIBCRC32C)	+= libcrc32c.o
 obj-$(CONFIG_CRC8)	+= crc8.o
 obj-$(CONFIG_XXHASH)	+= xxhash.o
 obj-$(CONFIG_GENERIC_ALLOCATOR) += genalloc.o
+obj-$(CONFIG_GENERIC_ALLOCATOR_SELFTEST) += genalloc-selftest.o
 
 obj-$(CONFIG_842_COMPRESS) += 842/
 obj-$(CONFIG_842_DECOMPRESS) += 842/
diff --git a/lib/genalloc-selftest.c b/lib/genalloc-selftest.c
new file mode 100644
index 000000000000..420ed067940b
--- /dev/null
+++ b/lib/genalloc-selftest.c
@@ -0,0 +1,400 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * genalloc-selftest.c
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.stoppa@huawei.com>
+ */
+
+#include <linux/module.h>
+#include <linux/printk.h>
+#include <linux/init.h>
+#include <linux/vmalloc.h>
+#include <linux/string.h>
+#include <linux/debugfs.h>
+#include <linux/atomic.h>
+#include <linux/genalloc.h>
+
+#include <linux/genalloc-selftest.h>
+/*
+ * Keep the bitmap small, while still covering allocations across ulongs.
+ * For simplicity, the test cases use only 1 chunk of memory.
+ */
+#define BITMAP_SIZE_C 16
+#define ALLOC_ORDER 0
+
+#define ULONG_SIZE (sizeof(unsigned long))
+#define BITMAP_SIZE_UL (BITMAP_SIZE_C / ULONG_SIZE)
+#define MIN_ALLOC_SIZE (1 << ALLOC_ORDER)
+#define ENTRIES (BITMAP_SIZE_C * 8)
+#define CHUNK_SIZE  (MIN_ALLOC_SIZE * ENTRIES)
+
+#ifndef CONFIG_GENERIC_ALLOCATOR_SELFTEST_VERBOSE
+
+static inline void print_first_chunk_bitmap(struct gen_pool *pool) {}
+
+#else
+
+static void print_first_chunk_bitmap(struct gen_pool *pool)
+{
+	struct gen_pool_chunk *chunk;
+	char bitmap[BITMAP_SIZE_C * 2 + 1];
+	unsigned long i;
+	char *bm = bitmap;
+	char *entry;
+
+	if (unlikely(pool == NULL || pool->chunks.next == NULL))
+		return;
+
+	chunk = container_of(pool->chunks.next, struct gen_pool_chunk,
+			     next_chunk);
+	entry = (void *)chunk->entries;
+	for (i = 1; i <= BITMAP_SIZE_C; i++)
+		bm += snprintf(bm, 3, "%02hhx", entry[BITMAP_SIZE_C - i]);
+	*bm = '\0';
+	pr_notice("chunk: %p    bitmap: 0x%s\n", chunk, bitmap);
+
+}
+
+#endif
+
+enum test_commands {
+	CMD_ALLOCATOR,
+	CMD_ALLOCATE,
+	CMD_FLUSH,
+	CMD_FREE,
+	CMD_NUMBER,
+	CMD_END = CMD_NUMBER,
+};
+
+struct null_struct {
+	void *null;
+};
+
+struct test_allocator {
+	genpool_algo_t algo;
+	union {
+		struct genpool_data_align align;
+		struct genpool_data_fixed offset;
+		struct null_struct null;
+	} data;
+};
+
+struct test_action {
+	unsigned int location;
+	char pattern[BITMAP_SIZE_C];
+	unsigned int size;
+};
+
+
+struct test_command {
+	enum test_commands command;
+	union {
+		struct test_allocator allocator;
+		struct test_action action;
+	};
+};
+
+
+/*
+ * Wrapping an array literal in ARR() hides its commas from the
+ * preprocessor, so it can be passed to a macro as a single argument.
+ */
+#define ARR(...) __VA_ARGS__
+
+#define SET_DATA(parameter, value)	\
+	.parameter = {			\
+		.parameter = value,	\
+	}
+
+#define SET_ALLOCATOR(alloc, parameter, value)		\
+{							\
+	.command = CMD_ALLOCATOR,			\
+	.allocator = {					\
+		.algo = (alloc),			\
+		.data = {				\
+			SET_DATA(parameter, value),	\
+		},					\
+	}						\
+}
+
+#define ACTION_MEM(act, mem_size, mem_loc, match)	\
+{							\
+	.command = act,					\
+	.action = {					\
+		.size = (mem_size),			\
+		.location = (mem_loc),			\
+		.pattern = match,			\
+	},						\
+}
+
+#define ALLOCATE_MEM(mem_size, mem_loc, match)	\
+	ACTION_MEM(CMD_ALLOCATE, mem_size, mem_loc, ARR(match))
+
+#define FREE_MEM(mem_size, mem_loc, match)	\
+	ACTION_MEM(CMD_FREE, mem_size, mem_loc, ARR(match))
+
+#define FLUSH_MEM()		\
+{				\
+	.command = CMD_FLUSH,	\
+}
+
+#define END()			\
+{				\
+	.command = CMD_END,	\
+}
+
+static inline int compare_bitmaps(const struct gen_pool *pool,
+				   const char *reference)
+{
+	struct gen_pool_chunk *chunk;
+	char *bitmap;
+	unsigned int i;
+
+	chunk = container_of(pool->chunks.next, struct gen_pool_chunk,
+			     next_chunk);
+	bitmap = (char *)chunk->entries;
+
+	for (i = 0; i < BITMAP_SIZE_C; i++)
+		if (bitmap[i] != reference[i])
+			return -1;
+	return 0;
+}
+
+static void callback_set_allocator(struct gen_pool *pool,
+				   const struct test_command *cmd,
+				   unsigned long *locations)
+{
+	gen_pool_set_algo(pool, cmd->allocator.algo,
+			  (void *)&cmd->allocator.data);
+}
+
+static void callback_allocate(struct gen_pool *pool,
+			      const struct test_command *cmd,
+			      unsigned long *locations)
+{
+	const struct test_action *action = &cmd->action;
+
+	locations[action->location] = gen_pool_alloc(pool, action->size);
+	BUG_ON(!locations[action->location]);
+	print_first_chunk_bitmap(pool);
+	BUG_ON(compare_bitmaps(pool, action->pattern));
+}
+
+static void callback_flush(struct gen_pool *pool,
+			  const struct test_command *cmd,
+			  unsigned long *locations)
+{
+	unsigned int i;
+
+	for (i = 0; i < ENTRIES; i++)
+		if (locations[i]) {
+			gen_pool_free(pool, locations[i], 0);
+			locations[i] = 0;
+		}
+}
+
+static void callback_free(struct gen_pool *pool,
+			  const struct test_command *cmd,
+			  unsigned long *locations)
+{
+	const struct test_action *action = &cmd->action;
+
+	gen_pool_free(pool, locations[action->location], 0);
+	locations[action->location] = 0;
+	print_first_chunk_bitmap(pool);
+	BUG_ON(compare_bitmaps(pool, action->pattern));
+}
+
+static void (* const callbacks[CMD_NUMBER])(struct gen_pool *,
+					    const struct test_command *,
+					    unsigned long *) = {
+	[CMD_ALLOCATOR] = callback_set_allocator,
+	[CMD_ALLOCATE] = callback_allocate,
+	[CMD_FREE] = callback_free,
+	[CMD_FLUSH] = callback_flush,
+};
+
+static const struct test_command test_first_fit[] = {
+	SET_ALLOCATOR(gen_pool_first_fit, null, NULL),
+	ALLOCATE_MEM(3, 0, ARR({0x2b})),
+	ALLOCATE_MEM(2, 1, ARR({0xeb, 0x02})),
+	ALLOCATE_MEM(5, 2, ARR({0xeb, 0xae, 0x0a})),
+	FREE_MEM(2, 1, ARR({0x2b, 0xac, 0x0a})),
+	ALLOCATE_MEM(1, 1, ARR({0xeb, 0xac, 0x0a})),
+	FREE_MEM(0, 2, ARR({0xeb})),
+	FREE_MEM(0, 0, ARR({0xc0})),
+	FREE_MEM(0, 1, ARR({0x00})),
+	END(),
+};
+
+/*
+ * Each 1-byte allocation unit (ALLOC_ORDER 0) takes 2 bits of bitmap, so
+ * a 32-bit ulong tracks 16 units and a 64-bit ulong tracks 32. To cover
+ * both sizes, allocate (8 / 2 * 4 - 1) = 15 units, then 16, then 2.
+ * The first allocation prepares for crossing the 32-bit ulong boundary.
+ * The second crosses it and prepares for crossing the 64-bit boundary.
+ * The last one is large enough (2 units) to cross the 64-bit boundary.
+ * Then free the allocations in the order: 2nd, 1st, 3rd.
+ */
+static const struct test_command test_ulong_span[] = {
+	SET_ALLOCATOR(gen_pool_first_fit, null, NULL),
+	ALLOCATE_MEM(15, 0, ARR({0xab, 0xaa, 0xaa, 0x2a})),
+	ALLOCATE_MEM(16, 1, ARR({0xab, 0xaa, 0xaa, 0xea,
+				0xaa, 0xaa, 0xaa, 0x2a})),
+	ALLOCATE_MEM(2, 2, ARR({0xab, 0xaa, 0xaa, 0xea,
+			       0xaa, 0xaa, 0xaa, 0xea,
+			       0x02})),
+	FREE_MEM(0, 1, ARR({0xab, 0xaa, 0xaa, 0x2a,
+			   0x00, 0x00, 0x00, 0xc0,
+			   0x02})),
+	FREE_MEM(0, 0, ARR({0x00, 0x00, 0x00, 0x00,
+			   0x00, 0x00, 0x00, 0xc0,
+			   0x02})),
+	FREE_MEM(0, 2, ARR({0x00})),
+	END(),
+};
+
+/*
+ * Create progressively smaller allocations A, B, C, D and E,
+ * then free B and D.
+ * Then create a new allocation that would fit in either of the gaps
+ * left by B and D, and verify that first fit uses the gap left by B.
+ */
+static const struct test_command test_first_fit_gaps[] = {
+	SET_ALLOCATOR(gen_pool_first_fit, null, NULL),
+	ALLOCATE_MEM(10, 0, ARR({0xab, 0xaa, 0x0a})),
+	ALLOCATE_MEM(8, 1, ARR({0xab, 0xaa, 0xba, 0xaa,
+			       0x0a})),
+	ALLOCATE_MEM(6, 2, ARR({0xab, 0xaa, 0xba, 0xaa,
+			       0xba, 0xaa})),
+	ALLOCATE_MEM(4, 3, ARR({0xab, 0xaa, 0xba, 0xaa,
+			       0xba, 0xaa, 0xab})),
+	ALLOCATE_MEM(2, 4, ARR({0xab, 0xaa, 0xba, 0xaa,
+			       0xba, 0xaa, 0xab, 0x0b})),
+	FREE_MEM(0, 1, ARR({0xab, 0xaa, 0x0a, 0x00,
+			   0xb0, 0xaa, 0xab, 0x0b})),
+	FREE_MEM(0, 3, ARR({0xab, 0xaa, 0x0a, 0x00,
+			   0xb0, 0xaa, 0x00, 0x0b})),
+	ALLOCATE_MEM(3, 3, ARR({0xab, 0xaa, 0xba, 0x02,
+			       0xb0, 0xaa, 0x00, 0x0b})),
+	FLUSH_MEM(),
+	END(),
+};
+
+/* Test first fit align */
+static const struct test_command test_first_fit_align[] = {
+	SET_ALLOCATOR(gen_pool_first_fit_align, align, 4),
+	ALLOCATE_MEM(5, 0, ARR({0xab, 0x02})),
+	ALLOCATE_MEM(3, 1, ARR({0xab, 0x02, 0x2b})),
+	ALLOCATE_MEM(2, 2, ARR({0xab, 0x02, 0x2b, 0x0b})),
+	ALLOCATE_MEM(1, 3, ARR({0xab, 0x02, 0x2b, 0x0b, 0x03})),
+	FREE_MEM(0, 0, ARR({0x00, 0x00, 0x2b, 0x0b, 0x03})),
+	FREE_MEM(0, 2, ARR({0x00, 0x00, 0x2b, 0x00, 0x03})),
+	ALLOCATE_MEM(2, 0, ARR({0x0b, 0x00, 0x2b, 0x00, 0x03})),
+	FLUSH_MEM(),
+	END(),
+};
+
+
+/* Test fixed alloc */
+static const struct test_command test_fixed_data[] = {
+	SET_ALLOCATOR(gen_pool_fixed_alloc, offset, 1),
+	ALLOCATE_MEM(5, 0, ARR({0xac, 0x0a})),
+	SET_ALLOCATOR(gen_pool_fixed_alloc, offset, 8),
+	ALLOCATE_MEM(3, 1, ARR({0xac, 0x0a, 0x2b})),
+	SET_ALLOCATOR(gen_pool_fixed_alloc, offset, 6),
+	ALLOCATE_MEM(2, 2, ARR({0xac, 0xba, 0x2b})),
+	SET_ALLOCATOR(gen_pool_fixed_alloc, offset, 30),
+	ALLOCATE_MEM(40, 3, ARR({0xac, 0xba, 0x2b, 0x00,
+				0x00, 0x00, 0x00, 0xb0,
+				0xaa, 0xaa, 0xaa, 0xaa,
+				0xaa, 0xaa, 0xaa, 0xaa})),
+	FLUSH_MEM(),
+	END(),
+};
+
+
+/* Test first fit order align */
+static const struct test_command test_first_fit_order_align[] = {
+	SET_ALLOCATOR(gen_pool_first_fit_order_align, null, NULL),
+	ALLOCATE_MEM(5, 0, ARR({0xab, 0x02})),
+	ALLOCATE_MEM(3, 1, ARR({0xab, 0x02, 0x2b})),
+	ALLOCATE_MEM(2, 2, ARR({0xab, 0xb2, 0x2b})),
+	ALLOCATE_MEM(1, 3, ARR({0xab, 0xbe, 0x2b})),
+	ALLOCATE_MEM(1, 4, ARR({0xab, 0xbe, 0xeb})),
+	ALLOCATE_MEM(2, 5, ARR({0xab, 0xbe, 0xeb, 0x0b})),
+	FLUSH_MEM(),
+	END(),
+};
+
+
+/* Test best fit */
+static const struct test_command test_best_fit[] = {
+	SET_ALLOCATOR(gen_pool_best_fit, null, NULL),
+	ALLOCATE_MEM(5, 0, ARR({0xab, 0x02})),
+	ALLOCATE_MEM(3, 1, ARR({0xab, 0xae})),
+	ALLOCATE_MEM(3, 2, ARR({0xab, 0xae, 0x2b})),
+	ALLOCATE_MEM(1, 3, ARR({0xab, 0xae, 0xeb})),
+	FREE_MEM(0, 0, ARR({0x00, 0xac, 0xeb})),
+	FREE_MEM(0, 2, ARR({0x00, 0xac, 0xc0})),
+	ALLOCATE_MEM(2, 0, ARR({0x00, 0xac, 0xcb})),
+	FLUSH_MEM(),
+	END(),
+};
+
+
+enum test_cases_indexes {
+	TEST_CASE_FIRST_FIT,
+	TEST_CASE_ULONG_SPAN,
+	TEST_CASE_FIRST_FIT_GAPS,
+	TEST_CASE_FIRST_FIT_ALIGN,
+	TEST_CASE_FIXED_DATA,
+	TEST_CASE_FIRST_FIT_ORDER_ALIGN,
+	TEST_CASE_BEST_FIT,
+	TEST_CASES_NUM,
+};
+
+static const struct test_command *test_cases[TEST_CASES_NUM] = {
+	[TEST_CASE_FIRST_FIT] = test_first_fit,
+	[TEST_CASE_ULONG_SPAN] = test_ulong_span,
+	[TEST_CASE_FIRST_FIT_GAPS] = test_first_fit_gaps,
+	[TEST_CASE_FIRST_FIT_ALIGN] = test_first_fit_align,
+	[TEST_CASE_FIXED_DATA] = test_fixed_data,
+	[TEST_CASE_FIRST_FIT_ORDER_ALIGN] = test_first_fit_order_align,
+	[TEST_CASE_BEST_FIT] = test_best_fit,
+};
+
+
+void __init genalloc_selftest(void)
+{
+	struct gen_pool *pool;
+	unsigned long locations[ENTRIES];
+	char chunk[CHUNK_SIZE];
+	int retval;
+	unsigned int i;
+	const struct test_command *cmd;
+
+	pool = gen_pool_create(ALLOC_ORDER, -1);
+	if (unlikely(!pool)) {
+		pr_err("genalloc-selftest: no memory for pool\n");
+		return;
+	}
+
+	retval = gen_pool_add_virt(pool, (unsigned long)chunk, 0,
+				   CHUNK_SIZE, -1);
+	if (unlikely(retval)) {
+		pr_err("genalloc-selftest: could not register chunk\n");
+		goto destroy_pool;
+	}
+
+	memset(locations, 0, sizeof(locations));
+	for (i = 0; i < TEST_CASES_NUM; i++)
+		for (cmd = test_cases[i]; cmd->command < CMD_END; cmd++)
+			callbacks[cmd->command](pool, cmd, locations);
+	pr_notice("genalloc-selftest: successfully executed %d test cases\n",
+		  TEST_CASES_NUM);
+
+destroy_pool:
+	gen_pool_destroy(pool);
+}
-- 
2.14.1

--
To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
the body of a message to majordomo at vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 146+ messages in thread

* [PATCH 2/6] genalloc: selftest
@ 2018-02-11  3:19   ` Igor Stoppa
  0 siblings, 0 replies; 146+ messages in thread
From: Igor Stoppa @ 2018-02-11  3:19 UTC (permalink / raw)
  To: willy, rdunlap, corbet, keescook, mhocko, labbott, jglisse, hch
  Cc: cl, linux-security-module, linux-mm, linux-kernel,
	kernel-hardening, Igor Stoppa

Introduce a set of macros for writing concise test cases for genalloc.

The test cases are meant to provide regression testing when new
functionality is added to genalloc.

Primarily, they are meant to confirm that the various allocation
strategies continue to work as expected.

Whether the self-test runs at boot is controlled by the Kconfig option
GENERIC_ALLOCATOR_SELFTEST.

Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com>
---
 include/linux/genalloc-selftest.h |  26 +++
 init/main.c                       |   2 +
 lib/Kconfig                       |  15 ++
 lib/Makefile                      |   1 +
 lib/genalloc-selftest.c           | 400 ++++++++++++++++++++++++++++++++++++++
 5 files changed, 444 insertions(+)
 create mode 100644 include/linux/genalloc-selftest.h
 create mode 100644 lib/genalloc-selftest.c

diff --git a/include/linux/genalloc-selftest.h b/include/linux/genalloc-selftest.h
new file mode 100644
index 000000000000..d8b9ae56a54e
--- /dev/null
+++ b/include/linux/genalloc-selftest.h
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * genalloc-selftest.h
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.stoppa@huawei.com>
+ */
+
+
+#ifndef __LINUX_GENALLOC_SELFTEST_H
+#define __LINUX_GENALLOC_SELFTEST_H
+
+
+#ifdef CONFIG_GENERIC_ALLOCATOR_SELFTEST
+
+#include <linux/genalloc.h>
+
+void genalloc_selftest(void);
+
+#else
+
+static inline void genalloc_selftest(void) {}
+
+#endif
+
+#endif
diff --git a/init/main.c b/init/main.c
index a8100b954839..fb844aa3eb8c 100644
--- a/init/main.c
+++ b/init/main.c
@@ -89,6 +89,7 @@
 #include <linux/io.h>
 #include <linux/cache.h>
 #include <linux/rodata_test.h>
+#include <linux/genalloc-selftest.h>
 
 #include <asm/io.h>
 #include <asm/bugs.h>
@@ -660,6 +661,7 @@ asmlinkage __visible void __init start_kernel(void)
 	 */
 	mem_encrypt_init();
 
+	genalloc_selftest();
 #ifdef CONFIG_BLK_DEV_INITRD
 	if (initrd_start && !initrd_below_start_ok &&
 	    page_to_pfn(virt_to_page((void *)initrd_start)) < min_low_pfn) {
diff --git a/lib/Kconfig b/lib/Kconfig
index e96089499371..0d526c004e81 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -287,6 +287,21 @@ config DECOMPRESS_LZ4
 config GENERIC_ALLOCATOR
 	bool
 
+config GENERIC_ALLOCATOR_SELFTEST
+	bool "genalloc tester"
+	default n
+	select GENERIC_ALLOCATOR
+	help
+	  Enable automated testing of the generic allocator at boot.
+	  The tests mainly verify the bitmap that tracks allocated space.
+
+config GENERIC_ALLOCATOR_SELFTEST_VERBOSE
+	bool "make the genalloc tester more verbose"
+	default n
+	select GENERIC_ALLOCATOR_SELFTEST
+	help
+	  Print the allocation bitmap after each step of the self-test.
+
 #
 # reed solomon support is select'ed if needed
 #
diff --git a/lib/Makefile b/lib/Makefile
index a90d4fcd748f..fadb30abde08 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -108,6 +108,7 @@ obj-$(CONFIG_LIBCRC32C)	+= libcrc32c.o
 obj-$(CONFIG_CRC8)	+= crc8.o
 obj-$(CONFIG_XXHASH)	+= xxhash.o
 obj-$(CONFIG_GENERIC_ALLOCATOR) += genalloc.o
+obj-$(CONFIG_GENERIC_ALLOCATOR_SELFTEST) += genalloc-selftest.o
 
 obj-$(CONFIG_842_COMPRESS) += 842/
 obj-$(CONFIG_842_DECOMPRESS) += 842/
diff --git a/lib/genalloc-selftest.c b/lib/genalloc-selftest.c
new file mode 100644
index 000000000000..420ed067940b
--- /dev/null
+++ b/lib/genalloc-selftest.c
@@ -0,0 +1,400 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * genalloc-selftest.c
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.stoppa@huawei.com>
+ */
+
+#include <linux/module.h>
+#include <linux/printk.h>
+#include <linux/init.h>
+#include <linux/vmalloc.h>
+#include <linux/string.h>
+#include <linux/debugfs.h>
+#include <linux/atomic.h>
+#include <linux/genalloc.h>
+
+#include <linux/genalloc-selftest.h>
+/*
+ * Keep the bitmap small, while still covering allocations that cross
+ * an unsigned long boundary. For simplicity, the tests use one chunk.
+ */
+#define BITMAP_SIZE_C 16
+#define ALLOC_ORDER 0
+
+#define ULONG_SIZE (sizeof(unsigned long))
+#define BITMAP_SIZE_UL (BITMAP_SIZE_C / ULONG_SIZE)
+#define MIN_ALLOC_SIZE (1 << ALLOC_ORDER)
+#define ENTRIES (BITMAP_SIZE_C * 8)
+#define CHUNK_SIZE  (MIN_ALLOC_SIZE * ENTRIES)
+
+#ifndef CONFIG_GENERIC_ALLOCATOR_SELFTEST_VERBOSE
+
+static inline void print_first_chunk_bitmap(struct gen_pool *pool) {}
+
+#else
+
+static void print_first_chunk_bitmap(struct gen_pool *pool)
+{
+	struct gen_pool_chunk *chunk;
+	char bitmap[BITMAP_SIZE_C * 2 + 1];
+	unsigned long i;
+	char *bm = bitmap;
+	char *entry;
+
+	if (unlikely(pool == NULL || pool->chunks.next == NULL))
+		return;
+
+	chunk = container_of(pool->chunks.next, struct gen_pool_chunk,
+			     next_chunk);
+	entry = (void *)chunk->entries;
+	for (i = 1; i <= BITMAP_SIZE_C; i++)
+		bm += snprintf(bm, 3, "%02hhx", entry[BITMAP_SIZE_C - i]);
+	*bm = '\0';
+	pr_notice("chunk: %p    bitmap: 0x%s\n", chunk, bitmap);
+
+}
+
+#endif
+
+enum test_commands {
+	CMD_ALLOCATOR,
+	CMD_ALLOCATE,
+	CMD_FLUSH,
+	CMD_FREE,
+	CMD_NUMBER,
+	CMD_END = CMD_NUMBER,
+};
+
+struct null_struct {
+	void *null;
+};
+
+struct test_allocator {
+	genpool_algo_t algo;
+	union {
+		struct genpool_data_align align;
+		struct genpool_data_fixed offset;
+		struct null_struct null;
+	} data;
+};
+
+struct test_action {
+	unsigned int location;
+	char pattern[BITMAP_SIZE_C];
+	unsigned int size;
+};
+
+
+struct test_command {
+	enum test_commands command;
+	union {
+		struct test_allocator allocator;
+		struct test_action action;
+	};
+};
+
+
+/*
+ * Wrapping an array literal in ARR() hides its commas from the
+ * preprocessor, so it can be passed to a macro as a single argument.
+ */
+#define ARR(...) __VA_ARGS__
+
+#define SET_DATA(parameter, value)	\
+	.parameter = {			\
+		.parameter = value,	\
+	}
+
+#define SET_ALLOCATOR(alloc, parameter, value)		\
+{							\
+	.command = CMD_ALLOCATOR,			\
+	.allocator = {					\
+		.algo = (alloc),			\
+		.data = {				\
+			SET_DATA(parameter, value),	\
+		},					\
+	}						\
+}
+
+#define ACTION_MEM(act, mem_size, mem_loc, match)	\
+{							\
+	.command = act,					\
+	.action = {					\
+		.size = (mem_size),			\
+		.location = (mem_loc),			\
+		.pattern = match,			\
+	},						\
+}
+
+#define ALLOCATE_MEM(mem_size, mem_loc, match)	\
+	ACTION_MEM(CMD_ALLOCATE, mem_size, mem_loc, ARR(match))
+
+#define FREE_MEM(mem_size, mem_loc, match)	\
+	ACTION_MEM(CMD_FREE, mem_size, mem_loc, ARR(match))
+
+#define FLUSH_MEM()		\
+{				\
+	.command = CMD_FLUSH,	\
+}
+
+#define END()			\
+{				\
+	.command = CMD_END,	\
+}
+
+static inline int compare_bitmaps(const struct gen_pool *pool,
+				   const char *reference)
+{
+	struct gen_pool_chunk *chunk;
+	char *bitmap;
+	unsigned int i;
+
+	chunk = container_of(pool->chunks.next, struct gen_pool_chunk,
+			     next_chunk);
+	bitmap = (char *)chunk->entries;
+
+	for (i = 0; i < BITMAP_SIZE_C; i++)
+		if (bitmap[i] != reference[i])
+			return -1;
+	return 0;
+}
+
+static void callback_set_allocator(struct gen_pool *pool,
+				   const struct test_command *cmd,
+				   unsigned long *locations)
+{
+	gen_pool_set_algo(pool, cmd->allocator.algo,
+			  (void *)&cmd->allocator.data);
+}
+
+static void callback_allocate(struct gen_pool *pool,
+			      const struct test_command *cmd,
+			      unsigned long *locations)
+{
+	const struct test_action *action = &cmd->action;
+
+	locations[action->location] = gen_pool_alloc(pool, action->size);
+	BUG_ON(!locations[action->location]);
+	print_first_chunk_bitmap(pool);
+	BUG_ON(compare_bitmaps(pool, action->pattern));
+}
+
+static void callback_flush(struct gen_pool *pool,
+			  const struct test_command *cmd,
+			  unsigned long *locations)
+{
+	unsigned int i;
+
+	for (i = 0; i < ENTRIES; i++)
+		if (locations[i]) {
+			gen_pool_free(pool, locations[i], 0);
+			locations[i] = 0;
+		}
+}
+
+static void callback_free(struct gen_pool *pool,
+			  const struct test_command *cmd,
+			  unsigned long *locations)
+{
+	const struct test_action *action = &cmd->action;
+
+	gen_pool_free(pool, locations[action->location], 0);
+	locations[action->location] = 0;
+	print_first_chunk_bitmap(pool);
+	BUG_ON(compare_bitmaps(pool, action->pattern));
+}
+
+static void (* const callbacks[CMD_NUMBER])(struct gen_pool *,
+					    const struct test_command *,
+					    unsigned long *) = {
+	[CMD_ALLOCATOR] = callback_set_allocator,
+	[CMD_ALLOCATE] = callback_allocate,
+	[CMD_FREE] = callback_free,
+	[CMD_FLUSH] = callback_flush,
+};
+
+static const struct test_command test_first_fit[] = {
+	SET_ALLOCATOR(gen_pool_first_fit, null, NULL),
+	ALLOCATE_MEM(3, 0, ARR({0x2b})),
+	ALLOCATE_MEM(2, 1, ARR({0xeb, 0x02})),
+	ALLOCATE_MEM(5, 2, ARR({0xeb, 0xae, 0x0a})),
+	FREE_MEM(2, 1,  ARR({0x2b, 0xac, 0x0a})),
+	ALLOCATE_MEM(1, 1, ARR({0xeb, 0xac, 0x0a})),
+	FREE_MEM(0, 2,  ARR({0xeb})),
+	FREE_MEM(0, 0,  ARR({0xc0})),
+	FREE_MEM(0, 1,	ARR({0x00})),
+	END(),
+};
+
+/*
+ * To make the test work for both 32bit and 64bit ulong sizes,
+ * allocate (8 / 2 * 4 - 1) = 15 bytes, then 16, then 2.
+ * The first allocation prepares for the crossing of the 32bit ulong
+ * threshold. The second crosses the 32bit threshold and prepares for
+ * crossing the 64bit threshold. The last is large enough (2 bytes) to
+ * cross the 64bit threshold.
+ * Then free the allocations in the order: 2nd, 1st, 3rd.
+ */
+static const struct test_command test_ulong_span[] = {
+	SET_ALLOCATOR(gen_pool_first_fit, null, NULL),
+	ALLOCATE_MEM(15, 0, ARR({0xab, 0xaa, 0xaa, 0x2a})),
+	ALLOCATE_MEM(16, 1, ARR({0xab, 0xaa, 0xaa, 0xea,
+				0xaa, 0xaa, 0xaa, 0x2a})),
+	ALLOCATE_MEM(2, 2, ARR({0xab, 0xaa, 0xaa, 0xea,
+			       0xaa, 0xaa, 0xaa, 0xea,
+			       0x02})),
+	FREE_MEM(0, 1, ARR({0xab, 0xaa, 0xaa, 0x2a,
+			   0x00, 0x00, 0x00, 0xc0,
+			   0x02})),
+	FREE_MEM(0, 0, ARR({0x00, 0x00, 0x00, 0x00,
+			   0x00, 0x00, 0x00, 0xc0,
+			   0x02})),
+	FREE_MEM(0, 2, ARR({0x00})),
+	END(),
+};
+
+/*
+ * Create progressively smaller allocations A, B, C, D, E,
+ * then free B and D.
+ * Then create a new allocation that would fit in both of the gaps left by
+ * B and D. Verify that it uses the gap from B.
+ */
+static const struct test_command test_first_fit_gaps[] = {
+	SET_ALLOCATOR(gen_pool_first_fit, null, NULL),
+	ALLOCATE_MEM(10, 0, ARR({0xab, 0xaa, 0x0a})),
+	ALLOCATE_MEM(8, 1, ARR({0xab, 0xaa, 0xba, 0xaa,
+			       0x0a})),
+	ALLOCATE_MEM(6, 2, ARR({0xab, 0xaa, 0xba, 0xaa,
+			       0xba, 0xaa})),
+	ALLOCATE_MEM(4, 3, ARR({0xab, 0xaa, 0xba, 0xaa,
+			       0xba, 0xaa, 0xab})),
+	ALLOCATE_MEM(2, 4, ARR({0xab, 0xaa, 0xba, 0xaa,
+			       0xba, 0xaa, 0xab, 0x0b})),
+	FREE_MEM(0, 1, ARR({0xab, 0xaa, 0x0a, 0x00,
+			   0xb0, 0xaa, 0xab, 0x0b})),
+	FREE_MEM(0, 3, ARR({0xab, 0xaa, 0x0a, 0x00,
+			   0xb0, 0xaa, 0x00, 0x0b})),
+	ALLOCATE_MEM(3, 3, ARR({0xab, 0xaa, 0xba, 0x02,
+			       0xb0, 0xaa, 0x00, 0x0b})),
+	FLUSH_MEM(),
+	END(),
+};
+
+/* Test first fit align */
+static const struct test_command test_first_fit_align[] = {
+	SET_ALLOCATOR(gen_pool_first_fit_align, align, 4),
+	ALLOCATE_MEM(5, 0, ARR({0xab, 0x02})),
+	ALLOCATE_MEM(3, 1, ARR({0xab, 0x02, 0x2b})),
+	ALLOCATE_MEM(2, 2, ARR({0xab, 0x02, 0x2b, 0x0b})),
+	ALLOCATE_MEM(1, 3, ARR({0xab, 0x02, 0x2b, 0x0b, 0x03})),
+	FREE_MEM(0, 0, ARR({0x00, 0x00, 0x2b, 0x0b, 0x03})),
+	FREE_MEM(0, 2, ARR({0x00, 0x00, 0x2b, 0x00, 0x03})),
+	ALLOCATE_MEM(2, 0, ARR({0x0b, 0x00, 0x2b, 0x00, 0x03})),
+	FLUSH_MEM(),
+	END(),
+};
+
+
+/* Test fixed alloc */
+static const struct test_command test_fixed_data[] = {
+	SET_ALLOCATOR(gen_pool_fixed_alloc, offset, 1),
+	ALLOCATE_MEM(5, 0, ARR({0xac, 0x0a})),
+	SET_ALLOCATOR(gen_pool_fixed_alloc, offset, 8),
+	ALLOCATE_MEM(3, 1, ARR({0xac, 0x0a, 0x2b})),
+	SET_ALLOCATOR(gen_pool_fixed_alloc, offset, 6),
+	ALLOCATE_MEM(2, 2, ARR({0xac, 0xba, 0x2b})),
+	SET_ALLOCATOR(gen_pool_fixed_alloc, offset, 30),
+	ALLOCATE_MEM(40, 3, ARR({0xac, 0xba, 0x2b, 0x00,
+				0x00, 0x00, 0x00, 0xb0,
+				0xaa, 0xaa, 0xaa, 0xaa,
+				0xaa, 0xaa, 0xaa, 0xaa})),
+	FLUSH_MEM(),
+	END(),
+};
+
+
+/* Test first fit order align */
+static const struct test_command test_first_fit_order_align[] = {
+	SET_ALLOCATOR(gen_pool_first_fit_order_align, null, NULL),
+	ALLOCATE_MEM(5, 0, ARR({0xab, 0x02})),
+	ALLOCATE_MEM(3, 1, ARR({0xab, 0x02, 0x2b})),
+	ALLOCATE_MEM(2, 2, ARR({0xab, 0xb2, 0x2b})),
+	ALLOCATE_MEM(1, 3, ARR({0xab, 0xbe, 0x2b})),
+	ALLOCATE_MEM(1, 4, ARR({0xab, 0xbe, 0xeb})),
+	ALLOCATE_MEM(2, 5, ARR({0xab, 0xbe, 0xeb, 0x0b})),
+	FLUSH_MEM(),
+	END(),
+};
+
+
+/* Test best fit */
+static const struct test_command test_best_fit[] = {
+	SET_ALLOCATOR(gen_pool_best_fit, null, NULL),
+	ALLOCATE_MEM(5, 0, ARR({0xab, 0x02})),
+	ALLOCATE_MEM(3, 1, ARR({0xab, 0xae})),
+	ALLOCATE_MEM(3, 2, ARR({0xab, 0xae, 0x2b})),
+	ALLOCATE_MEM(1, 3, ARR({0xab, 0xae, 0xeb})),
+	FREE_MEM(0, 0, ARR({0x00, 0xac, 0xeb})),
+	FREE_MEM(0, 2, ARR({0x00, 0xac, 0xc0})),
+	ALLOCATE_MEM(2, 0, ARR({0x00, 0xac, 0xcb})),
+	FLUSH_MEM(),
+	END(),
+};
+
+
+enum test_cases_indexes {
+	TEST_CASE_FIRST_FIT,
+	TEST_CASE_ULONG_SPAN,
+	TEST_CASE_FIRST_FIT_GAPS,
+	TEST_CASE_FIRST_FIT_ALIGN,
+	TEST_CASE_FIXED_DATA,
+	TEST_CASE_FIRST_FIT_ORDER_ALIGN,
+	TEST_CASE_BEST_FIT,
+	TEST_CASES_NUM,
+};
+
+static const struct test_command *test_cases[TEST_CASES_NUM] = {
+	[TEST_CASE_FIRST_FIT] = test_first_fit,
+	[TEST_CASE_ULONG_SPAN] = test_ulong_span,
+	[TEST_CASE_FIRST_FIT_GAPS] = test_first_fit_gaps,
+	[TEST_CASE_FIRST_FIT_ALIGN] = test_first_fit_align,
+	[TEST_CASE_FIXED_DATA] = test_fixed_data,
+	[TEST_CASE_FIRST_FIT_ORDER_ALIGN] = test_first_fit_order_align,
+	[TEST_CASE_BEST_FIT] = test_best_fit,
+};
+
+
+void genalloc_selftest(void)
+{
+	static struct gen_pool *pool;
+	unsigned long locations[ENTRIES];
+	char chunk[CHUNK_SIZE];
+	int retval;
+	unsigned int i;
+	const struct test_command *cmd;
+
+	pool = gen_pool_create(ALLOC_ORDER, -1);
+	if (unlikely(!pool)) {
+		pr_err("genalloc-selftest: no memory for pool.\n");
+		return;
+	}
+
+	retval = gen_pool_add_virt(pool, (unsigned long)chunk, 0,
+				   CHUNK_SIZE, -1);
+	if (unlikely(retval)) {
+		pr_err("genalloc-selftest: could not register chunk.\n");
+		goto destroy_pool;
+	}
+
+	memset(locations, 0, ENTRIES * sizeof(unsigned long));
+	for (i = 0; i < TEST_CASES_NUM; i++)
+		for (cmd = test_cases[i]; cmd->command < CMD_END; cmd++)
+			callbacks[cmd->command](pool, cmd, locations);
+	pr_notice("genalloc-selftest: successfully executed %d tests\n",
+		  TEST_CASES_NUM);
+
+destroy_pool:
+	gen_pool_destroy(pool);
+}
-- 
2.14.1

* [PATCH 3/6] struct page: add field for vm_struct
  2018-02-11  3:19 ` Igor Stoppa
@ 2018-02-11  3:19   ` Igor Stoppa
  -1 siblings, 0 replies; 146+ messages in thread
From: Igor Stoppa @ 2018-02-11  3:19 UTC (permalink / raw)
  To: willy, rdunlap, corbet, keescook, mhocko, labbott, jglisse, hch
  Cc: cl, linux-security-module, linux-mm, linux-kernel,
	kernel-hardening, Igor Stoppa

When a page is used for virtual memory, it is often necessary to obtain
a handle to the corresponding vm_struct, which describes the virtually
contiguous area created when vmalloc is invoked.

The struct page has a "mapping" field which can be reused to store a
pointer to the parent area. This avoids more expensive searches.

As an example, the function find_vm_area() is reimplemented to take
advantage of the newly introduced field.

Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com>
---
 include/linux/mm_types.h |  1 +
 mm/vmalloc.c             | 18 +++++++++++++-----
 2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index fd1af6b9591d..c3a4825e10c0 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -84,6 +84,7 @@ struct page {
 		void *s_mem;			/* slab first object */
 		atomic_t compound_mapcount;	/* first tail page */
 		/* page_deferred_list().next	 -- second tail page */
+		struct vm_struct *area;
 	};
 
 	/* Second double word */
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 673942094328..9404ffd0ee98 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1466,13 +1466,16 @@ struct vm_struct *get_vm_area_caller(unsigned long size, unsigned long flags,
  */
 struct vm_struct *find_vm_area(const void *addr)
 {
-	struct vmap_area *va;
+	struct page *page;
 
-	va = find_vmap_area((unsigned long)addr);
-	if (va && va->flags & VM_VM_AREA)
-		return va->vm;
+	if (unlikely(!is_vmalloc_addr(addr)))
+		return NULL;
 
-	return NULL;
+	page = vmalloc_to_page(addr);
+	if (unlikely(!page))
+		return NULL;
+
+	return page->area;
 }
 
 /**
@@ -1536,6 +1539,7 @@ static void __vunmap(const void *addr, int deallocate_pages)
 			struct page *page = area->pages[i];
 
 			BUG_ON(!page);
+			page->area = NULL;
 			__free_pages(page, 0);
 		}
 
@@ -1744,6 +1748,7 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
 			const void *caller)
 {
 	struct vm_struct *area;
+	unsigned int i;
 	void *addr;
 	unsigned long real_size = size;
 
@@ -1769,6 +1774,9 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
 
 	kmemleak_vmalloc(area, size, gfp_mask);
 
+	for (i = 0; i < area->nr_pages; i++)
+		area->pages[i]->area = area;
+
 	return addr;
 
 fail:
-- 
2.14.1

* [PATCH 4/6] Protectable Memory
  2018-02-11  3:19 ` Igor Stoppa
@ 2018-02-11  3:19   ` Igor Stoppa
  -1 siblings, 0 replies; 146+ messages in thread
From: Igor Stoppa @ 2018-02-11  3:19 UTC (permalink / raw)
  To: willy, rdunlap, corbet, keescook, mhocko, labbott, jglisse, hch
  Cc: cl, linux-security-module, linux-mm, linux-kernel,
	kernel-hardening, Igor Stoppa

The MMU available in many systems running Linux can often provide R/O
protection to the memory pages it handles.

However, MMU-based protection works efficiently only when said pages
contain exclusively data that will not need further modifications.

Statically allocated variables can be segregated into a dedicated
section, but this approach does not fit dynamically allocated ones.

Currently, dynamic allocation provides no means of grouping variables
into memory pages that contain exclusively data suitable for conversion
to read-only access mode.

The allocator provided here (pmalloc, the protectable memory allocator)
introduces the concept of pools of protectable memory.

A module can request a pool and then refer any allocation request to the
pool handle it has received.

Once all the chunks of memory associated with a specific pool are
initialized, the pool can be protected.

After this point, the pool can only be destroyed (it is up to the module
to avoid any further references to the memory from the pool after
the destruction is invoked).

Destruction is mainly meant for releasing memory when a module is
unloaded.

A module can have as many pools as needed, for example to support the
protection of data that is initialized in sufficiently distinct phases.

Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com>
---
 include/linux/genalloc.h |   3 +
 include/linux/pmalloc.h  | 222 +++++++++++++++++++++
 include/linux/vmalloc.h  |   1 +
 lib/genalloc.c           |  27 +++
 mm/Kconfig               |   6 +
 mm/Makefile              |   1 +
 mm/pmalloc.c             | 497 +++++++++++++++++++++++++++++++++++++++++++++++
 mm/usercopy.c            |  33 ++++
 8 files changed, 790 insertions(+)
 create mode 100644 include/linux/pmalloc.h
 create mode 100644 mm/pmalloc.c

diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
index dcaa33e74b1c..b6c4cea9fbd8 100644
--- a/include/linux/genalloc.h
+++ b/include/linux/genalloc.h
@@ -121,6 +121,9 @@ extern unsigned long gen_pool_alloc_algo(struct gen_pool *, size_t,
 extern void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size,
 		dma_addr_t *dma);
 extern void gen_pool_free(struct gen_pool *, unsigned long, size_t);
+
+extern void gen_pool_flush_chunk(struct gen_pool *pool,
+				 struct gen_pool_chunk *chunk);
 extern void gen_pool_for_each_chunk(struct gen_pool *,
 	void (*)(struct gen_pool *, struct gen_pool_chunk *, void *), void *);
 extern size_t gen_pool_avail(struct gen_pool *);
diff --git a/include/linux/pmalloc.h b/include/linux/pmalloc.h
new file mode 100644
index 000000000000..624379a937c5
--- /dev/null
+++ b/include/linux/pmalloc.h
@@ -0,0 +1,222 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * pmalloc.h: Header for Protectable Memory Allocator
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.stoppa@huawei.com>
+ */
+
+#ifndef _LINUX_PMALLOC_H
+#define _LINUX_PMALLOC_H
+
+
+#include <linux/genalloc.h>
+#include <linux/string.h>
+
+#define PMALLOC_DEFAULT_ALLOC_ORDER (-1)
+
+/*
+ * Library for dynamic allocation of pools of memory that can be,
+ * after initialization, marked as read-only.
+ *
+ * This is intended to complement __read_only_after_init, for those cases
+ * where either it is not possible to know the initialization value before
+ * init is completed, or the amount of data is variable and can be
+ * determined only at run-time.
+ *
+ * ***WARNING***
+ * The user of the API is expected to synchronize:
+ * 1) allocation,
+ * 2) writes to the allocated memory,
+ * 3) write protection of the pool,
+ * 4) freeing of the allocated memory, and
+ * 5) destruction of the pool.
+ *
+ * For a non-threaded scenario, this type of locking is not even required.
+ *
+ * Even if the library were to provide support for locking, point 2)
+ * would still depend on the user taking the lock.
+ */
+
+
+/**
+ * pmalloc_create_pool - create a new protectable memory pool
+ * @name: the name of the pool, enforced to be unique
+ * @min_alloc_order: log2 of the minimum allocation size obtainable
+ *                   from the pool
+ *
+ * Creates a new (empty) memory pool for allocation of protectable
+ * memory. Memory will be allocated upon request (through pmalloc).
+ *
+ * Return: pointer to the new pool upon success, otherwise NULL.
+ */
+struct gen_pool *pmalloc_create_pool(const char *name,
+					 int min_alloc_order);
+
+/**
+ * is_pmalloc_object - validates the existence of an alleged object
+ * @ptr: address of the object
+ * @n: size of the object, in bytes
+ *
+ * Return: 0 if the object does not belong to pmalloc, 1 if it belongs to
+ * pmalloc, -1 if it overlaps pmalloc memory only partially (invalid).
+ */
+int is_pmalloc_object(const void *ptr, const unsigned long n);
+
+/**
+ * pmalloc_prealloc - tries to allocate a memory chunk of the requested size
+ * @pool: handle to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ *
+ * Prepares a chunk of the requested size.
+ * This is intended to both minimize latency in later memory requests and
+ * avoid sleeping during allocation.
+ * Memory allocated with prealloc is stored in one single chunk, unlike
+ * memory allocated on demand, when pmalloc runs out of the free space
+ * already existing in the pool and has to invoke vmalloc.
+ * One additional advantage of pre-allocating larger chunks of memory is
+ * that the total slack tends to be smaller.
+ *
+ * Return: true if the vmalloc call was successful, false otherwise.
+ */
+bool pmalloc_prealloc(struct gen_pool *pool, size_t size);
+
+/**
+ * pmalloc - allocate protectable memory from a pool
+ * @pool: handle to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ * @gfp: flags for page allocation
+ *
+ * Allocates memory from an unprotected pool. If the pool doesn't have
+ * enough memory, and the request did not include GFP_ATOMIC, an attempt
+ * is made to add a new chunk of memory to the pool
+ * (a multiple of PAGE_SIZE), in order to fit the new request.
+ * Otherwise, NULL is returned.
+ *
+ * Return: pointer to the memory requested upon success, NULL otherwise
+ * (either no memory available or pool already read-only).
+ */
+void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp);
+
+
+/**
+ * pzalloc - zero-initialized version of pmalloc
+ * @pool: handle to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ * @gfp: flags for page allocation
+ *
+ * Executes pmalloc, initializing the memory requested to 0,
+ * before returning the pointer to it.
+ *
+ * Return: pointer to the zeroed memory requested, upon success, NULL
+ * otherwise (either no memory available or pool already read-only).
+ */
+static inline void *pzalloc(struct gen_pool *pool, size_t size, gfp_t gfp)
+{
+	return pmalloc(pool, size, gfp | __GFP_ZERO);
+}
+
+/**
+ * pmalloc_array - allocates an array according to the parameters
+ * @pool: handle to the pool to be used for memory allocation
+ * @n: number of elements in the array
+ * @size: amount of memory (in bytes) requested for each element
+ * @flags: flags for page allocation
+ *
+ * Executes pmalloc, unless the parameters are invalid or n * size overflows.
+ *
+ * Return: either NULL or the pmalloc result.
+ */
+static inline void *pmalloc_array(struct gen_pool *pool, size_t n,
+				  size_t size, gfp_t flags)
+{
+	if (unlikely(!(pool && n && size) || n > (size_t)-1 / size))
+		return NULL;
+	return pmalloc(pool, n * size, flags);
+}
+
+/**
+ * pcalloc - allocates a 0-initialized array according to the parameters
+ * @pool: handle to the pool to be used for memory allocation
+ * @n: number of elements in the array
+ * @size: amount of memory (in bytes) requested for each element
+ * @flags: flags for page allocation
+ *
+ * Executes pmalloc_array, if it has a chance to succeed.
+ *
+ * Return: either NULL or the pmalloc result.
+ */
+static inline void *pcalloc(struct gen_pool *pool, size_t n,
+			    size_t size, gfp_t flags)
+{
+	return pmalloc_array(pool, n, size, flags | __GFP_ZERO);
+}
+
+/**
+ * pstrdup - duplicate a string, using pmalloc as allocator
+ * @pool: handle to the pool to be used for memory allocation
+ * @s: string to duplicate
+ * @gfp: flags for page allocation
+ *
+ * Generates a copy of the given string, allocating sufficient memory
+ * from the given pmalloc pool.
+ *
+ * Return: pointer to the replica, NULL in case of error.
+ */
+static inline char *pstrdup(struct gen_pool *pool, const char *s, gfp_t gfp)
+{
+	size_t len;
+	char *buf;
+
+	if (unlikely(pool == NULL || s == NULL))
+		return NULL;
+
+	len = strlen(s) + 1;
+	buf = pmalloc(pool, len, gfp);
+	if (likely(buf))
+		strncpy(buf, s, len);
+	return buf;
+}
+
+/**
+ * pmalloc_protect_pool - turn a read/write pool read-only
+ * @pool: the pool to protect
+ *
+ * Write-protects all the memory chunks assigned to the pool.
+ * This prevents any further allocation.
+ *
+ * Return: 0 upon success, -EINVAL in abnormal cases.
+ */
+int pmalloc_protect_pool(struct gen_pool *pool);
+
+/**
+ * pfree - mark previously allocated memory as unused
+ * @pool: handle to the pool to be used for memory allocation
+ * @addr: the beginning of the memory area to be freed
+ *
+ * The behavior of pfree is different, depending on the state of the
+ * protection.
+ * If the pool is not yet protected, the memory is marked as unused and
+ * will be available for further allocations.
+ * If the pool is already protected, the memory is marked as unused, but
+ * it will still be impossible to perform further allocation, because of
+ * the existing protection.
+ * The freed memory, in this case, will be truly released only when the
+ * pool is destroyed.
+ */
+static inline void pfree(struct gen_pool *pool, const void *addr)
+{
+	gen_pool_free(pool, (unsigned long)addr, 0);
+}
+
+/**
+ * pmalloc_destroy_pool - destroys a pool and all the associated memory
+ * @pool: the pool to destroy
+ *
+ * All the memory that was allocated through pmalloc in the pool will be freed.
+ *
+ * Return: 0 upon success, -EINVAL in abnormal cases.
+ */
+int pmalloc_destroy_pool(struct gen_pool *pool);
+
+#endif
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 1e5d8c392f15..116d280cca53 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -20,6 +20,7 @@ struct notifier_block;		/* in notifier.h */
 #define VM_UNINITIALIZED	0x00000020	/* vm_struct is not fully initialized */
 #define VM_NO_GUARD		0x00000040      /* don't add guard page */
 #define VM_KASAN		0x00000080      /* has allocated kasan shadow memory */
+#define VM_PMALLOC		0x00000100	/* pmalloc area - see docs */
 /* bits [20..32] reserved for arch specific ioremap internals */
 
 /*
diff --git a/lib/genalloc.c b/lib/genalloc.c
index 044347163acb..e40a5db89439 100644
--- a/lib/genalloc.c
+++ b/lib/genalloc.c
@@ -598,6 +598,33 @@ void gen_pool_free(struct gen_pool *pool, unsigned long addr, size_t size)
 }
 EXPORT_SYMBOL(gen_pool_free);
 
+
+/**
+ * gen_pool_flush_chunk - drops all the allocations from a specific chunk
+ * @pool:	the generic memory pool
+ * @chunk:	The chunk to wipe clear.
+ *
+ * This is meant to be called only while destroying a pool. It's up to the
+ * caller to avoid races, but really, at this point the pool should have
+ * already been retired and have become unavailable for any other sort of
+ * operation.
+ */
+void gen_pool_flush_chunk(struct gen_pool *pool,
+			  struct gen_pool_chunk *chunk)
+{
+	size_t size;
+
+	if (unlikely(!(pool && chunk)))
+		return;
+
+	size = chunk->end_addr + 1 - chunk->start_addr;
+	memset(chunk->entries, 0,
+	       DIV_ROUND_UP((size >> pool->min_alloc_order) * BITS_PER_ENTRY,
+			    BITS_PER_BYTE));
+	atomic_long_set(&chunk->avail, size);
+}
+
+
 /**
  * gen_pool_for_each_chunk - call func for every chunk of generic memory pool
  * @pool:	the generic memory pool
diff --git a/mm/Kconfig b/mm/Kconfig
index c782e8fb7235..be578fbdce6d 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -760,3 +760,9 @@ config GUP_BENCHMARK
 	  performance of get_user_pages_fast().
 
 	  See tools/testing/selftests/vm/gup_benchmark.c
+
+config PROTECTABLE_MEMORY
+    bool
+    depends on ARCH_HAS_SET_MEMORY
+    select GENERIC_ALLOCATOR
+    default y
diff --git a/mm/Makefile b/mm/Makefile
index e669f02c5a54..959fdbdac118 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -65,6 +65,7 @@ obj-$(CONFIG_SPARSEMEM)	+= sparse.o
 obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o
 obj-$(CONFIG_SLOB) += slob.o
 obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o
+obj-$(CONFIG_PROTECTABLE_MEMORY) += pmalloc.o
 obj-$(CONFIG_KSM) += ksm.o
 obj-$(CONFIG_PAGE_POISONING) += page_poison.o
 obj-$(CONFIG_SLAB) += slab.o
diff --git a/mm/pmalloc.c b/mm/pmalloc.c
new file mode 100644
index 000000000000..e94bfb407c92
--- /dev/null
+++ b/mm/pmalloc.c
@@ -0,0 +1,497 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * pmalloc.c: Protectable Memory Allocator
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.stoppa@huawei.com>
+ */
+
+#include <linux/printk.h>
+#include <linux/init.h>
+#include <linux/mm.h>
+#include <linux/vmalloc.h>
+#include <linux/genalloc.h>
+#include <linux/kernel.h>
+#include <linux/log2.h>
+#include <linux/slab.h>
+#include <linux/device.h>
+#include <linux/atomic.h>
+#include <linux/rculist.h>
+#include <linux/set_memory.h>
+#include <asm/cacheflush.h>
+#include <asm/page.h>
+
+#include <linux/pmalloc.h>
+/*
+ * pmalloc_data contains the data specific to a pmalloc pool,
+ * in a format compatible with the design of gen_alloc.
+ * Some of the fields are used for exposing the corresponding parameter
+ * to userspace, through sysfs.
+ */
+struct pmalloc_data {
+	struct gen_pool *pool;  /* Link back to the associated pool. */
+	bool protected;     /* Status of the pool: RO or RW. */
+	struct kobj_attribute attr_protected; /* Sysfs attribute. */
+	struct kobj_attribute attr_avail;     /* Sysfs attribute. */
+	struct kobj_attribute attr_size;      /* Sysfs attribute. */
+	struct kobj_attribute attr_chunks;    /* Sysfs attribute. */
+	struct kobject *pool_kobject;
+	struct list_head node; /* list of pools */
+};
+
+static LIST_HEAD(pmalloc_final_list);
+static LIST_HEAD(pmalloc_tmp_list);
+static struct list_head *pmalloc_list = &pmalloc_tmp_list;
+static DEFINE_MUTEX(pmalloc_mutex);
+static struct kobject *pmalloc_kobject;
+
+static ssize_t pmalloc_pool_show_protected(struct kobject *dev,
+					   struct kobj_attribute *attr,
+					   char *buf)
+{
+	struct pmalloc_data *data;
+
+	data = container_of(attr, struct pmalloc_data, attr_protected);
+	if (data->protected)
+		return sprintf(buf, "protected\n");
+	else
+		return sprintf(buf, "unprotected\n");
+}
+
+static ssize_t pmalloc_pool_show_avail(struct kobject *dev,
+				       struct kobj_attribute *attr,
+				       char *buf)
+{
+	struct pmalloc_data *data;
+
+	data = container_of(attr, struct pmalloc_data, attr_avail);
+	return sprintf(buf, "%lu\n",
+		       (unsigned long)gen_pool_avail(data->pool));
+}
+
+static ssize_t pmalloc_pool_show_size(struct kobject *dev,
+				      struct kobj_attribute *attr,
+				      char *buf)
+{
+	struct pmalloc_data *data;
+
+	data = container_of(attr, struct pmalloc_data, attr_size);
+	return sprintf(buf, "%lu\n",
+		       (unsigned long)gen_pool_size(data->pool));
+}
+
+static void pool_chunk_number(struct gen_pool *pool,
+			      struct gen_pool_chunk *chunk, void *data)
+{
+	unsigned long *counter = data;
+
+	(*counter)++;
+}
+
+static ssize_t pmalloc_pool_show_chunks(struct kobject *dev,
+					struct kobj_attribute *attr,
+					char *buf)
+{
+	struct pmalloc_data *data;
+	unsigned long chunks_num = 0;
+
+	data = container_of(attr, struct pmalloc_data, attr_chunks);
+	gen_pool_for_each_chunk(data->pool, pool_chunk_number, &chunks_num);
+	return sprintf(buf, "%lu\n", chunks_num);
+}
+
+/* Exposes the pool and its attributes through sysfs. */
+static struct kobject *pmalloc_connect(struct pmalloc_data *data)
+{
+	const struct attribute *attrs[] = {
+		&data->attr_protected.attr,
+		&data->attr_avail.attr,
+		&data->attr_size.attr,
+		&data->attr_chunks.attr,
+		NULL
+	};
+	struct kobject *kobj;
+
+	kobj = kobject_create_and_add(data->pool->name, pmalloc_kobject);
+	if (unlikely(!kobj))
+		return NULL;
+
+	if (unlikely(sysfs_create_files(kobj, attrs) < 0)) {
+		kobject_put(kobj);
+		kobj = NULL;
+	}
+	return kobj;
+}
+
+/* Removes the pool and its attributes from sysfs. */
+static void pmalloc_disconnect(struct pmalloc_data *data,
+			       struct kobject *kobj)
+{
+	const struct attribute *attrs[] = {
+		&data->attr_protected.attr,
+		&data->attr_avail.attr,
+		&data->attr_size.attr,
+		&data->attr_chunks.attr,
+		NULL
+	};
+
+	sysfs_remove_files(kobj, attrs);
+	kobject_put(kobj);
+}
+
+/* Declares an attribute of the pool. */
+#define pmalloc_attr_init(data, attr_name) \
+do { \
+	sysfs_attr_init(&data->attr_##attr_name.attr); \
+	data->attr_##attr_name.attr.name = #attr_name; \
+	data->attr_##attr_name.attr.mode = VERIFY_OCTAL_PERMISSIONS(0400); \
+	data->attr_##attr_name.show = pmalloc_pool_show_##attr_name; \
+} while (0)
+
+struct gen_pool *pmalloc_create_pool(const char *name, int min_alloc_order)
+{
+	struct gen_pool *pool;
+	const char *pool_name;
+	struct pmalloc_data *data;
+
+	if (!name) {
+		WARN_ON(1);
+		return NULL;
+	}
+
+	if (min_alloc_order < 0)
+		min_alloc_order = ilog2(sizeof(unsigned long));
+
+	pool = gen_pool_create(min_alloc_order, NUMA_NO_NODE);
+	if (unlikely(!pool))
+		return NULL;
+
+	mutex_lock(&pmalloc_mutex);
+	list_for_each_entry(data, pmalloc_list, node)
+		if (!strcmp(name, data->pool->name))
+			goto same_name_err;
+
+	pool_name = kstrdup(name, GFP_KERNEL);
+	if (unlikely(!pool_name))
+		goto name_alloc_err;
+
+	data = kzalloc(sizeof(struct pmalloc_data), GFP_KERNEL);
+	if (unlikely(!data))
+		goto data_alloc_err;
+
+	data->protected = false;
+	data->pool = pool;
+	pmalloc_attr_init(data, protected);
+	pmalloc_attr_init(data, avail);
+	pmalloc_attr_init(data, size);
+	pmalloc_attr_init(data, chunks);
+	pool->data = data;
+	pool->name = pool_name;
+
+	list_add(&data->node, pmalloc_list);
+	if (pmalloc_list == &pmalloc_final_list)
+		data->pool_kobject = pmalloc_connect(data);
+	mutex_unlock(&pmalloc_mutex);
+	return pool;
+
+data_alloc_err:
+	kfree(pool_name);
+name_alloc_err:
+same_name_err:
+	mutex_unlock(&pmalloc_mutex);
+	gen_pool_destroy(pool);
+	return NULL;
+}
+
+static inline int check_alloc_params(struct gen_pool *pool, size_t req_size)
+{
+	struct pmalloc_data *data;
+
+	if (unlikely(!req_size || !pool))
+		return -1;
+
+	data = pool->data;
+
+	if (data == NULL)
+		return -1;
+
+	if (unlikely(data->protected)) {
+		WARN_ON(1);
+		return -1;
+	}
+	return 0;
+}
+
+
+static inline bool chunk_tagging(void *chunk, bool tag)
+{
+	struct vm_struct *area;
+	struct page *page;
+
+	if (!is_vmalloc_addr(chunk))
+		return false;
+
+	page = vmalloc_to_page(chunk);
+	if (unlikely(!page))
+		return false;
+
+	area = page->area;
+	if (tag)
+		area->flags |= VM_PMALLOC;
+	else
+		area->flags &= ~VM_PMALLOC;
+	return true;
+}
+
+
+static inline bool tag_chunk(void *chunk)
+{
+	return chunk_tagging(chunk, true);
+}
+
+
+static inline bool untag_chunk(void *chunk)
+{
+	return chunk_tagging(chunk, false);
+}
+
+enum {
+	INVALID_PMALLOC_OBJECT = -1,
+	NOT_PMALLOC_OBJECT = 0,
+	VALID_PMALLOC_OBJECT = 1,
+};
+
+int is_pmalloc_object(const void *ptr, const unsigned long n)
+{
+	struct vm_struct *area;
+	struct page *page;
+	unsigned long area_start;
+	unsigned long area_end;
+	unsigned long object_start;
+	unsigned long object_end;
+
+	/*
+	 * is_pmalloc_object() gets called pretty late, so chances are
+	 * high that the object is indeed of vmalloc type.
+	 */
+	if (unlikely(!is_vmalloc_addr(ptr)))
+		return NOT_PMALLOC_OBJECT;
+
+	page = vmalloc_to_page(ptr);
+	if (unlikely(!page))
+		return NOT_PMALLOC_OBJECT;
+
+	area = page->area;
+
+	if (likely(!(area->flags & VM_PMALLOC)))
+		return NOT_PMALLOC_OBJECT;
+
+	area_start = (unsigned long)area->addr;
+	area_end = area_start + area->nr_pages * PAGE_SIZE - 1;
+	object_start = (unsigned long)ptr;
+	object_end = object_start + n - 1;
+
+	if (likely((area_start <= object_start) &&
+		   (object_end <= area_end)))
+		return VALID_PMALLOC_OBJECT;
+	else
+		return INVALID_PMALLOC_OBJECT;
+}
+
+
+bool pmalloc_prealloc(struct gen_pool *pool, size_t size)
+{
+	void *chunk;
+	size_t chunk_size;
+	int add_error;
+
+	if (check_alloc_params(pool, size))
+		return false;
+
+	/* Expand pool */
+	chunk_size = roundup(size, PAGE_SIZE);
+	chunk = vmalloc(chunk_size);
+	if (unlikely(chunk == NULL))
+		return false;
+
+	/* Locking is already done inside gen_pool_add */
+	add_error = gen_pool_add(pool, (unsigned long)chunk, chunk_size,
+				 NUMA_NO_NODE);
+	if (unlikely(add_error != 0))
+		goto abort;
+
+	return true;
+abort:
+	vfree_atomic(chunk);
+	return false;
+
+}
+
+void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp)
+{
+	void *chunk;
+	size_t chunk_size;
+	int add_error;
+	unsigned long retval;
+
+	if (check_alloc_params(pool, size))
+		return NULL;
+
+retry_alloc_from_pool:
+	retval = gen_pool_alloc(pool, size);
+	if (retval)
+		goto return_allocation;
+
+	if (unlikely((gfp & __GFP_ATOMIC))) {
+		if (unlikely((gfp & __GFP_NOFAIL)))
+			goto retry_alloc_from_pool;
+		else
+			return NULL;
+	}
+
+	/* Expand pool */
+	chunk_size = roundup(size, PAGE_SIZE);
+	chunk = vmalloc(chunk_size);
+	if (unlikely(!chunk)) {
+		if (unlikely((gfp & __GFP_NOFAIL)))
+			goto retry_alloc_from_pool;
+		else
+			return NULL;
+	}
+	if (unlikely(!tag_chunk(chunk)))
+		goto free;
+
+	/* Locking is already done inside gen_pool_add */
+	add_error = gen_pool_add(pool, (unsigned long)chunk, chunk_size,
+				 NUMA_NO_NODE);
+	if (unlikely(add_error))
+		goto abort;
+
+	retval = gen_pool_alloc(pool, size);
+	if (retval) {
+return_allocation:
+		*(size_t *)retval = size;
+		if (gfp & __GFP_ZERO)
+			memset((void *)retval, 0, size);
+		return (void *)retval;
+	}
+	/*
+	 * No test for __GFP_NOFAIL here: with concurrent allocations, the
+	 * chunk added to the pool by one thread may be consumed by another
+	 * thread before the first one gets a chance to use it.
+	 * As long as vmalloc succeeds, it is OK to retry.
+	 */
+	goto retry_alloc_from_pool;
+abort:
+	untag_chunk(chunk);
+free:
+	vfree_atomic(chunk);
+	return NULL;
+}
+
+static void pmalloc_chunk_set_protection(struct gen_pool *pool,
+
+					 struct gen_pool_chunk *chunk,
+					 void *data)
+{
+	const bool *flag = data;
+	size_t chunk_size = chunk->end_addr + 1 - chunk->start_addr;
+	unsigned long pages = chunk_size / PAGE_SIZE;
+
+	BUG_ON(chunk_size & (PAGE_SIZE - 1));
+
+	if (*flag)
+		set_memory_ro(chunk->start_addr, pages);
+	else
+		set_memory_rw(chunk->start_addr, pages);
+}
+
+static int pmalloc_pool_set_protection(struct gen_pool *pool, bool protection)
+{
+	struct pmalloc_data *data;
+	struct gen_pool_chunk *chunk;
+
+	if (unlikely(!pool))
+		return -EINVAL;
+
+	data = pool->data;
+
+	if (unlikely(!data))
+		return -EINVAL;
+
+	if (unlikely(data->protected == protection)) {
+		WARN_ON(protection); /* protecting twice is a caller bug */
+		return 0;
+	}
+
+	data->protected = protection;
+	list_for_each_entry(chunk, &(pool)->chunks, next_chunk)
+		pmalloc_chunk_set_protection(pool, chunk, &protection);
+	return 0;
+}
+
+int pmalloc_protect_pool(struct gen_pool *pool)
+{
+	return pmalloc_pool_set_protection(pool, true);
+}
+
+
+static void pmalloc_chunk_free(struct gen_pool *pool,
+			       struct gen_pool_chunk *chunk, void *data)
+{
+	untag_chunk((void *)chunk->start_addr);
+	gen_pool_flush_chunk(pool, chunk);
+	vfree_atomic((void *)chunk->start_addr);
+}
+
+
+int pmalloc_destroy_pool(struct gen_pool *pool)
+{
+	struct pmalloc_data *data;
+
+	if (unlikely(pool == NULL))
+		return -EINVAL;
+
+	data = pool->data;
+
+	if (unlikely(data == NULL))
+		return -EINVAL;
+
+	mutex_lock(&pmalloc_mutex);
+	list_del(&data->node);
+	mutex_unlock(&pmalloc_mutex);
+
+	if (likely(data->pool_kobject))
+		pmalloc_disconnect(data, data->pool_kobject);
+
+	pmalloc_pool_set_protection(pool, false);
+	gen_pool_for_each_chunk(pool, pmalloc_chunk_free, NULL);
+	gen_pool_destroy(pool);
+	kfree(data);
+	return 0;
+}
+
+/*
+ * When sysfs is ready to receive registrations, connect all the pools
+ * created so far, and let pools created from now on be connected
+ * right away.
+ */
+static int __init pmalloc_late_init(void)
+{
+	struct pmalloc_data *data, *n;
+
+	pmalloc_kobject = kobject_create_and_add("pmalloc", kernel_kobj);
+
+	mutex_lock(&pmalloc_mutex);
+	pmalloc_list = &pmalloc_final_list;
+
+	if (likely(pmalloc_kobject != NULL)) {
+		list_for_each_entry_safe(data, n, &pmalloc_tmp_list, node) {
+			list_move(&data->node, &pmalloc_final_list);
+			data->pool_kobject = pmalloc_connect(data);
+		}
+	}
+	mutex_unlock(&pmalloc_mutex);
+	return 0;
+}
+late_initcall(pmalloc_late_init);
diff --git a/mm/usercopy.c b/mm/usercopy.c
index e9e9325f7638..946ce051e296 100644
--- a/mm/usercopy.c
+++ b/mm/usercopy.c
@@ -240,6 +240,36 @@ static inline void check_heap_object(const void *ptr, unsigned long n,
 	}
 }
 
+#ifdef CONFIG_PROTECTABLE_MEMORY
+
+int is_pmalloc_object(const void *ptr, const unsigned long n);
+
+static void check_pmalloc_object(const void *ptr, unsigned long n,
+				 bool to_user)
+{
+	int retv;
+
+	retv = is_pmalloc_object(ptr, n);
+	if (unlikely(retv)) {
+		if (unlikely(!to_user))
+			usercopy_abort("pmalloc",
+				       "trying to write to pmalloc object",
+				       to_user, (const unsigned long)ptr, n);
+		if (retv < 0)
+			usercopy_abort("pmalloc",
+				       "invalid pmalloc object",
+				       to_user, (const unsigned long)ptr, n);
+	}
+}
+
+#else
+
+static void check_pmalloc_object(const void *ptr, unsigned long n,
+				 bool to_user)
+{
+}
+#endif
+
 /*
  * Validates that the given object is:
  * - not bogus address
@@ -277,5 +307,8 @@ void __check_object_size(const void *ptr, unsigned long n, bool to_user)
 
 	/* Check for object in kernel to avoid text exposure. */
 	check_kernel_text_object((const unsigned long)ptr, n, to_user);
+
+	/* Check if object is from a pmalloc chunk. */
+	check_pmalloc_object(ptr, n, to_user);
 }
 EXPORT_SYMBOL(__check_object_size);
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 146+ messages in thread

* [PATCH 4/6] Protectable Memory
@ 2018-02-11  3:19   ` Igor Stoppa
  0 siblings, 0 replies; 146+ messages in thread
From: Igor Stoppa @ 2018-02-11  3:19 UTC (permalink / raw)
  To: linux-security-module

The MMU available in many systems running Linux can often provide R/O
protection to the memory pages it handles.

However, MMU-based protection works efficiently only when those pages
contain exclusively data that will not need further modification.

Statically allocated variables can be segregated into a dedicated
section, but this approach does not work for dynamically allocated
ones.

Currently, dynamic allocation provides no means of grouping variables
into memory pages that contain exclusively data suitable for conversion
to read-only access mode.

The allocator provided here (pmalloc, the protectable memory allocator)
introduces the concept of pools of protectable memory.

A module can request a pool and then direct any allocation request to
the pool handle it has received.

Once all the chunks of memory associated with a specific pool are
initialized, the pool can be protected.

After this point, the pool can only be destroyed. Once the destruction
is invoked, it is up to the module to avoid any further reference to
memory from the pool.

Destruction is mainly meant for releasing memory when a module is
unloaded.

A module can have as many pools as needed, for example to support the
protection of data that is initialized in sufficiently distinct phases.
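As a rough illustration of the pool lifecycle described above (create,
allocate, protect, destroy), here is a userspace analogue. It assumes
POSIX mmap()/mprotect()/munmap() as stand-ins for vmalloc() and
set_memory_ro(), and uses a single chunk with a simple bump allocator
instead of genalloc's bitmap. All model_* names are hypothetical and
are not part of the pmalloc API.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc */
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical model of a pmalloc pool backed by a single chunk. */
struct model_pool {
	unsigned char *chunk; /* page-aligned, obtained from mmap() */
	size_t chunk_size;    /* multiple of the page size */
	size_t used;          /* bump-allocation offset */
	bool protected;       /* mirrors pmalloc_data.protected */
};

/* Create an empty, writable pool of at least @size bytes. */
static bool model_create(struct model_pool *p, size_t size)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	void *mem;

	/* Round up to whole pages, like roundup(size, PAGE_SIZE). */
	p->chunk_size = (size + page - 1) / page * page;
	mem = mmap(NULL, p->chunk_size, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (mem == MAP_FAILED)
		return false;
	p->chunk = mem;
	p->used = 0;
	p->protected = false;
	return true;
}

/* Allocate from a writable pool; refused once it is read-only. */
static void *model_alloc(struct model_pool *p, size_t size)
{
	size_t align = sizeof(unsigned long);
	size_t start = (p->used + align - 1) / align * align;

	if (p->protected || size == 0 || size > p->chunk_size - start)
		return NULL;
	p->used = start + size;
	return p->chunk + start;
}

/* One-way switch to read-only: later writes would fault (SIGSEGV). */
static int model_protect(struct model_pool *p)
{
	if (p->protected)
		return 0;
	if (mprotect(p->chunk, p->chunk_size, PROT_READ) != 0)
		return -1;
	p->protected = true;
	return 0;
}

/* Release the whole chunk; the pool never goes back to read-write. */
static void model_destroy(struct model_pool *p)
{
	munmap(p->chunk, p->chunk_size);
	p->chunk = NULL;
	p->chunk_size = 0;
	p->used = 0;
}
```

For data initialized in distinct phases, one would create two such
pools, populate and protect the first, keep filling the second and
protect it later, then destroy both at unload time. Because protection
works on whole pages, each pool owns its own pages, so protecting one
pool never affects data that still lives in another.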

Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com>
---
 include/linux/genalloc.h |   3 +
 include/linux/pmalloc.h  | 222 +++++++++++++++++++++
 include/linux/vmalloc.h  |   1 +
 lib/genalloc.c           |  27 +++
 mm/Kconfig               |   6 +
 mm/Makefile              |   1 +
 mm/pmalloc.c             | 497 +++++++++++++++++++++++++++++++++++++++++++++++
 mm/usercopy.c            |  33 ++++
 8 files changed, 790 insertions(+)
 create mode 100644 include/linux/pmalloc.h
 create mode 100644 mm/pmalloc.c

diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
index dcaa33e74b1c..b6c4cea9fbd8 100644
--- a/include/linux/genalloc.h
+++ b/include/linux/genalloc.h
@@ -121,6 +121,9 @@ extern unsigned long gen_pool_alloc_algo(struct gen_pool *, size_t,
 extern void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size,
 		dma_addr_t *dma);
 extern void gen_pool_free(struct gen_pool *, unsigned long, size_t);
+
+extern void gen_pool_flush_chunk(struct gen_pool *pool,
+				 struct gen_pool_chunk *chunk);
 extern void gen_pool_for_each_chunk(struct gen_pool *,
 	void (*)(struct gen_pool *, struct gen_pool_chunk *, void *), void *);
 extern size_t gen_pool_avail(struct gen_pool *);
diff --git a/include/linux/pmalloc.h b/include/linux/pmalloc.h
new file mode 100644
index 000000000000..624379a937c5
--- /dev/null
+++ b/include/linux/pmalloc.h
@@ -0,0 +1,222 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * pmalloc.h: Header for Protectable Memory Allocator
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.stoppa@huawei.com>
+ */
+
+#ifndef _LINUX_PMALLOC_H
+#define _LINUX_PMALLOC_H
+
+
+#include <linux/genalloc.h>
+#include <linux/string.h>
+
+#define PMALLOC_DEFAULT_ALLOC_ORDER (-1)
+
+/*
+ * Library for dynamic allocation of pools of memory that can be,
+ * after initialization, marked as read-only.
+ *
+ * This is intended to complement __ro_after_init, for those cases
+ * where either it is not possible to know the initialization value before
+ * init is completed, or the amount of data is variable and can be
+ * determined only at run-time.
+ *
+ * ***WARNING***
+ * The user of the API is expected to synchronize:
+ * 1) allocation,
+ * 2) writes to the allocated memory,
+ * 3) write protection of the pool,
+ * 4) freeing of the allocated memory, and
+ * 5) destruction of the pool.
+ *
+ * For a non-threaded scenario, this type of locking is not even required.
+ *
+ * Even if the library were to provide support for locking, point 2)
+ * would still depend on the user taking the lock.
+ */
+
+
+/**
+ * pmalloc_create_pool - create a new protectable memory pool
+ * @name: the name of the pool, enforced to be unique
+ * @min_alloc_order: log2 of the minimum allocation size obtainable
+ *                   from the pool
+ *
+ * Creates a new (empty) memory pool for allocation of protectable
+ * memory. Memory will be allocated upon request (through pmalloc).
+ *
+ * Return: pointer to the new pool upon success, otherwise a NULL.
+ */
+struct gen_pool *pmalloc_create_pool(const char *name,
+					 int min_alloc_order);
+
+/**
+ * is_pmalloc_object - validates the existence of an alleged object
+ * @ptr: address of the object
+ * @n: size of the object, in bytes
+ *
+ * Return: 0 if the object does not belong to pmalloc, 1 if it belongs to
+ * pmalloc, -1 if it only partially overlaps a pmalloc area.
+ */
+int is_pmalloc_object(const void *ptr, const unsigned long n);
+
+/**
+ * pmalloc_prealloc - tries to allocate a memory chunk of the requested size
+ * @pool: handle to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ *
+ * Prepares a chunk of the requested size.
+ * This is intended to both minimize latency in later memory requests and
+ * avoid sleeping during allocation.
+ * Memory allocated with prealloc is stored in one single chunk, as
+ * opposed to what is allocated on-demand when pmalloc runs out of free
+ * space already existing in the pool and has to invoke vmalloc.
+ * One additional advantage of pre-allocating larger chunks of memory is
+ * that the total slack tends to be smaller.
+ *
+ * Return: true if the vmalloc call was successful, false otherwise.
+ */
+bool pmalloc_prealloc(struct gen_pool *pool, size_t size);
+
+/**
+ * pmalloc - allocate protectable memory from a pool
+ * @pool: handle to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ * @gfp: flags for page allocation
+ *
+ * Allocates memory from an unprotected pool. If the pool doesn't have
+ * enough memory, and the request did not include GFP_ATOMIC, an attempt
+ * is made to add a new chunk of memory to the pool
+ * (a multiple of PAGE_SIZE), in order to fit the new request.
+ * Otherwise, NULL is returned.
+ *
+ * Return: pointer to the memory requested upon success, NULL otherwise
+ * (either no memory available or pool already read-only).
+ */
+void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp);
+
+
+/**
+ * pzalloc - zero-initialized version of pmalloc
+ * @pool: handle to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ * @gfp: flags for page allocation
+ *
+ * Executes pmalloc, initializing the memory requested to 0,
+ * before returning the pointer to it.
+ *
+ * Return: pointer to the zeroed memory requested, upon success, NULL
+ * otherwise (either no memory available or pool already read-only).
+ */
+static inline void *pzalloc(struct gen_pool *pool, size_t size, gfp_t gfp)
+{
+	return pmalloc(pool, size, gfp | __GFP_ZERO);
+}
+
+/**
+ * pmalloc_array - allocates an array according to the parameters
+ * @pool: handle to the pool to be used for memory allocation
+ * @n: number of elements in the array
+ * @size: amount of memory (in bytes) requested for each element
+ * @flags: flags for page allocation
+ *
+ * Executes pmalloc, if it has a chance to succeed.
+ *
+ * Return: either NULL or the pmalloc result.
+ */
+static inline void *pmalloc_array(struct gen_pool *pool, size_t n,
+				  size_t size, gfp_t flags)
+{
+	if (unlikely(!(pool && n && size)))
+		return NULL;
+	return pmalloc(pool, n * size, flags);
+}
+
+/**
+ * pcalloc - allocates a 0-initialized array according to the parameters
+ * @pool: handle to the pool to be used for memory allocation
+ * @n: number of elements in the array
+ * @size: amount of memory (in bytes) requested for each element
+ * @flags: flags for page allocation
+ *
+ * Executes pmalloc_array, if it has a chance to succeed.
+ *
+ * Return: either NULL or the pmalloc result.
+ */
+static inline void *pcalloc(struct gen_pool *pool, size_t n,
+			    size_t size, gfp_t flags)
+{
+	return pmalloc_array(pool, n, size, flags | __GFP_ZERO);
+}
+
+/**
+ * pstrdup - duplicate a string, using pmalloc as allocator
+ * @pool: handle to the pool to be used for memory allocation
+ * @s: string to duplicate
+ * @gfp: flags for page allocation
+ *
+ * Generates a copy of the given string, allocating sufficient memory
+ * from the given pmalloc pool.
+ *
+ * Return: pointer to the replica, NULL in case of error.
+ */
+static inline char *pstrdup(struct gen_pool *pool, const char *s, gfp_t gfp)
+{
+	size_t len;
+	char *buf;
+
+	if (unlikely(pool == NULL || s == NULL))
+		return NULL;
+
+	len = strlen(s) + 1;
+	buf = pmalloc(pool, len, gfp);
+	if (likely(buf))
+		strncpy(buf, s, len);
+	return buf;
+}
+
+/**
+ * pmalloc_protect_pool - turn a read/write pool read-only
+ * @pool: the pool to protect
+ *
+ * Write-protects all the memory chunks assigned to the pool.
+ * This prevents any further allocation.
+ *
+ * Return: 0 upon success, -EINVAL in abnormal cases.
+ */
+int pmalloc_protect_pool(struct gen_pool *pool);
+
+/**
+ * pfree - mark as unused memory that was previously in use
+ * @pool: handle to the pool to be used for memory allocation
+ * @addr: the beginning of the memory area to be freed
+ *
+ * The behavior of pfree is different, depending on the state of the
+ * protection.
+ * If the pool is not yet protected, the memory is marked as unused and
+ * will be available for further allocations.
+ * If the pool is already protected, the memory is marked as unused, but
+ * it will still be impossible to perform further allocation, because of
+ * the existing protection.
+ * The freed memory, in this case, will be truly released only when the
+ * pool is destroyed.
+ */
+static inline void pfree(struct gen_pool *pool, const void *addr)
+{
+	gen_pool_free(pool, (unsigned long)addr, 0);
+}
+
+/**
+ * pmalloc_destroy_pool - destroys a pool and all the associated memory
+ * @pool: the pool to destroy
+ *
+ * All the memory that was allocated through pmalloc in the pool will be freed.
+ *
+ * Return: 0 upon success, -EINVAL in abnormal cases.
+ */
+int pmalloc_destroy_pool(struct gen_pool *pool);
+
+#endif
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 1e5d8c392f15..116d280cca53 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -20,6 +20,7 @@ struct notifier_block;		/* in notifier.h */
 #define VM_UNINITIALIZED	0x00000020	/* vm_struct is not fully initialized */
 #define VM_NO_GUARD		0x00000040      /* don't add guard page */
 #define VM_KASAN		0x00000080      /* has allocated kasan shadow memory */
+#define VM_PMALLOC		0x00000100	/* pmalloc area - see docs */
 /* bits [20..32] reserved for arch specific ioremap internals */
 
 /*
diff --git a/lib/genalloc.c b/lib/genalloc.c
index 044347163acb..e40a5db89439 100644
--- a/lib/genalloc.c
+++ b/lib/genalloc.c
@@ -598,6 +598,33 @@ void gen_pool_free(struct gen_pool *pool, unsigned long addr, size_t size)
 }
 EXPORT_SYMBOL(gen_pool_free);
 
+
+/**
+ * gen_pool_flush_chunk - drops all the allocations from a specific chunk
+ * @pool:	the generic memory pool
+ * @chunk:	The chunk to wipe clear.
+ *
+ * This is meant to be called only while destroying a pool. It's up to the
+ * caller to avoid races, but really, at this point the pool should have
+ * already been retired and have become unavailable for any other sort of
+ * operation.
+ */
+void gen_pool_flush_chunk(struct gen_pool *pool,
+			  struct gen_pool_chunk *chunk)
+{
+	size_t size;
+
+	if (unlikely(!(pool && chunk)))
+		return;
+
+	size = chunk->end_addr + 1 - chunk->start_addr;
+	memset(chunk->entries, 0,
+	       DIV_ROUND_UP((size >> pool->min_alloc_order) * BITS_PER_ENTRY,
+			    BITS_PER_BYTE));
+	atomic_long_set(&chunk->avail, size);
+}
+
+
 /**
  * gen_pool_for_each_chunk - call func for every chunk of generic memory pool
  * @pool:	the generic memory pool
diff --git a/mm/Kconfig b/mm/Kconfig
index c782e8fb7235..be578fbdce6d 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -760,3 +760,9 @@ config GUP_BENCHMARK
 	  performance of get_user_pages_fast().
 
 	  See tools/testing/selftests/vm/gup_benchmark.c
+
+config PROTECTABLE_MEMORY
+    bool
+    depends on ARCH_HAS_SET_MEMORY
+    select GENERIC_ALLOCATOR
+    default y
diff --git a/mm/Makefile b/mm/Makefile
index e669f02c5a54..959fdbdac118 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -65,6 +65,7 @@ obj-$(CONFIG_SPARSEMEM)	+= sparse.o
 obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o
 obj-$(CONFIG_SLOB) += slob.o
 obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o
+obj-$(CONFIG_PROTECTABLE_MEMORY) += pmalloc.o
 obj-$(CONFIG_KSM) += ksm.o
 obj-$(CONFIG_PAGE_POISONING) += page_poison.o
 obj-$(CONFIG_SLAB) += slab.o
diff --git a/mm/pmalloc.c b/mm/pmalloc.c
new file mode 100644
index 000000000000..e94bfb407c92
--- /dev/null
+++ b/mm/pmalloc.c
@@ -0,0 +1,497 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * pmalloc.c: Protectable Memory Allocator
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.stoppa@huawei.com>
+ */
+
+#include <linux/printk.h>
+#include <linux/init.h>
+#include <linux/mm.h>
+#include <linux/vmalloc.h>
+#include <linux/genalloc.h>
+#include <linux/kernel.h>
+#include <linux/log2.h>
+#include <linux/slab.h>
+#include <linux/device.h>
+#include <linux/atomic.h>
+#include <linux/rculist.h>
+#include <linux/set_memory.h>
+#include <asm/cacheflush.h>
+#include <asm/page.h>
+
+#include <linux/pmalloc.h>
+/*
+ * pmalloc_data contains the data specific to a pmalloc pool,
+ * in a format compatible with the design of gen_alloc.
+ * Some of the fields are used for exposing the corresponding parameter
+ * to userspace, through sysfs.
+ */
+struct pmalloc_data {
+	struct gen_pool *pool;  /* Link back to the associated pool. */
+	bool protected;     /* Status of the pool: RO or RW. */
+	struct kobj_attribute attr_protected; /* Sysfs attribute. */
+	struct kobj_attribute attr_avail;     /* Sysfs attribute. */
+	struct kobj_attribute attr_size;      /* Sysfs attribute. */
+	struct kobj_attribute attr_chunks;    /* Sysfs attribute. */
+	struct kobject *pool_kobject;
+	struct list_head node; /* list of pools */
+};
+
+static LIST_HEAD(pmalloc_final_list);
+static LIST_HEAD(pmalloc_tmp_list);
+static struct list_head *pmalloc_list = &pmalloc_tmp_list;
+static DEFINE_MUTEX(pmalloc_mutex);
+static struct kobject *pmalloc_kobject;
+
+static ssize_t pmalloc_pool_show_protected(struct kobject *dev,
+					   struct kobj_attribute *attr,
+					   char *buf)
+{
+	struct pmalloc_data *data;
+
+	data = container_of(attr, struct pmalloc_data, attr_protected);
+	if (data->protected)
+		return sprintf(buf, "protected\n");
+	else
+		return sprintf(buf, "unprotected\n");
+}
+
+static ssize_t pmalloc_pool_show_avail(struct kobject *dev,
+				       struct kobj_attribute *attr,
+				       char *buf)
+{
+	struct pmalloc_data *data;
+
+	data = container_of(attr, struct pmalloc_data, attr_avail);
+	return sprintf(buf, "%lu\n",
+		       (unsigned long)gen_pool_avail(data->pool));
+}
+
+static ssize_t pmalloc_pool_show_size(struct kobject *dev,
+				      struct kobj_attribute *attr,
+				      char *buf)
+{
+	struct pmalloc_data *data;
+
+	data = container_of(attr, struct pmalloc_data, attr_size);
+	return sprintf(buf, "%lu\n",
+		       (unsigned long)gen_pool_size(data->pool));
+}
+
+static void pool_chunk_number(struct gen_pool *pool,
+			      struct gen_pool_chunk *chunk, void *data)
+{
+	unsigned long *counter = data;
+
+	(*counter)++;
+}
+
+static ssize_t pmalloc_pool_show_chunks(struct kobject *dev,
+					struct kobj_attribute *attr,
+					char *buf)
+{
+	struct pmalloc_data *data;
+	unsigned long chunks_num = 0;
+
+	data = container_of(attr, struct pmalloc_data, attr_chunks);
+	gen_pool_for_each_chunk(data->pool, pool_chunk_number, &chunks_num);
+	return sprintf(buf, "%lu\n", chunks_num);
+}
+
+/* Exposes the pool and its attributes through sysfs. */
+static struct kobject *pmalloc_connect(struct pmalloc_data *data)
+{
+	const struct attribute *attrs[] = {
+		&data->attr_protected.attr,
+		&data->attr_avail.attr,
+		&data->attr_size.attr,
+		&data->attr_chunks.attr,
+		NULL
+	};
+	struct kobject *kobj;
+
+	kobj = kobject_create_and_add(data->pool->name, pmalloc_kobject);
+	if (unlikely(!kobj))
+		return NULL;
+
+	if (unlikely(sysfs_create_files(kobj, attrs) < 0)) {
+		kobject_put(kobj);
+		kobj = NULL;
+	}
+	return kobj;
+}
+
+/* Removes the pool and its attributes from sysfs. */
+static void pmalloc_disconnect(struct pmalloc_data *data,
+			       struct kobject *kobj)
+{
+	const struct attribute *attrs[] = {
+		&data->attr_protected.attr,
+		&data->attr_avail.attr,
+		&data->attr_size.attr,
+		&data->attr_chunks.attr,
+		NULL
+	};
+
+	sysfs_remove_files(kobj, attrs);
+	kobject_put(kobj);
+}
+
+/* Declares an attribute of the pool. */
+#define pmalloc_attr_init(data, attr_name) \
+do { \
+	sysfs_attr_init(&data->attr_##attr_name.attr); \
+	data->attr_##attr_name.attr.name = #attr_name; \
+	data->attr_##attr_name.attr.mode = VERIFY_OCTAL_PERMISSIONS(0400); \
+	data->attr_##attr_name.show = pmalloc_pool_show_##attr_name; \
+} while (0)
+
+struct gen_pool *pmalloc_create_pool(const char *name, int min_alloc_order)
+{
+	struct gen_pool *pool;
+	const char *pool_name;
+	struct pmalloc_data *data;
+
+	if (!name) {
+		WARN_ON(1);
+		return NULL;
+	}
+
+	if (min_alloc_order < 0)
+		min_alloc_order = ilog2(sizeof(unsigned long));
+
+	pool = gen_pool_create(min_alloc_order, NUMA_NO_NODE);
+	if (unlikely(!pool))
+		return NULL;
+
+	mutex_lock(&pmalloc_mutex);
+	list_for_each_entry(data, pmalloc_list, node)
+		if (!strcmp(name, data->pool->name))
+			goto same_name_err;
+
+	pool_name = kstrdup(name, GFP_KERNEL);
+	if (unlikely(!pool_name))
+		goto name_alloc_err;
+
+	data = kzalloc(sizeof(struct pmalloc_data), GFP_KERNEL);
+	if (unlikely(!data))
+		goto data_alloc_err;
+
+	data->protected = false;
+	data->pool = pool;
+	pmalloc_attr_init(data, protected);
+	pmalloc_attr_init(data, avail);
+	pmalloc_attr_init(data, size);
+	pmalloc_attr_init(data, chunks);
+	pool->data = data;
+	pool->name = pool_name;
+
+	list_add(&data->node, pmalloc_list);
+	if (pmalloc_list == &pmalloc_final_list)
+		data->pool_kobject = pmalloc_connect(data);
+	mutex_unlock(&pmalloc_mutex);
+	return pool;
+
+data_alloc_err:
+	kfree(pool_name);
+name_alloc_err:
+same_name_err:
+	mutex_unlock(&pmalloc_mutex);
+	gen_pool_destroy(pool);
+	return NULL;
+}
+
+static inline int check_alloc_params(struct gen_pool *pool, size_t req_size)
+{
+	struct pmalloc_data *data;
+
+	if (unlikely(!req_size || !pool))
+		return -1;
+
+	data = pool->data;
+
+	if (data == NULL)
+		return -1;
+
+	if (unlikely(data->protected)) {
+		WARN_ON(1);
+		return -1;
+	}
+	return 0;
+}
+
+
+static inline bool chunk_tagging(void *chunk, bool tag)
+{
+	struct vm_struct *area;
+	struct page *page;
+
+	if (!is_vmalloc_addr(chunk))
+		return false;
+
+	page = vmalloc_to_page(chunk);
+	if (unlikely(!page))
+		return false;
+
+	area = page->area;
+	if (tag)
+		area->flags |= VM_PMALLOC;
+	else
+		area->flags &= ~VM_PMALLOC;
+	return true;
+}
+
+
+static inline bool tag_chunk(void *chunk)
+{
+	return chunk_tagging(chunk, true);
+}
+
+
+static inline bool untag_chunk(void *chunk)
+{
+	return chunk_tagging(chunk, false);
+}
+
+enum {
+	INVALID_PMALLOC_OBJECT = -1,
+	NOT_PMALLOC_OBJECT = 0,
+	VALID_PMALLOC_OBJECT = 1,
+};
+
+int is_pmalloc_object(const void *ptr, const unsigned long n)
+{
+	struct vm_struct *area;
+	struct page *page;
+	unsigned long area_start;
+	unsigned long area_end;
+	unsigned long object_start;
+	unsigned long object_end;
+
+
+	/* is_pmalloc_object gets called pretty late, so chances are high
+	 * that the object is indeed of vmalloc type
+	 */
+	if (unlikely(!is_vmalloc_addr(ptr)))
+		return NOT_PMALLOC_OBJECT;
+
+	page = vmalloc_to_page(ptr);
+	if (unlikely(!page))
+		return NOT_PMALLOC_OBJECT;
+
+	area = page->area;
+
+	if (likely(!(area->flags & VM_PMALLOC)))
+		return NOT_PMALLOC_OBJECT;
+
+	area_start = (unsigned long)area->addr;
+	area_end = area_start + area->nr_pages * PAGE_SIZE - 1;
+	object_start = (unsigned long)ptr;
+	object_end = object_start + n - 1;
+
+	if (likely((area_start <= object_start) &&
+		   (object_end <= area_end)))
+		return VALID_PMALLOC_OBJECT;
+	else
+		return INVALID_PMALLOC_OBJECT;
+}
+
+
+bool pmalloc_prealloc(struct gen_pool *pool, size_t size)
+{
+	void *chunk;
+	size_t chunk_size;
+	int add_error;
+
+	if (check_alloc_params(pool, size))
+		return false;
+
+	/* Expand pool */
+	chunk_size = roundup(size, PAGE_SIZE);
+	chunk = vmalloc(chunk_size);
+	if (unlikely(chunk == NULL))
+		return false;
+	if (unlikely(!tag_chunk(chunk)))
+		goto abort;
+	/* Locking is already done inside gen_pool_add */
+	add_error = gen_pool_add(pool, (unsigned long)chunk, chunk_size,
+				 NUMA_NO_NODE);
+	if (unlikely(add_error))
+		goto abort;
+	return true;
+abort:
+	untag_chunk(chunk);	/* harmless if tagging failed */
+	vfree_atomic(chunk);
+	return false;
+}
+
+void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp)
+{
+	void *chunk;
+	size_t chunk_size;
+	int add_error;
+	unsigned long retval;
+
+	if (check_alloc_params(pool, size))
+		return NULL;
+
+retry_alloc_from_pool:
+	retval = gen_pool_alloc(pool, size);
+	if (retval)
+		goto return_allocation;
+
+	if (unlikely((gfp & __GFP_ATOMIC))) {
+		if (unlikely((gfp & __GFP_NOFAIL)))
+			goto retry_alloc_from_pool;
+		else
+			return NULL;
+	}
+
+	/* Expand pool */
+	chunk_size = roundup(size, PAGE_SIZE);
+	chunk = vmalloc(chunk_size);
+	if (unlikely(!chunk)) {
+		if (unlikely((gfp & __GFP_NOFAIL)))
+			goto retry_alloc_from_pool;
+		else
+			return NULL;
+	}
+	if (unlikely(!tag_chunk(chunk)))
+		goto free;
+
+	/* Locking is already done inside gen_pool_add */
+	add_error = gen_pool_add(pool, (unsigned long)chunk, chunk_size,
+				 NUMA_NO_NODE);
+	if (unlikely(add_error))
+		goto abort;
+
+	retval = gen_pool_alloc(pool, size);
+	if (retval) {
+return_allocation:
+		*(size_t *)retval = size;
+		if (gfp & __GFP_ZERO)
+			memset((void *)retval, 0, size);
+		return (void *)retval;
+	}
+	/* Here there is no test for __GFP_NOFAIL because, in case of
+	 * concurrent allocation, one thread might add a chunk to the
+	 * pool and this memory could be allocated by another thread,
+	 * before the first thread gets a chance to use it.
+	 * As long as vmalloc succeeds, it's ok to retry.
+	 */
+	goto retry_alloc_from_pool;
+abort:
+	untag_chunk(chunk);
+free:
+	vfree_atomic(chunk);
+	return NULL;
+}
+
+static void pmalloc_chunk_set_protection(struct gen_pool *pool,
+
+					 struct gen_pool_chunk *chunk,
+					 void *data)
+{
+	const bool *flag = data;
+	size_t chunk_size = chunk->end_addr + 1 - chunk->start_addr;
+	unsigned long pages = chunk_size / PAGE_SIZE;
+
+	BUG_ON(chunk_size & (PAGE_SIZE - 1));
+
+	if (*flag)
+		set_memory_ro(chunk->start_addr, pages);
+	else
+		set_memory_rw(chunk->start_addr, pages);
+}
+
+static int pmalloc_pool_set_protection(struct gen_pool *pool, bool protection)
+{
+	struct pmalloc_data *data;
+	struct gen_pool_chunk *chunk;
+
+	if (unlikely(!pool))
+		return -EINVAL;
+
+	data = pool->data;
+
+	if (unlikely(!data))
+		return -EINVAL;
+
+	if (unlikely(data->protected == protection)) {
+		WARN_ON(1);
+		return 0;
+	}
+
+	data->protected = protection;
+	list_for_each_entry(chunk, &(pool)->chunks, next_chunk)
+		pmalloc_chunk_set_protection(pool, chunk, &protection);
+	return 0;
+}
+
+int pmalloc_protect_pool(struct gen_pool *pool)
+{
+	return pmalloc_pool_set_protection(pool, true);
+}
+
+
+static void pmalloc_chunk_free(struct gen_pool *pool,
+			       struct gen_pool_chunk *chunk, void *data)
+{
+	untag_chunk((void *)chunk->start_addr);
+	gen_pool_flush_chunk(pool, chunk);
+	vfree_atomic((void *)chunk->start_addr);
+}
+
+
+int pmalloc_destroy_pool(struct gen_pool *pool)
+{
+	struct pmalloc_data *data;
+
+	if (unlikely(pool == NULL))
+		return -EINVAL;
+
+	data = pool->data;
+
+	if (unlikely(data == NULL))
+		return -EINVAL;
+
+	mutex_lock(&pmalloc_mutex);
+	list_del(&data->node);
+	mutex_unlock(&pmalloc_mutex);
+
+	if (likely(data->pool_kobject))
+		pmalloc_disconnect(data, data->pool_kobject);
+	if (data->protected)
+		pmalloc_pool_set_protection(pool, false);
+	gen_pool_for_each_chunk(pool, pmalloc_chunk_free, NULL);
+	gen_pool_destroy(pool);
+	kfree(data);
+	return 0;
+}
+
+/*
+ * When the sysfs is ready to receive registrations, connect all the
+ * pools previously created. Also enable further pools to be connected
+ * right away.
+ */
+static int __init pmalloc_late_init(void)
+{
+	struct pmalloc_data *data, *n;
+
+	pmalloc_kobject = kobject_create_and_add("pmalloc", kernel_kobj);
+
+	mutex_lock(&pmalloc_mutex);
+	pmalloc_list = &pmalloc_final_list;
+
+	if (likely(pmalloc_kobject != NULL)) {
+		list_for_each_entry_safe(data, n, &pmalloc_tmp_list, node) {
+			list_move(&data->node, &pmalloc_final_list);
+			data->pool_kobject = pmalloc_connect(data);
+		}
+	}
+	mutex_unlock(&pmalloc_mutex);
+	return 0;
+}
+late_initcall(pmalloc_late_init);
diff --git a/mm/usercopy.c b/mm/usercopy.c
index e9e9325f7638..946ce051e296 100644
--- a/mm/usercopy.c
+++ b/mm/usercopy.c
@@ -240,6 +240,36 @@ static inline void check_heap_object(const void *ptr, unsigned long n,
 	}
 }
 
+#ifdef CONFIG_PROTECTABLE_MEMORY
+
+int is_pmalloc_object(const void *ptr, const unsigned long n);
+
+static void check_pmalloc_object(const void *ptr, unsigned long n,
+				 bool to_user)
+{
+	int retv;
+
+	retv = is_pmalloc_object(ptr, n);
+	if (unlikely(retv)) {
+		if (unlikely(!to_user))
+			usercopy_abort("pmalloc",
+				       "trying to write to pmalloc object",
+				       to_user, (const unsigned long)ptr, n);
+		if (retv < 0)
+			usercopy_abort("pmalloc",
+				       "invalid pmalloc object",
+				       to_user, (const unsigned long)ptr, n);
+	}
+}
+
+#else
+
+static void check_pmalloc_object(const void *ptr, unsigned long n,
+				 bool to_user)
+{
+}
+#endif
+
 /*
  * Validates that the given object is:
  * - not bogus address
@@ -277,5 +307,8 @@ void __check_object_size(const void *ptr, unsigned long n, bool to_user)
 
 	/* Check for object in kernel to avoid text exposure. */
 	check_kernel_text_object((const unsigned long)ptr, n, to_user);
+
+	/* Check if object is from a pmalloc chunk. */
+	check_pmalloc_object(ptr, n, to_user);
 }
 EXPORT_SYMBOL(__check_object_size);
-- 
2.14.1

--
To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
the body of a message to majordomo at vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

* [PATCH 4/6] Protectable Memory
@ 2018-02-11  3:19   ` Igor Stoppa
  0 siblings, 0 replies; 146+ messages in thread
From: Igor Stoppa @ 2018-02-11  3:19 UTC (permalink / raw)
  To: willy, rdunlap, corbet, keescook, mhocko, labbott, jglisse, hch
  Cc: cl, linux-security-module, linux-mm, linux-kernel,
	kernel-hardening, Igor Stoppa

The MMU available in many systems running Linux can often provide R/O
protection to the memory pages it handles.

However, MMU-based protection is efficient only when those pages contain
exclusively data that will not need further modification.

Statically allocated variables can be segregated into a dedicated
section, but this approach does not carry over to dynamically allocated
ones.

Currently, dynamic allocation provides no means of grouping variables
into memory pages that contain exclusively data suitable for conversion
to read-only access.

The allocator provided here (pmalloc, the protectable memory allocator)
introduces the concept of pools of protectable memory.

A module can request a pool and then direct its allocation requests to
the pool handle it has received.

Once all the chunks of memory associated with a specific pool have been
initialized, the pool can be protected.

After this point, the pool can only be destroyed. It is up to the module
to avoid any further reference to the pool's memory once destruction has
been invoked.

Destruction is mainly meant for releasing memory when a module is
unloaded.

A module can have as many pools as needed, for example to protect data
that is initialized in sufficiently distinct phases.

Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com>
---
 include/linux/genalloc.h |   3 +
 include/linux/pmalloc.h  | 222 +++++++++++++++++++++
 include/linux/vmalloc.h  |   1 +
 lib/genalloc.c           |  27 +++
 mm/Kconfig               |   6 +
 mm/Makefile              |   1 +
 mm/pmalloc.c             | 497 +++++++++++++++++++++++++++++++++++++++++++++++
 mm/usercopy.c            |  33 ++++
 8 files changed, 790 insertions(+)
 create mode 100644 include/linux/pmalloc.h
 create mode 100644 mm/pmalloc.c

diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
index dcaa33e74b1c..b6c4cea9fbd8 100644
--- a/include/linux/genalloc.h
+++ b/include/linux/genalloc.h
@@ -121,6 +121,9 @@ extern unsigned long gen_pool_alloc_algo(struct gen_pool *, size_t,
 extern void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size,
 		dma_addr_t *dma);
 extern void gen_pool_free(struct gen_pool *, unsigned long, size_t);
+
+extern void gen_pool_flush_chunk(struct gen_pool *pool,
+				 struct gen_pool_chunk *chunk);
 extern void gen_pool_for_each_chunk(struct gen_pool *,
 	void (*)(struct gen_pool *, struct gen_pool_chunk *, void *), void *);
 extern size_t gen_pool_avail(struct gen_pool *);
diff --git a/include/linux/pmalloc.h b/include/linux/pmalloc.h
new file mode 100644
index 000000000000..624379a937c5
--- /dev/null
+++ b/include/linux/pmalloc.h
@@ -0,0 +1,222 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * pmalloc.h: Header for Protectable Memory Allocator
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.stoppa@huawei.com>
+ */
+
+#ifndef _LINUX_PMALLOC_H
+#define _LINUX_PMALLOC_H
+
+
+#include <linux/genalloc.h>
+#include <linux/string.h>
+
+#define PMALLOC_DEFAULT_ALLOC_ORDER (-1)
+
+/*
+ * Library for dynamic allocation of pools of memory that can be,
+ * after initialization, marked as read-only.
+ *
+ * This is intended to complement __ro_after_init, for those cases
+ * where either it is not possible to know the initialization value before
+ * init is completed, or the amount of data is variable and can be
+ * determined only at run-time.
+ *
+ * ***WARNING***
+ * The user of the API is expected to synchronize:
+ * 1) allocation,
+ * 2) writes to the allocated memory,
+ * 3) write protection of the pool,
+ * 4) freeing of the allocated memory, and
+ * 5) destruction of the pool.
+ *
+ * For a non-threaded scenario, this type of locking is not even required.
+ *
+ * Even if the library were to provide support for locking, point 2)
+ * would still depend on the user taking the lock.
+ */
+
+
+/**
+ * pmalloc_create_pool - create a new protectable memory pool
+ * @name: the name of the pool, enforced to be unique
+ * @min_alloc_order: log2 of the minimum allocation size obtainable
+ *                   from the pool
+ *
+ * Creates a new (empty) memory pool for allocation of protectable
+ * memory. Memory will be allocated upon request (through pmalloc).
+ *
+ * Return: pointer to the new pool upon success, otherwise a NULL.
+ */
+struct gen_pool *pmalloc_create_pool(const char *name,
+					 int min_alloc_order);
+
+/**
+ * is_pmalloc_object - validates the existence of an alleged object
+ * @ptr: address of the object
+ * @n: size of the object, in bytes
+ *
+ * Return: 0 if the object does not belong to pmalloc, 1 if it belongs to
+ * pmalloc, -1 if it overlaps pmalloc memory but is not fully contained.
+ */
+int is_pmalloc_object(const void *ptr, const unsigned long n);
+
+/**
+ * pmalloc_prealloc - tries to allocate a memory chunk of the requested size
+ * @pool: handle to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ *
+ * Prepares a chunk of the requested size.
+ * This is intended to both minimize latency in later memory requests and
+ * avoid sleeping during allocation.
+ * Memory allocated with prealloc is stored in one single chunk, as
+ * opposed to what is allocated on-demand when pmalloc runs out of free
+ * space already existing in the pool and has to invoke vmalloc.
+ * One additional advantage of pre-allocating larger chunks of memory is
+ * that the total slack tends to be smaller.
+ *
+ * Return: true if the vmalloc call was successful, false otherwise.
+ */
+bool pmalloc_prealloc(struct gen_pool *pool, size_t size);
+
+/**
+ * pmalloc - allocate protectable memory from a pool
+ * @pool: handle to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ * @gfp: flags for page allocation
+ *
+ * Allocates memory from an unprotected pool. If the pool doesn't have
+ * enough memory, and the request did not include GFP_ATOMIC, an attempt
+ * is made to add a new chunk of memory to the pool
+ * (a multiple of PAGE_SIZE), in order to fit the new request.
+ * Otherwise, NULL is returned.
+ *
+ * Return: pointer to the memory requested upon success, NULL otherwise
+ * (either no memory available or pool already read-only).
+ */
+void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp);
+
+
+/**
+ * pzalloc - zero-initialized version of pmalloc
+ * @pool: handle to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ * @gfp: flags for page allocation
+ *
+ * Executes pmalloc, initializing the memory requested to 0,
+ * before returning the pointer to it.
+ *
+ * Return: pointer to the zeroed memory requested, upon success, NULL
+ * otherwise (either no memory available or pool already read-only).
+ */
+static inline void *pzalloc(struct gen_pool *pool, size_t size, gfp_t gfp)
+{
+	return pmalloc(pool, size, gfp | __GFP_ZERO);
+}
+
+/**
+ * pmalloc_array - allocates an array according to the parameters
+ * @pool: handle to the pool to be used for memory allocation
+ * @n: number of elements in the array
+ * @size: amount of memory (in bytes) requested for each element
+ * @flags: flags for page allocation
+ *
+ * Executes pmalloc, if it has a chance to succeed.
+ *
+ * Return: either NULL or the pmalloc result.
+ */
+static inline void *pmalloc_array(struct gen_pool *pool, size_t n,
+				  size_t size, gfp_t flags)
+{
+	if (unlikely(!(pool && n && size)))
+		return NULL;
+	return pmalloc(pool, n * size, flags);
+}
+
+/**
+ * pcalloc - allocates a 0-initialized array according to the parameters
+ * @pool: handle to the pool to be used for memory allocation
+ * @n: number of elements in the array
+ * @size: amount of memory (in bytes) requested for each element
+ * @flags: flags for page allocation
+ *
+ * Executes pmalloc_array, if it has a chance to succeed.
+ *
+ * Return: either NULL or the pmalloc result.
+ */
+static inline void *pcalloc(struct gen_pool *pool, size_t n,
+			    size_t size, gfp_t flags)
+{
+	return pmalloc_array(pool, n, size, flags | __GFP_ZERO);
+}
+
+/**
+ * pstrdup - duplicate a string, using pmalloc as allocator
+ * @pool: handle to the pool to be used for memory allocation
+ * @s: string to duplicate
+ * @gfp: flags for page allocation
+ *
+ * Generates a copy of the given string, allocating sufficient memory
+ * from the given pmalloc pool.
+ *
+ * Return: pointer to the replica, NULL in case of error.
+ */
+static inline char *pstrdup(struct gen_pool *pool, const char *s, gfp_t gfp)
+{
+	size_t len;
+	char *buf;
+
+	if (unlikely(pool == NULL || s == NULL))
+		return NULL;
+
+	len = strlen(s) + 1;
+	buf = pmalloc(pool, len, gfp);
+	if (likely(buf))
+		strncpy(buf, s, len);
+	return buf;
+}
+
+/**
+ * pmalloc_protect_pool - turn a read/write pool read-only
+ * @pool: the pool to protect
+ *
+ * Write-protects all the memory chunks assigned to the pool.
+ * This prevents any further allocation.
+ *
+ * Return: 0 upon success, -EINVAL in abnormal cases.
+ */
+int pmalloc_protect_pool(struct gen_pool *pool);
+
+/**
+ * pfree - mark as unused memory that was previously in use
+ * @pool: handle to the pool to be used for memory allocation
+ * @addr: the beginning of the memory area to be freed
+ *
+ * The behavior of pfree is different, depending on the state of the
+ * protection.
+ * If the pool is not yet protected, the memory is marked as unused and
+ * will be available for further allocations.
+ * If the pool is already protected, the memory is marked as unused, but
+ * it will still be impossible to perform further allocation, because of
+ * the existing protection.
+ * The freed memory, in this case, will be truly released only when the
+ * pool is destroyed.
+ */
+static inline void pfree(struct gen_pool *pool, const void *addr)
+{
+	gen_pool_free(pool, (unsigned long)addr, 0);
+}
+
+/**
+ * pmalloc_destroy_pool - destroys a pool and all the associated memory
+ * @pool: the pool to destroy
+ *
+ * All the memory that was allocated through pmalloc in the pool will be freed.
+ *
+ * Return: 0 upon success, -EINVAL in abnormal cases.
+ */
+int pmalloc_destroy_pool(struct gen_pool *pool);
+
+#endif
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 1e5d8c392f15..116d280cca53 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -20,6 +20,7 @@ struct notifier_block;		/* in notifier.h */
 #define VM_UNINITIALIZED	0x00000020	/* vm_struct is not fully initialized */
 #define VM_NO_GUARD		0x00000040      /* don't add guard page */
 #define VM_KASAN		0x00000080      /* has allocated kasan shadow memory */
+#define VM_PMALLOC		0x00000100	/* pmalloc area - see docs */
 /* bits [20..32] reserved for arch specific ioremap internals */
 
 /*
diff --git a/lib/genalloc.c b/lib/genalloc.c
index 044347163acb..e40a5db89439 100644
--- a/lib/genalloc.c
+++ b/lib/genalloc.c
@@ -598,6 +598,33 @@ void gen_pool_free(struct gen_pool *pool, unsigned long addr, size_t size)
 }
 EXPORT_SYMBOL(gen_pool_free);
 
+
+/**
+ * gen_pool_flush_chunk - drops all the allocations from a specific chunk
+ * @pool:	the generic memory pool
+ * @chunk:	The chunk to wipe clear.
+ *
+ * This is meant to be called only while destroying a pool. It's up to the
+ * caller to avoid races, but really, at this point the pool should have
+ * already been retired and have become unavailable for any other sort of
+ * operation.
+ */
+void gen_pool_flush_chunk(struct gen_pool *pool,
+			  struct gen_pool_chunk *chunk)
+{
+	size_t size;
+
+	if (unlikely(!(pool && chunk)))
+		return;
+
+	size = chunk->end_addr + 1 - chunk->start_addr;
+	memset(chunk->entries, 0,
+	       DIV_ROUND_UP((size >> pool->min_alloc_order) * BITS_PER_ENTRY,
+			    BITS_PER_BYTE));
+	atomic_long_set(&chunk->avail, size);
+}
+
+
 /**
  * gen_pool_for_each_chunk - call func for every chunk of generic memory pool
  * @pool:	the generic memory pool
diff --git a/mm/Kconfig b/mm/Kconfig
index c782e8fb7235..be578fbdce6d 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -760,3 +760,9 @@ config GUP_BENCHMARK
 	  performance of get_user_pages_fast().
 
 	  See tools/testing/selftests/vm/gup_benchmark.c
+
+config PROTECTABLE_MEMORY
+	bool
+	depends on ARCH_HAS_SET_MEMORY
+	select GENERIC_ALLOCATOR
+	default y
diff --git a/mm/Makefile b/mm/Makefile
index e669f02c5a54..959fdbdac118 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -65,6 +65,7 @@ obj-$(CONFIG_SPARSEMEM)	+= sparse.o
 obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o
 obj-$(CONFIG_SLOB) += slob.o
 obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o
+obj-$(CONFIG_PROTECTABLE_MEMORY) += pmalloc.o
 obj-$(CONFIG_KSM) += ksm.o
 obj-$(CONFIG_PAGE_POISONING) += page_poison.o
 obj-$(CONFIG_SLAB) += slab.o
diff --git a/mm/pmalloc.c b/mm/pmalloc.c
new file mode 100644
index 000000000000..e94bfb407c92
--- /dev/null
+++ b/mm/pmalloc.c
@@ -0,0 +1,497 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * pmalloc.c: Protectable Memory Allocator
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.stoppa@huawei.com>
+ */
+
+#include <linux/printk.h>
+#include <linux/init.h>
+#include <linux/mm.h>
+#include <linux/vmalloc.h>
+#include <linux/genalloc.h>
+#include <linux/kernel.h>
+#include <linux/log2.h>
+#include <linux/slab.h>
+#include <linux/device.h>
+#include <linux/atomic.h>
+#include <linux/rculist.h>
+#include <linux/set_memory.h>
+#include <asm/cacheflush.h>
+#include <asm/page.h>
+
+#include <linux/pmalloc.h>
+/*
+ * pmalloc_data contains the data specific to a pmalloc pool,
+ * in a format compatible with the design of gen_alloc.
+ * Some of the fields are used for exposing the corresponding parameter
+ * to userspace, through sysfs.
+ */
+struct pmalloc_data {
+	struct gen_pool *pool;  /* Link back to the associated pool. */
+	bool protected;     /* Status of the pool: RO or RW. */
+	struct kobj_attribute attr_protected; /* Sysfs attribute. */
+	struct kobj_attribute attr_avail;     /* Sysfs attribute. */
+	struct kobj_attribute attr_size;      /* Sysfs attribute. */
+	struct kobj_attribute attr_chunks;    /* Sysfs attribute. */
+	struct kobject *pool_kobject;
+	struct list_head node; /* list of pools */
+};
+
+static LIST_HEAD(pmalloc_final_list);
+static LIST_HEAD(pmalloc_tmp_list);
+static struct list_head *pmalloc_list = &pmalloc_tmp_list;
+static DEFINE_MUTEX(pmalloc_mutex);
+static struct kobject *pmalloc_kobject;
+
+static ssize_t pmalloc_pool_show_protected(struct kobject *dev,
+					   struct kobj_attribute *attr,
+					   char *buf)
+{
+	struct pmalloc_data *data;
+
+	data = container_of(attr, struct pmalloc_data, attr_protected);
+	if (data->protected)
+		return sprintf(buf, "protected\n");
+	else
+		return sprintf(buf, "unprotected\n");
+}
+
+static ssize_t pmalloc_pool_show_avail(struct kobject *dev,
+				       struct kobj_attribute *attr,
+				       char *buf)
+{
+	struct pmalloc_data *data;
+
+	data = container_of(attr, struct pmalloc_data, attr_avail);
+	return sprintf(buf, "%lu\n",
+		       (unsigned long)gen_pool_avail(data->pool));
+}
+
+static ssize_t pmalloc_pool_show_size(struct kobject *dev,
+				      struct kobj_attribute *attr,
+				      char *buf)
+{
+	struct pmalloc_data *data;
+
+	data = container_of(attr, struct pmalloc_data, attr_size);
+	return sprintf(buf, "%lu\n",
+		       (unsigned long)gen_pool_size(data->pool));
+}
+
+static void pool_chunk_number(struct gen_pool *pool,
+			      struct gen_pool_chunk *chunk, void *data)
+{
+	unsigned long *counter = data;
+
+	(*counter)++;
+}
+
+static ssize_t pmalloc_pool_show_chunks(struct kobject *dev,
+					struct kobj_attribute *attr,
+					char *buf)
+{
+	struct pmalloc_data *data;
+	unsigned long chunks_num = 0;
+
+	data = container_of(attr, struct pmalloc_data, attr_chunks);
+	gen_pool_for_each_chunk(data->pool, pool_chunk_number, &chunks_num);
+	return sprintf(buf, "%lu\n", chunks_num);
+}
+
+/* Exposes the pool and its attributes through sysfs. */
+static struct kobject *pmalloc_connect(struct pmalloc_data *data)
+{
+	const struct attribute *attrs[] = {
+		&data->attr_protected.attr,
+		&data->attr_avail.attr,
+		&data->attr_size.attr,
+		&data->attr_chunks.attr,
+		NULL
+	};
+	struct kobject *kobj;
+
+	kobj = kobject_create_and_add(data->pool->name, pmalloc_kobject);
+	if (unlikely(!kobj))
+		return NULL;
+
+	if (unlikely(sysfs_create_files(kobj, attrs) < 0)) {
+		kobject_put(kobj);
+		kobj = NULL;
+	}
+	return kobj;
+}
+
+/* Removes the pool and its attributes from sysfs. */
+static void pmalloc_disconnect(struct pmalloc_data *data,
+			       struct kobject *kobj)
+{
+	const struct attribute *attrs[] = {
+		&data->attr_protected.attr,
+		&data->attr_avail.attr,
+		&data->attr_size.attr,
+		&data->attr_chunks.attr,
+		NULL
+	};
+
+	sysfs_remove_files(kobj, attrs);
+	kobject_put(kobj);
+}
+
+/* Declares an attribute of the pool. */
+#define pmalloc_attr_init(data, attr_name) \
+do { \
+	sysfs_attr_init(&data->attr_##attr_name.attr); \
+	data->attr_##attr_name.attr.name = #attr_name; \
+	data->attr_##attr_name.attr.mode = VERIFY_OCTAL_PERMISSIONS(0400); \
+	data->attr_##attr_name.show = pmalloc_pool_show_##attr_name; \
+} while (0)
+
+struct gen_pool *pmalloc_create_pool(const char *name, int min_alloc_order)
+{
+	struct gen_pool *pool;
+	const char *pool_name;
+	struct pmalloc_data *data;
+
+	if (!name) {
+		WARN_ON(1);
+		return NULL;
+	}
+
+	if (min_alloc_order < 0)
+		min_alloc_order = ilog2(sizeof(unsigned long));
+
+	pool = gen_pool_create(min_alloc_order, NUMA_NO_NODE);
+	if (unlikely(!pool))
+		return NULL;
+
+	mutex_lock(&pmalloc_mutex);
+	list_for_each_entry(data, pmalloc_list, node)
+		if (!strcmp(name, data->pool->name))
+			goto same_name_err;
+
+	pool_name = kstrdup(name, GFP_KERNEL);
+	if (unlikely(!pool_name))
+		goto name_alloc_err;
+
+	data = kzalloc(sizeof(struct pmalloc_data), GFP_KERNEL);
+	if (unlikely(!data))
+		goto data_alloc_err;
+
+	data->protected = false;
+	data->pool = pool;
+	pmalloc_attr_init(data, protected);
+	pmalloc_attr_init(data, avail);
+	pmalloc_attr_init(data, size);
+	pmalloc_attr_init(data, chunks);
+	pool->data = data;
+	pool->name = pool_name;
+
+	list_add(&data->node, pmalloc_list);
+	if (pmalloc_list == &pmalloc_final_list)
+		data->pool_kobject = pmalloc_connect(data);
+	mutex_unlock(&pmalloc_mutex);
+	return pool;
+
+data_alloc_err:
+	kfree(pool_name);
+name_alloc_err:
+same_name_err:
+	mutex_unlock(&pmalloc_mutex);
+	gen_pool_destroy(pool);
+	return NULL;
+}
+
+static inline int check_alloc_params(struct gen_pool *pool, size_t req_size)
+{
+	struct pmalloc_data *data;
+
+	if (unlikely(!req_size || !pool))
+		return -1;
+
+	data = pool->data;
+
+	if (data == NULL)
+		return -1;
+
+	if (unlikely(data->protected)) {
+		WARN_ON(1);
+		return -1;
+	}
+	return 0;
+}
+
+
+static inline bool chunk_tagging(void *chunk, bool tag)
+{
+	struct vm_struct *area;
+	struct page *page;
+
+	if (!is_vmalloc_addr(chunk))
+		return false;
+
+	page = vmalloc_to_page(chunk);
+	if (unlikely(!page))
+		return false;
+
+	area = page->area;
+	if (tag)
+		area->flags |= VM_PMALLOC;
+	else
+		area->flags &= ~VM_PMALLOC;
+	return true;
+}
+
+
+static inline bool tag_chunk(void *chunk)
+{
+	return chunk_tagging(chunk, true);
+}
+
+
+static inline bool untag_chunk(void *chunk)
+{
+	return chunk_tagging(chunk, false);
+}
+
+enum {
+	INVALID_PMALLOC_OBJECT = -1,
+	NOT_PMALLOC_OBJECT = 0,
+	VALID_PMALLOC_OBJECT = 1,
+};
+
+int is_pmalloc_object(const void *ptr, const unsigned long n)
+{
+	struct vm_struct *area;
+	struct page *page;
+	unsigned long area_start;
+	unsigned long area_end;
+	unsigned long object_start;
+	unsigned long object_end;
+
+
+	/* is_pmalloc_object gets called pretty late, so chances are high
+	 * that the object is indeed of vmalloc type
+	 */
+	if (unlikely(!is_vmalloc_addr(ptr)))
+		return NOT_PMALLOC_OBJECT;
+
+	page = vmalloc_to_page(ptr);
+	if (unlikely(!page))
+		return NOT_PMALLOC_OBJECT;
+
+	area = page->area;
+
+	if (likely(!(area->flags & VM_PMALLOC)))
+		return NOT_PMALLOC_OBJECT;
+
+	area_start = (unsigned long)area->addr;
+	area_end = area_start + area->nr_pages * PAGE_SIZE - 1;
+	object_start = (unsigned long)ptr;
+	object_end = object_start + n - 1;
+
+	if (likely((area_start <= object_start) &&
+		   (object_end <= area_end)))
+		return VALID_PMALLOC_OBJECT;
+	else
+		return INVALID_PMALLOC_OBJECT;
+}
+
+
+bool pmalloc_prealloc(struct gen_pool *pool, size_t size)
+{
+	void *chunk;
+	size_t chunk_size;
+	int add_error;
+
+	if (check_alloc_params(pool, size))
+		return false;
+
+	/* Expand pool */
+	chunk_size = roundup(size, PAGE_SIZE);
+	chunk = vmalloc(chunk_size);
+	if (unlikely(chunk == NULL))
+		return false;
+	if (unlikely(!tag_chunk(chunk)))
+		goto abort;
+	/* Locking is already done inside gen_pool_add */
+	add_error = gen_pool_add(pool, (unsigned long)chunk, chunk_size,
+				 NUMA_NO_NODE);
+	if (unlikely(add_error))
+		goto abort;
+	return true;
+abort:
+	untag_chunk(chunk);	/* harmless if tagging failed */
+	vfree_atomic(chunk);
+	return false;
+}
+
+void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp)
+{
+	void *chunk;
+	size_t chunk_size;
+	int add_error;
+	unsigned long retval;
+
+	if (check_alloc_params(pool, size))
+		return NULL;
+
+retry_alloc_from_pool:
+	retval = gen_pool_alloc(pool, size);
+	if (retval)
+		goto return_allocation;
+
+	if (unlikely((gfp & __GFP_ATOMIC))) {
+		if (unlikely((gfp & __GFP_NOFAIL)))
+			goto retry_alloc_from_pool;
+		else
+			return NULL;
+	}
+
+	/* Expand pool */
+	chunk_size = roundup(size, PAGE_SIZE);
+	chunk = vmalloc(chunk_size);
+	if (unlikely(!chunk)) {
+		if (unlikely((gfp & __GFP_NOFAIL)))
+			goto retry_alloc_from_pool;
+		else
+			return NULL;
+	}
+	if (unlikely(!tag_chunk(chunk)))
+		goto free;
+
+	/* Locking is already done inside gen_pool_add */
+	add_error = gen_pool_add(pool, (unsigned long)chunk, chunk_size,
+				 NUMA_NO_NODE);
+	if (unlikely(add_error))
+		goto abort;
+
+	retval = gen_pool_alloc(pool, size);
+	if (retval) {
+return_allocation:
+		*(size_t *)retval = size;
+		if (gfp & __GFP_ZERO)
+			memset((void *)retval, 0, size);
+		return (void *)retval;
+	}
+	/* Here there is no test for __GFP_NOFAIL because, in case of
+	 * concurrent allocation, one thread might add a chunk to the
+	 * pool and this memory could be allocated by another thread,
+	 * before the first thread gets a chance to use it.
+	 * As long as vmalloc succeeds, it's ok to retry.
+	 */
+	goto retry_alloc_from_pool;
+abort:
+	untag_chunk(chunk);
+free:
+	vfree_atomic(chunk);
+	return NULL;
+}
+
+static void pmalloc_chunk_set_protection(struct gen_pool *pool,
+
+					 struct gen_pool_chunk *chunk,
+					 void *data)
+{
+	const bool *flag = data;
+	size_t chunk_size = chunk->end_addr + 1 - chunk->start_addr;
+	unsigned long pages = chunk_size / PAGE_SIZE;
+
+	BUG_ON(chunk_size & (PAGE_SIZE - 1));
+
+	if (*flag)
+		set_memory_ro(chunk->start_addr, pages);
+	else
+		set_memory_rw(chunk->start_addr, pages);
+}
+
+static int pmalloc_pool_set_protection(struct gen_pool *pool, bool protection)
+{
+	struct pmalloc_data *data;
+	struct gen_pool_chunk *chunk;
+
+	if (unlikely(!pool))
+		return -EINVAL;
+
+	data = pool->data;
+
+	if (unlikely(!data))
+		return -EINVAL;
+
+	if (unlikely(data->protected == protection)) {
+		WARN_ON(1);
+		return 0;
+	}
+
+	data->protected = protection;
+	list_for_each_entry(chunk, &(pool)->chunks, next_chunk)
+		pmalloc_chunk_set_protection(pool, chunk, &protection);
+	return 0;
+}
+
+int pmalloc_protect_pool(struct gen_pool *pool)
+{
+	return pmalloc_pool_set_protection(pool, true);
+}
+
+
+static void pmalloc_chunk_free(struct gen_pool *pool,
+			       struct gen_pool_chunk *chunk, void *data)
+{
+	untag_chunk((void *)chunk->start_addr);
+	gen_pool_flush_chunk(pool, chunk);
+	vfree_atomic((void *)chunk->start_addr);
+}
+
+
+int pmalloc_destroy_pool(struct gen_pool *pool)
+{
+	struct pmalloc_data *data;
+
+	if (unlikely(pool == NULL))
+		return -EINVAL;
+
+	data = pool->data;
+
+	if (unlikely(data == NULL))
+		return -EINVAL;
+
+	mutex_lock(&pmalloc_mutex);
+	list_del(&data->node);
+	mutex_unlock(&pmalloc_mutex);
+
+	if (likely(data->pool_kobject))
+		pmalloc_disconnect(data, data->pool_kobject);
+
+	pmalloc_pool_set_protection(pool, false);
+	gen_pool_for_each_chunk(pool, pmalloc_chunk_free, NULL);
+	gen_pool_destroy(pool);
+	kfree(data);
+	return 0;
+}
+
+/*
+ * When the sysfs is ready to receive registrations, connect all the
+ * pools previously created. Also enable further pools to be connected
+ * right away.
+ */
+static int __init pmalloc_late_init(void)
+{
+	struct pmalloc_data *data, *n;
+
+	pmalloc_kobject = kobject_create_and_add("pmalloc", kernel_kobj);
+
+	mutex_lock(&pmalloc_mutex);
+	pmalloc_list = &pmalloc_final_list;
+
+	if (likely(pmalloc_kobject != NULL)) {
+		list_for_each_entry_safe(data, n, &pmalloc_tmp_list, node) {
+			list_move(&data->node, &pmalloc_final_list);
+			data->pool_kobject = pmalloc_connect(data);
+		}
+	}
+	mutex_unlock(&pmalloc_mutex);
+	return 0;
+}
+late_initcall(pmalloc_late_init);
diff --git a/mm/usercopy.c b/mm/usercopy.c
index e9e9325f7638..946ce051e296 100644
--- a/mm/usercopy.c
+++ b/mm/usercopy.c
@@ -240,6 +240,36 @@ static inline void check_heap_object(const void *ptr, unsigned long n,
 	}
 }
 
+#ifdef CONFIG_PROTECTABLE_MEMORY
+
+int is_pmalloc_object(const void *ptr, const unsigned long n);
+
+static void check_pmalloc_object(const void *ptr, unsigned long n,
+				 bool to_user)
+{
+	int retv;
+
+	retv = is_pmalloc_object(ptr, n);
+	if (unlikely(retv)) {
+		if (unlikely(!to_user))
+			usercopy_abort("pmalloc",
+				       "trying to write to pmalloc object",
+				       to_user, (const unsigned long)ptr, n);
+		if (retv < 0)
+			usercopy_abort("pmalloc",
+				       "invalid pmalloc object",
+				       to_user, (const unsigned long)ptr, n);
+	}
+}
+
+#else
+
+static void check_pmalloc_object(const void *ptr, unsigned long n,
+				 bool to_user)
+{
+}
+#endif
+
 /*
  * Validates that the given object is:
  * - not bogus address
@@ -277,5 +307,8 @@ void __check_object_size(const void *ptr, unsigned long n, bool to_user)
 
 	/* Check for object in kernel to avoid text exposure. */
 	check_kernel_text_object((const unsigned long)ptr, n, to_user);
+
+	/* Check if object is from a pmalloc chunk. */
+	check_pmalloc_object(ptr, n, to_user);
 }
 EXPORT_SYMBOL(__check_object_size);
-- 
2.14.1
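
The grow-on-demand path of pmalloc() above can be illustrated with a userspace sketch: first try to satisfy the request from the existing chunks, otherwise add a new chunk of roundup(size, PAGE_SIZE) bytes and retry. This is only an analogue under stated assumptions: all `toy_*` names are hypothetical, malloc() stands in for vmalloc(), a first-fit bump allocator stands in for gen_pool_alloc(), the page size is assumed to be 4096 bytes, and __GFP_NOFAIL handling and chunk tagging are left out.

```c
#include <stddef.h>
#include <stdlib.h>

#define TOY_PAGE_SIZE 4096UL /* assumed page size for the sketch */
#define TOY_MAX_CHUNKS 16

/* Hypothetical chunk: a bump allocator over a page-rounded buffer. */
struct toy_chunk {
	unsigned char *start;
	size_t size;
	size_t used;
};

struct toy_pool {
	struct toy_chunk chunks[TOY_MAX_CHUNKS];
	int nr_chunks;
};

/* Stand-in for gen_pool_alloc(): the first chunk with enough room wins. */
static void *toy_pool_alloc(struct toy_pool *pool, size_t size)
{
	for (int i = 0; i < pool->nr_chunks; i++) {
		struct toy_chunk *c = &pool->chunks[i];

		if (size <= c->size - c->used) {
			void *p = c->start + c->used;

			c->used += size;
			return p;
		}
	}
	return NULL;
}

/* Mirrors the shape of pmalloc(): pool first, then expand and retry. */
static void *toy_pmalloc(struct toy_pool *pool, size_t size, int atomic)
{
	struct toy_chunk *c;
	size_t chunk_size;
	void *p;

	if (size == 0)
		return NULL;
	p = toy_pool_alloc(pool, size);
	if (p || atomic) /* atomic requests never grow the pool */
		return p;
	if (pool->nr_chunks == TOY_MAX_CHUNKS)
		return NULL;

	/* Expand the pool by roundup(size, PAGE_SIZE) bytes. */
	chunk_size = (size + TOY_PAGE_SIZE - 1) / TOY_PAGE_SIZE * TOY_PAGE_SIZE;
	c = &pool->chunks[pool->nr_chunks];
	c->start = malloc(chunk_size);
	if (!c->start)
		return NULL;
	c->size = chunk_size;
	c->used = 0;
	pool->nr_chunks++;

	/* Single-threaded sketch: nobody can take the new chunk first, so
	 * this retry succeeds; the kernel code loops back instead, because
	 * another thread may win the race for the fresh memory. */
	return toy_pool_alloc(pool, size);
}
```

A 100-byte request on an empty pool adds one 4096-byte chunk, a later 5000-byte request that does not fit adds an 8192-byte chunk, and the same request marked atomic simply fails instead of growing the pool.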



* [PATCH 4/6] Protectable Memory
@ 2018-02-11  3:19   ` Igor Stoppa
  0 siblings, 0 replies; 146+ messages in thread
From: Igor Stoppa @ 2018-02-11  3:19 UTC (permalink / raw)
  To: willy, rdunlap, corbet, keescook, mhocko, labbott, jglisse, hch
  Cc: cl, linux-security-module, linux-mm, linux-kernel,
	kernel-hardening, Igor Stoppa

The MMU available in many systems running Linux can often provide R/O
protection to the memory pages it handles.

However, MMU-based protection works efficiently only when those pages
contain exclusively data that will not need further modification.

Statically allocated variables can be segregated into a dedicated
section, but this approach does not work well for dynamically allocated
ones.

Currently, dynamic allocation provides no means of grouping variables
into memory pages that contain exclusively data suitable for conversion
to read-only access mode.

The allocator provided here (pmalloc, the protectable memory allocator)
introduces the concept of pools of protectable memory.

A module can request a pool and then direct any allocation request to
the pool handle it has received.

Once all the chunks of memory associated with a specific pool are
initialized, the pool can be protected.

After this point, the pool can only be destroyed (it is up to the module
to avoid any further reference to the pool's memory once the destruction
has been invoked).

Destruction is mainly meant for releasing memory when a module is
unloaded.

A module can have as many pools as needed, for example to support the
protection of data that is initialized in sufficiently distinct phases.
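
The pool lifecycle described above (allocate, initialize, protect, then only destroy) can be illustrated with a userspace analogue. This is a minimal sketch under stated assumptions, not the pmalloc API: the `toy_*` names are hypothetical, a single anonymous mmap() region plays the role of the pool's vmalloc chunks, and mprotect(PROT_READ) plays the role of set_memory_ro().

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc */
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical pool: one anonymous mapping carved out with a bump pointer. */
struct toy_pool {
	unsigned char *base; /* start of the mapping */
	size_t size;         /* mapping size, a multiple of the page size */
	size_t used;         /* bytes handed out so far */
	bool protected;      /* true once the pool is read-only */
};

static bool toy_pool_create(struct toy_pool *p, size_t bytes)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);

	p->size = (bytes + page - 1) / page * page; /* round up to pages */
	p->base = mmap(NULL, p->size, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p->base == MAP_FAILED)
		return false;
	p->used = 0;
	p->protected = false;
	return true;
}

static void *toy_pmalloc(struct toy_pool *p, size_t n)
{
	size_t align = sizeof(long);
	size_t off = (p->used + align - 1) & ~(align - 1); /* long-aligned */

	/* Like pmalloc(), refuse to allocate from a protected pool. */
	if (p->protected || n == 0 || n > p->size - off)
		return NULL;
	p->used = off + n;
	return p->base + off;
}

static int toy_pool_protect(struct toy_pool *p)
{
	if (mprotect(p->base, p->size, PROT_READ) != 0)
		return -1;
	p->protected = true; /* one-way: no call turns it back to R/W */
	return 0;
}

static int toy_pool_destroy(struct toy_pool *p)
{
	/* Destroying the pool is the only way to recover the memory. */
	return munmap(p->base, p->size);
}
```

After toy_pool_protect() has run, reads through existing pointers keep working, further allocations are refused, and a write into the pool raises SIGSEGV, which loosely mirrors what set_memory_ro() achieves for pmalloc chunks in the kernel.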

Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com>
---
 include/linux/genalloc.h |   3 +
 include/linux/pmalloc.h  | 222 +++++++++++++++++++++
 include/linux/vmalloc.h  |   1 +
 lib/genalloc.c           |  27 +++
 mm/Kconfig               |   6 +
 mm/Makefile              |   1 +
 mm/pmalloc.c             | 497 +++++++++++++++++++++++++++++++++++++++++++++++
 mm/usercopy.c            |  33 ++++
 8 files changed, 790 insertions(+)
 create mode 100644 include/linux/pmalloc.h
 create mode 100644 mm/pmalloc.c
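
The return convention of is_pmalloc_object(), and the way check_pmalloc_object() in mm/usercopy.c acts on it, can be modeled in userspace. This is a sketch under assumptions: `struct toy_area` is a hypothetical simplification of struct vm_struct, its `is_pmalloc` field models the VM_PMALLOC flag, the area containing the pointer is taken as already known, and objects are assumed to be non-empty (n > 0).

```c
#include <stdint.h>

/* Same values as the enum in mm/pmalloc.c. */
enum {
	INVALID_PMALLOC_OBJECT = -1,
	NOT_PMALLOC_OBJECT = 0,
	VALID_PMALLOC_OBJECT = 1,
};

/* Hypothetical stand-in for the struct vm_struct fields that are used. */
struct toy_area {
	uintptr_t addr;         /* first byte of the area */
	unsigned long nr_pages; /* pages backing the area */
	int is_pmalloc;         /* models VM_PMALLOC in area->flags */
};

/* Models is_pmalloc_object() once the area holding ptr is known; n > 0. */
static int toy_is_pmalloc_object(const struct toy_area *area, uintptr_t ptr,
				 unsigned long n, unsigned long page_size)
{
	uintptr_t area_end, object_end;

	if (!area || !area->is_pmalloc)
		return NOT_PMALLOC_OBJECT;

	/* Inclusive end addresses, as in the kernel code. */
	area_end = area->addr + area->nr_pages * page_size - 1;
	object_end = ptr + n - 1;

	if (area->addr <= ptr && object_end <= area_end)
		return VALID_PMALLOC_OBJECT;
	return INVALID_PMALLOC_OBJECT;
}

/* Models check_pmalloc_object(): nonzero where it would call usercopy_abort. */
static int toy_usercopy_would_abort(int verdict, int to_user)
{
	if (verdict == NOT_PMALLOC_OBJECT)
		return 0;
	if (!to_user)
		return 1;           /* copying from userspace into pmalloc memory */
	return verdict < 0;         /* copying out a range that exceeds the area */
}
```

With 4096-byte pages and a 3-page area, this reproduces the expectations of the pmalloc self-test: sizes 10, PAGE_SIZE and 3 * PAGE_SIZE starting at the area's base are valid, while 3 * PAGE_SIZE + 1 crosses the end of the area and is reported as invalid.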

diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
index dcaa33e74b1c..b6c4cea9fbd8 100644
--- a/include/linux/genalloc.h
+++ b/include/linux/genalloc.h
@@ -121,6 +121,9 @@ extern unsigned long gen_pool_alloc_algo(struct gen_pool *, size_t,
 extern void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size,
 		dma_addr_t *dma);
 extern void gen_pool_free(struct gen_pool *, unsigned long, size_t);
+
+extern void gen_pool_flush_chunk(struct gen_pool *pool,
+				 struct gen_pool_chunk *chunk);
 extern void gen_pool_for_each_chunk(struct gen_pool *,
 	void (*)(struct gen_pool *, struct gen_pool_chunk *, void *), void *);
 extern size_t gen_pool_avail(struct gen_pool *);
diff --git a/include/linux/pmalloc.h b/include/linux/pmalloc.h
new file mode 100644
index 000000000000..624379a937c5
--- /dev/null
+++ b/include/linux/pmalloc.h
@@ -0,0 +1,222 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * pmalloc.h: Header for Protectable Memory Allocator
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.stoppa@huawei.com>
+ */
+
+#ifndef _LINUX_PMALLOC_H
+#define _LINUX_PMALLOC_H
+
+
+#include <linux/genalloc.h>
+#include <linux/string.h>
+
+#define PMALLOC_DEFAULT_ALLOC_ORDER (-1)
+
+/*
+ * Library for dynamic allocation of pools of memory that can be,
+ * after initialization, marked as read-only.
+ *
+ * This is intended to complement __ro_after_init, for those cases
+ * where either it is not possible to know the initialization value before
+ * init is completed, or the amount of data is variable and can be
+ * determined only at run-time.
+ *
+ * ***WARNING***
+ * The user of the API is expected to synchronize:
+ * 1) allocation,
+ * 2) writes to the allocated memory,
+ * 3) write protection of the pool,
+ * 4) freeing of the allocated memory, and
+ * 5) destruction of the pool.
+ *
+ * For a non-threaded scenario, this type of locking is not even required.
+ *
+ * Even if the library were to provide support for locking, point 2)
+ * would still depend on the user taking the lock.
+ */
+
+
+/**
+ * pmalloc_create_pool - create a new protectable memory pool
+ * @name: the name of the pool, enforced to be unique
+ * @min_alloc_order: log2 of the minimum allocation size obtainable
+ *                   from the pool
+ *
+ * Creates a new (empty) memory pool for allocation of protectable
+ * memory. Memory will be allocated upon request (through pmalloc).
+ *
+ * Return: pointer to the new pool upon success, otherwise NULL.
+ */
+struct gen_pool *pmalloc_create_pool(const char *name,
+					 int min_alloc_order);
+
+/**
+ * is_pmalloc_object - validates the existence of an alleged object
+ * @ptr: address of the object
+ * @n: size of the object, in bytes
+ *
+ * Return: 0 if the object does not belong to pmalloc, 1 if it belongs to
+ * pmalloc, -1 if it only partially overlaps a pmalloc area (invalid).
+ */
+int is_pmalloc_object(const void *ptr, const unsigned long n);
+
+/**
+ * pmalloc_prealloc - tries to allocate a memory chunk of the requested size
+ * @pool: handle to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ *
+ * Prepares a chunk of the requested size.
+ * This is intended to both minimize latency in later memory requests and
+ * avoid sleeping during allocation.
+ * Memory allocated with prealloc is stored in one single chunk, as
+ * opposed to what is allocated on-demand when pmalloc runs out of free
+ * space already existing in the pool and has to invoke vmalloc.
+ * One additional advantage of pre-allocating larger chunks of memory is
+ * that the total slack tends to be smaller.
+ *
+ * Return: true if the vmalloc call was successful, false otherwise.
+ */
+bool pmalloc_prealloc(struct gen_pool *pool, size_t size);
+
+/**
+ * pmalloc - allocate protectable memory from a pool
+ * @pool: handle to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ * @gfp: flags for page allocation
+ *
+ * Allocates memory from an unprotected pool. If the pool doesn't have
+ * enough memory, and the request did not include GFP_ATOMIC, an attempt
+ * is made to add a new chunk of memory to the pool
+ * (a multiple of PAGE_SIZE), in order to fit the new request.
+ * Otherwise, NULL is returned.
+ *
+ * Return: pointer to the memory requested upon success, NULL otherwise
+ * (either no memory available or pool already read-only).
+ */
+void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp);
+
+
+/**
+ * pzalloc - zero-initialized version of pmalloc
+ * @pool: handle to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ * @gfp: flags for page allocation
+ *
+ * Executes pmalloc, initializing the memory requested to 0,
+ * before returning the pointer to it.
+ *
+ * Return: pointer to the zeroed memory requested, upon success, NULL
+ * otherwise (either no memory available or pool already read-only).
+ */
+static inline void *pzalloc(struct gen_pool *pool, size_t size, gfp_t gfp)
+{
+	return pmalloc(pool, size, gfp | __GFP_ZERO);
+}
+
+/**
+ * pmalloc_array - allocates an array according to the parameters
+ * @pool: handle to the pool to be used for memory allocation
+ * @n: number of elements in the array
+ * @size: amount of memory (in bytes) requested for each element
+ * @flags: flags for page allocation
+ *
+ * Calls pmalloc for n * size bytes if pool is non-NULL and n, size != 0.
+ *
+ * Return: either NULL or the pmalloc result.
+ */
+static inline void *pmalloc_array(struct gen_pool *pool, size_t n,
+				  size_t size, gfp_t flags)
+{
+	if (unlikely(!(pool && n && size)))
+		return NULL;
+	return pmalloc(pool, n * size, flags);
+}
+
+/**
+ * pcalloc - allocates a 0-initialized array according to the parameters
+ * @pool: handle to the pool to be used for memory allocation
+ * @n: number of elements in the array
+ * @size: amount of memory (in bytes) requested for each element
+ * @flags: flags for page allocation
+ *
+ * Executes pmalloc_array with __GFP_ZERO added to @flags.
+ *
+ * Return: either NULL or the pmalloc result.
+ */
+static inline void *pcalloc(struct gen_pool *pool, size_t n,
+			    size_t size, gfp_t flags)
+{
+	return pmalloc_array(pool, n, size, flags | __GFP_ZERO);
+}
+
+/**
+ * pstrdup - duplicate a string, using pmalloc as allocator
+ * @pool: handle to the pool to be used for memory allocation
+ * @s: string to duplicate
+ * @gfp: flags for page allocation
+ *
+ * Generates a copy of the given string, allocating sufficient memory
+ * from the given pmalloc pool.
+ *
+ * Return: pointer to the replica, NULL in case of error.
+ */
+static inline char *pstrdup(struct gen_pool *pool, const char *s, gfp_t gfp)
+{
+	size_t len;
+	char *buf;
+
+	if (unlikely(pool == NULL || s == NULL))
+		return NULL;
+
+	len = strlen(s) + 1;
+	buf = pmalloc(pool, len, gfp);
+	if (likely(buf))
+		strncpy(buf, s, len);
+	return buf;
+}
+
+/**
+ * pmalloc_protect_pool - turn a read/write pool read-only
+ * @pool: the pool to protect
+ *
+ * Write-protects all the memory chunks assigned to the pool.
+ * This prevents any further allocation.
+ *
+ * Return: 0 upon success, -EINVAL in abnormal cases.
+ */
+int pmalloc_protect_pool(struct gen_pool *pool);
+
+/**
+ * pfree - mark previously allocated memory as unused
+ * @pool: handle to the pool to be used for memory allocation
+ * @addr: the beginning of the memory area to be freed
+ *
+ * The behavior of pfree is different, depending on the state of the
+ * protection.
+ * If the pool is not yet protected, the memory is marked as unused and
+ * will be available for further allocations.
+ * If the pool is already protected, the memory is marked as unused, but
+ * it will still be impossible to perform further allocation, because of
+ * the existing protection.
+ * The freed memory, in this case, will be truly released only when the
+ * pool is destroyed.
+ */
+static inline void pfree(struct gen_pool *pool, const void *addr)
+{
+	gen_pool_free(pool, (unsigned long)addr, 0);
+}
+
+/**
+ * pmalloc_destroy_pool - destroys a pool and all the associated memory
+ * @pool: the pool to destroy
+ *
+ * All the memory that was allocated through pmalloc in the pool will be freed.
+ *
+ * Return: 0 upon success, -EINVAL in abnormal cases.
+ */
+int pmalloc_destroy_pool(struct gen_pool *pool);
+
+#endif
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 1e5d8c392f15..116d280cca53 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -20,6 +20,7 @@ struct notifier_block;		/* in notifier.h */
 #define VM_UNINITIALIZED	0x00000020	/* vm_struct is not fully initialized */
 #define VM_NO_GUARD		0x00000040      /* don't add guard page */
 #define VM_KASAN		0x00000080      /* has allocated kasan shadow memory */
+#define VM_PMALLOC		0x00000100	/* pmalloc area - see docs */
 /* bits [20..32] reserved for arch specific ioremap internals */
 
 /*
diff --git a/lib/genalloc.c b/lib/genalloc.c
index 044347163acb..e40a5db89439 100644
--- a/lib/genalloc.c
+++ b/lib/genalloc.c
@@ -598,6 +598,33 @@ void gen_pool_free(struct gen_pool *pool, unsigned long addr, size_t size)
 }
 EXPORT_SYMBOL(gen_pool_free);
 
+
+/**
+ * gen_pool_flush_chunk - drops all the allocations from a specific chunk
+ * @pool:	the generic memory pool
+ * @chunk:	The chunk to wipe clear.
+ *
+ * This is meant to be called only while destroying a pool. The caller is
+ * responsible for avoiding races; by this point, the pool should already
+ * have been retired and made unavailable for any other sort of
+ * operation.
+ */
+void gen_pool_flush_chunk(struct gen_pool *pool,
+			  struct gen_pool_chunk *chunk)
+{
+	size_t size;
+
+	if (unlikely(!(pool && chunk)))
+		return;
+
+	size = chunk->end_addr + 1 - chunk->start_addr;
+	memset(chunk->entries, 0,
+	       DIV_ROUND_UP((size >> pool->min_alloc_order) * BITS_PER_ENTRY,
+			    BITS_PER_BYTE));
+	atomic_long_set(&chunk->avail, size);
+}
+
+
 /**
  * gen_pool_for_each_chunk - call func for every chunk of generic memory pool
  * @pool:	the generic memory pool
diff --git a/mm/Kconfig b/mm/Kconfig
index c782e8fb7235..be578fbdce6d 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -760,3 +760,9 @@ config GUP_BENCHMARK
 	  performance of get_user_pages_fast().
 
 	  See tools/testing/selftests/vm/gup_benchmark.c
+
+config PROTECTABLE_MEMORY
+    bool
+    depends on ARCH_HAS_SET_MEMORY
+    select GENERIC_ALLOCATOR
+    default y
diff --git a/mm/Makefile b/mm/Makefile
index e669f02c5a54..959fdbdac118 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -65,6 +65,7 @@ obj-$(CONFIG_SPARSEMEM)	+= sparse.o
 obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o
 obj-$(CONFIG_SLOB) += slob.o
 obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o
+obj-$(CONFIG_PROTECTABLE_MEMORY) += pmalloc.o
 obj-$(CONFIG_KSM) += ksm.o
 obj-$(CONFIG_PAGE_POISONING) += page_poison.o
 obj-$(CONFIG_SLAB) += slab.o
diff --git a/mm/pmalloc.c b/mm/pmalloc.c
new file mode 100644
index 000000000000..e94bfb407c92
--- /dev/null
+++ b/mm/pmalloc.c
@@ -0,0 +1,497 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * pmalloc.c: Protectable Memory Allocator
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.stoppa@huawei.com>
+ */
+
+#include <linux/printk.h>
+#include <linux/init.h>
+#include <linux/mm.h>
+#include <linux/vmalloc.h>
+#include <linux/genalloc.h>
+#include <linux/kernel.h>
+#include <linux/log2.h>
+#include <linux/slab.h>
+#include <linux/device.h>
+#include <linux/atomic.h>
+#include <linux/rculist.h>
+#include <linux/set_memory.h>
+#include <asm/cacheflush.h>
+#include <asm/page.h>
+
+#include <linux/pmalloc.h>
+/*
+ * pmalloc_data contains the data specific to a pmalloc pool,
+ * in a format compatible with the design of gen_alloc.
+ * Some of the fields are used for exposing the corresponding parameter
+ * to userspace, through sysfs.
+ */
+struct pmalloc_data {
+	struct gen_pool *pool;  /* Link back to the associated pool. */
+	bool protected;     /* Status of the pool: RO or RW. */
+	struct kobj_attribute attr_protected; /* Sysfs attribute. */
+	struct kobj_attribute attr_avail;     /* Sysfs attribute. */
+	struct kobj_attribute attr_size;      /* Sysfs attribute. */
+	struct kobj_attribute attr_chunks;    /* Sysfs attribute. */
+	struct kobject *pool_kobject;
+	struct list_head node; /* list of pools */
+};
+
+static LIST_HEAD(pmalloc_final_list);
+static LIST_HEAD(pmalloc_tmp_list);
+static struct list_head *pmalloc_list = &pmalloc_tmp_list;
+static DEFINE_MUTEX(pmalloc_mutex);
+static struct kobject *pmalloc_kobject;
+
+static ssize_t pmalloc_pool_show_protected(struct kobject *dev,
+					   struct kobj_attribute *attr,
+					   char *buf)
+{
+	struct pmalloc_data *data;
+
+	data = container_of(attr, struct pmalloc_data, attr_protected);
+	if (data->protected)
+		return sprintf(buf, "protected\n");
+	else
+		return sprintf(buf, "unprotected\n");
+}
+
+static ssize_t pmalloc_pool_show_avail(struct kobject *dev,
+				       struct kobj_attribute *attr,
+				       char *buf)
+{
+	struct pmalloc_data *data;
+
+	data = container_of(attr, struct pmalloc_data, attr_avail);
+	return sprintf(buf, "%lu\n",
+		       (unsigned long)gen_pool_avail(data->pool));
+}
+
+static ssize_t pmalloc_pool_show_size(struct kobject *dev,
+				      struct kobj_attribute *attr,
+				      char *buf)
+{
+	struct pmalloc_data *data;
+
+	data = container_of(attr, struct pmalloc_data, attr_size);
+	return sprintf(buf, "%lu\n",
+		       (unsigned long)gen_pool_size(data->pool));
+}
+
+static void pool_chunk_number(struct gen_pool *pool,
+			      struct gen_pool_chunk *chunk, void *data)
+{
+	unsigned long *counter = data;
+
+	(*counter)++;
+}
+
+static ssize_t pmalloc_pool_show_chunks(struct kobject *dev,
+					struct kobj_attribute *attr,
+					char *buf)
+{
+	struct pmalloc_data *data;
+	unsigned long chunks_num = 0;
+
+	data = container_of(attr, struct pmalloc_data, attr_chunks);
+	gen_pool_for_each_chunk(data->pool, pool_chunk_number, &chunks_num);
+	return sprintf(buf, "%lu\n", chunks_num);
+}
+
+/* Exposes the pool and its attributes through sysfs. */
+static struct kobject *pmalloc_connect(struct pmalloc_data *data)
+{
+	const struct attribute *attrs[] = {
+		&data->attr_protected.attr,
+		&data->attr_avail.attr,
+		&data->attr_size.attr,
+		&data->attr_chunks.attr,
+		NULL
+	};
+	struct kobject *kobj;
+
+	kobj = kobject_create_and_add(data->pool->name, pmalloc_kobject);
+	if (unlikely(!kobj))
+		return NULL;
+
+	if (unlikely(sysfs_create_files(kobj, attrs) < 0)) {
+		kobject_put(kobj);
+		kobj = NULL;
+	}
+	return kobj;
+}
+
+/* Removes the pool and its attributes from sysfs. */
+static void pmalloc_disconnect(struct pmalloc_data *data,
+			       struct kobject *kobj)
+{
+	const struct attribute *attrs[] = {
+		&data->attr_protected.attr,
+		&data->attr_avail.attr,
+		&data->attr_size.attr,
+		&data->attr_chunks.attr,
+		NULL
+	};
+
+	sysfs_remove_files(kobj, attrs);
+	kobject_put(kobj);
+}
+
+/* Declares an attribute of the pool. */
+#define pmalloc_attr_init(data, attr_name) \
+do { \
+	sysfs_attr_init(&data->attr_##attr_name.attr); \
+	data->attr_##attr_name.attr.name = #attr_name; \
+	data->attr_##attr_name.attr.mode = VERIFY_OCTAL_PERMISSIONS(0400); \
+	data->attr_##attr_name.show = pmalloc_pool_show_##attr_name; \
+} while (0)
+
+struct gen_pool *pmalloc_create_pool(const char *name, int min_alloc_order)
+{
+	struct gen_pool *pool;
+	const char *pool_name;
+	struct pmalloc_data *data;
+
+	if (!name) {
+		WARN_ON(1);
+		return NULL;
+	}
+
+	if (min_alloc_order < 0)
+		min_alloc_order = ilog2(sizeof(unsigned long));
+
+	pool = gen_pool_create(min_alloc_order, NUMA_NO_NODE);
+	if (unlikely(!pool))
+		return NULL;
+
+	mutex_lock(&pmalloc_mutex);
+	list_for_each_entry(data, pmalloc_list, node)
+		if (!strcmp(name, data->pool->name))
+			goto same_name_err;
+
+	pool_name = kstrdup(name, GFP_KERNEL);
+	if (unlikely(!pool_name))
+		goto name_alloc_err;
+
+	data = kzalloc(sizeof(struct pmalloc_data), GFP_KERNEL);
+	if (unlikely(!data))
+		goto data_alloc_err;
+
+	data->protected = false;
+	data->pool = pool;
+	pmalloc_attr_init(data, protected);
+	pmalloc_attr_init(data, avail);
+	pmalloc_attr_init(data, size);
+	pmalloc_attr_init(data, chunks);
+	pool->data = data;
+	pool->name = pool_name;
+
+	list_add(&data->node, pmalloc_list);
+	if (pmalloc_list == &pmalloc_final_list)
+		data->pool_kobject = pmalloc_connect(data);
+	mutex_unlock(&pmalloc_mutex);
+	return pool;
+
+data_alloc_err:
+	kfree(pool_name);
+name_alloc_err:
+same_name_err:
+	mutex_unlock(&pmalloc_mutex);
+	gen_pool_destroy(pool);
+	return NULL;
+}
+
+static inline int check_alloc_params(struct gen_pool *pool, size_t req_size)
+{
+	struct pmalloc_data *data;
+
+	if (unlikely(!req_size || !pool))
+		return -1;
+
+	data = pool->data;
+
+	if (data == NULL)
+		return -1;
+
+	if (unlikely(data->protected)) {
+		WARN_ON(1);
+		return -1;
+	}
+	return 0;
+}
+
+
+static inline bool chunk_tagging(void *chunk, bool tag)
+{
+	struct vm_struct *area;
+	struct page *page;
+
+	if (!is_vmalloc_addr(chunk))
+		return false;
+
+	page = vmalloc_to_page(chunk);
+	if (unlikely(!page))
+		return false;
+
+	area = page->area;
+	if (tag)
+		area->flags |= VM_PMALLOC;
+	else
+		area->flags &= ~VM_PMALLOC;
+	return true;
+}
+
+
+static inline bool tag_chunk(void *chunk)
+{
+	return chunk_tagging(chunk, true);
+}
+
+
+static inline bool untag_chunk(void *chunk)
+{
+	return chunk_tagging(chunk, false);
+}
+
+enum {
+	INVALID_PMALLOC_OBJECT = -1,
+	NOT_PMALLOC_OBJECT = 0,
+	VALID_PMALLOC_OBJECT = 1,
+};
+
+int is_pmalloc_object(const void *ptr, const unsigned long n)
+{
+	struct vm_struct *area;
+	struct page *page;
+	unsigned long area_start;
+	unsigned long area_end;
+	unsigned long object_start;
+	unsigned long object_end;
+
+
+	/* is_pmalloc_object gets called pretty late, so chances are high
+	 * that the object is indeed of vmalloc type
+	 */
+	if (unlikely(!is_vmalloc_addr(ptr)))
+		return NOT_PMALLOC_OBJECT;
+
+	page = vmalloc_to_page(ptr);
+	if (unlikely(!page))
+		return NOT_PMALLOC_OBJECT;
+
+	area = page->area;
+
+	if (likely(!(area->flags & VM_PMALLOC)))
+		return NOT_PMALLOC_OBJECT;
+
+	area_start = (unsigned long)area->addr;
+	area_end = area_start + area->nr_pages * PAGE_SIZE - 1;
+	object_start = (unsigned long)ptr;
+	object_end = object_start + n - 1;
+
+	if (likely((area_start <= object_start) &&
+		   (object_end <= area_end)))
+		return VALID_PMALLOC_OBJECT;
+	else
+		return INVALID_PMALLOC_OBJECT;
+}
+
+
+bool pmalloc_prealloc(struct gen_pool *pool, size_t size)
+{
+	void *chunk;
+	size_t chunk_size;
+	bool add_error;
+
+	if (check_alloc_params(pool, size))
+		return false;
+
+	/* Expand pool */
+	chunk_size = roundup(size, PAGE_SIZE);
+	chunk = vmalloc(chunk_size);
+	if (unlikely(!chunk || !tag_chunk(chunk)))
+		goto abort;
+
+	/* Locking is already done inside gen_pool_add */
+	add_error = gen_pool_add(pool, (unsigned long)chunk, chunk_size,
+				 NUMA_NO_NODE);
+	if (unlikely(add_error != 0))
+		goto abort;
+
+	return true;
+abort:
+	vfree_atomic(chunk);
+	return false;
+
+}
+
+void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp)
+{
+	void *chunk;
+	size_t chunk_size;
+	bool add_error;
+	unsigned long retval;
+
+	if (check_alloc_params(pool, size))
+		return NULL;
+
+retry_alloc_from_pool:
+	retval = gen_pool_alloc(pool, size);
+	if (retval)
+		goto return_allocation;
+
+	if (unlikely((gfp & __GFP_ATOMIC))) {
+		if (unlikely((gfp & __GFP_NOFAIL)))
+			goto retry_alloc_from_pool;
+		else
+			return NULL;
+	}
+
+	/* Expand pool */
+	chunk_size = roundup(size, PAGE_SIZE);
+	chunk = vmalloc(chunk_size);
+	if (unlikely(!chunk)) {
+		if (unlikely((gfp & __GFP_NOFAIL)))
+			goto retry_alloc_from_pool;
+		else
+			return NULL;
+	}
+	if (unlikely(!tag_chunk(chunk)))
+		goto free;
+
+	/* Locking is already done inside gen_pool_add */
+	add_error = gen_pool_add(pool, (unsigned long)chunk, chunk_size,
+				 NUMA_NO_NODE);
+	if (unlikely(add_error))
+		goto abort;
+
+	retval = gen_pool_alloc(pool, size);
+	if (retval) {
+return_allocation:
+		*(size_t *)retval = size;
+		if (gfp & __GFP_ZERO)
+			memset((void *)retval, 0, size);
+		return (void *)retval;
+	}
+	/* Here there is no test for __GFP_NOFAIL because, in case of
+	 * concurrent allocation, one thread might add a chunk to the
+	 * pool and this memory could be allocated by another thread,
+	 * before the first thread gets a chance to use it.
+	 * As long as vmalloc succeeds, it's ok to retry.
+	 */
+	goto retry_alloc_from_pool;
+abort:
+	untag_chunk(chunk);
+free:
+	vfree_atomic(chunk);
+	return NULL;
+}
+
+static void pmalloc_chunk_set_protection(struct gen_pool *pool,
+
+					 struct gen_pool_chunk *chunk,
+					 void *data)
+{
+	const bool *flag = data;
+	size_t chunk_size = chunk->end_addr + 1 - chunk->start_addr;
+	unsigned long pages = chunk_size / PAGE_SIZE;
+
+	BUG_ON(chunk_size & (PAGE_SIZE - 1));
+
+	if (*flag)
+		set_memory_ro(chunk->start_addr, pages);
+	else
+		set_memory_rw(chunk->start_addr, pages);
+}
+
+static int pmalloc_pool_set_protection(struct gen_pool *pool, bool protection)
+{
+	struct pmalloc_data *data;
+	struct gen_pool_chunk *chunk;
+
+	if (unlikely(!pool))
+		return -EINVAL;
+
+	data = pool->data;
+
+	if (unlikely(!data))
+		return -EINVAL;
+
+	if (unlikely(data->protected == protection)) {
+		WARN_ON(1);
+		return 0;
+	}
+
+	data->protected = protection;
+	list_for_each_entry(chunk, &(pool)->chunks, next_chunk)
+		pmalloc_chunk_set_protection(pool, chunk, &protection);
+	return 0;
+}
+
+int pmalloc_protect_pool(struct gen_pool *pool)
+{
+	return pmalloc_pool_set_protection(pool, true);
+}
+
+
+static void pmalloc_chunk_free(struct gen_pool *pool,
+			       struct gen_pool_chunk *chunk, void *data)
+{
+	untag_chunk((void *)chunk->start_addr);
+	gen_pool_flush_chunk(pool, chunk);
+	vfree_atomic((void *)chunk->start_addr);
+}
+
+
+int pmalloc_destroy_pool(struct gen_pool *pool)
+{
+	struct pmalloc_data *data;
+
+	if (unlikely(pool == NULL))
+		return -EINVAL;
+
+	data = pool->data;
+
+	if (unlikely(data == NULL))
+		return -EINVAL;
+
+	mutex_lock(&pmalloc_mutex);
+	list_del(&data->node);
+	mutex_unlock(&pmalloc_mutex);
+
+	if (likely(data->pool_kobject))
+		pmalloc_disconnect(data, data->pool_kobject);
+
+	pmalloc_pool_set_protection(pool, false);
+	gen_pool_for_each_chunk(pool, pmalloc_chunk_free, NULL);
+	gen_pool_destroy(pool);
+	kfree(data);
+	return 0;
+}
+
+/*
+ * When the sysfs is ready to receive registrations, connect all the
+ * pools previously created. Also enable further pools to be connected
+ * right away.
+ */
+static int __init pmalloc_late_init(void)
+{
+	struct pmalloc_data *data, *n;
+
+	pmalloc_kobject = kobject_create_and_add("pmalloc", kernel_kobj);
+
+	mutex_lock(&pmalloc_mutex);
+	pmalloc_list = &pmalloc_final_list;
+
+	if (likely(pmalloc_kobject != NULL)) {
+		list_for_each_entry_safe(data, n, &pmalloc_tmp_list, node) {
+			list_move(&data->node, &pmalloc_final_list);
+			data->pool_kobject = pmalloc_connect(data);
+		}
+	}
+	mutex_unlock(&pmalloc_mutex);
+	return 0;
+}
+late_initcall(pmalloc_late_init);
diff --git a/mm/usercopy.c b/mm/usercopy.c
index e9e9325f7638..946ce051e296 100644
--- a/mm/usercopy.c
+++ b/mm/usercopy.c
@@ -240,6 +240,36 @@ static inline void check_heap_object(const void *ptr, unsigned long n,
 	}
 }
 
+#ifdef CONFIG_PROTECTABLE_MEMORY
+
+int is_pmalloc_object(const void *ptr, const unsigned long n);
+
+static void check_pmalloc_object(const void *ptr, unsigned long n,
+				 bool to_user)
+{
+	int retv;
+
+	retv = is_pmalloc_object(ptr, n);
+	if (unlikely(retv)) {
+		if (unlikely(!to_user))
+			usercopy_abort("pmalloc",
+				       "trying to write to pmalloc object",
+				       to_user, (const unsigned long)ptr, n);
+		if (retv < 0)
+			usercopy_abort("pmalloc",
+				       "invalid pmalloc object",
+				       to_user, (const unsigned long)ptr, n);
+	}
+}
+
+#else
+
+static void check_pmalloc_object(const void *ptr, unsigned long n,
+				 bool to_user)
+{
+}
+#endif
+
 /*
  * Validates that the given object is:
  * - not bogus address
@@ -277,5 +307,8 @@ void __check_object_size(const void *ptr, unsigned long n, bool to_user)
 
 	/* Check for object in kernel to avoid text exposure. */
 	check_kernel_text_object((const unsigned long)ptr, n, to_user);
+
+	/* Check if object is from a pmalloc chunk. */
+	check_pmalloc_object(ptr, n, to_user);
 }
 EXPORT_SYMBOL(__check_object_size);
-- 
2.14.1
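
gen_pool_flush_chunk() must clear the whole allocation bitmap of a chunk: one entry for every 2^min_alloc_order bytes, BITS_PER_ENTRY bits per entry, rounded up to whole bytes. A small userspace sketch of that arithmetic follows. BITS_PER_ENTRY is defined elsewhere in genalloc (not in this patch), so it is simply a parameter here, and the value 2 used in the checks is only illustrative; the `toy_*` names are hypothetical. Because `*` binds more tightly than `>>` in C, the shift has to be parenthesized; the second helper shows what the unparenthesized form evaluates to.

```c
#include <stddef.h>

#define TOY_BITS_PER_BYTE 8
#define TOY_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Bitmap bytes for a chunk: one entry per 2^order bytes, rounded up. */
static size_t toy_bitmap_bytes(size_t chunk_size, unsigned int order,
			       unsigned int bits_per_entry)
{
	size_t entries = chunk_size >> order;

	return TOY_DIV_ROUND_UP(entries * bits_per_entry, TOY_BITS_PER_BYTE);
}

/* What "chunk_size >> order * bits_per_entry" evaluates to in C:
 * the multiplication happens first, so the shift count grows instead. */
static size_t toy_bitmap_bytes_unparenthesized(size_t chunk_size,
					       unsigned int order,
					       unsigned int bits_per_entry)
{
	return TOY_DIV_ROUND_UP(chunk_size >> (order * bits_per_entry),
				TOY_BITS_PER_BYTE);
}
```

For a 4096-byte chunk with order 0 and two bits per entry, the bitmap spans 1024 bytes, while the unparenthesized form yields 512, so half of the bitmap would keep stale allocation state.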


* [PATCH 5/6] Pmalloc: self-test
@ 2018-02-11  3:19   ` Igor Stoppa
  -1 siblings, 0 replies; 146+ messages in thread
From: Igor Stoppa @ 2018-02-11  3:19 UTC (permalink / raw)
  To: willy, rdunlap, corbet, keescook, mhocko, labbott, jglisse, hch
  Cc: cl, linux-security-module, linux-mm, linux-kernel,
	kernel-hardening, Igor Stoppa

Add basic self-test functionality for pmalloc.

Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com>
---
 mm/Kconfig            |  9 ++++++++
 mm/Makefile           |  1 +
 mm/pmalloc-selftest.c | 63 +++++++++++++++++++++++++++++++++++++++++++++++++++
 mm/pmalloc-selftest.h | 24 ++++++++++++++++++++
 mm/pmalloc.c          |  2 ++
 5 files changed, 99 insertions(+)
 create mode 100644 mm/pmalloc-selftest.c
 create mode 100644 mm/pmalloc-selftest.h

diff --git a/mm/Kconfig b/mm/Kconfig
index be578fbdce6d..098aefef78b1 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -766,3 +766,12 @@ config PROTECTABLE_MEMORY
     depends on ARCH_HAS_SET_MEMORY
     select GENERIC_ALLOCATOR
     default y
+
+config PROTECTABLE_MEMORY_SELFTEST
+	bool "Run self test for pmalloc memory allocator"
+	depends on ARCH_HAS_SET_MEMORY
+	select PROTECTABLE_MEMORY
+	default n
+	help
+	  Tries to verify that pmalloc works correctly and that the memory
+	  is effectively protected.
diff --git a/mm/Makefile b/mm/Makefile
index 959fdbdac118..f7bbbfde6967 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -66,6 +66,7 @@ obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o
 obj-$(CONFIG_SLOB) += slob.o
 obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o
 obj-$(CONFIG_PROTECTABLE_MEMORY) += pmalloc.o
+obj-$(CONFIG_PROTECTABLE_MEMORY_SELFTEST) += pmalloc-selftest.o
 obj-$(CONFIG_KSM) += ksm.o
 obj-$(CONFIG_PAGE_POISONING) += page_poison.o
 obj-$(CONFIG_SLAB) += slab.o
diff --git a/mm/pmalloc-selftest.c b/mm/pmalloc-selftest.c
new file mode 100644
index 000000000000..05acd62d23ec
--- /dev/null
+++ b/mm/pmalloc-selftest.c
@@ -0,0 +1,63 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * pmalloc-selftest.c
+ *
+ * (C) Copyright 2018 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.stoppa@huawei.com>
+ */
+
+#include <linux/pmalloc.h>
+#include <linux/mm.h>
+
+#include "pmalloc-selftest.h"
+
+#define SIZE_1 (PAGE_SIZE * 3)
+#define SIZE_2 1000
+
+#define validate_alloc(expected, variable, size)	\
+	pr_notice("must be " expected ": %s\n",	\
+		  is_pmalloc_object(variable, size) > 0 ? "ok" : "no")
+
+#define is_alloc_ok(variable, size)	\
+	validate_alloc("ok", variable, size)
+
+#define is_alloc_no(variable, size)	\
+	validate_alloc("no", variable, size)
+
+void pmalloc_selftest(void)
+{
+	struct gen_pool *pool_unprot;
+	struct gen_pool *pool_prot;
+	void *var_prot, *var_unprot, *var_vmall;
+
+	pr_notice("pmalloc self-test\n");
+	pool_unprot = pmalloc_create_pool("unprotected", 0);
+	pool_prot = pmalloc_create_pool("protected", 0);
+	BUG_ON(!(pool_unprot && pool_prot));
+
+	var_unprot = pmalloc(pool_unprot, SIZE_1 - 1, GFP_KERNEL);
+	var_prot = pmalloc(pool_prot, SIZE_1, GFP_KERNEL);
+	*(int *)var_prot = 0;
+	var_vmall = vmalloc(SIZE_2);
+	is_alloc_ok(var_unprot, 10);
+	is_alloc_ok(var_unprot, SIZE_1);
+	is_alloc_ok(var_unprot, PAGE_SIZE);
+	is_alloc_no(var_unprot, SIZE_1 + 1);
+	is_alloc_no(var_vmall, 10);
+
+
+	pfree(pool_unprot, var_unprot);
+	vfree(var_vmall);
+
+	pmalloc_protect_pool(pool_prot);
+
+	/* This will intentionally trigger a WARN, because the pool being
+	 * destroyed is not protected: this is unusual and should happen
+	 * only on error paths, where other warnings have probably already
+	 * been displayed.
+	 */
+	pmalloc_destroy_pool(pool_unprot);
+
+	/* This must not cause WARNings */
+	pmalloc_destroy_pool(pool_prot);
+}
diff --git a/mm/pmalloc-selftest.h b/mm/pmalloc-selftest.h
new file mode 100644
index 000000000000..63e430de74f8
--- /dev/null
+++ b/mm/pmalloc-selftest.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * pmalloc-selftest.h
+ *
+ * (C) Copyright 2018 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.stoppa@huawei.com>
+ */
+
+
+#ifndef __MM_PMALLOC_SELFTEST_H
+#define __MM_PMALLOC_SELFTEST_H
+
+
+#ifdef CONFIG_PROTECTABLE_MEMORY_SELFTEST
+
+void pmalloc_selftest(void);
+
+#else
+
+static inline void pmalloc_selftest(void) { }
+
+#endif
+
+#endif
diff --git a/mm/pmalloc.c b/mm/pmalloc.c
index e94bfb407c92..c9a472730afc 100644
--- a/mm/pmalloc.c
+++ b/mm/pmalloc.c
@@ -22,6 +22,7 @@
 #include <asm/page.h>
 
 #include <linux/pmalloc.h>
+#include "pmalloc-selftest.h"
 /*
  * pmalloc_data contains the data specific to a pmalloc pool,
  * in a format compatible with the design of gen_alloc.
@@ -492,6 +493,7 @@ static int __init pmalloc_late_init(void)
 		}
 	}
 	mutex_unlock(&pmalloc_mutex);
+	pmalloc_selftest();
 	return 0;
 }
 late_initcall(pmalloc_late_init);
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 146+ messages in thread

* [PATCH 6/6] Documentation for Pmalloc
  2018-02-11  3:19 ` Igor Stoppa
  (?)
  (?)
@ 2018-02-11  3:19   ` Igor Stoppa
  -1 siblings, 0 replies; 146+ messages in thread
From: Igor Stoppa @ 2018-02-11  3:19 UTC (permalink / raw)
  To: willy, rdunlap, corbet, keescook, mhocko, labbott, jglisse, hch
  Cc: cl, linux-security-module, linux-mm, linux-kernel,
	kernel-hardening, Igor Stoppa

Detailed documentation about the protectable memory allocator.

Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com>
---
 Documentation/core-api/index.rst   |   1 +
 Documentation/core-api/pmalloc.rst | 114 +++++++++++++++++++++++++++++++++++++
 2 files changed, 115 insertions(+)
 create mode 100644 Documentation/core-api/pmalloc.rst

diff --git a/Documentation/core-api/index.rst b/Documentation/core-api/index.rst
index c670a8031786..8f5de42d6571 100644
--- a/Documentation/core-api/index.rst
+++ b/Documentation/core-api/index.rst
@@ -25,6 +25,7 @@ Core utilities
    genalloc
    errseq
    printk-formats
+   pmalloc
 
 Interfaces for kernel debugging
 ===============================
diff --git a/Documentation/core-api/pmalloc.rst b/Documentation/core-api/pmalloc.rst
new file mode 100644
index 000000000000..1afc13b6a74d
--- /dev/null
+++ b/Documentation/core-api/pmalloc.rst
@@ -0,0 +1,114 @@
+SPDX-License-Identifier: CC-BY-SA-4.0
+
+Protectable memory allocator
+============================
+
+Purpose
+-------
+
+The pmalloc library is meant to provide R/O status to data that, for some
+reason, can be neither declared as constant nor marked with the
+``__ro_after_init`` qualifier, but is write-once and read-only in spirit.
+It protects data from both accidental and malicious overwrites.
+
+Example: A policy that is loaded from userspace.
+
+
+Concept
+-------
+
+pmalloc builds on top of genalloc, using the same concept of memory pools.
+
+What pmalloc adds is the ability to make the memory contained in a pool
+R/O for the rest of the pool's lifetime.
+
+Different kernel drivers and threads can use different pools, both for finer
+control over what becomes R/O and when, and for improved lockless concurrency.
+
+
+Caveats
+-------
+
+- Memory freed while a pool is not yet protected will be reused.
+
+- Once a pool is protected, it's not possible to allocate any more memory
+  from it.
+
+- Freeing memory from a protected pool only indicates that the requester
+  no longer uses it; the memory will not become available for further use
+  until the pool is destroyed.
+
+- Before destroying a pool, all the memory allocated from it must be
+  released.
+
+- For performance reasons, pmalloc does not provide locking between
+  allocating from and protecting an individual pool.
+  Sharing the same pool between unrelated functions is discouraged.
+  Should sharing be a necessity, the user of the shared pool is expected
+  to implement locking for that pool.
+
+- pmalloc uses genalloc to optimize the use of the space it allocates
+  through vmalloc. Some additional TLB entries will be used, though fewer
+  than when using vmalloc directly. The exact number depends on the
+  size of each allocation request and on possible slack.
+
+- Since only a small amount of data is expected to be dynamically allocated
+  and then marked as read-only, the limited address range available to
+  pmalloc on 32-bit systems should not be an issue.
+
+- On SMP systems, allocations are expected to happen mostly during an
+  initial transient phase, after which there should be no further need
+  for cross-processor synchronization of page tables.
+
+- To facilitate the conversion of existing code to pmalloc pools, several
+  helper functions are provided, mirroring their kmalloc counterparts.
+
+
+Use
+---
+
+The typical sequence when using pmalloc is:
+
+1. create a pool
+
+.. kernel-doc:: include/linux/pmalloc.h
+   :functions: pmalloc_create_pool
+
+2. [optional] pre-allocate some memory in the pool
+
+.. kernel-doc:: include/linux/pmalloc.h
+   :functions: pmalloc_prealloc
+
+3. issue one or more allocation requests to the pool with locking as needed
+
+.. kernel-doc:: include/linux/pmalloc.h
+   :functions: pmalloc
+
+.. kernel-doc:: include/linux/pmalloc.h
+   :functions: pzalloc
+
+4. initialize the obtained memory with the desired values
+
+5. [optional] repeat steps 3 and 4 as needed
+
+6. write-protect the pool
+
+.. kernel-doc:: include/linux/pmalloc.h
+   :functions: pmalloc_protect_pool
+
+7. use the handles obtained from the allocations in read-only mode
+
+8. [optional] release all the memory allocated
+
+.. kernel-doc:: include/linux/pmalloc.h
+   :functions: pfree
+
+9. [optional, requires step 8] destroy the pool
+
+.. kernel-doc:: include/linux/pmalloc.h
+   :functions: pmalloc_destroy_pool
+
+API
+---
+
+.. kernel-doc:: include/linux/pmalloc.h
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 146+ messages in thread
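The usage sequence described above can be approximated in userspace to make the pool lifecycle concrete. This is a sketch under stated assumptions, not pmalloc itself. It substitutes POSIX mmap() and mprotect() for the kernel's vmalloc-backed chunks and page-permission changes, and a simple bump allocator for genalloc. The names `struct ro_pool`, `ro_pool_create()`, `ro_pool_alloc()`, `ro_pool_protect()` and `ro_pool_destroy()` are hypothetical. As with pmalloc, the pool does no locking of its own, so callers that share a pool must serialize allocation against protection themselves. Individual frees (step 8) are not modelled; the whole pool is released at once.

```c
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS with strict -std= modes */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical userspace model of a protectable pool (not pmalloc). */
struct ro_pool {
	unsigned char *base;	/* start of the mmap()ed region */
	size_t size;		/* bytes mapped, a multiple of the page size */
	size_t used;		/* bump-pointer offset of the next allocation */
	bool is_protected;	/* set once by ro_pool_protect(), never cleared */
};

/* Step 1: create a pool backed by 'pages' anonymous, writable pages. */
static int ro_pool_create(struct ro_pool *p, size_t pages)
{
	long page = sysconf(_SC_PAGESIZE);
	void *mem;

	assert(p != NULL && pages > 0 && page > 0);
	p->size = pages * (size_t)page;
	mem = mmap(NULL, p->size, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (mem == MAP_FAILED)
		return -1;
	p->base = mem;
	p->used = 0;
	p->is_protected = false;
	return 0;
}

/* Step 3: carve 'n' bytes out of the pool; NULL once protected or full. */
static void *ro_pool_alloc(struct ro_pool *p, size_t n)
{
	const size_t align = sizeof(void *);
	size_t n_aligned;
	void *ret;

	if (p->is_protected || n == 0 || n > p->size)
		return NULL;
	n_aligned = (n + align - 1) & ~(align - 1);
	if (n_aligned > p->size - p->used)
		return NULL;
	ret = p->base + p->used;
	p->used += n_aligned;
	return ret;
}

/* Step 6: make the whole pool read-only; there is no way back to R/W. */
static int ro_pool_protect(struct ro_pool *p)
{
	if (mprotect(p->base, p->size, PROT_READ) != 0)
		return -1;
	p->is_protected = true;
	return 0;
}

/* Step 9: release every allocation at once by unmapping the pool. */
static int ro_pool_destroy(struct ro_pool *p)
{
	int ret = munmap(p->base, p->size);

	p->base = NULL;
	p->size = p->used = 0;
	return ret;
}
```

After ro_pool_protect(), reads still work but a write through a returned pointer raises SIGSEGV. That is the userspace counterpart of the fault pmalloc aims to trigger on an attempted overwrite. Further allocations are refused, mirroring the caveat that a protected pool cannot grow.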

* [PATCH 6/6] Documentation for Pmalloc
@ 2018-02-11  3:19   ` Igor Stoppa
  0 siblings, 0 replies; 146+ messages in thread
From: Igor Stoppa @ 2018-02-11  3:19 UTC (permalink / raw)
  To: linux-security-module

Detailed documentation about the protectable memory allocator.

Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com>
---
 Documentation/core-api/index.rst   |   1 +
 Documentation/core-api/pmalloc.rst | 114 +++++++++++++++++++++++++++++++++++++
 2 files changed, 115 insertions(+)
 create mode 100644 Documentation/core-api/pmalloc.rst

diff --git a/Documentation/core-api/index.rst b/Documentation/core-api/index.rst
index c670a8031786..8f5de42d6571 100644
--- a/Documentation/core-api/index.rst
+++ b/Documentation/core-api/index.rst
@@ -25,6 +25,7 @@ Core utilities
    genalloc
    errseq
    printk-formats
+   pmalloc
 
 Interfaces for kernel debugging
 ===============================
diff --git a/Documentation/core-api/pmalloc.rst b/Documentation/core-api/pmalloc.rst
new file mode 100644
index 000000000000..1afc13b6a74d
--- /dev/null
+++ b/Documentation/core-api/pmalloc.rst
@@ -0,0 +1,114 @@
+SPDX-License-Identifier: CC-BY-SA-4.0
+
+Protectable memory allocator
+============================
+
+Purpose
+-------
+
+The pmalloc library is meant to provide R/O status to data that, for some
+reason, could neither be declared as constant, nor could it take advantage
+of the qualifier __ro_after_init, but is write-once and read-only in spirit.
+It protects data from both accidental and malicious overwrites.
+
+Example: A policy that is loaded from userspace.
+
+
+Concept
+-------
+
+pmalloc builds on top of genalloc, using the same concept of memory pools.
+
+The value added by pmalloc is that now the memory contained in a pool can
+become R/O, for the rest of the life of the pool.
+
+Different kernel drivers and threads can use different pools, for finer
+control of what becomes R/O and when. And for improved lockless concurrency.
+
+
+Caveats
+-------
+
+- Memory freed while a pool is not yet protected will be reused.
+
+- Once a pool is protected, it's not possible to allocate any more memory
+  from it.
+
+- Memory "freed" from a protected pool indicates that such memory is not
+  in use anymore by the requester; however, it will not become available
+  for further use, until the pool is destroyed.
+
+- Before destroying a pool, all the memory allocated from it must be
+  released.
+
+- pmalloc does not provide locking support with respect to allocating vs
+  protecting an individual pool, for performance reasons.
+  It is recommended not to share the same pool between unrelated functions.
+  Should sharing be a necessity, the user of the shared pool is expected
+  to implement locking for that pool.
+
+- pmalloc uses genalloc to optimize the use of the space it allocates
+  through vmalloc. Some more TLB entries will be used, however less than
+  in the case of using vmalloc directly. The exact number depends on the
+  size of each allocation request and possible slack.
+
+- Considering that not much data is supposed to be dynamically allocated
+  and then marked as read-only, it shouldn't be an issue that the address
+  range for pmalloc is limited, on 32-bit systems.
+
+- Regarding SMP systems, the allocations are expected to happen mostly
+  during an initial transient, after which there should be no more need to
+  perform cross-processor synchronizations of page tables.
+
+- To facilitate the conversion of existing code to pmalloc pools, several
+  helper functions are provided, mirroring their kmalloc counterparts.
+
+
+Use
+---
+
+The typical sequence, when using pmalloc, is:
+
+1. create a pool
+
+.. kernel-doc:: include/linux/pmalloc.h
+   :functions: pmalloc_create_pool
+
+2. [optional] pre-allocate some memory in the pool
+
+.. kernel-doc:: include/linux/pmalloc.h
+   :functions: pmalloc_prealloc
+
+3. issue one or more allocation requests to the pool with locking as needed
+
+.. kernel-doc:: include/linux/pmalloc.h
+   :functions: pmalloc
+
+.. kernel-doc:: include/linux/pmalloc.h
+   :functions: pzalloc
+
+4. initialize the memory obtained with desired values
+
+5. [optional] iterate over points 3 & 4 as needed
+
+6. write-protect the pool
+
+.. kernel-doc:: include/linux/pmalloc.h
+   :functions: pmalloc_protect_pool
+
+7. use in read-only mode the handles obtained through the allocations
+
+8. [optional] release all the memory allocated
+
+.. kernel-doc:: include/linux/pmalloc.h
+   :functions: pfree
+
+9. [optional, but depends on point 8] destroy the pool
+
+.. kernel-doc:: include/linux/pmalloc.h
+   :functions: pmalloc_destroy_pool
+
+API
+---
+
+.. kernel-doc:: include/linux/pmalloc.h
-- 
2.14.1

--
To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
the body of a message to majordomo at vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 146+ messages in thread

* [PATCH 6/6] Documentation for Pmalloc
@ 2018-02-11  3:19   ` Igor Stoppa
  0 siblings, 0 replies; 146+ messages in thread
From: Igor Stoppa @ 2018-02-11  3:19 UTC (permalink / raw)
  To: willy, rdunlap, corbet, keescook, mhocko, labbott, jglisse, hch
  Cc: cl, linux-security-module, linux-mm, linux-kernel,
	kernel-hardening, Igor Stoppa

Detailed documentation about the protectable memory allocator.

Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com>
---
 Documentation/core-api/index.rst   |   1 +
 Documentation/core-api/pmalloc.rst | 114 +++++++++++++++++++++++++++++++++++++
 2 files changed, 115 insertions(+)
 create mode 100644 Documentation/core-api/pmalloc.rst

diff --git a/Documentation/core-api/index.rst b/Documentation/core-api/index.rst
index c670a8031786..8f5de42d6571 100644
--- a/Documentation/core-api/index.rst
+++ b/Documentation/core-api/index.rst
@@ -25,6 +25,7 @@ Core utilities
    genalloc
    errseq
    printk-formats
+   pmalloc
 
 Interfaces for kernel debugging
 ===============================
diff --git a/Documentation/core-api/pmalloc.rst b/Documentation/core-api/pmalloc.rst
new file mode 100644
index 000000000000..1afc13b6a74d
--- /dev/null
+++ b/Documentation/core-api/pmalloc.rst
@@ -0,0 +1,114 @@
+SPDX-License-Identifier: CC-BY-SA-4.0
+
+Protectable memory allocator
+============================
+
+Purpose
+-------
+
+The pmalloc library is meant to provide R/O status to data that, for some
+reason, could neither be declared as constant, nor could it take advantage
+of the qualifier __ro_after_init, but is write-once and read-only in spirit.
+It protects data from both accidental and malicious overwrites.
+
+Example: A policy that is loaded from userspace.
+
+
+Concept
+-------
+
+pmalloc builds on top of genalloc, using the same concept of memory pools.
+
+The value added by pmalloc is that now the memory contained in a pool can
+become R/O, for the rest of the life of the pool.
+
+Different kernel drivers and threads can use different pools, for finer
+control of what becomes R/O and when. And for improved lockless concurrency.
+
+
+Caveats
+-------
+
+- Memory freed while a pool is not yet protected will be reused.
+
+- Once a pool is protected, it's not possible to allocate any more memory
+  from it.
+
+- Memory "freed" from a protected pool indicates that such memory is not
+  in use anymore by the requester; however, it will not become available
+  for further use, until the pool is destroyed.
+
+- Before destroying a pool, all the memory allocated from it must be
+  released.
+
+- pmalloc does not provide locking support with respect to allocating vs
+  protecting an individual pool, for performance reasons.
+  It is recommended not to share the same pool between unrelated functions.
+  Should sharing be a necessity, the user of the shared pool is expected
+  to implement locking for that pool.
+
+- pmalloc uses genalloc to optimize the use of the space it allocates
+  through vmalloc. Some more TLB entries will be used, however less than
+  in the case of using vmalloc directly. The exact number depends on the
+  size of each allocation request and possible slack.
+
+- Considering that not much data is supposed to be dynamically allocated
+  and then marked as read-only, it shouldn't be an issue that the address
+  range for pmalloc is limited, on 32-bit systems.
+
+- Regarding SMP systems, the allocations are expected to happen mostly
+  during an initial transient, after which there should be no more need to
+  perform cross-processor synchronizations of page tables.
+
+- To facilitate the conversion of existing code to pmalloc pools, several
+  helper functions are provided, mirroring their kmalloc counterparts.
+
+
+Use
+---
+
+The typical sequence, when using pmalloc, is:
+
+1. create a pool
+
+.. kernel-doc:: include/linux/pmalloc.h
+   :functions: pmalloc_create_pool
+
+2. [optional] pre-allocate some memory in the pool
+
+.. kernel-doc:: include/linux/pmalloc.h
+   :functions: pmalloc_prealloc
+
+3. issue one or more allocation requests to the pool with locking as needed
+
+.. kernel-doc:: include/linux/pmalloc.h
+   :functions: pmalloc
+
+.. kernel-doc:: include/linux/pmalloc.h
+   :functions: pzalloc
+
+4. initialize the memory obtained with desired values
+
+5. [optional] repeat steps 3 and 4 as needed
+
+6. write-protect the pool
+
+.. kernel-doc:: include/linux/pmalloc.h
+   :functions: pmalloc_protect_pool
+
+7. use the handles obtained through the allocations, in read-only mode
+
+8. [optional] release all the memory allocated
+
+.. kernel-doc:: include/linux/pmalloc.h
+   :functions: pfree
+
+9. [optional, requires step 8] destroy the pool
+
+.. kernel-doc:: include/linux/pmalloc.h
+   :functions: pmalloc_destroy_pool
+
+API
+---
+
+.. kernel-doc:: include/linux/pmalloc.h
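The usage sequence and caveats documented above can be approximated in userspace, which may help when reasoning about them. The following is a minimal sketch only. It is not pmalloc and does not use its API. It assumes a POSIX system and models a pool as one mmap()ed region with a bump allocator that is made read-only with mprotect(). The toy_pool_* names are hypothetical. As in the documentation, protection is one-way, and allocation fails once the pool is protected. A write through a pointer obtained earlier would fault (SIGSEGV) instead of succeeding.

```c
#define _DEFAULT_SOURCE		/* exposes MAP_ANONYMOUS with glibc */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical toy pool; none of these names belong to pmalloc. */
struct toy_pool {
	unsigned char *base;	/* start of the mmap()ed region */
	size_t size;		/* region size in bytes */
	size_t used;		/* bytes handed out so far */
	bool is_protected;	/* set once, never cleared */
};

/* Step 1: create a pool backed by @pages pages of anonymous memory. */
static int toy_pool_create(struct toy_pool *pool, size_t pages)
{
	long page = sysconf(_SC_PAGESIZE);
	void *mem;

	if (page <= 0)
		return -1;
	pool->size = pages * (size_t)page;
	mem = mmap(NULL, pool->size, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (mem == MAP_FAILED)
		return -1;
	pool->base = mem;
	pool->used = 0;
	pool->is_protected = false;
	return 0;
}

/* Step 3: bump allocation, refused once the pool is write-protected. */
static void *toy_pool_alloc(struct toy_pool *pool, size_t size)
{
	size_t aligned;
	void *p;

	if (pool->is_protected || size > pool->size)
		return NULL;
	aligned = (size + 7) & ~(size_t)7;	/* keep 8-byte alignment */
	if (aligned > pool->size - pool->used)
		return NULL;
	p = pool->base + pool->used;
	pool->used += aligned;
	return p;
}

/* Step 6: one-way transition to read-only; there is no "unprotect". */
static int toy_pool_protect(struct toy_pool *pool)
{
	if (mprotect(pool->base, pool->size, PROT_READ) != 0)
		return -1;
	pool->is_protected = true;
	return 0;
}

/* Step 9: release the whole region at once. */
static void toy_pool_destroy(struct toy_pool *pool)
{
	munmap(pool->base, pool->size);
	pool->base = NULL;
	pool->size = 0;
	pool->used = 0;
}
```

The toy has no per-object free, so its memory only comes back when the whole pool is destroyed. By contrast, the caveats above say that pmalloc does reuse memory freed before the pool is protected.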
-- 
2.14.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* [PATCH 6/6] Documentation for Pmalloc
@ 2018-02-11  3:19   ` Igor Stoppa
  0 siblings, 0 replies; 146+ messages in thread
From: Igor Stoppa @ 2018-02-11  3:19 UTC (permalink / raw)
  To: willy, rdunlap, corbet, keescook, mhocko, labbott, jglisse, hch
  Cc: cl, linux-security-module, linux-mm, linux-kernel,
	kernel-hardening, Igor Stoppa

Detailed documentation about the protectable memory allocator.

Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com>
---
 Documentation/core-api/index.rst   |   1 +
 Documentation/core-api/pmalloc.rst | 114 +++++++++++++++++++++++++++++++++++++
 2 files changed, 115 insertions(+)
 create mode 100644 Documentation/core-api/pmalloc.rst

diff --git a/Documentation/core-api/index.rst b/Documentation/core-api/index.rst
index c670a8031786..8f5de42d6571 100644
--- a/Documentation/core-api/index.rst
+++ b/Documentation/core-api/index.rst
@@ -25,6 +25,7 @@ Core utilities
    genalloc
    errseq
    printk-formats
+   pmalloc
 
 Interfaces for kernel debugging
 ===============================
diff --git a/Documentation/core-api/pmalloc.rst b/Documentation/core-api/pmalloc.rst
new file mode 100644
index 000000000000..1afc13b6a74d
--- /dev/null
+++ b/Documentation/core-api/pmalloc.rst
@@ -0,0 +1,114 @@
+SPDX-License-Identifier: CC-BY-SA-4.0
+
+Protectable memory allocator
+============================
+
+Purpose
+-------
+
+The pmalloc library is meant to provide R/O status to data that, for some
+reason, can neither be declared as constant nor take advantage of the
+__ro_after_init qualifier, but is write-once and read-only in spirit.
+It protects data from both accidental and malicious overwrites.
+
+Example: A policy that is loaded from userspace.
+
+
+Concept
+-------
+
+pmalloc builds on top of genalloc, using the same concept of memory pools.
+
+The value added by pmalloc is that the memory contained in a pool can be
+made R/O for the rest of the life of the pool.
+
+Different kernel drivers and threads can use different pools, both for
+finer control of what becomes R/O and when, and for improved lockless
+concurrency.
+
+
+Caveats
+-------
+
+- Memory freed while a pool is not yet protected will be reused.
+
+- Once a pool is protected, it's not possible to allocate any more memory
+  from it.
+
+- Memory "freed" from a protected pool indicates that such memory is not
+  in use anymore by the requester; however, it will not become available
+  for further use until the pool is destroyed.
+
+- Before destroying a pool, all the memory allocated from it must be
+  released.
+
+- For performance reasons, pmalloc does not provide locking between
+  allocating from and protecting an individual pool.
+  It is recommended not to share the same pool between unrelated functions.
+  Should sharing be necessary, the users of the shared pool are expected
+  to implement locking for that pool.
+
+- pmalloc uses genalloc to optimize the use of the space it allocates
+  through vmalloc. Some additional TLB entries will be used, but fewer
+  than when using vmalloc directly. The exact number depends on the
+  size of each allocation request and on possible slack.
+
+- Since only a small amount of data is expected to be dynamically
+  allocated and then marked as read-only, the limited address range
+  available to pmalloc on 32-bit systems should not be an issue.
+
+- On SMP systems, allocations are expected to happen mostly during an
+  initial transient phase, after which there should be no further need
+  to synchronize page tables across processors.
+
+- To facilitate the conversion of existing code to pmalloc pools, several
+  helper functions are provided, mirroring their kmalloc counterparts.
+
+
+Use
+---
+
+The typical sequence, when using pmalloc, is:
+
+1. create a pool
+
+.. kernel-doc:: include/linux/pmalloc.h
+   :functions: pmalloc_create_pool
+
+2. [optional] pre-allocate some memory in the pool
+
+.. kernel-doc:: include/linux/pmalloc.h
+   :functions: pmalloc_prealloc
+
+3. issue one or more allocation requests to the pool with locking as needed
+
+.. kernel-doc:: include/linux/pmalloc.h
+   :functions: pmalloc
+
+.. kernel-doc:: include/linux/pmalloc.h
+   :functions: pzalloc
+
+4. initialize the memory obtained with desired values
+
+5. [optional] repeat steps 3 and 4 as needed
+
+6. write-protect the pool
+
+.. kernel-doc:: include/linux/pmalloc.h
+   :functions: pmalloc_protect_pool
+
+7. use the handles obtained through the allocations, in read-only mode
+
+8. [optional] release all the memory allocated
+
+.. kernel-doc:: include/linux/pmalloc.h
+   :functions: pfree
+
+9. [optional, requires step 8] destroy the pool
+
+.. kernel-doc:: include/linux/pmalloc.h
+   :functions: pmalloc_destroy_pool
+
+API
+---
+
+.. kernel-doc:: include/linux/pmalloc.h
-- 
2.14.1

* Re: [PATCH 1/6] genalloc: track beginning of allocations
  2018-02-11  3:19   ` Igor Stoppa
  (?)
@ 2018-02-11 12:24     ` Mike Rapoport
  -1 siblings, 0 replies; 146+ messages in thread
From: Mike Rapoport @ 2018-02-11 12:24 UTC (permalink / raw)
  To: Igor Stoppa
  Cc: willy, rdunlap, corbet, keescook, mhocko, labbott, jglisse, hch,
	cl, linux-security-module, linux-mm, linux-kernel,
	kernel-hardening

On Sun, Feb 11, 2018 at 05:19:15AM +0200, Igor Stoppa wrote:
> The genalloc library is only capable of tracking if a certain unit of
> allocation is in use or not.
> 
> It is not capable of discerning where the memory associated to an
> allocation request begins and where it ends.
> 
> The reason is that units of allocations are tracked by using a bitmap,
> where each bit represents that the unit is either allocated (1) or
> available (0).
> 
> The user of the API must keep track of how much space was requested, if
> it ever needs to be freed.
> 
> This can cause errors to go undetected.
> Examples:
> * Only a subset of the memory provided to an allocation request is freed
> * The memory from a subsequent allocation is freed
> * The memory being freed doesn't start at the beginning of an
>   allocation.
> 
> The bitmap is used because it allows performing lockless read/write
> access, where this is supported by hw through cmpxchg.
> Similarly, it is possible to scan the bitmap for a sufficiently long
> sequence of zeros, to identify zones available for allocation.
> 
> This patch doubles the space reserved in the bitmap for each allocation,
> to track their beginning.
> 
> For details, see the documentation inside lib/genalloc.c
> 
> Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com>
> ---
>  include/linux/genalloc.h |   4 +-
>  lib/genalloc.c           | 527 ++++++++++++++++++++++++++++++++++-------------
>  2 files changed, 390 insertions(+), 141 deletions(-)
> 
> diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
> index 872f930f1b06..dcaa33e74b1c 100644
> --- a/include/linux/genalloc.h
> +++ b/include/linux/genalloc.h
> @@ -32,7 +32,7 @@
> 
>  #include <linux/types.h>
>  #include <linux/spinlock_types.h>
> -#include <linux/atomic.h>
> +#include <linux/slab.h>
> 
>  struct device;
>  struct device_node;
> @@ -76,7 +76,7 @@ struct gen_pool_chunk {
>  	phys_addr_t phys_addr;		/* physical starting address of memory chunk */
>  	unsigned long start_addr;	/* start address of memory chunk */
>  	unsigned long end_addr;		/* end address of memory chunk (inclusive) */
> -	unsigned long bits[0];		/* bitmap for allocating memory chunk */
> +	unsigned long entries[0];	/* bitmap for allocating memory chunk */
>  };
> 
>  /*
> diff --git a/lib/genalloc.c b/lib/genalloc.c
> index ca06adc4f445..044347163acb 100644
> --- a/lib/genalloc.c
> +++ b/lib/genalloc.c
> @@ -26,6 +26,74 @@
>   *
>   * This source code is licensed under the GNU General Public License,
>   * Version 2.  See the file COPYING for more details.
> + *
> + *
> + *
> + * Encoding of the bitmap tracking the allocations
> + * -----------------------------------------------
> + *
> + * The bitmap is composed of units of allocations.
> + *
> + * Each unit of allocation is represented using 2 consecutive bits.
> + *
> + * This makes it possible to encode, for each unit of allocation,
> + * information about:
> + *  - allocation status (busy/free)
> + *  - beginning of a sequence of allocation units (first / successive)
> + *
> + *
> + * Dictionary of allocation units (msb to the left, lsb to the right):
> + *
> + * 11: first allocation unit in the allocation
> + * 10: any subsequent allocation unit (if any) in the allocation
> + * 00: available allocation unit
> + * 01: invalid
> + *
> + * Example, using the same notation as above - MSb.......LSb:
> + *
> + *  ...000010111100000010101011   <-- Read in this direction.
> + *     \__|\__|\|\____|\______|
> + *        |   | |     |       \___ 4 used allocation units
> + *        |   | |     \___________ 3 empty allocation units
> + *        |   | \_________________ 1 used allocation unit
> + *        |   \___________________ 2 used allocation units
> + *        \_______________________ 2 empty allocation units
> + *
> + * The encoding allows for lockless operations, such as:
> + * - search for a sufficiently large range of allocation units
> + * - reservation of a selected range of allocation units
> + * - release of a specific allocation
> + *
> + * The alignment at which to search for a sequence of empty
> + * allocation units (marked as zeros in the bitmap) is 2^1.
> + *
> + * This means that an allocation can start only at even places
> + * (bit 0, bit 2, etc.) in the bitmap.
> + *
> + * Therefore, the number of zeroes to look for must be twice the number
> + * of desired allocation units.
> + *
> + * When it's time to free the memory associated to an allocation request,
> + * it's a matter of checking if the corresponding allocation unit is
> + * really the beginning of an allocation (both bits are set to 1).
> + *
> + * Looking for the ending can also be performed locklessly.
> + * It's sufficient to identify the first mapped allocation unit
> + * that is represented either as free (00) or busy (11).
> + * Even if the allocation status should change in the meanwhile, it
> + * doesn't matter, since it can only transition between free (00) and
> + * first-allocated (11).
> + *
> + * The parameter indicating to the *_free() function the size of the
> + * space that should be freed can be either set to 0, for automated
> + * assessment, or it can be specified explicitly.
> + *
> + * In case it is specified explicitly, the value is verified against what
> + * the library is tracking internally.
> + *
> + * If ever needed, the bitmap could be extended, assigning larger amounts
> + * of bits to each allocation unit (the increase must follow powers of 2),
> + * to track other properties of the allocations.
>   */
> 
>  #include <linux/slab.h>
> @@ -36,118 +104,230 @@
>  #include <linux/genalloc.h>
>  #include <linux/of_device.h>
> 
> +#define ENTRY_ORDER 1UL
> +#define ENTRY_MASK ((1UL << ((ENTRY_ORDER) + 1UL)) - 1UL)
> +#define ENTRY_HEAD ENTRY_MASK
> +#define ENTRY_UNUSED 0UL
> +#define BITS_PER_ENTRY (1U << ENTRY_ORDER)
> +#define BITS_DIV_ENTRIES(x) ((x) >> ENTRY_ORDER)
> +#define ENTRIES_TO_BITS(x) ((x) << ENTRY_ORDER)
> +#define BITS_DIV_LONGS(x) ((x) / BITS_PER_LONG)
> +#define ENTRIES_DIV_LONGS(x) (BITS_DIV_LONGS(ENTRIES_TO_BITS(x)))
> +
> +#define ENTRIES_PER_LONG BITS_DIV_ENTRIES(BITS_PER_LONG)
> +
> +/* Binary pattern of 1010...1010 that spans one unsigned long. */
> +#define MASK (~0UL / 3 * 2)
> +
> +/**
> + * get_bitmap_entry - extracts the specified entry from the bitmap
> + * @map: pointer to a bitmap
> + * @entry_index: the index of the desired entry in the bitmap
> + *
> + * Return: The requested bitmap entry.
> + */
> +static inline unsigned long get_bitmap_entry(unsigned long *map,
> +					    int entry_index)
> +{
> +	return (map[ENTRIES_DIV_LONGS(entry_index)] >>
> +		ENTRIES_TO_BITS(entry_index % ENTRIES_PER_LONG)) &
> +		ENTRY_MASK;
> +}
> +
> +
> +/**
> + * mem_to_units - convert references to memory into orders of allocation

Documentation/doc-guide/kernel-doc.rst recommends including brackets
for function comments. I haven't noticed any difference in the resulting
html, so I'm not sure if the brackets are actually required.
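The encoding described in the quoted comment can be checked directly against the comment's own example bitmap. The following minimal userspace sketch copies the quoted ENTRY_* macros and the shift-and-mask arithmetic of get_bitmap_entry(). It also adds a hypothetical allocation_length() helper that measures an allocation the way the comment describes. The helper first confirms the head (11), then stops at the first entry that is either free (00) or starts another allocation (11). It returns a length relative to its start argument and is only an illustration, not the patch's get_boundary(). The constant 0x0BC0AB is the comment's example pattern 000010111100000010101011 written in hex.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

/* Userspace copies of the quoted macros, with ENTRY_ORDER == 1. */
#define ENTRY_ORDER      1UL
#define ENTRY_MASK       ((1UL << ((ENTRY_ORDER) + 1UL)) - 1UL)	/* 0b11 */
#define ENTRY_HEAD       ENTRY_MASK					/* 11 */
#define ENTRY_UNUSED     0UL						/* 00 */
#define BITS_PER_LONG_U  (CHAR_BIT * sizeof(unsigned long))
#define ENTRIES_PER_LONG (BITS_PER_LONG_U >> ENTRY_ORDER)

/* Same arithmetic as get_bitmap_entry(): 2 bits per entry, LSb first. */
static unsigned long get_entry(const unsigned long *map, unsigned int i)
{
	return (map[i / ENTRIES_PER_LONG] >>
		((i % ENTRIES_PER_LONG) << ENTRY_ORDER)) & ENTRY_MASK;
}

/*
 * Hypothetical helper: length, in entries, of the allocation whose
 * head is at @start, scanning no further than @total entries.
 * Returns -EINVAL if @start is not a head (11) entry.
 */
static int allocation_length(const unsigned long *map,
			     unsigned int start, unsigned int total)
{
	unsigned int i;

	if (get_entry(map, start) != ENTRY_HEAD)
		return -EINVAL;
	for (i = start + 1; i < total; i++) {
		unsigned long e = get_entry(map, i);

		/* 11 starts the next allocation, 00 is free space. */
		if (e == ENTRY_HEAD || e == ENTRY_UNUSED)
			break;
	}
	return (int)(i - start);
}

/* The example from the comment: 0000 1011 11 000000 10101011 (MSb..LSb) */
static const unsigned long example_map[1] = { 0x0BC0ABUL };
```

On the example word, entry 0 heads a 4-unit allocation, entry 7 a 1-unit one, and entry 8 a 2-unit one, matching the annotations in the comment. Entries 1 and 4 are rejected because they are not heads.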

> + * @size: amount in bytes
> + * @order: power of 2 represented by each entry in the bitmap
> + *
> + * Returns the number of units representing the size.

Please s/Return/Return:/
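Two details of the quoted arithmetic are easy to check in isolation. First, mem_to_units() rounds a byte count up to whole allocation units. Second, MASK (~0UL / 3 * 2) is the 1010...10 pattern that marks every entry as busy, and alter_bitmap_ll() adds the low bit of the first entry to turn that entry into a head (11). The minimal userspace sketch below copies both expressions. head_word() is a hypothetical name for the value alter_bitmap_ll() computes before masking.

```c
#include <assert.h>
#include <limits.h>

/* Binary pattern 1010...1010 spanning one unsigned long, as quoted. */
#define MASK (~0UL / 3 * 2)

/* Copy of the quoted mem_to_units(): a partial unit counts as a whole. */
static unsigned long mem_to_units(unsigned long size, unsigned long order)
{
	return (size + (1UL << order) - 1) >> order;
}

/*
 * Hypothetical name for the first-word value built in alter_bitmap_ll():
 * every entry gets its "busy" bit (10), and the low bit of the entry at
 * @start_bit is added so that the head entry reads 11.
 */
static unsigned long head_word(unsigned long start_bit)
{
	return MASK | (1UL << (start_bit % (CHAR_BIT * sizeof(unsigned long))));
}
```

In alter_bitmap_ll() this value is then ANDed with the per-word mask, so entries outside the allocation are left untouched.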

> + */
> +static inline unsigned long mem_to_units(unsigned long size,
> +					 unsigned long order)
> +{
> +	return (size + (1UL << order) - 1) >> order;
> +}
> +
> +/**
> + * chunk_size - dimension of a chunk of memory, in bytes
> + * @chunk: pointer to the struct describing the chunk
> + *
> + * Return: The size of the chunk, in bytes.
> + */
>  static inline size_t chunk_size(const struct gen_pool_chunk *chunk)
>  {
>  	return chunk->end_addr - chunk->start_addr + 1;
>  }
> 
> -static int set_bits_ll(unsigned long *addr, unsigned long mask_to_set)
> +
> +/**
> + * set_bits_ll - according to the mask, sets the bits specified by
> + * value, at the address specified.
> + * @addr: where to write
> + * @mask: filter to apply for the bits to alter
> + * @value: actual configuration of bits to store
> + *
> + * Return: 0 upon success, -EBUSY otherwise
> + */
> +static int set_bits_ll(unsigned long *addr,
> +		       unsigned long mask, unsigned long value)
>  {
> -	unsigned long val, nval;
> +	unsigned long nval;
> +	unsigned long present;
> +	unsigned long target;
> 
>  	nval = *addr;
>  	do {
> -		val = nval;
> -		if (val & mask_to_set)
> +		present = nval;
> +		if (present & mask)
>  			return -EBUSY;
> +		target =  present | value;
>  		cpu_relax();
> -	} while ((nval = cmpxchg(addr, val, val | mask_to_set)) != val);
> -
> +	} while ((nval = cmpxchg(addr, present, target)) != target);
>  	return 0;
>  }
> 
> -static int clear_bits_ll(unsigned long *addr, unsigned long mask_to_clear)
> +
> +/**
> + * clear_bits_ll - according to the mask, clears the bits specified by
> + * value, at the address specified.
> + * @addr: where to write
> + * @mask: filter to apply for the bits to alter
> + * @value: actual configuration of bits to clear
> + *
> + * Return: 0 upon success, -EBUSY otherwise
> + */
> +static int clear_bits_ll(unsigned long *addr,
> +			 unsigned long mask, unsigned long value)
>  {
> -	unsigned long val, nval;
> +	unsigned long nval;
> +	unsigned long present;
> +	unsigned long target;
> 
>  	nval = *addr;
> +	present = nval;
> +	if (unlikely((present & mask) ^ value))
> +		return -EBUSY;
>  	do {
> -		val = nval;
> -		if ((val & mask_to_clear) != mask_to_clear)
> +		present = nval;
> +		if (unlikely((present & mask) ^ value))
>  			return -EBUSY;
> +		target =  present & ~mask;
>  		cpu_relax();
> -	} while ((nval = cmpxchg(addr, val, val & ~mask_to_clear)) != val);
> -
> +	} while ((nval = cmpxchg(addr, present, target)) != target);
>  	return 0;
>  }
> 
> -/*
> - * bitmap_set_ll - set the specified number of bits at the specified position
> +
> +/**
> + * get_boundary - verify that an allocation effectively
> + * starts at the given address, then measure its length.
>   * @map: pointer to a bitmap
> - * @start: a bit position in @map
> - * @nr: number of bits to set
> + * @start_entry: the index of the first entry in the bitmap
> + * @nentries: number of entries to alter
>   *
> - * Set @nr bits start from @start in @map lock-lessly. Several users
> - * can set/clear the same bitmap simultaneously without lock. If two
> - * users set the same bit, one user will return remain bits, otherwise
> - * return 0.
> + * Return: the length of an allocation, otherwise -EINVAL if the
> + * parameters do not refer to a correct allocation.
>   */
> -static int bitmap_set_ll(unsigned long *map, int start, int nr)
> +static int get_boundary(unsigned long *map, int start_entry, int nentries)
>  {
> -	unsigned long *p = map + BIT_WORD(start);
> -	const int size = start + nr;
> -	int bits_to_set = BITS_PER_LONG - (start % BITS_PER_LONG);
> -	unsigned long mask_to_set = BITMAP_FIRST_WORD_MASK(start);
> -
> -	while (nr - bits_to_set >= 0) {
> -		if (set_bits_ll(p, mask_to_set))
> -			return nr;
> -		nr -= bits_to_set;
> -		bits_to_set = BITS_PER_LONG;
> -		mask_to_set = ~0UL;
> -		p++;
> -	}
> -	if (nr) {
> -		mask_to_set &= BITMAP_LAST_WORD_MASK(size);
> -		if (set_bits_ll(p, mask_to_set))
> -			return nr;
> -	}
> +	int i;
> +	unsigned long bitmap_entry;
> 
> -	return 0;
> +
> +	if (unlikely(get_bitmap_entry(map, start_entry) != ENTRY_HEAD))
> +		return -EINVAL;
> +	for (i = start_entry + 1; i < nentries; i++) {
> +		bitmap_entry = get_bitmap_entry(map, i);
> +		if (bitmap_entry == ENTRY_HEAD ||
> +		    bitmap_entry == ENTRY_UNUSED)
> +			return i;
> +	}
> +	return nentries - start_entry;
>  }
> 
> +
> +#define SET_BITS 1
> +#define CLEAR_BITS 0
> +
>  /*
> - * bitmap_clear_ll - clear the specified number of bits at the specified position
> + * alter_bitmap_ll - set or clear the entries associated with an allocation
> + * @alteration: indicates if the bits selected should be set or cleared
>   * @map: pointer to a bitmap
> - * @start: a bit position in @map
> - * @nr: number of bits to set
> + * @start: the index of the first entry in the bitmap
> + * @nentries: number of entries to alter
> + *
> + * The modification happens lock-lessly.
> + * Several users can write to the same map simultaneously, without lock.
>   *
> - * Clear @nr bits start from @start in @map lock-lessly. Several users
> - * can set/clear the same bitmap simultaneously without lock. If two
> - * users clear the same bit, one user will return remain bits,
> - * otherwise return 0.
> + * Return: If two users alter the same bit, to one it will return
> + * remaining entries, to the other it will return 0.

And what if there are three or four concurrent users? ;-)

I believe that a more elaborate description of what happens with
concurrent attempts to alter the bitmap would be really helpful.
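The following minimal userspace sketch makes the concurrency question concrete by modelling the per-word primitives. It uses C11 <stdatomic.h> in place of the kernel's cmpxchg(), which is an assumption made so the sketch runs outside the kernel, and the *_sketch names are hypothetical.

On a single word, however many users race with overlapping masks, only the first successful compare-and-exchange claims the bits. Every other user reloads the word, finds the selected bits already set, and returns -EBUSY. An allocation spanning several words can lose on a later word after having claimed earlier ones. In the quoted gen_pool_alloc_algo(), the loser then rolls those words back with CLEAR_BITS and retries. Here, success is taken from the boolean result of the compare-exchange, i.e. whether the word still held the expected old value.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/*
 * Sketch of the set_bits_ll() contract: fail with -EBUSY if any bit
 * selected by @mask is already set, otherwise OR @value into the word.
 */
static int set_bits_sketch(_Atomic unsigned long *addr,
			   unsigned long mask, unsigned long value)
{
	unsigned long present = atomic_load(addr);

	do {
		if (present & mask)
			return -EBUSY;	/* another user already owns these bits */
		/* On failure, present is refreshed with the current word. */
	} while (!atomic_compare_exchange_weak(addr, &present,
					       present | value));
	return 0;
}

/*
 * Sketch of the clear_bits_ll() contract: clear the bits selected by
 * @mask only if they still hold exactly the expected @value pattern.
 */
static int clear_bits_sketch(_Atomic unsigned long *addr,
			     unsigned long mask, unsigned long value)
{
	unsigned long present = atomic_load(addr);

	do {
		if ((present & mask) != value)
			return -EBUSY;	/* not the allocation we expected */
	} while (!atomic_compare_exchange_weak(addr, &present,
					       present & ~mask));
	return 0;
}
```

For example, 0xB (binary 1011) encodes a 2-unit allocation in the lowest two entries: a head (11) followed by a subsequent unit (10). With either function, each call applies its whole per-word change or returns -EBUSY without modifying the word.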

>   */
> -static int bitmap_clear_ll(unsigned long *map, int start, int nr)
> +static int alter_bitmap_ll(bool alteration, unsigned long *map,
> +			   int start_entry, int nentries)
>  {
> -	unsigned long *p = map + BIT_WORD(start);
> -	const int size = start + nr;
> -	int bits_to_clear = BITS_PER_LONG - (start % BITS_PER_LONG);
> -	unsigned long mask_to_clear = BITMAP_FIRST_WORD_MASK(start);
> -
> -	while (nr - bits_to_clear >= 0) {
> -		if (clear_bits_ll(p, mask_to_clear))
> -			return nr;
> -		nr -= bits_to_clear;
> -		bits_to_clear = BITS_PER_LONG;
> -		mask_to_clear = ~0UL;
> -		p++;
> -	}
> -	if (nr) {
> -		mask_to_clear &= BITMAP_LAST_WORD_MASK(size);
> -		if (clear_bits_ll(p, mask_to_clear))
> -			return nr;
> +	unsigned long start_bit;
> +	unsigned long end_bit;
> +	unsigned long mask;
> +	unsigned long value;
> +	int nbits;
> +	int bits_to_write;
> +	int index;
> +	int (*action)(unsigned long *addr,
> +		      unsigned long mask, unsigned long value);
> +
> +	action = (alteration == SET_BITS) ? set_bits_ll : clear_bits_ll;
> +
> +	/*
> +	 * Prepare for writing the initial part of the allocation, from
> +	 * starting entry, to the end of the UL bitmap element which
> +	 * contains it. It might be larger than the actual allocation.
> +	 */
> +	start_bit = ENTRIES_TO_BITS(start_entry);
> +	end_bit = ENTRIES_TO_BITS(start_entry + nentries);
> +	nbits = ENTRIES_TO_BITS(nentries);
> +	bits_to_write = BITS_PER_LONG - start_bit % BITS_PER_LONG;
> +	mask = BITMAP_FIRST_WORD_MASK(start_bit);
> +	/* Mark the beginning of the allocation. */
> +	value = MASK | (1UL << (start_bit % BITS_PER_LONG));
> +	index = BITS_DIV_LONGS(start_bit);
> +
> +	/*
> +	 * Writes entries to the bitmap, as long as the remainder is
> +	 * positive or zero.
> +	 * Might be skipped if the entries to write do not reach the end
> +	 * of a bitmap UL unit.
> +	 */
> +	while (nbits >= bits_to_write) {
> +		if (action(map + index, mask, value & mask))
> +			return BITS_DIV_ENTRIES(nbits);
> +		nbits -= bits_to_write;
> +		bits_to_write = BITS_PER_LONG;
> +		mask = ~0UL;
> +		value = MASK;
> +		index++;
>  	}
> 
> +	/* Takes care of the ending part of the entries to mark. */
> +	if (nbits > 0) {
> +		mask ^= BITMAP_FIRST_WORD_MASK((end_bit) % BITS_PER_LONG);
> +		bits_to_write = nbits;
> +		if (action(map + index, mask, value & mask))
> +			return BITS_DIV_ENTRIES(nbits);
> +	}
>  	return 0;
>  }
> 
> +
>  /**
>   * gen_pool_create - create a new special memory pool
> - * @min_alloc_order: log base 2 of number of bytes each bitmap bit represents
> + * @min_alloc_order: log base 2 of number of bytes each bitmap entry represents
>   * @nid: node id of the node the pool structure should be allocated on, or -1
>   *
>   * Create a new special memory pool that can be used to manage special purpose
>   * memory not managed by the regular kmalloc/kfree interface.
> + *
> + * Return: pointer to the pool, if successful, NULL otherwise
>   */
>  struct gen_pool *gen_pool_create(int min_alloc_order, int nid)
>  {
> @@ -177,16 +357,18 @@ EXPORT_SYMBOL(gen_pool_create);
>   *
>   * Add a new chunk of special memory to the specified pool.
>   *
> - * Returns 0 on success or a -ve errno on failure.
> + * Return: 0 on success or a -ve errno on failure.
>   */
>  int gen_pool_add_virt(struct gen_pool *pool, unsigned long virt, phys_addr_t phys,
>  		 size_t size, int nid)
>  {
>  	struct gen_pool_chunk *chunk;
> -	int nbits = size >> pool->min_alloc_order;
> -	int nbytes = sizeof(struct gen_pool_chunk) +
> -				BITS_TO_LONGS(nbits) * sizeof(long);
> +	int nentries;
> +	int nbytes;
> 
> +	nentries = size >> pool->min_alloc_order;
> +	nbytes = sizeof(struct gen_pool_chunk) +
> +		 ENTRIES_DIV_LONGS(nentries) * sizeof(long);
>  	chunk = kzalloc_node(nbytes, GFP_KERNEL, nid);
>  	if (unlikely(chunk == NULL))
>  		return -ENOMEM;
> @@ -209,7 +391,7 @@ EXPORT_SYMBOL(gen_pool_add_virt);
>   * @pool: pool to allocate from
>   * @addr: starting address of memory
>   *
> - * Returns the physical address on success, or -1 on error.
> + * Return: the physical address on success, or -1 on error.
>   */
>  phys_addr_t gen_pool_virt_to_phys(struct gen_pool *pool, unsigned long addr)
>  {
> @@ -248,7 +430,7 @@ void gen_pool_destroy(struct gen_pool *pool)
>  		list_del(&chunk->next_chunk);
> 
>  		end_bit = chunk_size(chunk) >> order;
> -		bit = find_next_bit(chunk->bits, end_bit, 0);
> +		bit = find_next_bit(chunk->entries, end_bit, 0);
>  		BUG_ON(bit < end_bit);
> 
>  		kfree(chunk);
> @@ -267,6 +449,8 @@ EXPORT_SYMBOL(gen_pool_destroy);
>   * Uses the pool allocation function (with first-fit algorithm by default).
>   * Can not be used in NMI handler on architectures without
>   * NMI-safe cmpxchg implementation.
> + *
> + * Return: address of the memory allocated, otherwise NULL
>   */
>  unsigned long gen_pool_alloc(struct gen_pool *pool, size_t size)
>  {
> @@ -285,6 +469,8 @@ EXPORT_SYMBOL(gen_pool_alloc);
>   * Uses the pool allocation function (with first-fit algorithm by default).
>   * Can not be used in NMI handler on architectures without
>   * NMI-safe cmpxchg implementation.
> + *
> + * Return: address of the memory allocated, otherwise NULL
>   */
>  unsigned long gen_pool_alloc_algo(struct gen_pool *pool, size_t size,
>  		genpool_algo_t algo, void *data)
> @@ -292,7 +478,7 @@ unsigned long gen_pool_alloc_algo(struct gen_pool *pool, size_t size,
>  	struct gen_pool_chunk *chunk;
>  	unsigned long addr = 0;
>  	int order = pool->min_alloc_order;
> -	int nbits, start_bit, end_bit, remain;
> +	int nentries, start_entry, end_entry, remain;
> 
>  #ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
>  	BUG_ON(in_nmi());
> @@ -301,29 +487,32 @@ unsigned long gen_pool_alloc_algo(struct gen_pool *pool, size_t size,
>  	if (size == 0)
>  		return 0;
> 
> -	nbits = (size + (1UL << order) - 1) >> order;
> +	nentries = mem_to_units(size, order);
>  	rcu_read_lock();
>  	list_for_each_entry_rcu(chunk, &pool->chunks, next_chunk) {
>  		if (size > atomic_long_read(&chunk->avail))
>  			continue;
> 
> -		start_bit = 0;
> -		end_bit = chunk_size(chunk) >> order;
> +		start_entry = 0;
> +		end_entry = chunk_size(chunk) >> order;
>  retry:
> -		start_bit = algo(chunk->bits, end_bit, start_bit,
> -				 nbits, data, pool);
> -		if (start_bit >= end_bit)
> +		start_entry = algo(chunk->entries, end_entry, start_entry,
> +				  nentries, data, pool);
> +		if (start_entry >= end_entry)
>  			continue;
> -		remain = bitmap_set_ll(chunk->bits, start_bit, nbits);
> +		remain = alter_bitmap_ll(SET_BITS, chunk->entries,
> +					 start_entry, nentries);
>  		if (remain) {
> -			remain = bitmap_clear_ll(chunk->bits, start_bit,
> -						 nbits - remain);
> -			BUG_ON(remain);
> +			remain = alter_bitmap_ll(CLEAR_BITS,
> +						 chunk->entries,
> +						 start_entry,
> +						 nentries - remain);
>  			goto retry;
>  		}
> 
> -		addr = chunk->start_addr + ((unsigned long)start_bit << order);
> -		size = nbits << order;
> +		addr = chunk->start_addr +
> +			((unsigned long)start_entry << order);
> +		size = nentries << order;
>  		atomic_long_sub(size, &chunk->avail);
>  		break;
>  	}
> @@ -342,6 +531,8 @@ EXPORT_SYMBOL(gen_pool_alloc_algo);
>   * Uses the pool allocation function (with first-fit algorithm by default).
>   * Can not be used in NMI handler on architectures without
>   * NMI-safe cmpxchg implementation.
> + *
> + * Return: address of the memory allocated, otherwise NULL
>   */
>  void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size, dma_addr_t *dma)
>  {
> @@ -365,7 +556,7 @@ EXPORT_SYMBOL(gen_pool_dma_alloc);
>   * gen_pool_free - free allocated special memory back to the pool
>   * @pool: pool to free to
>   * @addr: starting address of memory to free back to pool
> - * @size: size in bytes of memory to free
> + * @size: size in bytes of memory to free or 0, for auto-detection
>   *
>   * Free previously allocated special memory back to the specified
>   * pool.  Can not be used in NMI handler on architectures without
> @@ -375,22 +566,29 @@ void gen_pool_free(struct gen_pool *pool, unsigned long addr, size_t size)
>  {
>  	struct gen_pool_chunk *chunk;
>  	int order = pool->min_alloc_order;
> -	int start_bit, nbits, remain;
> +	int start_entry, remaining_entries, nentries, remain;
> +	int boundary;
> 
>  #ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
>  	BUG_ON(in_nmi());
>  #endif
> 
> -	nbits = (size + (1UL << order) - 1) >> order;
>  	rcu_read_lock();
>  	list_for_each_entry_rcu(chunk, &pool->chunks, next_chunk) {
>  		if (addr >= chunk->start_addr && addr <= chunk->end_addr) {
>  			BUG_ON(addr + size - 1 > chunk->end_addr);
> -			start_bit = (addr - chunk->start_addr) >> order;
> -			remain = bitmap_clear_ll(chunk->bits, start_bit, nbits);
> +			start_entry = (addr - chunk->start_addr) >> order;
> +			remaining_entries = (chunk->end_addr - addr) >> order;
> +			boundary = get_boundary(chunk->entries, start_entry,
> +						remaining_entries);
> +			BUG_ON(boundary < 0);
> +			nentries = boundary - start_entry;
> +			BUG_ON(size &&
> +			       (nentries != mem_to_units(size, order)));
> +			remain = alter_bitmap_ll(CLEAR_BITS, chunk->entries,
> +						 start_entry, nentries);
>  			BUG_ON(remain);
> -			size = nbits << order;
> -			atomic_long_add(size, &chunk->avail);
> +			atomic_long_add(nentries << order, &chunk->avail);
>  			rcu_read_unlock();
>  			return;
>  		}
> @@ -428,8 +626,9 @@ EXPORT_SYMBOL(gen_pool_for_each_chunk);
>   * @start:	start address
>   * @size:	size of the region
>   *
> - * Check if the range of addresses falls within the specified pool. Returns
> - * true if the entire range is contained in the pool and false otherwise.
> + * Check if the range of addresses falls within the specified pool.
> + *
> + * Return: true if the entire range is contained in the pool, false otherwise.
>   */
>  bool addr_in_gen_pool(struct gen_pool *pool, unsigned long start,
>  			size_t size)
> @@ -455,7 +654,7 @@ bool addr_in_gen_pool(struct gen_pool *pool, unsigned long start,
>   * gen_pool_avail - get available free space of the pool
>   * @pool: pool to get available free space
>   *
> - * Return available free space of the specified pool.
> + * Return: available free space of the specified pool.
>   */
>  size_t gen_pool_avail(struct gen_pool *pool)
>  {
> @@ -474,7 +673,7 @@ EXPORT_SYMBOL_GPL(gen_pool_avail);
>   * gen_pool_size - get size in bytes of memory managed by the pool
>   * @pool: pool to get size
>   *
> - * Return size in bytes of memory managed by the pool.
> + * Return: size in bytes of memory managed by the pool.
>   */
>  size_t gen_pool_size(struct gen_pool *pool)
>  {
> @@ -517,17 +716,27 @@ EXPORT_SYMBOL(gen_pool_set_algo);
>   * gen_pool_first_fit - find the first available region
>   * of memory matching the size requirement (no alignment constraint)
>   * @map: The address to base the search on
> - * @size: The bitmap size in bits
> - * @start: The bitnumber to start searching at
> - * @nr: The number of zeroed bits we're looking for
> + * @size: The number of allocation units in the bitmap
> + * @start: The allocation unit to start searching at
> + * @nr: The number of allocation units we're looking for
>   * @data: additional data - unused
>   * @pool: pool to find the fit region memory from
> + *
> + * Return: index of the memory allocated, otherwise the end of the range
>   */
>  unsigned long gen_pool_first_fit(unsigned long *map, unsigned long size,
>  		unsigned long start, unsigned int nr, void *data,
>  		struct gen_pool *pool)
>  {
> -	return bitmap_find_next_zero_area(map, size, start, nr, 0);
> +	unsigned long align_mask;
> +	unsigned long bit_index;
> +
> +	align_mask = roundup_pow_of_two(BITS_PER_ENTRY) - 1;
> +	bit_index = bitmap_find_next_zero_area(map, ENTRIES_TO_BITS(size),
> +					       ENTRIES_TO_BITS(start),
> +					       ENTRIES_TO_BITS(nr),
> +					       align_mask);
> +	return BITS_DIV_ENTRIES(bit_index);
>  }
>  EXPORT_SYMBOL(gen_pool_first_fit);
> 
> @@ -535,11 +744,13 @@ EXPORT_SYMBOL(gen_pool_first_fit);
>   * gen_pool_first_fit_align - find the first available region
>   * of memory matching the size requirement (alignment constraint)
>   * @map: The address to base the search on
> - * @size: The bitmap size in bits
> - * @start: The bitnumber to start searching at
> - * @nr: The number of zeroed bits we're looking for
> + * @size: The number of allocation units in the bitmap
> + * @start: The allocation unit to start searching at
> + * @nr: The number of allocation units we're looking for
>   * @data: data for alignment
>   * @pool: pool to get order from
> + *
> + * Return: index of the memory allocated, otherwise the end of the range
>   */
>  unsigned long gen_pool_first_fit_align(unsigned long *map, unsigned long size,
>  		unsigned long start, unsigned int nr, void *data,
> @@ -547,23 +758,32 @@ unsigned long gen_pool_first_fit_align(unsigned long *map, unsigned long size,
>  {
>  	struct genpool_data_align *alignment;
>  	unsigned long align_mask;
> +	unsigned long bit_index;
>  	int order;
> 
>  	alignment = data;
>  	order = pool->min_alloc_order;
> -	align_mask = ((alignment->align + (1UL << order) - 1) >> order) - 1;
> -	return bitmap_find_next_zero_area(map, size, start, nr, align_mask);
> +	align_mask = roundup_pow_of_two(
> +			ENTRIES_TO_BITS(mem_to_units(alignment->align,
> +						     order))) - 1;
> +	bit_index = bitmap_find_next_zero_area(map, ENTRIES_TO_BITS(size),
> +					       ENTRIES_TO_BITS(start),
> +					       ENTRIES_TO_BITS(nr),
> +					       align_mask);
> +	return BITS_DIV_ENTRIES(bit_index);
>  }
>  EXPORT_SYMBOL(gen_pool_first_fit_align);
> 
>  /**
>   * gen_pool_fixed_alloc - reserve a specific region
>   * @map: The address to base the search on
> - * @size: The bitmap size in bits
> - * @start: The bitnumber to start searching at
> - * @nr: The number of zeroed bits we're looking for
> + * @size: The number of allocation units in the bitmap
> + * @start: The allocation unit to start searching at
> + * @nr: The number of allocation units we're looking for
>   * @data: data for alignment
>   * @pool: pool to get order from
> + *
> + * Return: index of the memory allocated, otherwise the end of the range
>   */
>  unsigned long gen_pool_fixed_alloc(unsigned long *map, unsigned long size,
>  		unsigned long start, unsigned int nr, void *data,
> @@ -571,20 +791,23 @@ unsigned long gen_pool_fixed_alloc(unsigned long *map, unsigned long size,
>  {
>  	struct genpool_data_fixed *fixed_data;
>  	int order;
> -	unsigned long offset_bit;
> -	unsigned long start_bit;
> +	unsigned long offset;
> +	unsigned long align_mask;
> +	unsigned long bit_index;
> 
>  	fixed_data = data;
>  	order = pool->min_alloc_order;
> -	offset_bit = fixed_data->offset >> order;
>  	if (WARN_ON(fixed_data->offset & ((1UL << order) - 1)))
>  		return size;
> +	offset = fixed_data->offset >> order;
> +	align_mask = roundup_pow_of_two(BITS_PER_ENTRY) - 1;
> +	bit_index = bitmap_find_next_zero_area(map, ENTRIES_TO_BITS(size),
> +					       ENTRIES_TO_BITS(start + offset),
> +					       ENTRIES_TO_BITS(nr), align_mask);
> +	if (bit_index != ENTRIES_TO_BITS(offset))
> +		return size;
> 
> -	start_bit = bitmap_find_next_zero_area(map, size,
> -			start + offset_bit, nr, 0);
> -	if (start_bit != offset_bit)
> -		start_bit = size;
> -	return start_bit;
> +	return BITS_DIV_ENTRIES(bit_index);
>  }
>  EXPORT_SYMBOL(gen_pool_fixed_alloc);
> 
> @@ -593,60 +816,84 @@ EXPORT_SYMBOL(gen_pool_fixed_alloc);
>   * of memory matching the size requirement. The region will be aligned
>   * to the order of the size specified.
>   * @map: The address to base the search on
> - * @size: The bitmap size in bits
> - * @start: The bitnumber to start searching at
> - * @nr: The number of zeroed bits we're looking for
> + * @size: The number of allocation units in the bitmap
> + * @start: The allocation unit to start searching at
> + * @nr: The number of allocation units we're looking for
>   * @data: additional data - unused
>   * @pool: pool to find the fit region memory from
> + *
> + * Return: index of the memory allocated, otherwise the end of the range
>   */
>  unsigned long gen_pool_first_fit_order_align(unsigned long *map,
>  		unsigned long size, unsigned long start,
>  		unsigned int nr, void *data, struct gen_pool *pool)
>  {
> -	unsigned long align_mask = roundup_pow_of_two(nr) - 1;
> -
> -	return bitmap_find_next_zero_area(map, size, start, nr, align_mask);
> +	unsigned long align_mask;
> +	unsigned long bit_index;
> +
> +	align_mask = roundup_pow_of_two(ENTRIES_TO_BITS(nr)) - 1;
> +	bit_index = bitmap_find_next_zero_area(map, ENTRIES_TO_BITS(size),
> +					       ENTRIES_TO_BITS(start),
> +					       ENTRIES_TO_BITS(nr),
> +					       align_mask);
> +	return BITS_DIV_ENTRIES(bit_index);
>  }
>  EXPORT_SYMBOL(gen_pool_first_fit_order_align);
> 
>  /**
>   * gen_pool_best_fit - find the best fitting region of memory
> - * macthing the size requirement (no alignment constraint)
> + * matching the size requirement (no alignment constraint)
>   * @map: The address to base the search on
> - * @size: The bitmap size in bits
> - * @start: The bitnumber to start searching at
> - * @nr: The number of zeroed bits we're looking for
> + * @size: The number of allocation units in the bitmap
> + * @start: The allocation unit to start searching at
> + * @nr: The number of allocation units we're looking for
>   * @data: additional data - unused
>   * @pool: pool to find the fit region memory from
>   *
>   * Iterate over the bitmap to find the smallest free region
>   * which we can allocate the memory.
> + *
> + * Return: index of the memory allocated, otherwise the end of the range
>   */
>  unsigned long gen_pool_best_fit(unsigned long *map, unsigned long size,
>  		unsigned long start, unsigned int nr, void *data,
>  		struct gen_pool *pool)
>  {
> -	unsigned long start_bit = size;
> +	unsigned long start_bit = ENTRIES_TO_BITS(size);
>  	unsigned long len = size + 1;
>  	unsigned long index;
> +	unsigned long align_mask;
> +	unsigned long bit_index;
> 
> -	index = bitmap_find_next_zero_area(map, size, start, nr, 0);
> +	align_mask = roundup_pow_of_two(BITS_PER_ENTRY) - 1;
> +	bit_index = bitmap_find_next_zero_area(map, ENTRIES_TO_BITS(size),
> +					       ENTRIES_TO_BITS(start),
> +					       ENTRIES_TO_BITS(nr),
> +					       align_mask);
> +	index = BITS_DIV_ENTRIES(bit_index);
> 
>  	while (index < size) {
> -		int next_bit = find_next_bit(map, size, index + nr);
> -		if ((next_bit - index) < len) {
> -			len = next_bit - index;
> -			start_bit = index;
> +		int next_bit;
> +
> +		next_bit = find_next_bit(map, ENTRIES_TO_BITS(size),
> +					 ENTRIES_TO_BITS(index + nr));
> +		if ((BITS_DIV_ENTRIES(next_bit) - index) < len) {
> +			len = BITS_DIV_ENTRIES(next_bit) - index;
> +			start_bit = ENTRIES_TO_BITS(index);
>  			if (len == nr)
> -				return start_bit;
> +				return BITS_DIV_ENTRIES(start_bit);
>  		}
> -		index = bitmap_find_next_zero_area(map, size,
> -						   next_bit + 1, nr, 0);
> +		bit_index =
> +			bitmap_find_next_zero_area(map,
> +						   ENTRIES_TO_BITS(size),
> +						   next_bit + 1,
> +						   ENTRIES_TO_BITS(nr),
> +						   align_mask);
> +		index = BITS_DIV_ENTRIES(bit_index);
>  	}
> 
> -	return start_bit;
> +	return BITS_DIV_ENTRIES(start_bit);
>  }
> -EXPORT_SYMBOL(gen_pool_best_fit);
> 
>  static void devm_gen_pool_release(struct device *dev, void *res)
>  {
> @@ -672,7 +919,7 @@ static int devm_gen_pool_match(struct device *dev, void *res, void *data)
>   * @dev: device to retrieve the gen_pool from
>   * @name: name of a gen_pool or NULL, identifies a particular gen_pool on device
>   *
> - * Returns the gen_pool for the device if one is present, or NULL.
> + * Return: the gen_pool for the device if one is present, or NULL.
>   */
>  struct gen_pool *gen_pool_get(struct device *dev, const char *name)
>  {
> @@ -696,6 +943,8 @@ EXPORT_SYMBOL_GPL(gen_pool_get);
>   * Create a new special memory pool that can be used to manage special purpose
>   * memory not managed by the regular kmalloc/kfree interface. The pool will be
>   * automatically destroyed by the device management code.
> + *
> + * Return: the address of the pool, if successful, otherwise NULL
>   */
>  struct gen_pool *devm_gen_pool_create(struct device *dev, int min_alloc_order,
>  				      int nid, const char *name)
> @@ -743,7 +992,7 @@ EXPORT_SYMBOL(devm_gen_pool_create);
>   * @propname: property name containing phandle(s)
>   * @index: index into the phandle array
>   *
> - * Returns the pool that contains the chunk starting at the physical
> + * Return: the pool that contains the chunk starting at the physical
>   * address of the device tree node pointed at by the phandle property,
>   * or NULL if not found.
>   */
> -- 
> 2.14.1
> 

-- 
Sincerely yours,
Mike.
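
The hunks quoted above change every genalloc search algorithm in the same way. The arguments arrive in allocation units and are converted to bit positions with ENTRIES_TO_BITS() (two bits per unit). The zero-area search is then forced onto even bits through align_mask, and the resulting bit index is converted back with BITS_DIV_ENTRIES(). Below is a minimal user-space sketch of that round trip. It assumes a single-word bitmap and uses a hypothetical find_zero_area() as a stand-in for the kernel's bitmap_find_next_zero_area(). It is not the patch's code.

```c
#include <assert.h>
#include <limits.h>

#define ENTRY_ORDER 1UL
#define BITS_PER_ENTRY (1UL << ENTRY_ORDER)	/* 2 bits per unit */
#define ENTRIES_TO_BITS(x) ((x) << ENTRY_ORDER)
#define BITS_DIV_ENTRIES(x) ((x) >> ENTRY_ORDER)

/*
 * Hypothetical stand-in for bitmap_find_next_zero_area(), restricted to
 * a single word: return the first bit index >= @start, aligned to
 * (@align_mask + 1), at which @nr consecutive bits are clear, or @size
 * if there is no such area.
 */
static unsigned long find_zero_area(unsigned long map, unsigned long size,
				    unsigned long start, unsigned long nr,
				    unsigned long align_mask)
{
	unsigned long i, j;

	assert(size <= sizeof(map) * CHAR_BIT);
	for (i = (start + align_mask) & ~align_mask; i + nr <= size;
	     i = (i + 1 + align_mask) & ~align_mask) {
		for (j = 0; j < nr; j++)
			if (map & (1UL << (i + j)))
				break;
		if (j == nr)
			return i;
	}
	return size;
}

/* First fit in allocation units, using the same conversions as the patch. */
static unsigned long first_fit_entries(unsigned long map, unsigned long size,
				       unsigned long start, unsigned long nr)
{
	/* roundup_pow_of_two(2) - 1 == 1: areas may only start on even bits */
	unsigned long align_mask = BITS_PER_ENTRY - 1;
	unsigned long bit = find_zero_area(map, ENTRIES_TO_BITS(size),
					   ENTRIES_TO_BITS(start),
					   ENTRIES_TO_BITS(nr), align_mask);

	return BITS_DIV_ENTRIES(bit);
}
```

With units 0 and 3 already taken (map 0xC3), a 2-unit request fits at unit 1 and a 3-unit request lands at unit 4. When nothing fits, the bit-level "not found" value (the size in bits) converts back to the size in units. This matches the "otherwise the end of the range" convention in the new Return: lines.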


* [PATCH 1/6] genalloc: track beginning of allocations
@ 2018-02-11 12:24     ` Mike Rapoport
From: Mike Rapoport @ 2018-02-11 12:24 UTC
  To: linux-security-module

On Sun, Feb 11, 2018 at 05:19:15AM +0200, Igor Stoppa wrote:
> The genalloc library is only capable of tracking if a certain unit of
> allocation is in use or not.
> 
> It is not capable of discerning where the memory associated to an
> allocation request begins and where it ends.
> 
> The reason is that units of allocations are tracked by using a bitmap,
> where each bit represents that the unit is either allocated (1) or
> available (0).
> 
> The user of the API must keep track of how much space was requested, if
> it ever needs to be freed.
> 
> This can cause errors to go undetected.
> Examples:
> * Only a subset of the memory provided to an allocation request is freed
> * The memory from a subsequent allocation is freed
> * The memory being freed doesn't start at the beginning of an
>   allocation.
> 
> The bitmap is used because it allows lockless read/write
> access, where this is supported by hw through cmpxchg.
> Similarly, it is possible to scan the bitmap for a sufficiently long
> sequence of zeros, to identify zones available for allocation.
> 
> This patch doubles the space reserved in the bitmap for each allocation,
> to track their beginning.
> 
> For details, see the documentation inside lib/genalloc.c
> 
> Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com>
> ---
>  include/linux/genalloc.h |   4 +-
>  lib/genalloc.c           | 527 ++++++++++++++++++++++++++++++++++-------------
>  2 files changed, 390 insertions(+), 141 deletions(-)
> 
> diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
> index 872f930f1b06..dcaa33e74b1c 100644
> --- a/include/linux/genalloc.h
> +++ b/include/linux/genalloc.h
> @@ -32,7 +32,7 @@
> 
>  #include <linux/types.h>
>  #include <linux/spinlock_types.h>
> -#include <linux/atomic.h>
> +#include <linux/slab.h>
> 
>  struct device;
>  struct device_node;
> @@ -76,7 +76,7 @@ struct gen_pool_chunk {
>  	phys_addr_t phys_addr;		/* physical starting address of memory chunk */
>  	unsigned long start_addr;	/* start address of memory chunk */
>  	unsigned long end_addr;		/* end address of memory chunk (inclusive) */
> -	unsigned long bits[0];		/* bitmap for allocating memory chunk */
> +	unsigned long entries[0];	/* bitmap for allocating memory chunk */
>  };
> 
>  /*
> diff --git a/lib/genalloc.c b/lib/genalloc.c
> index ca06adc4f445..044347163acb 100644
> --- a/lib/genalloc.c
> +++ b/lib/genalloc.c
> @@ -26,6 +26,74 @@
>   *
>   * This source code is licensed under the GNU General Public License,
>   * Version 2.  See the file COPYING for more details.
> + *
> + *
> + *
> + * Encoding of the bitmap tracking the allocations
> + * -----------------------------------------------
> + *
> + * The bitmap is composed of units of allocations.
> + *
> + * Each unit of allocation is represented using 2 consecutive bits.
> + *
> + * This makes it possible to encode, for each unit of allocation,
> + * information about:
> + *  - allocation status (busy/free)
> + *  - beginning of a sequence of allocation units (first / successive)
> + *
> + *
> + * Dictionary of allocation units (msb to the left, lsb to the right):
> + *
> + * 11: first allocation unit in the allocation
> + * 10: any subsequent allocation unit (if any) in the allocation
> + * 00: available allocation unit
> + * 01: invalid
> + *
> + * Example, using the same notation as above - MSb.......LSb:
> + *
> + *  ...000010111100000010101011   <-- Read in this direction.
> + *     \__|\__|\|\____|\______|
> + *        |   | |     |       \___ 4 used allocation units
> + *        |   | |     \___________ 3 empty allocation units
> + *        |   | \_________________ 1 used allocation unit
> + *        |   \___________________ 2 used allocation units
> + *        \_______________________ 2 empty allocation units
> + *
> + * The encoding allows for lockless operations, such as:
> + * - search for a sufficiently large range of allocation units
> + * - reservation of a selected range of allocation units
> + * - release of a specific allocation
> + *
> + * The alignment at which to perform the search for a sequence of empty
> + * allocation units (marked as zeros in the bitmap) is 2^1.
> + *
> + * This means that an allocation can start only at even places
> + * (bit 0, bit 2, etc.) in the bitmap.
> + *
> + * Therefore, the number of zeroes to look for must be twice the number
> + * of desired allocation units.
> + *
> + * When it's time to free the memory associated to an allocation request,
> + * it's a matter of checking if the corresponding allocation unit is
> + * really the beginning of an allocation (both bits are set to 1).
> + *
> + * Looking for the ending can also be performed locklessly.
> + * It's sufficient to identify the first mapped allocation unit
> + * that is represented either as free (00) or busy (11).
> + * Even if the allocation status should change in the meanwhile, it
> + * doesn't matter, since it can only transition between free (00) and
> + * first-allocated (11).
> + *
> + * The parameter indicating to the *_free() function the size of the
> + * space that should be freed can be either set to 0, for automated
> + * assessment, or it can be specified explicitly.
> + *
> + * In case it is specified explicitly, the value is verified against what
> + * the library is tracking internally.
> + *
> + * If ever needed, the bitmap could be extended, assigning larger amounts
> + * of bits to each allocation unit (the increase must follow powers of 2),
> + * to track other properties of the allocations.
>   */
> 
>  #include <linux/slab.h>
> @@ -36,118 +104,230 @@
>  #include <linux/genalloc.h>
>  #include <linux/of_device.h>
> 
> +#define ENTRY_ORDER 1UL
> +#define ENTRY_MASK ((1UL << ((ENTRY_ORDER) + 1UL)) - 1UL)
> +#define ENTRY_HEAD ENTRY_MASK
> +#define ENTRY_UNUSED 0UL
> +#define BITS_PER_ENTRY (1U << ENTRY_ORDER)
> +#define BITS_DIV_ENTRIES(x) ((x) >> ENTRY_ORDER)
> +#define ENTRIES_TO_BITS(x) ((x) << ENTRY_ORDER)
> +#define BITS_DIV_LONGS(x) ((x) / BITS_PER_LONG)
> +#define ENTRIES_DIV_LONGS(x) (BITS_DIV_LONGS(ENTRIES_TO_BITS(x)))
> +
> +#define ENTRIES_PER_LONG BITS_DIV_ENTRIES(BITS_PER_LONG)
> +
> +/* Binary pattern of 1010...1010 that spans one unsigned long. */
> +#define MASK (~0UL / 3 * 2)
> +
> +/**
> + * get_bitmap_entry - extracts the specified entry from the bitmap
> + * @map: pointer to a bitmap
> + * @entry_index: the index of the desired entry in the bitmap
> + *
> + * Return: The requested bitmap.
> + */
> +static inline unsigned long get_bitmap_entry(unsigned long *map,
> +					    int entry_index)
> +{
> +	return (map[ENTRIES_DIV_LONGS(entry_index)] >>
> +		ENTRIES_TO_BITS(entry_index % ENTRIES_PER_LONG)) &
> +		ENTRY_MASK;
> +}
> +
> +
> +/**
> + * mem_to_units - convert references to memory into orders of allocation

Documentation/doc-guide/kernel-doc.rst recommends adding brackets after
the function name in function comments (e.g. "mem_to_units() - ..."). I
haven't noticed any difference in the resulting HTML, so I'm not sure
whether the brackets are actually required.
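
The two-bit dictionary in the patch's header comment is small enough to exercise in user space. In that dictionary, 11 is the first unit of an allocation, 10 is a subsequent unit, 00 is a free unit and 01 is invalid. Below is a minimal sketch that assumes an array of unsigned long. entry_at() mirrors the quoted get_bitmap_entry(). allocation_length() is a hypothetical helper built on the idea behind get_boundary(), not the patch's code. The checks decode the example pattern ...000010111100000010101011 (0x0BC0AB) from the comment.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_ENTRY	2UL	/* two bits per allocation unit */
#define ENTRY_MASK	3UL	/* 0b11 */
#define ENTRY_HEAD	3UL	/* 11: first unit of an allocation */
#define ENTRY_TAIL	2UL	/* 10: subsequent unit */
#define ENTRY_UNUSED	0UL	/* 00: free unit */
#define BITS_PER_UL	(sizeof(unsigned long) * CHAR_BIT)
#define ENTRIES_PER_UL	(BITS_PER_UL / BITS_PER_ENTRY)

/* Read the 2-bit entry for allocation unit @idx (like get_bitmap_entry()). */
static unsigned long entry_at(const unsigned long *map, unsigned long idx)
{
	return (map[idx / ENTRIES_PER_UL] >>
		((idx % ENTRIES_PER_UL) * BITS_PER_ENTRY)) & ENTRY_MASK;
}

/*
 * Hypothetical helper: if @start is the head of an allocation, return how
 * many units it spans, stopping at the next free or head entry or at
 * @nentries; return -1 if @start is not a head.
 */
static long allocation_length(const unsigned long *map, unsigned long start,
			      unsigned long nentries)
{
	unsigned long i;

	if (entry_at(map, start) != ENTRY_HEAD)
		return -1;
	for (i = start + 1; i < nentries; i++) {
		unsigned long e = entry_at(map, i);

		if (e == ENTRY_HEAD || e == ENTRY_UNUSED)
			break;
	}
	return (long)(i - start);
}
```

The sketch returns a length on every path. In the quoted get_boundary(), the in-loop return yields an absolute index, which gen_pool_free() turns into a length by subtracting start_entry. The fall-through path, however, returns nentries - start_entry, so that path may deserve a second look.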

> + * @size: amount in bytes
> + * @order: power of 2 represented by each entry in the bitmap
> + *
> + * Returns the number of units representing the size.

Please s/Returns/Return:/
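
For reference, here is how the mem_to_units() header might look with both suggestions applied: brackets after the function name, and a "Return:" section, following the format described in Documentation/doc-guide/kernel-doc.rst. The @order wording borrows from the gen_pool_create() comment ("log base 2 of number of bytes each bitmap entry represents"). The body is the patch's own and compiles in user space unchanged.

```c
#include <assert.h>

/**
 * mem_to_units() - convert an amount of memory into allocation units
 * @size: amount in bytes
 * @order: log base 2 of the number of bytes each bitmap entry represents
 *
 * The result is rounded up, so a partial unit counts as a whole one.
 *
 * Return: the number of allocation units needed to hold @size bytes.
 */
static inline unsigned long mem_to_units(unsigned long size,
					 unsigned long order)
{
	return (size + (1UL << order) - 1) >> order;
}
```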

> + */
> +static inline unsigned long mem_to_units(unsigned long size,
> +					 unsigned long order)
> +{
> +	return (size + (1UL << order) - 1) >> order;
> +}
> +
> +/**
> + * chunk_size - dimension of a chunk of memory, in bytes
> + * @chunk: pointer to the struct describing the chunk
> + *
> + * Return: The size of the chunk, in bytes.
> + */
>  static inline size_t chunk_size(const struct gen_pool_chunk *chunk)
>  {
>  	return chunk->end_addr - chunk->start_addr + 1;
>  }
> 
> -static int set_bits_ll(unsigned long *addr, unsigned long mask_to_set)
> +
> +/**
> + * set_bits_ll - according to the mask, sets the bits specified by
> + * value, at the address specified.
> + * @addr: where to write
> + * @mask: filter to apply for the bits to alter
> + * @value: actual configuration of bits to store
> + *
> + * Return: 0 upon success, -EBUSY otherwise
> + */
> +static int set_bits_ll(unsigned long *addr,
> +		       unsigned long mask, unsigned long value)
>  {
> -	unsigned long val, nval;
> +	unsigned long nval;
> +	unsigned long present;
> +	unsigned long target;
> 
>  	nval = *addr;
>  	do {
> -		val = nval;
> -		if (val & mask_to_set)
> +		present = nval;
> +		if (present & mask)
>  			return -EBUSY;
> +		target =  present | value;
>  		cpu_relax();
> -	} while ((nval = cmpxchg(addr, val, val | mask_to_set)) != val);
> -
> +	} while ((nval = cmpxchg(addr, present, target)) != target);
>  	return 0;
>  }
> 
> -static int clear_bits_ll(unsigned long *addr, unsigned long mask_to_clear)
> +
> +/**
> + * clear_bits_ll - according to the mask, clears the bits specified by
> + * value, at the address specified.
> + * @addr: where to write
> + * @mask: filter to apply for the bits to alter
> + * @value: actual configuration of bits to clear
> + *
> + * Return: 0 upon success, -EBUSY otherwise
> + */
> +static int clear_bits_ll(unsigned long *addr,
> +			 unsigned long mask, unsigned long value)
>  {
> -	unsigned long val, nval;
> +	unsigned long nval;
> +	unsigned long present;
> +	unsigned long target;
> 
>  	nval = *addr;
> +	present = nval;
> +	if (unlikely((present & mask) ^ value))
> +		return -EBUSY;
>  	do {
> -		val = nval;
> -		if ((val & mask_to_clear) != mask_to_clear)
> +		present = nval;
> +		if (unlikely((present & mask) ^ value))
>  			return -EBUSY;
> +		target =  present & ~mask;
>  		cpu_relax();
> -	} while ((nval = cmpxchg(addr, val, val & ~mask_to_clear)) != val);
> -
> +	} while ((nval = cmpxchg(addr, present, target)) != target);
>  	return 0;
>  }
> 
> -/*
> - * bitmap_set_ll - set the specified number of bits at the specified position
> +
> +/**
> + * get_boundary - verify that an allocation effectively
> + * starts at the given address, then measure its length.
>   * @map: pointer to a bitmap
> - * @start: a bit position in @map
> - * @nr: number of bits to set
> + * @start_entry: the index of the first entry in the bitmap
> + * @nentries: number of entries to alter
>   *
> - * Set @nr bits start from @start in @map lock-lessly. Several users
> - * can set/clear the same bitmap simultaneously without lock. If two
> - * users set the same bit, one user will return remain bits, otherwise
> - * return 0.
> + * Return: the length of an allocation, otherwise -EINVAL if the
> + * parameters do not refer to a correct allocation.
>   */
> -static int bitmap_set_ll(unsigned long *map, int start, int nr)
> +static int get_boundary(unsigned long *map, int start_entry, int nentries)
>  {
> -	unsigned long *p = map + BIT_WORD(start);
> -	const int size = start + nr;
> -	int bits_to_set = BITS_PER_LONG - (start % BITS_PER_LONG);
> -	unsigned long mask_to_set = BITMAP_FIRST_WORD_MASK(start);
> -
> -	while (nr - bits_to_set >= 0) {
> -		if (set_bits_ll(p, mask_to_set))
> -			return nr;
> -		nr -= bits_to_set;
> -		bits_to_set = BITS_PER_LONG;
> -		mask_to_set = ~0UL;
> -		p++;
> -	}
> -	if (nr) {
> -		mask_to_set &= BITMAP_LAST_WORD_MASK(size);
> -		if (set_bits_ll(p, mask_to_set))
> -			return nr;
> -	}
> +	int i;
> +	unsigned long bitmap_entry;
> 
> -	return 0;
> +
> +	if (unlikely(get_bitmap_entry(map, start_entry) != ENTRY_HEAD))
> +		return -EINVAL;
> +	for (i = start_entry + 1; i < nentries; i++) {
> +		bitmap_entry = get_bitmap_entry(map, i);
> +		if (bitmap_entry == ENTRY_HEAD ||
> +		    bitmap_entry == ENTRY_UNUSED)
> +			return i;
> +	}
> +	return nentries - start_entry;
>  }
> 
> +
> +#define SET_BITS 1
> +#define CLEAR_BITS 0
> +
>  /*
> - * bitmap_clear_ll - clear the specified number of bits at the specified position
> + * alter_bitmap_ll - set or clear the entries associated with an allocation
> + * @alteration: indicates if the bits selected should be set or cleared
>   * @map: pointer to a bitmap
> - * @start: a bit position in @map
> - * @nr: number of bits to set
> + * @start: the index of the first entry in the bitmap
> + * @nentries: number of entries to alter
> + *
> + * The modification happens lock-lessly.
> + * Several users can write to the same map simultaneously, without lock.
>   *
> - * Clear @nr bits start from @start in @map lock-lessly. Several users
> - * can set/clear the same bitmap simultaneously without lock. If two
> - * users clear the same bit, one user will return remain bits,
> - * otherwise return 0.
> + * Return: If two users alter the same bit, to one it will return
> + * remaining entries, to the other it will return 0.

And what if there are three or four concurrent users? ;-)

I believe a more detailed description of what happens when several users
try to alter the bitmap concurrently would be really helpful.
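
One way to reason about the question is to look at a single unsigned long first. Every contender runs a compare-and-exchange loop that re-checks the masked bits on each retry. Among contenders whose masks overlap, the first to commit wins, and every later one sees those bits already set (or no longer matching, when freeing) and gets -EBUSY, however many users race. Contenders touching disjoint bits of the same word simply retry until both succeed. Across several words, as in alter_bitmap_ll(), each word is decided independently, so a contender can win some words and lose others. This is why gen_pool_alloc_algo() rolls back with CLEAR_BITS before retrying.

Below is a user-space sketch of the per-word step using C11 <stdatomic.h>. set_bits_atomic() and clear_bits_atomic() are hypothetical names, not kernel API. atomic_compare_exchange_weak() reports success as a bool and refreshes the expected value on failure. The kernel's cmpxchg(), by contrast, returns the value previously stored, so a cmpxchg() loop has succeeded when that return value equals the expected old value (present in the quoted code), not the new one.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/*
 * Hypothetical user-space analogue of set_bits_ll(): atomically OR @value
 * into *@addr, unless any bit selected by @mask is already set.
 */
static int set_bits_atomic(_Atomic unsigned long *addr,
			   unsigned long mask, unsigned long value)
{
	unsigned long present = atomic_load(addr);

	do {
		/* Re-checked on every retry: a faster contender may own them. */
		if (present & mask)
			return -EBUSY;
		/* On failure, @present is refreshed with the current word. */
	} while (!atomic_compare_exchange_weak(addr, &present,
					       present | value));
	return 0;
}

/*
 * Hypothetical analogue of clear_bits_ll(): clear the bits in @mask only
 * if they currently hold exactly @value (i.e. the allocation is intact).
 */
static int clear_bits_atomic(_Atomic unsigned long *addr,
			     unsigned long mask, unsigned long value)
{
	unsigned long present = atomic_load(addr);

	do {
		if ((present & mask) != value)
			return -EBUSY;
	} while (!atomic_compare_exchange_weak(addr, &present,
					       present & ~mask));
	return 0;
}
```

The checks run the following sequence. A 2-unit allocation (11 10, i.e. 0xB) is taken, and an overlapping request is refused. A disjoint request in the same word then succeeds, and a second free of the same allocation is refused.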

>   */
> -static int bitmap_clear_ll(unsigned long *map, int start, int nr)
> +static int alter_bitmap_ll(bool alteration, unsigned long *map,
> +			   int start_entry, int nentries)
>  {
> -	unsigned long *p = map + BIT_WORD(start);
> -	const int size = start + nr;
> -	int bits_to_clear = BITS_PER_LONG - (start % BITS_PER_LONG);
> -	unsigned long mask_to_clear = BITMAP_FIRST_WORD_MASK(start);
> -
> -	while (nr - bits_to_clear >= 0) {
> -		if (clear_bits_ll(p, mask_to_clear))
> -			return nr;
> -		nr -= bits_to_clear;
> -		bits_to_clear = BITS_PER_LONG;
> -		mask_to_clear = ~0UL;
> -		p++;
> -	}
> -	if (nr) {
> -		mask_to_clear &= BITMAP_LAST_WORD_MASK(size);
> -		if (clear_bits_ll(p, mask_to_clear))
> -			return nr;
> +	unsigned long start_bit;
> +	unsigned long end_bit;
> +	unsigned long mask;
> +	unsigned long value;
> +	int nbits;
> +	int bits_to_write;
> +	int index;
> +	int (*action)(unsigned long *addr,
> +		      unsigned long mask, unsigned long value);
> +
> +	action = (alteration == SET_BITS) ? set_bits_ll : clear_bits_ll;
> +
> +	/*
> +	 * Prepare for writing the initial part of the allocation, from
> +	 * starting entry, to the end of the UL bitmap element which
> +	 * contains it. It might be larger than the actual allocation.
> +	 */
> +	start_bit = ENTRIES_TO_BITS(start_entry);
> +	end_bit = ENTRIES_TO_BITS(start_entry + nentries);
> +	nbits = ENTRIES_TO_BITS(nentries);
> +	bits_to_write = BITS_PER_LONG - start_bit % BITS_PER_LONG;
> +	mask = BITMAP_FIRST_WORD_MASK(start_bit);
> +	/* Mark the beginning of the allocation. */
> +	value = MASK | (1UL << (start_bit % BITS_PER_LONG));
> +	index = BITS_DIV_LONGS(start_bit);
> +
> +	/*
> +	 * Writes entries to the bitmap, as long as the remainder is
> +	 * positive or zero.
> +	 * Might be skipped if the entries to write do not reach the end
> +	 * of a bitmap UL unit.
> +	 */
> +	while (nbits >= bits_to_write) {
> +		if (action(map + index, mask, value & mask))
> +			return BITS_DIV_ENTRIES(nbits);
> +		nbits -= bits_to_write;
> +		bits_to_write = BITS_PER_LONG;
> +		mask = ~0UL;
> +		value = MASK;
> +		index++;
>  	}
> 
> +	/* Takes care of the ending part of the entries to mark. */
> +	if (nbits > 0) {
> +		mask ^= BITMAP_FIRST_WORD_MASK((end_bit) % BITS_PER_LONG);
> +		bits_to_write = nbits;
> +		if (action(map + index, mask, value & mask))
> +			return BITS_DIV_ENTRIES(nbits);
> +	}
>  	return 0;
>  }
> 
> +
>  /**
>   * gen_pool_create - create a new special memory pool
> - * @min_alloc_order: log base 2 of number of bytes each bitmap bit represents
> + * @min_alloc_order: log base 2 of number of bytes each bitmap entry represents
>   * @nid: node id of the node the pool structure should be allocated on, or -1
>   *
>   * Create a new special memory pool that can be used to manage special purpose
>   * memory not managed by the regular kmalloc/kfree interface.
> + *
> + * Return: pointer to the pool, if successful, NULL otherwise
>   */
>  struct gen_pool *gen_pool_create(int min_alloc_order, int nid)
>  {
> @@ -177,16 +357,18 @@ EXPORT_SYMBOL(gen_pool_create);
>   *
>   * Add a new chunk of special memory to the specified pool.
>   *
> - * Returns 0 on success or a -ve errno on failure.
> + * Return: 0 on success or a -ve errno on failure.
>   */
>  int gen_pool_add_virt(struct gen_pool *pool, unsigned long virt, phys_addr_t phys,
>  		 size_t size, int nid)
>  {
>  	struct gen_pool_chunk *chunk;
> -	int nbits = size >> pool->min_alloc_order;
> -	int nbytes = sizeof(struct gen_pool_chunk) +
> -				BITS_TO_LONGS(nbits) * sizeof(long);
> +	int nentries;
> +	int nbytes;
> 
> +	nentries = size >> pool->min_alloc_order;
> +	nbytes = sizeof(struct gen_pool_chunk) +
> +		 ENTRIES_DIV_LONGS(nentries) * sizeof(long);
>  	chunk = kzalloc_node(nbytes, GFP_KERNEL, nid);
>  	if (unlikely(chunk == NULL))
>  		return -ENOMEM;
> @@ -209,7 +391,7 @@ EXPORT_SYMBOL(gen_pool_add_virt);
>   * @pool: pool to allocate from
>   * @addr: starting address of memory
>   *
> - * Returns the physical address on success, or -1 on error.
> + * Return: the physical address on success, or -1 on error.
>   */
>  phys_addr_t gen_pool_virt_to_phys(struct gen_pool *pool, unsigned long addr)
>  {
> @@ -248,7 +430,7 @@ void gen_pool_destroy(struct gen_pool *pool)
>  		list_del(&chunk->next_chunk);
> 
>  		end_bit = chunk_size(chunk) >> order;
> -		bit = find_next_bit(chunk->bits, end_bit, 0);
> +		bit = find_next_bit(chunk->entries, end_bit, 0);
>  		BUG_ON(bit < end_bit);
> 
>  		kfree(chunk);
> @@ -267,6 +449,8 @@ EXPORT_SYMBOL(gen_pool_destroy);
>   * Uses the pool allocation function (with first-fit algorithm by default).
>   * Can not be used in NMI handler on architectures without
>   * NMI-safe cmpxchg implementation.
> + *
> + * Return: address of the memory allocated, otherwise NULL
>   */
>  unsigned long gen_pool_alloc(struct gen_pool *pool, size_t size)
>  {
> @@ -285,6 +469,8 @@ EXPORT_SYMBOL(gen_pool_alloc);
>   * Uses the pool allocation function (with first-fit algorithm by default).
>   * Can not be used in NMI handler on architectures without
>   * NMI-safe cmpxchg implementation.
> + *
> + * Return: address of the memory allocated, otherwise NULL
>   */
>  unsigned long gen_pool_alloc_algo(struct gen_pool *pool, size_t size,
>  		genpool_algo_t algo, void *data)
> @@ -292,7 +478,7 @@ unsigned long gen_pool_alloc_algo(struct gen_pool *pool, size_t size,
>  	struct gen_pool_chunk *chunk;
>  	unsigned long addr = 0;
>  	int order = pool->min_alloc_order;
> -	int nbits, start_bit, end_bit, remain;
> +	int nentries, start_entry, end_entry, remain;
> 
>  #ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
>  	BUG_ON(in_nmi());
> @@ -301,29 +487,32 @@ unsigned long gen_pool_alloc_algo(struct gen_pool *pool, size_t size,
>  	if (size == 0)
>  		return 0;
> 
> -	nbits = (size + (1UL << order) - 1) >> order;
> +	nentries = mem_to_units(size, order);
>  	rcu_read_lock();
>  	list_for_each_entry_rcu(chunk, &pool->chunks, next_chunk) {
>  		if (size > atomic_long_read(&chunk->avail))
>  			continue;
> 
> -		start_bit = 0;
> -		end_bit = chunk_size(chunk) >> order;
> +		start_entry = 0;
> +		end_entry = chunk_size(chunk) >> order;
>  retry:
> -		start_bit = algo(chunk->bits, end_bit, start_bit,
> -				 nbits, data, pool);
> -		if (start_bit >= end_bit)
> +		start_entry = algo(chunk->entries, end_entry, start_entry,
> +				  nentries, data, pool);
> +		if (start_entry >= end_entry)
>  			continue;
> -		remain = bitmap_set_ll(chunk->bits, start_bit, nbits);
> +		remain = alter_bitmap_ll(SET_BITS, chunk->entries,
> +					 start_entry, nentries);
>  		if (remain) {
> -			remain = bitmap_clear_ll(chunk->bits, start_bit,
> -						 nbits - remain);
> -			BUG_ON(remain);
> +			remain = alter_bitmap_ll(CLEAR_BITS,
> +						 chunk->entries,
> +						 start_entry,
> +						 nentries - remain);
>  			goto retry;
>  		}
> 
> -		addr = chunk->start_addr + ((unsigned long)start_bit << order);
> -		size = nbits << order;
> +		addr = chunk->start_addr +
> +			((unsigned long)start_entry << order);
> +		size = nentries << order;
>  		atomic_long_sub(size, &chunk->avail);
>  		break;
>  	}
> @@ -342,6 +531,8 @@ EXPORT_SYMBOL(gen_pool_alloc_algo);
>   * Uses the pool allocation function (with first-fit algorithm by default).
>   * Can not be used in NMI handler on architectures without
>   * NMI-safe cmpxchg implementation.
> + *
> + * Return: address of the memory allocated, otherwise NULL
>   */
>  void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size, dma_addr_t *dma)
>  {
> @@ -365,7 +556,7 @@ EXPORT_SYMBOL(gen_pool_dma_alloc);
>   * gen_pool_free - free allocated special memory back to the pool
>   * @pool: pool to free to
>   * @addr: starting address of memory to free back to pool
> - * @size: size in bytes of memory to free
> + * @size: size in bytes of memory to free or 0, for auto-detection
>   *
>   * Free previously allocated special memory back to the specified
>   * pool.  Can not be used in NMI handler on architectures without
> @@ -375,22 +566,29 @@ void gen_pool_free(struct gen_pool *pool, unsigned long addr, size_t size)
>  {
>  	struct gen_pool_chunk *chunk;
>  	int order = pool->min_alloc_order;
> -	int start_bit, nbits, remain;
> +	int start_entry, remaining_entries, nentries, remain;
> +	int boundary;
> 
>  #ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
>  	BUG_ON(in_nmi());
>  #endif
> 
> -	nbits = (size + (1UL << order) - 1) >> order;
>  	rcu_read_lock();
>  	list_for_each_entry_rcu(chunk, &pool->chunks, next_chunk) {
>  		if (addr >= chunk->start_addr && addr <= chunk->end_addr) {
>  			BUG_ON(addr + size - 1 > chunk->end_addr);
> -			start_bit = (addr - chunk->start_addr) >> order;
> -			remain = bitmap_clear_ll(chunk->bits, start_bit, nbits);
> +			start_entry = (addr - chunk->start_addr) >> order;
> +			remaining_entries = (chunk->end_addr - addr) >> order;
> +			boundary = get_boundary(chunk->entries, start_entry,
> +						remaining_entries);
> +			BUG_ON(boundary < 0);
> +			nentries = boundary - start_entry;
> +			BUG_ON(size &&
> +			       (nentries != mem_to_units(size, order)));
> +			remain = alter_bitmap_ll(CLEAR_BITS, chunk->entries,
> +						 start_entry, nentries);
>  			BUG_ON(remain);
> -			size = nbits << order;
> -			atomic_long_add(size, &chunk->avail);
> +			atomic_long_add(nentries << order, &chunk->avail);
>  			rcu_read_unlock();
>  			return;
>  		}
> @@ -428,8 +626,9 @@ EXPORT_SYMBOL(gen_pool_for_each_chunk);
>   * @start:	start address
>   * @size:	size of the region
>   *
> - * Check if the range of addresses falls within the specified pool. Returns
> - * true if the entire range is contained in the pool and false otherwise.
> + * Check if the range of addresses falls within the specified pool.
> + *
> + * Return: true if the entire range is contained in the pool, false otherwise.
>   */
>  bool addr_in_gen_pool(struct gen_pool *pool, unsigned long start,
>  			size_t size)
> @@ -455,7 +654,7 @@ bool addr_in_gen_pool(struct gen_pool *pool, unsigned long start,
>   * gen_pool_avail - get available free space of the pool
>   * @pool: pool to get available free space
>   *
> - * Return available free space of the specified pool.
> + * Return: available free space of the specified pool.
>   */
>  size_t gen_pool_avail(struct gen_pool *pool)
>  {
> @@ -474,7 +673,7 @@ EXPORT_SYMBOL_GPL(gen_pool_avail);
>   * gen_pool_size - get size in bytes of memory managed by the pool
>   * @pool: pool to get size
>   *
> - * Return size in bytes of memory managed by the pool.
> + * Return: size in bytes of memory managed by the pool.
>   */
>  size_t gen_pool_size(struct gen_pool *pool)
>  {
> @@ -517,17 +716,27 @@ EXPORT_SYMBOL(gen_pool_set_algo);
>   * gen_pool_first_fit - find the first available region
>   * of memory matching the size requirement (no alignment constraint)
>   * @map: The address to base the search on
> - * @size: The bitmap size in bits
> - * @start: The bitnumber to start searching at
> - * @nr: The number of zeroed bits we're looking for
> + * @size: The number of allocation units in the bitmap
> + * @start: The allocation unit to start searching at
> + * @nr: The number of allocation units we're looking for
>   * @data: additional data - unused
>   * @pool: pool to find the fit region memory from
> + *
> + * Return: index of the memory allocated, otherwise the end of the range
>   */
>  unsigned long gen_pool_first_fit(unsigned long *map, unsigned long size,
>  		unsigned long start, unsigned int nr, void *data,
>  		struct gen_pool *pool)
>  {
> -	return bitmap_find_next_zero_area(map, size, start, nr, 0);
> +	unsigned long align_mask;
> +	unsigned long bit_index;
> +
> +	align_mask = roundup_pow_of_two(BITS_PER_ENTRY) - 1;
> +	bit_index = bitmap_find_next_zero_area(map, ENTRIES_TO_BITS(size),
> +					       ENTRIES_TO_BITS(start),
> +					       ENTRIES_TO_BITS(nr),
> +					       align_mask);
> +	return BITS_DIV_ENTRIES(bit_index);
>  }
>  EXPORT_SYMBOL(gen_pool_first_fit);
> 
> @@ -535,11 +744,13 @@ EXPORT_SYMBOL(gen_pool_first_fit);
>   * gen_pool_first_fit_align - find the first available region
>   * of memory matching the size requirement (alignment constraint)
>   * @map: The address to base the search on
> - * @size: The bitmap size in bits
> - * @start: The bitnumber to start searching at
> - * @nr: The number of zeroed bits we're looking for
> + * @size: The number of allocation units in the bitmap
> + * @start: The allocation unit to start searching at
> + * @nr: The number of allocation units we're looking for
>   * @data: data for alignment
>   * @pool: pool to get order from
> + *
> + * Return: index of the memory allocated, otherwise the end of the range
>   */
>  unsigned long gen_pool_first_fit_align(unsigned long *map, unsigned long size,
>  		unsigned long start, unsigned int nr, void *data,
> @@ -547,23 +758,32 @@ unsigned long gen_pool_first_fit_align(unsigned long *map, unsigned long size,
>  {
>  	struct genpool_data_align *alignment;
>  	unsigned long align_mask;
> +	unsigned long bit_index;
>  	int order;
> 
>  	alignment = data;
>  	order = pool->min_alloc_order;
> -	align_mask = ((alignment->align + (1UL << order) - 1) >> order) - 1;
> -	return bitmap_find_next_zero_area(map, size, start, nr, align_mask);
> +	align_mask = roundup_pow_of_two(
> +			ENTRIES_TO_BITS(mem_to_units(alignment->align,
> +						     order))) - 1;
> +	bit_index = bitmap_find_next_zero_area(map, ENTRIES_TO_BITS(size),
> +					       ENTRIES_TO_BITS(start),
> +					       ENTRIES_TO_BITS(nr),
> +					       align_mask);
> +	return BITS_DIV_ENTRIES(bit_index);
>  }
>  EXPORT_SYMBOL(gen_pool_first_fit_align);
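As a worked example of the new alignment mask, here is a userspace sketch. roundup_pow_of_two_ul() and align_mask_bits() are hypothetical helpers standing in for the kernel's roundup_pow_of_two() and the expression above. With 8-byte units (order 3) and a 64-byte alignment request, mem_to_units() gives 8 units, i.e. 16 bitmap bits, so the mask is 15. Every candidate start is then a multiple of 16 bits, which is a multiple of 8 entries, or 64 bytes.

```c
#include <assert.h>

/* Userspace stand-in for the kernel's roundup_pow_of_two(); n > 0 assumed. */
static unsigned long roundup_pow_of_two_ul(unsigned long n)
{
	unsigned long p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/* Same rounding-up conversion as mem_to_units() in the patch. */
static unsigned long mem_to_units(unsigned long size, unsigned long order)
{
	return (size + (1UL << order) - 1) >> order;
}

/*
 * Hypothetical helper: alignment mask in bitmap bits, mirroring the
 * computation in gen_pool_first_fit_align() (2 bits per entry).
 */
static unsigned long align_mask_bits(unsigned long align, unsigned long order)
{
	return roundup_pow_of_two_ul(mem_to_units(align, order) << 1) - 1;
}
```

Even a sub-unit alignment request (4 bytes with 8-byte units) still yields a mask of 1. This keeps every search on an even bit, i.e. on an entry boundary.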
> 
>  /**
>   * gen_pool_fixed_alloc - reserve a specific region
>   * @map: The address to base the search on
> - * @size: The bitmap size in bits
> - * @start: The bitnumber to start searching at
> - * @nr: The number of zeroed bits we're looking for
> + * @size: The number of allocation units in the bitmap
> + * @start: The allocation unit to start searching at
> + * @nr: The number of allocation units we're looking for
>   * @data: data for alignment
>   * @pool: pool to get order from
> + *
> + * Return: index of the memory allocated, otherwise the end of the range
>   */
>  unsigned long gen_pool_fixed_alloc(unsigned long *map, unsigned long size,
>  		unsigned long start, unsigned int nr, void *data,
> @@ -571,20 +791,23 @@ unsigned long gen_pool_fixed_alloc(unsigned long *map, unsigned long size,
>  {
>  	struct genpool_data_fixed *fixed_data;
>  	int order;
> -	unsigned long offset_bit;
> -	unsigned long start_bit;
> +	unsigned long offset;
> +	unsigned long align_mask;
> +	unsigned long bit_index;
> 
>  	fixed_data = data;
>  	order = pool->min_alloc_order;
> -	offset_bit = fixed_data->offset >> order;
>  	if (WARN_ON(fixed_data->offset & ((1UL << order) - 1)))
>  		return size;
> +	offset = fixed_data->offset >> order;
> +	align_mask = roundup_pow_of_two(BITS_PER_ENTRY) - 1;
> +	bit_index = bitmap_find_next_zero_area(map, ENTRIES_TO_BITS(size),
> +					       ENTRIES_TO_BITS(start + offset),
> +					       ENTRIES_TO_BITS(nr), align_mask);
> +	if (bit_index != ENTRIES_TO_BITS(offset))
> +		return size;
> 
> -	start_bit = bitmap_find_next_zero_area(map, size,
> -			start + offset_bit, nr, 0);
> -	if (start_bit != offset_bit)
> -		start_bit = size;
> -	return start_bit;
> +	return BITS_DIV_ENTRIES(bit_index);
>  }
>  EXPORT_SYMBOL(gen_pool_fixed_alloc);
> 
> @@ -593,60 +816,84 @@ EXPORT_SYMBOL(gen_pool_fixed_alloc);
>   * of memory matching the size requirement. The region will be aligned
>   * to the order of the size specified.
>   * @map: The address to base the search on
> - * @size: The bitmap size in bits
> - * @start: The bitnumber to start searching at
> - * @nr: The number of zeroed bits we're looking for
> + * @size: The number of allocation units in the bitmap
> + * @start: The allocation unit to start searching at
> + * @nr: The number of allocation units we're looking for
>   * @data: additional data - unused
>   * @pool: pool to find the fit region memory from
> + *
> + * Return: index of the memory allocated, otherwise the end of the range
>   */
>  unsigned long gen_pool_first_fit_order_align(unsigned long *map,
>  		unsigned long size, unsigned long start,
>  		unsigned int nr, void *data, struct gen_pool *pool)
>  {
> -	unsigned long align_mask = roundup_pow_of_two(nr) - 1;
> -
> -	return bitmap_find_next_zero_area(map, size, start, nr, align_mask);
> +	unsigned long align_mask;
> +	unsigned long bit_index;
> +
> +	align_mask = roundup_pow_of_two(ENTRIES_TO_BITS(nr)) - 1;
> +	bit_index = bitmap_find_next_zero_area(map, ENTRIES_TO_BITS(size),
> +					       ENTRIES_TO_BITS(start),
> +					       ENTRIES_TO_BITS(nr),
> +					       align_mask);
> +	return BITS_DIV_ENTRIES(bit_index);
>  }
>  EXPORT_SYMBOL(gen_pool_first_fit_order_align);
> 
>  /**
>   * gen_pool_best_fit - find the best fitting region of memory
> - * macthing the size requirement (no alignment constraint)
> + * matching the size requirement (no alignment constraint)
>   * @map: The address to base the search on
> - * @size: The bitmap size in bits
> - * @start: The bitnumber to start searching at
> - * @nr: The number of zeroed bits we're looking for
> + * @size: The number of allocation units in the bitmap
> + * @start: The allocation unit to start searching at
> + * @nr: The number of allocation units we're looking for
>   * @data: additional data - unused
>   * @pool: pool to find the fit region memory from
>   *
>   * Iterate over the bitmap to find the smallest free region
>   * which we can allocate the memory.
> + *
> + * Return: index of the memory allocated, otherwise the end of the range
>   */
>  unsigned long gen_pool_best_fit(unsigned long *map, unsigned long size,
>  		unsigned long start, unsigned int nr, void *data,
>  		struct gen_pool *pool)
>  {
> -	unsigned long start_bit = size;
> +	unsigned long start_bit = ENTRIES_TO_BITS(size);
>  	unsigned long len = size + 1;
>  	unsigned long index;
> +	unsigned long align_mask;
> +	unsigned long bit_index;
> 
> -	index = bitmap_find_next_zero_area(map, size, start, nr, 0);
> +	align_mask = roundup_pow_of_two(BITS_PER_ENTRY) - 1;
> +	bit_index = bitmap_find_next_zero_area(map, ENTRIES_TO_BITS(size),
> +					       ENTRIES_TO_BITS(start),
> +					       ENTRIES_TO_BITS(nr),
> +					       align_mask);
> +	index = BITS_DIV_ENTRIES(bit_index);
> 
>  	while (index < size) {
> -		int next_bit = find_next_bit(map, size, index + nr);
> -		if ((next_bit - index) < len) {
> -			len = next_bit - index;
> -			start_bit = index;
> +		int next_bit;
> +
> +		next_bit = find_next_bit(map, ENTRIES_TO_BITS(size),
> +					 ENTRIES_TO_BITS(index + nr));
> +		if ((BITS_DIV_ENTRIES(next_bit) - index) < len) {
> +			len = BITS_DIV_ENTRIES(next_bit) - index;
> +			start_bit = ENTRIES_TO_BITS(index);
>  			if (len == nr)
> -				return start_bit;
> +				return BITS_DIV_ENTRIES(start_bit);
>  		}
> -		index = bitmap_find_next_zero_area(map, size,
> -						   next_bit + 1, nr, 0);
> +		bit_index =
> +			bitmap_find_next_zero_area(map,
> +						   ENTRIES_TO_BITS(size),
> +						   next_bit + 1,
> +						   ENTRIES_TO_BITS(nr),
> +						   align_mask);
> +		index = BITS_DIV_ENTRIES(bit_index);
>  	}
> 
> -	return start_bit;
> +	return BITS_DIV_ENTRIES(start_bit);
>  }
> -EXPORT_SYMBOL(gen_pool_best_fit);
> 
>  static void devm_gen_pool_release(struct device *dev, void *res)
>  {
> @@ -672,7 +919,7 @@ static int devm_gen_pool_match(struct device *dev, void *res, void *data)
>   * @dev: device to retrieve the gen_pool from
>   * @name: name of a gen_pool or NULL, identifies a particular gen_pool on device
>   *
> - * Returns the gen_pool for the device if one is present, or NULL.
> + * Return: the gen_pool for the device if one is present, or NULL.
>   */
>  struct gen_pool *gen_pool_get(struct device *dev, const char *name)
>  {
> @@ -696,6 +943,8 @@ EXPORT_SYMBOL_GPL(gen_pool_get);
>   * Create a new special memory pool that can be used to manage special purpose
>   * memory not managed by the regular kmalloc/kfree interface. The pool will be
>   * automatically destroyed by the device management code.
> + *
> + * Return: the address of the pool, if successful, otherwise NULL
>   */
>  struct gen_pool *devm_gen_pool_create(struct device *dev, int min_alloc_order,
>  				      int nid, const char *name)
> @@ -743,7 +992,7 @@ EXPORT_SYMBOL(devm_gen_pool_create);
>   * @propname: property name containing phandle(s)
>   * @index: index into the phandle array
>   *
> - * Returns the pool that contains the chunk starting at the physical
> + * Return: the pool that contains the chunk starting at the physical
>   * address of the device tree node pointed at by the phandle property,
>   * or NULL if not found.
>   */
> -- 
> 2.14.1
> 

-- 
Sincerely yours,
Mike.


* Re: [PATCH 1/6] genalloc: track beginning of allocations
@ 2018-02-11 12:24     ` Mike Rapoport
From: Mike Rapoport @ 2018-02-11 12:24 UTC
  To: Igor Stoppa
  Cc: willy, rdunlap, corbet, keescook, mhocko, labbott, jglisse, hch,
	cl, linux-security-module, linux-mm, linux-kernel,
	kernel-hardening

On Sun, Feb 11, 2018 at 05:19:15AM +0200, Igor Stoppa wrote:
> The genalloc library is only capable of tracking if a certain unit of
> allocation is in use or not.
> 
> It is not capable of discerning where the memory associated with an
> allocation request begins and where it ends.
> 
> The reason is that units of allocation are tracked by using a bitmap,
> where each bit records whether the unit is allocated (1) or
> available (0).
> 
> The user of the API must keep track of how much space was requested, if
> it ever needs to be freed.
> 
> This can cause errors to go undetected.
> Examples:
> * Only a subset of the memory provided to an allocation request is freed
> * The memory from a subsequent allocation is freed
> * The memory being freed doesn't start at the beginning of an
>   allocation.
> 
> The bitmap is used because it allows lockless read/write access,
> where the hardware supports it through cmpxchg.
> Similarly, it is possible to scan the bitmap for a sufficiently long
> sequence of zeros, to identify zones available for allocation.
> 
> This patch doubles the space reserved in the bitmap for each allocation
> unit, to track where each allocation begins.
> 
> For details, see the documentation inside lib/genalloc.c
> 
> Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com>
> ---
>  include/linux/genalloc.h |   4 +-
>  lib/genalloc.c           | 527 ++++++++++++++++++++++++++++++++++-------------
>  2 files changed, 390 insertions(+), 141 deletions(-)
> 
> diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
> index 872f930f1b06..dcaa33e74b1c 100644
> --- a/include/linux/genalloc.h
> +++ b/include/linux/genalloc.h
> @@ -32,7 +32,7 @@
> 
>  #include <linux/types.h>
>  #include <linux/spinlock_types.h>
> -#include <linux/atomic.h>
> +#include <linux/slab.h>
> 
>  struct device;
>  struct device_node;
> @@ -76,7 +76,7 @@ struct gen_pool_chunk {
>  	phys_addr_t phys_addr;		/* physical starting address of memory chunk */
>  	unsigned long start_addr;	/* start address of memory chunk */
>  	unsigned long end_addr;		/* end address of memory chunk (inclusive) */
> -	unsigned long bits[0];		/* bitmap for allocating memory chunk */
> +	unsigned long entries[0];	/* bitmap for allocating memory chunk */
>  };
> 
>  /*
> diff --git a/lib/genalloc.c b/lib/genalloc.c
> index ca06adc4f445..044347163acb 100644
> --- a/lib/genalloc.c
> +++ b/lib/genalloc.c
> @@ -26,6 +26,74 @@
>   *
>   * This source code is licensed under the GNU General Public License,
>   * Version 2.  See the file COPYING for more details.
> + *
> + *
> + *
> + * Encoding of the bitmap tracking the allocations
> + * -----------------------------------------------
> + *
> + * The bitmap is composed of units of allocations.
> + *
> + * Each unit of allocation is represented using 2 consecutive bits.
> + *
> + * This makes it possible to encode, for each unit of allocation,
> + * information about:
> + *  - allocation status (busy/free)
> + *  - beginning of a sequence of allocation units (first / successive)
> + *
> + *
> + * Dictionary of allocation units (msb to the left, lsb to the right):
> + *
> + * 11: first allocation unit in the allocation
> + * 10: any subsequent allocation unit (if any) in the allocation
> + * 00: available allocation unit
> + * 01: invalid
> + *
> + * Example, using the same notation as above - MSb.......LSb:
> + *
> + *  ...000010111100000010101011   <-- Read in this direction.
> + *     \__|\__|\|\____|\______|
> + *        |   | |     |       \___ 4 used allocation units
> + *        |   | |     \___________ 3 empty allocation units
> + *        |   | \_________________ 1 used allocation unit
> + *        |   \___________________ 2 used allocation units
> + *        \_______________________ 2 empty allocation units
> + *
> + * The encoding allows for lockless operations, such as:
> + * - search for a sufficiently large range of allocation units
> + * - reservation of a selected range of allocation units
> + * - release of a specific allocation
> + *
> + * The alignment at which to search for a sequence of empty
> + * allocation units (marked as zeros in the bitmap) is 2^1.
> + *
> + * This means that an allocation can start only at even places
> + * (bit 0, bit 2, etc.) in the bitmap.
> + *
> + * Therefore, the number of zeroes to look for must be twice the number
> + * of desired allocation units.
> + *
> + * When it's time to free the memory associated with an allocation request,
> + * it's a matter of checking if the corresponding allocation unit is
> + * really the beginning of an allocation (both bits are set to 1).
> + *
> + * Looking for the ending can also be performed locklessly.
> + * It's sufficient to identify the first subsequent allocation unit
> + * that is represented either as free (00) or as the head of another
> + * allocation (11).
> + * Even if the allocation status should change in the meantime, it
> + * doesn't matter, since it can only transition between free (00) and
> + * first-allocated (11).
> + *
> + * The parameter indicating to the *_free() function the size of the
> + * space that should be freed can be either set to 0, for automated
> + * assessment, or it can be specified explicitly.
> + *
> + * In case it is specified explicitly, the value is verified against what
> + * the library is tracking internally.
> + *
> + * If ever needed, the bitmap could be extended, assigning larger amounts
> + * of bits to each allocation unit (the increase must follow powers of 2),
> + * to track other properties of the allocations.
>   */
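Below is a userspace sketch of the release-side scan described above. It decodes the example word from the comment (0x0BC0AB, i.e. 000010111100000010101011) and assumes all entries fit in one unsigned long. entry_at() and alloc_length() are hypothetical helpers, not functions from the patch.

```c
#include <assert.h>

#define ENTRY_BITS 2UL   /* bits per allocation unit */
#define ENTRY_HEAD 0x3UL /* 11: first unit of an allocation */
#define ENTRY_BODY 0x2UL /* 10: subsequent unit of an allocation */

/* Hypothetical helper: read the 2-bit entry at @index, LSb first. */
static unsigned long entry_at(unsigned long word, unsigned int index)
{
	return (word >> (index * ENTRY_BITS)) & 0x3UL;
}

/*
 * Hypothetical helper: number of units in the allocation starting at
 * @start, or -1 if @start is not the head of an allocation.
 * @nentries is the absolute bound of the scan.
 */
static int alloc_length(unsigned long word, unsigned int start,
			unsigned int nentries)
{
	unsigned int i;

	if (entry_at(word, start) != ENTRY_HEAD)
		return -1;
	/*
	 * The allocation ends at the first unit that is not a continuation
	 * (10): a free unit (00) or the head of another allocation (11).
	 */
	for (i = start + 1; i < nentries; i++)
		if (entry_at(word, i) != ENTRY_BODY)
			break;
	return (int)(i - start);
}
```

On the example word, the allocations starting at entries 0, 7 and 8 measure 4, 1 and 2 units. Entries 1 (a continuation) and 4 (free) are rejected as starting points.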
> 
>  #include <linux/slab.h>
> @@ -36,118 +104,230 @@
>  #include <linux/genalloc.h>
>  #include <linux/of_device.h>
> 
> +#define ENTRY_ORDER 1UL
> +#define ENTRY_MASK ((1UL << ((ENTRY_ORDER) + 1UL)) - 1UL)
> +#define ENTRY_HEAD ENTRY_MASK
> +#define ENTRY_UNUSED 0UL
> +#define BITS_PER_ENTRY (1U << ENTRY_ORDER)
> +#define BITS_DIV_ENTRIES(x) ((x) >> ENTRY_ORDER)
> +#define ENTRIES_TO_BITS(x) ((x) << ENTRY_ORDER)
> +#define BITS_DIV_LONGS(x) ((x) / BITS_PER_LONG)
> +#define ENTRIES_DIV_LONGS(x) (BITS_DIV_LONGS(ENTRIES_TO_BITS(x)))
> +
> +#define ENTRIES_PER_LONG BITS_DIV_ENTRIES(BITS_PER_LONG)
> +
> +/* Binary pattern of 1010...1010 that spans one unsigned long. */
> +#define MASK (~0UL / 3 * 2)
> +
> +/**
> + * get_bitmap_entry - extracts the specified entry from the bitmap
> + * @map: pointer to a bitmap
> + * @entry_index: the index of the desired entry in the bitmap
> + *
> + * Return: The requested bitmap entry.
> + */
> +static inline unsigned long get_bitmap_entry(unsigned long *map,
> +					    int entry_index)
> +{
> +	return (map[ENTRIES_DIV_LONGS(entry_index)] >>
> +		ENTRIES_TO_BITS(entry_index % ENTRIES_PER_LONG)) &
> +		ENTRY_MASK;
> +}
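The constants above can be checked in userspace. ~0UL / 3 is 0x5555...5555, so MASK is 0xAAAA...AAAA: the upper (busy) bit of every 2-bit entry. The sketch below copies the patch's macros and get_bitmap_entry(), and reads entries from a two-word map. For a non-kernel build, it assumes BITS_PER_LONG can be defined from sizeof().

```c
#include <assert.h>
#include <limits.h>

/* Assumption for userspace: the kernel provides BITS_PER_LONG itself. */
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Definitions copied from the patch. */
#define ENTRY_ORDER 1UL
#define ENTRY_MASK ((1UL << ((ENTRY_ORDER) + 1UL)) - 1UL)
#define BITS_DIV_ENTRIES(x) ((x) >> ENTRY_ORDER)
#define ENTRIES_TO_BITS(x) ((x) << ENTRY_ORDER)
#define BITS_DIV_LONGS(x) ((x) / BITS_PER_LONG)
#define ENTRIES_DIV_LONGS(x) (BITS_DIV_LONGS(ENTRIES_TO_BITS(x)))
#define ENTRIES_PER_LONG BITS_DIV_ENTRIES(BITS_PER_LONG)
#define MASK (~0UL / 3 * 2)

/* Select the word holding the entry, shift the entry down, keep 2 bits. */
static inline unsigned long get_bitmap_entry(unsigned long *map,
					     int entry_index)
{
	return (map[ENTRIES_DIV_LONGS(entry_index)] >>
		ENTRIES_TO_BITS(entry_index % ENTRIES_PER_LONG)) &
		ENTRY_MASK;
}
```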
> +
> +
> +/**
> + * mem_to_units - convert references to memory into orders of allocation

Documentation/doc-guide/kernel-doc.rst recommends to include brackets
for function comments. I haven't noticed any difference in the resulting
html, so I'm not sure if the brackets are actually required.

> + * @size: amount in bytes
> + * @order: power of 2 represented by each entry in the bitmap
> + *
> + * Returns the number of units representing the size.

Please s/Return/Return:/
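Combining both remarks, a header for mem_to_units() that follows the kernel-doc.rst template (trailing parentheses, "Return:" section) might read as below. The wording is only a suggestion; the body is the patch's own.

```c
#include <assert.h>

/**
 * mem_to_units() - convert a size in bytes into allocation units
 * @size: amount in bytes
 * @order: log2 of the number of bytes represented by each bitmap entry
 *
 * A partial unit is rounded up to a whole one.
 *
 * Return: the number of allocation units needed to hold @size bytes.
 */
static inline unsigned long mem_to_units(unsigned long size,
					 unsigned long order)
{
	return (size + (1UL << order) - 1) >> order;
}
```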

> + */
> +static inline unsigned long mem_to_units(unsigned long size,
> +					 unsigned long order)
> +{
> +	return (size + (1UL << order) - 1) >> order;
> +}
> +
> +/**
> + * chunk_size - dimension of a chunk of memory, in bytes
> + * @chunk: pointer to the struct describing the chunk
> + *
> + * Return: The size of the chunk, in bytes.
> + */
>  static inline size_t chunk_size(const struct gen_pool_chunk *chunk)
>  {
>  	return chunk->end_addr - chunk->start_addr + 1;
>  }
> 
> -static int set_bits_ll(unsigned long *addr, unsigned long mask_to_set)
> +
> +/**
> + * set_bits_ll - according to the mask, sets the bits specified by
> + * value, at the address specified.
> + * @addr: where to write
> + * @mask: filter to apply for the bits to alter
> + * @value: actual configuration of bits to store
> + *
> + * Return: 0 upon success, -EBUSY otherwise
> + */
> +static int set_bits_ll(unsigned long *addr,
> +		       unsigned long mask, unsigned long value)
>  {
> -	unsigned long val, nval;
> +	unsigned long nval;
> +	unsigned long present;
> +	unsigned long target;
> 
>  	nval = *addr;
>  	do {
> -		val = nval;
> -		if (val & mask_to_set)
> +		present = nval;
> +		if (present & mask)
>  			return -EBUSY;
> +		target =  present | value;
>  		cpu_relax();
> -	} while ((nval = cmpxchg(addr, val, val | mask_to_set)) != val);
> -
> +	} while ((nval = cmpxchg(addr, present, target)) != target);
>  	return 0;
>  }
> 
> -static int clear_bits_ll(unsigned long *addr, unsigned long mask_to_clear)
> +
> +/**
> + * clear_bits_ll - according to the mask, clears the bits specified by
> + * value, at the address specified.
> + * @addr: where to write
> + * @mask: filter to apply for the bits to alter
> + * @value: actual configuration of bits to clear
> + *
> + * Return: 0 upon success, -EBUSY otherwise
> + */
> +static int clear_bits_ll(unsigned long *addr,
> +			 unsigned long mask, unsigned long value)
>  {
> -	unsigned long val, nval;
> +	unsigned long nval;
> +	unsigned long present;
> +	unsigned long target;
> 
>  	nval = *addr;
> +	present = nval;
> +	if (unlikely((present & mask) ^ value))
> +		return -EBUSY;
>  	do {
> -		val = nval;
> -		if ((val & mask_to_clear) != mask_to_clear)
> +		present = nval;
> +		if (unlikely((present & mask) ^ value))
>  			return -EBUSY;
> +		target =  present & ~mask;
>  		cpu_relax();
> -	} while ((nval = cmpxchg(addr, val, val & ~mask_to_clear)) != val);
> -
> +	} while ((nval = cmpxchg(addr, present, target)) != target);
>  	return 0;
>  }
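Here is a userspace model of the release check. It uses C11 <stdatomic.h> in place of cmpxchg(), which is an assumption for illustration; model_clear_bits_ll() is a hypothetical name. Bits are cleared only if the masked bits hold exactly the expected pattern. As a result, a second release of an allocation whose bits are already clear fails with -EBUSY.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/*
 * Userspace model of clear_bits_ll(): clear the bits selected by @mask,
 * but only if they currently hold exactly @value.
 */
static int model_clear_bits_ll(_Atomic unsigned long *addr,
			       unsigned long mask, unsigned long value)
{
	unsigned long present = atomic_load(addr);

	do {
		/* Same test as "(present & mask) ^ value" in the patch. */
		if ((present & mask) != value)
			return -EBUSY;
		/*
		 * On failure, atomic_compare_exchange_weak() stores the value
		 * it found into @present, so the check is repeated on fresh data.
		 */
	} while (!atomic_compare_exchange_weak(addr, &present,
					       present & ~mask));
	return 0;
}
```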
> 
> -/*
> - * bitmap_set_ll - set the specified number of bits at the specified position
> +
> +/**
> + * get_boundary - verify that an allocation effectively
> + * starts at the given address, then measure its length.
>   * @map: pointer to a bitmap
> - * @start: a bit position in @map
> - * @nr: number of bits to set
> + * @start_entry: the index of the first entry in the bitmap
> + * @nentries: number of entries to alter
>   *
> - * Set @nr bits start from @start in @map lock-lessly. Several users
> - * can set/clear the same bitmap simultaneously without lock. If two
> - * users set the same bit, one user will return remain bits, otherwise
> - * return 0.
> + * Return: the length of an allocation, otherwise -EINVAL if the
> + * parameters do not refer to a correct allocation.
>   */
> -static int bitmap_set_ll(unsigned long *map, int start, int nr)
> +static int get_boundary(unsigned long *map, int start_entry, int nentries)
>  {
> -	unsigned long *p = map + BIT_WORD(start);
> -	const int size = start + nr;
> -	int bits_to_set = BITS_PER_LONG - (start % BITS_PER_LONG);
> -	unsigned long mask_to_set = BITMAP_FIRST_WORD_MASK(start);
> -
> -	while (nr - bits_to_set >= 0) {
> -		if (set_bits_ll(p, mask_to_set))
> -			return nr;
> -		nr -= bits_to_set;
> -		bits_to_set = BITS_PER_LONG;
> -		mask_to_set = ~0UL;
> -		p++;
> -	}
> -	if (nr) {
> -		mask_to_set &= BITMAP_LAST_WORD_MASK(size);
> -		if (set_bits_ll(p, mask_to_set))
> -			return nr;
> -	}
> +	int i;
> +	unsigned long bitmap_entry;
> 
> -	return 0;
> +
> +	if (unlikely(get_bitmap_entry(map, start_entry) != ENTRY_HEAD))
> +		return -EINVAL;
> +	for (i = start_entry + 1; i < nentries; i++) {
> +		bitmap_entry = get_bitmap_entry(map, i);
> +		if (bitmap_entry == ENTRY_HEAD ||
> +		    bitmap_entry == ENTRY_UNUSED)
> +			return i;
> +	}
> +	return nentries - start_entry;
>  }
> 
> +
> +#define SET_BITS 1
> +#define CLEAR_BITS 0
> +
>  /*
> - * bitmap_clear_ll - clear the specified number of bits at the specified position
> + * alter_bitmap_ll - set or clear the entries associated with an allocation
> + * @alteration: indicates if the bits selected should be set or cleared
>   * @map: pointer to a bitmap
> - * @start: a bit position in @map
> - * @nr: number of bits to set
> + * @start: the index of the first entry in the bitmap
> + * @nentries: number of entries to alter
> + *
> + * The modification happens lock-lessly.
> + * Several users can write to the same map simultaneously, without lock.
>   *
> - * Clear @nr bits start from @start in @map lock-lessly. Several users
> - * can set/clear the same bitmap simultaneously without lock. If two
> - * users clear the same bit, one user will return remain bits,
> - * otherwise return 0.
> + * Return: If two users alter the same bit, to one it will return
> + * remaining entries, to the other it will return 0.

And what if there are three or four concurrent users? ;-)

I believe that a more elaborate description about what happens with
concurrent attempts to alter the bitmap would be really helpful.
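To make the concurrent case concrete, here is a userspace model of the set_bits_ll() retry loop. C11 <stdatomic.h> stands in for the kernel's cmpxchg(), which is an assumption for illustration; model_set_bits_ll() is a hypothetical name, not the patch's code.

However many callers race on the same word, the first compare-exchange to land wins. Every other caller re-reads the word and, if its mask overlaps bits already set, returns -EBUSY. The kernel's cmpxchg() returns the value it found in memory, so success is detected by comparing that return value with the expected old value.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/*
 * Userspace model of set_bits_ll(): atomically OR @value into *@addr,
 * but only if none of the bits selected by @mask are already set.
 */
static int model_set_bits_ll(_Atomic unsigned long *addr,
			     unsigned long mask, unsigned long value)
{
	unsigned long present = atomic_load(addr);

	do {
		/* Part of the range is already taken: give up. */
		if (present & mask)
			return -EBUSY;
		/*
		 * On failure, atomic_compare_exchange_weak() stores the value
		 * it actually found into @present, so the check above is
		 * repeated against fresh data.
		 */
	} while (!atomic_compare_exchange_weak(addr, &present,
					       present | value));
	return 0;
}
```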

>   */
> -static int bitmap_clear_ll(unsigned long *map, int start, int nr)
> +static int alter_bitmap_ll(bool alteration, unsigned long *map,
> +			   int start_entry, int nentries)
>  {
> -	unsigned long *p = map + BIT_WORD(start);
> -	const int size = start + nr;
> -	int bits_to_clear = BITS_PER_LONG - (start % BITS_PER_LONG);
> -	unsigned long mask_to_clear = BITMAP_FIRST_WORD_MASK(start);
> -
> -	while (nr - bits_to_clear >= 0) {
> -		if (clear_bits_ll(p, mask_to_clear))
> -			return nr;
> -		nr -= bits_to_clear;
> -		bits_to_clear = BITS_PER_LONG;
> -		mask_to_clear = ~0UL;
> -		p++;
> -	}
> -	if (nr) {
> -		mask_to_clear &= BITMAP_LAST_WORD_MASK(size);
> -		if (clear_bits_ll(p, mask_to_clear))
> -			return nr;
> +	unsigned long start_bit;
> +	unsigned long end_bit;
> +	unsigned long mask;
> +	unsigned long value;
> +	int nbits;
> +	int bits_to_write;
> +	int index;
> +	int (*action)(unsigned long *addr,
> +		      unsigned long mask, unsigned long value);
> +
> +	action = (alteration == SET_BITS) ? set_bits_ll : clear_bits_ll;
> +
> +	/*
> +	 * Prepare for writing the initial part of the allocation, from
> +	 * starting entry, to the end of the UL bitmap element which
> +	 * contains it. It might be larger than the actual allocation.
> +	 */
> +	start_bit = ENTRIES_TO_BITS(start_entry);
> +	end_bit = ENTRIES_TO_BITS(start_entry + nentries);
> +	nbits = ENTRIES_TO_BITS(nentries);
> +	bits_to_write = BITS_PER_LONG - start_bit % BITS_PER_LONG;
> +	mask = BITMAP_FIRST_WORD_MASK(start_bit);
> +	/* Mark the beginning of the allocation. */
> +	value = MASK | (1UL << (start_bit % BITS_PER_LONG));
> +	index = BITS_DIV_LONGS(start_bit);
> +
> +	/*
> +	 * Writes entries to the bitmap, as long as the remainder is
> +	 * positive or zero.
> +	 * Might be skipped if the entries to write do not reach the end
> +	 * of a bitmap UL unit.
> +	 */
> +	while (nbits >= bits_to_write) {
> +		if (action(map + index, mask, value & mask))
> +			return BITS_DIV_ENTRIES(nbits);
> +		nbits -= bits_to_write;
> +		bits_to_write = BITS_PER_LONG;
> +		mask = ~0UL;
> +		value = MASK;
> +		index++;
>  	}
> 
> +	/* Takes care of the ending part of the entries to mark. */
> +	if (nbits > 0) {
> +		mask ^= BITMAP_FIRST_WORD_MASK((end_bit) % BITS_PER_LONG);
> +		bits_to_write = nbits;
> +		if (action(map + index, mask, value & mask))
> +			return BITS_DIV_ENTRIES(nbits);
> +	}
>  	return 0;
>  }
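Ignoring the multi-word loop and the atomics, the mask/value pair built above can be checked by hand for an allocation that fits in one word. In the sketch below, allocation_pattern() is a hypothetical helper. For 4 units starting at entry 0, it yields 0xAB, the lowest byte of the example in the encoding comment.

```c
#include <assert.h>
#include <limits.h>

/* Assumption for userspace: the kernel provides BITS_PER_LONG itself. */
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
/* 1010...1010: the busy (upper) bit of every 2-bit entry, as in the patch. */
#define MASK (~0UL / 3 * 2)

/*
 * Hypothetical helper: the bits written for an allocation of @nentries
 * units starting at @start_entry, assuming it fits in one unsigned long.
 */
static unsigned long allocation_pattern(unsigned int start_entry,
					unsigned int nentries)
{
	unsigned long start_bit = 2UL * start_entry;
	unsigned long nbits = 2UL * nentries;
	unsigned long mask;

	assert(nentries > 0 && start_bit + nbits <= BITS_PER_LONG);
	/* Select bits [start_bit, start_bit + nbits) of the word. */
	mask = (nbits == BITS_PER_LONG) ? ~0UL
					: ((1UL << nbits) - 1) << start_bit;
	/* Busy bit in every entry, plus the head bit of the first entry. */
	return (MASK | (1UL << start_bit)) & mask;
}
```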
> 
> +
>  /**
>   * gen_pool_create - create a new special memory pool
> - * @min_alloc_order: log base 2 of number of bytes each bitmap bit represents
> + * @min_alloc_order: log base 2 of number of bytes each bitmap entry represents
>   * @nid: node id of the node the pool structure should be allocated on, or -1
>   *
>   * Create a new special memory pool that can be used to manage special purpose
>   * memory not managed by the regular kmalloc/kfree interface.
> + *
> + * Return: pointer to the pool, if successful, NULL otherwise
>   */
>  struct gen_pool *gen_pool_create(int min_alloc_order, int nid)
>  {
> @@ -177,16 +357,18 @@ EXPORT_SYMBOL(gen_pool_create);
>   *
>   * Add a new chunk of special memory to the specified pool.
>   *
> - * Returns 0 on success or a -ve errno on failure.
> + * Return: 0 on success or a -ve errno on failure.
>   */
>  int gen_pool_add_virt(struct gen_pool *pool, unsigned long virt, phys_addr_t phys,
>  		 size_t size, int nid)
>  {
>  	struct gen_pool_chunk *chunk;
> -	int nbits = size >> pool->min_alloc_order;
> -	int nbytes = sizeof(struct gen_pool_chunk) +
> -				BITS_TO_LONGS(nbits) * sizeof(long);
> +	int nentries;
> +	int nbytes;
> 
> +	nentries = size >> pool->min_alloc_order;
> +	nbytes = sizeof(struct gen_pool_chunk) +
> +		 ENTRIES_DIV_LONGS(nentries) * sizeof(long);
>  	chunk = kzalloc_node(nbytes, GFP_KERNEL, nid);
>  	if (unlikely(chunk == NULL))
>  		return -ENOMEM;
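When sizing the chunk, it may be worth double-checking the rounding. BITS_TO_LONGS() in the removed line rounds up, while ENTRIES_DIV_LONGS(), as defined earlier in this patch, is a plain truncating division. Below is a userspace sketch of both computations. DIV_ROUND_UP() and BITS_TO_LONGS() are redefined locally, BITS_PER_LONG is derived from sizeof(), and longs_truncated() and longs_rounded_up() are hypothetical helpers. With 64-bit longs, 33 entries need 66 bits, i.e. 2 longs, but truncating division gives 1.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel macros of the same names. */
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
#define BITS_TO_LONGS(nr) DIV_ROUND_UP(nr, BITS_PER_LONG)

/* Two bits per entry, divided without rounding. */
static size_t longs_truncated(size_t nentries)
{
	return (nentries * 2) / BITS_PER_LONG;
}

/* Two bits per entry, rounded up to whole longs. */
static size_t longs_rounded_up(size_t nentries)
{
	return BITS_TO_LONGS(nentries * 2);
}
```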
> @@ -209,7 +391,7 @@ EXPORT_SYMBOL(gen_pool_add_virt);
>   * @pool: pool to allocate from
>   * @addr: starting address of memory
>   *
> - * Returns the physical address on success, or -1 on error.
> + * Return: the physical address on success, or -1 on error.
>   */
>  phys_addr_t gen_pool_virt_to_phys(struct gen_pool *pool, unsigned long addr)
>  {
> @@ -248,7 +430,7 @@ void gen_pool_destroy(struct gen_pool *pool)
>  		list_del(&chunk->next_chunk);
> 
>  		end_bit = chunk_size(chunk) >> order;
> -		bit = find_next_bit(chunk->bits, end_bit, 0);
> +		bit = find_next_bit(chunk->entries, end_bit, 0);
>  		BUG_ON(bit < end_bit);
> 
>  		kfree(chunk);
> @@ -267,6 +449,8 @@ EXPORT_SYMBOL(gen_pool_destroy);
>   * Uses the pool allocation function (with first-fit algorithm by default).
>   * Can not be used in NMI handler on architectures without
>   * NMI-safe cmpxchg implementation.
> + *
> + * Return: address of the memory allocated, otherwise NULL
>   */
>  unsigned long gen_pool_alloc(struct gen_pool *pool, size_t size)
>  {
> @@ -285,6 +469,8 @@ EXPORT_SYMBOL(gen_pool_alloc);
>   * Uses the pool allocation function (with first-fit algorithm by default).
>   * Can not be used in NMI handler on architectures without
>   * NMI-safe cmpxchg implementation.
> + *
> + * Return: address of the memory allocated, otherwise NULL
>   */
>  unsigned long gen_pool_alloc_algo(struct gen_pool *pool, size_t size,
>  		genpool_algo_t algo, void *data)
> @@ -292,7 +478,7 @@ unsigned long gen_pool_alloc_algo(struct gen_pool *pool, size_t size,
>  	struct gen_pool_chunk *chunk;
>  	unsigned long addr = 0;
>  	int order = pool->min_alloc_order;
> -	int nbits, start_bit, end_bit, remain;
> +	int nentries, start_entry, end_entry, remain;
> 
>  #ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
>  	BUG_ON(in_nmi());
> @@ -301,29 +487,32 @@ unsigned long gen_pool_alloc_algo(struct gen_pool *pool, size_t size,
>  	if (size == 0)
>  		return 0;
> 
> -	nbits = (size + (1UL << order) - 1) >> order;
> +	nentries = mem_to_units(size, order);
>  	rcu_read_lock();
>  	list_for_each_entry_rcu(chunk, &pool->chunks, next_chunk) {
>  		if (size > atomic_long_read(&chunk->avail))
>  			continue;
> 
> -		start_bit = 0;
> -		end_bit = chunk_size(chunk) >> order;
> +		start_entry = 0;
> +		end_entry = chunk_size(chunk) >> order;
>  retry:
> -		start_bit = algo(chunk->bits, end_bit, start_bit,
> -				 nbits, data, pool);
> -		if (start_bit >= end_bit)
> +		start_entry = algo(chunk->entries, end_entry, start_entry,
> +				  nentries, data, pool);
> +		if (start_entry >= end_entry)
>  			continue;
> -		remain = bitmap_set_ll(chunk->bits, start_bit, nbits);
> +		remain = alter_bitmap_ll(SET_BITS, chunk->entries,
> +					 start_entry, nentries);
>  		if (remain) {
> -			remain = bitmap_clear_ll(chunk->bits, start_bit,
> -						 nbits - remain);
> -			BUG_ON(remain);
> +			remain = alter_bitmap_ll(CLEAR_BITS,
> +						 chunk->entries,
> +						 start_entry,
> +						 nentries - remain);
>  			goto retry;
>  		}
> 
> -		addr = chunk->start_addr + ((unsigned long)start_bit << order);
> -		size = nbits << order;
> +		addr = chunk->start_addr +
> +			((unsigned long)start_entry << order);
> +		size = nentries << order;
>  		atomic_long_sub(size, &chunk->avail);
>  		break;
>  	}
> @@ -342,6 +531,8 @@ EXPORT_SYMBOL(gen_pool_alloc_algo);
>   * Uses the pool allocation function (with first-fit algorithm by default).
>   * Can not be used in NMI handler on architectures without
>   * NMI-safe cmpxchg implementation.
> + *
> + * Return: address of the memory allocated, otherwise NULL
>   */
>  void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size, dma_addr_t *dma)
>  {
> @@ -365,7 +556,7 @@ EXPORT_SYMBOL(gen_pool_dma_alloc);
>   * gen_pool_free - free allocated special memory back to the pool
>   * @pool: pool to free to
>   * @addr: starting address of memory to free back to pool
> - * @size: size in bytes of memory to free
> + * @size: size in bytes of memory to free or 0, for auto-detection
>   *
>   * Free previously allocated special memory back to the specified
>   * pool.  Can not be used in NMI handler on architectures without
> @@ -375,22 +566,29 @@ void gen_pool_free(struct gen_pool *pool, unsigned long addr, size_t size)
>  {
>  	struct gen_pool_chunk *chunk;
>  	int order = pool->min_alloc_order;
> -	int start_bit, nbits, remain;
> +	int start_entry, remaining_entries, nentries, remain;
> +	int boundary;
> 
>  #ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
>  	BUG_ON(in_nmi());
>  #endif
> 
> -	nbits = (size + (1UL << order) - 1) >> order;
>  	rcu_read_lock();
>  	list_for_each_entry_rcu(chunk, &pool->chunks, next_chunk) {
>  		if (addr >= chunk->start_addr && addr <= chunk->end_addr) {
>  			BUG_ON(addr + size - 1 > chunk->end_addr);
> -			start_bit = (addr - chunk->start_addr) >> order;
> -			remain = bitmap_clear_ll(chunk->bits, start_bit, nbits);
> +			start_entry = (addr - chunk->start_addr) >> order;
> +			remaining_entries = (chunk->end_addr - addr) >> order;
> +			boundary = get_boundary(chunk->entries, start_entry,
> +						remaining_entries);
> +			BUG_ON(boundary < 0);
> +			nentries = boundary - start_entry;
> +			BUG_ON(size &&
> +			       (nentries != mem_to_units(size, order)));
> +			remain = alter_bitmap_ll(CLEAR_BITS, chunk->entries,
> +						 start_entry, nentries);
>  			BUG_ON(remain);
> -			size = nbits << order;
> -			atomic_long_add(size, &chunk->avail);
> +			atomic_long_add(nentries << order, &chunk->avail);
>  			rcu_read_unlock();
>  			return;
>  		}
> @@ -428,8 +626,9 @@ EXPORT_SYMBOL(gen_pool_for_each_chunk);
>   * @start:	start address
>   * @size:	size of the region
>   *
> - * Check if the range of addresses falls within the specified pool. Returns
> - * true if the entire range is contained in the pool and false otherwise.
> + * Check if the range of addresses falls within the specified pool.
> + *
> + * Return: true if the entire range is contained in the pool, false otherwise.
>   */
>  bool addr_in_gen_pool(struct gen_pool *pool, unsigned long start,
>  			size_t size)
> @@ -455,7 +654,7 @@ bool addr_in_gen_pool(struct gen_pool *pool, unsigned long start,
>   * gen_pool_avail - get available free space of the pool
>   * @pool: pool to get available free space
>   *
> - * Return available free space of the specified pool.
> + * Return: available free space of the specified pool.
>   */
>  size_t gen_pool_avail(struct gen_pool *pool)
>  {
> @@ -474,7 +673,7 @@ EXPORT_SYMBOL_GPL(gen_pool_avail);
>   * gen_pool_size - get size in bytes of memory managed by the pool
>   * @pool: pool to get size
>   *
> - * Return size in bytes of memory managed by the pool.
> + * Return: size in bytes of memory managed by the pool.
>   */
>  size_t gen_pool_size(struct gen_pool *pool)
>  {
> @@ -517,17 +716,27 @@ EXPORT_SYMBOL(gen_pool_set_algo);
>   * gen_pool_first_fit - find the first available region
>   * of memory matching the size requirement (no alignment constraint)
>   * @map: The address to base the search on
> - * @size: The bitmap size in bits
> - * @start: The bitnumber to start searching at
> - * @nr: The number of zeroed bits we're looking for
> + * @size: The number of allocation units in the bitmap
> + * @start: The allocation unit to start searching at
> + * @nr: The number of allocation units we're looking for
>   * @data: additional data - unused
>   * @pool: pool to find the fit region memory from
> + *
> + * Return: index of the memory allocated, otherwise the end of the range
>   */
>  unsigned long gen_pool_first_fit(unsigned long *map, unsigned long size,
>  		unsigned long start, unsigned int nr, void *data,
>  		struct gen_pool *pool)
>  {
> -	return bitmap_find_next_zero_area(map, size, start, nr, 0);
> +	unsigned long align_mask;
> +	unsigned long bit_index;
> +
> +	align_mask = roundup_pow_of_two(BITS_PER_ENTRY) - 1;
> +	bit_index = bitmap_find_next_zero_area(map, ENTRIES_TO_BITS(size),
> +					       ENTRIES_TO_BITS(start),
> +					       ENTRIES_TO_BITS(nr),
> +					       align_mask);
> +	return BITS_DIV_ENTRIES(bit_index);
>  }
>  EXPORT_SYMBOL(gen_pool_first_fit);
> 
> @@ -535,11 +744,13 @@ EXPORT_SYMBOL(gen_pool_first_fit);
>   * gen_pool_first_fit_align - find the first available region
>   * of memory matching the size requirement (alignment constraint)
>   * @map: The address to base the search on
> - * @size: The bitmap size in bits
> - * @start: The bitnumber to start searching at
> - * @nr: The number of zeroed bits we're looking for
> + * @size: The number of allocation units in the bitmap
> + * @start: The allocation unit to start searching at
> + * @nr: The number of allocation units we're looking for
>   * @data: data for alignment
>   * @pool: pool to get order from
> + *
> + * Return: index of the memory allocated, otherwise the end of the range
>   */
>  unsigned long gen_pool_first_fit_align(unsigned long *map, unsigned long size,
>  		unsigned long start, unsigned int nr, void *data,
> @@ -547,23 +758,32 @@ unsigned long gen_pool_first_fit_align(unsigned long *map, unsigned long size,
>  {
>  	struct genpool_data_align *alignment;
>  	unsigned long align_mask;
> +	unsigned long bit_index;
>  	int order;
> 
>  	alignment = data;
>  	order = pool->min_alloc_order;
> -	align_mask = ((alignment->align + (1UL << order) - 1) >> order) - 1;
> -	return bitmap_find_next_zero_area(map, size, start, nr, align_mask);
> +	align_mask = roundup_pow_of_two(
> +			ENTRIES_TO_BITS(mem_to_units(alignment->align,
> +						     order))) - 1;
> +	bit_index = bitmap_find_next_zero_area(map, ENTRIES_TO_BITS(size),
> +					       ENTRIES_TO_BITS(start),
> +					       ENTRIES_TO_BITS(nr),
> +					       align_mask);
> +	return BITS_DIV_ENTRIES(bit_index);
>  }
>  EXPORT_SYMBOL(gen_pool_first_fit_align);
> 
>  /**
>   * gen_pool_fixed_alloc - reserve a specific region
>   * @map: The address to base the search on
> - * @size: The bitmap size in bits
> - * @start: The bitnumber to start searching at
> - * @nr: The number of zeroed bits we're looking for
> + * @size: The number of allocation units in the bitmap
> + * @start: The allocation unit to start searching at
> + * @nr: The number of allocation units we're looking for
>   * @data: data for alignment
>   * @pool: pool to get order from
> + *
> + * Return: index of the memory allocated, otherwise the end of the range
>   */
>  unsigned long gen_pool_fixed_alloc(unsigned long *map, unsigned long size,
>  		unsigned long start, unsigned int nr, void *data,
> @@ -571,20 +791,23 @@ unsigned long gen_pool_fixed_alloc(unsigned long *map, unsigned long size,
>  {
>  	struct genpool_data_fixed *fixed_data;
>  	int order;
> -	unsigned long offset_bit;
> -	unsigned long start_bit;
> +	unsigned long offset;
> +	unsigned long align_mask;
> +	unsigned long bit_index;
> 
>  	fixed_data = data;
>  	order = pool->min_alloc_order;
> -	offset_bit = fixed_data->offset >> order;
>  	if (WARN_ON(fixed_data->offset & ((1UL << order) - 1)))
>  		return size;
> +	offset = fixed_data->offset >> order;
> +	align_mask = roundup_pow_of_two(BITS_PER_ENTRY) - 1;
> +	bit_index = bitmap_find_next_zero_area(map, ENTRIES_TO_BITS(size),
> +					       ENTRIES_TO_BITS(start + offset),
> +					       ENTRIES_TO_BITS(nr), align_mask);
> +	if (bit_index != ENTRIES_TO_BITS(offset))
> +		return size;
> 
> -	start_bit = bitmap_find_next_zero_area(map, size,
> -			start + offset_bit, nr, 0);
> -	if (start_bit != offset_bit)
> -		start_bit = size;
> -	return start_bit;
> +	return BITS_DIV_ENTRIES(bit_index);
>  }
>  EXPORT_SYMBOL(gen_pool_fixed_alloc);
> 
> @@ -593,60 +816,84 @@ EXPORT_SYMBOL(gen_pool_fixed_alloc);
>   * of memory matching the size requirement. The region will be aligned
>   * to the order of the size specified.
>   * @map: The address to base the search on
> - * @size: The bitmap size in bits
> - * @start: The bitnumber to start searching at
> - * @nr: The number of zeroed bits we're looking for
> + * @size: The number of allocation units in the bitmap
> + * @start: The allocation unit to start searching at
> + * @nr: The number of allocation units we're looking for
>   * @data: additional data - unused
>   * @pool: pool to find the fit region memory from
> + *
> + * Return: index of the memory allocated, otherwise the end of the range
>   */
>  unsigned long gen_pool_first_fit_order_align(unsigned long *map,
>  		unsigned long size, unsigned long start,
>  		unsigned int nr, void *data, struct gen_pool *pool)
>  {
> -	unsigned long align_mask = roundup_pow_of_two(nr) - 1;
> -
> -	return bitmap_find_next_zero_area(map, size, start, nr, align_mask);
> +	unsigned long align_mask;
> +	unsigned long bit_index;
> +
> +	align_mask = roundup_pow_of_two(ENTRIES_TO_BITS(nr)) - 1;
> +	bit_index = bitmap_find_next_zero_area(map, ENTRIES_TO_BITS(size),
> +					       ENTRIES_TO_BITS(start),
> +					       ENTRIES_TO_BITS(nr),
> +					       align_mask);
> +	return BITS_DIV_ENTRIES(bit_index);
>  }
>  EXPORT_SYMBOL(gen_pool_first_fit_order_align);
> 
>  /**
>   * gen_pool_best_fit - find the best fitting region of memory
> - * macthing the size requirement (no alignment constraint)
> + * matching the size requirement (no alignment constraint)
>   * @map: The address to base the search on
> - * @size: The bitmap size in bits
> - * @start: The bitnumber to start searching at
> - * @nr: The number of zeroed bits we're looking for
> + * @size: The number of allocation units in the bitmap
> + * @start: The allocation unit to start searching at
> + * @nr: The number of allocation units we're looking for
>   * @data: additional data - unused
>   * @pool: pool to find the fit region memory from
>   *
>   * Iterate over the bitmap to find the smallest free region
>   * which we can allocate the memory.
> + *
> + * Return: index of the memory allocated, otherwise the end of the range
>   */
>  unsigned long gen_pool_best_fit(unsigned long *map, unsigned long size,
>  		unsigned long start, unsigned int nr, void *data,
>  		struct gen_pool *pool)
>  {
> -	unsigned long start_bit = size;
> +	unsigned long start_bit = ENTRIES_TO_BITS(size);
>  	unsigned long len = size + 1;
>  	unsigned long index;
> +	unsigned long align_mask;
> +	unsigned long bit_index;
> 
> -	index = bitmap_find_next_zero_area(map, size, start, nr, 0);
> +	align_mask = roundup_pow_of_two(BITS_PER_ENTRY) - 1;
> +	bit_index = bitmap_find_next_zero_area(map, ENTRIES_TO_BITS(size),
> +					       ENTRIES_TO_BITS(start),
> +					       ENTRIES_TO_BITS(nr),
> +					       align_mask);
> +	index = BITS_DIV_ENTRIES(bit_index);
> 
>  	while (index < size) {
> -		int next_bit = find_next_bit(map, size, index + nr);
> -		if ((next_bit - index) < len) {
> -			len = next_bit - index;
> -			start_bit = index;
> +		int next_bit;
> +
> +		next_bit = find_next_bit(map, ENTRIES_TO_BITS(size),
> +					 ENTRIES_TO_BITS(index + nr));
> +		if ((BITS_DIV_ENTRIES(next_bit) - index) < len) {
> +			len = BITS_DIV_ENTRIES(next_bit) - index;
> +			start_bit = ENTRIES_TO_BITS(index);
>  			if (len == nr)
> -				return start_bit;
> +				return BITS_DIV_ENTRIES(start_bit);
>  		}
> -		index = bitmap_find_next_zero_area(map, size,
> -						   next_bit + 1, nr, 0);
> +		bit_index =
> +			bitmap_find_next_zero_area(map,
> +						   ENTRIES_TO_BITS(size),
> +						   next_bit + 1,
> +						   ENTRIES_TO_BITS(nr),
> +						   align_mask);
> +		index = BITS_DIV_ENTRIES(bit_index);
>  	}
> 
> -	return start_bit;
> +	return BITS_DIV_ENTRIES(start_bit);
>  }
> -EXPORT_SYMBOL(gen_pool_best_fit);
> 
>  static void devm_gen_pool_release(struct device *dev, void *res)
>  {
> @@ -672,7 +919,7 @@ static int devm_gen_pool_match(struct device *dev, void *res, void *data)
>   * @dev: device to retrieve the gen_pool from
>   * @name: name of a gen_pool or NULL, identifies a particular gen_pool on device
>   *
> - * Returns the gen_pool for the device if one is present, or NULL.
> + * Return: the gen_pool for the device if one is present, or NULL.
>   */
>  struct gen_pool *gen_pool_get(struct device *dev, const char *name)
>  {
> @@ -696,6 +943,8 @@ EXPORT_SYMBOL_GPL(gen_pool_get);
>   * Create a new special memory pool that can be used to manage special purpose
>   * memory not managed by the regular kmalloc/kfree interface. The pool will be
>   * automatically destroyed by the device management code.
> + *
> + * Return: the address of the pool, if successful, otherwise NULL
>   */
>  struct gen_pool *devm_gen_pool_create(struct device *dev, int min_alloc_order,
>  				      int nid, const char *name)
> @@ -743,7 +992,7 @@ EXPORT_SYMBOL(devm_gen_pool_create);
>   * @propname: property name containing phandle(s)
>   * @index: index into the phandle array
>   *
> - * Returns the pool that contains the chunk starting at the physical
> + * Return: the pool that contains the chunk starting at the physical
>   * address of the device tree node pointed at by the phandle property,
>   * or NULL if not found.
>   */
> -- 
> 2.14.1
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> 
> 
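The reworked search helpers above count in allocation units (entries) but still search a bit-level bitmap. They convert with ENTRIES_TO_BITS()/BITS_DIV_ENTRIES() and pass an alignment mask, so a search can only start on an entry boundary. Below is a minimal userspace sketch of that idea, built on several assumptions. It uses 2 bits per entry and stores one bit per `unsigned char` for readability. Any set bit marks its entry as busy. The macros and `mem_to_units()` are re-defined here as guesses at what the patch's helpers compute. `find_zero_area()` and `first_fit_entries()` are hypothetical, simplified stand-ins for `bitmap_find_next_zero_area()` and `gen_pool_first_fit()`, and they ignore the `start` parameter. None of this is the kernel implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: each allocation unit ("entry") occupies 2 bitmap bits. */
#define BITS_PER_ENTRY 2UL
#define ENTRIES_TO_BITS(n)  ((n) * BITS_PER_ENTRY)
#define BITS_DIV_ENTRIES(b) ((b) / BITS_PER_ENTRY)

/* Bytes -> allocation units of 2^order bytes, rounding up. */
static unsigned long mem_to_units(size_t size, unsigned int order)
{
	return (size + (1UL << order) - 1) >> order;
}

/* Userspace stand-in for the kernel's roundup_pow_of_two(). */
static unsigned long roundup_pow2(unsigned long n)
{
	unsigned long p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/*
 * Simplified first fit: find the first run of @nbits zero bits that
 * starts at a multiple of (@align_mask + 1). One bit per array element.
 * Returns @total (the end of the range) when nothing fits.
 */
static unsigned long find_zero_area(const unsigned char *bits,
				    unsigned long total, unsigned long nbits,
				    unsigned long align_mask)
{
	unsigned long start, i;

	assert(((align_mask + 1) & align_mask) == 0); /* power-of-two step */
	for (start = 0; start + nbits <= total; start += align_mask + 1) {
		for (i = 0; i < nbits; i++)
			if (bits[start + i])
				break;	/* busy bit: try the next aligned start */
		if (i == nbits)
			return start;
	}
	return total;
}

/* First fit expressed in entries, shaped like gen_pool_first_fit(). */
static unsigned long first_fit_entries(const unsigned char *bits,
				       unsigned long size, unsigned long nr)
{
	unsigned long align_mask = roundup_pow2(BITS_PER_ENTRY) - 1;
	unsigned long bit_index;

	bit_index = find_zero_area(bits, ENTRIES_TO_BITS(size),
				   ENTRIES_TO_BITS(nr), align_mask);
	return BITS_DIV_ENTRIES(bit_index);
}
```

The mask is what keeps results meaningful. For the bit array {1, 0, 0, 1}, both entries are busy. An unaligned search (`align_mask` 0) would still report bits 1..2 as free, a run that straddles the two entries. With `align_mask` 1, it correctly reports nothing.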

-- 
Sincerely yours,
Mike.

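When called with a size of 0, gen_pool_free() derives the length of an allocation from the bitmap itself through get_boundary(). The sketch below shows one way this can work. It assumes each entry records whether it is free, the head of an allocation, or a continuation of one. It uses one byte per entry instead of packed bits. The names `ENTRY_HEAD`, `ENTRY_USED`, `get_boundary_sketch()`, `mark_alloc()` and `free_auto()` are hypothetical and are not the patch's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-entry states; the patch packs its states into bits. */
enum entry_state { ENTRY_UNUSED = 0, ENTRY_USED = 1, ENTRY_HEAD = 3 };

/*
 * Return the index one past the last entry of the allocation that
 * starts at @start, or -1 if @start is not the head of an allocation.
 */
static long get_boundary_sketch(const unsigned char *entries,
				size_t nentries, size_t start)
{
	size_t i;

	if (start >= nentries || entries[start] != ENTRY_HEAD)
		return -1;
	for (i = start + 1; i < nentries; i++)
		if (entries[i] != ENTRY_USED)
			break;	/* a free entry or the next head ends the run */
	return (long)i;
}

/* Record @n entries from @start as one allocation: head + continuation. */
static void mark_alloc(unsigned char *entries, size_t start, size_t n)
{
	size_t i;

	assert(n > 0);
	entries[start] = ENTRY_HEAD;
	for (i = 1; i < n; i++)
		entries[start + i] = ENTRY_USED;
}

/* Free the allocation at @start without being told its size. */
static size_t free_auto(unsigned char *entries, size_t nentries,
			size_t start)
{
	long boundary = get_boundary_sketch(entries, nentries, start);
	size_t i;

	if (boundary < 0)
		return 0;	/* the patch BUG()s in this case instead */
	for (i = start; i < (size_t)boundary; i++)
		entries[i] = ENTRY_UNUSED;
	return (size_t)boundary - start;
}
```

The head marker is what keeps back-to-back allocations apart. Without it, freeing the first of two adjacent allocations could not tell where the first one ends and the second one begins.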

* Re: [PATCH 4/6] Protectable Memory
  2018-02-11  3:19   ` Igor Stoppa
  (?)
@ 2018-02-11 12:37     ` Mike Rapoport
  -1 siblings, 0 replies; 146+ messages in thread
From: Mike Rapoport @ 2018-02-11 12:37 UTC (permalink / raw)
  To: Igor Stoppa
  Cc: willy, rdunlap, corbet, keescook, mhocko, labbott, jglisse, hch,
	cl, linux-security-module, linux-mm, linux-kernel,
	kernel-hardening

On Sun, Feb 11, 2018 at 05:19:18AM +0200, Igor Stoppa wrote:
> The MMU available in many systems running Linux can often provide R/O
> protection to the memory pages it handles.
> 
> However, the MMU-based protection works efficiently only when said pages
> contain exclusively data that will not need further modifications.
> 
> Statically allocated variables can be segregated into a dedicated
> section, but this does not sit very well with dynamically allocated
> ones.
> 
> Dynamic allocation currently provides no means of grouping variables
> into memory pages that contain exclusively data suitable for
> conversion to read-only access mode.
> 
> The allocator provided here (pmalloc, the protectable memory allocator)
> introduces the concept of pools of protectable memory.
> 
> A module can request a pool and then direct any allocation request to
> the pool handle it has received.
> 
> Once all the chunks of memory associated with a specific pool are
> initialized, the pool can be protected.
> 
> After this point, the pool can only be destroyed (it is up to the module
> to avoid any further references to the pool's memory once destruction
> is invoked).
> 
> Destruction is mainly meant for releasing memory when a module is
> unloaded.
> 
> A module can have as many pools as needed, for example to support the
> protection of data that is initialized in sufficiently distinct phases.
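The lifecycle described above (allocate, initialize, write-protect, and afterwards only destroy) can be mimicked in userspace to illustrate the intended semantics. This is an analogue built on POSIX `mmap()`/`mprotect()`/`munmap()`. It is not the pmalloc kernel API. `struct ro_pool` and the `ro_pool_*()` functions are hypothetical. A single bump-allocated chunk stands in for the pool's vmalloc'd chunks.

```c
#define _DEFAULT_SOURCE		/* MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical single-chunk pool; not the kernel pmalloc API. */
struct ro_pool {
	unsigned char *base;
	size_t size;		/* bytes mapped, a multiple of the page size */
	size_t used;		/* bump-allocation offset */
	int protected;		/* once set, never cleared */
};

static int ro_pool_create(struct ro_pool *p, size_t size)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	void *mem;

	p->size = (size + page - 1) / page * page;
	p->used = 0;
	p->protected = 0;
	mem = mmap(NULL, p->size, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (mem == MAP_FAILED) {
		p->base = NULL;
		return -1;
	}
	p->base = mem;
	return 0;
}

/* 8-byte aligned bump allocation; refused once the pool is protected. */
static void *ro_pool_alloc(struct ro_pool *p, size_t n)
{
	size_t off = (p->used + 7) & ~(size_t)7;

	assert(p->base != NULL);
	if (p->protected || n == 0 || off > p->size || n > p->size - off)
		return NULL;
	p->used = off + n;
	return p->base + off;
}

/* Copy a string into the pool, in the spirit of pstrdup(). */
static char *ro_pool_strdup(struct ro_pool *p, const char *s)
{
	size_t len = strlen(s) + 1;
	char *buf = ro_pool_alloc(p, len);

	if (buf)
		memcpy(buf, s, len);
	return buf;
}

/* Make the whole pool read-only; there is no way back to R/W. */
static int ro_pool_protect(struct ro_pool *p)
{
	if (mprotect(p->base, p->size, PROT_READ) != 0)
		return -1;
	p->protected = 1;
	return 0;
}

/* Destroying the pool releases every allocation in it at once. */
static int ro_pool_destroy(struct ro_pool *p)
{
	int ret = munmap(p->base, p->size);

	p->base = NULL;
	p->size = 0;
	return ret;
}
```

After `ro_pool_protect()`, reads still work. A write through a pointer into the pool raises SIGSEGV, and new allocations are refused. This mirrors the documented behavior of pmalloc() on a protected pool.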
> 
> Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com>
> ---
>  include/linux/genalloc.h |   3 +
>  include/linux/pmalloc.h  | 222 +++++++++++++++++++++
>  include/linux/vmalloc.h  |   1 +
>  lib/genalloc.c           |  27 +++
>  mm/Kconfig               |   6 +
>  mm/Makefile              |   1 +
>  mm/pmalloc.c             | 497 +++++++++++++++++++++++++++++++++++++++++++++++
>  mm/usercopy.c            |  33 ++++
>  8 files changed, 790 insertions(+)
>  create mode 100644 include/linux/pmalloc.h
>  create mode 100644 mm/pmalloc.c
> 
> diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
> index dcaa33e74b1c..b6c4cea9fbd8 100644
> --- a/include/linux/genalloc.h
> +++ b/include/linux/genalloc.h
> @@ -121,6 +121,9 @@ extern unsigned long gen_pool_alloc_algo(struct gen_pool *, size_t,
>  extern void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size,
>  		dma_addr_t *dma);
>  extern void gen_pool_free(struct gen_pool *, unsigned long, size_t);
> +
> +extern void gen_pool_flush_chunk(struct gen_pool *pool,
> +				 struct gen_pool_chunk *chunk);
>  extern void gen_pool_for_each_chunk(struct gen_pool *,
>  	void (*)(struct gen_pool *, struct gen_pool_chunk *, void *), void *);
>  extern size_t gen_pool_avail(struct gen_pool *);
> diff --git a/include/linux/pmalloc.h b/include/linux/pmalloc.h
> new file mode 100644
> index 000000000000..624379a937c5
> --- /dev/null
> +++ b/include/linux/pmalloc.h
> @@ -0,0 +1,222 @@
> +/* SPDX-License-Identifier: GPL-2.0
> + *
> + * pmalloc.h: Header for Protectable Memory Allocator
> + *
> + * (C) Copyright 2017 Huawei Technologies Co. Ltd.
> + * Author: Igor Stoppa <igor.stoppa@huawei.com>
> + */
> +
> +#ifndef _LINUX_PMALLOC_H
> +#define _LINUX_PMALLOC_H
> +
> +
> +#include <linux/genalloc.h>
> +#include <linux/string.h>
> +
> +#define PMALLOC_DEFAULT_ALLOC_ORDER (-1)
> +
> +/*
> + * Library for dynamic allocation of pools of memory that can be,
> + * after initialization, marked as read-only.
> + *
> + * This is intended to complement __ro_after_init, for those cases
> + * where either it is not possible to know the initialization value before
> + * init is completed, or the amount of data is variable and can be
> + * determined only at run-time.
> + *
> + * ***WARNING***
> + * The user of the API is expected to synchronize:
> + * 1) allocation,
> + * 2) writes to the allocated memory,
> + * 3) write protection of the pool,
> + * 4) freeing of the allocated memory, and
> + * 5) destruction of the pool.
> + *
> + * For a non-threaded scenario, this type of locking is not even required.
> + *
> + * Even if the library were to provide support for locking, point 2)
> + * would still depend on the user taking the lock.
> + */
> +
> +
> +/**
> + * pmalloc_create_pool - create a new protectable memory pool
> + * @name: the name of the pool, enforced to be unique
> + * @min_alloc_order: log2 of the minimum allocation size obtainable
> + *                   from the pool
> + *
> + * Creates a new (empty) memory pool for allocation of protectable
> + * memory. Memory will be allocated upon request (through pmalloc).
> + *
> + * Return: pointer to the new pool upon success, otherwise a NULL.
> + */
> +struct gen_pool *pmalloc_create_pool(const char *name,
> +					 int min_alloc_order);
> +
> +/**
> + * is_pmalloc_object - validates the existence of an alleged object
> + * @ptr: address of the object
> + * @n: size of the object, in bytes
> + *
> + * Return: 0 if the object does not belong to pmalloc, 1 if it belongs to
> + * pmalloc, -1 if it partially overlaps pmalloc meory, but incorectly.

typo:                                            ^ memory

> + */
> +int is_pmalloc_object(const void *ptr, const unsigned long n);
> +
> +/**
> + * pmalloc_prealloc - tries to allocate a memory chunk of the requested size
> + * @pool: handle to the pool to be used for memory allocation
> + * @size: amount of memory (in bytes) requested
> + *
> + * Prepares a chunk of the requested size.
> + * This is intended to both minimize latency in later memory requests and
> + * avoid sleeping during allocation.
> + * Memory allocated with prealloc is stored in one single chunk, as
> + * opposed to what is allocated on-demand when pmalloc runs out of free
> + * space already existing in the pool and has to invoke vmalloc.
> + * One additional advantage of pre-allocating larger chunks of memory is
> + * that the total slack tends to be smaller.
> + *
> + * Return: true if the vmalloc call was successful, false otherwise.
> + */
> +bool pmalloc_prealloc(struct gen_pool *pool, size_t size);
> +
> +/**
> + * pmalloc - allocate protectable memory from a pool
> + * @pool: handle to the pool to be used for memory allocation
> + * @size: amount of memory (in bytes) requested
> + * @gfp: flags for page allocation
> + *
> + * Allocates memory from an unprotected pool. If the pool doesn't have
> + * enough memory, and the request did not include GFP_ATOMIC, an attempt
> + * is made to add a new chunk of memory to the pool
> + * (a multiple of PAGE_SIZE), in order to fit the new request.
> + * Otherwise, NULL is returned.
> + *
> + * Return: pointer to the memory requested upon success, NULL otherwise
> + * (either no memory available or pool already read-only).
> + */
> +void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp);
> +
> +
> +/**
> + * pzalloc - zero-initialized version of pmalloc
> + * @pool: handle to the pool to be used for memory allocation
> + * @size: amount of memory (in bytes) requested
> + * @gfp: flags for page allocation
> + *
> + * Executes pmalloc, initializing the memory requested to 0,
> + * before returning the pointer to it.
> + *
> + * Return: pointer to the zeroed memory requested, upon success, NULL
> + * otherwise (either no memory available or pool already read-only).
> + */
> +static inline void *pzalloc(struct gen_pool *pool, size_t size, gfp_t gfp)
> +{
> +	return pmalloc(pool, size, gfp | __GFP_ZERO);
> +}
> +
> +/**
> + * pmalloc_array - allocates an array according to the parameters
> + * @pool: handle to the pool to be used for memory allocation
> + * @n: number of elements in the array
> + * @size: amount of memory (in bytes) requested for each element
> + * @flags: flags for page allocation
> + *
> + * Executes pmalloc, if it has a chance to succeed.
> + *
> + * Return: either NULL or the pmalloc result.
> + */
> +static inline void *pmalloc_array(struct gen_pool *pool, size_t n,
> +				  size_t size, gfp_t flags)
> +{
> +	if (unlikely(!(pool && n && size)))
> +		return NULL;
> +	return pmalloc(pool, n * size, flags);
> +}
> +
> +/**
> + * pcalloc - allocates a 0-initialized array according to the parameters
> + * @pool: handle to the pool to be used for memory allocation
> + * @n: number of elements in the array
> + * @size: amount of memory (in bytes) requested for each element
> + * @flags: flags for page allocation
> + *
> + * Executes pmalloc_array, if it has a chance to succeed.
> + *
> + * Return: either NULL or the pmalloc result.
> + */
> +static inline void *pcalloc(struct gen_pool *pool, size_t n,
> +			    size_t size, gfp_t flags)
> +{
> +	return pmalloc_array(pool, n, size, flags | __GFP_ZERO);
> +}
> +
> +/**
> + * pstrdup - duplicate a string, using pmalloc as allocator
> + * @pool: handle to the pool to be used for memory allocation
> + * @s: string to duplicate
> + * @gfp: flags for page allocation
> + *
> + * Generates a copy of the given string, allocating sufficient memory
> + * from the given pmalloc pool.
> + *
> + * Return: pointer to the replica, NULL in case of error.
> + */
> +static inline char *pstrdup(struct gen_pool *pool, const char *s, gfp_t gfp)
> +{
> +	size_t len;
> +	char *buf;
> +
> +	if (unlikely(pool == NULL || s == NULL))
> +		return NULL;
> +
> +	len = strlen(s) + 1;
> +	buf = pmalloc(pool, len, gfp);
> +	if (likely(buf))
> +		strncpy(buf, s, len);
> +	return buf;
> +}
> +
> +/**
> + * pmalloc_protect_pool - turn a read/write pool read-only
> + * @pool: the pool to protect
> + *
> + * Write-protects all the memory chunks assigned to the pool.
> + * This prevents any further allocation.
> + *
> + * Return: 0 upon success, -EINVAL in abnormal cases.
> + */
> +int pmalloc_protect_pool(struct gen_pool *pool);
> +
> +/**
> + * pfree - mark as unused memory that was previously in use
> + * @pool: handle to the pool to be used for memory allocation
> + * @addr: the beginning of the memory area to be freed
> + *
> + * The behavior of pfree is different, depending on the state of the
> + * protection.
> + * If the pool is not yet protected, the memory is marked as unused and
> + * will be available for further allocations.
> + * If the pool is already protected, the memory is marked as unused, but
> + * it will still be impossible to perform further allocation, because of
> + * the existing protection.
> + * The freed memory, in this case, will be truly released only when the
> + * pool is destroyed.
> + */
> +static inline void pfree(struct gen_pool *pool, const void *addr)
> +{
> +	gen_pool_free(pool, (unsigned long)addr, 0);
> +}
> +
> +/**
> + * pmalloc_destroy_pool - destroys a pool and all the associated memory
> + * @pool: the pool to destroy
> + *
> + * All the memory that was allocated through pmalloc in the pool will be freed.
> + *
> + * Return: 0 upon success, -EINVAL in abnormal cases.
> + */
> +int pmalloc_destroy_pool(struct gen_pool *pool);
> +
> +#endif
> diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
> index 1e5d8c392f15..116d280cca53 100644
> --- a/include/linux/vmalloc.h
> +++ b/include/linux/vmalloc.h
> @@ -20,6 +20,7 @@ struct notifier_block;		/* in notifier.h */
>  #define VM_UNINITIALIZED	0x00000020	/* vm_struct is not fully initialized */
>  #define VM_NO_GUARD		0x00000040      /* don't add guard page */
>  #define VM_KASAN		0x00000080      /* has allocated kasan shadow memory */
> +#define VM_PMALLOC		0x00000100	/* pmalloc area - see docs */
>  /* bits [20..32] reserved for arch specific ioremap internals */
> 
>  /*
> diff --git a/lib/genalloc.c b/lib/genalloc.c
> index 044347163acb..e40a5db89439 100644
> --- a/lib/genalloc.c
> +++ b/lib/genalloc.c
> @@ -598,6 +598,33 @@ void gen_pool_free(struct gen_pool *pool, unsigned long addr, size_t size)
>  }
>  EXPORT_SYMBOL(gen_pool_free);
> 
> +
> +/**
> + * gen_pool_flush_chunk - drops all the allocations from a specific chunk
> + * @pool:	the generic memory pool
> + * @chunk:	The chunk to wipe clear.
> + *
> + * This is meant to be called only while destroying a pool. It's up to the
> + * caller to avoid races, but really, at this point the pool should have
> + * already been retired and have become unavailable for any other sort of
> + * operation.
> + */
> +void gen_pool_flush_chunk(struct gen_pool *pool,
> +			  struct gen_pool_chunk *chunk)
> +{
> +	size_t size;
> +
> +	if (unlikely(!(pool && chunk)))
> +		return;
> +
> +	size = chunk->end_addr + 1 - chunk->start_addr;
> +	memset(chunk->entries, 0,
> +	       DIV_ROUND_UP((size >> pool->min_alloc_order) * BITS_PER_ENTRY,
> +			    BITS_PER_BYTE));
> +	atomic_long_set(&chunk->avail, size);
> +}
> +
> +
>  /**
>   * gen_pool_for_each_chunk - call func for every chunk of generic memory pool
>   * @pool:	the generic memory pool
> diff --git a/mm/Kconfig b/mm/Kconfig
> index c782e8fb7235..be578fbdce6d 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -760,3 +760,9 @@ config GUP_BENCHMARK
>  	  performance of get_user_pages_fast().
> 
>  	  See tools/testing/selftests/vm/gup_benchmark.c
> +
> +config PROTECTABLE_MEMORY
> +    bool
> +    depends on ARCH_HAS_SET_MEMORY
> +    select GENERIC_ALLOCATOR
> +    default y
> diff --git a/mm/Makefile b/mm/Makefile
> index e669f02c5a54..959fdbdac118 100644
> --- a/mm/Makefile
> +++ b/mm/Makefile
> @@ -65,6 +65,7 @@ obj-$(CONFIG_SPARSEMEM)	+= sparse.o
>  obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o
>  obj-$(CONFIG_SLOB) += slob.o
>  obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o
> +obj-$(CONFIG_PROTECTABLE_MEMORY) += pmalloc.o
>  obj-$(CONFIG_KSM) += ksm.o
>  obj-$(CONFIG_PAGE_POISONING) += page_poison.o
>  obj-$(CONFIG_SLAB) += slab.o
> diff --git a/mm/pmalloc.c b/mm/pmalloc.c
> new file mode 100644
> index 000000000000..e94bfb407c92
> --- /dev/null
> +++ b/mm/pmalloc.c
> @@ -0,0 +1,497 @@
> +/* SPDX-License-Identifier: GPL-2.0
> + *
> + * pmalloc.c: Protectable Memory Allocator
> + *
> + * (C) Copyright 2017 Huawei Technologies Co. Ltd.
> + * Author: Igor Stoppa <igor.stoppa@huawei.com>
> + */
> +
> +#include <linux/printk.h>
> +#include <linux/init.h>
> +#include <linux/mm.h>
> +#include <linux/vmalloc.h>
> +#include <linux/genalloc.h>
> +#include <linux/kernel.h>
> +#include <linux/log2.h>
> +#include <linux/slab.h>
> +#include <linux/device.h>
> +#include <linux/atomic.h>
> +#include <linux/rculist.h>
> +#include <linux/set_memory.h>
> +#include <asm/cacheflush.h>
> +#include <asm/page.h>
> +
> +#include <linux/pmalloc.h>
> +/*
> + * pmalloc_data contains the data specific to a pmalloc pool,
> + * in a format compatible with the design of gen_alloc.
> + * Some of the fields are used for exposing the corresponding parameter
> + * to userspace, through sysfs.
> + */
> +struct pmalloc_data {
> +	struct gen_pool *pool;  /* Link back to the associated pool. */
> +	bool protected;     /* Status of the pool: RO or RW. */
> +	struct kobj_attribute attr_protected; /* Sysfs attribute. */
> +	struct kobj_attribute attr_avail;     /* Sysfs attribute. */
> +	struct kobj_attribute attr_size;      /* Sysfs attribute. */
> +	struct kobj_attribute attr_chunks;    /* Sysfs attribute. */
> +	struct kobject *pool_kobject;
> +	struct list_head node; /* list of pools */
> +};
> +
> +static LIST_HEAD(pmalloc_final_list);
> +static LIST_HEAD(pmalloc_tmp_list);
> +static struct list_head *pmalloc_list = &pmalloc_tmp_list;
> +static DEFINE_MUTEX(pmalloc_mutex);
> +static struct kobject *pmalloc_kobject;
> +
> +static ssize_t pmalloc_pool_show_protected(struct kobject *dev,
> +					   struct kobj_attribute *attr,
> +					   char *buf)
> +{
> +	struct pmalloc_data *data;
> +
> +	data = container_of(attr, struct pmalloc_data, attr_protected);
> +	if (data->protected)
> +		return sprintf(buf, "protected\n");
> +	else
> +		return sprintf(buf, "unprotected\n");
> +}
> +
> +static ssize_t pmalloc_pool_show_avail(struct kobject *dev,
> +				       struct kobj_attribute *attr,
> +				       char *buf)
> +{
> +	struct pmalloc_data *data;
> +
> +	data = container_of(attr, struct pmalloc_data, attr_avail);
> +	return sprintf(buf, "%lu\n",
> +		       (unsigned long)gen_pool_avail(data->pool));
> +}
> +
> +static ssize_t pmalloc_pool_show_size(struct kobject *dev,
> +				      struct kobj_attribute *attr,
> +				      char *buf)
> +{
> +	struct pmalloc_data *data;
> +
> +	data = container_of(attr, struct pmalloc_data, attr_size);
> +	return sprintf(buf, "%lu\n",
> +		       (unsigned long)gen_pool_size(data->pool));
> +}
> +
> +static void pool_chunk_number(struct gen_pool *pool,
> +			      struct gen_pool_chunk *chunk, void *data)
> +{
> +	unsigned long *counter = data;
> +
> +	(*counter)++;
> +}
> +
> +static ssize_t pmalloc_pool_show_chunks(struct kobject *dev,
> +					struct kobj_attribute *attr,
> +					char *buf)
> +{
> +	struct pmalloc_data *data;
> +	unsigned long chunks_num = 0;
> +
> +	data = container_of(attr, struct pmalloc_data, attr_chunks);
> +	gen_pool_for_each_chunk(data->pool, pool_chunk_number, &chunks_num);
> +	return sprintf(buf, "%lu\n", chunks_num);
> +}
> +
> +/* Exposes the pool and its attributes through sysfs. */
> +static struct kobject *pmalloc_connect(struct pmalloc_data *data)
> +{
> +	const struct attribute *attrs[] = {
> +		&data->attr_protected.attr,
> +		&data->attr_avail.attr,
> +		&data->attr_size.attr,
> +		&data->attr_chunks.attr,
> +		NULL
> +	};
> +	struct kobject *kobj;
> +
> +	kobj = kobject_create_and_add(data->pool->name, pmalloc_kobject);
> +	if (unlikely(!kobj))
> +		return NULL;
> +
> +	if (unlikely(sysfs_create_files(kobj, attrs) < 0)) {
> +		kobject_put(kobj);
> +		kobj = NULL;
> +	}
> +	return kobj;
> +}
> +
> +/* Removes the pool and its attributes from sysfs. */
> +static void pmalloc_disconnect(struct pmalloc_data *data,
> +			       struct kobject *kobj)
> +{
> +	const struct attribute *attrs[] = {
> +		&data->attr_protected.attr,
> +		&data->attr_avail.attr,
> +		&data->attr_size.attr,
> +		&data->attr_chunks.attr,
> +		NULL
> +	};
> +
> +	sysfs_remove_files(kobj, attrs);
> +	kobject_put(kobj);
> +}
> +
> +/* Declares an attribute of the pool. */
> +#define pmalloc_attr_init(data, attr_name) \
> +do { \
> +	sysfs_attr_init(&data->attr_##attr_name.attr); \
> +	data->attr_##attr_name.attr.name = #attr_name; \
> +	data->attr_##attr_name.attr.mode = VERIFY_OCTAL_PERMISSIONS(0400); \
> +	data->attr_##attr_name.show = pmalloc_pool_show_##attr_name; \
> +} while (0)
> +
> +struct gen_pool *pmalloc_create_pool(const char *name, int min_alloc_order)
> +{
> +	struct gen_pool *pool;
> +	const char *pool_name;
> +	struct pmalloc_data *data;
> +
> +	if (!name) {
> +		WARN_ON(1);
> +		return NULL;
> +	}
> +
> +	if (min_alloc_order < 0)
> +		min_alloc_order = ilog2(sizeof(unsigned long));
> +
> +	pool = gen_pool_create(min_alloc_order, NUMA_NO_NODE);
> +	if (unlikely(!pool))
> +		return NULL;
> +
> +	mutex_lock(&pmalloc_mutex);
> +	list_for_each_entry(data, pmalloc_list, node)
> +		if (!strcmp(name, data->pool->name))
> +			goto same_name_err;
> +
> +	pool_name = kstrdup(name, GFP_KERNEL);
> +	if (unlikely(!pool_name))
> +		goto name_alloc_err;
> +
> +	data = kzalloc(sizeof(struct pmalloc_data), GFP_KERNEL);
> +	if (unlikely(!data))
> +		goto data_alloc_err;
> +
> +	data->protected = false;
> +	data->pool = pool;
> +	pmalloc_attr_init(data, protected);
> +	pmalloc_attr_init(data, avail);
> +	pmalloc_attr_init(data, size);
> +	pmalloc_attr_init(data, chunks);
> +	pool->data = data;
> +	pool->name = pool_name;
> +
> +	list_add(&data->node, pmalloc_list);
> +	if (pmalloc_list == &pmalloc_final_list)
> +		data->pool_kobject = pmalloc_connect(data);
> +	mutex_unlock(&pmalloc_mutex);
> +	return pool;
> +
> +data_alloc_err:
> +	kfree(pool_name);
> +name_alloc_err:
> +same_name_err:
> +	mutex_unlock(&pmalloc_mutex);
> +	gen_pool_destroy(pool);
> +	return NULL;
> +}
> +
> +static inline int check_alloc_params(struct gen_pool *pool, size_t req_size)
> +{
> +	struct pmalloc_data *data;
> +
> +	if (unlikely(!req_size || !pool))
> +		return -1;
> +
> +	data = pool->data;
> +
> +	if (data == NULL)
> +		return -1;
> +
> +	if (unlikely(data->protected)) {
> +		WARN_ON(1);
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +
> +static inline bool chunk_tagging(void *chunk, bool tag)
> +{
> +	struct vm_struct *area;
> +	struct page *page;
> +
> +	if (!is_vmalloc_addr(chunk))
> +		return false;
> +
> +	page = vmalloc_to_page(chunk);
> +	if (unlikely(!page))
> +		return false;
> +
> +	area = page->area;
> +	if (tag)
> +		area->flags |= VM_PMALLOC;
> +	else
> +		area->flags &= ~VM_PMALLOC;
> +	return true;
> +}
> +
> +
> +static inline bool tag_chunk(void *chunk)
> +{
> +	return chunk_tagging(chunk, true);
> +}
> +
> +
> +static inline bool untag_chunk(void *chunk)
> +{
> +	return chunk_tagging(chunk, false);
> +}
> +
> +enum {
> +	INVALID_PMALLOC_OBJECT = -1,
> +	NOT_PMALLOC_OBJECT = 0,
> +	VALID_PMALLOC_OBJECT = 1,
> +};
> +
> +int is_pmalloc_object(const void *ptr, const unsigned long n)
> +{
> +	struct vm_struct *area;
> +	struct page *page;
> +	unsigned long area_start;
> +	unsigned long area_end;
> +	unsigned long object_start;
> +	unsigned long object_end;
> +
> +
> +	/* is_pmalloc_object gets called pretty late, so chances are high
> +	 * that the object is indeed of vmalloc type
> +	 */
> +	if (unlikely(!is_vmalloc_addr(ptr)))
> +		return NOT_PMALLOC_OBJECT;
> +
> +	page = vmalloc_to_page(ptr);
> +	if (unlikely(!page))
> +		return NOT_PMALLOC_OBJECT;
> +
> +	area = page->area;
> +
> +	if (likely(!(area->flags & VM_PMALLOC)))
> +		return NOT_PMALLOC_OBJECT;
> +
> +	area_start = (unsigned long)area->addr;
> +	area_end = area_start + area->nr_pages * PAGE_SIZE - 1;
> +	object_start = (unsigned long)ptr;
> +	object_end = object_start + n - 1;
> +
> +	if (likely((area_start <= object_start) &&
> +		   (object_end <= area_end)))
> +		return VALID_PMALLOC_OBJECT;
> +	else
> +		return INVALID_PMALLOC_OBJECT;
> +}
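The range test at the end of is_pmalloc_object() produces a three-valued result based on inclusive end addresses. Here it is restated as a pure userspace function. Plain integers stand in for the vm_struct fields, and `tagged` stands in for the VM_PMALLOC check. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum {
	SKETCH_INVALID_OBJECT = -1,
	SKETCH_NOT_OBJECT = 0,
	SKETCH_VALID_OBJECT = 1,
};

/*
 * Classify the object [ptr, ptr + n) against the area [start, start + len),
 * using inclusive end addresses as is_pmalloc_object() does.
 * Assumes n > 0 and len > 0.
 */
static int sketch_classify(unsigned long start, unsigned long len, bool tagged,
			   unsigned long ptr, unsigned long n)
{
	unsigned long area_end = start + len - 1;
	unsigned long object_end = ptr + n - 1;

	if (!tagged)
		return SKETCH_NOT_OBJECT;
	/* This standalone sketch has no caller to reject wrapped ranges. */
	if (object_end < ptr)
		return SKETCH_INVALID_OBJECT;
	if (start <= ptr && object_end <= area_end)
		return SKETCH_VALID_OBJECT;
	return SKETCH_INVALID_OBJECT;
}
```

An object that starts inside a tagged area but runs past its last byte is reported as invalid rather than as "not pmalloc". That distinction lets the usercopy hook abort on it.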
> +
> +
> +bool pmalloc_prealloc(struct gen_pool *pool, size_t size)
> +{
> +	void *chunk;
> +	size_t chunk_size;
> +	bool add_error;
> +
> +	if (check_alloc_params(pool, size))
> +		return false;
> +
> +	/* Expand pool */
> +	chunk_size = roundup(size, PAGE_SIZE);
> +	chunk = vmalloc(chunk_size);
> +	if (unlikely(chunk == NULL))
> +		return false;
> +
> +	/* Locking is already done inside gen_pool_add */
> +	add_error = gen_pool_add(pool, (unsigned long)chunk, chunk_size,
> +				 NUMA_NO_NODE);
> +	if (unlikely(add_error != 0))
> +		goto abort;
> +
> +	return true;
> +abort:
> +	vfree_atomic(chunk);
> +	return false;
> +
> +}
> +
> +void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp)
> +{
> +	void *chunk;
> +	size_t chunk_size;
> +	bool add_error;
> +	unsigned long retval;
> +
> +	if (check_alloc_params(pool, size))
> +		return NULL;
> +
> +retry_alloc_from_pool:
> +	retval = gen_pool_alloc(pool, size);
> +	if (retval)
> +		goto return_allocation;
> +
> +	if (unlikely((gfp & __GFP_ATOMIC))) {
> +		if (unlikely((gfp & __GFP_NOFAIL)))
> +			goto retry_alloc_from_pool;
> +		else
> +			return NULL;
> +	}
> +
> +	/* Expand pool */
> +	chunk_size = roundup(size, PAGE_SIZE);
> +	chunk = vmalloc(chunk_size);
> +	if (unlikely(!chunk)) {
> +		if (unlikely((gfp & __GFP_NOFAIL)))
> +			goto retry_alloc_from_pool;
> +		else
> +			return NULL;
> +	}
> +	if (unlikely(!tag_chunk(chunk)))
> +		goto free;
> +
> +	/* Locking is already done inside gen_pool_add */
> +	add_error = gen_pool_add(pool, (unsigned long)chunk, chunk_size,
> +				 NUMA_NO_NODE);
> +	if (unlikely(add_error))
> +		goto abort;
> +
> +	retval = gen_pool_alloc(pool, size);
> +	if (retval) {
> +return_allocation:
> +		*(size_t *)retval = size;
> +		if (gfp & __GFP_ZERO)
> +			memset((void *)retval, 0, size);
> +		return (void *)retval;
> +	}
> +	/* Here there is no test for __GFP_NO_FAIL because, in case of
> +	 * concurrent allocation, one thread might add a chunk to the
> +	 * pool and this memory could be allocated by another thread,
> +	 * before the first thread gets a chance to use it.
> +	 * As long as vmalloc succeeds, it's ok to retry.
> +	 */
> +	goto retry_alloc_from_pool;
> +abort:
> +	untag_chunk(chunk);
> +free:
> +	vfree_atomic(chunk);
> +	return NULL;
> +}
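The core of pmalloc() is a grow-on-demand pool. It first tries the existing chunks. If that fails and the caller may sleep, it adds a new chunk rounded up to whole pages and retries. Below is a minimal single-threaded userspace sketch of that shape, assuming 4 KiB pages. It uses malloc() in place of vmalloc(), a hypothetical bump allocator in place of gen_pool, and a `bool atomic` in place of the gfp flags. The __GFP_NOFAIL retry loop and alignment handling are deliberately omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */
#define SKETCH_MAX_CHUNKS 16

/* Hypothetical bump-pointer pool standing in for gen_pool. */
struct sketch_pool {
	unsigned char *chunk[SKETCH_MAX_CHUNKS];
	size_t chunk_size[SKETCH_MAX_CHUNKS];
	size_t used[SKETCH_MAX_CHUNKS];
	int nr_chunks;
};

static size_t sketch_round_up_pages(size_t size)
{
	return (size + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE * SKETCH_PAGE_SIZE;
}

/* Like gen_pool_alloc(): first chunk with enough room wins. */
static void *sketch_try_alloc(struct sketch_pool *p, size_t size)
{
	for (int i = 0; i < p->nr_chunks; i++) {
		if (p->chunk_size[i] - p->used[i] >= size) {
			void *ret = p->chunk[i] + p->used[i];

			p->used[i] += size;
			return ret;
		}
	}
	return NULL;
}

/* Try the pool, then grow it by a page-rounded chunk unless atomic. */
static void *sketch_pmalloc(struct sketch_pool *p, size_t size, bool atomic)
{
	void *ret = sketch_try_alloc(p, size);
	size_t chunk_size;

	if (ret || atomic || p->nr_chunks == SKETCH_MAX_CHUNKS)
		return ret;

	chunk_size = sketch_round_up_pages(size);
	p->chunk[p->nr_chunks] = malloc(chunk_size);
	if (!p->chunk[p->nr_chunks])
		return NULL;
	p->chunk_size[p->nr_chunks] = chunk_size;
	p->used[p->nr_chunks] = 0;
	p->nr_chunks++;
	return sketch_try_alloc(p, size);
}
```

A 5000-byte request that does not fit in the first chunk adds an 8192-byte chunk. This is the per-request slack that pmalloc_prealloc() is documented to reduce.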
> +
> +static void pmalloc_chunk_set_protection(struct gen_pool *pool,
> +
> +					 struct gen_pool_chunk *chunk,
> +					 void *data)
> +{
> +	const bool *flag = data;
> +	size_t chunk_size = chunk->end_addr + 1 - chunk->start_addr;
> +	unsigned long pages = chunk_size / PAGE_SIZE;
> +
> +	BUG_ON(chunk_size & (PAGE_SIZE - 1));
> +
> +	if (*flag)
> +		set_memory_ro(chunk->start_addr, pages);
> +	else
> +		set_memory_rw(chunk->start_addr, pages);
> +}
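pmalloc_chunk_set_protection() requires each chunk to span whole pages and then flips the page permissions. Below is a userspace analogue on POSIX systems. Here mprotect() stands in for set_memory_ro()/set_memory_rw(), and the function returns an error where the patch uses BUG_ON(). The function name is hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Change the protection of a page-aligned chunk.
 * Returns 0 on success, -1 on failure or if chunk_size is not a
 * multiple of the page size.
 */
static int sketch_chunk_set_protection(void *start, size_t chunk_size, bool ro)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);

	if (chunk_size & (page - 1))
		return -1;	/* the patch BUG()s here instead */
	return mprotect(start, chunk_size,
			ro ? PROT_READ : PROT_READ | PROT_WRITE);
}
```

Reads keep working while the chunk is read-only. Only writes are affected, which matches the intended use for data that is fully initialized before the pool is protected.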
> +
> +static int pmalloc_pool_set_protection(struct gen_pool *pool, bool protection)
> +{
> +	struct pmalloc_data *data;
> +	struct gen_pool_chunk *chunk;
> +
> +	if (unlikely(!pool))
> +		return -EINVAL;
> +
> +	data = pool->data;
> +
> +	if (unlikely(!data))
> +		return -EINVAL;
> +
> +	if (unlikely(data->protected == protection)) {
> +		WARN_ON(1);
> +		return 0;
> +	}
> +
> +	data->protected = protection;
> +	list_for_each_entry(chunk, &(pool)->chunks, next_chunk)
> +		pmalloc_chunk_set_protection(pool, chunk, &protection);
> +	return 0;
> +}
> +
> +int pmalloc_protect_pool(struct gen_pool *pool)
> +{
> +	return pmalloc_pool_set_protection(pool, true);
> +}
> +
> +
> +static void pmalloc_chunk_free(struct gen_pool *pool,
> +			       struct gen_pool_chunk *chunk, void *data)
> +{
> +	untag_chunk(chunk);
> +	gen_pool_flush_chunk(pool, chunk);
> +	vfree_atomic((void *)chunk->start_addr);
> +}
> +
> +
> +int pmalloc_destroy_pool(struct gen_pool *pool)
> +{
> +	struct pmalloc_data *data;
> +
> +	if (unlikely(pool == NULL))
> +		return -EINVAL;
> +
> +	data = pool->data;
> +
> +	if (unlikely(data == NULL))
> +		return -EINVAL;
> +
> +	mutex_lock(&pmalloc_mutex);
> +	list_del(&data->node);
> +	mutex_unlock(&pmalloc_mutex);
> +
> +	if (likely(data->pool_kobject))
> +		pmalloc_disconnect(data, data->pool_kobject);
> +
> +	pmalloc_pool_set_protection(pool, false);
> +	gen_pool_for_each_chunk(pool, pmalloc_chunk_free, NULL);
> +	gen_pool_destroy(pool);
> +	kfree(data);
> +	return 0;
> +}
> +
> +/**
> + * When the sysfs is ready to receive registrations, connect all the
> + * pools previously created. Also enable further pools to be connected
> + * right away.
> + */

This does not look like a kernel-doc comment. Please either remove the second *
from the opening comment mark or reformat the comment.

> +static int __init pmalloc_late_init(void)
> +{
> +	struct pmalloc_data *data, *n;
> +
> +	pmalloc_kobject = kobject_create_and_add("pmalloc", kernel_kobj);
> +
> +	mutex_lock(&pmalloc_mutex);
> +	pmalloc_list = &pmalloc_final_list;
> +
> +	if (likely(pmalloc_kobject != NULL)) {
> +		list_for_each_entry_safe(data, n, &pmalloc_tmp_list, node) {
> +			list_move(&data->node, &pmalloc_final_list);
> +			pmalloc_connect(data);
> +		}
> +	}
> +	mutex_unlock(&pmalloc_mutex);
> +	return 0;
> +}
> +late_initcall(pmalloc_late_init);
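pmalloc_create_pool() and pmalloc_late_init() together implement deferred registration. Pools created before sysfs is ready wait on pmalloc_tmp_list. The late initcall connects them and switches to connecting new pools immediately. In the quoted pmalloc_late_init(), the kobject returned by pmalloc_connect() does not appear to be stored in data->pool_kobject, unlike in pmalloc_create_pool(). Below is a single-threaded userspace model of the pattern. The `sketch_*` names are hypothetical, and the kernel performs both steps under pmalloc_mutex.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_MAX_POOLS 8

/* Hypothetical model of pmalloc_tmp_list and pmalloc_final_list. */
struct sketch_registry {
	const char *pending[SKETCH_MAX_POOLS];
	int nr_pending;
	const char *connected[SKETCH_MAX_POOLS];
	int nr_connected;
	bool late_init_done;
};

/* Stands in for pmalloc_connect(): records the pool as visible. */
static bool sketch_connect(struct sketch_registry *r, const char *name)
{
	if (r->nr_connected == SKETCH_MAX_POOLS)
		return false;
	r->connected[r->nr_connected++] = name;
	return true;
}

/* Pool creation: expose immediately if "sysfs" is ready, else defer. */
static bool sketch_register(struct sketch_registry *r, const char *name)
{
	if (r->late_init_done)
		return sketch_connect(r, name);
	if (r->nr_pending == SKETCH_MAX_POOLS)
		return false;
	r->pending[r->nr_pending++] = name;
	return true;
}

/* Late initcall: flush the deferred pools, then switch modes. */
static void sketch_late_init(struct sketch_registry *r)
{
	for (int i = 0; i < r->nr_pending; i++)
		sketch_connect(r, r->pending[i]);
	r->nr_pending = 0;
	r->late_init_done = true;
}
```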
> diff --git a/mm/usercopy.c b/mm/usercopy.c
> index e9e9325f7638..946ce051e296 100644
> --- a/mm/usercopy.c
> +++ b/mm/usercopy.c
> @@ -240,6 +240,36 @@ static inline void check_heap_object(const void *ptr, unsigned long n,
>  	}
>  }
> 
> +#ifdef CONFIG_PROTECTABLE_MEMORY
> +
> +int is_pmalloc_object(const void *ptr, const unsigned long n);
> +
> +static void check_pmalloc_object(const void *ptr, unsigned long n,
> +				 bool to_user)
> +{
> +	int retv;
> +
> +	retv = is_pmalloc_object(ptr, n);
> +	if (unlikely(retv)) {
> +		if (unlikely(!to_user))
> +			usercopy_abort("pmalloc",
> +				       "trying to write to pmalloc object",
> +				       to_user, (const unsigned long)ptr, n);
> +		if (retv < 0)
> +			usercopy_abort("pmalloc",
> +				       "invalid pmalloc object",
> +				       to_user, (const unsigned long)ptr, n);
> +	}
> +}
> +
> +#else
> +
> +static void check_pmalloc_object(const void *ptr, unsigned long n,
> +				 bool to_user)
> +{
> +}
> +#endif
> +
>  /*
>   * Validates that the given object is:
>   * - not bogus address
> @@ -277,5 +307,8 @@ void __check_object_size(const void *ptr, unsigned long n, bool to_user)
> 
>  	/* Check for object in kernel to avoid text exposure. */
>  	check_kernel_text_object((const unsigned long)ptr, n, to_user);
> +
> +	/* Check if object is from a pmalloc chunk. */
> +	check_pmalloc_object(ptr, n, to_user);
>  }
>  EXPORT_SYMBOL(__check_object_size);
> -- 
> 2.14.1
> 
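check_pmalloc_object() aborts in two cases. The first is a copy from user space into any pmalloc memory. The second is any copy involving an object that only partially overlaps a pmalloc area. Because usercopy_abort() does not return, the first matching case wins. Below is a hypothetical pure function that restates those branches, where `retv` is the is_pmalloc_object() result.

```c
#include <assert.h>
#include <stdbool.h>

/* Outcomes of the hypothetical restatement below. */
enum sketch_verdict {
	SKETCH_COPY_OK,
	SKETCH_ABORT_WRITE,	/* "trying to write to pmalloc object" */
	SKETCH_ABORT_INVALID,	/* "invalid pmalloc object" */
};

/* to_user == false means copy_from_user(), i.e. a write into the object. */
static enum sketch_verdict sketch_pmalloc_verdict(int retv, bool to_user)
{
	if (!retv)
		return SKETCH_COPY_OK;		/* not pmalloc memory */
	if (!to_user)
		return SKETCH_ABORT_WRITE;
	if (retv < 0)
		return SKETCH_ABORT_INVALID;
	return SKETCH_COPY_OK;			/* reading a valid object */
}
```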

-- 
Sincerely yours,
Mike.


* [PATCH 4/6] Protectable Memory
@ 2018-02-11 12:37     ` Mike Rapoport
From: Mike Rapoport @ 2018-02-11 12:37 UTC
  To: linux-security-module

On Sun, Feb 11, 2018 at 05:19:18AM +0200, Igor Stoppa wrote:
> The MMU available in many systems running Linux can often provide R/O
> protection to the memory pages it handles.
> 
> However, the MMU-based protection works efficiently only when said pages
> contain exclusively data that will not need further modifications.
> 
> Statically allocated variables can be segregated into a dedicated
> section, but this does not sit very well with dynamically allocated
> ones.
> 
> Dynamic allocation does not provide, currently, any means for grouping
> variables in memory pages that would contain exclusively data suitable
> for conversion to read only access mode.
> 
> The allocator here provided (pmalloc - protectable memory allocator)
> introduces the concept of pools of protectable memory.
> 
> A module can request a pool and then refer any allocation request to the
> pool handler it has received.
> 
> Once all the chunks of memory associated to a specific pool are
> initialized, the pool can be protected.
> 
> After this point, the pool can only be destroyed (it is up to the module
> to avoid any further references to the memory from the pool, after
> the destruction is invoked).
> 
> The latter case is mainly meant for releasing memory, when a module is
> unloaded.
> 
> A module can have as many pools as needed, for example to support the
> protection of data that is initialized in sufficiently distinct phases.
> 
> Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com>
> ---
>  include/linux/genalloc.h |   3 +
>  include/linux/pmalloc.h  | 222 +++++++++++++++++++++
>  include/linux/vmalloc.h  |   1 +
>  lib/genalloc.c           |  27 +++
>  mm/Kconfig               |   6 +
>  mm/Makefile              |   1 +
>  mm/pmalloc.c             | 497 +++++++++++++++++++++++++++++++++++++++++++++++
>  mm/usercopy.c            |  33 ++++
>  8 files changed, 790 insertions(+)
>  create mode 100644 include/linux/pmalloc.h
>  create mode 100644 mm/pmalloc.c
> 
> diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
> index dcaa33e74b1c..b6c4cea9fbd8 100644
> --- a/include/linux/genalloc.h
> +++ b/include/linux/genalloc.h
> @@ -121,6 +121,9 @@ extern unsigned long gen_pool_alloc_algo(struct gen_pool *, size_t,
>  extern void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size,
>  		dma_addr_t *dma);
>  extern void gen_pool_free(struct gen_pool *, unsigned long, size_t);
> +
> +extern void gen_pool_flush_chunk(struct gen_pool *pool,
> +				 struct gen_pool_chunk *chunk);
>  extern void gen_pool_for_each_chunk(struct gen_pool *,
>  	void (*)(struct gen_pool *, struct gen_pool_chunk *, void *), void *);
>  extern size_t gen_pool_avail(struct gen_pool *);
> diff --git a/include/linux/pmalloc.h b/include/linux/pmalloc.h
> new file mode 100644
> index 000000000000..624379a937c5
> --- /dev/null
> +++ b/include/linux/pmalloc.h
> @@ -0,0 +1,222 @@
> +/* SPDX-License-Identifier: GPL-2.0
> + *
> + * pmalloc.h: Header for Protectable Memory Allocator
> + *
> + * (C) Copyright 2017 Huawei Technologies Co. Ltd.
> + * Author: Igor Stoppa <igor.stoppa@huawei.com>
> + */
> +
> +#ifndef _LINUX_PMALLOC_H
> +#define _LINUX_PMALLOC_H
> +
> +
> +#include <linux/genalloc.h>
> +#include <linux/string.h>
> +
> +#define PMALLOC_DEFAULT_ALLOC_ORDER (-1)
> +
> +/*
> + * Library for dynamic allocation of pools of memory that can be,
> + * after initialization, marked as read-only.
> + *
> + * This is intended to complement __read_only_after_init, for those cases
> + * where either it is not possible to know the initialization value before
> + * init is completed, or the amount of data is variable and can be
> + * determined only at run-time.
> + *
> + * ***WARNING***
> + * The user of the API is expected to synchronize:
> + * 1) allocation,
> + * 2) writes to the allocated memory,
> + * 3) write protection of the pool,
> + * 4) freeing of the allocated memory, and
> + * 5) destruction of the pool.
> + *
> + * For a non-threaded scenario, this type of locking is not even required.
> + *
> + * Even if the library were to provide support for locking, point 2)
> + * would still depend on the user taking the lock.
> + */
> +
> +
> +/**
> + * pmalloc_create_pool - create a new protectable memory pool
> + * @name: the name of the pool, enforced to be unique
> + * @min_alloc_order: log2 of the minimum allocation size obtainable
> + *                   from the pool
> + *
> + * Creates a new (empty) memory pool for allocation of protectable
> + * memory. Memory will be allocated upon request (through pmalloc).
> + *
> + * Return: pointer to the new pool upon success, otherwise a NULL.
> + */
> +struct gen_pool *pmalloc_create_pool(const char *name,
> +					 int min_alloc_order);
> +
> +/**
> + * is_pmalloc_object - validates the existence of an alleged object
> + * @ptr: address of the object
> + * @n: size of the object, in bytes
> + *
> + * Return: 0 if the object does not belong to pmalloc, 1 if it belongs to
> + * pmalloc, -1 if it partially overlaps pmalloc meory, but incorectly.

typo:                                            ^ memory

> + */
> +int is_pmalloc_object(const void *ptr, const unsigned long n);
> +
> +/**
> + * pmalloc_prealloc - tries to allocate a memory chunk of the requested size
> + * @pool: handle to the pool to be used for memory allocation
> + * @size: amount of memory (in bytes) requested
> + *
> + * Prepares a chunk of the requested size.
> + * This is intended to both minimize latency in later memory requests and
> + * avoid sleeping during allocation.
> + * Memory allocated with prealloc is stored in one single chunk, as
> + * opposed to what is allocated on-demand when pmalloc runs out of free
> + * space already existing in the pool and has to invoke vmalloc.
> + * One additional advantage of pre-allocating larger chunks of memory is
> + * that the total slack tends to be smaller.
> + *
> + * Return: true if the vmalloc call was successful, false otherwise.
> + */
> +bool pmalloc_prealloc(struct gen_pool *pool, size_t size);
> +
> +/**
> + * pmalloc - allocate protectable memory from a pool
> + * @pool: handle to the pool to be used for memory allocation
> + * @size: amount of memory (in bytes) requested
> + * @gfp: flags for page allocation
> + *
> + * Allocates memory from an unprotected pool. If the pool doesn't have
> + * enough memory, and the request did not include GFP_ATOMIC, an attempt
> + * is made to add a new chunk of memory to the pool
> + * (a multiple of PAGE_SIZE), in order to fit the new request.
> + * Otherwise, NULL is returned.
> + *
> + * Return: pointer to the memory requested upon success, NULL otherwise
> + * (either no memory available or pool already read-only).
> + */
> +void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp);
> +
> +
> +/**
> + * pzalloc - zero-initialized version of pmalloc
> + * @pool: handle to the pool to be used for memory allocation
> + * @size: amount of memory (in bytes) requested
> + * @gfp: flags for page allocation
> + *
> + * Executes pmalloc, initializing the memory requested to 0,
> + * before returning the pointer to it.
> + *
> + * Return: pointer to the zeroed memory requested, upon success, NULL
> + * otherwise (either no memory available or pool already read-only).
> + */
> +static inline void *pzalloc(struct gen_pool *pool, size_t size, gfp_t gfp)
> +{
> +	return pmalloc(pool, size, gfp | __GFP_ZERO);
> +}
> +
> +/**
> + * pmalloc_array - allocates an array according to the parameters
> + * @pool: handle to the pool to be used for memory allocation
> + * @n: number of elements in the array
> + * @size: amount of memory (in bytes) requested for each element
> + * @flags: flags for page allocation
> + *
> + * Executes pmalloc, if it has a chance to succeed.
> + *
> + * Return: either NULL or the pmalloc result.
> + */
> +static inline void *pmalloc_array(struct gen_pool *pool, size_t n,
> +				  size_t size, gfp_t flags)
> +{
> +	if (unlikely(!(pool && n && size)))
> +		return NULL;
> +	return pmalloc(pool, n * size, flags);
> +}
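As quoted, pmalloc_array() passes n * size to pmalloc() without checking the product for overflow. kmalloc_array() guards the same product by rejecting n > SIZE_MAX / size. Below is a minimal userspace sketch of such a guard. The helper name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Compute n * size into *bytes. Returns false for zero-sized requests
 * or if the product would not fit in size_t.
 */
static bool sketch_array_bytes(size_t n, size_t size, size_t *bytes)
{
	if (!n || !size)
		return false;
	if (n > SIZE_MAX / size)
		return false;
	*bytes = n * size;
	return true;
}
```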
> +
> +/**
> + * pcalloc - allocates a 0-initialized array according to the parameters
> + * @pool: handle to the pool to be used for memory allocation
> + * @n: number of elements in the array
> + * @size: amount of memory (in bytes) requested
> + * @flags: flags for page allocation
> + *
> + * Executes pmalloc_array, if it has a chance to succeed.
> + *
> + * Return: either NULL or the pmalloc result.
> + */
> +static inline void *pcalloc(struct gen_pool *pool, size_t n,
> +			    size_t size, gfp_t flags)
> +{
> +	return pmalloc_array(pool, n, size, flags | __GFP_ZERO);
> +}
> +
> +/**
> + * pstrdup - duplicate a string, using pmalloc as allocator
> + * @pool: handle to the pool to be used for memory allocation
> + * @s: string to duplicate
> + * @gfp: flags for page allocation
> + *
> + * Generates a copy of the given string, allocating sufficient memory
> + * from the given pmalloc pool.
> + *
> + * Return: pointer to the replica, NULL in case of error.
> + */
> +static inline char *pstrdup(struct gen_pool *pool, const char *s, gfp_t gfp)
> +{
> +	size_t len;
> +	char *buf;
> +
> +	if (unlikely(pool == NULL || s == NULL))
> +		return NULL;
> +
> +	len = strlen(s) + 1;
> +	buf = pmalloc(pool, len, gfp);
> +	if (likely(buf))
> +		strncpy(buf, s, len);
> +	return buf;
> +}
> +
> +/**
> + * pmalloc_protect_pool - turn a read/write pool read-only
> + * @pool: the pool to protect
> + *
> + * Write-protects all the memory chunks assigned to the pool.
> + * This prevents any further allocation.
> + *
> + * Return: 0 upon success, -EINVAL in abnormal cases.
> + */
> +int pmalloc_protect_pool(struct gen_pool *pool);
> +
> +/**
> + * pfree - mark as unused memory that was previously in use
> + * @pool: handle to the pool to be used for memory allocation
> + * @addr: the beginning of the memory area to be freed
> + *
> + * The behavior of pfree is different, depending on the state of the
> + * protection.
> + * If the pool is not yet protected, the memory is marked as unused and
> + * will be available for further allocations.
> + * If the pool is already protected, the memory is marked as unused, but
> + * it will still be impossible to perform further allocation, because of
> + * the existing protection.
> + * The freed memory, in this case, will be truly released only when the
> + * pool is destroyed.
> + */
> +static inline void pfree(struct gen_pool *pool, const void *addr)
> +{
> +	gen_pool_free(pool, (unsigned long)addr, 0);
> +}
> +
> +/**
> + * pmalloc_destroy_pool - destroys a pool and all the associated memory
> + * @pool: the pool to destroy
> + *
> + * All the memory that was allocated through pmalloc in the pool will be freed.
> + *
> + * Returns 0 upon success, -EINVAL in abnormal cases.
> + */
> +int pmalloc_destroy_pool(struct gen_pool *pool);
> +
> +#endif
> diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
> index 1e5d8c392f15..116d280cca53 100644
> --- a/include/linux/vmalloc.h
> +++ b/include/linux/vmalloc.h
> @@ -20,6 +20,7 @@ struct notifier_block;		/* in notifier.h */
>  #define VM_UNINITIALIZED	0x00000020	/* vm_struct is not fully initialized */
>  #define VM_NO_GUARD		0x00000040      /* don't add guard page */
>  #define VM_KASAN		0x00000080      /* has allocated kasan shadow memory */
> +#define VM_PMALLOC		0x00000100	/* pmalloc area - see docs */
>  /* bits [20..32] reserved for arch specific ioremap internals */
> 
>  /*
> diff --git a/lib/genalloc.c b/lib/genalloc.c
> index 044347163acb..e40a5db89439 100644
> --- a/lib/genalloc.c
> +++ b/lib/genalloc.c
> @@ -598,6 +598,33 @@ void gen_pool_free(struct gen_pool *pool, unsigned long addr, size_t size)
>  }
>  EXPORT_SYMBOL(gen_pool_free);
> 
> +
> +/**
> + * gen_pool_flush_chunk - drops all the allocations from a specific chunk
> + * @pool:	the generic memory pool
> + * @chunk:	The chunk to wipe clear.
> + *
> + * This is meant to be called only while destroying a pool. It's up to the
> + * caller to avoid races, but really, at this point the pool should have
> + * already been retired and have become unavailable for any other sort of
> + * operation.
> + */
> +void gen_pool_flush_chunk(struct gen_pool *pool,
> +			  struct gen_pool_chunk *chunk)
> +{
> +	size_t size;
> +
> +	if (unlikely(!(pool && chunk)))
> +		return;
> +
> +	size = chunk->end_addr + 1 - chunk->start_addr;
> +	memset(chunk->entries, 0,
> +	       DIV_ROUND_UP(size >> pool->min_alloc_order * BITS_PER_ENTRY,
> +			    BITS_PER_BYTE));
> +	atomic_long_set(&chunk->avail, size);
> +}
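In C, `*` binds more tightly than `>>`. The memset() length in the quoted gen_pool_flush_chunk() is therefore computed from size >> (min_alloc_order * BITS_PER_ENTRY), not from (size >> min_alloc_order) * BITS_PER_ENTRY. Below is a small userspace comparison of the two readings. It assumes BITS_PER_ENTRY is 2 (two bitmap bits per allocation unit, as the genalloc tracking patch in this series appears to use). The macro names are hypothetical copies, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_BITS_PER_ENTRY 2UL	/* assumption, see above */
#define SKETCH_BITS_PER_BYTE 8UL
#define SKETCH_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* As written in the hunk: parsed as size >> (order * BITS_PER_ENTRY). */
static size_t sketch_bitmap_bytes_as_written(size_t size, unsigned int order)
{
	return SKETCH_DIV_ROUND_UP(size >> order * SKETCH_BITS_PER_ENTRY,
				   SKETCH_BITS_PER_BYTE);
}

/* Entries first, then bits, then bytes. */
static size_t sketch_bitmap_bytes_parenthesized(size_t size, unsigned int order)
{
	return SKETCH_DIV_ROUND_UP((size >> order) * SKETCH_BITS_PER_ENTRY,
				   SKETCH_BITS_PER_BYTE);
}
```

For a 4096-byte chunk with min_alloc_order 3, the first form yields 8 bytes and the second yields 128. Under the stated assumption, most of the bitmap would be left uncleared.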
> +
> +
>  /**
>   * gen_pool_for_each_chunk - call func for every chunk of generic memory pool
>   * @pool:	the generic memory pool
> diff --git a/mm/Kconfig b/mm/Kconfig
> index c782e8fb7235..be578fbdce6d 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -760,3 +760,9 @@ config GUP_BENCHMARK
>  	  performance of get_user_pages_fast().
> 
>  	  See tools/testing/selftests/vm/gup_benchmark.c
> +
> +config PROTECTABLE_MEMORY
> +    bool
> +    depends on ARCH_HAS_SET_MEMORY
> +    select GENERIC_ALLOCATOR
> +    default y
> diff --git a/mm/Makefile b/mm/Makefile
> index e669f02c5a54..959fdbdac118 100644
> --- a/mm/Makefile
> +++ b/mm/Makefile
> @@ -65,6 +65,7 @@ obj-$(CONFIG_SPARSEMEM)	+= sparse.o
>  obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o
>  obj-$(CONFIG_SLOB) += slob.o
>  obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o
> +obj-$(CONFIG_PROTECTABLE_MEMORY) += pmalloc.o
>  obj-$(CONFIG_KSM) += ksm.o
>  obj-$(CONFIG_PAGE_POISONING) += page_poison.o
>  obj-$(CONFIG_SLAB) += slab.o
> diff --git a/mm/pmalloc.c b/mm/pmalloc.c
> new file mode 100644
> index 000000000000..e94bfb407c92
> --- /dev/null
> +++ b/mm/pmalloc.c
> @@ -0,0 +1,497 @@
> +/* SPDX-License-Identifier: GPL-2.0
> + *
> + * pmalloc.c: Protectable Memory Allocator
> + *
> + * (C) Copyright 2017 Huawei Technologies Co. Ltd.
> + * Author: Igor Stoppa <igor.stoppa@huawei.com>
> + */
> +
> +#include <linux/printk.h>
> +#include <linux/init.h>
> +#include <linux/mm.h>
> +#include <linux/vmalloc.h>
> +#include <linux/genalloc.h>
> +#include <linux/kernel.h>
> +#include <linux/log2.h>
> +#include <linux/slab.h>
> +#include <linux/device.h>
> +#include <linux/atomic.h>
> +#include <linux/rculist.h>
> +#include <linux/set_memory.h>
> +#include <asm/cacheflush.h>
> +#include <asm/page.h>
> +
> +#include <linux/pmalloc.h>
> +/*
> + * pmalloc_data contains the data specific to a pmalloc pool,
> + * in a format compatible with the design of gen_alloc.
> + * Some of the fields are used for exposing the corresponding parameter
> + * to userspace, through sysfs.
> + */
> +struct pmalloc_data {
> +	struct gen_pool *pool;  /* Link back to the associated pool. */
> +	bool protected;     /* Status of the pool: RO or RW. */
> +	struct kobj_attribute attr_protected; /* Sysfs attribute. */
> +	struct kobj_attribute attr_avail;     /* Sysfs attribute. */
> +	struct kobj_attribute attr_size;      /* Sysfs attribute. */
> +	struct kobj_attribute attr_chunks;    /* Sysfs attribute. */
> +	struct kobject *pool_kobject;
> +	struct list_head node; /* list of pools */
> +};
> +
> +static LIST_HEAD(pmalloc_final_list);
> +static LIST_HEAD(pmalloc_tmp_list);
> +static struct list_head *pmalloc_list = &pmalloc_tmp_list;
> +static DEFINE_MUTEX(pmalloc_mutex);
> +static struct kobject *pmalloc_kobject;
> +
> +static ssize_t pmalloc_pool_show_protected(struct kobject *dev,
> +					   struct kobj_attribute *attr,
> +					   char *buf)
> +{
> +	struct pmalloc_data *data;
> +
> +	data = container_of(attr, struct pmalloc_data, attr_protected);
> +	if (data->protected)
> +		return sprintf(buf, "protected\n");
> +	else
> +		return sprintf(buf, "unprotected\n");
> +}
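Each pmalloc_pool_show_*() callback receives only a pointer to its embedded kobj_attribute. It recovers the enclosing struct pmalloc_data with container_of(). Below is a userspace sketch of that step, built on offsetof(). The macro is a simplified copy without the kernel's type checking, and the structures are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified container_of(): no type checking, unlike the kernel macro. */
#define sketch_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct sketch_attr {
	const char *name;
};

/* Hypothetical reduced pmalloc_data: attributes embedded by value. */
struct sketch_data {
	int protected_flag;
	struct sketch_attr attr_protected;
	struct sketch_attr attr_avail;
};

/* A show-style callback only receives the attribute pointer... */
static int sketch_show_protected(struct sketch_attr *attr)
{
	/* ...and walks back to the structure that embeds it. */
	struct sketch_data *data =
		sketch_container_of(attr, struct sketch_data, attr_protected);

	return data->protected_flag;
}
```

This is why struct pmalloc_data embeds a separate kobj_attribute per file. Each member name passed to container_of() selects the right offset.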
> +
> +static ssize_t pmalloc_pool_show_avail(struct kobject *dev,
> +				       struct kobj_attribute *attr,
> +				       char *buf)
> +{
> +	struct pmalloc_data *data;
> +
> +	data = container_of(attr, struct pmalloc_data, attr_avail);
> +	return sprintf(buf, "%lu\n",
> +		       (unsigned long)gen_pool_avail(data->pool));
> +}
> +
> +static ssize_t pmalloc_pool_show_size(struct kobject *dev,
> +				      struct kobj_attribute *attr,
> +				      char *buf)
> +{
> +	struct pmalloc_data *data;
> +
> +	data = container_of(attr, struct pmalloc_data, attr_size);
> +	return sprintf(buf, "%lu\n",
> +		       (unsigned long)gen_pool_size(data->pool));
> +}
> +
> +static void pool_chunk_number(struct gen_pool *pool,
> +			      struct gen_pool_chunk *chunk, void *data)
> +{
> +	unsigned long *counter = data;
> +
> +	(*counter)++;
> +}
> +
> +static ssize_t pmalloc_pool_show_chunks(struct kobject *dev,
> +					struct kobj_attribute *attr,
> +					char *buf)
> +{
> +	struct pmalloc_data *data;
> +	unsigned long chunks_num = 0;
> +
> +	data = container_of(attr, struct pmalloc_data, attr_chunks);
> +	gen_pool_for_each_chunk(data->pool, pool_chunk_number, &chunks_num);
> +	return sprintf(buf, "%lu\n", chunks_num);
> +}
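pmalloc_pool_show_chunks() counts chunks by passing the address of a local counter as the opaque `data` cookie of gen_pool_for_each_chunk(). Below is a userspace sketch of that callback pattern over a hypothetical singly linked chunk list. The real gen_pool walks an RCU-protected list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct gen_pool_chunk and struct gen_pool. */
struct sketch_chunk {
	struct sketch_chunk *next;
};

struct sketch_pool {
	struct sketch_chunk *chunks;
};

/* Same callback shape as gen_pool_for_each_chunk(): pool, chunk, cookie. */
static void sketch_for_each_chunk(struct sketch_pool *pool,
				  void (*func)(struct sketch_pool *,
					       struct sketch_chunk *, void *),
				  void *data)
{
	for (struct sketch_chunk *c = pool->chunks; c; c = c->next)
		func(pool, c, data);
}

/* Like pool_chunk_number(): the cookie points at the counter. */
static void sketch_count_chunk(struct sketch_pool *pool,
			       struct sketch_chunk *chunk, void *data)
{
	unsigned long *counter = data;

	(void)pool;
	(void)chunk;
	(*counter)++;
}
```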
> +
> +/* Exposes the pool and its attributes through sysfs. */
> +static struct kobject *pmalloc_connect(struct pmalloc_data *data)
> +{
> +	const struct attribute *attrs[] = {
> +		&data->attr_protected.attr,
> +		&data->attr_avail.attr,
> +		&data->attr_size.attr,
> +		&data->attr_chunks.attr,
> +		NULL
> +	};
> +	struct kobject *kobj;
> +
> +	kobj = kobject_create_and_add(data->pool->name, pmalloc_kobject);
> +	if (unlikely(!kobj))
> +		return NULL;
> +
> +	if (unlikely(sysfs_create_files(kobj, attrs) < 0)) {
> +		kobject_put(kobj);
> +		kobj = NULL;
> +	}
> +	return kobj;
> +}
> +
> +/* Removes the pool and its attributes from sysfs. */
> +static void pmalloc_disconnect(struct pmalloc_data *data,
> +			       struct kobject *kobj)
> +{
> +	const struct attribute *attrs[] = {
> +		&data->attr_protected.attr,
> +		&data->attr_avail.attr,
> +		&data->attr_size.attr,
> +		&data->attr_chunks.attr,
> +		NULL
> +	};
> +
> +	sysfs_remove_files(kobj, attrs);
> +	kobject_put(kobj);
> +}
> +
> +/* Declares an attribute of the pool. */
> +#define pmalloc_attr_init(data, attr_name) \
> +do { \
> +	sysfs_attr_init(&data->attr_##attr_name.attr); \
> +	data->attr_##attr_name.attr.name = #attr_name; \
> +	data->attr_##attr_name.attr.mode = VERIFY_OCTAL_PERMISSIONS(0400); \
> +	data->attr_##attr_name.show = pmalloc_pool_show_##attr_name; \
> +} while (0)
> +
> +struct gen_pool *pmalloc_create_pool(const char *name, int min_alloc_order)
> +{
> +	struct gen_pool *pool;
> +	const char *pool_name;
> +	struct pmalloc_data *data;
> +
> +	if (!name) {
> +		WARN_ON(1);
> +		return NULL;
> +	}
> +
> +	if (min_alloc_order < 0)
> +		min_alloc_order = ilog2(sizeof(unsigned long));
> +
> +	pool = gen_pool_create(min_alloc_order, NUMA_NO_NODE);
> +	if (unlikely(!pool))
> +		return NULL;
> +
> +	mutex_lock(&pmalloc_mutex);
> +	list_for_each_entry(data, pmalloc_list, node)
> +		if (!strcmp(name, data->pool->name))
> +			goto same_name_err;
> +
> +	pool_name = kstrdup(name, GFP_KERNEL);
> +	if (unlikely(!pool_name))
> +		goto name_alloc_err;
> +
> +	data = kzalloc(sizeof(struct pmalloc_data), GFP_KERNEL);
> +	if (unlikely(!data))
> +		goto data_alloc_err;
> +
> +	data->protected = false;
> +	data->pool = pool;
> +	pmalloc_attr_init(data, protected);
> +	pmalloc_attr_init(data, avail);
> +	pmalloc_attr_init(data, size);
> +	pmalloc_attr_init(data, chunks);
> +	pool->data = data;
> +	pool->name = pool_name;
> +
> +	list_add(&data->node, pmalloc_list);
> +	if (pmalloc_list == &pmalloc_final_list)
> +		data->pool_kobject = pmalloc_connect(data);
> +	mutex_unlock(&pmalloc_mutex);
> +	return pool;
> +
> +data_alloc_err:
> +	kfree(pool_name);
> +name_alloc_err:
> +same_name_err:
> +	mutex_unlock(&pmalloc_mutex);
> +	gen_pool_destroy(pool);
> +	return NULL;
> +}
> +
> +static inline int check_alloc_params(struct gen_pool *pool, size_t req_size)
> +{
> +	struct pmalloc_data *data;
> +
> +	if (unlikely(!req_size || !pool))
> +		return -1;
> +
> +	data = pool->data;
> +
> +	if (data == NULL)
> +		return -1;
> +
> +	if (unlikely(data->protected)) {
> +		WARN_ON(1);
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +
> +static inline bool chunk_tagging(void *chunk, bool tag)
> +{
> +	struct vm_struct *area;
> +	struct page *page;
> +
> +	if (!is_vmalloc_addr(chunk))
> +		return false;
> +
> +	page = vmalloc_to_page(chunk);
> +	if (unlikely(!page))
> +		return false;
> +
> +	area = page->area;
> +	if (tag)
> +		area->flags |= VM_PMALLOC;
> +	else
> +		area->flags &= ~VM_PMALLOC;
> +	return true;
> +}
> +
> +
> +static inline bool tag_chunk(void *chunk)
> +{
> +	return chunk_tagging(chunk, true);
> +}
> +
> +
> +static inline bool untag_chunk(void *chunk)
> +{
> +	return chunk_tagging(chunk, false);
> +}
> +
> +enum {
> +	INVALID_PMALLOC_OBJECT = -1,
> +	NOT_PMALLOC_OBJECT = 0,
> +	VALID_PMALLOC_OBJECT = 1,
> +};
> +
> +int is_pmalloc_object(const void *ptr, const unsigned long n)
> +{
> +	struct vm_struct *area;
> +	struct page *page;
> +	unsigned long area_start;
> +	unsigned long area_end;
> +	unsigned long object_start;
> +	unsigned long object_end;
> +
> +
> +	/* is_pmalloc_object gets called pretty late, so chances are high
> +	 * that the object is indeed of vmalloc type
> +	 */
> +	if (unlikely(!is_vmalloc_addr(ptr)))
> +		return NOT_PMALLOC_OBJECT;
> +
> +	page = vmalloc_to_page(ptr);
> +	if (unlikely(!page))
> +		return NOT_PMALLOC_OBJECT;
> +
> +	area = page->area;
> +
> +	if (likely(!(area->flags & VM_PMALLOC)))
> +		return NOT_PMALLOC_OBJECT;
> +
> +	area_start = (unsigned long)area->addr;
> +	area_end = area_start + area->nr_pages * PAGE_SIZE - 1;
> +	object_start = (unsigned long)ptr;
> +	object_end = object_start + n - 1;
> +
> +	if (likely((area_start <= object_start) &&
> +		   (object_end <= area_end)))
> +		return VALID_PMALLOC_OBJECT;
> +	else
> +		return INVALID_PMALLOC_OBJECT;
> +}
> +
> +
> +bool pmalloc_prealloc(struct gen_pool *pool, size_t size)
> +{
> +	void *chunk;
> +	size_t chunk_size;
> +	bool add_error;
> +
> +	if (check_alloc_params(pool, size))
> +		return false;
> +
> +	/* Expand pool */
> +	chunk_size = roundup(size, PAGE_SIZE);
> +	chunk = vmalloc(chunk_size);
> +	if (unlikely(chunk == NULL))
> +		return false;
> +
> +	/* Locking is already done inside gen_pool_add */
> +	add_error = gen_pool_add(pool, (unsigned long)chunk, chunk_size,
> +				 NUMA_NO_NODE);
> +	if (unlikely(add_error != 0))
> +		goto abort;
> +
> +	return true;
> +abort:
> +	vfree_atomic(chunk);
> +	return false;
> +
> +}
> +
> +void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp)
> +{
> +	void *chunk;
> +	size_t chunk_size;
> +	bool add_error;
> +	unsigned long retval;
> +
> +	if (check_alloc_params(pool, size))
> +		return NULL;
> +
> +retry_alloc_from_pool:
> +	retval = gen_pool_alloc(pool, size);
> +	if (retval)
> +		goto return_allocation;
> +
> +	if (unlikely((gfp & __GFP_ATOMIC))) {
> +		if (unlikely((gfp & __GFP_NOFAIL)))
> +			goto retry_alloc_from_pool;
> +		else
> +			return NULL;
> +	}
> +
> +	/* Expand pool */
> +	chunk_size = roundup(size, PAGE_SIZE);
> +	chunk = vmalloc(chunk_size);
> +	if (unlikely(!chunk)) {
> +		if (unlikely((gfp & __GFP_NOFAIL)))
> +			goto retry_alloc_from_pool;
> +		else
> +			return NULL;
> +	}
> +	if (unlikely(!tag_chunk(chunk)))
> +		goto free;
> +
> +	/* Locking is already done inside gen_pool_add */
> +	add_error = gen_pool_add(pool, (unsigned long)chunk, chunk_size,
> +				 NUMA_NO_NODE);
> +	if (unlikely(add_error))
> +		goto abort;
> +
> +	retval = gen_pool_alloc(pool, size);
> +	if (retval) {
> +return_allocation:
> +		*(size_t *)retval = size;
> +		if (gfp & __GFP_ZERO)
> +			memset((void *)retval, 0, size);
> +		return (void *)retval;
> +	}
> +	/* Here there is no test for __GFP_NOFAIL because, in case of
> +	 * concurrent allocation, one thread might add a chunk to the
> +	 * pool and this memory could be allocated by another thread,
> +	 * before the first thread gets a chance to use it.
> +	 * As long as vmalloc succeeds, it's ok to retry.
> +	 */
> +	goto retry_alloc_from_pool;
> +abort:
> +	untag_chunk(chunk);
> +free:
> +	vfree_atomic(chunk);
> +	return NULL;
> +}
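The allocation path above does the following. It tries the pool first and gives up immediately for atomic requests. Otherwise it adds a page-rounded chunk and tries again. The single-threaded sketch below models that flow. malloc() stands in for vmalloc() and a trivial bump allocator stands in for genalloc. All names are hypothetical, alignment is ignored, and __GFP_NOFAIL is not modelled:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096UL
#define SKETCH_MAX_CHUNKS 16

/* Hypothetical stand-in for a gen_pool: a few bump-allocated chunks. */
struct sketch_pool {
	char *base[SKETCH_MAX_CHUNKS];
	size_t size[SKETCH_MAX_CHUNKS];
	size_t used[SKETCH_MAX_CHUNKS];
	int nr_chunks;
};

static size_t round_up_page(size_t size)
{
	return (size + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE * SKETCH_PAGE_SIZE;
}

/* First-fit bump allocation from existing chunks (gen_pool_alloc role). */
static void *pool_alloc(struct sketch_pool *pool, size_t size)
{
	for (int i = 0; i < pool->nr_chunks; i++) {
		if (pool->size[i] - pool->used[i] >= size) {
			void *p = pool->base[i] + pool->used[i];

			pool->used[i] += size;
			return p;
		}
	}
	return NULL;
}

/* Add a page-multiple chunk (vmalloc + gen_pool_add role). */
static bool pool_grow(struct sketch_pool *pool, size_t size)
{
	size_t chunk_size = round_up_page(size);
	char *chunk;

	if (pool->nr_chunks == SKETCH_MAX_CHUNKS)
		return false;
	chunk = malloc(chunk_size);
	if (!chunk)
		return false;
	pool->base[pool->nr_chunks] = chunk;
	pool->size[pool->nr_chunks] = chunk_size;
	pool->used[pool->nr_chunks] = 0;
	pool->nr_chunks++;
	return true;
}

/* Try the pool first; grow it only when the caller is allowed to sleep. */
static void *sketch_pmalloc(struct sketch_pool *pool, size_t size, bool atomic)
{
	void *p;

	assert(size > 0);
	p = pool_alloc(pool, size);
	if (p || atomic)
		return p;
	if (!pool_grow(pool, size))
		return NULL;
	return pool_alloc(pool, size);
}

/* Release every chunk (pmalloc_destroy_pool role). */
static void sketch_destroy(struct sketch_pool *pool)
{
	for (int i = 0; i < pool->nr_chunks; i++)
		free(pool->base[i]);
	pool->nr_chunks = 0;
}
```

Because the sketch is single-threaded, no other caller can take the memory added by pool_grow(), so a single second attempt suffices. The kernel code loops back to the pool precisely because concurrent callers can.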
> +
> +static void pmalloc_chunk_set_protection(struct gen_pool *pool,
> +					 struct gen_pool_chunk *chunk,
> +					 void *data)
> +{
> +	const bool *flag = data;
> +	size_t chunk_size = chunk->end_addr + 1 - chunk->start_addr;
> +	unsigned long pages = chunk_size / PAGE_SIZE;
> +
> +	BUG_ON(chunk_size & (PAGE_SIZE - 1));
> +
> +	if (*flag)
> +		set_memory_ro(chunk->start_addr, pages);
> +	else
> +		set_memory_rw(chunk->start_addr, pages);
> +}
> +
> +static int pmalloc_pool_set_protection(struct gen_pool *pool, bool protection)
> +{
> +	struct pmalloc_data *data;
> +	struct gen_pool_chunk *chunk;
> +
> +	if (unlikely(!pool))
> +		return -EINVAL;
> +
> +	data = pool->data;
> +
> +	if (unlikely(!data))
> +		return -EINVAL;
> +
> +	if (unlikely(data->protected == protection)) {
> +		WARN_ON(1);
> +		return 0;
> +	}
> +
> +	data->protected = protection;
> +	list_for_each_entry(chunk, &(pool)->chunks, next_chunk)
> +		pmalloc_chunk_set_protection(pool, chunk, &protection);
> +	return 0;
> +}
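pmalloc_chunk_set_protection() converts the chunk length into a page count and switches the whole range between read-only and read-write. The sketch below is a rough userspace analogue. It assumes mprotect() as a stand-in for set_memory_ro()/set_memory_rw() and an anonymous mmap() as a stand-in for vmalloc(). chunk_create() and chunk_set_protection() are hypothetical helpers:

```c
#define _DEFAULT_SOURCE 1	/* for MAP_ANONYMOUS under strict -std modes */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map a zeroed, page-aligned, writable chunk (the vmalloc() stand-in). */
static void *chunk_create(size_t chunk_size)
{
	void *p = mmap(NULL, chunk_size, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	return p == MAP_FAILED ? NULL : p;
}

/*
 * Switch a whole chunk between read-only and read-write. A length that is
 * not a multiple of the page size is rejected; the kernel code BUG()s there.
 */
static int chunk_set_protection(void *start, size_t chunk_size, bool protect)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);

	assert(start != NULL);
	if (chunk_size & (page - 1))
		return -1;
	return mprotect(start, chunk_size,
			protect ? PROT_READ : PROT_READ | PROT_WRITE);
}
```

Reads keep working while the chunk is read-only. A write in that state would fault, which is the property the kernel relies on.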
> +
> +int pmalloc_protect_pool(struct gen_pool *pool)
> +{
> +	return pmalloc_pool_set_protection(pool, true);
> +}
> +
> +
> +static void pmalloc_chunk_free(struct gen_pool *pool,
> +			       struct gen_pool_chunk *chunk, void *data)
> +{
> +	untag_chunk((void *)chunk->start_addr);
> +	gen_pool_flush_chunk(pool, chunk);
> +	vfree_atomic((void *)chunk->start_addr);
> +}
> +
> +
> +int pmalloc_destroy_pool(struct gen_pool *pool)
> +{
> +	struct pmalloc_data *data;
> +
> +	if (unlikely(pool == NULL))
> +		return -EINVAL;
> +
> +	data = pool->data;
> +
> +	if (unlikely(data == NULL))
> +		return -EINVAL;
> +
> +	mutex_lock(&pmalloc_mutex);
> +	list_del(&data->node);
> +	mutex_unlock(&pmalloc_mutex);
> +
> +	if (likely(data->pool_kobject))
> +		pmalloc_disconnect(data, data->pool_kobject);
> +
> +	pmalloc_pool_set_protection(pool, false);
> +	gen_pool_for_each_chunk(pool, pmalloc_chunk_free, NULL);
> +	gen_pool_destroy(pool);
> +	kfree(data);
> +	return 0;
> +}
> +
> +/**
> + * When the sysfs is ready to receive registrations, connect all the
> + * pools previously created. Also enable further pools to be connected
> + * right away.
> + */

This does not look like a kernel-doc comment. Please either remove the second *
from the opening comment mark or reformat the comment.

> +static int __init pmalloc_late_init(void)
> +{
> +	struct pmalloc_data *data, *n;
> +
> +	pmalloc_kobject = kobject_create_and_add("pmalloc", kernel_kobj);
> +
> +	mutex_lock(&pmalloc_mutex);
> +	pmalloc_list = &pmalloc_final_list;
> +
> +	if (likely(pmalloc_kobject != NULL)) {
> +		list_for_each_entry_safe(data, n, &pmalloc_tmp_list, node) {
> +			list_move(&data->node, &pmalloc_final_list);
> +			pmalloc_connect(data);
> +		}
> +	}
> +	mutex_unlock(&pmalloc_mutex);
> +	return 0;
> +}
> +late_initcall(pmalloc_late_init);
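The late-init logic parks pools created before sysfs is available on a temporary list. Once pmalloc_late_init() runs, it connects them, and any later pools are connected immediately. The sketch below shows that hand-over in a single thread. It uses hypothetical names and plain singly linked lists instead of list_head. It omits locking (the kernel code holds pmalloc_mutex) and does not model kobject creation failure:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pool record; "connected" models pmalloc_connect(). */
struct sketch_reg {
	const char *name;
	bool connected;
	struct sketch_reg *next;
};

static struct sketch_reg *tmp_list;	/* pools created before late init */
static struct sketch_reg *final_list;	/* pools visible in "sysfs" */
static bool sysfs_ready;

static void sketch_register(struct sketch_reg *pool)
{
	assert(pool != NULL);
	if (sysfs_ready) {
		pool->connected = true;		/* connect right away */
		pool->next = final_list;
		final_list = pool;
	} else {
		pool->connected = false;	/* park until late init */
		pool->next = tmp_list;
		tmp_list = pool;
	}
}

/* Late-init step: move every parked pool to the final list and connect it. */
static void sketch_late_init(void)
{
	while (tmp_list) {
		struct sketch_reg *pool = tmp_list;

		tmp_list = pool->next;
		pool->connected = true;
		pool->next = final_list;
		final_list = pool;
	}
	sysfs_ready = true;
}
```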
> diff --git a/mm/usercopy.c b/mm/usercopy.c
> index e9e9325f7638..946ce051e296 100644
> --- a/mm/usercopy.c
> +++ b/mm/usercopy.c
> @@ -240,6 +240,36 @@ static inline void check_heap_object(const void *ptr, unsigned long n,
>  	}
>  }
> 
> +#ifdef CONFIG_PROTECTABLE_MEMORY
> +
> +int is_pmalloc_object(const void *ptr, const unsigned long n);
> +
> +static void check_pmalloc_object(const void *ptr, unsigned long n,
> +				 bool to_user)
> +{
> +	int retv;
> +
> +	retv = is_pmalloc_object(ptr, n);
> +	if (unlikely(retv)) {
> +		if (unlikely(!to_user))
> +			usercopy_abort("pmalloc",
> +				       "trying to write to pmalloc object",
> +				       to_user, (const unsigned long)ptr, n);
> +		if (retv < 0)
> +			usercopy_abort("pmalloc",
> +				       "invalid pmalloc object",
> +				       to_user, (const unsigned long)ptr, n);
> +	}
> +}
> +
> +#else
> +
> +static void check_pmalloc_object(const void *ptr, unsigned long n,
> +				 bool to_user)
> +{
> +}
> +#endif
> +
>  /*
>   * Validates that the given object is:
>   * - not bogus address
> @@ -277,5 +307,8 @@ void __check_object_size(const void *ptr, unsigned long n, bool to_user)
> 
>  	/* Check for object in kernel to avoid text exposure. */
>  	check_kernel_text_object((const unsigned long)ptr, n, to_user);
> +
> +	/* Check if object is from a pmalloc chunk. */
> +	check_pmalloc_object(ptr, n, to_user);
>  }
>  EXPORT_SYMBOL(__check_object_size);
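Since usercopy_abort() does not return, the order of the two tests in check_pmalloc_object() decides which report is produced. The small decision function below captures the same logic. Its names are hypothetical, and `retv` is the value returned by is_pmalloc_object():

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical verdicts for the sketch. */
enum copy_verdict { COPY_ALLOW, COPY_ABORT_WRITE, COPY_ABORT_INVALID };

/*
 * retv: -1 invalid, 0 not pmalloc, 1 valid. Copies into a pmalloc object
 * (to_user == false) are refused first; copies out are refused only when
 * the object is not fully contained in its area.
 */
static enum copy_verdict pmalloc_copy_verdict(int retv, bool to_user)
{
	assert(retv >= -1 && retv <= 1);
	if (retv == 0)
		return COPY_ALLOW;
	if (!to_user)
		return COPY_ABORT_WRITE;
	if (retv < 0)
		return COPY_ABORT_INVALID;
	return COPY_ALLOW;
}
```

An invalid object copied from userspace is therefore reported as a write attempt, not as an invalid object.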
> -- 
> 2.14.1

-- 
Sincerely yours,
Mike.

* Re: [PATCH 4/6] Protectable Memory
@ 2018-02-11 12:37     ` Mike Rapoport
From: Mike Rapoport @ 2018-02-11 12:37 UTC (permalink / raw)
  To: Igor Stoppa
  Cc: willy, rdunlap, corbet, keescook, mhocko, labbott, jglisse, hch,
	cl, linux-security-module, linux-mm, linux-kernel,
	kernel-hardening

On Sun, Feb 11, 2018 at 05:19:18AM +0200, Igor Stoppa wrote:
> The MMU available in many systems running Linux can often provide R/O
> protection to the memory pages it handles.
> 
> However, MMU-based protection works efficiently only when those pages
> contain exclusively data that will not need further modifications.
> 
> Statically allocated variables can be segregated into a dedicated
> section, but this approach does not fit dynamically allocated
> ones.
> 
> Dynamic allocation currently provides no means of grouping
> variables in memory pages that contain exclusively data suitable
> for conversion to read-only access mode.
> 
> The allocator provided here (pmalloc - protectable memory allocator)
> introduces the concept of pools of protectable memory.
> 
> A module can request a pool and then direct any allocation request to
> the pool handle it has received.
> 
> Once all the chunks of memory associated with a specific pool are
> initialized, the pool can be protected.
> 
> After this point, the pool can only be destroyed (it is up to the module
> to avoid any further references to the memory from the pool, after
> the destruction is invoked).
> 
> Destruction is mainly meant for releasing memory when a module is
> unloaded.
> 
> A module can have as many pools as needed, for example to support the
> protection of data that is initialized in sufficiently distinct phases.
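The pool life cycle described above can be summarized as a small state machine. Allocation is allowed only while the pool is writable, protection is one-way, and destruction is always possible. The sketch below encodes just those rules. Its names are hypothetical and it does no real memory management:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical pool states for the sketch. */
enum pool_state { POOL_WRITABLE, POOL_PROTECTED, POOL_DESTROYED };

struct lifecycle_pool {
	enum pool_state state;
	unsigned int live_allocations;
};

/* Allocation is only allowed while the pool is still writable. */
static bool lc_alloc(struct lifecycle_pool *pool)
{
	if (pool->state != POOL_WRITABLE)
		return false;
	pool->live_allocations++;
	return true;
}

/* Protection is one-way: there is no transition back to POOL_WRITABLE. */
static bool lc_protect(struct lifecycle_pool *pool)
{
	if (pool->state != POOL_WRITABLE)
		return false;
	pool->state = POOL_PROTECTED;
	return true;
}

/* Destruction releases everything, whatever the protection state. */
static void lc_destroy(struct lifecycle_pool *pool)
{
	assert(pool->state != POOL_DESTROYED);
	pool->live_allocations = 0;
	pool->state = POOL_DESTROYED;
}
```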
> 
> Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com>
> ---
>  include/linux/genalloc.h |   3 +
>  include/linux/pmalloc.h  | 222 +++++++++++++++++++++
>  include/linux/vmalloc.h  |   1 +
>  lib/genalloc.c           |  27 +++
>  mm/Kconfig               |   6 +
>  mm/Makefile              |   1 +
>  mm/pmalloc.c             | 497 +++++++++++++++++++++++++++++++++++++++++++++++
>  mm/usercopy.c            |  33 ++++
>  8 files changed, 790 insertions(+)
>  create mode 100644 include/linux/pmalloc.h
>  create mode 100644 mm/pmalloc.c
> 
> diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
> index dcaa33e74b1c..b6c4cea9fbd8 100644
> --- a/include/linux/genalloc.h
> +++ b/include/linux/genalloc.h
> @@ -121,6 +121,9 @@ extern unsigned long gen_pool_alloc_algo(struct gen_pool *, size_t,
>  extern void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size,
>  		dma_addr_t *dma);
>  extern void gen_pool_free(struct gen_pool *, unsigned long, size_t);
> +
> +extern void gen_pool_flush_chunk(struct gen_pool *pool,
> +				 struct gen_pool_chunk *chunk);
>  extern void gen_pool_for_each_chunk(struct gen_pool *,
>  	void (*)(struct gen_pool *, struct gen_pool_chunk *, void *), void *);
>  extern size_t gen_pool_avail(struct gen_pool *);
> diff --git a/include/linux/pmalloc.h b/include/linux/pmalloc.h
> new file mode 100644
> index 000000000000..624379a937c5
> --- /dev/null
> +++ b/include/linux/pmalloc.h
> @@ -0,0 +1,222 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * pmalloc.h: Header for Protectable Memory Allocator
> + *
> + * (C) Copyright 2017 Huawei Technologies Co. Ltd.
> + * Author: Igor Stoppa <igor.stoppa@huawei.com>
> + */
> +
> +#ifndef _LINUX_PMALLOC_H
> +#define _LINUX_PMALLOC_H
> +
> +
> +#include <linux/genalloc.h>
> +#include <linux/string.h>
> +
> +#define PMALLOC_DEFAULT_ALLOC_ORDER (-1)
> +
> +/*
> + * Library for dynamic allocation of pools of memory that can be,
> + * after initialization, marked as read-only.
> + *
> + * This is intended to complement __ro_after_init, for those cases
> + * where either it is not possible to know the initialization value before
> + * init is completed, or the amount of data is variable and can be
> + * determined only at run-time.
> + *
> + * ***WARNING***
> + * The user of the API is expected to synchronize:
> + * 1) allocation,
> + * 2) writes to the allocated memory,
> + * 3) write protection of the pool,
> + * 4) freeing of the allocated memory, and
> + * 5) destruction of the pool.
> + *
> + * For a non-threaded scenario, this type of locking is not even required.
> + *
> + * Even if the library were to provide support for locking, point 2)
> + * would still depend on the user taking the lock.
> + */
> +
> +
> +/**
> + * pmalloc_create_pool - create a new protectable memory pool
> + * @name: the name of the pool, enforced to be unique
> + * @min_alloc_order: log2 of the minimum allocation size obtainable
> + *                   from the pool
> + *
> + * Creates a new (empty) memory pool for allocation of protectable
> + * memory. Memory will be allocated upon request (through pmalloc).
> + *
> + * Return: pointer to the new pool upon success, otherwise a NULL.
> + */
> +struct gen_pool *pmalloc_create_pool(const char *name,
> +					 int min_alloc_order);
> +
> +/**
> + * is_pmalloc_object - validates the existence of an alleged object
> + * @ptr: address of the object
> + * @n: size of the object, in bytes
> + *
> + * Return: 0 if the object does not belong to pmalloc, 1 if it belongs to
> + * pmalloc, -1 if it partially overlaps pmalloc meory, but incorectly.

typo:                                            ^ memory

> + */
> +int is_pmalloc_object(const void *ptr, const unsigned long n);
> +
> +/**
> + * pmalloc_prealloc - tries to allocate a memory chunk of the requested size
> + * @pool: handle to the pool to be used for memory allocation
> + * @size: amount of memory (in bytes) requested
> + *
> + * Prepares a chunk of the requested size.
> + * This is intended to both minimize latency in later memory requests and
> + * avoid sleeping during allocation.
> + * Memory allocated with prealloc is stored in one single chunk, as
> + * opposed to what is allocated on-demand when pmalloc runs out of free
> + * space already existing in the pool and has to invoke vmalloc.
> + * One additional advantage of pre-allocating larger chunks of memory is
> + * that the total slack tends to be smaller.
> + *
> + * Return: true if the vmalloc call was successful, false otherwise.
> + */
> +bool pmalloc_prealloc(struct gen_pool *pool, size_t size);
> +
> +/**
> + * pmalloc - allocate protectable memory from a pool
> + * @pool: handle to the pool to be used for memory allocation
> + * @size: amount of memory (in bytes) requested
> + * @gfp: flags for page allocation
> + *
> + * Allocates memory from an unprotected pool. If the pool doesn't have
> + * enough memory, and the request did not include GFP_ATOMIC, an attempt
> + * is made to add a new chunk of memory to the pool
> + * (a multiple of PAGE_SIZE), in order to fit the new request.
> + * Otherwise, NULL is returned.
> + *
> + * Return: pointer to the memory requested upon success, NULL otherwise
> + * (either no memory available or pool already read-only).
> + */
> +void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp);
> +
> +
> +/**
> + * pzalloc - zero-initialized version of pmalloc
> + * @pool: handle to the pool to be used for memory allocation
> + * @size: amount of memory (in bytes) requested
> + * @gfp: flags for page allocation
> + *
> + * Executes pmalloc, initializing the memory requested to 0,
> + * before returning the pointer to it.
> + *
> + * Return: pointer to the zeroed memory requested, upon success, NULL
> + * otherwise (either no memory available or pool already read-only).
> + */
> +static inline void *pzalloc(struct gen_pool *pool, size_t size, gfp_t gfp)
> +{
> +	return pmalloc(pool, size, gfp | __GFP_ZERO);
> +}
> +
> +/**
> + * pmalloc_array - allocates an array according to the parameters
> + * @pool: handle to the pool to be used for memory allocation
> + * @n: number of elements in the array
> + * @size: amount of memory (in bytes) requested for each element
> + * @flags: flags for page allocation
> + *
> + * Executes pmalloc, if it has a chance to succeed.
> + *
> + * Return: either NULL or the pmalloc result.
> + */
> +static inline void *pmalloc_array(struct gen_pool *pool, size_t n,
> +				  size_t size, gfp_t flags)
> +{
> +	if (unlikely(!(pool && n && size)))
> +		return NULL;
> +	return pmalloc(pool, n * size, flags);
> +}
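pmalloc_array() multiplies n by size without checking whether the product fits in size_t, so a very large n could wrap around to a small request. The sketch below computes the size with an overflow check. It applies the same `n > SIZE_MAX / size` test that kmalloc_array() uses. array_bytes() is a hypothetical helper:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Compute n * size for an array allocation, refusing requests whose byte
 * count would not fit in size_t. Returns false, leaving *bytes untouched,
 * for empty or overflowing requests.
 */
static bool array_bytes(size_t n, size_t size, size_t *bytes)
{
	assert(bytes != NULL);
	if (n == 0 || size == 0)
		return false;
	if (n > SIZE_MAX / size)
		return false;
	*bytes = n * size;
	return true;
}
```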
> +
> +/**
> + * pcalloc - allocates a 0-initialized array according to the parameters
> + * @pool: handle to the pool to be used for memory allocation
> + * @n: number of elements in the array
> + * @size: amount of memory (in bytes) requested for each element
> + * @flags: flags for page allocation
> + *
> + * Executes pmalloc_array, if it has a chance to succeed.
> + *
> + * Return: either NULL or the pmalloc result.
> + */
> +static inline void *pcalloc(struct gen_pool *pool, size_t n,
> +			    size_t size, gfp_t flags)
> +{
> +	return pmalloc_array(pool, n, size, flags | __GFP_ZERO);
> +}
> +
> +/**
> + * pstrdup - duplicate a string, using pmalloc as allocator
> + * @pool: handle to the pool to be used for memory allocation
> + * @s: string to duplicate
> + * @gfp: flags for page allocation
> + *
> + * Generates a copy of the given string, allocating sufficient memory
> + * from the given pmalloc pool.
> + *
> + * Return: pointer to the replica, NULL in case of error.
> + */
> +static inline char *pstrdup(struct gen_pool *pool, const char *s, gfp_t gfp)
> +{
> +	size_t len;
> +	char *buf;
> +
> +	if (unlikely(pool == NULL || s == NULL))
> +		return NULL;
> +
> +	len = strlen(s) + 1;
> +	buf = pmalloc(pool, len, gfp);
> +	if (likely(buf))
> +		strncpy(buf, s, len);
> +	return buf;
> +}
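Because len already counts the terminating NUL, the strncpy() in pstrdup() copies exactly the string plus its terminator. That makes it equivalent to a plain memcpy(). The sketch below implements the same helper with a pluggable allocator. sketch_strdup() is hypothetical, and malloc() stands in for pmalloc():

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Duplicate s using the given allocator. len includes the terminating NUL,
 * so a single memcpy() copies the whole string.
 */
static char *sketch_strdup(void *(*alloc)(size_t), const char *s)
{
	size_t len;
	char *buf;

	if (!alloc || !s)
		return NULL;
	len = strlen(s) + 1;
	buf = alloc(len);
	if (buf)
		memcpy(buf, s, len);
	return buf;
}
```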
> +
> +/**
> + * pmalloc_protect_pool - turn a read/write pool read-only
> + * @pool: the pool to protect
> + *
> + * Write-protects all the memory chunks assigned to the pool.
> + * This prevents any further allocation.
> + *
> + * Return: 0 upon success, -EINVAL in abnormal cases.
> + */
> +int pmalloc_protect_pool(struct gen_pool *pool);
> +
> +/**
> + * pfree - mark as unused memory that was previously in use
> + * @pool: handle to the pool to be used for memory allocation
> + * @addr: the beginning of the memory area to be freed
> + *
> + * The behavior of pfree depends on whether the pool is already
> + * protected.
> + * If the pool is not yet protected, the memory is marked as unused and
> + * will be available for further allocations.
> + * If the pool is already protected, the memory is marked as unused, but
> + * it will still be impossible to perform further allocation, because of
> + * the existing protection.
> + * The freed memory, in this case, will be truly released only when the
> + * pool is destroyed.
> + */
> +static inline void pfree(struct gen_pool *pool, const void *addr)
> +{
> +	gen_pool_free(pool, (unsigned long)addr, 0);
> +}
> +
> +/**
> + * pmalloc_destroy_pool - destroys a pool and all the associated memory
> + * @pool: the pool to destroy
> + *
> + * All the memory that was allocated through pmalloc in the pool will be freed.
> + *
> + * Return: 0 upon success, -EINVAL in abnormal cases.
> + */
> +int pmalloc_destroy_pool(struct gen_pool *pool);
> +
> +#endif
> diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
> index 1e5d8c392f15..116d280cca53 100644
> --- a/include/linux/vmalloc.h
> +++ b/include/linux/vmalloc.h
> @@ -20,6 +20,7 @@ struct notifier_block;		/* in notifier.h */
>  #define VM_UNINITIALIZED	0x00000020	/* vm_struct is not fully initialized */
>  #define VM_NO_GUARD		0x00000040      /* don't add guard page */
>  #define VM_KASAN		0x00000080      /* has allocated kasan shadow memory */
> +#define VM_PMALLOC		0x00000100	/* pmalloc area - see docs */
>  /* bits [20..32] reserved for arch specific ioremap internals */
> 
>  /*
> diff --git a/lib/genalloc.c b/lib/genalloc.c
> index 044347163acb..e40a5db89439 100644
> --- a/lib/genalloc.c
> +++ b/lib/genalloc.c
> @@ -598,6 +598,33 @@ void gen_pool_free(struct gen_pool *pool, unsigned long addr, size_t size)
>  }
>  EXPORT_SYMBOL(gen_pool_free);
> 
> +
> +/**
> + * gen_pool_flush_chunk - drops all the allocations from a specific chunk
> + * @pool:	the generic memory pool
> + * @chunk:	The chunk to wipe clear.
> + *
> + * This is meant to be called only while destroying a pool. It's up to the
> + * caller to avoid races, but really, at this point the pool should have
> + * already been retired and have become unavailable for any other sort of
> + * operation.
> + */
> +void gen_pool_flush_chunk(struct gen_pool *pool,
> +			  struct gen_pool_chunk *chunk)
> +{
> +	size_t size;
> +
> +	if (unlikely(!(pool && chunk)))
> +		return;
> +
> +	size = chunk->end_addr + 1 - chunk->start_addr;
> +	memset(chunk->entries, 0,
> +	       DIV_ROUND_UP((size >> pool->min_alloc_order) * BITS_PER_ENTRY,
> +			    BITS_PER_BYTE));
> +	atomic_long_set(&chunk->avail, size);
> +}
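The bitmap size cleared by gen_pool_flush_chunk() should equal the number of allocation units in the chunk times the bits kept per unit, rounded up to whole bytes. Because `*` binds tighter than `>>`, the shift must be parenthesized to count the units first. The worked sketch below assumes a hypothetical value of 2 for BITS_PER_ENTRY:

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_BITS_PER_BYTE 8
/* Assumed value: bits of bitmap state kept per allocation unit. */
#define SKETCH_BITS_PER_ENTRY 2
#define SKETCH_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Bitmap bytes for a chunk of `size` bytes with 2^order-byte units. */
static size_t bitmap_bytes(size_t size, int min_alloc_order)
{
	size_t units = size >> min_alloc_order;

	return SKETCH_DIV_ROUND_UP(units * SKETCH_BITS_PER_ENTRY,
				   SKETCH_BITS_PER_BYTE);
}

/* Unparenthesized form for comparison: size >> (order * BITS_PER_ENTRY). */
static size_t bitmap_bytes_unparenthesized(size_t size, int min_alloc_order)
{
	return SKETCH_DIV_ROUND_UP(size >> min_alloc_order * SKETCH_BITS_PER_ENTRY,
				   SKETCH_BITS_PER_BYTE);
}
```

Take a 4096-byte chunk with min_alloc_order 3 (8-byte units). The parenthesized form gives 512 units, 1024 bits and 128 bytes of bitmap. The unparenthesized form shifts by 6 and gives only 8 bytes.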
> +
> +
>  /**
>   * gen_pool_for_each_chunk - call func for every chunk of generic memory pool
>   * @pool:	the generic memory pool
> diff --git a/mm/Kconfig b/mm/Kconfig
> index c782e8fb7235..be578fbdce6d 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -760,3 +760,9 @@ config GUP_BENCHMARK
>  	  performance of get_user_pages_fast().
> 
>  	  See tools/testing/selftests/vm/gup_benchmark.c
> +
> +config PROTECTABLE_MEMORY
> +	bool
> +	depends on ARCH_HAS_SET_MEMORY
> +	select GENERIC_ALLOCATOR
> +	default y
> diff --git a/mm/Makefile b/mm/Makefile
> index e669f02c5a54..959fdbdac118 100644
> --- a/mm/Makefile
> +++ b/mm/Makefile
> @@ -65,6 +65,7 @@ obj-$(CONFIG_SPARSEMEM)	+= sparse.o
>  obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o
>  obj-$(CONFIG_SLOB) += slob.o
>  obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o
> +obj-$(CONFIG_PROTECTABLE_MEMORY) += pmalloc.o
>  obj-$(CONFIG_KSM) += ksm.o
>  obj-$(CONFIG_PAGE_POISONING) += page_poison.o
>  obj-$(CONFIG_SLAB) += slab.o
> diff --git a/mm/pmalloc.c b/mm/pmalloc.c
> new file mode 100644
> index 000000000000..e94bfb407c92
> --- /dev/null
> +++ b/mm/pmalloc.c
> @@ -0,0 +1,497 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * pmalloc.c: Protectable Memory Allocator
> + *
> + * (C) Copyright 2017 Huawei Technologies Co. Ltd.
> + * Author: Igor Stoppa <igor.stoppa@huawei.com>
> + */
> +
> +#include <linux/printk.h>
> +#include <linux/init.h>
> +#include <linux/mm.h>
> +#include <linux/vmalloc.h>
> +#include <linux/genalloc.h>
> +#include <linux/kernel.h>
> +#include <linux/log2.h>
> +#include <linux/slab.h>
> +#include <linux/device.h>
> +#include <linux/atomic.h>
> +#include <linux/rculist.h>
> +#include <linux/set_memory.h>
> +#include <asm/cacheflush.h>
> +#include <asm/page.h>
> +
> +#include <linux/pmalloc.h>
> +/*
> + * pmalloc_data contains the data specific to a pmalloc pool,
> + * in a format compatible with the design of gen_alloc.
> + * Some of the fields are used for exposing the corresponding parameter
> + * to userspace, through sysfs.
> + */
> +struct pmalloc_data {
> +	struct gen_pool *pool;  /* Link back to the associated pool. */
> +	bool protected;     /* Status of the pool: RO or RW. */
> +	struct kobj_attribute attr_protected; /* Sysfs attribute. */
> +	struct kobj_attribute attr_avail;     /* Sysfs attribute. */
> +	struct kobj_attribute attr_size;      /* Sysfs attribute. */
> +	struct kobj_attribute attr_chunks;    /* Sysfs attribute. */
> +	struct kobject *pool_kobject;
> +	struct list_head node; /* list of pools */
> +};
> +
> +static LIST_HEAD(pmalloc_final_list);
> +static LIST_HEAD(pmalloc_tmp_list);
> +static struct list_head *pmalloc_list = &pmalloc_tmp_list;
> +static DEFINE_MUTEX(pmalloc_mutex);
> +static struct kobject *pmalloc_kobject;
> +
> +static ssize_t pmalloc_pool_show_protected(struct kobject *dev,
> +					   struct kobj_attribute *attr,
> +					   char *buf)
> +{
> +	struct pmalloc_data *data;
> +
> +	data = container_of(attr, struct pmalloc_data, attr_protected);
> +	if (data->protected)
> +		return sprintf(buf, "protected\n");
> +	else
> +		return sprintf(buf, "unprotected\n");
> +}
> +
> +static ssize_t pmalloc_pool_show_avail(struct kobject *dev,
> +				       struct kobj_attribute *attr,
> +				       char *buf)
> +{
> +	struct pmalloc_data *data;
> +
> +	data = container_of(attr, struct pmalloc_data, attr_avail);
> +	return sprintf(buf, "%lu\n",
> +		       (unsigned long)gen_pool_avail(data->pool));
> +}
> +
> +static ssize_t pmalloc_pool_show_size(struct kobject *dev,
> +				      struct kobj_attribute *attr,
> +				      char *buf)
> +{
> +	struct pmalloc_data *data;
> +
> +	data = container_of(attr, struct pmalloc_data, attr_size);
> +	return sprintf(buf, "%lu\n",
> +		       (unsigned long)gen_pool_size(data->pool));
> +}
> +
> +static void pool_chunk_number(struct gen_pool *pool,
> +			      struct gen_pool_chunk *chunk, void *data)
> +{
> +	unsigned long *counter = data;
> +
> +	(*counter)++;
> +}
> +
> +static ssize_t pmalloc_pool_show_chunks(struct kobject *dev,
> +					struct kobj_attribute *attr,
> +					char *buf)
> +{
> +	struct pmalloc_data *data;
> +	unsigned long chunks_num = 0;
> +
> +	data = container_of(attr, struct pmalloc_data, attr_chunks);
> +	gen_pool_for_each_chunk(data->pool, pool_chunk_number, &chunks_num);
> +	return sprintf(buf, "%lu\n", chunks_num);
> +}
> +
> +/* Exposes the pool and its attributes through sysfs. */
> +static struct kobject *pmalloc_connect(struct pmalloc_data *data)
> +{
> +	const struct attribute *attrs[] = {
> +		&data->attr_protected.attr,
> +		&data->attr_avail.attr,
> +		&data->attr_size.attr,
> +		&data->attr_chunks.attr,
> +		NULL
> +	};
> +	struct kobject *kobj;
> +
> +	kobj = kobject_create_and_add(data->pool->name, pmalloc_kobject);
> +	if (unlikely(!kobj))
> +		return NULL;
> +
> +	if (unlikely(sysfs_create_files(kobj, attrs) < 0)) {
> +		kobject_put(kobj);
> +		kobj = NULL;
> +	}
> +	return kobj;
> +}
> +
> +/* Removes the pool and its attributes from sysfs. */
> +static void pmalloc_disconnect(struct pmalloc_data *data,
> +			       struct kobject *kobj)
> +{
> +	const struct attribute *attrs[] = {
> +		&data->attr_protected.attr,
> +		&data->attr_avail.attr,
> +		&data->attr_size.attr,
> +		&data->attr_chunks.attr,
> +		NULL
> +	};
> +
> +	sysfs_remove_files(kobj, attrs);
> +	kobject_put(kobj);
> +}
> +
> +/* Declares an attribute of the pool. */
> +#define pmalloc_attr_init(data, attr_name) \
> +do { \
> +	sysfs_attr_init(&data->attr_##attr_name.attr); \
> +	data->attr_##attr_name.attr.name = #attr_name; \
> +	data->attr_##attr_name.attr.mode = VERIFY_OCTAL_PERMISSIONS(0400); \
> +	data->attr_##attr_name.show = pmalloc_pool_show_##attr_name; \
> +} while (0)
> +
> +struct gen_pool *pmalloc_create_pool(const char *name, int min_alloc_order)
> +{
> +	struct gen_pool *pool;
> +	const char *pool_name;
> +	struct pmalloc_data *data;
> +
> +	if (!name) {
> +		WARN_ON(1);
> +		return NULL;
> +	}
> +
> +	if (min_alloc_order < 0)
> +		min_alloc_order = ilog2(sizeof(unsigned long));
> +
> +	pool = gen_pool_create(min_alloc_order, NUMA_NO_NODE);
> +	if (unlikely(!pool))
> +		return NULL;
> +
> +	mutex_lock(&pmalloc_mutex);
> +	list_for_each_entry(data, pmalloc_list, node)
> +		if (!strcmp(name, data->pool->name))
> +			goto same_name_err;
> +
> +	pool_name = kstrdup(name, GFP_KERNEL);
> +	if (unlikely(!pool_name))
> +		goto name_alloc_err;
> +
> +	data = kzalloc(sizeof(struct pmalloc_data), GFP_KERNEL);
> +	if (unlikely(!data))
> +		goto data_alloc_err;
> +
> +	data->protected = false;
> +	data->pool = pool;
> +	pmalloc_attr_init(data, protected);
> +	pmalloc_attr_init(data, avail);
> +	pmalloc_attr_init(data, size);
> +	pmalloc_attr_init(data, chunks);
> +	pool->data = data;
> +	pool->name = pool_name;
> +
> +	list_add(&data->node, pmalloc_list);
> +	if (pmalloc_list == &pmalloc_final_list)
> +		data->pool_kobject = pmalloc_connect(data);
> +	mutex_unlock(&pmalloc_mutex);
> +	return pool;
> +
> +data_alloc_err:
> +	kfree(pool_name);
> +name_alloc_err:
> +same_name_err:
> +	mutex_unlock(&pmalloc_mutex);
> +	gen_pool_destroy(pool);
> +	return NULL;
> +}
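pmalloc_create_pool() applies two rules. A negative order (such as PMALLOC_DEFAULT_ALLOC_ORDER) falls back to word-sized granularity, and a name already present in the current pool list is rejected. The small sketch below shows both rules. The helper names are hypothetical, and a loop-based function stands in for the kernel's ilog2():

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Integer log2 of a non-zero value, as the kernel's ilog2() computes. */
static int sketch_ilog2(unsigned long v)
{
	int log = -1;

	assert(v != 0);
	while (v) {
		v >>= 1;
		log++;
	}
	return log;
}

/* A negative order selects word-sized allocation granularity. */
static int effective_min_alloc_order(int requested)
{
	return requested < 0 ? sketch_ilog2(sizeof(unsigned long)) : requested;
}

/* Return 0 if name already appears in names[0..count), 1 otherwise. */
static int name_is_unique(const char *const *names, size_t count,
			  const char *name)
{
	for (size_t i = 0; i < count; i++)
		if (!strcmp(names[i], name))
			return 0;
	return 1;
}
```

On a 64-bit kernel, the default therefore gives a minimum allocation unit of 8 bytes.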
> +
> +static inline int check_alloc_params(struct gen_pool *pool, size_t req_size)
> +{
> +	struct pmalloc_data *data;
> +
> +	if (unlikely(!req_size || !pool))
> +		return -1;
> +
> +	data = pool->data;
> +
> +	if (data == NULL)
> +		return -1;
> +
> +	if (unlikely(data->protected)) {
> +		WARN_ON(1);
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +
> +static inline bool chunk_tagging(void *chunk, bool tag)
> +{
> +	struct vm_struct *area;
> +	struct page *page;
> +
> +	if (!is_vmalloc_addr(chunk))
> +		return false;
> +
> +	page = vmalloc_to_page(chunk);
> +	if (unlikely(!page))
> +		return false;
> +
> +	area = page->area;
> +	if (tag)
> +		area->flags |= VM_PMALLOC;
> +	else
> +		area->flags &= ~VM_PMALLOC;
> +	return true;
> +}
> +
> +
> +static inline bool tag_chunk(void *chunk)
> +{
> +	return chunk_tagging(chunk, true);
> +}
> +
> +
> +static inline bool untag_chunk(void *chunk)
> +{
> +	return chunk_tagging(chunk, false);
> +}
> +
> +enum {
> +	INVALID_PMALLOC_OBJECT = -1,
> +	NOT_PMALLOC_OBJECT = 0,
> +	VALID_PMALLOC_OBJECT = 1,
> +};
> +
> +int is_pmalloc_object(const void *ptr, const unsigned long n)
> +{
> +	struct vm_struct *area;
> +	struct page *page;
> +	unsigned long area_start;
> +	unsigned long area_end;
> +	unsigned long object_start;
> +	unsigned long object_end;
> +
> +
> +	/* is_pmalloc_object gets called pretty late, so chances are high
> +	 * that the object is indeed of vmalloc type
> +	 */
> +	if (unlikely(!is_vmalloc_addr(ptr)))
> +		return NOT_PMALLOC_OBJECT;
> +
> +	page = vmalloc_to_page(ptr);
> +	if (unlikely(!page))
> +		return NOT_PMALLOC_OBJECT;
> +
> +	area = page->area;
> +
> +	if (likely(!(area->flags & VM_PMALLOC)))
> +		return NOT_PMALLOC_OBJECT;
> +
> +	area_start = (unsigned long)area->addr;
> +	area_end = area_start + area->nr_pages * PAGE_SIZE - 1;
> +	object_start = (unsigned long)ptr;
> +	object_end = object_start + n - 1;
> +
> +	if (likely((area_start <= object_start) &&
> +		   (object_end <= area_end)))
> +		return VALID_PMALLOC_OBJECT;
> +	else
> +		return INVALID_PMALLOC_OBJECT;
> +}
> +
> +
> +bool pmalloc_prealloc(struct gen_pool *pool, size_t size)
> +{
> +	void *chunk;
> +	size_t chunk_size;
> +	bool add_error;
> +
> +	if (check_alloc_params(pool, size))
> +		return false;
> +
> +	/* Expand pool */
> +	chunk_size = roundup(size, PAGE_SIZE);
> +	chunk = vmalloc(chunk_size);
> +	if (unlikely(chunk == NULL))
> +		return false;
> +
> +	/* Locking is already done inside gen_pool_add */
> +	add_error = gen_pool_add(pool, (unsigned long)chunk, chunk_size,
> +				 NUMA_NO_NODE);
> +	if (unlikely(add_error != 0))
> +		goto abort;
> +
> +	return true;
> +abort:
> +	vfree_atomic(chunk);
> +	return false;
> +
> +}
> +
> +void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp)
> +{
> +	void *chunk;
> +	size_t chunk_size;
> +	bool add_error;
> +	unsigned long retval;
> +
> +	if (check_alloc_params(pool, size))
> +		return NULL;
> +
> +retry_alloc_from_pool:
> +	retval = gen_pool_alloc(pool, size);
> +	if (retval)
> +		goto return_allocation;
> +
> +	if (unlikely((gfp & __GFP_ATOMIC))) {
> +		if (unlikely((gfp & __GFP_NOFAIL)))
> +			goto retry_alloc_from_pool;
> +		else
> +			return NULL;
> +	}
> +
> +	/* Expand pool */
> +	chunk_size = roundup(size, PAGE_SIZE);
> +	chunk = vmalloc(chunk_size);
> +	if (unlikely(!chunk)) {
> +		if (unlikely((gfp & __GFP_NOFAIL)))
> +			goto retry_alloc_from_pool;
> +		else
> +			return NULL;
> +	}
> +	if (unlikely(!tag_chunk(chunk)))
> +		goto free;
> +
> +	/* Locking is already done inside gen_pool_add */
> +	add_error = gen_pool_add(pool, (unsigned long)chunk, chunk_size,
> +				 NUMA_NO_NODE);
> +	if (unlikely(add_error))
> +		goto abort;
> +
> +	retval = gen_pool_alloc(pool, size);
> +	if (retval) {
> +return_allocation:
> +		*(size_t *)retval = size;
> +		if (gfp & __GFP_ZERO)
> +			memset((void *)retval, 0, size);
> +		return (void *)retval;
> +	}
> +	/* Here there is no test for __GFP_NOFAIL because, in case of
> +	 * concurrent allocation, one thread might add a chunk to the
> +	 * pool and this memory could be allocated by another thread,
> +	 * before the first thread gets a chance to use it.
> +	 * As long as vmalloc succeeds, it's ok to retry.
> +	 */
> +	goto retry_alloc_from_pool;
> +abort:
> +	untag_chunk(chunk);
> +free:
> +	vfree_atomic(chunk);
> +	return NULL;
> +}
> +
> +static void pmalloc_chunk_set_protection(struct gen_pool *pool,
> +					 struct gen_pool_chunk *chunk,
> +					 void *data)
> +{
> +	const bool *flag = data;
> +	size_t chunk_size = chunk->end_addr + 1 - chunk->start_addr;
> +	unsigned long pages = chunk_size / PAGE_SIZE;
> +
> +	BUG_ON(chunk_size & (PAGE_SIZE - 1));
> +
> +	if (*flag)
> +		set_memory_ro(chunk->start_addr, pages);
> +	else
> +		set_memory_rw(chunk->start_addr, pages);
> +}
> +
> +static int pmalloc_pool_set_protection(struct gen_pool *pool, bool protection)
> +{
> +	struct pmalloc_data *data;
> +	struct gen_pool_chunk *chunk;
> +
> +	if (unlikely(!pool))
> +		return -EINVAL;
> +
> +	data = pool->data;
> +
> +	if (unlikely(!data))
> +		return -EINVAL;
> +
> +	if (unlikely(data->protected == protection)) {
> +		WARN_ON(1);
> +		return 0;
> +	}
> +
> +	data->protected = protection;
> +	list_for_each_entry(chunk, &(pool)->chunks, next_chunk)
> +		pmalloc_chunk_set_protection(pool, chunk, &protection);
> +	return 0;
> +}
> +
> +int pmalloc_protect_pool(struct gen_pool *pool)
> +{
> +	return pmalloc_pool_set_protection(pool, true);
> +}
> +
> +
> +static void pmalloc_chunk_free(struct gen_pool *pool,
> +			       struct gen_pool_chunk *chunk, void *data)
> +{
> +	untag_chunk(chunk);
> +	gen_pool_flush_chunk(pool, chunk);
> +	vfree_atomic((void *)chunk->start_addr);
> +}
> +
> +
> +int pmalloc_destroy_pool(struct gen_pool *pool)
> +{
> +	struct pmalloc_data *data;
> +
> +	if (unlikely(pool == NULL))
> +		return -EINVAL;
> +
> +	data = pool->data;
> +
> +	if (unlikely(data == NULL))
> +		return -EINVAL;
> +
> +	mutex_lock(&pmalloc_mutex);
> +	list_del(&data->node);
> +	mutex_unlock(&pmalloc_mutex);
> +
> +	if (likely(data->pool_kobject))
> +		pmalloc_disconnect(data, data->pool_kobject);
> +
> +	pmalloc_pool_set_protection(pool, false);
> +	gen_pool_for_each_chunk(pool, pmalloc_chunk_free, NULL);
> +	gen_pool_destroy(pool);
> +	kfree(data);
> +	return 0;
> +}
> +
> +/**
> + * When the sysfs is ready to receive registrations, connect all the
> + * pools previously created. Also enable further pools to be connected
> + * right away.
> + */

This does not look like a kernel-doc comment. Please either remove the second *
from the opening comment mark or reformat the comment.
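For reference, the first option is just a plain `/*` opener with the text unchanged. The second option attaches the comment to the function in the layout described in Documentation/doc-guide/kernel-doc.rst (`name() - brief`, then the body text, then `Return:`). The sketch below is a userspace-compilable illustration only: the brief description is a suggested wording, and the `__init` annotation and the real function body are left out.

```c
#include <assert.h>

/**
 * pmalloc_late_init() - connect the pmalloc pools to sysfs
 *
 * When sysfs is ready to receive registrations, connect all the
 * pools previously created. Also enable further pools to be connected
 * right away.
 *
 * Return: 0, the initcall does not report failures.
 */
static int pmalloc_late_init(void)
{
	/* Body omitted: only the comment layout matters here. */
	return 0;
}
```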

> +static int __init pmalloc_late_init(void)
> +{
> +	struct pmalloc_data *data, *n;
> +
> +	pmalloc_kobject = kobject_create_and_add("pmalloc", kernel_kobj);
> +
> +	mutex_lock(&pmalloc_mutex);
> +	pmalloc_list = &pmalloc_final_list;
> +
> +	if (likely(pmalloc_kobject != NULL)) {
> +		list_for_each_entry_safe(data, n, &pmalloc_tmp_list, node) {
> +			list_move(&data->node, &pmalloc_final_list);
> +			pmalloc_connect(data);
> +		}
> +	}
> +	mutex_unlock(&pmalloc_mutex);
> +	return 0;
> +}
> +late_initcall(pmalloc_late_init);
> diff --git a/mm/usercopy.c b/mm/usercopy.c
> index e9e9325f7638..946ce051e296 100644
> --- a/mm/usercopy.c
> +++ b/mm/usercopy.c
> @@ -240,6 +240,36 @@ static inline void check_heap_object(const void *ptr, unsigned long n,
>  	}
>  }
> 
> +#ifdef CONFIG_PROTECTABLE_MEMORY
> +
> +int is_pmalloc_object(const void *ptr, const unsigned long n);
> +
> +static void check_pmalloc_object(const void *ptr, unsigned long n,
> +				 bool to_user)
> +{
> +	int retv;
> +
> +	retv = is_pmalloc_object(ptr, n);
> +	if (unlikely(retv)) {
> +		if (unlikely(!to_user))
> +			usercopy_abort("pmalloc",
> +				       "trying to write to pmalloc object",
> +				       to_user, (const unsigned long)ptr, n);
> +		if (retv < 0)
> +			usercopy_abort("pmalloc",
> +				       "invalid pmalloc object",
> +				       to_user, (const unsigned long)ptr, n);
> +	}
> +}
> +
> +#else
> +
> +static void check_pmalloc_object(const void *ptr, unsigned long n,
> +				 bool to_user)
> +{
> +}
> +#endif
> +
>  /*
>   * Validates that the given object is:
>   * - not bogus address
> @@ -277,5 +307,8 @@ void __check_object_size(const void *ptr, unsigned long n, bool to_user)
> 
>  	/* Check for object in kernel to avoid text exposure. */
>  	check_kernel_text_object((const unsigned long)ptr, n, to_user);
> +
> +	/* Check if object is from a pmalloc chunk. */
> +	check_pmalloc_object(ptr, n, to_user);
>  }
>  EXPORT_SYMBOL(__check_object_size);
> -- 
> 2.14.1
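As a reading aid for the usercopy hunk above: the two checks in check_pmalloc_object() reduce to a small truth table. The sketch below models it in plain C. It assumes, as the checks suggest but the excerpt does not show, that is_pmalloc_object() returns 0 for memory outside pmalloc, a positive value for a valid pmalloc object and a negative value for a range that touches pmalloc memory without being a valid object. model_check_pmalloc() and its enum are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

enum model_verdict { MODEL_ALLOW, MODEL_ABORT };

/*
 * Hypothetical model of check_pmalloc_object(). pmalloc_status is the
 * assumed result of is_pmalloc_object(): 0 = not pmalloc memory,
 * > 0 = a valid pmalloc object, < 0 = touches pmalloc memory but is
 * not a valid object.
 */
static enum model_verdict model_check_pmalloc(int pmalloc_status,
					      bool to_user)
{
	if (pmalloc_status == 0)
		return MODEL_ALLOW;	/* not pmalloc: this check has no opinion */
	if (!to_user)
		return MODEL_ABORT;	/* copy from user space into pmalloc memory */
	if (pmalloc_status < 0)
		return MODEL_ABORT;	/* range is not a valid pmalloc object */
	return MODEL_ALLOW;		/* copy of a valid object out to user space */
}
```

In short, copying a valid pmalloc object out to user space is allowed, while any copy from user space into a pmalloc chunk is rejected, since the chunk may already be read-only.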
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> 
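The allocation path at the top of this excerpt (retry_alloc_from_pool) follows an expand-and-retry pattern: try the pool, give up if the caller cannot sleep, otherwise add a page-rounded chunk and try again. Here is a minimal userspace sketch of that control flow, assuming malloc() as a stand-in for vmalloc() and a bump allocator per chunk instead of genalloc's bitmap. All model_* names are hypothetical, and the __GFP_NOFAIL retries, chunk tagging and the size header written by the patch are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096UL
#define MODEL_MAX_CHUNKS 16

/* Hypothetical stand-ins for gen_pool and its chunks. */
struct model_chunk {
	char *base;
	size_t size;
	size_t used;
};

struct model_pool {
	struct model_chunk chunks[MODEL_MAX_CHUNKS];
	size_t nr_chunks;
};

/* Stand-in for gen_pool_alloc(): the first chunk with enough room wins. */
static void *model_pool_alloc(struct model_pool *p, size_t size)
{
	for (size_t i = 0; i < p->nr_chunks; i++) {
		struct model_chunk *c = &p->chunks[i];

		if (c->size - c->used >= size) {
			void *ret = c->base + c->used;

			c->used += size;
			return ret;
		}
	}
	return NULL;
}

/* Stand-in for vmalloc() + gen_pool_add(): add a page-rounded chunk. */
static bool model_pool_expand(struct model_pool *p, size_t size)
{
	size_t chunk_size = (size + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE *
			    MODEL_PAGE_SIZE;
	char *base;

	if (p->nr_chunks == MODEL_MAX_CHUNKS)
		return false;
	base = malloc(chunk_size);
	if (!base)
		return false;
	p->chunks[p->nr_chunks++] = (struct model_chunk){ base, chunk_size, 0 };
	return true;
}

/* Allocate, growing the pool on demand unless the caller is atomic. */
static void *model_pmalloc(struct model_pool *p, size_t size, bool atomic,
			   bool zero)
{
	void *ret;

	while (!(ret = model_pool_alloc(p, size))) {
		if (atomic)
			return NULL;	/* cannot sleep, so cannot grow the pool */
		if (!model_pool_expand(p, size))
			return NULL;
		/* Retry: in the kernel another thread may grab the new chunk. */
	}
	if (zero)
		memset(ret, 0, size);
	return ret;
}

static void model_pool_destroy(struct model_pool *p)
{
	for (size_t i = 0; i < p->nr_chunks; i++)
		free(p->chunks[i].base);
	p->nr_chunks = 0;
}
```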

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [PATCH 2/6] genalloc: selftest
  2018-02-11  3:19   ` Igor Stoppa
  (?)
@ 2018-02-11 20:22     ` Philippe Ombredanne
  -1 siblings, 0 replies; 146+ messages in thread
From: Philippe Ombredanne @ 2018-02-11 20:22 UTC (permalink / raw)
  To: Igor Stoppa
  Cc: Matthew Wilcox, Randy Dunlap, Jonathan Corbet, Kees Cook, mhocko,
	labbott, jglisse, Christoph Hellwig, cl, linux-security-module,
	linux-mm, LKML, kernel-hardening

On Sun, Feb 11, 2018 at 4:19 AM, Igor Stoppa <igor.stoppa@huawei.com> wrote:
> Introduce a set of macros for writing concise test cases for genalloc.
>
> The test cases are meant to provide regression testing, when working on
> new functionality for genalloc.
>
> Primarily they are meant to confirm that the various allocation strategies
> will continue to work as expected.
>
> The execution of the self testing is controlled through a Kconfig option.
>
> Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com>

<snip>

> --- /dev/null
> +++ b/include/linux/genalloc-selftest.h
> @@ -0,0 +1,26 @@
> +/* SPDX-License-Identifier: GPL-2.0

nit... For a comment in .h this line should be instead its own comment
as the first line:
> +/* SPDX-License-Identifier: GPL-2.0 */

<snip>

> --- /dev/null
> +++ b/lib/genalloc-selftest.c
> @@ -0,0 +1,400 @@
> +/* SPDX-License-Identifier: GPL-2.0

And for a comment in .c this line should use C++ style as the first line:

> +// SPDX-License-Identifier: GPL-2.0

Please check the docs for this (I know this can feel surprising but
this has been debated at great length on list)

Thank you!
-- 
Cordially
Philippe Ombredanne

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [PATCH 2/6] genalloc: selftest
  2018-02-11 20:22     ` Philippe Ombredanne
  (?)
@ 2018-02-11 20:27       ` Randy Dunlap
  -1 siblings, 0 replies; 146+ messages in thread
From: Randy Dunlap @ 2018-02-11 20:27 UTC (permalink / raw)
  To: Philippe Ombredanne, Igor Stoppa
  Cc: Matthew Wilcox, Jonathan Corbet, Kees Cook, mhocko, labbott,
	jglisse, Christoph Hellwig, cl, linux-security-module, linux-mm,
	LKML, kernel-hardening

On 02/11/18 12:22, Philippe Ombredanne wrote:
> On Sun, Feb 11, 2018 at 4:19 AM, Igor Stoppa <igor.stoppa@huawei.com> wrote:
>> Introduce a set of macros for writing concise test cases for genalloc.
>>
>> The test cases are meant to provide regression testing, when working on
>> new functionality for genalloc.
>>
>> Primarily they are meant to confirm that the various allocation strategies
>> will continue to work as expected.
>>
>> The execution of the self testing is controlled through a Kconfig option.
>>
>> Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com>
> 
> <snip>
> 
>> --- /dev/null
>> +++ b/include/linux/genalloc-selftest.h
>> @@ -0,0 +1,26 @@
>> +/* SPDX-License-Identifier: GPL-2.0
> 
> nit... For a comment in .h this line should be instead its own comment
> as the first line:
>> +/* SPDX-License-Identifier: GPL-2.0 */

Why are we treating header files (.h) differently than .c files?
Either one can use the C++ "//" comment syntax.

> <snip>
> 
>> --- /dev/null
>> +++ b/lib/genalloc-selftest.c
>> @@ -0,0 +1,400 @@
>> +/* SPDX-License-Identifier: GPL-2.0
> 
> And for a comment in .c this line should use C++ style as the first line:
> 
>> +// SPDX-License-Identifier: GPL-2.0
> 
> Please check the docs for this (I know this can feel surprising but
> this has been debated at great length on list)
> 
> Thank you!
> 


-- 
~Randy

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [PATCH 2/6] genalloc: selftest
  2018-02-11 20:27       ` Randy Dunlap
  (?)
@ 2018-02-11 21:01         ` Matthew Wilcox
  -1 siblings, 0 replies; 146+ messages in thread
From: Matthew Wilcox @ 2018-02-11 21:01 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: Philippe Ombredanne, Igor Stoppa, Jonathan Corbet, Kees Cook,
	mhocko, labbott, jglisse, Christoph Hellwig, cl,
	linux-security-module, linux-mm, LKML, kernel-hardening

On Sun, Feb 11, 2018 at 12:27:14PM -0800, Randy Dunlap wrote:
> On 02/11/18 12:22, Philippe Ombredanne wrote:
> > nit... For a comment in .h this line should be instead its own comment
> > as the first line:
> >> +/* SPDX-License-Identifier: GPL-2.0 */
> 
> Why are we treating header files (.h) differently than .c files?
> Either one can use the C++ "//" comment syntax.

This is now documented!

Documentation/process/license-rules.rst:

   If a specific tool cannot handle the standard comment style, then the
   appropriate comment mechanism which the tool accepts shall be used. This
   is the reason for having the "/\* \*/" style comment in C header
   files. There was build breakage observed with generated .lds files where
   'ld' failed to parse the C++ comment. This has been fixed by now, but
   there are still older assembler tools which cannot handle C++ style
   comments.

Personally, I find this disappointing.  I find this:

// SPDX-License-Identifier: GPL-2.0+
/*
 * XArray implementation
 * Copyright (c) 2017 Microsoft Corporation
 * Author: Matthew Wilcox <mawilcox@microsoft.com>
 */

much less visually appealing than

/*
 * XArray implementation
 * Copyright (c) 2017 Microsoft Corporation
 * Author: Matthew Wilcox <mawilcox@microsoft.com>
 * SPDX-License-Identifier: GPL-2.0+
 */

I can't see this variation making a tag extraction tool harder to write.

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [PATCH 3/6] struct page: add field for vm_struct
  2018-02-11  3:19   ` Igor Stoppa
  (?)
@ 2018-02-11 21:16     ` Matthew Wilcox
  -1 siblings, 0 replies; 146+ messages in thread
From: Matthew Wilcox @ 2018-02-11 21:16 UTC (permalink / raw)
  To: Igor Stoppa
  Cc: rdunlap, corbet, keescook, mhocko, labbott, jglisse, hch, cl,
	linux-security-module, linux-mm, linux-kernel, kernel-hardening

On Sun, Feb 11, 2018 at 05:19:17AM +0200, Igor Stoppa wrote:
> The struct page has a "mapping" field, which can be re-used, to store a
> pointer to the parent area. This will avoid more expensive searches.
> 
> As example, the function find_vm_area is reimplemented, to take advantage
> of the newly introduced field.

Umm.  Is it more efficient?  You're replacing an rb-tree search with a
page-table walk.  You eliminate a spinlock, which is great, but is the
page-table walk more efficient?  I suppose it'll depend on the depth of
the rb-tree, and (at least on x86), the page tables should already be
in cache.

Unrelated to this patch, I'm working on a patch to give us page_type,
and I think I'll allocate a bit to mark pages which are vmalloced.
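To make the trade-off concrete, here is a toy userspace model of the two lookups (all names hypothetical). A binary search over sorted areas stands in for the rb-tree search, and an array indexed by page number stands in for the page-table walk followed by reading the back-pointer that the patch stores in struct page. It only shows that both lookups return the same area; it says nothing about which one is faster, which is the open question here.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SHIFT 12
#define MODEL_NR_PAGES 64

struct area { unsigned long start, size; };	/* stand-in for vm_struct */
struct page { struct area *area; };		/* stand-in for the reused field */

/* Index = address >> MODEL_PAGE_SHIFT, standing in for the page-table walk. */
static struct page pages[MODEL_NR_PAGES];
static struct area areas[] = {
	{ 0x0000, 0x3000 },	/* pages 0-2 */
	{ 0x5000, 0x2000 },	/* pages 5-6 */
	{ 0x9000, 0x4000 },	/* pages 9-12 */
};
#define NR_AREAS (sizeof(areas) / sizeof(areas[0]))

/* Tree-style lookup: binary search over areas sorted by start address. */
static struct area *find_area_search(unsigned long addr)
{
	size_t lo = 0, hi = NR_AREAS;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (addr < areas[mid].start)
			hi = mid;
		else if (addr >= areas[mid].start + areas[mid].size)
			lo = mid + 1;
		else
			return &areas[mid];
	}
	return NULL;
}

/* Back-pointer lookup: translate the address to a page, read the field. */
static struct area *find_area_backptr(unsigned long addr)
{
	unsigned long pfn = addr >> MODEL_PAGE_SHIFT;

	return pfn < MODEL_NR_PAGES ? pages[pfn].area : NULL;
}

/* Fill in the back-pointers, as the patch would at allocation time. */
static void tag_pages(void)
{
	for (size_t i = 0; i < NR_AREAS; i++)
		for (unsigned long a = areas[i].start;
		     a < areas[i].start + areas[i].size;
		     a += 1UL << MODEL_PAGE_SHIFT)
			pages[a >> MODEL_PAGE_SHIFT].area = &areas[i];
}
```

The back-pointer turns the search into a fixed-cost lookup once the page is known, but the real cost moves into finding the page, which is why measuring against the rb-tree descent is needed.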

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [PATCH 6/6] Documentation for Pmalloc
  2018-02-11  3:19   ` Igor Stoppa
  (?)
@ 2018-02-11 21:17     ` Matthew Wilcox
  -1 siblings, 0 replies; 146+ messages in thread
From: Matthew Wilcox @ 2018-02-11 21:17 UTC (permalink / raw)
  To: Igor Stoppa
  Cc: rdunlap, corbet, keescook, mhocko, labbott, jglisse, hch, cl,
	linux-security-module, linux-mm, linux-kernel, kernel-hardening

On Sun, Feb 11, 2018 at 05:19:20AM +0200, Igor Stoppa wrote:
> @@ -0,0 +1,114 @@
> +SPDX-License-Identifier: CC-BY-SA-4.0

You need the '.. ' before the 'SPDX'.  See
Documentation/process/license-rules.rst

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [PATCH 1/6] genalloc: track beginning of allocations
  2018-02-11 12:24     ` Mike Rapoport
  (?)
  (?)
@ 2018-02-12 11:17       ` Igor Stoppa
  -1 siblings, 0 replies; 146+ messages in thread
From: Igor Stoppa @ 2018-02-12 11:17 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: willy, rdunlap, corbet, keescook, mhocko, labbott, jglisse, hch,
	cl, linux-security-module, linux-mm, linux-kernel,
	kernel-hardening



On 11/02/18 14:24, Mike Rapoport wrote:
> On Sun, Feb 11, 2018 at 05:19:15AM +0200, Igor Stoppa wrote:
[...]

>> +/**
>> + * mem_to_units - convert references to memory into orders of allocation
> 
> Documentation/doc-guide/kernel-doc.rst recommends including brackets
> for function comments. I haven't noticed any difference in the resulting
> html, so I'm not sure if the brackets are actually required.

This is what I see in the example from mainline docs:

/**
 * foobar() - Brief description of foobar.
 * @argument1: Description of parameter argument1 of foobar.
 * @argument2: Description of parameter argument2 of foobar.
 *
 * Longer description of foobar.
 *
 * Return: Description of return value of foobar.
 */
int foobar(int argument1, char *argument2)


What are you referring to?

[...]

>> + * @size: amount in bytes
>> + * @order: power of 2 represented by each entry in the bitmap
>> + *
>> + * Returns the number of units representing the size.
> 
> Please s/Return/Return:/

:-( I thought I had fixed them all. thanks for spotting this.
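Applying both remarks (brackets after the function name, and `Return:` with a colon) to the header quoted above gives something like the following. The function body is an assumption made only so the snippet compiles: it rounds @size up to a whole number of (1 << @order)-byte units, which matches the parameter descriptions but is not taken from the patch.

```c
#include <assert.h>

/**
 * mem_to_units() - convert references to memory into orders of allocation
 * @size: amount in bytes
 * @order: power of 2 represented by each entry in the bitmap
 *
 * Return: the number of units representing the size.
 */
static inline unsigned long mem_to_units(unsigned long size,
					 unsigned long order)
{
	/* Assumed body: round up to whole (1 << order)-byte units. */
	return (size + (1UL << order) - 1) >> order;
}
```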

[...]

>> + * Return: If two users alter the same bit, to one it will return
>> + * remaining entries, to the other it will return 0.
> 
> And what if there are three or four concurrent users? ;-)
> 
> I believe that a more elaborate description about what happens with
> concurrent attempts to alter the bitmap would be really helpful.

ok
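On the three-or-four-users question: if the bit update is a single atomic read-modify-write, the answer does not depend on the number of users, because exactly one of N concurrent writers can observe the bit going from clear to set, and every other writer sees it already set. The sketch below demonstrates that property with C11 atomic_fetch_or() and POSIX threads as stand-ins. It is not the genalloc code, which upstream uses a cmpxchg() loop over a bitmap word (set_bits_ll()), and all names here are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define NR_USERS 4
#define TARGET_BIT (1UL << 5)

static atomic_ulong bitmap_word;	/* one word of a hypothetical bitmap */
static atomic_int winners;		/* users that saw the bit clear */

static void *try_claim(void *arg)
{
	(void)arg;
	/* fetch_or returns the old value: only one caller can see the bit clear. */
	unsigned long old = atomic_fetch_or(&bitmap_word, TARGET_BIT);

	if (!(old & TARGET_BIT))
		atomic_fetch_add(&winners, 1);
	return NULL;
}

/* Let NR_USERS threads race on the same bit; return how many won. */
static int race_for_bit(void)
{
	pthread_t t[NR_USERS];

	atomic_store(&bitmap_word, 0);
	atomic_store(&winners, 0);
	for (int i = 0; i < NR_USERS; i++) {
		int rc = pthread_create(&t[i], NULL, try_claim, NULL);

		assert(rc == 0);
	}
	for (int i = 0; i < NR_USERS; i++)
		pthread_join(t[i], NULL);
	return atomic_load(&winners);
}
```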

--
thanks, igor

^ permalink raw reply	[flat|nested] 146+ messages in thread

* [PATCH 1/6] genalloc: track beginning of allocations
@ 2018-02-12 11:17       ` Igor Stoppa
  0 siblings, 0 replies; 146+ messages in thread
From: Igor Stoppa @ 2018-02-12 11:17 UTC (permalink / raw)
  To: linux-security-module



On 11/02/18 14:24, Mike Rapoport wrote:
> On Sun, Feb 11, 2018 at 05:19:15AM +0200, Igor Stoppa wrote:
[...]

>> +/**
>> + * mem_to_units - convert references to memory into orders of allocation
> 
> Documentation/doc-guide/kernel-doc.rst recommends to to include brackets
> for function comments. I haven't noticed any difference in the resulting
> html, so I'm not sure if the brackets are actually required.

This is what I see in the example from mailine docs:

/**
 * foobar() - Brief description of foobar.
 * @argument1: Description of parameter argument1 of foobar.
 * @argument2: Description of parameter argument2 of foobar.
 *
 * Longer description of foobar.
 *
 * Return: Description of return value of foobar.
 */
int foobar(int argument1, char *argument2)


What are you referring to?

[...]

>> + * @size: amount in bytes
>> + * @order: power of 2 represented by each entry in the bitmap
>> + *
>> + * Returns the number of units representing the size.
> 
> Please s/Return/Return:/

:-( I thought I had fixed them all. thanks for spotting this.

[...]

>> + * Return: If two users alter the same bit, to one it will return
>> + * remaining entries, to the other it will return 0.
> 
> And what if there are three or four concurrent users? ;-)
> 
> I believe that a more elaborate description about what happens with
> concurrent attempts to alter the bitmap would be really helpful.

ok

--
thanks, igor
--
To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
the body of a message to majordomo at vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [PATCH 1/6] genalloc: track beginning of allocations
@ 2018-02-12 11:17       ` Igor Stoppa
  0 siblings, 0 replies; 146+ messages in thread
From: Igor Stoppa @ 2018-02-12 11:17 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: willy, rdunlap, corbet, keescook, mhocko, labbott, jglisse, hch,
	cl, linux-security-module, linux-mm, linux-kernel,
	kernel-hardening



On 11/02/18 14:24, Mike Rapoport wrote:
> On Sun, Feb 11, 2018 at 05:19:15AM +0200, Igor Stoppa wrote:
[...]

>> +/**
>> + * mem_to_units - convert references to memory into orders of allocation
> 
> Documentation/doc-guide/kernel-doc.rst recommends to to include brackets
> for function comments. I haven't noticed any difference in the resulting
> html, so I'm not sure if the brackets are actually required.

This is what I see in the example from mailine docs:

/**
 * foobar() - Brief description of foobar.
 * @argument1: Description of parameter argument1 of foobar.
 * @argument2: Description of parameter argument2 of foobar.
 *
 * Longer description of foobar.
 *
 * Return: Description of return value of foobar.
 */
int foobar(int argument1, char *argument2)


What are you referring to?

[...]

>> + * @size: amount in bytes
>> + * @order: power of 2 represented by each entry in the bitmap
>> + *
>> + * Returns the number of units representing the size.
> 
> Please s/Return/Return:/

:-( I thought I had fixed them all. thanks for spotting this.
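With both review comments applied (the `name()` form shown in Documentation/doc-guide/kernel-doc.rst and a proper `Return:` section), the header could read roughly as below. The body is only a sketch: rounding a partial unit up to a whole one is an assumption about what the conversion does, and `assert()` is a userspace stand-in for a kernel-side sanity check.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/**
 * mem_to_units() - convert references to memory into orders of allocation
 * @size: amount in bytes
 * @order: power of 2 represented by each entry in the bitmap
 *
 * Return: the number of units representing the size, with any partial
 * unit rounded up to a whole one (an assumption made for this sketch).
 */
static inline unsigned long mem_to_units(size_t size, unsigned int order)
{
	/* Guard against an undefined shift by the full word width. */
	assert(order < sizeof(unsigned long) * CHAR_BIT);
	/* Round up, so a partial unit still costs a whole bitmap entry. */
	return (size + (1UL << order) - 1) >> order;
}
```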

[...]

>> + * Return: If two users alter the same bit, to one it will return
>> + * remaining entries, to the other it will return 0.
> 
> And what if there are three or four concurrent users? ;-)
> 
> I believe that a more elaborate description about what happens with
> concurrent attempts to alter the bitmap would be really helpful.

ok
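A small model may help answer the "three or four users" question. The sketch assumes the bitmap is updated with a compare-and-swap loop in the style of genalloc's lock-less bit setting, simplified to a single word and written with C11 atomics instead of the kernel's `cmpxchg()`; `claim_bits()` is a hypothetical name. Since each caller's whole mask is applied in one atomic transition, exactly one caller wins however many race for the same bits, and every other caller finds them already set and gets back the number of entries it did not alter.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical single-word model of a lock-less bitmap update.
 * Tries to set @nbits bits starting at bit @start of @word.
 * Return: 0 if this caller set all of them, otherwise @nbits, i.e. the
 * number of entries it did not alter because some were already set.
 */
static unsigned int claim_bits(atomic_ulong *word, unsigned int start,
			       unsigned int nbits)
{
	unsigned long mask, old;

	assert(nbits > 0 && start + nbits <= WORD_BITS);
	/* Build the mask without ever shifting by the full word width. */
	mask = (~0UL >> (WORD_BITS - nbits)) << start;

	old = atomic_load(word);
	do {
		if (old & mask)	/* lost the race: bits already claimed */
			return nbits;
		/* On CAS failure, 'old' is refreshed with the current value. */
	} while (!atomic_compare_exchange_weak(word, &old, old | mask));

	return 0;
}
```

Sequential calls stand in for the losing racers in the check below: with real threads the winner is whichever compare-and-swap lands first, but the outcome is the same, one 0 and N-1 full counts.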

--
thanks, igor

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [PATCH 4/6] Protectable Memory
  2018-02-11 12:37     ` Mike Rapoport
  (?)
  (?)
@ 2018-02-12 11:26       ` Igor Stoppa
  -1 siblings, 0 replies; 146+ messages in thread
From: Igor Stoppa @ 2018-02-12 11:26 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: willy, rdunlap, corbet, keescook, mhocko, labbott, jglisse, hch,
	cl, linux-security-module, linux-mm, linux-kernel,
	kernel-hardening

On 11/02/18 14:37, Mike Rapoport wrote:
> On Sun, Feb 11, 2018 at 05:19:18AM +0200, Igor Stoppa wrote:

>> + * Return: 0 if the object does not belong to pmalloc, 1 if it belongs to
>> + * pmalloc, -1 if it partially overlaps pmalloc meory, but incorectly.
> 
> typo:                                            ^ memory

thanks :-(
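The three-way return value is easier to review against a concrete check. A sketch, assuming the protected memory is a single contiguous `[start, end)` range; `classify_object()` is a hypothetical helper, not the function from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: where does the object [obj, obj + n) sit relative
 * to a single memory area [start, end)?
 * Return: 0 if disjoint, 1 if fully contained, -1 if it partially
 * overlaps the area (straddles one of its edges).
 */
static int classify_object(uintptr_t obj, size_t n,
			   uintptr_t start, uintptr_t end)
{
	uintptr_t obj_end = obj + n;

	assert(start <= end && obj_end >= obj);	/* no wrap-around */
	if (obj_end <= start || obj >= end)
		return 0;
	if (obj >= start && obj_end <= end)
		return 1;
	return -1;
}
```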

[...]

>> +/**
>> + * When the sysfs is ready to receive registrations, connect all the
>> + * pools previously created. Also enable further pools to be connected
>> + * right away.
>> + */
> 
> This does not seem as kernel-doc comment. Please either remove the second *
> from the opening comment mark or reformat the comment.

For this too, I thought I had caught them all, but I was wrong ...

I didn't find any mention of automated checking for comments.
Is there such a tool?
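Of Mike's two options, the first is simply opening the comment with `/*` instead of `/**`, so that scripts/kernel-doc skips it. The second, a real kernel-doc comment, needs a `name() - summary` line naming the function. The quoted hunk does not show which function the comment belongs to, so `connect_pending_pools()` and the `sysfs_ready` flag below are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>

static bool sysfs_ready;	/* hypothetical: set once sysfs accepts pools */

/**
 * connect_pending_pools() - connect pools once sysfs is available
 *
 * When sysfs is ready to receive registrations, connect all the pools
 * created so far, and let pools created later connect right away.
 *
 * Return: 0 on success.
 */
static int connect_pending_pools(void)
{
	assert(!sysfs_ready);	/* expected to run only once */
	/* Registration of the already-created pools would happen here. */
	sysfs_ready = true;	/* pools created from now on register at once */
	return 0;
}
```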

--
thanks, igor

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [PATCH 6/6] Documentation for Pmalloc
  2018-02-11 21:17     ` Matthew Wilcox
  (?)
  (?)
@ 2018-02-12 11:28       ` Igor Stoppa
  -1 siblings, 0 replies; 146+ messages in thread
From: Igor Stoppa @ 2018-02-12 11:28 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: rdunlap, corbet, keescook, mhocko, labbott, jglisse, hch, cl,
	linux-security-module, linux-mm, linux-kernel, kernel-hardening



On 11/02/18 23:17, Matthew Wilcox wrote:
> On Sun, Feb 11, 2018 at 05:19:20AM +0200, Igor Stoppa wrote:
>> @@ -0,0 +1,114 @@
>> +SPDX-License-Identifier: CC-BY-SA-4.0
> 
> You need the '.. ' before the 'SPDX'.  See
> Documentation/process/license-rules.rst

yes, sorry, I thought I had understood how it works,
but clearly I hadn't :-(
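Per Documentation/process/license-rules.rst, an .rst file carries the tag as an RST comment on its first line, here `.. SPDX-License-Identifier: CC-BY-SA-4.0`; the leading `.. ` turns the line into a comment, so Sphinx does not render it. A tiny hypothetical first-line check of that rule (not an existing kernel tool):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical check: does the first line of an .rst file carry the SPDX
 * tag as an RST comment (".. " prefix) followed by a license expression?
 */
static bool rst_spdx_ok(const char *first_line)
{
	static const char tag[] = ".. SPDX-License-Identifier: ";

	assert(first_line != NULL);
	return strncmp(first_line, tag, sizeof(tag) - 1) == 0 &&
	       first_line[sizeof(tag) - 1] != '\0';	/* expression present */
}
```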

--
igor

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [PATCH 1/6] genalloc: track beginning of allocations
  2018-02-12 11:17       ` Igor Stoppa
  (?)
@ 2018-02-12 11:36         ` Mike Rapoport
  -1 siblings, 0 replies; 146+ messages in thread
From: Mike Rapoport @ 2018-02-12 11:36 UTC (permalink / raw)
  To: Igor Stoppa
  Cc: willy, rdunlap, corbet, keescook, mhocko, labbott, jglisse, hch,
	cl, linux-security-module, linux-mm, linux-kernel,
	kernel-hardening

On Mon, Feb 12, 2018 at 01:17:01PM +0200, Igor Stoppa wrote:
> 
> 
> On 11/02/18 14:24, Mike Rapoport wrote:
> > On Sun, Feb 11, 2018 at 05:19:15AM +0200, Igor Stoppa wrote:
> [...]
> 
> >> +/**
> >> + * mem_to_units - convert references to memory into orders of allocation
> > 
> > Documentation/doc-guide/kernel-doc.rst recommends to to include brackets
> > for function comments. I haven't noticed any difference in the resulting
> > html, so I'm not sure if the brackets are actually required.
> 
> This is what I see in the example from mailine docs:
> 
> /**
>  * foobar() - Brief description of foobar.
>  * @argument1: Description of parameter argument1 of foobar.
>  * @argument2: Description of parameter argument2 of foobar.
>  *
>  * Longer description of foobar.
>  *
>  * Return: Description of return value of foobar.
>  */
> int foobar(int argument1, char *argument2)
> 
> 
> What are you referring to?
 
I'm referring to "foobar() - brief description" vs "foobar - brief
description".

The generated html looks exactly the same in the browser, so I don't know
if the brackets are really required.

> [...]
> 
> >> + * @size: amount in bytes
> >> + * @order: power of 2 represented by each entry in the bitmap
> >> + *
> >> + * Returns the number of units representing the size.
> > 
> > Please s/Return/Return:/
> 
> :-( I thought I had fixed them all. thanks for spotting this.
> 
> [...]
> 
> >> + * Return: If two users alter the same bit, to one it will return
> >> + * remaining entries, to the other it will return 0.
> > 
> > And what if there are three or four concurrent users? ;-)
> > 
> > I believe that a more elaborate description about what happens with
> > concurrent attempts to alter the bitmap would be really helpful.
> 
> ok
> 
> --
> thanks, igor
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [PATCH 4/6] Protectable Memory
  2018-02-12 11:26       ` Igor Stoppa
  (?)
@ 2018-02-12 11:43         ` Mike Rapoport
  -1 siblings, 0 replies; 146+ messages in thread
From: Mike Rapoport @ 2018-02-12 11:43 UTC (permalink / raw)
  To: Igor Stoppa
  Cc: willy, rdunlap, corbet, keescook, mhocko, labbott, jglisse, hch,
	cl, linux-security-module, linux-mm, linux-kernel,
	kernel-hardening

On Mon, Feb 12, 2018 at 01:26:28PM +0200, Igor Stoppa wrote:
> On 11/02/18 14:37, Mike Rapoport wrote:
> > On Sun, Feb 11, 2018 at 05:19:18AM +0200, Igor Stoppa wrote:
> 
> >> + * Return: 0 if the object does not belong to pmalloc, 1 if it belongs to
> >> + * pmalloc, -1 if it partially overlaps pmalloc meory, but incorectly.
> > 
> > typo:                                            ^ memory
> 
> thanks :-(
> 
> [...]
> 
> >> +/**
> >> + * When the sysfs is ready to receive registrations, connect all the
> >> + * pools previously created. Also enable further pools to be connected
> >> + * right away.
> >> + */
> > 
> > This does not seem as kernel-doc comment. Please either remove the second *
> > from the opening comment mark or reformat the comment.
> 
> For this too, I thought I had caught them all, but I was wrong ...
> 
> I didn't find any mention of automated checking for comments.
> Is there such tool?

I don't know if there is a tool. I couldn't find anything in scripts/; maybe
somebody has such a tool out of tree.

For now, I've added mm-api.rst that includes all mm .c files and run 'make
htmldocs', which spits out plenty of warnings and errors.

> --
> thanks, igor
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [PATCH 4/6] Protectable Memory
  2018-02-12 11:43         ` Mike Rapoport
  (?)
@ 2018-02-12 12:53           ` Mike Rapoport
  -1 siblings, 0 replies; 146+ messages in thread
From: Mike Rapoport @ 2018-02-12 12:53 UTC (permalink / raw)
  To: Igor Stoppa
  Cc: willy, rdunlap, corbet, keescook, mhocko, labbott, jglisse, hch,
	cl, linux-security-module, linux-mm, linux-kernel,
	kernel-hardening

On Mon, Feb 12, 2018 at 01:43:11PM +0200, Mike Rapoport wrote:
> On Mon, Feb 12, 2018 at 01:26:28PM +0200, Igor Stoppa wrote:
> > On 11/02/18 14:37, Mike Rapoport wrote:
> > > On Sun, Feb 11, 2018 at 05:19:18AM +0200, Igor Stoppa wrote:
> > 
> > >> + * Return: 0 if the object does not belong to pmalloc, 1 if it belongs to
> > >> + * pmalloc, -1 if it partially overlaps pmalloc meory, but incorectly.
> > > 
> > > typo:                                            ^ memory
> > 
> > thanks :-(
> > 
> > [...]
> > 
> > >> +/**
> > >> + * When the sysfs is ready to receive registrations, connect all the
> > >> + * pools previously created. Also enable further pools to be connected
> > >> + * right away.
> > >> + */
> > > 
> > > This does not seem as kernel-doc comment. Please either remove the second *
> > > from the opening comment mark or reformat the comment.
> > 
> > For this too, I thought I had caught them all, but I was wrong ...
> > 
> > I didn't find any mention of automated checking for comments.
> > Is there such tool?
> 
> I don't know if there is a tool. I couldn't find anything in scripts, maybe
> somebody have such tool out of tree.
> 
> For now, I've added mm-api.rst that includes all mm .c files and run 'make
> htmldocs' which spits plenty of warnings and errors.

Actually, you can run 'scripts/kernel-doc -v -none <filename>' to check
comments starting with '/**'. I'm afraid it won't catch formatted blocks that
start with '/*'.
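That caveat suggests a complementary heuristic. The sketch below is hypothetical and not part of scripts/: it counts comments whose opener is a bare `/*` line (which kernel-doc skips) but whose first body line looks like a kernel-doc header, with or without the brackets. It ignores indented comments to stay short.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Does @p start with " * name" (optionally "name()") followed by " - "? */
static int looks_like_kdoc_header(const char *p)
{
	if (strncmp(p, " * ", 3) != 0)
		return 0;
	p += 3;
	if (!isalpha((unsigned char)*p) && *p != '_')
		return 0;
	while (isalnum((unsigned char)*p) || *p == '_')
		p++;
	if (strncmp(p, "()", 2) == 0)	/* brackets are optional */
		p += 2;
	return strncmp(p, " - ", 3) == 0;
}

/*
 * Hypothetical heuristic: count comments whose opener is a bare slash-star
 * line (so scripts/kernel-doc skips them) but whose first body line looks
 * like a kernel-doc header.
 */
static int count_missed_kdoc(const char *src)
{
	const char *line = src;
	int hits = 0;

	assert(src != NULL);
	while (line != NULL && *line != '\0') {
		const char *next = strchr(line, '\n');

		if (next != NULL && strncmp(line, "/*\n", 3) == 0 &&
		    looks_like_kdoc_header(next + 1))
			hits++;
		line = next != NULL ? next + 1 : NULL;
	}
	return hits;
}
```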
 

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [PATCH 4/6] Protectable Memory
  2018-02-12 12:53           ` Mike Rapoport
  (?)
  (?)
@ 2018-02-12 13:41             ` Igor Stoppa
  -1 siblings, 0 replies; 146+ messages in thread
From: Igor Stoppa @ 2018-02-12 13:41 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: willy, rdunlap, corbet, keescook, mhocko, labbott, jglisse, hch,
	cl, linux-security-module, linux-mm, linux-kernel,
	kernel-hardening



On 12/02/18 14:53, Mike Rapoport wrote:
> 'scripts/kernel-doc -v -none 

That has quite an interesting behavior.

I ran it on genalloc.c while I was in the process of adding the brackets
to the function names in the kernel-doc description.

The brackets confuse the script and it fails to output the name of the
function in the log:

lib/genalloc.c:123: info: Scanning doc for get_bitmap_entry
lib/genalloc.c:139: info: Scanning doc for
lib/genalloc.c:152: info: Scanning doc for
lib/genalloc.c:164: info: Scanning doc for


The first function does not have the brackets.
The others do. So what should I do with the missing brackets?
Add them, according to the kernel docs, or leave them out?

I'd lean toward adding them.
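Both spellings name the same function: the brackets are optional syntax after the identifier, so whatever parses the header has to skip them to recover the name. A hypothetical extractor illustrating that point; it is not the Perl code in scripts/kernel-doc and does not explain why that script printed empty names above.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical extractor for a kernel-doc header line such as
 * " * foo() - Brief description" or " * foo - Brief description".
 * Copies the identifier into @name (of size @len) and returns 1, or
 * returns 0 if the line is not such a header.
 */
static int kdoc_header_name(const char *line, char *name, size_t len)
{
	const char *p, *id;
	size_t n;

	assert(line != NULL && name != NULL && len > 0);
	if (strncmp(line, " * ", 3) != 0)
		return 0;
	id = p = line + 3;
	while (isalnum((unsigned char)*p) || *p == '_')
		p++;
	n = (size_t)(p - id);
	if (n == 0 || n >= len)
		return 0;
	if (strncmp(p, "()", 2) == 0)	/* the brackets are optional */
		p += 2;
	if (strncmp(p, " - ", 3) != 0)
		return 0;
	memcpy(name, id, n);
	name[n] = '\0';
	return 1;
}
```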

--
igor

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [PATCH 4/6] Protectable Memory
@ 2018-02-12 13:41             ` Igor Stoppa
  0 siblings, 0 replies; 146+ messages in thread
From: Igor Stoppa @ 2018-02-12 13:41 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: willy, rdunlap, corbet, keescook, mhocko, labbott, jglisse, hch,
	cl, linux-security-module, linux-mm, linux-kernel,
	kernel-hardening



On 12/02/18 14:53, Mike Rapoport wrote:
> 'scripts/kernel-doc -v -none 

That has a quite interesting behavior.

I run it on genalloc.c while I am in the process of adding the brackets
to the function names in the kernel-doc description.

The brackets confuse the script and it fails to output the name of the
function in the log:

lib/genalloc.c:123: info: Scanning doc for get_bitmap_entry
lib/genalloc.c:139: info: Scanning doc for
lib/genalloc.c:152: info: Scanning doc for
lib/genalloc.c:164: info: Scanning doc for


The first function does not have the brackets.
The others do. So what should I do with the missing brackets?
Add them, according to the kernel docs, or leave them out?

I'd lean toward adding them.

--
igor

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [PATCH 4/6] Protectable Memory
  2018-02-12 13:41             ` Igor Stoppa
  (?)
@ 2018-02-12 15:31               ` Mike Rapoport
  -1 siblings, 0 replies; 146+ messages in thread
From: Mike Rapoport @ 2018-02-12 15:31 UTC (permalink / raw)
  To: Igor Stoppa
  Cc: willy, rdunlap, corbet, keescook, mhocko, labbott, jglisse, hch,
	cl, linux-security-module, linux-mm, linux-kernel,
	kernel-hardening

On Mon, Feb 12, 2018 at 03:41:57PM +0200, Igor Stoppa wrote:
> 
> 
> On 12/02/18 14:53, Mike Rapoport wrote:
> > 'scripts/kernel-doc -v -none 
> 
> That has a quite interesting behavior.
> 
> I run it on genalloc.c while I am in the process of adding the brackets
> to the function names in the kernel-doc description.
> 
> The brackets confuse the script and it fails to output the name of the
> function in the log:
> 
> lib/genalloc.c:123: info: Scanning doc for get_bitmap_entry
> lib/genalloc.c:139: info: Scanning doc for
> lib/genalloc.c:152: info: Scanning doc for
> lib/genalloc.c:164: info: Scanning doc for

 
> 
> The first function does not have the brackets.
> The others do. So what should I do with the missing brackets?
> Add them, according to the kernel docs, or leave them out?

Seems that kernel-doc does not consider () as a valid match for the
identifier :)
 
Can you please check with the below patch?

> I'd lean toward adding them.
> 
> --
> igor
 
-- 
Sincerely yours,
Mike.

>From 35255bc2d7d2a63be4f78a7bf4eec83ab0dc4f3f Mon Sep 17 00:00:00 2001
From: Mike Rapoport <rppt@linux.vnet.ibm.com>
Date: Mon, 12 Feb 2018 17:19:04 +0200
Subject: [PATCH] scripts: kernel_doc: fixup reporting of function identifiers

When a function description includes brackets after the function name, as
suggested by Documentation/doc-guide/kernel-doc, the kernel-doc script
omits the function name from the "Scanning doc for" report.
Extending the identifier match with optional brackets fixes this
issue.

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
---
 scripts/kernel-doc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/kernel-doc b/scripts/kernel-doc
index fee8952037b1..a6a9a8ef116c 100755
--- a/scripts/kernel-doc
+++ b/scripts/kernel-doc
@@ -1873,7 +1873,7 @@ sub process_file($) {
 	    }
 	    elsif (/$doc_decl/o) {
 		$identifier = $1;
-		if (/\s*([\w\s]+?)\s*-/) {
+		if (/\s*([\w\s]+?)(\(\))?\s*-/) {
 		    $identifier = $1;
 		}
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 146+ messages in thread
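The effect of the one-line regex change can be illustrated outside Perl. The following is a simplified, hypothetical C re-implementation of the identifier extraction. It is not part of scripts/kernel-doc, kdoc_identifier() is an invented name, and the sample descriptions are made up.

With the unpatched expression /\s*([\w\s]+?)\s*-/, a line such as " * gen_pool_best_fit() - ..." cannot match at the name, because "(" is neither a word character, whitespace, nor "-". The regex engine then falls back to a later starting position and captures only the blank before the "-". That would explain the empty names in the log above. Accepting an optional "()" after the name restores the match.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of scripts/kernel-doc: extract the
 * identifier from a kernel-doc summary line such as
 * " * gen_pool_best_fit() - ...". Simplified on purpose: only a single
 * word is accepted as the name, whereas the Perl class [\w\s] also
 * admits spaces.
 *
 * Returns 0 and copies the name into @out on success, -1 otherwise.
 */
static int kdoc_identifier(const char *line, char *out, size_t outlen)
{
	const char *p, *start;
	size_t len;

	assert(line != NULL && out != NULL);

	/* Skip the comment decoration: leading blanks and the '*'. */
	p = line;
	while (*p == ' ' || *p == '\t' || *p == '*')
		p++;

	/* The identifier itself: a run of [A-Za-z0-9_]. */
	start = p;
	while (isalnum((unsigned char)*p) || *p == '_')
		p++;
	len = (size_t)(p - start);
	if (len == 0 || len >= outlen)
		return -1;

	/* The optional "()" that the patched regex accepts. */
	if (p[0] == '(' && p[1] == ')')
		p += 2;

	/* Optional blanks, then the mandatory '-' separator. */
	while (*p == ' ' || *p == '\t')
		p++;
	if (*p != '-')
		return -1;

	memcpy(out, start, len);
	out[len] = '\0';
	return 0;
}
```

Both " * get_bitmap_entry - ..." and " * gen_pool_best_fit() - ..." yield the bare function name. A line with no "-" separator returns -1.

A hand-written scanner is used here instead of POSIX regcomp(). POSIX extended regular expressions have no lazy quantifier like Perl's +?.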

* Re: [PATCH 4/6] Protectable Memory
  2018-02-12 15:31               ` Mike Rapoport
  (?)
  (?)
@ 2018-02-12 15:41                 ` Igor Stoppa
  -1 siblings, 0 replies; 146+ messages in thread
From: Igor Stoppa @ 2018-02-12 15:41 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: willy, rdunlap, corbet, keescook, mhocko, labbott, jglisse, hch,
	cl, linux-security-module, linux-mm, linux-kernel,
	kernel-hardening

On 12/02/18 17:31, Mike Rapoport wrote:

[...]

> Seems that kernel-doc does not consider () as a valid match for the
> identifier :)
>  
> Can you please check with the below patch?

Yes, it works now, thank you!

--
igor

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [PATCH 3/6] struct page: add field for vm_struct
  2018-02-11 21:16     ` Matthew Wilcox
  (?)
  (?)
@ 2018-02-12 16:24       ` Igor Stoppa
  -1 siblings, 0 replies; 146+ messages in thread
From: Igor Stoppa @ 2018-02-12 16:24 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: rdunlap, corbet, keescook, mhocko, labbott, jglisse, hch, cl,
	linux-security-module, linux-mm, linux-kernel, kernel-hardening



On 11/02/18 23:16, Matthew Wilcox wrote:
> On Sun, Feb 11, 2018 at 05:19:17AM +0200, Igor Stoppa wrote:
>> The struct page has a "mapping" field, which can be re-used, to store a
>> pointer to the parent area. This will avoid more expensive searches.
>>
>> As example, the function find_vm_area is reimplemented, to take advantage
>> of the newly introduced field.
> 
> Umm.  Is it more efficient?  You're replacing an rb-tree search with a
> page-table walk.  You eliminate a spinlock, which is great, but is the
> page-table walk more efficient?  I suppose it'll depend on the depth of
> the rb-tree, and (at least on x86), the page tables should already be
> in cache.

I thought the trade-off was favorable. How can I verify it?

> Unrelated to this patch, I'm working on a patch to give us page_type,
> and I think I'll allocate a bit to mark pages which are vmalloced.

pmalloced too?

--
igor

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [PATCH 1/6] genalloc: track beginning of allocations
  2018-02-11  3:19   ` Igor Stoppa
  (?)
@ 2018-02-13  0:43     ` kbuild test robot
  -1 siblings, 0 replies; 146+ messages in thread
From: kbuild test robot @ 2018-02-13  0:43 UTC (permalink / raw)
  To: Igor Stoppa
  Cc: kbuild-all, willy, rdunlap, corbet, keescook, mhocko, labbott,
	jglisse, hch, cl, linux-security-module, linux-mm, linux-kernel,
	kernel-hardening, Igor Stoppa

[-- Attachment #1: Type: text/plain, Size: 1286 bytes --]

Hi Igor,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.16-rc1 next-20180212]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Igor-Stoppa/genalloc-track-beginning-of-allocations/20180212-192839
config: openrisc-allmodconfig (attached as .config)
compiler: or1k-linux-gcc (GCC) 6.0.0 20160327 (experimental)
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=openrisc 

All errors (new ones prefixed by >>):

   WARNING: modpost: missing MODULE_LICENSE() in drivers/media/platform/mtk-vcodec/mtk-vcodec-common.o
   see include/linux/module.h for more information
   WARNING: modpost: missing MODULE_LICENSE() in drivers/media/platform/tegra-cec/tegra_cec.o
   see include/linux/module.h for more information
>> ERROR: "gen_pool_best_fit" [drivers/tee/tee.ko] undefined!

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 44715 bytes --]

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [PATCH 5/6] Pmalloc: self-test
  2018-02-11  3:19   ` Igor Stoppa
  (?)
@ 2018-02-13  2:43     ` kbuild test robot
  -1 siblings, 0 replies; 146+ messages in thread
From: kbuild test robot @ 2018-02-13  2:43 UTC (permalink / raw)
  To: Igor Stoppa
  Cc: kbuild-all, willy, rdunlap, corbet, keescook, mhocko, labbott,
	jglisse, hch, cl, linux-security-module, linux-mm, linux-kernel,
	kernel-hardening, Igor Stoppa

[-- Attachment #1: Type: text/plain, Size: 1246 bytes --]

Hi Igor,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.16-rc1 next-20180212]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Igor-Stoppa/genalloc-track-beginning-of-allocations/20180212-192839
config: arm-allnoconfig (attached as .config)
compiler: arm-linux-gnueabi-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=arm 

All errors (new ones prefixed by >>):

   mm/pmalloc.o: In function `pmalloc_prealloc':
>> pmalloc.c:(.text+0x268): undefined reference to `vfree_atomic'
   mm/pmalloc.o: In function `pmalloc':
   pmalloc.c:(.text+0x2ac): undefined reference to `vfree_atomic'
   mm/pmalloc.o: In function `pmalloc_chunk_free':
   pmalloc.c:(.text+0x86): undefined reference to `vfree_atomic'

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 6161 bytes --]

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [PATCH 3/6] struct page: add field for vm_struct
  2018-02-12 16:24       ` Igor Stoppa
  (?)
  (?)
@ 2018-02-20 19:53         ` Igor Stoppa
  -1 siblings, 0 replies; 146+ messages in thread
From: Igor Stoppa @ 2018-02-20 19:53 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: rdunlap, corbet, keescook, mhocko, labbott, jglisse, hch, cl,
	linux-security-module, linux-mm, linux-kernel, kernel-hardening



On 12/02/18 18:24, Igor Stoppa wrote:
> 
> 
> On 11/02/18 23:16, Matthew Wilcox wrote:
>> On Sun, Feb 11, 2018 at 05:19:17AM +0200, Igor Stoppa wrote:
>>> The struct page has a "mapping" field, which can be re-used, to store a
>>> pointer to the parent area. This will avoid more expensive searches.
>>>
>>> As example, the function find_vm_area is reimplemented, to take advantage
>>> of the newly introduced field.
>>
>> Umm.  Is it more efficient?  You're replacing an rb-tree search with a
>> page-table walk.  You eliminate a spinlock, which is great, but is the
>> page-table walk more efficient?  I suppose it'll depend on the depth of
>> the rb-tree, and (at least on x86), the page tables should already be
>> in cache.
> 
> I thought the tradeoff favorable.

It turns out that it's probably not so favorable.
The patch relies on the function vmalloc_to_page(), which returns
NULL when applied to huge mappings, while the original implementation
still works.

This was found while testing on a configuration with a framebuffer.

So it seems unlikely that there is any gain to be had from this
perspective.

The use of the field still makes sense from the perspective of adding
pmalloc support to hardened usercopy, but there is no longer any need
for the field to be introduced in a separate patch.

This patch can be simplified and merged with the pmalloc patch.

--
igor

^ permalink raw reply	[flat|nested] 146+ messages in thread
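The problem described above can be reproduced in a small userspace model. This is a simplified, hypothetical sketch, not kernel code. All names (model_area, model_page, model_pgtable, find_by_search(), find_by_page()) are invented.

Two substitutions stand in for the real kernel mechanisms:
- A binary search over a sorted array stands in for the vmap rb-tree walked by the original find_vm_area().
- A NULL slot in a flat array stands in for vmalloc_to_page() returning NULL on a huge mapping.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SHIFT 12
#define MODEL_NR_PAGES   16

/* Hypothetical stand-in for struct vm_struct: a virtual address range. */
struct model_area {
	unsigned long start;
	unsigned long size;
};

/*
 * Hypothetical stand-in for struct page, carrying the back-pointer that
 * the patch stored in the reused "mapping" field.
 */
struct model_page {
	struct model_area *area;
};

/*
 * Toy "page table": one slot per small page. A NULL slot models an
 * address with no small-page descriptor, as with a huge mapping.
 */
static struct model_page *model_pgtable[MODEL_NR_PAGES];

/* Range search over areas sorted by start address (rb-tree stand-in). */
static struct model_area *find_by_search(struct model_area *areas, size_t n,
					 unsigned long addr)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (addr < areas[mid].start)
			hi = mid;
		else if (addr >= areas[mid].start + areas[mid].size)
			lo = mid + 1;
		else
			return &areas[mid];
	}
	return NULL;
}

/*
 * Back-pointer lookup: translate the address to its page descriptor
 * (vmalloc_to_page() stand-in) and follow the stored pointer.
 */
static struct model_area *find_by_page(unsigned long addr)
{
	unsigned long idx = addr >> MODEL_PAGE_SHIFT;
	struct model_page *page;

	assert(idx < MODEL_NR_PAGES);
	page = model_pgtable[idx];
	return page ? page->area : NULL;
}
```

Consider one area backed by small pages and one modelled as a huge mapping, with no per-page descriptors. find_by_search() resolves addresses in both areas. find_by_page() returns NULL for any address inside the huge one, which is the failure mode reported above.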

* Re: [PATCH 3/6] struct page: add field for vm_struct
@ 2018-02-20 19:53         ` Igor Stoppa
  0 siblings, 0 replies; 146+ messages in thread
From: Igor Stoppa @ 2018-02-20 19:53 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: rdunlap, corbet, keescook, mhocko, labbott, jglisse, hch, cl,
	linux-security-module, linux-mm, linux-kernel, kernel-hardening



On 12/02/18 18:24, Igor Stoppa wrote:
> 
> 
> On 11/02/18 23:16, Matthew Wilcox wrote:
>> On Sun, Feb 11, 2018 at 05:19:17AM +0200, Igor Stoppa wrote:
>>> The struct page has a "mapping" field, which can be re-used, to store a
>>> pointer to the parent area. This will avoid more expensive searches.
>>>
>>> As example, the function find_vm_area is reimplemented, to take advantage
>>> of the newly introduced field.
>>
>> Umm.  Is it more efficient?  You're replacing an rb-tree search with a
>> page-table walk.  You eliminate a spinlock, which is great, but is the
>> page-table walk more efficient?  I suppose it'll depend on the depth of
>> the rb-tree, and (at least on x86), the page tables should already be
>> in cache.
> 
> I thought the tradeoff favorable.

It turns out that it's probably not so favorable.
The patch relies on the function vmalloc_to_page ... which will return
NULL when applied to huge mappings, while the original implementation
will still work.

It was found while testing on a configuration with framebuffer.

So it seems unlikely that there is any gain to be had, from this
perspective.

The use of the field still makes sense from the perspective of adding
pmalloc support to hardened usercopy, but there is no more need for the
field to exist as separate patch.

This patch can be simplified and merged with the pmalloc patch.

--
igor

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [PATCH 3/6] struct page: add field for vm_struct
  2018-02-20 19:53         ` Igor Stoppa
  (?)
@ 2018-02-20 20:54           ` Matthew Wilcox
  -1 siblings, 0 replies; 146+ messages in thread
From: Matthew Wilcox @ 2018-02-20 20:54 UTC (permalink / raw)
  To: Igor Stoppa
  Cc: rdunlap, corbet, keescook, mhocko, labbott, jglisse, hch, cl,
	linux-security-module, linux-mm, linux-kernel, kernel-hardening

On Tue, Feb 20, 2018 at 09:53:30PM +0200, Igor Stoppa wrote:
> The patch relies on the function vmalloc_to_page ... which will return
> NULL when applied to huge mappings, while the original implementation
> will still work.

Huh?  vmalloc_to_page() should work for huge mappings...

> It was found while testing on a configuration with framebuffer.

... ah.  You tried to use vmalloc_to_page() on something which wasn't
backed by a struct page.  That's *supposed* to return NULL, but my
guess is that after this patch it returned garbage.
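
A user-space sketch of the lookup pattern under discussion may help. A
per-page back-pointer is only usable when the address is actually backed
by a page descriptor, so the lookup has to check for NULL and fall back
to the range search (an rb-tree in find_vm_area; a binary search over a
sorted array here). Everything below (toy_area, toy_page, toy_find_area
and the fixed layout) is hypothetical and only models the idea; it is
not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for vm_struct and struct page. */
struct toy_area {
	uintptr_t start;	/* first address of the area */
	size_t size;		/* size in bytes */
};

struct toy_page {
	struct toy_area *area;	/* back-pointer, like the proposed new field */
};

#define TOY_PAGE_SHIFT	12
#define TOY_NPAGES	8

/* Sorted, non-overlapping areas: the second one has no page descriptors. */
static struct toy_area toy_areas[] = {
	{ 0x0000, 0x2000 },	/* pages 0-1, backed by toy_page */
	{ 0x4000, 0x2000 },	/* pages 4-5, e.g. a framebuffer mapping */
};

static struct toy_page toy_page_objs[] = {
	{ &toy_areas[0] }, { &toy_areas[0] },
};

/* Address-to-page table; NULL means "not backed by a page descriptor". */
static struct toy_page *toy_pages[TOY_NPAGES] = {
	&toy_page_objs[0], &toy_page_objs[1],
};

/* Analogue of vmalloc_to_page(): may legitimately return NULL. */
static struct toy_page *toy_addr_to_page(uintptr_t addr)
{
	uintptr_t idx = addr >> TOY_PAGE_SHIFT;

	return idx < TOY_NPAGES ? toy_pages[idx] : NULL;
}

/* Slow path: binary search over the sorted areas (the rb-tree's role). */
static struct toy_area *toy_search_areas(uintptr_t addr)
{
	size_t lo = 0, hi = sizeof(toy_areas) / sizeof(toy_areas[0]);

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		struct toy_area *a = &toy_areas[mid];

		if (addr < a->start)
			hi = mid;
		else if (addr - a->start >= a->size)
			lo = mid + 1;
		else
			return a;
	}
	return NULL;
}

/* Use the back-pointer only when a page exists; never dereference NULL. */
static struct toy_area *toy_find_area(uintptr_t addr)
{
	struct toy_page *page = toy_addr_to_page(addr);

	if (page && page->area)
		return page->area;
	return toy_search_areas(addr);
}
```

In this model, toy_find_area(0x4800) still finds the second area through
the fallback, while an address outside every area (0x3000) yields NULL;
dereferencing the NULL page instead of falling back would be the bug.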

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [PATCH 3/6] struct page: add field for vm_struct
  2018-02-20 20:54           ` Matthew Wilcox
  (?)
  (?)
@ 2018-02-21 12:01             ` Igor Stoppa
  -1 siblings, 0 replies; 146+ messages in thread
From: Igor Stoppa @ 2018-02-21 12:01 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: rdunlap, corbet, keescook, mhocko, labbott, jglisse, hch, cl,
	linux-security-module, linux-mm, linux-kernel, kernel-hardening



On 20/02/18 22:54, Matthew Wilcox wrote:
> On Tue, Feb 20, 2018 at 09:53:30PM +0200, Igor Stoppa wrote:

[...]

>> It was found while testing on a configuration with framebuffer.
> 
> ... ah.  You tried to use vmalloc_to_page() on something which wasn't
> backed by a struct page.  That's *supposed* to return NULL, but my
> guess is that after this patch it returned garbage.

it seems to return garbage also without this patch, but I need to clean
up the code, try it again and possibly come up with a demo patch for
triggering the problem.

I'll investigate it more. However it doesn't seem to be related to the
functionality I need. So I plan to treat it as separate issue and leave
find_vm_area untouched, at least in pmalloc scope.

--
igor

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [PATCH 3/6] struct page: add field for vm_struct
  2018-02-21 12:01             ` Igor Stoppa
  (?)
  (?)
@ 2018-02-22 14:20               ` Igor Stoppa
  -1 siblings, 0 replies; 146+ messages in thread
From: Igor Stoppa @ 2018-02-22 14:20 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: rdunlap, corbet, keescook, mhocko, labbott, jglisse, hch, cl,
	linux-security-module, linux-mm, linux-kernel, kernel-hardening

On 21/02/18 14:01, Igor Stoppa wrote:

> it seems to return garbage also without this patch, but I need to clean
> up the code, try it again and possibly come up with a demo patch for
> triggering the problem.
> 
> I'll investigate it more. However it doesn't seem to be related to the
> functionality I need. So I plan to treat it as separate issue and leave
> find_vm_area untouched, at least in pmalloc scope.


Follow-up:

https://lkml.org/lkml/2018/2/22/427

--
igor

^ permalink raw reply	[flat|nested] 146+ messages in thread

* RE: [PATCH 4/6] Protectable Memory
  2018-02-13  0:40                       ` Laura Abbott
@ 2018-02-13 15:20                         ` Igor Stoppa
  0 siblings, 0 replies; 146+ messages in thread
From: Igor Stoppa @ 2018-02-13 15:20 UTC (permalink / raw)
  To: Laura Abbott, Kees Cook
  Cc: Boris Lukashev, Christopher Lameter, Matthew Wilcox, Jann Horn,
	Jerome Glisse, Michal Hocko, Christoph Hellwig,
	linux-security-module, Linux-MM, kernel list, Kernel Hardening

Hi,
apologies for (probably) breaking email etiquette, but I'm travelling and only have the corporate mail client available.
I'll reply more extensively to all the comments I got next week, when I'm back in the office.

In the meantime, I would like to point out that I had already addressed this in a past thread, but got no reply.

To recap:
1) vmalloced memory is harder to attack than kmalloced memory, because it also requires the attacker to figure out the physical address. Currently it's sufficient to identify the randomized base address and the offset in memory of the victim.
I have not seen comments about this statement I made. Is it incorrect?
2) This patchset is about protecting something that right now is not protected at all. That should be the starting point for comparison. If it were possible to have a separate section, like const or __ro_after_init, the situation would be different, but I was told that it's not possible. Furthermore, it would require reserving a fixed-size "zone", I think.
3) What is the attack we want to make harder to perform? Even const data can be attacked, if we assume that the attacker can alter page mappings. In reality, the only safe way would be to have one-way-only protection, but we do not have it. Why are alterations of page properties not considered a risk, while the physmap is?
And how would it be easier (I suppose) to attack the latter?
I'm all for hardening what is possible, but I feel I do not fully understand some of the assumptions being made here.
Getting some answers to my questions above might help me see the point being made.

--
thanks, igor



--------------------------------------------------
Igor Stoppa
E: igor.stoppa@huawei.com
2012 Laboratories-Helsinki Research Center
From: Laura Abbott
To: Kees Cook, Igor Stoppa
Cc: Boris Lukashev, Christopher Lameter, Matthew Wilcox, Jann Horn, Jerome Glisse, Michal Hocko, Christoph Hellwig, linux-security-module, Linux-MM, kernel list, Kernel Hardening
Date: 2018-02-13 00:40:54
Subject: Re: [kernel-hardening] [PATCH 4/6] Protectable Memory

On 02/12/2018 03:27 PM, Kees Cook wrote:
> On Sun, Feb 4, 2018 at 7:05 AM, Igor Stoppa <igor.stoppa@huawei.com> wrote:
>> On 04/02/18 00:29, Boris Lukashev wrote:
>>> On Sat, Feb 3, 2018 at 3:32 PM, Igor Stoppa <igor.stoppa@huawei.com> wrote:
>>
>> [...]
>>
>>>> What you are suggesting, if I have understood it correctly, is that,
>>>> when the pool is protected, the addresses already given out, will become
>>>> traps that get resolved through a lookup table that is built based on
>>>> the content of each allocation.
>>>>
>>>> That seems to generate a lot of overhead, not to mention the fact that
>>>> it might not play very well with the MMU.
>>>
>>> That is effectively what i'm suggesting - as a form of protection for
>>> consumers against direct reads of data which may have been corrupted
>>> by some irrelevant means. In the context of pmalloc, it would probably
>>> be a separate type of ro+verified pool
>> ok, that seems more like an extension though.
>>
>> ATM I am having problems gaining traction to get even the basic merged :-)
>>
>> I would consider this as a possibility for future work, unless it is
>> said that it's necessary for pmalloc to be accepted ...
>
> I would agree: let's get basic functionality in first. Both
> verification and the physmap part can be done separately, IMO.

Skipping over physmap leaves a pretty big area of exposure that could
be difficult to solve later. I appreciate this might block basic
functionality but I don't think we should just gloss over it without
at least some idea of what we would do.

Thanks,
Laura
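
Boris's "ro+verified pool" idea, in its simplest reading, means
recording a digest of each allocation when the pool is protected and
re-checking it before the data is trusted. The sketch below is a
hypothetical user-space model of that reading only: toy_verified,
toy_seal and toy_check are made-up names, and 64-bit FNV-1a is a
stand-in for whatever digest a real design would use. It detects
accidental corruption, not an attacker who can also rewrite the table
of digests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record kept for each allocation of a "ro+verified" pool. */
struct toy_verified {
	const unsigned char *ptr;	/* start of the allocation */
	size_t len;			/* its size in bytes */
	uint64_t sum;			/* digest recorded when the pool was sealed */
};

/* 64-bit FNV-1a: cheap and non-cryptographic. */
static uint64_t toy_fnv1a(const unsigned char *p, size_t len)
{
	uint64_t h = 14695981039346656037ULL;	/* FNV offset basis */

	while (len--) {
		h ^= *p++;
		h *= 1099511628211ULL;		/* FNV prime */
	}
	return h;
}

/* At protection time: record a digest of every allocation. */
static void toy_seal(struct toy_verified *v, size_t n)
{
	for (size_t i = 0; i < n; i++)
		v[i].sum = toy_fnv1a(v[i].ptr, v[i].len);
}

/* Before trusting the data: recompute the digest and compare. */
static bool toy_check(const struct toy_verified *v)
{
	return toy_fnv1a(v->ptr, v->len) == v->sum;
}
```

Flipping any single byte of a sealed allocation makes toy_check() fail,
because each FNV-1a step (XOR, then multiplication by an odd constant
modulo 2^64) is a bijection on the running hash. The cost is a full
re-hash on every check, which is the kind of overhead Igor raises above.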

^ permalink raw reply	[flat|nested] 146+ messages in thread

* [PATCH 4/6] Protectable Memory
  2018-02-12 16:52 [RFC PATCH v16 0/6] mm: security: ro protection for dynamic data Igor Stoppa
  2018-02-12 16:52   ` Igor Stoppa
  (?)
@ 2018-02-12 16:52   ` Igor Stoppa
  0 siblings, 0 replies; 146+ messages in thread
From: Igor Stoppa @ 2018-02-12 16:52 UTC (permalink / raw)
  To: willy, rdunlap, corbet, keescook, mhocko, labbott, jglisse, hch
  Cc: cl, linux-security-module, linux-mm, linux-kernel,
	kernel-hardening, Igor Stoppa

The MMU available in many systems running Linux can often provide R/O
protection to the memory pages it handles.

However, MMU-based protection works efficiently only when said pages
contain exclusively data that will not need further modifications.

Statically allocated variables can be segregated into a dedicated
section, but this approach does not fit dynamically allocated ones.

Currently, dynamic allocation provides no means of grouping variables
into memory pages that contain only data suitable for conversion to
read-only access mode.

The allocator provided here (pmalloc, the protectable memory allocator)
introduces the concept of pools of protectable memory.

A module can request a pool and then direct any allocation request to
the pool handle it has received.

Once all the chunks of memory associated with a specific pool are
initialized, the pool can be protected.

After this point, the pool can only be destroyed (it is up to the module
to avoid any further references to the memory from the pool after
the destruction is invoked).

Destruction is mainly meant for releasing memory when a module is
unloaded.

A module can have as many pools as needed, for example to support the
protection of data that is initialized in sufficiently distinct phases.
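
The pool life cycle described above (allocate while writable, protect
once, then only free or destroy) can be approximated in user space with
mmap() and mprotect(). This is a hypothetical sketch for illustration,
not the pmalloc API: toy_pool and its functions are invented names, the
allocator is a trivial bump allocator over a single mapping, and, unlike
pmalloc, the pool cannot grow by adding chunks.

```c
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS with glibc */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical user-space analogue of a protectable pool. */
struct toy_pool {
	unsigned char *base;	/* start of the mmap()ed region */
	size_t size;		/* region size, a multiple of the page size */
	size_t used;		/* bump-allocation offset */
	bool is_protected;	/* set once, never cleared by this API */
};

static bool toy_pool_create(struct toy_pool *pool, size_t pages)
{
	long psz = sysconf(_SC_PAGESIZE);
	void *p;

	if (psz <= 0 || pages == 0)
		return false;
	pool->size = pages * (size_t)psz;
	p = mmap(NULL, pool->size, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return false;
	pool->base = p;
	pool->used = 0;
	pool->is_protected = false;
	return true;
}

/* 8-byte aligned bump allocation; refused once the pool is protected. */
static void *toy_pool_alloc(struct toy_pool *pool, size_t n)
{
	size_t aligned;
	void *p;

	/* Check n first so that rounding it up cannot overflow. */
	if (pool->is_protected || n == 0 || n > pool->size - pool->used)
		return NULL;
	aligned = (n + 7) & ~(size_t)7;
	if (aligned > pool->size - pool->used)
		return NULL;
	p = pool->base + pool->used;
	pool->used += aligned;
	return p;
}

/* The toy API offers no way back: the whole mapping becomes read-only. */
static int toy_pool_protect(struct toy_pool *pool)
{
	if (mprotect(pool->base, pool->size, PROT_READ) != 0)
		return -1;
	pool->is_protected = true;
	return 0;
}

/* Releases everything; callers must have dropped their pointers already. */
static void toy_pool_destroy(struct toy_pool *pool)
{
	munmap(pool->base, pool->size);
	pool->base = NULL;
	pool->size = 0;
	pool->used = 0;
}
```

Writing through a pointer from the pool after toy_pool_protect() would
raise SIGSEGV, which mirrors the write fault that protection is meant to
cause in the kernel; reads keep working, and further allocations fail.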

Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com>
---
 include/linux/genalloc.h |   3 +
 include/linux/pmalloc.h  | 242 +++++++++++++++++++++++
 include/linux/vmalloc.h  |   1 +
 lib/genalloc.c           |  27 +++
 mm/Kconfig               |   6 +
 mm/Makefile              |   1 +
 mm/pmalloc.c             | 499 +++++++++++++++++++++++++++++++++++++++++++++++
 mm/usercopy.c            |  33 ++++
 8 files changed, 812 insertions(+)
 create mode 100644 include/linux/pmalloc.h
 create mode 100644 mm/pmalloc.c

diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
index dcaa33e74b1c..b6c4cea9fbd8 100644
--- a/include/linux/genalloc.h
+++ b/include/linux/genalloc.h
@@ -121,6 +121,9 @@ extern unsigned long gen_pool_alloc_algo(struct gen_pool *, size_t,
 extern void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size,
 		dma_addr_t *dma);
 extern void gen_pool_free(struct gen_pool *, unsigned long, size_t);
+
+extern void gen_pool_flush_chunk(struct gen_pool *pool,
+				 struct gen_pool_chunk *chunk);
 extern void gen_pool_for_each_chunk(struct gen_pool *,
 	void (*)(struct gen_pool *, struct gen_pool_chunk *, void *), void *);
 extern size_t gen_pool_avail(struct gen_pool *);
diff --git a/include/linux/pmalloc.h b/include/linux/pmalloc.h
new file mode 100644
index 000000000000..afc2068d5545
--- /dev/null
+++ b/include/linux/pmalloc.h
@@ -0,0 +1,242 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * pmalloc.h: Header for Protectable Memory Allocator
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.stoppa@huawei.com>
+ */
+
+#ifndef _LINUX_PMALLOC_H
+#define _LINUX_PMALLOC_H
+
+
+#include <linux/genalloc.h>
+#include <linux/string.h>
+
+#define PMALLOC_DEFAULT_ALLOC_ORDER (-1)
+
+/*
+ * Library for dynamic allocation of pools of memory that can be,
+ * after initialization, marked as read-only.
+ *
+ * This is intended to complement __ro_after_init, for those cases
+ * where either it is not possible to know the initialization value before
+ * init is completed, or the amount of data is variable and can be
+ * determined only at run-time.
+ *
+ * ***WARNING***
+ * The user of the API is expected to synchronize:
+ * 1) allocation,
+ * 2) writes to the allocated memory,
+ * 3) write protection of the pool,
+ * 4) freeing of the allocated memory, and
+ * 5) destruction of the pool.
+ *
+ * For a non-threaded scenario, this type of locking is not even required.
+ *
+ * Even if the library were to provide support for locking, point 2)
+ * would still depend on the user taking the lock.
+ */
+
+
+/**
+ * pmalloc_create_pool() - create a new protectable memory pool
+ * @name: the name of the pool, enforced to be unique
+ * @min_alloc_order: log2 of the minimum allocation size obtainable
+ *                   from the pool
+ *
+ * Creates a new (empty) memory pool for allocation of protectable
+ * memory. Memory will be allocated upon request (through pmalloc).
+ *
+ * Return:
+ * * pointer to the new pool	- success
+ * * NULL			- error
+ */
+struct gen_pool *pmalloc_create_pool(const char *name,
+					 int min_alloc_order);
+
+/**
+ * is_pmalloc_object() - validates the existence of an alleged object
+ * @ptr: address of the object
+ * @n: size of the object, in bytes
+ *
+ * Return:
+ * * 0		- the object does not belong to pmalloc
+ * * 1		- the object belongs to pmalloc
+ * * \-1	- the object overlaps pmalloc memory incorrectly
+ */
+int is_pmalloc_object(const void *ptr, const unsigned long n);
+
+/**
+ * pmalloc_prealloc() - tries to allocate a memory chunk of the requested size
+ * @pool: handle to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ *
+ * Prepares a chunk of the requested size.
+ * This is intended to both minimize latency in later memory requests and
+ * avoid sleeping during allocation.
+ * Memory allocated with prealloc is stored in one single chunk, as
+ * opposed to what is allocated on-demand when pmalloc runs out of free
+ * space already existing in the pool and has to invoke vmalloc.
+ * One additional advantage of pre-allocating larger chunks of memory is
+ * that the total slack tends to be smaller.
+ *
+ * Return:
+ * * true	- the vmalloc call was successful
+ * * false	- error
+ */
+bool pmalloc_prealloc(struct gen_pool *pool, size_t size);
+
+/**
+ * pmalloc() - allocate protectable memory from a pool
+ * @pool: handle to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ * @gfp: flags for page allocation
+ *
+ * Allocates memory from an unprotected pool. If the pool doesn't have
+ * enough memory, and the request did not include GFP_ATOMIC, an attempt
+ * is made to add a new chunk of memory to the pool
+ * (a multiple of PAGE_SIZE), in order to fit the new request.
+ * Otherwise, NULL is returned.
+ *
+ * Return:
+ * * pointer to the memory requested	- success
+ * * NULL				- either no memory available or
+ *					  pool already read-only
+ */
+void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp);
+
+
+/**
+ * pzalloc() - zero-initialized version of pmalloc
+ * @pool: handle to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ * @gfp: flags for page allocation
+ *
+ * Executes pmalloc, initializing the memory requested to 0,
+ * before returning the pointer to it.
+ *
+ * Return:
+ * * pointer to the memory requested	- success
+ * * NULL				- either no memory available or
+ *					  pool already read-only
+ */
+static inline void *pzalloc(struct gen_pool *pool, size_t size, gfp_t gfp)
+{
+	return pmalloc(pool, size, gfp | __GFP_ZERO);
+}
+
+/**
+ * pmalloc_array() - allocates an array according to the parameters
+ * @pool: handle to the pool to be used for memory allocation
+ * @n: number of elements in the array
+ * @size: amount of memory (in bytes) requested for each element
+ * @flags: flags for page allocation
+ *
+ * Executes pmalloc, if it has a chance to succeed.
+ *
+ * Return:
+ * * the pmalloc result	- success
+ * * NULL		- error
+ */
+static inline void *pmalloc_array(struct gen_pool *pool, size_t n,
+				  size_t size, gfp_t flags)
+{
+	if (unlikely(!(pool && n && size) || n > SIZE_MAX / size))
+		return NULL;
+	return pmalloc(pool, n * size, flags);
+}
+
+/**
+ * pcalloc() - allocates a 0-initialized array according to the parameters
+ * @pool: handle to the pool to be used for memory allocation
+ * @n: number of elements in the array
+ * @size: amount of memory (in bytes) requested for each element
+ * @flags: flags for page allocation
+ *
+ * Executes pmalloc_array, if it has a chance to succeed.
+ *
+ * Return:
+ * * the pmalloc result	- success
+ * * NULL		- error
+ */
+static inline void *pcalloc(struct gen_pool *pool, size_t n,
+			    size_t size, gfp_t flags)
+{
+	return pmalloc_array(pool, n, size, flags | __GFP_ZERO);
+}
+
+/**
+ * pstrdup() - duplicate a string, using pmalloc as allocator
+ * @pool: handle to the pool to be used for memory allocation
+ * @s: string to duplicate
+ * @gfp: flags for page allocation
+ *
+ * Generates a copy of the given string, allocating sufficient memory
+ * from the given pmalloc pool.
+ *
+ * Return:
+ * * pointer to the replica	- success
+ * * NULL			- error
+ */
+static inline char *pstrdup(struct gen_pool *pool, const char *s, gfp_t gfp)
+{
+	size_t len;
+	char *buf;
+
+	if (unlikely(pool == NULL || s == NULL))
+		return NULL;
+
+	len = strlen(s) + 1;
+	buf = pmalloc(pool, len, gfp);
+	if (likely(buf))
+		memcpy(buf, s, len);
+	return buf;
+}
+
+/**
+ * pmalloc_protect_pool() - turn a read/write pool read-only
+ * @pool: the pool to protect
+ *
+ * Write-protects all the memory chunks assigned to the pool.
+ * This prevents any further allocation.
+ *
+ * Return:
+ * * 0		- success
+ * * -EINVAL	- error
+ */
+int pmalloc_protect_pool(struct gen_pool *pool);
+
+/**
+ * pfree() - mark as unused memory that was previously in use
+ * @pool: handle to the pool to be used for memory allocation
+ * @addr: the beginning of the memory area to be freed
+ *
+ * The behavior of pfree is different, depending on the state of the
+ * protection.
+ * If the pool is not yet protected, the memory is marked as unused and
+ * will be available for further allocations.
+ * If the pool is already protected, the memory is marked as unused, but
+ * it will still be impossible to perform further allocation, because of
+ * the existing protection.
+ * The freed memory, in this case, will be truly released only when the
+ * pool is destroyed.
+ */
+static inline void pfree(struct gen_pool *pool, const void *addr)
+{
+	gen_pool_free(pool, (unsigned long)addr, 0);
+}
+
+/**
+ * pmalloc_destroy_pool() - destroys a pool and all the associated memory
+ * @pool: the pool to destroy
+ *
+ * All the memory that was allocated through pmalloc in the pool will be freed.
+ *
+ * Return:
+ * * 0		- success
+ * * -EINVAL	- error
+ */
+int pmalloc_destroy_pool(struct gen_pool *pool);
+
+#endif
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 1e5d8c392f15..116d280cca53 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -20,6 +20,7 @@ struct notifier_block;		/* in notifier.h */
 #define VM_UNINITIALIZED	0x00000020	/* vm_struct is not fully initialized */
 #define VM_NO_GUARD		0x00000040      /* don't add guard page */
 #define VM_KASAN		0x00000080      /* has allocated kasan shadow memory */
+#define VM_PMALLOC		0x00000100	/* pmalloc area - see docs */
 /* bits [20..32] reserved for arch specific ioremap internals */
 
 /*
diff --git a/lib/genalloc.c b/lib/genalloc.c
index 87f62f31b52f..24ed35035095 100644
--- a/lib/genalloc.c
+++ b/lib/genalloc.c
@@ -625,6 +625,33 @@ void gen_pool_free(struct gen_pool *pool, unsigned long addr, size_t size)
 }
 EXPORT_SYMBOL(gen_pool_free);
 
+
+/**
+ * gen_pool_flush_chunk() - drops all the allocations from a specific chunk
+ * @pool:	the generic memory pool
+ * @chunk:	The chunk to wipe clear.
+ *
+ * This is meant to be called only while destroying a pool. It's up to the
+ * caller to avoid races, but really, at this point the pool should have
+ * already been retired and it should have become unavailable for any other
+ * sort of operation.
+ */
+void gen_pool_flush_chunk(struct gen_pool *pool,
+			  struct gen_pool_chunk *chunk)
+{
+	size_t size;
+
+	if (unlikely(!(pool && chunk)))
+		return;
+
+	size = chunk->end_addr + 1 - chunk->start_addr;
+	memset(chunk->entries, 0,
+	       DIV_ROUND_UP((size >> pool->min_alloc_order) * BITS_PER_ENTRY,
+			    BITS_PER_BYTE));
+	atomic_long_set(&chunk->avail, size);
+}
+
+
 /**
  * gen_pool_for_each_chunk() - call func for every chunk of generic memory pool
  * @pool:	the generic memory pool
diff --git a/mm/Kconfig b/mm/Kconfig
index c782e8fb7235..be578fbdce6d 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -760,3 +760,9 @@ config GUP_BENCHMARK
 	  performance of get_user_pages_fast().
 
 	  See tools/testing/selftests/vm/gup_benchmark.c
+
+config PROTECTABLE_MEMORY
+	bool
+	depends on ARCH_HAS_SET_MEMORY
+	select GENERIC_ALLOCATOR
+	default y
diff --git a/mm/Makefile b/mm/Makefile
index e669f02c5a54..959fdbdac118 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -65,6 +65,7 @@ obj-$(CONFIG_SPARSEMEM)	+= sparse.o
 obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o
 obj-$(CONFIG_SLOB) += slob.o
 obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o
+obj-$(CONFIG_PROTECTABLE_MEMORY) += pmalloc.o
 obj-$(CONFIG_KSM) += ksm.o
 obj-$(CONFIG_PAGE_POISONING) += page_poison.o
 obj-$(CONFIG_SLAB) += slab.o
diff --git a/mm/pmalloc.c b/mm/pmalloc.c
new file mode 100644
index 000000000000..abddba90a9f6
--- /dev/null
+++ b/mm/pmalloc.c
@@ -0,0 +1,499 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * pmalloc.c: Protectable Memory Allocator
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.stoppa@huawei.com>
+ */
+
+#include <linux/printk.h>
+#include <linux/init.h>
+#include <linux/mm.h>
+#include <linux/vmalloc.h>
+#include <linux/genalloc.h>
+#include <linux/kernel.h>
+#include <linux/log2.h>
+#include <linux/slab.h>
+#include <linux/device.h>
+#include <linux/atomic.h>
+#include <linux/rculist.h>
+#include <linux/set_memory.h>
+#include <asm/cacheflush.h>
+#include <asm/page.h>
+
+#include <linux/pmalloc.h>
+/*
+ * pmalloc_data contains the data specific to a pmalloc pool,
+ * in a format compatible with the design of genalloc.
+ * Some of the fields are used for exposing the corresponding parameter
+ * to userspace, through sysfs.
+ */
+struct pmalloc_data {
+	struct gen_pool *pool;  /* Link back to the associated pool. */
+	bool protected;     /* Status of the pool: RO or RW. */
+	struct kobj_attribute attr_protected; /* Sysfs attribute. */
+	struct kobj_attribute attr_avail;     /* Sysfs attribute. */
+	struct kobj_attribute attr_size;      /* Sysfs attribute. */
+	struct kobj_attribute attr_chunks;    /* Sysfs attribute. */
+	struct kobject *pool_kobject;
+	struct list_head node; /* list of pools */
+};
+
+static LIST_HEAD(pmalloc_final_list);
+static LIST_HEAD(pmalloc_tmp_list);
+static struct list_head *pmalloc_list = &pmalloc_tmp_list;
+static DEFINE_MUTEX(pmalloc_mutex);
+static struct kobject *pmalloc_kobject;
+
+static ssize_t pmalloc_pool_show_protected(struct kobject *dev,
+					   struct kobj_attribute *attr,
+					   char *buf)
+{
+	struct pmalloc_data *data;
+
+	data = container_of(attr, struct pmalloc_data, attr_protected);
+	if (data->protected)
+		return sprintf(buf, "protected\n");
+	else
+		return sprintf(buf, "unprotected\n");
+}
+
+static ssize_t pmalloc_pool_show_avail(struct kobject *dev,
+				       struct kobj_attribute *attr,
+				       char *buf)
+{
+	struct pmalloc_data *data;
+
+	data = container_of(attr, struct pmalloc_data, attr_avail);
+	return sprintf(buf, "%lu\n",
+		       (unsigned long)gen_pool_avail(data->pool));
+}
+
+static ssize_t pmalloc_pool_show_size(struct kobject *dev,
+				      struct kobj_attribute *attr,
+				      char *buf)
+{
+	struct pmalloc_data *data;
+
+	data = container_of(attr, struct pmalloc_data, attr_size);
+	return sprintf(buf, "%lu\n",
+		       (unsigned long)gen_pool_size(data->pool));
+}
+
+static void pool_chunk_number(struct gen_pool *pool,
+			      struct gen_pool_chunk *chunk, void *data)
+{
+	unsigned long *counter = data;
+
+	(*counter)++;
+}
+
+static ssize_t pmalloc_pool_show_chunks(struct kobject *dev,
+					struct kobj_attribute *attr,
+					char *buf)
+{
+	struct pmalloc_data *data;
+	unsigned long chunks_num = 0;
+
+	data = container_of(attr, struct pmalloc_data, attr_chunks);
+	gen_pool_for_each_chunk(data->pool, pool_chunk_number, &chunks_num);
+	return sprintf(buf, "%lu\n", chunks_num);
+}
+
+/* Exposes the pool and its attributes through sysfs. */
+static struct kobject *pmalloc_connect(struct pmalloc_data *data)
+{
+	const struct attribute *attrs[] = {
+		&data->attr_protected.attr,
+		&data->attr_avail.attr,
+		&data->attr_size.attr,
+		&data->attr_chunks.attr,
+		NULL
+	};
+	struct kobject *kobj;
+
+	kobj = kobject_create_and_add(data->pool->name, pmalloc_kobject);
+	if (unlikely(!kobj))
+		return NULL;
+
+	if (unlikely(sysfs_create_files(kobj, attrs) < 0)) {
+		kobject_put(kobj);
+		kobj = NULL;
+	}
+	return kobj;
+}
+
+/* Removes the pool and its attributes from sysfs. */
+static void pmalloc_disconnect(struct pmalloc_data *data,
+			       struct kobject *kobj)
+{
+	const struct attribute *attrs[] = {
+		&data->attr_protected.attr,
+		&data->attr_avail.attr,
+		&data->attr_size.attr,
+		&data->attr_chunks.attr,
+		NULL
+	};
+
+	sysfs_remove_files(kobj, attrs);
+	kobject_put(kobj);
+}
+
+/* Declares an attribute of the pool. */
+#define pmalloc_attr_init(data, attr_name) \
+do { \
+	sysfs_attr_init(&data->attr_##attr_name.attr); \
+	data->attr_##attr_name.attr.name = #attr_name; \
+	data->attr_##attr_name.attr.mode = VERIFY_OCTAL_PERMISSIONS(0400); \
+	data->attr_##attr_name.show = pmalloc_pool_show_##attr_name; \
+} while (0)
+
+struct gen_pool *pmalloc_create_pool(const char *name, int min_alloc_order)
+{
+	struct gen_pool *pool;
+	const char *pool_name;
+	struct pmalloc_data *data;
+
+	if (!name) {
+		WARN_ON(1);
+		return NULL;
+	}
+
+	if (min_alloc_order < 0)
+		min_alloc_order = ilog2(sizeof(unsigned long));
+
+	pool = gen_pool_create(min_alloc_order, NUMA_NO_NODE);
+	if (unlikely(!pool))
+		return NULL;
+
+	mutex_lock(&pmalloc_mutex);
+	list_for_each_entry(data, pmalloc_list, node)
+		if (!strcmp(name, data->pool->name))
+			goto same_name_err;
+
+	pool_name = kstrdup(name, GFP_KERNEL);
+	if (unlikely(!pool_name))
+		goto name_alloc_err;
+
+	data = kzalloc(sizeof(struct pmalloc_data), GFP_KERNEL);
+	if (unlikely(!data))
+		goto data_alloc_err;
+
+	data->protected = false;
+	data->pool = pool;
+	pmalloc_attr_init(data, protected);
+	pmalloc_attr_init(data, avail);
+	pmalloc_attr_init(data, size);
+	pmalloc_attr_init(data, chunks);
+	pool->data = data;
+	pool->name = pool_name;
+
+	list_add(&data->node, pmalloc_list);
+	if (pmalloc_list == &pmalloc_final_list)
+		data->pool_kobject = pmalloc_connect(data);
+	mutex_unlock(&pmalloc_mutex);
+	return pool;
+
+data_alloc_err:
+	kfree(pool_name);
+name_alloc_err:
+same_name_err:
+	mutex_unlock(&pmalloc_mutex);
+	gen_pool_destroy(pool);
+	return NULL;
+}
+
+static inline int check_alloc_params(struct gen_pool *pool, size_t req_size)
+{
+	struct pmalloc_data *data;
+
+	if (unlikely(!req_size || !pool))
+		return -1;
+
+	data = pool->data;
+
+	if (data == NULL)
+		return -1;
+
+	if (unlikely(data->protected)) {
+		WARN_ON(1);
+		return -1;
+	}
+	return 0;
+}
+
+
+static inline bool chunk_tagging(void *chunk, bool tag)
+{
+	struct vm_struct *area;
+	struct page *page;
+
+	if (!is_vmalloc_addr(chunk))
+		return false;
+
+	page = vmalloc_to_page(chunk);
+	if (unlikely(!page))
+		return false;
+
+	area = page->area;
+	if (tag)
+		area->flags |= VM_PMALLOC;
+	else
+		area->flags &= ~VM_PMALLOC;
+	return true;
+}
+
+
+static inline bool tag_chunk(void *chunk)
+{
+	return chunk_tagging(chunk, true);
+}
+
+
+static inline bool untag_chunk(void *chunk)
+{
+	return chunk_tagging(chunk, false);
+}
+
+enum {
+	INVALID_PMALLOC_OBJECT = -1,
+	NOT_PMALLOC_OBJECT = 0,
+	VALID_PMALLOC_OBJECT = 1,
+};
+
+int is_pmalloc_object(const void *ptr, const unsigned long n)
+{
+	struct vm_struct *area;
+	struct page *page;
+	unsigned long area_start;
+	unsigned long area_end;
+	unsigned long object_start;
+	unsigned long object_end;
+
+
+	/*
+	 * is_pmalloc_object gets called pretty late, so chances are high
+	 * that the object is indeed of vmalloc type
+	 */
+	if (unlikely(!is_vmalloc_addr(ptr)))
+		return NOT_PMALLOC_OBJECT;
+
+	page = vmalloc_to_page(ptr);
+	if (unlikely(!page))
+		return NOT_PMALLOC_OBJECT;
+
+	area = page->area;
+
+	if (likely(!(area->flags & VM_PMALLOC)))
+		return NOT_PMALLOC_OBJECT;
+
+	area_start = (unsigned long)area->addr;
+	area_end = area_start + area->nr_pages * PAGE_SIZE - 1;
+	object_start = (unsigned long)ptr;
+	object_end = object_start + n - 1;
+
+	if (likely((area_start <= object_start) &&
+		   (object_end <= area_end)))
+		return VALID_PMALLOC_OBJECT;
+	else
+		return INVALID_PMALLOC_OBJECT;
+}
+
+
+bool pmalloc_prealloc(struct gen_pool *pool, size_t size)
+{
+	void *chunk;
+	size_t chunk_size;
+	int add_error;
+
+	if (check_alloc_params(pool, size))
+		return false;
+
+	/* Expand pool */
+	chunk_size = roundup(size, PAGE_SIZE);
+	chunk = vmalloc(chunk_size);
+	if (unlikely(chunk == NULL))
+		return false;
+	if (unlikely(!tag_chunk(chunk)))
+		goto free;
+
+	/* Locking is already done inside gen_pool_add */
+	add_error = gen_pool_add(pool, (unsigned long)chunk, chunk_size,
+				 NUMA_NO_NODE);
+	if (unlikely(add_error != 0))
+		goto abort;
+
+	return true;
+abort:
+	untag_chunk(chunk);
+free:
+	vfree_atomic(chunk);
+	return false;
+}
+
+void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp)
+{
+	void *chunk;
+	size_t chunk_size;
+	int add_error;
+	unsigned long retval;
+
+	if (check_alloc_params(pool, size))
+		return NULL;
+
+retry_alloc_from_pool:
+	retval = gen_pool_alloc(pool, size);
+	if (retval)
+		goto return_allocation;
+
+	if (unlikely((gfp & __GFP_ATOMIC))) {
+		if (unlikely((gfp & __GFP_NOFAIL)))
+			goto retry_alloc_from_pool;
+		else
+			return NULL;
+	}
+
+	/* Expand pool */
+	chunk_size = roundup(size, PAGE_SIZE);
+	chunk = vmalloc(chunk_size);
+	if (unlikely(!chunk)) {
+		if (unlikely((gfp & __GFP_NOFAIL)))
+			goto retry_alloc_from_pool;
+		else
+			return NULL;
+	}
+	if (unlikely(!tag_chunk(chunk)))
+		goto free;
+
+	/* Locking is already done inside gen_pool_add */
+	add_error = gen_pool_add(pool, (unsigned long)chunk, chunk_size,
+				 NUMA_NO_NODE);
+	if (unlikely(add_error))
+		goto abort;
+
+	retval = gen_pool_alloc(pool, size);
+	if (retval) {
+return_allocation:
+		*(size_t *)retval = size;
+		if (gfp & __GFP_ZERO)
+			memset((void *)retval, 0, size);
+		return (void *)retval;
+	}
+	/*
+	 * Here there is no test for __GFP_NOFAIL because, in case of
+	 * concurrent allocation, one thread might add a chunk to the
+	 * pool and this memory could be allocated by another thread,
+	 * before the first thread gets a chance to use it.
+	 * As long as vmalloc succeeds, it's ok to retry.
+	 */
+	goto retry_alloc_from_pool;
+abort:
+	untag_chunk(chunk);
+free:
+	vfree_atomic(chunk);
+	return NULL;
+}
+
+static void pmalloc_chunk_set_protection(struct gen_pool *pool,
+					 struct gen_pool_chunk *chunk,
+					 void *data)
+{
+	const bool *flag = data;
+	size_t chunk_size = chunk->end_addr + 1 - chunk->start_addr;
+	unsigned long pages = chunk_size / PAGE_SIZE;
+
+	BUG_ON(chunk_size & (PAGE_SIZE - 1));
+
+	if (*flag)
+		set_memory_ro(chunk->start_addr, pages);
+	else
+		set_memory_rw(chunk->start_addr, pages);
+}
+
+static int pmalloc_pool_set_protection(struct gen_pool *pool, bool protection)
+{
+	struct pmalloc_data *data;
+	struct gen_pool_chunk *chunk;
+
+	if (unlikely(!pool))
+		return -EINVAL;
+
+	data = pool->data;
+
+	if (unlikely(!data))
+		return -EINVAL;
+
+	if (unlikely(data->protected == protection)) {
+		WARN_ON(1);
+		return 0;
+	}
+
+	data->protected = protection;
+	list_for_each_entry(chunk, &pool->chunks, next_chunk)
+		pmalloc_chunk_set_protection(pool, chunk, &protection);
+	return 0;
+}
+
+int pmalloc_protect_pool(struct gen_pool *pool)
+{
+	return pmalloc_pool_set_protection(pool, true);
+}
+
+
+static void pmalloc_chunk_free(struct gen_pool *pool,
+			       struct gen_pool_chunk *chunk, void *data)
+{
+	untag_chunk((void *)chunk->start_addr);
+	gen_pool_flush_chunk(pool, chunk);
+	vfree_atomic((void *)chunk->start_addr);
+}
+
+
+int pmalloc_destroy_pool(struct gen_pool *pool)
+{
+	struct pmalloc_data *data;
+
+	if (unlikely(pool == NULL))
+		return -EINVAL;
+
+	data = pool->data;
+
+	if (unlikely(data == NULL))
+		return -EINVAL;
+
+	mutex_lock(&pmalloc_mutex);
+	list_del(&data->node);
+	mutex_unlock(&pmalloc_mutex);
+
+	if (likely(data->pool_kobject))
+		pmalloc_disconnect(data, data->pool_kobject);
+
+	if (data->protected)
+		pmalloc_pool_set_protection(pool, false);
+	gen_pool_for_each_chunk(pool, pmalloc_chunk_free, NULL);
+	gen_pool_destroy(pool);
+	kfree(data);
+	return 0;
+}
+
+/*
+ * When the sysfs is ready to receive registrations, connect all the
+ * pools previously created. Also enable further pools to be connected
+ * right away.
+ */
+static int __init pmalloc_late_init(void)
+{
+	struct pmalloc_data *data, *n;
+
+	pmalloc_kobject = kobject_create_and_add("pmalloc", kernel_kobj);
+
+	mutex_lock(&pmalloc_mutex);
+	pmalloc_list = &pmalloc_final_list;
+
+	if (likely(pmalloc_kobject != NULL)) {
+		list_for_each_entry_safe(data, n, &pmalloc_tmp_list, node) {
+			list_move(&data->node, &pmalloc_final_list);
+			data->pool_kobject = pmalloc_connect(data);
+		}
+	}
+	mutex_unlock(&pmalloc_mutex);
+	return 0;
+}
+late_initcall(pmalloc_late_init);
diff --git a/mm/usercopy.c b/mm/usercopy.c
index e9e9325f7638..946ce051e296 100644
--- a/mm/usercopy.c
+++ b/mm/usercopy.c
@@ -240,6 +240,36 @@ static inline void check_heap_object(const void *ptr, unsigned long n,
 	}
 }
 
+#ifdef CONFIG_PROTECTABLE_MEMORY
+
+int is_pmalloc_object(const void *ptr, const unsigned long n);
+
+static void check_pmalloc_object(const void *ptr, unsigned long n,
+				 bool to_user)
+{
+	int retv;
+
+	retv = is_pmalloc_object(ptr, n);
+	if (unlikely(retv)) {
+		if (unlikely(!to_user))
+			usercopy_abort("pmalloc",
+				       "trying to write to pmalloc object",
+				       to_user, (const unsigned long)ptr, n);
+		if (retv < 0)
+			usercopy_abort("pmalloc",
+				       "invalid pmalloc object",
+				       to_user, (const unsigned long)ptr, n);
+	}
+}
+
+#else
+
+static void check_pmalloc_object(const void *ptr, unsigned long n,
+				 bool to_user)
+{
+}
+#endif
+
 /*
  * Validates that the given object is:
  * - not bogus address
@@ -277,5 +307,8 @@ void __check_object_size(const void *ptr, unsigned long n, bool to_user)
 
 	/* Check for object in kernel to avoid text exposure. */
 	check_kernel_text_object((const unsigned long)ptr, n, to_user);
+
+	/* Check if object is from a pmalloc chunk. */
+	check_pmalloc_object(ptr, n, to_user);
 }
 EXPORT_SYMBOL(__check_object_size);
-- 
2.14.1


* [PATCH 4/6] Protectable Memory
@ 2018-02-12 16:52   ` Igor Stoppa
  0 siblings, 0 replies; 146+ messages in thread
From: Igor Stoppa @ 2018-02-12 16:52 UTC (permalink / raw)
  To: linux-security-module

The MMU available in many systems running Linux can often provide R/O
protection to the memory pages it handles.

However, the MMU-based protection works efficiently only when said pages
contain exclusively data that will not need further modifications.

Statically allocated variables can be segregated into a dedicated
section, but no equivalent mechanism exists for dynamically allocated
ones.

Dynamic allocation currently provides no means of grouping variables
into memory pages that contain exclusively data suitable for conversion
to read-only access mode.

The allocator provided here (pmalloc, the protectable memory allocator)
introduces the concept of pools of protectable memory.

A module can request a pool and then direct any allocation request to
the pool handle it has received.

Once all the chunks of memory associated with a specific pool are
initialized, the pool can be protected.

After this point, the pool can only be destroyed (it is up to the module
to avoid any further reference to the pool's memory once destruction has
been invoked).

Destruction is mainly meant for releasing memory when a module is
unloaded.

A module can have as many pools as needed, for example to support the
protection of data that is initialized in sufficiently distinct phases.
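The pool lifecycle described above (create, allocate, initialize,
protect, destroy) can be modelled in user space as a rough sketch,
assuming POSIX mmap()/mprotect() as stand-ins for the kernel's
vmalloc()/set_memory_ro(). The upool_* names are hypothetical and are
not the pmalloc API; the sketch uses a single chunk per pool and
ignores pfree().

```c
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical user-space stand-in for a pmalloc pool (not the kernel API). */
struct upool {
	unsigned char *base;	/* page-aligned mapping backing the pool */
	size_t size;		/* mapping size, rounded up to whole pages */
	size_t used;		/* bytes handed out so far (bump allocator) */
	bool protected;		/* set once the pool has been made read-only */
};

/* Reserve one chunk, rounded up to the page size like roundup(size, PAGE_SIZE). */
static bool upool_create(struct upool *p, size_t size)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);

	p->size = (size + page - 1) / page * page;
	p->used = 0;
	p->protected = false;
	p->base = mmap(NULL, p->size, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	return p->base != MAP_FAILED;
}

/* Hand out word-aligned memory; refuse once the pool is protected. */
static void *upool_alloc(struct upool *p, size_t size)
{
	size_t align = sizeof(unsigned long);
	size_t start = (p->used + align - 1) / align * align;

	if (p->protected || size == 0 || start > p->size ||
	    size > p->size - start)
		return NULL;
	p->used = start + size;
	return p->base + start;
}

/* One-way transition: the whole chunk becomes read-only through the MMU. */
static int upool_protect(struct upool *p)
{
	if (p->protected || mprotect(p->base, p->size, PROT_READ) != 0)
		return -1;
	p->protected = true;
	return 0;
}

/* Destruction is the only way to get the memory back. */
static void upool_destroy(struct upool *p)
{
	munmap(p->base, p->size);
	p->base = NULL;
	p->size = 0;
	p->used = 0;
}

/* Usage: initialize data, seal the pool, keep reading, destroy on "unload". */
static int upool_demo(void)
{
	struct upool pool;
	char *name;

	if (!upool_create(&pool, 100))
		return -1;
	name = upool_alloc(&pool, 16);
	if (!name)
		return -1;
	strcpy(name, "pmalloc-demo");	/* written while the pool is R/W */
	if (upool_protect(&pool) != 0)
		return -1;
	assert(upool_alloc(&pool, 8) == NULL);	/* no allocation after protection */
	assert(strcmp(name, "pmalloc-demo") == 0);	/* reads still work */
	upool_destroy(&pool);
	return 0;
}
```

The one-way transition is the key design point: once upool_protect()
succeeds, a write through name would raise a protection fault (SIGSEGV),
and the only way to reclaim the memory is to destroy the pool, mirroring
the rule that a protected pmalloc pool can be destroyed but never turned
back to R/W.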

Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com>
---
 include/linux/genalloc.h |   3 +
 include/linux/pmalloc.h  | 242 +++++++++++++++++++++++
 include/linux/vmalloc.h  |   1 +
 lib/genalloc.c           |  27 +++
 mm/Kconfig               |   6 +
 mm/Makefile              |   1 +
 mm/pmalloc.c             | 499 +++++++++++++++++++++++++++++++++++++++++++++++
 mm/usercopy.c            |  33 ++++
 8 files changed, 812 insertions(+)
 create mode 100644 include/linux/pmalloc.h
 create mode 100644 mm/pmalloc.c

diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
index dcaa33e74b1c..b6c4cea9fbd8 100644
--- a/include/linux/genalloc.h
+++ b/include/linux/genalloc.h
@@ -121,6 +121,9 @@ extern unsigned long gen_pool_alloc_algo(struct gen_pool *, size_t,
 extern void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size,
 		dma_addr_t *dma);
 extern void gen_pool_free(struct gen_pool *, unsigned long, size_t);
+
+extern void gen_pool_flush_chunk(struct gen_pool *pool,
+				 struct gen_pool_chunk *chunk);
 extern void gen_pool_for_each_chunk(struct gen_pool *,
 	void (*)(struct gen_pool *, struct gen_pool_chunk *, void *), void *);
 extern size_t gen_pool_avail(struct gen_pool *);
diff --git a/include/linux/pmalloc.h b/include/linux/pmalloc.h
new file mode 100644
index 000000000000..afc2068d5545
--- /dev/null
+++ b/include/linux/pmalloc.h
@@ -0,0 +1,242 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * pmalloc.h: Header for Protectable Memory Allocator
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.stoppa@huawei.com>
+ */
+
+#ifndef _LINUX_PMALLOC_H
+#define _LINUX_PMALLOC_H
+
+
+#include <linux/genalloc.h>
+#include <linux/string.h>
+
+#define PMALLOC_DEFAULT_ALLOC_ORDER (-1)
+
+/*
+ * Library for dynamic allocation of pools of memory that can be,
+ * after initialization, marked as read-only.
+ *
+ * This is intended to complement __ro_after_init, for those cases
+ * where either it is not possible to know the initialization value before
+ * init is completed, or the amount of data is variable and can be
+ * determined only at run-time.
+ *
+ * ***WARNING***
+ * The user of the API is expected to synchronize:
+ * 1) allocation,
+ * 2) writes to the allocated memory,
+ * 3) write protection of the pool,
+ * 4) freeing of the allocated memory, and
+ * 5) destruction of the pool.
+ *
+ * For a non-threaded scenario, this type of locking is not even required.
+ *
+ * Even if the library were to provide support for locking, point 2)
+ * would still depend on the user taking the lock.
+ */
+
+
+/**
+ * pmalloc_create_pool() - create a new protectable memory pool
+ * @name: the name of the pool, enforced to be unique
+ * @min_alloc_order: log2 of the minimum allocation size obtainable
+ *                   from the pool
+ *
+ * Creates a new (empty) memory pool for allocation of protectable
+ * memory. Memory will be allocated upon request (through pmalloc).
+ *
+ * Return:
+ * * pointer to the new pool	- success
+ * * NULL			- error
+ */
+struct gen_pool *pmalloc_create_pool(const char *name,
+					 int min_alloc_order);
+
+/**
+ * is_pmalloc_object() - validates the existence of an alleged object
+ * @ptr: address of the object
+ * @n: size of the object, in bytes
+ *
+ * Return:
+ * * 0		- the object does not belong to pmalloc
+ * * 1		- the object belongs to pmalloc
+ * * \-1	- the object overlaps pmalloc memory incorrectly
+ */
+int is_pmalloc_object(const void *ptr, const unsigned long n);
+
+/**
+ * pmalloc_prealloc() - tries to allocate a memory chunk of the requested size
+ * @pool: handle to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ *
+ * Prepares a chunk of the requested size.
+ * This is intended to both minimize latency in later memory requests and
+ * avoid sleeping during allocation.
+ * Memory allocated with prealloc is stored in one single chunk, as
+ * opposed to what is allocated on-demand when pmalloc runs out of free
+ * space already existing in the pool and has to invoke vmalloc.
+ * One additional advantage of pre-allocating larger chunks of memory is
+ * that the total slack tends to be smaller.
+ *
+ * Return:
+ * * true	- the vmalloc call was successful
+ * * false	- error
+ */
+bool pmalloc_prealloc(struct gen_pool *pool, size_t size);
+
+/**
+ * pmalloc() - allocate protectable memory from a pool
+ * @pool: handle to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ * @gfp: flags for page allocation
+ *
+ * Allocates memory from an unprotected pool. If the pool doesn't have
+ * enough memory, and the request did not include GFP_ATOMIC, an attempt
+ * is made to add a new chunk of memory to the pool
+ * (a multiple of PAGE_SIZE), in order to fit the new request.
+ * Otherwise, NULL is returned.
+ *
+ * Return:
+ * * pointer to the memory requested	- success
+ * * NULL				- either no memory available or
+ *					  pool already read-only
+ */
+void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp);
+
+
+/**
+ * pzalloc() - zero-initialized version of pmalloc
+ * @pool: handle to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ * @gfp: flags for page allocation
+ *
+ * Executes pmalloc, initializing the memory requested to 0,
+ * before returning the pointer to it.
+ *
+ * Return:
+ * * pointer to the memory requested	- success
+ * * NULL				- either no memory available or
+ *					  pool already read-only
+ */
+static inline void *pzalloc(struct gen_pool *pool, size_t size, gfp_t gfp)
+{
+	return pmalloc(pool, size, gfp | __GFP_ZERO);
+}
+
+/**
+ * pmalloc_array() - allocates an array according to the parameters
+ * @pool: handle to the pool to be used for memory allocation
+ * @n: number of elements in the array
+ * @size: amount of memory (in bytes) requested for each element
+ * @flags: flags for page allocation
+ *
+ * Executes pmalloc, unless an argument is NULL or 0, or n * size overflows.
+ *
+ * Return:
+ * * the pmalloc result	- success
+ * * NULL		- error
+ */
+static inline void *pmalloc_array(struct gen_pool *pool, size_t n,
+				  size_t size, gfp_t flags)
+{
+	if (unlikely(!(pool && n && size) || n > (size_t)-1 / size))
+		return NULL;
+	return pmalloc(pool, n * size, flags);
+}
+
+/**
+ * pcalloc() - allocates a 0-initialized array according to the parameters
+ * @pool: handle to the pool to be used for memory allocation
+ * @n: number of elements in the array
+ * @size: amount of memory (in bytes) requested for each element
+ * @flags: flags for page allocation
+ *
+ * Executes pmalloc_array, if it has a chance to succeed.
+ *
+ * Return:
+ * * the pmalloc result	- success
+ * * NULL		- error
+ */
+static inline void *pcalloc(struct gen_pool *pool, size_t n,
+			    size_t size, gfp_t flags)
+{
+	return pmalloc_array(pool, n, size, flags | __GFP_ZERO);
+}
+
+/**
+ * pstrdup() - duplicate a string, using pmalloc as allocator
+ * @pool: handle to the pool to be used for memory allocation
+ * @s: string to duplicate
+ * @gfp: flags for page allocation
+ *
+ * Generates a copy of the given string, allocating sufficient memory
+ * from the given pmalloc pool.
+ *
+ * Return:
+ * * pointer to the replica	- success
+ * * NULL			- error
+ */
+static inline char *pstrdup(struct gen_pool *pool, const char *s, gfp_t gfp)
+{
+	size_t len;
+	char *buf;
+
+	if (unlikely(pool == NULL || s == NULL))
+		return NULL;
+
+	len = strlen(s) + 1;
+	buf = pmalloc(pool, len, gfp);
+	if (likely(buf))
+		strncpy(buf, s, len);
+	return buf;
+}
+
+/**
+ * pmalloc_protect_pool() - turn a read/write pool read-only
+ * @pool: the pool to protect
+ *
+ * Write-protects all the memory chunks assigned to the pool.
+ * This prevents any further allocation.
+ *
+ * Return:
+ * * 0		- success
+ * * -EINVAL	- error
+ */
+int pmalloc_protect_pool(struct gen_pool *pool);
+
+/**
+ * pfree() - release memory previously allocated from a pool
+ * @pool: handle to the pool the memory was allocated from
+ * @addr: the beginning of the memory area to be freed
+ *
+ * The behavior of pfree() depends on whether the pool is already
+ * protected.
+ * If the pool is not yet protected, the memory is marked as unused and
+ * becomes available for further allocations.
+ * If the pool is already protected, the memory is marked as unused, but
+ * no further allocation is possible, because the pool is now
+ * read-only.
+ * In this case, the freed memory is truly released only when the
+ * pool is destroyed.
+ */
+static inline void pfree(struct gen_pool *pool, const void *addr)
+{
+	gen_pool_free(pool, (unsigned long)addr, 0);
+}
+
+/**
+ * pmalloc_destroy_pool() - destroys a pool and all the associated memory
+ * @pool: the pool to destroy
+ *
+ * All the memory that was allocated through pmalloc in the pool will be freed.
+ *
+ * Return:
+ * * 0		- success
+ * * -EINVAL	- error
+ */
+int pmalloc_destroy_pool(struct gen_pool *pool);
+
+#endif
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 1e5d8c392f15..116d280cca53 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -20,6 +20,7 @@ struct notifier_block;		/* in notifier.h */
 #define VM_UNINITIALIZED	0x00000020	/* vm_struct is not fully initialized */
 #define VM_NO_GUARD		0x00000040      /* don't add guard page */
 #define VM_KASAN		0x00000080      /* has allocated kasan shadow memory */
+#define VM_PMALLOC		0x00000100	/* pmalloc area - see docs */
 /* bits [20..32] reserved for arch specific ioremap internals */
 
 /*
diff --git a/lib/genalloc.c b/lib/genalloc.c
index 87f62f31b52f..24ed35035095 100644
--- a/lib/genalloc.c
+++ b/lib/genalloc.c
@@ -625,6 +625,33 @@ void gen_pool_free(struct gen_pool *pool, unsigned long addr, size_t size)
 }
 EXPORT_SYMBOL(gen_pool_free);
 
+
+/**
+ * gen_pool_flush_chunk() - drops all the allocations from a specific chunk
+ * @pool:	the generic memory pool
+ * @chunk:	The chunk to wipe clear.
+ *
+ * This is meant to be called only while destroying a pool. It's up to the
+ * caller to avoid races, but really, at this point the pool should have
+ * already been retired and it should have become unavailable for any other
+ * sort of operation.
+ */
+void gen_pool_flush_chunk(struct gen_pool *pool,
+			  struct gen_pool_chunk *chunk)
+{
+	size_t size;
+
+	if (unlikely(!(pool && chunk)))
+		return;
+
+	size = chunk->end_addr + 1 - chunk->start_addr;
+	memset(chunk->entries, 0,
+	       DIV_ROUND_UP((size >> pool->min_alloc_order) * BITS_PER_ENTRY,
+			    BITS_PER_BYTE));
+	atomic_long_set(&chunk->avail, size);
+}
+
+
 /**
  * gen_pool_for_each_chunk() - call func for every chunk of generic memory pool
  * @pool:	the generic memory pool
diff --git a/mm/Kconfig b/mm/Kconfig
index c782e8fb7235..be578fbdce6d 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -760,3 +760,9 @@ config GUP_BENCHMARK
 	  performance of get_user_pages_fast().
 
 	  See tools/testing/selftests/vm/gup_benchmark.c
+
+config PROTECTABLE_MEMORY
+	bool
+	depends on ARCH_HAS_SET_MEMORY
+	select GENERIC_ALLOCATOR
+	default y
diff --git a/mm/Makefile b/mm/Makefile
index e669f02c5a54..959fdbdac118 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -65,6 +65,7 @@ obj-$(CONFIG_SPARSEMEM)	+= sparse.o
 obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o
 obj-$(CONFIG_SLOB) += slob.o
 obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o
+obj-$(CONFIG_PROTECTABLE_MEMORY) += pmalloc.o
 obj-$(CONFIG_KSM) += ksm.o
 obj-$(CONFIG_PAGE_POISONING) += page_poison.o
 obj-$(CONFIG_SLAB) += slab.o
diff --git a/mm/pmalloc.c b/mm/pmalloc.c
new file mode 100644
index 000000000000..abddba90a9f6
--- /dev/null
+++ b/mm/pmalloc.c
@@ -0,0 +1,499 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * pmalloc.c: Protectable Memory Allocator
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.stoppa@huawei.com>
+ */
+
+#include <linux/printk.h>
+#include <linux/init.h>
+#include <linux/mm.h>
+#include <linux/vmalloc.h>
+#include <linux/genalloc.h>
+#include <linux/kernel.h>
+#include <linux/log2.h>
+#include <linux/slab.h>
+#include <linux/device.h>
+#include <linux/atomic.h>
+#include <linux/rculist.h>
+#include <linux/set_memory.h>
+#include <asm/cacheflush.h>
+#include <asm/page.h>
+
+#include <linux/pmalloc.h>
+/*
+ * pmalloc_data contains the data specific to a pmalloc pool,
+ * in a format compatible with the design of genalloc.
+ * Some of the fields are used for exposing the corresponding parameter
+ * to userspace, through sysfs.
+ */
+struct pmalloc_data {
+	struct gen_pool *pool;  /* Link back to the associated pool. */
+	bool protected;     /* Status of the pool: RO or RW. */
+	struct kobj_attribute attr_protected; /* Sysfs attribute. */
+	struct kobj_attribute attr_avail;     /* Sysfs attribute. */
+	struct kobj_attribute attr_size;      /* Sysfs attribute. */
+	struct kobj_attribute attr_chunks;    /* Sysfs attribute. */
+	struct kobject *pool_kobject;
+	struct list_head node; /* list of pools */
+};
+
+static LIST_HEAD(pmalloc_final_list);
+static LIST_HEAD(pmalloc_tmp_list);
+static struct list_head *pmalloc_list = &pmalloc_tmp_list;
+static DEFINE_MUTEX(pmalloc_mutex);
+static struct kobject *pmalloc_kobject;
+
+static ssize_t pmalloc_pool_show_protected(struct kobject *dev,
+					   struct kobj_attribute *attr,
+					   char *buf)
+{
+	struct pmalloc_data *data;
+
+	data = container_of(attr, struct pmalloc_data, attr_protected);
+	if (data->protected)
+		return sprintf(buf, "protected\n");
+	else
+		return sprintf(buf, "unprotected\n");
+}
+
+static ssize_t pmalloc_pool_show_avail(struct kobject *dev,
+				       struct kobj_attribute *attr,
+				       char *buf)
+{
+	struct pmalloc_data *data;
+
+	data = container_of(attr, struct pmalloc_data, attr_avail);
+	return sprintf(buf, "%lu\n",
+		       (unsigned long)gen_pool_avail(data->pool));
+}
+
+static ssize_t pmalloc_pool_show_size(struct kobject *dev,
+				      struct kobj_attribute *attr,
+				      char *buf)
+{
+	struct pmalloc_data *data;
+
+	data = container_of(attr, struct pmalloc_data, attr_size);
+	return sprintf(buf, "%lu\n",
+		       (unsigned long)gen_pool_size(data->pool));
+}
+
+static void pool_chunk_number(struct gen_pool *pool,
+			      struct gen_pool_chunk *chunk, void *data)
+{
+	unsigned long *counter = data;
+
+	(*counter)++;
+}
+
+static ssize_t pmalloc_pool_show_chunks(struct kobject *dev,
+					struct kobj_attribute *attr,
+					char *buf)
+{
+	struct pmalloc_data *data;
+	unsigned long chunks_num = 0;
+
+	data = container_of(attr, struct pmalloc_data, attr_chunks);
+	gen_pool_for_each_chunk(data->pool, pool_chunk_number, &chunks_num);
+	return sprintf(buf, "%lu\n", chunks_num);
+}
+
+/* Exposes the pool and its attributes through sysfs. */
+static struct kobject *pmalloc_connect(struct pmalloc_data *data)
+{
+	const struct attribute *attrs[] = {
+		&data->attr_protected.attr,
+		&data->attr_avail.attr,
+		&data->attr_size.attr,
+		&data->attr_chunks.attr,
+		NULL
+	};
+	struct kobject *kobj;
+
+	kobj = kobject_create_and_add(data->pool->name, pmalloc_kobject);
+	if (unlikely(!kobj))
+		return NULL;
+
+	if (unlikely(sysfs_create_files(kobj, attrs) < 0)) {
+		kobject_put(kobj);
+		kobj = NULL;
+	}
+	return kobj;
+}
+
+/* Removes the pool and its attributes from sysfs. */
+static void pmalloc_disconnect(struct pmalloc_data *data,
+			       struct kobject *kobj)
+{
+	const struct attribute *attrs[] = {
+		&data->attr_protected.attr,
+		&data->attr_avail.attr,
+		&data->attr_size.attr,
+		&data->attr_chunks.attr,
+		NULL
+	};
+
+	sysfs_remove_files(kobj, attrs);
+	kobject_put(kobj);
+}
+
+/* Declares an attribute of the pool. */
+#define pmalloc_attr_init(data, attr_name) \
+do { \
+	sysfs_attr_init(&data->attr_##attr_name.attr); \
+	data->attr_##attr_name.attr.name = #attr_name; \
+	data->attr_##attr_name.attr.mode = VERIFY_OCTAL_PERMISSIONS(0400); \
+	data->attr_##attr_name.show = pmalloc_pool_show_##attr_name; \
+} while (0)
+
+struct gen_pool *pmalloc_create_pool(const char *name, int min_alloc_order)
+{
+	struct gen_pool *pool;
+	const char *pool_name;
+	struct pmalloc_data *data;
+
+	if (!name) {
+		WARN_ON(1);
+		return NULL;
+	}
+
+	if (min_alloc_order < 0)
+		min_alloc_order = ilog2(sizeof(unsigned long));
+
+	pool = gen_pool_create(min_alloc_order, NUMA_NO_NODE);
+	if (unlikely(!pool))
+		return NULL;
+
+	mutex_lock(&pmalloc_mutex);
+	list_for_each_entry(data, pmalloc_list, node)
+		if (!strcmp(name, data->pool->name))
+			goto same_name_err;
+
+	pool_name = kstrdup(name, GFP_KERNEL);
+	if (unlikely(!pool_name))
+		goto name_alloc_err;
+
+	data = kzalloc(sizeof(struct pmalloc_data), GFP_KERNEL);
+	if (unlikely(!data))
+		goto data_alloc_err;
+
+	data->protected = false;
+	data->pool = pool;
+	pmalloc_attr_init(data, protected);
+	pmalloc_attr_init(data, avail);
+	pmalloc_attr_init(data, size);
+	pmalloc_attr_init(data, chunks);
+	pool->data = data;
+	pool->name = pool_name;
+
+	list_add(&data->node, pmalloc_list);
+	if (pmalloc_list == &pmalloc_final_list)
+		data->pool_kobject = pmalloc_connect(data);
+	mutex_unlock(&pmalloc_mutex);
+	return pool;
+
+data_alloc_err:
+	kfree(pool_name);
+name_alloc_err:
+same_name_err:
+	mutex_unlock(&pmalloc_mutex);
+	gen_pool_destroy(pool);
+	return NULL;
+}
+
+static inline int check_alloc_params(struct gen_pool *pool, size_t req_size)
+{
+	struct pmalloc_data *data;
+
+	if (unlikely(!req_size || !pool))
+		return -1;
+
+	data = pool->data;
+
+	if (data == NULL)
+		return -1;
+
+	if (unlikely(data->protected)) {
+		WARN_ON(1);
+		return -1;
+	}
+	return 0;
+}
+
+
+static inline bool chunk_tagging(void *chunk, bool tag)
+{
+	struct vm_struct *area;
+	struct page *page;
+
+	if (!is_vmalloc_addr(chunk))
+		return false;
+
+	page = vmalloc_to_page(chunk);
+	if (unlikely(!page))
+		return false;
+
+	area = page->area;
+	if (tag)
+		area->flags |= VM_PMALLOC;
+	else
+		area->flags &= ~VM_PMALLOC;
+	return true;
+}
+
+
+static inline bool tag_chunk(void *chunk)
+{
+	return chunk_tagging(chunk, true);
+}
+
+
+static inline bool untag_chunk(void *chunk)
+{
+	return chunk_tagging(chunk, false);
+}
+
+enum {
+	INVALID_PMALLOC_OBJECT = -1,
+	NOT_PMALLOC_OBJECT = 0,
+	VALID_PMALLOC_OBJECT = 1,
+};
+
+int is_pmalloc_object(const void *ptr, const unsigned long n)
+{
+	struct vm_struct *area;
+	struct page *page;
+	unsigned long area_start;
+	unsigned long area_end;
+	unsigned long object_start;
+	unsigned long object_end;
+
+
+	/*
+	 * is_pmalloc_object gets called pretty late, so chances are high
+	 * that the object is indeed of vmalloc type
+	 */
+	if (unlikely(!is_vmalloc_addr(ptr)))
+		return NOT_PMALLOC_OBJECT;
+
+	page = vmalloc_to_page(ptr);
+	if (unlikely(!page))
+		return NOT_PMALLOC_OBJECT;
+
+	area = page->area;
+
+	if (likely(!(area->flags & VM_PMALLOC)))
+		return NOT_PMALLOC_OBJECT;
+
+	area_start = (unsigned long)area->addr;
+	area_end = area_start + area->nr_pages * PAGE_SIZE - 1;
+	object_start = (unsigned long)ptr;
+	object_end = object_start + n - 1;
+
+	if (likely((area_start <= object_start) &&
+		   (object_end <= area_end)))
+		return VALID_PMALLOC_OBJECT;
+	else
+		return INVALID_PMALLOC_OBJECT;
+}
+
+
+bool pmalloc_prealloc(struct gen_pool *pool, size_t size)
+{
+	void *chunk;
+	size_t chunk_size;
+	int add_error;
+
+	if (check_alloc_params(pool, size))
+		return false;
+
+	/* Expand pool */
+	chunk_size = roundup(size, PAGE_SIZE);
+	chunk = vmalloc(chunk_size);
+	if (unlikely(chunk == NULL))
+		return false;
+
+	/* Tag the chunk, as pmalloc does, so hardened usercopy can see it. */
+	if (unlikely(!tag_chunk(chunk)))
+		goto free;
+
+	/* Locking is already done inside gen_pool_add */
+	add_error = gen_pool_add(pool, (unsigned long)chunk, chunk_size,
+				 NUMA_NO_NODE);
+	if (unlikely(add_error != 0))
+		goto abort;
+
+	return true;
+abort:
+	untag_chunk(chunk);
+free:
+	vfree_atomic(chunk);
+	return false;
+}
+
+void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp)
+{
+	void *chunk;
+	size_t chunk_size;
+	int add_error;
+	unsigned long retval;
+
+	if (check_alloc_params(pool, size))
+		return NULL;
+
+retry_alloc_from_pool:
+	retval = gen_pool_alloc(pool, size);
+	if (retval)
+		goto return_allocation;
+
+	if (unlikely((gfp & __GFP_ATOMIC))) {
+		if (unlikely((gfp & __GFP_NOFAIL)))
+			goto retry_alloc_from_pool;
+		else
+			return NULL;
+	}
+
+	/* Expand pool */
+	chunk_size = roundup(size, PAGE_SIZE);
+	chunk = vmalloc(chunk_size);
+	if (unlikely(!chunk)) {
+		if (unlikely((gfp & __GFP_NOFAIL)))
+			goto retry_alloc_from_pool;
+		else
+			return NULL;
+	}
+	if (unlikely(!tag_chunk(chunk)))
+		goto free;
+
+	/* Locking is already done inside gen_pool_add */
+	add_error = gen_pool_add(pool, (unsigned long)chunk, chunk_size,
+				 NUMA_NO_NODE);
+	if (unlikely(add_error))
+		goto abort;
+
+	retval = gen_pool_alloc(pool, size);
+	if (retval) {
+return_allocation:
+		*(size_t *)retval = size;
+		if (gfp & __GFP_ZERO)
+			memset((void *)retval, 0, size);
+		return (void *)retval;
+	}
+	/*
+	 * Here there is no test for __GFP_NOFAIL because, in case of
+	 * concurrent allocation, one thread might add a chunk to the
+	 * pool and this memory could be allocated by another thread,
+	 * before the first thread gets a chance to use it.
+	 * As long as vmalloc succeeds, it's ok to retry.
+	 */
+	goto retry_alloc_from_pool;
+abort:
+	untag_chunk(chunk);
+free:
+	vfree_atomic(chunk);
+	return NULL;
+}
+
+static void pmalloc_chunk_set_protection(struct gen_pool *pool,
+					 struct gen_pool_chunk *chunk,
+					 void *data)
+{
+	const bool *flag = data;
+	size_t chunk_size = chunk->end_addr + 1 - chunk->start_addr;
+	unsigned long pages = chunk_size / PAGE_SIZE;
+
+	BUG_ON(chunk_size & (PAGE_SIZE - 1));
+
+	if (*flag)
+		set_memory_ro(chunk->start_addr, pages);
+	else
+		set_memory_rw(chunk->start_addr, pages);
+}
+
+static int pmalloc_pool_set_protection(struct gen_pool *pool, bool protection)
+{
+	struct pmalloc_data *data;
+	struct gen_pool_chunk *chunk;
+
+	if (unlikely(!pool))
+		return -EINVAL;
+
+	data = pool->data;
+
+	if (unlikely(!data))
+		return -EINVAL;
+
+	if (unlikely(data->protected == protection)) {
+		WARN_ON(1);
+		return 0;
+	}
+
+	data->protected = protection;
+	list_for_each_entry(chunk, &(pool)->chunks, next_chunk)
+		pmalloc_chunk_set_protection(pool, chunk, &protection);
+	return 0;
+}
+
+int pmalloc_protect_pool(struct gen_pool *pool)
+{
+	return pmalloc_pool_set_protection(pool, true);
+}
+
+
+static void pmalloc_chunk_free(struct gen_pool *pool,
+			       struct gen_pool_chunk *chunk, void *data)
+{
+	untag_chunk((void *)chunk->start_addr);
+	gen_pool_flush_chunk(pool, chunk);
+	vfree_atomic((void *)chunk->start_addr);
+}
+
+
+int pmalloc_destroy_pool(struct gen_pool *pool)
+{
+	struct pmalloc_data *data;
+
+	if (unlikely(pool == NULL))
+		return -EINVAL;
+
+	data = pool->data;
+
+	if (unlikely(data == NULL))
+		return -EINVAL;
+
+	mutex_lock(&pmalloc_mutex);
+	list_del(&data->node);
+	mutex_unlock(&pmalloc_mutex);
+
+	if (likely(data->pool_kobject))
+		pmalloc_disconnect(data, data->pool_kobject);
+
+	/* Only a protected pool must be made writable again before freeing. */
+	if (data->protected)
+		pmalloc_pool_set_protection(pool, false);
+	gen_pool_for_each_chunk(pool, pmalloc_chunk_free, NULL);
+	gen_pool_destroy(pool);
+	kfree(data);
+	return 0;
+}
+
+/*
+ * When the sysfs is ready to receive registrations, connect all the
+ * pools previously created. Also enable further pools to be connected
+ * right away.
+ */
+static int __init pmalloc_late_init(void)
+{
+	struct pmalloc_data *data, *n;
+
+	pmalloc_kobject = kobject_create_and_add("pmalloc", kernel_kobj);
+
+	mutex_lock(&pmalloc_mutex);
+	pmalloc_list = &pmalloc_final_list;
+
+	if (likely(pmalloc_kobject != NULL)) {
+		list_for_each_entry_safe(data, n, &pmalloc_tmp_list, node) {
+			list_move(&data->node, &pmalloc_final_list);
+			data->pool_kobject = pmalloc_connect(data);
+		}
+	}
+	mutex_unlock(&pmalloc_mutex);
+	return 0;
+}
+late_initcall(pmalloc_late_init);
diff --git a/mm/usercopy.c b/mm/usercopy.c
index e9e9325f7638..946ce051e296 100644
--- a/mm/usercopy.c
+++ b/mm/usercopy.c
@@ -240,6 +240,36 @@ static inline void check_heap_object(const void *ptr, unsigned long n,
 	}
 }
 
+#ifdef CONFIG_PROTECTABLE_MEMORY
+
+int is_pmalloc_object(const void *ptr, const unsigned long n);
+
+static void check_pmalloc_object(const void *ptr, unsigned long n,
+				 bool to_user)
+{
+	int retv;
+
+	retv = is_pmalloc_object(ptr, n);
+	if (unlikely(retv)) {
+		if (unlikely(!to_user))
+			usercopy_abort("pmalloc",
+				       "trying to write to pmalloc object",
+				       to_user, (const unsigned long)ptr, n);
+		if (retv < 0)
+			usercopy_abort("pmalloc",
+				       "invalid pmalloc object",
+				       to_user, (const unsigned long)ptr, n);
+	}
+}
+
+#else
+
+static void check_pmalloc_object(const void *ptr, unsigned long n,
+				 bool to_user)
+{
+}
+#endif
+
 /*
  * Validates that the given object is:
  * - not bogus address
@@ -277,5 +307,8 @@ void __check_object_size(const void *ptr, unsigned long n, bool to_user)
 
 	/* Check for object in kernel to avoid text exposure. */
 	check_kernel_text_object((const unsigned long)ptr, n, to_user);
+
+	/* Check if object is from a pmalloc chunk. */
+	check_pmalloc_object(ptr, n, to_user);
 }
 EXPORT_SYMBOL(__check_object_size);
-- 
2.14.1
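The range check performed by is_pmalloc_object(), and the way check_pmalloc_object() turns its result into a usercopy verdict, can be modelled in plain userspace C. This is only a sketch: struct toy_area, toy_classify(), toy_usercopy_verdict() and the 4 KiB TOY_PAGE_SIZE are hypothetical stand-ins for the vm_struct found through vmalloc_to_page(). It assumes n > 0, since __check_object_size() skips zero-sized copies before any check runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same values as the enum in mm/pmalloc.c. */
enum {
	INVALID_PMALLOC_OBJECT = -1,
	NOT_PMALLOC_OBJECT = 0,
	VALID_PMALLOC_OBJECT = 1,
};

#define TOY_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */

/* Hypothetical stand-in for the vm_struct fields read by the check. */
struct toy_area {
	unsigned long addr;	/* like area->addr */
	unsigned long nr_pages;	/* like area->nr_pages */
	bool pmalloc;		/* like (area->flags & VM_PMALLOC) */
};

/* Classify [ptr, ptr + n) against the area it belongs to; assumes n > 0. */
static int toy_classify(const struct toy_area *area, unsigned long ptr,
			unsigned long n)
{
	unsigned long area_end, object_end;

	/* NULL models "not a vmalloc address" or "no backing page". */
	if (!area || !area->pmalloc)
		return NOT_PMALLOC_OBJECT;

	area_end = area->addr + area->nr_pages * TOY_PAGE_SIZE - 1;
	object_end = ptr + n - 1;
	/* The object must lie entirely inside the tagged area. */
	if (area->addr <= ptr && object_end <= area_end)
		return VALID_PMALLOC_OBJECT;
	return INVALID_PMALLOC_OBJECT;
}

/* Mirrors check_pmalloc_object(): NULL means the copy is allowed. */
static const char *toy_usercopy_verdict(int kind, bool to_user)
{
	if (kind == NOT_PMALLOC_OBJECT)
		return NULL;
	if (!to_user)
		return "trying to write to pmalloc object";
	if (kind < 0)
		return "invalid pmalloc object";
	return NULL;
}
```

In this model, copying a well-formed pmalloc object out to userspace is allowed, while any copy from userspace into pmalloc memory is rejected, whether or not the object fits inside its area.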

* [PATCH 4/6] Protectable Memory
@ 2018-02-12 16:52   ` Igor Stoppa
From: Igor Stoppa @ 2018-02-12 16:52 UTC (permalink / raw)
  To: willy, rdunlap, corbet, keescook, mhocko, labbott, jglisse, hch
  Cc: cl, linux-security-module, linux-mm, linux-kernel,
	kernel-hardening, Igor Stoppa

The MMU available in many systems running Linux can often provide R/O
protection to the memory pages it handles.

However, MMU-based protection works efficiently only when those pages
contain exclusively data that will not need further modifications.

Statically allocated variables can be segregated into a dedicated
section, but this approach does not translate well to dynamically
allocated ones.

Dynamic allocation currently provides no means of grouping variables
into memory pages that would contain exclusively data suitable for
conversion to read-only access mode.

The allocator provided here (pmalloc - protectable memory allocator)
introduces the concept of pools of protectable memory.

A module can request a pool and then direct all its allocation requests
to the pool handle it has received.

Once all the chunks of memory associated with a specific pool are
initialized, the pool can be protected.

After this point, the pool can only be destroyed (it is up to the module
to avoid any further references to the pool's memory once the
destruction has been invoked).

Destruction is mainly meant for releasing memory when a module is
unloaded.

A module can have as many pools as needed, for example to support the
protection of data that is initialized in sufficiently distinct phases.
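The pool lifecycle described above (allocate while writable, protect, then only free or destroy) can be illustrated with a userspace analog. This is a hypothetical sketch, not the pmalloc API: toy_pool and its helpers are invented for illustration, mmap() stands in for vmalloc(), mprotect(PROT_READ) stands in for set_memory_ro(), and the pool holds a single page-rounded chunk instead of growing on demand.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical userspace analog of a protectable pool (not the pmalloc API). */
struct toy_pool {
	unsigned char *base;	/* start of the mmap()ed chunk */
	size_t size;		/* chunk size, a multiple of the page size */
	size_t used;		/* bytes handed out so far */
	bool protected;		/* true once the chunk is read-only */
};

static bool toy_pool_init(struct toy_pool *p, size_t want)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	/* Round up to whole pages, like roundup(size, PAGE_SIZE). */
	size_t size = (want + page - 1) / page * page;
	void *mem = mmap(NULL, size, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (mem == MAP_FAILED)
		return false;
	p->base = mem;
	p->size = size;
	p->used = 0;
	p->protected = false;
	return true;
}

/* Bump allocation aligned to sizeof(unsigned long); refused once protected. */
static void *toy_alloc(struct toy_pool *p, size_t n)
{
	size_t align = sizeof(unsigned long);
	void *ret;

	/* The remaining space is always a multiple of align, so the
	 * rounded-up request fits whenever n itself fits. */
	if (p->protected || n == 0 || n > p->size - p->used)
		return NULL;
	ret = p->base + p->used;
	p->used += (n + align - 1) / align * align;
	return ret;
}

/* Make the whole chunk read-only, as set_memory_ro() does in the kernel. */
static int toy_protect(struct toy_pool *p)
{
	if (mprotect(p->base, p->size, PROT_READ) != 0)
		return -1;
	p->protected = true;
	return 0;
}

/* Release the chunk; munmap() also works on read-only mappings. */
static void toy_destroy(struct toy_pool *p)
{
	munmap(p->base, p->size);
	p->base = NULL;
	p->size = 0;
}

/* Returns 1 if, after protection, data is intact and new allocations fail. */
static int toy_lifecycle_demo(void)
{
	struct toy_pool pool;
	char *msg;
	int ok;

	if (!toy_pool_init(&pool, 100))
		return -1;
	msg = toy_alloc(&pool, 16);
	if (!msg) {
		toy_destroy(&pool);
		return -1;
	}
	strcpy(msg, "initialized");	/* writes are allowed while R/W */
	if (toy_protect(&pool) != 0) {
		toy_destroy(&pool);
		return -1;
	}
	ok = toy_alloc(&pool, 8) == NULL && strcmp(msg, "initialized") == 0;
	toy_destroy(&pool);
	return ok;
}
```

Unlike this toy, pmalloc does not stop at one chunk: when the pool runs out of space, it vmalloc()s, tags and adds another page-rounded chunk, as long as the pool is still unprotected.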

Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com>
---
 include/linux/genalloc.h |   3 +
 include/linux/pmalloc.h  | 242 +++++++++++++++++++++++
 include/linux/vmalloc.h  |   1 +
 lib/genalloc.c           |  27 +++
 mm/Kconfig               |   6 +
 mm/Makefile              |   1 +
 mm/pmalloc.c             | 499 +++++++++++++++++++++++++++++++++++++++++++++++
 mm/usercopy.c            |  33 ++++
 8 files changed, 812 insertions(+)
 create mode 100644 include/linux/pmalloc.h
 create mode 100644 mm/pmalloc.c

diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
index dcaa33e74b1c..b6c4cea9fbd8 100644
--- a/include/linux/genalloc.h
+++ b/include/linux/genalloc.h
@@ -121,6 +121,9 @@ extern unsigned long gen_pool_alloc_algo(struct gen_pool *, size_t,
 extern void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size,
 		dma_addr_t *dma);
 extern void gen_pool_free(struct gen_pool *, unsigned long, size_t);
+
+extern void gen_pool_flush_chunk(struct gen_pool *pool,
+				 struct gen_pool_chunk *chunk);
 extern void gen_pool_for_each_chunk(struct gen_pool *,
 	void (*)(struct gen_pool *, struct gen_pool_chunk *, void *), void *);
 extern size_t gen_pool_avail(struct gen_pool *);
diff --git a/include/linux/pmalloc.h b/include/linux/pmalloc.h
new file mode 100644
index 000000000000..afc2068d5545
--- /dev/null
+++ b/include/linux/pmalloc.h
@@ -0,0 +1,242 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * pmalloc.h: Header for Protectable Memory Allocator
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.stoppa@huawei.com>
+ */
+
+#ifndef _LINUX_PMALLOC_H
+#define _LINUX_PMALLOC_H
+
+
+#include <linux/genalloc.h>
+#include <linux/string.h>
+
+#define PMALLOC_DEFAULT_ALLOC_ORDER (-1)
+
+/*
+ * Library for dynamic allocation of pools of memory that can be,
+ * after initialization, marked as read-only.
+ *
+ * This is intended to complement __ro_after_init, for those cases
+ * where either the initialization value cannot be known before init
+ * is completed, or the amount of data is variable and can be
+ * determined only at run-time.
+ *
+ * ***WARNING***
+ * The user of the API is expected to synchronize:
+ * 1) allocation,
+ * 2) writes to the allocated memory,
+ * 3) write protection of the pool,
+ * 4) freeing of the allocated memory, and
+ * 5) destruction of the pool.
+ *
+ * In a single-threaded scenario, this type of locking is not required.
+ *
+ * Even if the library were to provide support for locking, point 2)
+ * would still depend on the user taking the lock.
+ */
+
+
+/**
+ * pmalloc_create_pool() - create a new protectable memory pool
+ * @name: the name of the pool, enforced to be unique
+ * @min_alloc_order: log2 of the minimum allocation size obtainable
+ *                   from the pool
+ *
+ * Creates a new (empty) memory pool for allocation of protectable
+ * memory. Memory will be allocated upon request (through pmalloc).
+ *
+ * Return:
+ * * pointer to the new pool	- success
+ * * NULL			- error
+ */
+struct gen_pool *pmalloc_create_pool(const char *name,
+					 int min_alloc_order);
+
+/**
+ * is_pmalloc_object() - validates the existence of an alleged object
+ * @ptr: address of the object
+ * @n: size of the object, in bytes
+ *
+ * Return:
+ * * 0		- the object does not belong to pmalloc
+ * * 1		- the object belongs to pmalloc
+ * * \-1	- the object overlaps pmalloc memory incorrectly
+ */
+int is_pmalloc_object(const void *ptr, const unsigned long n);
+
+/**
+ * pmalloc_prealloc() - tries to allocate a memory chunk of the requested size
+ * @pool: handle to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ *
+ * Prepares a chunk of the requested size.
+ * This is intended to both minimize latency in later memory requests and
+ * avoid sleeping during allocation.
+ * Memory allocated with prealloc is stored in one single chunk, as
+ * opposed to what is allocated on-demand when pmalloc runs out of free
+ * space already existing in the pool and has to invoke vmalloc.
+ * One additional advantage of pre-allocating larger chunks of memory is
+ * that the total slack tends to be smaller.
+ *
+ * Return:
+ * * true	- the vmalloc call was successful
+ * * false	- error
+ */
+bool pmalloc_prealloc(struct gen_pool *pool, size_t size);
+
+/**
+ * pmalloc() - allocate protectable memory from a pool
+ * @pool: handle to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ * @gfp: flags for page allocation
+ *
+ * Allocates memory from an unprotected pool. If the pool doesn't have
+ * enough memory, and the request did not include GFP_ATOMIC, an attempt
+ * is made to add a new chunk of memory to the pool
+ * (a multiple of PAGE_SIZE), in order to fit the new request.
+ * Otherwise, NULL is returned.
+ *
+ * Return:
+ * * pointer to the memory requested	- success
+ * * NULL				- either no memory available or
+ *					  pool already read-only
+ */
+void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp);
+
+
+/**
+ * pzalloc() - zero-initialized version of pmalloc
+ * @pool: handle to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ * @gfp: flags for page allocation
+ *
+ * Executes pmalloc, initializing the memory requested to 0,
+ * before returning the pointer to it.
+ *
+ * Return:
+ * * pointer to the memory requested	- success
+ * * NULL				- either no memory available or
+ *					  pool already read-only
+ */
+static inline void *pzalloc(struct gen_pool *pool, size_t size, gfp_t gfp)
+{
+	return pmalloc(pool, size, gfp | __GFP_ZERO);
+}
+
+/**
+ * pmalloc_array() - allocates an array according to the parameters
+ * @pool: handle to the pool to be used for memory allocation
+ * @n: number of elements in the array
+ * @size: amount of memory (in bytes) requested for each element
+ * @flags: flags for page allocation
+ *
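+ * Executes pmalloc, provided that the parameters are valid and that
+ * n * size does not overflow.
+ *
+ * Return:
+ * * the pmalloc result	- success
+ * * NULL		- error
+ */
+static inline void *pmalloc_array(struct gen_pool *pool, size_t n,
+				  size_t size, gfp_t flags)
+{
+	if (unlikely(!(pool && n && size)))
+		return NULL;
+	/* Reject requests whose total size would overflow size_t. */
+	if (unlikely(n > (~(size_t)0) / size))
+		return NULL;
+	return pmalloc(pool, n * size, flags);
+}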
+
+/**
+ * pcalloc() - allocates a 0-initialized array according to the parameters
+ * @pool: handle to the pool to be used for memory allocation
+ * @n: number of elements in the array
+ * @size: amount of memory (in bytes) requested for each element
+ * @flags: flags for page allocation
+ *
+ * Executes pmalloc_array, if it has a chance to succeed.
+ *
+ * Return:
+ * * the pmalloc result	- success
+ * * NULL		- error
+ */
+static inline void *pcalloc(struct gen_pool *pool, size_t n,
+			    size_t size, gfp_t flags)
+{
+	return pmalloc_array(pool, n, size, flags | __GFP_ZERO);
+}
+
+/**
+ * pstrdup() - duplicate a string, using pmalloc as allocator
+ * @pool: handle to the pool to be used for memory allocation
+ * @s: string to duplicate
+ * @gfp: flags for page allocation
+ *
+ * Generates a copy of the given string, allocating sufficient memory
+ * from the given pmalloc pool.
+ *
+ * Return:
+ * * pointer to the replica	- success
+ * * NULL			- error
+ */
+static inline char *pstrdup(struct gen_pool *pool, const char *s, gfp_t gfp)
+{
+	size_t len;
+	char *buf;
+
+	if (unlikely(pool == NULL || s == NULL))
+		return NULL;
+
+	len = strlen(s) + 1;
+	buf = pmalloc(pool, len, gfp);
+	if (likely(buf))
+		strncpy(buf, s, len);
+	return buf;
+}
+
+/**
+ * pmalloc_protect_pool() - turn a read/write pool read-only
+ * @pool: the pool to protect
+ *
+ * Write-protects all the memory chunks assigned to the pool.
+ * This prevents any further allocation.
+ *
+ * Return:
+ * * 0		- success
+ * * -EINVAL	- error
+ */
+int pmalloc_protect_pool(struct gen_pool *pool);
+
+/**
+ * pfree() - mark previously allocated memory as unused
+ * @pool: handle to the pool the memory was allocated from
+ * @addr: the beginning of the memory area to be freed
+ *
+ * The behavior of pfree depends on the protection state of the pool.
+ * If the pool is not yet protected, the memory is marked as unused and
+ * becomes available for further allocations.
+ * If the pool is already protected, the memory is marked as unused, but
+ * it cannot be reused, because the protection prevents any further
+ * allocation.
+ * In this case, the freed memory is truly released only when the
+ * pool is destroyed.
+ */
+static inline void pfree(struct gen_pool *pool, const void *addr)
+{
+	gen_pool_free(pool, (unsigned long)addr, 0);
+}
+
+/**
+ * pmalloc_destroy_pool() - destroys a pool and all the associated memory
+ * @pool: the pool to destroy
+ *
+ * All the memory that was allocated through pmalloc in the pool will be freed.
+ *
+ * Return:
+ * * 0		- success
+ * * -EINVAL	- error
+ */
+int pmalloc_destroy_pool(struct gen_pool *pool);
+
+#endif
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 1e5d8c392f15..116d280cca53 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -20,6 +20,7 @@ struct notifier_block;		/* in notifier.h */
 #define VM_UNINITIALIZED	0x00000020	/* vm_struct is not fully initialized */
 #define VM_NO_GUARD		0x00000040      /* don't add guard page */
 #define VM_KASAN		0x00000080      /* has allocated kasan shadow memory */
+#define VM_PMALLOC		0x00000100	/* pmalloc area - see docs */
 /* bits [20..32] reserved for arch specific ioremap internals */
 
 /*
diff --git a/lib/genalloc.c b/lib/genalloc.c
index 87f62f31b52f..24ed35035095 100644
--- a/lib/genalloc.c
+++ b/lib/genalloc.c
@@ -625,6 +625,33 @@ void gen_pool_free(struct gen_pool *pool, unsigned long addr, size_t size)
 }
 EXPORT_SYMBOL(gen_pool_free);
 
+
+/**
+ * gen_pool_flush_chunk() - drops all the allocations from a specific chunk
+ * @pool:	the generic memory pool
+ * @chunk:	The chunk to wipe clear.
+ *
+ * This is meant to be called only while destroying a pool. It's up to the
+ * caller to avoid races, but really, at this point the pool should have
+ * already been retired and it should have become unavailable for any other
+ * sort of operation.
+ */
+void gen_pool_flush_chunk(struct gen_pool *pool,
+			  struct gen_pool_chunk *chunk)
+{
+	size_t size;
+
+	if (unlikely(!(pool && chunk)))
+		return;
+
+	size = chunk->end_addr + 1 - chunk->start_addr;
+	memset(chunk->entries, 0,
+	       DIV_ROUND_UP((size >> pool->min_alloc_order) * BITS_PER_ENTRY,
+			    BITS_PER_BYTE));
+	atomic_long_set(&chunk->avail, size);
+}
+
+
 /**
  * gen_pool_for_each_chunk() - call func for every chunk of generic memory pool
  * @pool:	the generic memory pool
diff --git a/mm/Kconfig b/mm/Kconfig
index c782e8fb7235..be578fbdce6d 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -760,3 +760,9 @@ config GUP_BENCHMARK
 	  performance of get_user_pages_fast().
 
 	  See tools/testing/selftests/vm/gup_benchmark.c
+
+config PROTECTABLE_MEMORY
+	bool
+	depends on ARCH_HAS_SET_MEMORY
+	select GENERIC_ALLOCATOR
+	default y
diff --git a/mm/Makefile b/mm/Makefile
index e669f02c5a54..959fdbdac118 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -65,6 +65,7 @@ obj-$(CONFIG_SPARSEMEM)	+= sparse.o
 obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o
 obj-$(CONFIG_SLOB) += slob.o
 obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o
+obj-$(CONFIG_PROTECTABLE_MEMORY) += pmalloc.o
 obj-$(CONFIG_KSM) += ksm.o
 obj-$(CONFIG_PAGE_POISONING) += page_poison.o
 obj-$(CONFIG_SLAB) += slab.o
diff --git a/mm/pmalloc.c b/mm/pmalloc.c
new file mode 100644
index 000000000000..abddba90a9f6
--- /dev/null
+++ b/mm/pmalloc.c
@@ -0,0 +1,499 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * pmalloc.c: Protectable Memory Allocator
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.stoppa@huawei.com>
+ */
+
+#include <linux/printk.h>
+#include <linux/init.h>
+#include <linux/mm.h>
+#include <linux/vmalloc.h>
+#include <linux/genalloc.h>
+#include <linux/kernel.h>
+#include <linux/log2.h>
+#include <linux/slab.h>
+#include <linux/device.h>
+#include <linux/atomic.h>
+#include <linux/rculist.h>
+#include <linux/set_memory.h>
+#include <asm/cacheflush.h>
+#include <asm/page.h>
+
+#include <linux/pmalloc.h>
+/*
+ * pmalloc_data contains the data specific to a pmalloc pool,
+ * in a format compatible with the design of genalloc.
+ * Some of the fields are used for exposing the corresponding parameter
+ * to userspace, through sysfs.
+ */
+struct pmalloc_data {
+	struct gen_pool *pool;  /* Link back to the associated pool. */
+	bool protected;     /* Status of the pool: RO or RW. */
+	struct kobj_attribute attr_protected; /* Sysfs attribute. */
+	struct kobj_attribute attr_avail;     /* Sysfs attribute. */
+	struct kobj_attribute attr_size;      /* Sysfs attribute. */
+	struct kobj_attribute attr_chunks;    /* Sysfs attribute. */
+	struct kobject *pool_kobject;
+	struct list_head node; /* list of pools */
+};
+
+static LIST_HEAD(pmalloc_final_list);
+static LIST_HEAD(pmalloc_tmp_list);
+static struct list_head *pmalloc_list = &pmalloc_tmp_list;
+static DEFINE_MUTEX(pmalloc_mutex);
+static struct kobject *pmalloc_kobject;
+
+static ssize_t pmalloc_pool_show_protected(struct kobject *dev,
+					   struct kobj_attribute *attr,
+					   char *buf)
+{
+	struct pmalloc_data *data;
+
+	data = container_of(attr, struct pmalloc_data, attr_protected);
+	if (data->protected)
+		return sprintf(buf, "protected\n");
+	else
+		return sprintf(buf, "unprotected\n");
+}
+
+static ssize_t pmalloc_pool_show_avail(struct kobject *dev,
+				       struct kobj_attribute *attr,
+				       char *buf)
+{
+	struct pmalloc_data *data;
+
+	data = container_of(attr, struct pmalloc_data, attr_avail);
+	return sprintf(buf, "%lu\n",
+		       (unsigned long)gen_pool_avail(data->pool));
+}
+
+static ssize_t pmalloc_pool_show_size(struct kobject *dev,
+				      struct kobj_attribute *attr,
+				      char *buf)
+{
+	struct pmalloc_data *data;
+
+	data = container_of(attr, struct pmalloc_data, attr_size);
+	return sprintf(buf, "%lu\n",
+		       (unsigned long)gen_pool_size(data->pool));
+}
+
+static void pool_chunk_number(struct gen_pool *pool,
+			      struct gen_pool_chunk *chunk, void *data)
+{
+	unsigned long *counter = data;
+
+	(*counter)++;
+}
+
+static ssize_t pmalloc_pool_show_chunks(struct kobject *dev,
+					struct kobj_attribute *attr,
+					char *buf)
+{
+	struct pmalloc_data *data;
+	unsigned long chunks_num = 0;
+
+	data = container_of(attr, struct pmalloc_data, attr_chunks);
+	gen_pool_for_each_chunk(data->pool, pool_chunk_number, &chunks_num);
+	return sprintf(buf, "%lu\n", chunks_num);
+}
+
+/* Exposes the pool and its attributes through sysfs. */
+static struct kobject *pmalloc_connect(struct pmalloc_data *data)
+{
+	const struct attribute *attrs[] = {
+		&data->attr_protected.attr,
+		&data->attr_avail.attr,
+		&data->attr_size.attr,
+		&data->attr_chunks.attr,
+		NULL
+	};
+	struct kobject *kobj;
+
+	kobj = kobject_create_and_add(data->pool->name, pmalloc_kobject);
+	if (unlikely(!kobj))
+		return NULL;
+
+	if (unlikely(sysfs_create_files(kobj, attrs) < 0)) {
+		kobject_put(kobj);
+		kobj = NULL;
+	}
+	return kobj;
+}
+
+/* Removes the pool and its attributes from sysfs. */
+static void pmalloc_disconnect(struct pmalloc_data *data,
+			       struct kobject *kobj)
+{
+	const struct attribute *attrs[] = {
+		&data->attr_protected.attr,
+		&data->attr_avail.attr,
+		&data->attr_size.attr,
+		&data->attr_chunks.attr,
+		NULL
+	};
+
+	sysfs_remove_files(kobj, attrs);
+	kobject_put(kobj);
+}
+
+/* Declares an attribute of the pool. */
+#define pmalloc_attr_init(data, attr_name) \
+do { \
+	sysfs_attr_init(&data->attr_##attr_name.attr); \
+	data->attr_##attr_name.attr.name = #attr_name; \
+	data->attr_##attr_name.attr.mode = VERIFY_OCTAL_PERMISSIONS(0400); \
+	data->attr_##attr_name.show = pmalloc_pool_show_##attr_name; \
+} while (0)
+
+struct gen_pool *pmalloc_create_pool(const char *name, int min_alloc_order)
+{
+	struct gen_pool *pool;
+	const char *pool_name;
+	struct pmalloc_data *data;
+
+	if (!name) {
+		WARN_ON(1);
+		return NULL;
+	}
+
+	if (min_alloc_order < 0)
+		min_alloc_order = ilog2(sizeof(unsigned long));
+
+	pool = gen_pool_create(min_alloc_order, NUMA_NO_NODE);
+	if (unlikely(!pool))
+		return NULL;
+
+	mutex_lock(&pmalloc_mutex);
+	list_for_each_entry(data, pmalloc_list, node)
+		if (!strcmp(name, data->pool->name))
+			goto same_name_err;
+
+	pool_name = kstrdup(name, GFP_KERNEL);
+	if (unlikely(!pool_name))
+		goto name_alloc_err;
+
+	data = kzalloc(sizeof(struct pmalloc_data), GFP_KERNEL);
+	if (unlikely(!data))
+		goto data_alloc_err;
+
+	data->protected = false;
+	data->pool = pool;
+	pmalloc_attr_init(data, protected);
+	pmalloc_attr_init(data, avail);
+	pmalloc_attr_init(data, size);
+	pmalloc_attr_init(data, chunks);
+	pool->data = data;
+	pool->name = pool_name;
+
+	list_add(&data->node, pmalloc_list);
+	if (pmalloc_list == &pmalloc_final_list)
+		data->pool_kobject = pmalloc_connect(data);
+	mutex_unlock(&pmalloc_mutex);
+	return pool;
+
+data_alloc_err:
+	kfree(pool_name);
+name_alloc_err:
+same_name_err:
+	mutex_unlock(&pmalloc_mutex);
+	gen_pool_destroy(pool);
+	return NULL;
+}
+
+static inline int check_alloc_params(struct gen_pool *pool, size_t req_size)
+{
+	struct pmalloc_data *data;
+
+	if (unlikely(!req_size || !pool))
+		return -1;
+
+	data = pool->data;
+
+	if (data == NULL)
+		return -1;
+
+	if (unlikely(data->protected)) {
+		WARN_ON(1);
+		return -1;
+	}
+	return 0;
+}
+
+
+static inline bool chunk_tagging(void *chunk, bool tag)
+{
+	struct vm_struct *area;
+	struct page *page;
+
+	if (!is_vmalloc_addr(chunk))
+		return false;
+
+	page = vmalloc_to_page(chunk);
+	if (unlikely(!page))
+		return false;
+
+	area = page->area;
+	if (tag)
+		area->flags |= VM_PMALLOC;
+	else
+		area->flags &= ~VM_PMALLOC;
+	return true;
+}
+
+
+static inline bool tag_chunk(void *chunk)
+{
+	return chunk_tagging(chunk, true);
+}
+
+
+static inline bool untag_chunk(void *chunk)
+{
+	return chunk_tagging(chunk, false);
+}
+
+enum {
+	INVALID_PMALLOC_OBJECT = -1,
+	NOT_PMALLOC_OBJECT = 0,
+	VALID_PMALLOC_OBJECT = 1,
+};
+
+int is_pmalloc_object(const void *ptr, const unsigned long n)
+{
+	struct vm_struct *area;
+	struct page *page;
+	unsigned long area_start;
+	unsigned long area_end;
+	unsigned long object_start;
+	unsigned long object_end;
+
+
+	/*
+	 * is_pmalloc_object gets called pretty late, so chances are high
+	 * that the object is indeed of vmalloc type
+	 */
+	if (unlikely(!is_vmalloc_addr(ptr)))
+		return NOT_PMALLOC_OBJECT;
+
+	page = vmalloc_to_page(ptr);
+	if (unlikely(!page))
+		return NOT_PMALLOC_OBJECT;
+
+	area = page->area;
+
+	if (likely(!(area->flags & VM_PMALLOC)))
+		return NOT_PMALLOC_OBJECT;
+
+	area_start = (unsigned long)area->addr;
+	area_end = area_start + area->nr_pages * PAGE_SIZE - 1;
+	object_start = (unsigned long)ptr;
+	object_end = object_start + n - 1;
+
+	if (likely((area_start <= object_start) &&
+		   (object_end <= area_end)))
+		return VALID_PMALLOC_OBJECT;
+	else
+		return INVALID_PMALLOC_OBJECT;
+}
+
+
+bool pmalloc_prealloc(struct gen_pool *pool, size_t size)
+{
+	void *chunk;
+	size_t chunk_size;
+	int add_error;
+
+	if (check_alloc_params(pool, size))
+		return false;
+
+	/* Expand pool */
+	chunk_size = roundup(size, PAGE_SIZE);
+	chunk = vmalloc(chunk_size);
+	if (unlikely(chunk == NULL))
+		return false;
+
+	/* Tag the chunk, as pmalloc does, so hardened usercopy can see it. */
+	if (unlikely(!tag_chunk(chunk)))
+		goto free;
+
+	/* Locking is already done inside gen_pool_add */
+	add_error = gen_pool_add(pool, (unsigned long)chunk, chunk_size,
+				 NUMA_NO_NODE);
+	if (unlikely(add_error != 0))
+		goto abort;
+
+	return true;
+abort:
+	untag_chunk(chunk);
+free:
+	vfree_atomic(chunk);
+	return false;
+}
+
+void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp)
+{
+	void *chunk;
+	size_t chunk_size;
+	int add_error;
+	unsigned long retval;
+
+	if (check_alloc_params(pool, size))
+		return NULL;
+
+retry_alloc_from_pool:
+	retval = gen_pool_alloc(pool, size);
+	if (retval)
+		goto return_allocation;
+
+	if (unlikely((gfp & __GFP_ATOMIC))) {
+		if (unlikely((gfp & __GFP_NOFAIL)))
+			goto retry_alloc_from_pool;
+		else
+			return NULL;
+	}
+
+	/* Expand pool */
+	chunk_size = roundup(size, PAGE_SIZE);
+	chunk = vmalloc(chunk_size);
+	if (unlikely(!chunk)) {
+		if (unlikely((gfp & __GFP_NOFAIL)))
+			goto retry_alloc_from_pool;
+		else
+			return NULL;
+	}
+	if (unlikely(!tag_chunk(chunk)))
+		goto free;
+
+	/* Locking is already done inside gen_pool_add */
+	add_error = gen_pool_add(pool, (unsigned long)chunk, chunk_size,
+				 NUMA_NO_NODE);
+	if (unlikely(add_error))
+		goto abort;
+
+	retval = gen_pool_alloc(pool, size);
+	if (retval) {
+return_allocation:
+		*(size_t *)retval = size;
+		if (gfp & __GFP_ZERO)
+			memset((void *)retval, 0, size);
+		return (void *)retval;
+	}
+	/*
+	 * Here there is no test for __GFP_NOFAIL because, in case of
+	 * concurrent allocation, one thread might add a chunk to the
+	 * pool and this memory could be allocated by another thread,
+	 * before the first thread gets a chance to use it.
+	 * As long as vmalloc succeeds, it's ok to retry.
+	 */
+	goto retry_alloc_from_pool;
+abort:
+	untag_chunk(chunk);
+free:
+	vfree_atomic(chunk);
+	return NULL;
+}
+
+static void pmalloc_chunk_set_protection(struct gen_pool *pool,
+
+					 struct gen_pool_chunk *chunk,
+					 void *data)
+{
+	const bool *flag = data;
+	size_t chunk_size = chunk->end_addr + 1 - chunk->start_addr;
+	unsigned long pages = chunk_size / PAGE_SIZE;
+
+	BUG_ON(chunk_size & (PAGE_SIZE - 1));
+
+	if (*flag)
+		set_memory_ro(chunk->start_addr, pages);
+	else
+		set_memory_rw(chunk->start_addr, pages);
+}
+
+static int pmalloc_pool_set_protection(struct gen_pool *pool, bool protection)
+{
+	struct pmalloc_data *data;
+	struct gen_pool_chunk *chunk;
+
+	if (unlikely(!pool))
+		return -EINVAL;
+
+	data = pool->data;
+
+	if (unlikely(!data))
+		return -EINVAL;
+
+	if (unlikely(data->protected == protection)) {
+		WARN_ON(1);
+		return 0;
+	}
+
+	data->protected = protection;
+	list_for_each_entry(chunk, &(pool)->chunks, next_chunk)
+		pmalloc_chunk_set_protection(pool, chunk, &protection);
+	return 0;
+}
+
+int pmalloc_protect_pool(struct gen_pool *pool)
+{
+	return pmalloc_pool_set_protection(pool, true);
+}
+
+
+static void pmalloc_chunk_free(struct gen_pool *pool,
+			       struct gen_pool_chunk *chunk, void *data)
+{
+	untag_chunk((void *)chunk->start_addr);
+	gen_pool_flush_chunk(pool, chunk);
+	vfree_atomic((void *)chunk->start_addr);
+}
+
+
+int pmalloc_destroy_pool(struct gen_pool *pool)
+{
+	struct pmalloc_data *data;
+
+	if (unlikely(pool == NULL))
+		return -EINVAL;
+
+	data = pool->data;
+
+	if (unlikely(data == NULL))
+		return -EINVAL;
+
+	mutex_lock(&pmalloc_mutex);
+	list_del(&data->node);
+	mutex_unlock(&pmalloc_mutex);
+
+	if (likely(data->pool_kobject))
+		pmalloc_disconnect(data, data->pool_kobject);
+
+	pmalloc_pool_set_protection(pool, false);
+	gen_pool_for_each_chunk(pool, pmalloc_chunk_free, NULL);
+	gen_pool_destroy(pool);
+	kfree(data);
+	return 0;
+}
+
+/*
+ * When the sysfs is ready to receive registrations, connect all the
+ * pools previously created. Also enable further pools to be connected
+ * right away.
+ */
+static int __init pmalloc_late_init(void)
+{
+	struct pmalloc_data *data, *n;
+
+	pmalloc_kobject = kobject_create_and_add("pmalloc", kernel_kobj);
+
+	mutex_lock(&pmalloc_mutex);
+	pmalloc_list = &pmalloc_final_list;
+
+	if (likely(pmalloc_kobject != NULL)) {
+		list_for_each_entry_safe(data, n, &pmalloc_tmp_list, node) {
+			list_move(&data->node, &pmalloc_final_list);
+			data->pool_kobject = pmalloc_connect(data);
+		}
+	}
+	mutex_unlock(&pmalloc_mutex);
+	return 0;
+}
+late_initcall(pmalloc_late_init);
diff --git a/mm/usercopy.c b/mm/usercopy.c
index e9e9325f7638..946ce051e296 100644
--- a/mm/usercopy.c
+++ b/mm/usercopy.c
@@ -240,6 +240,36 @@ static inline void check_heap_object(const void *ptr, unsigned long n,
 	}
 }
 
+#ifdef CONFIG_PROTECTABLE_MEMORY
+
+int is_pmalloc_object(const void *ptr, const unsigned long n);
+
+static void check_pmalloc_object(const void *ptr, unsigned long n,
+				 bool to_user)
+{
+	int retv;
+
+	retv = is_pmalloc_object(ptr, n);
+	if (unlikely(retv)) {
+		if (unlikely(!to_user))
+			usercopy_abort("pmalloc",
+				       "trying to write to pmalloc object",
+				       to_user, (const unsigned long)ptr, n);
+		if (retv < 0)
+			usercopy_abort("pmalloc",
+				       "invalid pmalloc object",
+				       to_user, (const unsigned long)ptr, n);
+	}
+}
+
+#else
+
+static void check_pmalloc_object(const void *ptr, unsigned long n,
+				 bool to_user)
+{
+}
+#endif
+
 /*
  * Validates that the given object is:
  * - not bogus address
@@ -277,5 +307,8 @@ void __check_object_size(const void *ptr, unsigned long n, bool to_user)
 
 	/* Check for object in kernel to avoid text exposure. */
 	check_kernel_text_object((const unsigned long)ptr, n, to_user);
+
+	/* Check if object is from a pmalloc chunk. */
+	check_pmalloc_object(ptr, n, to_user);
 }
 EXPORT_SYMBOL(__check_object_size);
-- 
2.14.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 146+ messages in thread

* [PATCH 4/6] Protectable Memory
@ 2018-02-12 16:52   ` Igor Stoppa
  0 siblings, 0 replies; 146+ messages in thread
From: Igor Stoppa @ 2018-02-12 16:52 UTC (permalink / raw)
  To: willy, rdunlap, corbet, keescook, mhocko, labbott, jglisse, hch
  Cc: cl, linux-security-module, linux-mm, linux-kernel,
	kernel-hardening, Igor Stoppa

The MMU available in many systems running Linux can often provide R/O
protection to the memory pages it handles.

However, MMU-based protection works efficiently only when those pages
contain exclusively data that will not need further modification.
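
To make the page-granularity point concrete, here is a small userspace C
sketch (not kernel code; page_span() is a hypothetical helper and a
power-of-two page size is assumed) that counts how many pages an object
touches. Each of those pages can only be made R/O as a whole, so any
unrelated data sharing them would be write-protected as well.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of pages of size page_size touched by the
 * object [addr, addr + len). Assumes len > 0 and a power-of-two page_size.
 */
static size_t page_span(uintptr_t addr, size_t len, size_t page_size)
{
	uintptr_t first, last;

	assert(len > 0 && (page_size & (page_size - 1)) == 0);
	/* Align the first and the last byte of the object down to a page. */
	first = addr & ~(uintptr_t)(page_size - 1);
	last = (addr + len - 1) & ~(uintptr_t)(page_size - 1);
	return (size_t)((last - first) / page_size) + 1;
}
```

For example, a 2-byte object at offset 4095 straddles a 4096-byte page
boundary and touches 2 pages.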

Statically allocated variables can be segregated into a dedicated
section, but this approach does not carry over to dynamically allocated
ones.

Currently, dynamic allocation provides no means of grouping variables
into memory pages that contain exclusively data suitable for conversion
to read-only access mode.

The allocator provided here (pmalloc, the protectable memory allocator)
introduces the concept of pools of protectable memory.

A module can request a pool and then direct any allocation request to the
pool handle it has received.

Once all the chunks of memory associated with a specific pool are
initialized, the pool can be protected.

After this point, the pool can only be destroyed (it is up to the module
to avoid any further references to the pool's memory once destruction has
been invoked).

Destruction is mainly meant for releasing memory when a module is
unloaded.

A module can have as many pools as needed, for example to support the
protection of data that is initialized in sufficiently distinct phases.
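
The pool lifecycle described above (allocate, initialize, protect,
destroy) can be modelled in userspace roughly as follows. This is only a
sketch under stated assumptions: a single mmap()ed region, a bump
allocator and mprotect() stand in for vmalloc chunks, genalloc and
set_memory_ro(), and all the toy_* names are hypothetical, not part of
the pmalloc API.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Toy protectable pool (hypothetical, not the pmalloc API). */
struct toy_pool {
	unsigned char *base;
	size_t size;	/* bytes mapped, a multiple of the page size */
	size_t used;	/* bytes handed out so far */
	bool protected;	/* set once, never cleared */
};

static bool toy_pool_create(struct toy_pool *p, size_t size)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	size_t rounded = (size + page - 1) / page * page;
	void *mem;

	if (rounded == 0)
		return false;
	mem = mmap(NULL, rounded, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (mem == MAP_FAILED)
		return false;
	p->base = mem;
	p->size = rounded;
	p->used = 0;
	p->protected = false;
	return true;
}

/* Like pmalloc(), refuse any allocation once the pool is protected. */
static void *toy_alloc(struct toy_pool *p, size_t n)
{
	void *ret;

	if (p->protected || n == 0 || n > p->size)
		return NULL;
	/* Keep returned pointers aligned to sizeof(long). */
	n = (n + sizeof(long) - 1) / sizeof(long) * sizeof(long);
	if (n > p->size - p->used)
		return NULL;
	ret = p->base + p->used;
	p->used += n;
	return ret;
}

/* One-way transition to read-only: there is no matching "unprotect". */
static int toy_protect(struct toy_pool *p)
{
	if (mprotect(p->base, p->size, PROT_READ) != 0)
		return -1;
	p->protected = true;
	return 0;
}

/* Destroying the pool is the only way to get the memory back. */
static void toy_destroy(struct toy_pool *p)
{
	munmap(p->base, p->size);
	p->base = NULL;
	p->size = 0;
	p->used = 0;
}
```

After toy_protect(), the data can still be read, while a write through
the same pointer would fault with SIGSEGV, roughly the userspace
counterpart of the R/O mapping established by pmalloc_protect_pool().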

Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com>
---
 include/linux/genalloc.h |   3 +
 include/linux/pmalloc.h  | 242 +++++++++++++++++++++++
 include/linux/vmalloc.h  |   1 +
 lib/genalloc.c           |  27 +++
 mm/Kconfig               |   6 +
 mm/Makefile              |   1 +
 mm/pmalloc.c             | 499 +++++++++++++++++++++++++++++++++++++++++++++++
 mm/usercopy.c            |  33 ++++
 8 files changed, 812 insertions(+)
 create mode 100644 include/linux/pmalloc.h
 create mode 100644 mm/pmalloc.c

diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
index dcaa33e74b1c..b6c4cea9fbd8 100644
--- a/include/linux/genalloc.h
+++ b/include/linux/genalloc.h
@@ -121,6 +121,9 @@ extern unsigned long gen_pool_alloc_algo(struct gen_pool *, size_t,
 extern void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size,
 		dma_addr_t *dma);
 extern void gen_pool_free(struct gen_pool *, unsigned long, size_t);
+
+extern void gen_pool_flush_chunk(struct gen_pool *pool,
+				 struct gen_pool_chunk *chunk);
 extern void gen_pool_for_each_chunk(struct gen_pool *,
 	void (*)(struct gen_pool *, struct gen_pool_chunk *, void *), void *);
 extern size_t gen_pool_avail(struct gen_pool *);
diff --git a/include/linux/pmalloc.h b/include/linux/pmalloc.h
new file mode 100644
index 000000000000..afc2068d5545
--- /dev/null
+++ b/include/linux/pmalloc.h
@@ -0,0 +1,242 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * pmalloc.h: Header for Protectable Memory Allocator
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.stoppa@huawei.com>
+ */
+
+#ifndef _LINUX_PMALLOC_H
+#define _LINUX_PMALLOC_H
+
+
+#include <linux/genalloc.h>
+#include <linux/string.h>
+
+#define PMALLOC_DEFAULT_ALLOC_ORDER (-1)
+
+/*
+ * Library for dynamic allocation of pools of memory that can be,
+ * after initialization, marked as read-only.
+ *
+ * This is intended to complement __ro_after_init, for those cases
+ * where either it is not possible to know the initialization value before
+ * init is completed, or the amount of data is variable and can be
+ * determined only at run-time.
+ *
+ * ***WARNING***
+ * The user of the API is expected to synchronize:
+ * 1) allocation,
+ * 2) writes to the allocated memory,
+ * 3) write protection of the pool,
+ * 4) freeing of the allocated memory, and
+ * 5) destruction of the pool.
+ *
+ * For a non-threaded scenario, this type of locking is not even required.
+ *
+ * Even if the library were to provide support for locking, point 2)
+ * would still depend on the user taking the lock.
+ */
+
+
+/**
+ * pmalloc_create_pool() - create a new protectable memory pool
+ * @name: the name of the pool, enforced to be unique
+ * @min_alloc_order: log2 of the minimum allocation size obtainable
+ *                   from the pool
+ *
+ * Creates a new (empty) memory pool for allocation of protectable
+ * memory. Memory will be allocated upon request (through pmalloc).
+ *
+ * Return:
+ * * pointer to the new pool	- success
+ * * NULL			- error
+ */
+struct gen_pool *pmalloc_create_pool(const char *name,
+					 int min_alloc_order);
+
+/**
+ * is_pmalloc_object() - validates the existence of an alleged object
+ * @ptr: address of the object
+ * @n: size of the object, in bytes
+ *
+ * Return:
+ * * 0		- the object does not belong to pmalloc
+ * * 1		- the object belongs to pmalloc
+ * * \-1	- the object overlaps pmalloc memory incorrectly
+ */
+int is_pmalloc_object(const void *ptr, const unsigned long n);
+
+/**
+ * pmalloc_prealloc() - tries to allocate a memory chunk of the requested size
+ * @pool: handle to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ *
+ * Prepares a chunk of the requested size.
+ * This is intended to both minimize latency in later memory requests and
+ * avoid sleeping during allocation.
+ * Memory allocated with prealloc is stored in one single chunk, as
+ * opposed to what is allocated on-demand when pmalloc runs out of free
+ * space already existing in the pool and has to invoke vmalloc.
+ * One additional advantage of pre-allocating larger chunks of memory is
+ * that the total slack tends to be smaller.
+ *
+ * Return:
+ * * true	- the vmalloc call was successful
+ * * false	- error
+ */
+bool pmalloc_prealloc(struct gen_pool *pool, size_t size);
+
+/**
+ * pmalloc() - allocate protectable memory from a pool
+ * @pool: handle to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ * @gfp: flags for page allocation
+ *
+ * Allocates memory from an unprotected pool. If the pool doesn't have
+ * enough memory, and the request did not include GFP_ATOMIC, an attempt
+ * is made to add a new chunk of memory to the pool
+ * (a multiple of PAGE_SIZE), in order to fit the new request.
+ * Otherwise, NULL is returned.
+ *
+ * Return:
+ * * pointer to the memory requested	- success
+ * * NULL				- either no memory available or
+ *					  pool already read-only
+ */
+void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp);
+
+
+/**
+ * pzalloc() - zero-initialized version of pmalloc
+ * @pool: handle to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ * @gfp: flags for page allocation
+ *
+ * Executes pmalloc, initializing the memory requested to 0,
+ * before returning the pointer to it.
+ *
+ * Return:
+ * * pointer to the memory requested	- success
+ * * NULL				- either no memory available or
+ *					  pool already read-only
+ */
+static inline void *pzalloc(struct gen_pool *pool, size_t size, gfp_t gfp)
+{
+	return pmalloc(pool, size, gfp | __GFP_ZERO);
+}
+
+/**
+ * pmalloc_array() - allocates an array according to the parameters
+ * @pool: handle to the pool to be used for memory allocation
+ * @n: number of elements in the array
+ * @size: amount of memory (in bytes) requested for each element
+ * @flags: flags for page allocation
+ *
+ * Executes pmalloc, if it has a chance to succeed.
+ *
+ * Return:
+ * * the pmalloc result	- success
+ * * NULL		- error
+ */
+static inline void *pmalloc_array(struct gen_pool *pool, size_t n,
+				  size_t size, gfp_t flags)
+{
+	if (unlikely(!(pool && n && size) || n > (size_t)-1 / size))
+		return NULL;
+	return pmalloc(pool, n * size, flags);
+}
+
+/**
+ * pcalloc() - allocates a 0-initialized array according to the parameters
+ * @pool: handle to the pool to be used for memory allocation
+ * @n: number of elements in the array
+ * @size: amount of memory (in bytes) requested
+ * @flags: flags for page allocation
+ *
+ * Executes pmalloc_array, if it has a chance to succeed.
+ *
+ * Return:
+ * * the pmalloc result	- success
+ * * NULL		- error
+ */
+static inline void *pcalloc(struct gen_pool *pool, size_t n,
+			    size_t size, gfp_t flags)
+{
+	return pmalloc_array(pool, n, size, flags | __GFP_ZERO);
+}
+
+/**
+ * pstrdup() - duplicate a string, using pmalloc as allocator
+ * @pool: handle to the pool to be used for memory allocation
+ * @s: string to duplicate
+ * @gfp: flags for page allocation
+ *
+ * Generates a copy of the given string, allocating sufficient memory
+ * from the given pmalloc pool.
+ *
+ * Return:
+ * * pointer to the replica	- success
+ * * NULL			- error
+ */
+static inline char *pstrdup(struct gen_pool *pool, const char *s, gfp_t gfp)
+{
+	size_t len;
+	char *buf;
+
+	if (unlikely(pool == NULL || s == NULL))
+		return NULL;
+
+	len = strlen(s) + 1;
+	buf = pmalloc(pool, len, gfp);
+	if (likely(buf))
+		strncpy(buf, s, len);
+	return buf;
+}
+
+/**
+ * pmalloc_protect_pool() - turn a read/write pool read-only
+ * @pool: the pool to protect
+ *
+ * Write-protects all the memory chunks assigned to the pool.
+ * This prevents any further allocation.
+ *
+ * Return:
+ * * 0		- success
+ * * -EINVAL	- error
+ */
+int pmalloc_protect_pool(struct gen_pool *pool);
+
+/**
+ * pfree() - mark as unused memory that was previously in use
+ * @pool: handle to the pool to be used for memory allocation
+ * @addr: the beginning of the memory area to be freed
+ *
+ * The behavior of pfree is different, depending on the state of the
+ * protection.
+ * If the pool is not yet protected, the memory is marked as unused and
+ * will be available for further allocations.
+ * If the pool is already protected, the memory is marked as unused, but
+ * it will still be impossible to perform further allocation, because of
+ * the existing protection.
+ * The freed memory, in this case, will be truly released only when the
+ * pool is destroyed.
+ */
+static inline void pfree(struct gen_pool *pool, const void *addr)
+{
+	gen_pool_free(pool, (unsigned long)addr, 0);
+}
+
+/**
+ * pmalloc_destroy_pool() - destroys a pool and all the associated memory
+ * @pool: the pool to destroy
+ *
+ * All the memory that was allocated through pmalloc in the pool will be freed.
+ *
+ * Return:
+ * * 0		- success
+ * * -EINVAL	- error
+ */
+int pmalloc_destroy_pool(struct gen_pool *pool);
+
+#endif
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 1e5d8c392f15..116d280cca53 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -20,6 +20,7 @@ struct notifier_block;		/* in notifier.h */
 #define VM_UNINITIALIZED	0x00000020	/* vm_struct is not fully initialized */
 #define VM_NO_GUARD		0x00000040      /* don't add guard page */
 #define VM_KASAN		0x00000080      /* has allocated kasan shadow memory */
+#define VM_PMALLOC		0x00000100	/* pmalloc area - see docs */
 /* bits [20..32] reserved for arch specific ioremap internals */
 
 /*
diff --git a/lib/genalloc.c b/lib/genalloc.c
index 87f62f31b52f..24ed35035095 100644
--- a/lib/genalloc.c
+++ b/lib/genalloc.c
@@ -625,6 +625,33 @@ void gen_pool_free(struct gen_pool *pool, unsigned long addr, size_t size)
 }
 EXPORT_SYMBOL(gen_pool_free);
 
+
+/**
+ * gen_pool_flush_chunk() - drops all the allocations from a specific chunk
+ * @pool:	the generic memory pool
+ * @chunk:	The chunk to wipe clear.
+ *
+ * This is meant to be called only while destroying a pool. It's up to the
+ * caller to avoid races, but really, at this point the pool should have
+ * already been retired and it should have become unavailable for any other
+ * sort of operation.
+ */
+void gen_pool_flush_chunk(struct gen_pool *pool,
+			  struct gen_pool_chunk *chunk)
+{
+	size_t size;
+
+	if (unlikely(!(pool && chunk)))
+		return;
+
+	size = chunk->end_addr + 1 - chunk->start_addr;
+	memset(chunk->entries, 0,
+	       DIV_ROUND_UP((size >> pool->min_alloc_order) * BITS_PER_ENTRY,
+			    BITS_PER_BYTE));
+	atomic_long_set(&chunk->avail, size);
+}
+
+
 /**
  * gen_pool_for_each_chunk() - call func for every chunk of generic memory pool
  * @pool:	the generic memory pool
diff --git a/mm/Kconfig b/mm/Kconfig
index c782e8fb7235..be578fbdce6d 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -760,3 +760,9 @@ config GUP_BENCHMARK
 	  performance of get_user_pages_fast().
 
 	  See tools/testing/selftests/vm/gup_benchmark.c
+
+config PROTECTABLE_MEMORY
+	bool
+	depends on ARCH_HAS_SET_MEMORY
+	select GENERIC_ALLOCATOR
+	default y
diff --git a/mm/Makefile b/mm/Makefile
index e669f02c5a54..959fdbdac118 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -65,6 +65,7 @@ obj-$(CONFIG_SPARSEMEM)	+= sparse.o
 obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o
 obj-$(CONFIG_SLOB) += slob.o
 obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o
+obj-$(CONFIG_PROTECTABLE_MEMORY) += pmalloc.o
 obj-$(CONFIG_KSM) += ksm.o
 obj-$(CONFIG_PAGE_POISONING) += page_poison.o
 obj-$(CONFIG_SLAB) += slab.o
diff --git a/mm/pmalloc.c b/mm/pmalloc.c
new file mode 100644
index 000000000000..abddba90a9f6
--- /dev/null
+++ b/mm/pmalloc.c
@@ -0,0 +1,499 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * pmalloc.c: Protectable Memory Allocator
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.stoppa@huawei.com>
+ */
+
+#include <linux/printk.h>
+#include <linux/init.h>
+#include <linux/mm.h>
+#include <linux/vmalloc.h>
+#include <linux/genalloc.h>
+#include <linux/kernel.h>
+#include <linux/log2.h>
+#include <linux/slab.h>
+#include <linux/device.h>
+#include <linux/atomic.h>
+#include <linux/rculist.h>
+#include <linux/set_memory.h>
+#include <asm/cacheflush.h>
+#include <asm/page.h>
+
+#include <linux/pmalloc.h>
+/*
+ * pmalloc_data contains the data specific to a pmalloc pool,
+ * in a format compatible with the design of gen_alloc.
+ * Some of the fields are used for exposing the corresponding parameter
+ * to userspace, through sysfs.
+ */
+struct pmalloc_data {
+	struct gen_pool *pool;  /* Link back to the associated pool. */
+	bool protected;     /* Status of the pool: RO or RW. */
+	struct kobj_attribute attr_protected; /* Sysfs attribute. */
+	struct kobj_attribute attr_avail;     /* Sysfs attribute. */
+	struct kobj_attribute attr_size;      /* Sysfs attribute. */
+	struct kobj_attribute attr_chunks;    /* Sysfs attribute. */
+	struct kobject *pool_kobject;
+	struct list_head node; /* list of pools */
+};
+
+static LIST_HEAD(pmalloc_final_list);
+static LIST_HEAD(pmalloc_tmp_list);
+static struct list_head *pmalloc_list = &pmalloc_tmp_list;
+static DEFINE_MUTEX(pmalloc_mutex);
+static struct kobject *pmalloc_kobject;
+
+static ssize_t pmalloc_pool_show_protected(struct kobject *dev,
+					   struct kobj_attribute *attr,
+					   char *buf)
+{
+	struct pmalloc_data *data;
+
+	data = container_of(attr, struct pmalloc_data, attr_protected);
+	if (data->protected)
+		return sprintf(buf, "protected\n");
+	else
+		return sprintf(buf, "unprotected\n");
+}
+
+static ssize_t pmalloc_pool_show_avail(struct kobject *dev,
+				       struct kobj_attribute *attr,
+				       char *buf)
+{
+	struct pmalloc_data *data;
+
+	data = container_of(attr, struct pmalloc_data, attr_avail);
+	return sprintf(buf, "%lu\n",
+		       (unsigned long)gen_pool_avail(data->pool));
+}
+
+static ssize_t pmalloc_pool_show_size(struct kobject *dev,
+				      struct kobj_attribute *attr,
+				      char *buf)
+{
+	struct pmalloc_data *data;
+
+	data = container_of(attr, struct pmalloc_data, attr_size);
+	return sprintf(buf, "%lu\n",
+		       (unsigned long)gen_pool_size(data->pool));
+}
+
+static void pool_chunk_number(struct gen_pool *pool,
+			      struct gen_pool_chunk *chunk, void *data)
+{
+	unsigned long *counter = data;
+
+	(*counter)++;
+}
+
+static ssize_t pmalloc_pool_show_chunks(struct kobject *dev,
+					struct kobj_attribute *attr,
+					char *buf)
+{
+	struct pmalloc_data *data;
+	unsigned long chunks_num = 0;
+
+	data = container_of(attr, struct pmalloc_data, attr_chunks);
+	gen_pool_for_each_chunk(data->pool, pool_chunk_number, &chunks_num);
+	return sprintf(buf, "%lu\n", chunks_num);
+}
+
+/* Exposes the pool and its attributes through sysfs. */
+static struct kobject *pmalloc_connect(struct pmalloc_data *data)
+{
+	const struct attribute *attrs[] = {
+		&data->attr_protected.attr,
+		&data->attr_avail.attr,
+		&data->attr_size.attr,
+		&data->attr_chunks.attr,
+		NULL
+	};
+	struct kobject *kobj;
+
+	kobj = kobject_create_and_add(data->pool->name, pmalloc_kobject);
+	if (unlikely(!kobj))
+		return NULL;
+
+	if (unlikely(sysfs_create_files(kobj, attrs) < 0)) {
+		kobject_put(kobj);
+		kobj = NULL;
+	}
+	return kobj;
+}
+
+/* Removes the pool and its attributes from sysfs. */
+static void pmalloc_disconnect(struct pmalloc_data *data,
+			       struct kobject *kobj)
+{
+	const struct attribute *attrs[] = {
+		&data->attr_protected.attr,
+		&data->attr_avail.attr,
+		&data->attr_size.attr,
+		&data->attr_chunks.attr,
+		NULL
+	};
+
+	sysfs_remove_files(kobj, attrs);
+	kobject_put(kobj);
+}
+
+/* Declares an attribute of the pool. */
+#define pmalloc_attr_init(data, attr_name) \
+do { \
+	sysfs_attr_init(&data->attr_##attr_name.attr); \
+	data->attr_##attr_name.attr.name = #attr_name; \
+	data->attr_##attr_name.attr.mode = VERIFY_OCTAL_PERMISSIONS(0400); \
+	data->attr_##attr_name.show = pmalloc_pool_show_##attr_name; \
+} while (0)
+
+struct gen_pool *pmalloc_create_pool(const char *name, int min_alloc_order)
+{
+	struct gen_pool *pool;
+	const char *pool_name;
+	struct pmalloc_data *data;
+
+	if (!name) {
+		WARN_ON(1);
+		return NULL;
+	}
+
+	if (min_alloc_order < 0)
+		min_alloc_order = ilog2(sizeof(unsigned long));
+
+	pool = gen_pool_create(min_alloc_order, NUMA_NO_NODE);
+	if (unlikely(!pool))
+		return NULL;
+
+	mutex_lock(&pmalloc_mutex);
+	list_for_each_entry(data, pmalloc_list, node)
+		if (!strcmp(name, data->pool->name))
+			goto same_name_err;
+
+	pool_name = kstrdup(name, GFP_KERNEL);
+	if (unlikely(!pool_name))
+		goto name_alloc_err;
+
+	data = kzalloc(sizeof(struct pmalloc_data), GFP_KERNEL);
+	if (unlikely(!data))
+		goto data_alloc_err;
+
+	data->protected = false;
+	data->pool = pool;
+	pmalloc_attr_init(data, protected);
+	pmalloc_attr_init(data, avail);
+	pmalloc_attr_init(data, size);
+	pmalloc_attr_init(data, chunks);
+	pool->data = data;
+	pool->name = pool_name;
+
+	list_add(&data->node, pmalloc_list);
+	if (pmalloc_list == &pmalloc_final_list)
+		data->pool_kobject = pmalloc_connect(data);
+	mutex_unlock(&pmalloc_mutex);
+	return pool;
+
+data_alloc_err:
+	kfree(pool_name);
+name_alloc_err:
+same_name_err:
+	mutex_unlock(&pmalloc_mutex);
+	gen_pool_destroy(pool);
+	return NULL;
+}
+
+static inline int check_alloc_params(struct gen_pool *pool, size_t req_size)
+{
+	struct pmalloc_data *data;
+
+	if (unlikely(!req_size || !pool))
+		return -1;
+
+	data = pool->data;
+
+	if (data == NULL)
+		return -1;
+
+	if (unlikely(data->protected)) {
+		WARN_ON(1);
+		return -1;
+	}
+	return 0;
+}
+
+
+static inline bool chunk_tagging(void *chunk, bool tag)
+{
+	struct vm_struct *area;
+	struct page *page;
+
+	if (!is_vmalloc_addr(chunk))
+		return false;
+
+	page = vmalloc_to_page(chunk);
+	if (unlikely(!page))
+		return false;
+
+	area = page->area;
+	if (tag)
+		area->flags |= VM_PMALLOC;
+	else
+		area->flags &= ~VM_PMALLOC;
+	return true;
+}
+
+
+static inline bool tag_chunk(void *chunk)
+{
+	return chunk_tagging(chunk, true);
+}
+
+
+static inline bool untag_chunk(void *chunk)
+{
+	return chunk_tagging(chunk, false);
+}
+
+enum {
+	INVALID_PMALLOC_OBJECT = -1,
+	NOT_PMALLOC_OBJECT = 0,
+	VALID_PMALLOC_OBJECT = 1,
+};
+
+int is_pmalloc_object(const void *ptr, const unsigned long n)
+{
+	struct vm_struct *area;
+	struct page *page;
+	unsigned long area_start;
+	unsigned long area_end;
+	unsigned long object_start;
+	unsigned long object_end;
+
+
+	/*
+	 * is_pmalloc_object gets called pretty late, so chances are high
+	 * that the object is indeed of vmalloc type
+	 */
+	if (unlikely(!is_vmalloc_addr(ptr)))
+		return NOT_PMALLOC_OBJECT;
+
+	page = vmalloc_to_page(ptr);
+	if (unlikely(!page))
+		return NOT_PMALLOC_OBJECT;
+
+	area = page->area;
+
+	if (likely(!(area->flags & VM_PMALLOC)))
+		return NOT_PMALLOC_OBJECT;
+
+	area_start = (unsigned long)area->addr;
+	area_end = area_start + area->nr_pages * PAGE_SIZE - 1;
+	object_start = (unsigned long)ptr;
+	object_end = object_start + n - 1;
+
+	if (likely((area_start <= object_start) &&
+		   (object_end <= area_end)))
+		return VALID_PMALLOC_OBJECT;
+	else
+		return INVALID_PMALLOC_OBJECT;
+}
+
+
+bool pmalloc_prealloc(struct gen_pool *pool, size_t size)
+{
+	void *chunk;
+	size_t chunk_size;
+	int add_error;
+
+	if (check_alloc_params(pool, size))
+		return false;
+
+	/* Expand pool */
+	chunk_size = roundup(size, PAGE_SIZE);
+	chunk = vmalloc(chunk_size);
+	if (unlikely(!chunk || !tag_chunk(chunk)))
+		goto free;
+
+	/* Locking is already done inside gen_pool_add */
+	add_error = gen_pool_add(pool, (unsigned long)chunk, chunk_size,
+				 NUMA_NO_NODE);
+	if (unlikely(add_error != 0))
+		goto abort;
+	return true;
+abort:
+	untag_chunk(chunk);
+free:
+	vfree_atomic(chunk);
+	return false;
+}
+
+void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp)
+{
+	void *chunk;
+	size_t chunk_size;
+	int add_error;
+	unsigned long retval;
+
+	if (check_alloc_params(pool, size))
+		return NULL;
+
+retry_alloc_from_pool:
+	retval = gen_pool_alloc(pool, size);
+	if (retval)
+		goto return_allocation;
+
+	if (unlikely((gfp & __GFP_ATOMIC))) {
+		if (unlikely((gfp & __GFP_NOFAIL)))
+			goto retry_alloc_from_pool;
+		else
+			return NULL;
+	}
+
+	/* Expand pool */
+	chunk_size = roundup(size, PAGE_SIZE);
+	chunk = vmalloc(chunk_size);
+	if (unlikely(!chunk)) {
+		if (unlikely((gfp & __GFP_NOFAIL)))
+			goto retry_alloc_from_pool;
+		else
+			return NULL;
+	}
+	if (unlikely(!tag_chunk(chunk)))
+		goto free;
+
+	/* Locking is already done inside gen_pool_add */
+	add_error = gen_pool_add(pool, (unsigned long)chunk, chunk_size,
+				 NUMA_NO_NODE);
+	if (unlikely(add_error))
+		goto abort;
+
+	retval = gen_pool_alloc(pool, size);
+	if (retval) {
+return_allocation:
+		*(size_t *)retval = size;
+		if (gfp & __GFP_ZERO)
+			memset((void *)retval, 0, size);
+		return (void *)retval;
+	}
+	/*
+	 * Here there is no test for __GFP_NOFAIL because, in case of
+	 * concurrent allocation, one thread might add a chunk to the
+	 * pool and this memory could be allocated by another thread,
+	 * before the first thread gets a chance to use it.
+	 * As long as vmalloc succeeds, it's ok to retry.
+	 */
+	goto retry_alloc_from_pool;
+abort:
+	untag_chunk(chunk);
+free:
+	vfree_atomic(chunk);
+	return NULL;
+}
+
+static void pmalloc_chunk_set_protection(struct gen_pool *pool,
+
+					 struct gen_pool_chunk *chunk,
+					 void *data)
+{
+	const bool *flag = data;
+	size_t chunk_size = chunk->end_addr + 1 - chunk->start_addr;
+	unsigned long pages = chunk_size / PAGE_SIZE;
+
+	BUG_ON(chunk_size & (PAGE_SIZE - 1));
+
+	if (*flag)
+		set_memory_ro(chunk->start_addr, pages);
+	else
+		set_memory_rw(chunk->start_addr, pages);
+}
+
+static int pmalloc_pool_set_protection(struct gen_pool *pool, bool protection)
+{
+	struct pmalloc_data *data;
+	struct gen_pool_chunk *chunk;
+
+	if (unlikely(!pool))
+		return -EINVAL;
+
+	data = pool->data;
+
+	if (unlikely(!data))
+		return -EINVAL;
+
+	if (unlikely(data->protected == protection)) {
+		WARN_ON(1);
+		return 0;
+	}
+
+	data->protected = protection;
+	list_for_each_entry(chunk, &(pool)->chunks, next_chunk)
+		pmalloc_chunk_set_protection(pool, chunk, &protection);
+	return 0;
+}
+
+int pmalloc_protect_pool(struct gen_pool *pool)
+{
+	return pmalloc_pool_set_protection(pool, true);
+}
+
+
+static void pmalloc_chunk_free(struct gen_pool *pool,
+			       struct gen_pool_chunk *chunk, void *data)
+{
+	untag_chunk((void *)chunk->start_addr);
+	gen_pool_flush_chunk(pool, chunk);
+	vfree_atomic((void *)chunk->start_addr);
+}
+
+
+int pmalloc_destroy_pool(struct gen_pool *pool)
+{
+	struct pmalloc_data *data;
+
+	if (unlikely(pool == NULL))
+		return -EINVAL;
+
+	data = pool->data;
+
+	if (unlikely(data == NULL))
+		return -EINVAL;
+
+	mutex_lock(&pmalloc_mutex);
+	list_del(&data->node);
+	mutex_unlock(&pmalloc_mutex);
+
+	if (likely(data->pool_kobject))
+		pmalloc_disconnect(data, data->pool_kobject);
+
+	pmalloc_pool_set_protection(pool, false);
+	gen_pool_for_each_chunk(pool, pmalloc_chunk_free, NULL);
+	gen_pool_destroy(pool);
+	kfree(data);
+	return 0;
+}
+
+/*
+ * When the sysfs is ready to receive registrations, connect all the
+ * pools previously created. Also enable further pools to be connected
+ * right away.
+ */
+static int __init pmalloc_late_init(void)
+{
+	struct pmalloc_data *data, *n;
+
+	pmalloc_kobject = kobject_create_and_add("pmalloc", kernel_kobj);
+
+	mutex_lock(&pmalloc_mutex);
+	pmalloc_list = &pmalloc_final_list;
+
+	if (likely(pmalloc_kobject != NULL)) {
+		list_for_each_entry_safe(data, n, &pmalloc_tmp_list, node) {
+			list_move(&data->node, &pmalloc_final_list);
+			data->pool_kobject = pmalloc_connect(data);
+		}
+	}
+	mutex_unlock(&pmalloc_mutex);
+	return 0;
+}
+late_initcall(pmalloc_late_init);
diff --git a/mm/usercopy.c b/mm/usercopy.c
index e9e9325f7638..946ce051e296 100644
--- a/mm/usercopy.c
+++ b/mm/usercopy.c
@@ -240,6 +240,36 @@ static inline void check_heap_object(const void *ptr, unsigned long n,
 	}
 }
 
+#ifdef CONFIG_PROTECTABLE_MEMORY
+
+int is_pmalloc_object(const void *ptr, const unsigned long n);
+
+static void check_pmalloc_object(const void *ptr, unsigned long n,
+				 bool to_user)
+{
+	int retv;
+
+	retv = is_pmalloc_object(ptr, n);
+	if (unlikely(retv)) {
+		if (unlikely(!to_user))
+			usercopy_abort("pmalloc",
+				       "trying to write to pmalloc object",
+				       to_user, (const unsigned long)ptr, n);
+		if (retv < 0)
+			usercopy_abort("pmalloc",
+				       "invalid pmalloc object",
+				       to_user, (const unsigned long)ptr, n);
+	}
+}
+
+#else
+
+static void check_pmalloc_object(const void *ptr, unsigned long n,
+				 bool to_user)
+{
+}
+#endif
+
 /*
  * Validates that the given object is:
  * - not bogus address
@@ -277,5 +307,8 @@ void __check_object_size(const void *ptr, unsigned long n, bool to_user)
 
 	/* Check for object in kernel to avoid text exposure. */
 	check_kernel_text_object((const unsigned long)ptr, n, to_user);
+
+	/* Check if object is from a pmalloc chunk. */
+	check_pmalloc_object(ptr, n, to_user);
 }
 EXPORT_SYMBOL(__check_object_size);
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 146+ messages in thread

* Re: [PATCH 4/6] Protectable Memory
  2018-02-04 22:06     ` Randy Dunlap
@ 2018-02-11  1:04       ` Igor Stoppa
  -1 siblings, 0 replies; 146+ messages in thread
From: Igor Stoppa @ 2018-02-11  1:04 UTC (permalink / raw)
  To: Randy Dunlap, jglisse, keescook, mhocko, labbott, hch, willy
  Cc: cl, linux-security-module, linux-mm, linux-kernel, kernel-hardening

On 05/02/18 00:06, Randy Dunlap wrote:
> On 02/04/2018 08:47 AM, Igor Stoppa wrote:

[...]

>> + * pmalloc_create_pool - create a new protectable memory pool -
> 
> Drop trailing " -".

yes

>> + * @name: the name of the pool, must be unique
> 
> Is that enforced?  Will return NULL if @name is duplicated?

ok, I'll state more clearly that it's enforced
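A minimal sketch of what "enforced" could mean for a pool constructor: scan the existing names and return NULL on a duplicate, as Randy's question suggests. The `named_pool*` identifiers and the fixed-size table are hypothetical. In the kernel, the lookup and the insertion would have to happen under one lock (for instance the `pmalloc_mutex` used elsewhere in this series), otherwise two concurrent callers could both pass the check.

```c
#include <stddef.h>
#include <string.h>

#define MAX_NAMED_POOLS 8	/* hypothetical limit for the sketch */

/* Hypothetical stand-in for the handle returned by pmalloc_create_pool(). */
struct named_pool {
	const char *name;	/* caller keeps the string alive */
};

static struct named_pool named_pools[MAX_NAMED_POOLS];
static size_t named_pool_count;

/* Return: a new pool, or NULL if @name is already taken or no slot is left. */
static struct named_pool *named_pool_create(const char *name)
{
	size_t i;

	for (i = 0; i < named_pool_count; i++)
		if (strcmp(named_pools[i].name, name) == 0)
			return NULL;		/* duplicate name: refuse */
	if (named_pool_count == MAX_NAMED_POOLS)
		return NULL;			/* table full */
	named_pools[named_pool_count].name = name;
	return &named_pools[named_pool_count++];
}
```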

[...]

>> + * @pool: handler to the pool to be used for memory allocation
> 
>              handle (I think)

yes, also for all the other ones

[...]

>> + * avoid sleping during allocation.
> 
>             sleeping

yes

[...]

>> + * opposite to what is allocated on-demand when pmalloc runs out of free
> 
>       opposed to

yes

>> + * space already existing in the pool and has to invoke vmalloc.
>> + *
>> + * Returns true if the vmalloc call was successful, false otherwise.
> 
> Where is the allocated memory (pointer)?  I.e., how does the caller know
> where that memory is?
> Oh, that memory isn't yet available to the caller until it calls pmalloc(), right?

yes, it's a way to:
- preemptively grow the pool before entering atomic context (unlikely to
  be needed, but possible), so that no extra pages have to be allocated
  later (assuming one can estimate the maximum memory that will be
  requested)
- avoid the fragmentation caused by allocating many smaller groups of pages

I'll add an explanation for this.
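The two purposes listed above can be illustrated with a userspace model of the pool's bookkeeping: one up-front reservation lets several later requests, including atomic ones, be served without growing the pool, while an atomic request that does not fit simply fails. This is a hedged sketch: the `toy_*` names are hypothetical, no memory is actually mapped, the 4 KiB page size is an assumption, and free space is kept in one counter even though a real chunked allocator cannot split a request across two chunks.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */

/* Hypothetical bookkeeping only: no memory is actually mapped here. */
struct toy_pool {
	size_t avail;		/* bytes still free in the pool */
	unsigned int chunks;	/* number of vmalloc-like chunks added */
};

static size_t toy_round_up(size_t size)
{
	return (size + TOY_PAGE_SIZE - 1) / TOY_PAGE_SIZE * TOY_PAGE_SIZE;
}

/* Grow the pool once, in a single chunk, while sleeping is still allowed. */
static bool toy_prealloc(struct toy_pool *pool, size_t size)
{
	pool->avail += toy_round_up(size);
	pool->chunks++;
	return true;
}

/*
 * Serve @size bytes from free space; if that is not enough, grow only
 * when the caller may sleep (the atomic case gives up instead).
 */
static bool toy_alloc(struct toy_pool *pool, size_t size, bool atomic)
{
	if (pool->avail < size) {
		if (atomic)
			return false;
		toy_prealloc(pool, size);
	}
	pool->avail -= size;
	return true;
}
```

With a 10000-byte reservation (rounded to 12288), three 4000-byte atomic requests succeed from the single chunk; a fourth atomic one fails, and only a non-atomic request adds a second chunk.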

[...]

>> + * @size: amount of memory (in bytes) requested
>> + * @gfp: flags for page allocation
>> + *
>> + * Allocates memory from an unprotected pool. If the pool doesn't have
>> + * enough memory, and the request did not include GFP_ATOMIC, an attempt
>> + * is made to add a new chunk of memory to the pool
>> + * (a multiple of PAGE_SIZE), in order to fit the new request.
> 
>                                              fill
> What if @size is > PAGE_SIZE?

Nothing special: it gets rounded up to the next multiple of
PAGE_SIZE. vmalloc doesn't only have drawbacks ;-)
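The rounding is the usual power-of-two alignment; in the kernel, `PAGE_ALIGN()` does the same job. A standalone sketch, with a hypothetical helper name and an assumed 4 KiB page:

```c
#include <stddef.h>

#define TOY_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages (a power of two) */

/* Bytes reserved for a @size-byte request: the next multiple of a page. */
static size_t toy_chunk_bytes(size_t size)
{
	return (size + TOY_PAGE_SIZE - 1) & ~(TOY_PAGE_SIZE - 1);
}
```

So a request just over one page (4097 bytes) takes two pages, and a 10000-byte request takes three (12288 bytes).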

[...]

>> + * Returns the pointer to the memory requested upon success,
>> + * NULL otherwise (either no memory available or pool already read-only).
> 
> It would be good to use the
>     * Return:
> kernel-doc notation for return values.

yes, good point, I'm fixing it everywhere in the patchset

[...]

>> + * will be availabel for further allocations.
> 
>               available

yes

[...]

>> +/**
> 
> /** means that the following comments are kernel-doc notation, but these
> comments are not, so just use /* there, please.

yes, also to the others

[...]

>> +	/* is_pmalloc_object gets called pretty late, so chances are high
>> +	 * that the object is indeed of vmalloc type
>> +	 */
> 
> Multi-line comment style is
> 	/*
> 	 * comment1
> 	 * comment..N
> 	 */

yes, also to the others
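For reference, a sketch of a comment layout that satisfies the three review points above: `/**` only where a kernel-doc block follows, a `Return:` section, and multi-line comments with the opening and closing markers on their own lines. The function is hypothetical; its range check merely echoes the `(ptr, n)` pair that `is_pmalloc_object()` receives.

```c
#include <stdbool.h>
#include <stddef.h>

/**
 * toy_range_fits - check whether a range lies inside a buffer
 * @start: offset of the first byte of the range
 * @len: length of the range in bytes
 * @size: size of the buffer in bytes
 *
 * Return: true if [@start, @start + @len) is inside the buffer,
 * false otherwise.
 */
static bool toy_range_fits(size_t start, size_t len, size_t size)
{
	/*
	 * Plain (non kernel-doc) multi-line comment. "start + len" is
	 * never computed, so a huge @len cannot wrap around.
	 */
	return start <= size && len <= size - start;
}
```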

--
igor

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [PATCH 4/6] Protectable Memory
  2018-02-04 16:47   ` Igor Stoppa
  (?)
@ 2018-02-07 22:21     ` kbuild test robot
  -1 siblings, 0 replies; 146+ messages in thread
From: kbuild test robot @ 2018-02-07 22:21 UTC (permalink / raw)
  To: Igor Stoppa
  Cc: kbuild-all, jglisse, keescook, mhocko, labbott, hch, willy, cl,
	linux-security-module, linux-mm, linux-kernel, kernel-hardening,
	Igor Stoppa

[-- Attachment #1: Type: text/plain, Size: 2474 bytes --]

Hi Igor,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on kees/for-next/pstore]
[also build test ERROR on v4.15]
[cannot apply to linus/master mmotm/master next-20180207]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Igor-Stoppa/mm-security-ro-protection-for-dynamic-data/20180207-171252
base:   https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git for-next/pstore
config: um-allmodconfig (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=um 

All errors (new ones prefixed by >>):

   arch/um/drivers/vde.o: In function `vde_open_real':
   (.text+0x951): warning: Using 'getgrnam' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
   (.text+0x79c): warning: Using 'getpwuid' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
   (.text+0xab5): warning: Using 'getaddrinfo' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
   arch/um/drivers/pcap.o: In function `pcap_nametoaddr':
   (.text+0xdee5): warning: Using 'gethostbyname' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
   arch/um/drivers/pcap.o: In function `pcap_nametonetaddr':
   (.text+0xdf85): warning: Using 'getnetbyname' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
   arch/um/drivers/pcap.o: In function `pcap_nametoproto':
   (.text+0xe1a5): warning: Using 'getprotobyname' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
   arch/um/drivers/pcap.o: In function `pcap_nametoport':
   (.text+0xdfd7): warning: Using 'getservbyname' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
   mm/usercopy.o: In function `__check_object_size':
>> (.text+0x3aa): undefined reference to `is_pmalloc_object'
>> collect2: error: ld returned 1 exit status
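The log shows `mm/usercopy.o` referencing `is_pmalloc_object` with no definition linked in for this configuration. The exact cause is not visible without the attached .config. One common kernel idiom that keeps such call sites linkable is to declare the function in a header next to a `static inline` stub, guarded by the same Kconfig symbol that controls whether the defining object is built. The sketch assumes that symbol is `CONFIG_PROTECTABLE_MEMORY` (the one the usercopy hunk tests); it only helps if `mm/pmalloc.o` is built under that same symbol.

```c
/*
 * Hypothetical header fragment: callers always see a declaration, and
 * when the feature is compiled out they get an inline stub instead of
 * a reference the linker cannot resolve.
 */
#ifdef CONFIG_PROTECTABLE_MEMORY
int is_pmalloc_object(const void *ptr, const unsigned long n);
#else
static inline int is_pmalloc_object(const void *ptr, const unsigned long n)
{
	return 0;	/* "not a pmalloc object": the usercopy check is a no-op */
}
#endif
```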

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 20060 bytes --]

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [PATCH 4/6] Protectable Memory
  2018-02-04 16:47   ` Igor Stoppa
  (?)
@ 2018-02-07 10:03     ` kbuild test robot
  -1 siblings, 0 replies; 146+ messages in thread
From: kbuild test robot @ 2018-02-07 10:03 UTC (permalink / raw)
  To: Igor Stoppa
  Cc: kbuild-all, jglisse, keescook, mhocko, labbott, hch, willy, cl,
	linux-security-module, linux-mm, linux-kernel, kernel-hardening,
	Igor Stoppa

[-- Attachment #1: Type: text/plain, Size: 1327 bytes --]

Hi Igor,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on kees/for-next/pstore]
[also build test ERROR on v4.15]
[cannot apply to linus/master mmotm/master next-20180206]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Igor-Stoppa/mm-security-ro-protection-for-dynamic-data/20180207-171252
base:   https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git for-next/pstore
config: i386-tinyconfig (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

Note: the linux-review/Igor-Stoppa/mm-security-ro-protection-for-dynamic-data/20180207-171252 HEAD 99d0cb7905216da7595ef08a781a9be16a8ce687 builds fine.
      It only hurts bisectibility.

All errors (new ones prefixed by >>):

>> mm/pmalloc.c:24:10: fatal error: pmalloc-selftest.h: No such file or directory
    #include "pmalloc-selftest.h"
             ^~~~~~~~~~~~~~~~~~~~
   compilation terminated.

vim +24 mm/pmalloc.c

    23	
  > 24	#include "pmalloc-selftest.h"
    25	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 6806 bytes --]

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [PATCH 4/6] Protectable Memory
  2018-02-04 16:47   ` Igor Stoppa
  (?)
@ 2018-02-04 22:06     ` Randy Dunlap
  -1 siblings, 0 replies; 146+ messages in thread
From: Randy Dunlap @ 2018-02-04 22:06 UTC (permalink / raw)
  To: Igor Stoppa, jglisse, keescook, mhocko, labbott, hch, willy
  Cc: cl, linux-security-module, linux-mm, linux-kernel, kernel-hardening

On 02/04/2018 08:47 AM, Igor Stoppa wrote:
> The MMU available in many systems running Linux can often provide R/O
> protection to the memory pages it handles.
> 
> However, the MMU-based protection works efficiently only when said pages
> contain exclusively data that will not need further modifications.
> 
> Statically allocated variables can be segregated into a dedicated
> section, but this does not sit very well with dynamically allocated
> ones.
> 
> Dynamic allocation does not provide, currently, any means for grouping
> variables in memory pages that would contain exclusively data suitable
> for conversion to read only access mode.
> 
> The allocator here provided (pmalloc - protectable memory allocator)
> introduces the concept of pools of protectable memory.
> 
> A module can request a pool and then refer any allocation request to the
> pool handler it has received.
> 
> Once all the chunks of memory associated to a specific pool are
> initialized, the pool can be protected.
> 
> After this point, the pool can only be destroyed (it is up to the module
> to avoid any further references to the memory from the pool, after
> the destruction is invoked).
> 
> The latter case is mainly meant for releasing memory, when a module is
> unloaded.
> 
> A module can have as many pools as needed, for example to support the
> protection of data that is initialized in sufficiently distinct phases.
> 
> Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com>
> ---
>  include/linux/genalloc.h |   3 +
>  include/linux/pmalloc.h  | 213 ++++++++++++++++++++
>  include/linux/vmalloc.h  |   1 +
>  lib/genalloc.c           |  27 +++
>  mm/Makefile              |   1 +
>  mm/pmalloc.c             | 514 +++++++++++++++++++++++++++++++++++++++++++++++
>  mm/usercopy.c            |  25 ++-
>  7 files changed, 780 insertions(+), 4 deletions(-)
>  create mode 100644 include/linux/pmalloc.h
>  create mode 100644 mm/pmalloc.c
> 
> diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
> index dcaa33e74b1c..b6c4cea9fbd8 100644
> --- a/include/linux/genalloc.h
> +++ b/include/linux/genalloc.h
> @@ -121,6 +121,9 @@ extern unsigned long gen_pool_alloc_algo(struct gen_pool *, size_t,
>  extern void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size,
>  		dma_addr_t *dma);
>  extern void gen_pool_free(struct gen_pool *, unsigned long, size_t);
> +
> +extern void gen_pool_flush_chunk(struct gen_pool *pool,
> +				 struct gen_pool_chunk *chunk);
>  extern void gen_pool_for_each_chunk(struct gen_pool *,
>  	void (*)(struct gen_pool *, struct gen_pool_chunk *, void *), void *);
>  extern size_t gen_pool_avail(struct gen_pool *);
> diff --git a/include/linux/pmalloc.h b/include/linux/pmalloc.h
> new file mode 100644
> index 000000000000..5fa8a78be819
> --- /dev/null
> +++ b/include/linux/pmalloc.h
> @@ -0,0 +1,213 @@
> +/* SPDX-License-Identifier: GPL-2.0
> + *
> + * pmalloc.h: Header for Protectable Memory Allocator
> + *
> + * (C) Copyright 2017 Huawei Technologies Co. Ltd.
> + * Author: Igor Stoppa <igor.stoppa@huawei.com>
> + */
> +
> +#ifndef _PMALLOC_H
> +#define _PMALLOC_H

use        _LINUX_PMALLOC_H_

> +
> +
> +#include <linux/genalloc.h>
> +#include <linux/string.h>
> +
> +#define PMALLOC_DEFAULT_ALLOC_ORDER (-1)
> +
> +/*
> + * Library for dynamic allocation of pools of memory that can be,
> + * after initialization, marked as read-only.
> + *
> + * This is intended to complement __read_only_after_init, for those cases
> + * where either it is not possible to know the initialization value before
> + * init is completed, or the amount of data is variable and can be
> + * determined only at run-time.
> + *
> + * ***WARNING***
> + * The user of the API is expected to synchronize:
> + * 1) allocation,
> + * 2) writes to the allocated memory,
> + * 3) write protection of the pool,
> + * 4) freeing of the allocated memory, and
> + * 5) destruction of the pool.
> + *
> + * For a non-threaded scenario, this type of locking is not even required.
> + *
> + * Even if the library were to provide support for locking, point 2)
> + * would still depend on the user taking the lock.
> + */
> +
> +
> +/**
> + * pmalloc_create_pool - create a new protectable memory pool -

Drop trailing " -".

> + * @name: the name of the pool, must be unique

Is that enforced?  Will return NULL if @name is duplicated?

> + * @min_alloc_order: log2 of the minimum allocation size obtainable
> + *                   from the pool
> + *
> + * Creates a new (empty) memory pool for allocation of protectable
> + * memory. Memory will be allocated upon request (through pmalloc).
> + *
> + * Returns a pointer to the new pool upon success, otherwise a NULL.
> + */
> +struct gen_pool *pmalloc_create_pool(const char *name,
> +					 int min_alloc_order);
> +
> +
> +int is_pmalloc_object(const void *ptr, const unsigned long n);
> +
> +/**
> + * pmalloc_prealloc - tries to allocate a memory chunk of the requested size
> + * @pool: handler to the pool to be used for memory allocation

             handle (I think)

> + * @size: amount of memory (in bytes) requested
> + *
> + * Prepares a chunk of the requested size.
> + * This is intended to both minimize latency in later memory requests and
> + * avoid sleping during allocation.

            sleeping

> + * Memory allocated with prealloc is stored in one single chunk, as

                       with pmalloc_prealloc()

> + * opposite to what is allocated on-demand when pmalloc runs out of free

      opposed to

> + * space already existing in the pool and has to invoke vmalloc.
> + *
> + * Returns true if the vmalloc call was successful, false otherwise.

Where is the allocated memory (pointer)?  I.e., how does the caller know
where that memory is?
Oh, that memory isn't yet available to the caller until it calls pmalloc(), right?
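To make the prealloc semantics concrete: as far as the patch shows, pmalloc_prealloc() only grows the pool and returns no address. The caller obtains memory later through pmalloc(), which then finds free space without expanding the pool. Below is a userspace sketch of that split. toy_pool, toy_prealloc() and toy_alloc() are hypothetical stand-ins, not the genalloc API, and offsets stand in for pointers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy pool: one growable arena, not the genalloc API. */
struct toy_pool {
	size_t capacity;         /* bytes added to the pool so far */
	size_t used;             /* bytes handed out to callers */
	unsigned int grow_calls; /* how many times the pool was expanded */
};

/* Like pmalloc_prealloc(): grows the pool but hands out no address. */
static bool toy_prealloc(struct toy_pool *pool, size_t size)
{
	if (!pool || !size)
		return false;
	pool->capacity += size;
	pool->grow_calls++;
	return true;
}

/*
 * Like pmalloc(): serves from existing capacity and grows only on demand.
 * Returns the offset of the allocation, or -1 on failure.
 */
static long toy_alloc(struct toy_pool *pool, size_t size)
{
	long offset;

	if (!pool || !size)
		return -1;
	if (pool->capacity - pool->used < size && !toy_prealloc(pool, size))
		return -1;
	offset = (long)pool->used;
	pool->used += size;
	assert(pool->used <= pool->capacity);
	return offset;
}
```

The point of the split shows up in grow_calls: after a successful prealloc, the following allocations are served without another expansion. That is where the latency, and in the kernel the possible sleep inside vmalloc(), is avoided.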


> + */
> +bool pmalloc_prealloc(struct gen_pool *pool, size_t size);
> +
> +/**
> + * pmalloc - allocate protectable memory from a pool
> + * @pool: handler to the pool to be used for memory allocation

             handle (?)

> + * @size: amount of memory (in bytes) requested
> + * @gfp: flags for page allocation
> + *
> + * Allocates memory from an unprotected pool. If the pool doesn't have
> + * enough memory, and the request did not include GFP_ATOMIC, an attempt
> + * is made to add a new chunk of memory to the pool
> + * (a multiple of PAGE_SIZE), in order to fit the new request.

                                             fill
What if @size is > PAGE_SIZE?
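Going by the mm/pmalloc.c hunk further down, a request larger than PAGE_SIZE appears to be handled by the expansion path. The new chunk is sized with roundup(size, PAGE_SIZE) and obtained from vmalloc(), so it simply spans several pages. The following userspace sketch shows that arithmetic, assuming 4 KiB pages. TOY_PAGE_SIZE, chunk_size_for() and chunk_pages_for() are illustrative names. The helper's comment also uses the "Return:" kernel-doc section suggested below.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096UL /* assumption: 4 KiB pages */

/**
 * chunk_size_for - size of the chunk added to the pool for one request
 * @size: requested size in bytes, must be non-zero
 *
 * Same arithmetic as the kernel's roundup(size, PAGE_SIZE).
 *
 * Return: @size rounded up to the next multiple of TOY_PAGE_SIZE.
 */
static size_t chunk_size_for(size_t size)
{
	/* check_alloc_params() is assumed to reject size 0 before this. */
	assert(size > 0);
	return ((size + TOY_PAGE_SIZE - 1) / TOY_PAGE_SIZE) * TOY_PAGE_SIZE;
}

/**
 * chunk_pages_for - number of pages spanned by the chunk for @size
 * @size: requested size in bytes, must be non-zero
 *
 * Return: the page count of the chunk chunk_size_for() would allocate.
 */
static size_t chunk_pages_for(size_t size)
{
	return chunk_size_for(size) / TOY_PAGE_SIZE;
}
```

For example, a 5000-byte request yields an 8192-byte (two-page) chunk. The comment's "(a multiple of PAGE_SIZE)" already covers this case, though saying so explicitly might avoid the question.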

> + * Otherwise, NULL is returned.
> + *
> + * Returns the pointer to the memory requested upon success,
> + * NULL otherwise (either no memory available or pool already read-only).

It would be good to use the
    * Return:
kernel-doc notation for return values.

> + */
> +void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp);
> +
> +
> +/**
> + * pzalloc - zero-initialized version of pmalloc
> + * @pool: handler to the pool to be used for memory allocation

             handle (?)

> + * @size: amount of memory (in bytes) requested
> + * @gfp: flags for page allocation
> + *
> + * Executes pmalloc, initializing the memory requested to 0,
> + * before returning the pointer to it.
> + *
> + * Returns the pointer to the zeroed memory requested, upon success,
> + * NULL otherwise (either no memory available or pool already read-only).
> + */
> +static inline void *pzalloc(struct gen_pool *pool, size_t size, gfp_t gfp)
> +{
> +	return pmalloc(pool, size, gfp | __GFP_ZERO);
> +}
> +
> +/**
> + * pmalloc_array - allocates an array according to the parameters
> + * @pool: handler to the pool to be used for memory allocation

             handle

> + * @n: number of elements in the array
> + * @size: amount of memory (in bytes) requested for each element
> + * @flags: flags for page allocation
> + *
> + * Executes pmalloc, if it has a chance to succeed.
> + *
> + * Returns either NULL or the pmalloc result.
> + */
> +static inline void *pmalloc_array(struct gen_pool *pool, size_t n,
> +				  size_t size, gfp_t flags)
> +{
> +	if (unlikely(!(pool && n && size)))
> +		return NULL;
> +	return pmalloc(pool, n * size, flags);
> +}
> +
> +/**
> + * pcalloc - allocates a 0-initialized array according to the parameters
> + * @pool: handler to the pool to be used for memory allocation

             handle

> + * @n: number of elements in the array
> + * @size: amount of memory (in bytes) requested
> + * @flags: flags for page allocation
> + *
> + * Executes pmalloc_array, if it has a chance to succeed.
> + *
> + * Returns either NULL or the pmalloc result.
> + */
> +static inline void *pcalloc(struct gen_pool *pool, size_t n,
> +			    size_t size, gfp_t flags)
> +{
> +	return pmalloc_array(pool, n, size, flags | __GFP_ZERO);
> +}
> +
> +/**
> + * pstrdup - duplicate a string, using pmalloc as allocator
> + * @pool: handler to the pool to be used for memory allocation

             handle

> + * @s: string to duplicate
> + * @gfp: flags for page allocation
> + *
> + * Generates a copy of the given string, allocating sufficient memory
> + * from the given pmalloc pool.
> + *
> + * Returns a pointer to the replica, NULL in case of recoverable error.
> + */
> +static inline char *pstrdup(struct gen_pool *pool, const char *s, gfp_t gfp)
> +{
> +	size_t len;
> +	char *buf;
> +
> +	if (unlikely(pool == NULL || s == NULL))
> +		return NULL;
> +
> +	len = strlen(s) + 1;
> +	buf = pmalloc(pool, len, gfp);
> +	if (likely(buf))
> +		strncpy(buf, s, len);
> +	return buf;
> +}
> +
> +/**
> + * pmalloc_protect_pool - turn a read/write pool read-only
> + * @pool: the pool to protect
> + *
> + * Write-protects all the memory chunks assigned to the pool.
> + * This prevents any further allocation.
> + *
> + * Returns 0 upon success, -EINVAL in abnormal cases.
> + */
> +int pmalloc_protect_pool(struct gen_pool *pool);
> +
> +/**
> + * pfree - mark as unused memory that was previously in use
> + * @pool: handler to the pool to be used for memory allocation

             handle

> + * @addr: the beginning of the memory area to be freed
> + *
> + * The behavior of pfree is different, depending on the state of the
> + * protection.
> + * If the pool is not yet protected, the memory is marked as unused and
> + * will be availabel for further allocations.

              available
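The two pfree() behaviours described in that comment can be modelled with a pool holding a single slot. This is a hypothetical sketch: toy_slot_pool and its helpers are not part of the patch. It only captures the rule that a freed slot is reusable while the pool is writable, and merely marked unused once the pool is protected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a pool with a single fixed-size slot. */
struct toy_slot_pool {
	bool protected;   /* set once by "protect", never cleared */
	bool slot_in_use;
};

/* Allocation fails on a protected pool, even if the slot is free. */
static bool toy_slot_alloc(struct toy_slot_pool *p)
{
	assert(p != NULL);
	if (p->protected || p->slot_in_use)
		return false;
	p->slot_in_use = true;
	return true;
}

/* Marks the slot unused in both states; only an RW pool can reuse it. */
static void toy_slot_free(struct toy_slot_pool *p)
{
	assert(p != NULL);
	p->slot_in_use = false;
}
```

In the protected case the slot's memory would only really go away with the pool itself, which matches the "truly released only when the pool is destroyed" wording.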

> + * If the pool is already protected, the memory is marked as unused, but
> + * it will still be impossible to perform further allocation, because of
> + * the existing protection.
> + * The freed memory, in this case, will be truly released only when the
> + * pool is destroyed.
> + */
> +static inline void pfree(struct gen_pool *pool, const void *addr)
> +{
> +	gen_pool_free(pool, (unsigned long)addr, 0);
> +}
> +
> +/**
> + * pmalloc_destroy_pool - destroys a pool and all the associated memory
> + * @pool: the pool to destroy
> + *
> + * All the memory that was allocated through pmalloc in the pool will be freed.
> + *
> + * Returns 0 upon success, -EINVAL in abnormal cases.
> + */
> +int pmalloc_destroy_pool(struct gen_pool *pool);
> +
> +#endif


> diff --git a/mm/pmalloc.c b/mm/pmalloc.c
> new file mode 100644
> index 000000000000..11daca252589
> --- /dev/null
> +++ b/mm/pmalloc.c
> @@ -0,0 +1,514 @@
> +/* SPDX-License-Identifier: GPL-2.0
> + *
> + * pmalloc.c: Protectable Memory Allocator
> + *
> + * (C) Copyright 2017 Huawei Technologies Co. Ltd.
> + * Author: Igor Stoppa <igor.stoppa@huawei.com>
> + */
> +
> +#include <linux/printk.h>
> +#include <linux/init.h>
> +#include <linux/mm.h>
> +#include <linux/vmalloc.h>
> +#include <linux/genalloc.h>
> +#include <linux/kernel.h>
> +#include <linux/log2.h>
> +#include <linux/slab.h>
> +#include <linux/device.h>
> +#include <linux/atomic.h>
> +#include <linux/rculist.h>
> +#include <linux/set_memory.h>
> +#include <asm/cacheflush.h>
> +#include <asm/page.h>
> +
> +#include "pmalloc-selftest.h"
> +
> +/**

/** means that the following comments are kernel-doc notation, but these
comments are not, so just use /* there, please.

> + * pmalloc_data contains the data specific to a pmalloc pool,
> + * in a format compatible with the design of gen_alloc.
> + * Some of the fields are used for exposing the corresponding parameter
> + * to userspace, through sysfs.
> + */
> +struct pmalloc_data {
> +	struct gen_pool *pool;  /* Link back to the associated pool. */
> +	bool protected;     /* Status of the pool: RO or RW. */
> +	struct kobj_attribute attr_protected; /* Sysfs attribute. */
> +	struct kobj_attribute attr_avail;     /* Sysfs attribute. */
> +	struct kobj_attribute attr_size;      /* Sysfs attribute. */
> +	struct kobj_attribute attr_chunks;    /* Sysfs attribute. */
> +	struct kobject *pool_kobject;
> +	struct list_head node; /* list of pools */
> +};
> +
> +static LIST_HEAD(pmalloc_final_list);
> +static LIST_HEAD(pmalloc_tmp_list);
> +static struct list_head *pmalloc_list = &pmalloc_tmp_list;
> +static DEFINE_MUTEX(pmalloc_mutex);
> +static struct kobject *pmalloc_kobject;

[snip]

> +/**

Just use /* since this is not kernel-doc notation.

> + * Exposes the pool and its attributes through sysfs.
> + */
> +static struct kobject *pmalloc_connect(struct pmalloc_data *data)
> +{
> +	const struct attribute *attrs[] = {
> +		&data->attr_protected.attr,
> +		&data->attr_avail.attr,
> +		&data->attr_size.attr,
> +		&data->attr_chunks.attr,
> +		NULL
> +	};
> +	struct kobject *kobj;
> +
> +	kobj = kobject_create_and_add(data->pool->name, pmalloc_kobject);
> +	if (unlikely(!kobj))
> +		return NULL;
> +
> +	if (unlikely(sysfs_create_files(kobj, attrs) < 0)) {
> +		kobject_put(kobj);
> +		kobj = NULL;
> +	}
> +	return kobj;
> +}
> +
> +/**

Ditto.

> + * Removes the pool and its attributes from sysfs.
> + */
> +static void pmalloc_disconnect(struct pmalloc_data *data,
> +			       struct kobject *kobj)
> +{
> +	const struct attribute *attrs[] = {
> +		&data->attr_protected.attr,
> +		&data->attr_avail.attr,
> +		&data->attr_size.attr,
> +		&data->attr_chunks.attr,
> +		NULL
> +	};
> +
> +	sysfs_remove_files(kobj, attrs);
> +	kobject_put(kobj);
> +}
> +
> +/**

Same.

> + * Declares an attribute of the pool.
> + */
> +
> +#define pmalloc_attr_init(data, attr_name) \
> +do { \
> +	sysfs_attr_init(&data->attr_##attr_name.attr); \
> +	data->attr_##attr_name.attr.name = #attr_name; \
> +	data->attr_##attr_name.attr.mode = VERIFY_OCTAL_PERMISSIONS(0400); \
> +	data->attr_##attr_name.show = pmalloc_pool_show_##attr_name; \
> +} while (0)

[snip]


> +int is_pmalloc_object(const void *ptr, const unsigned long n)
> +{
> +	struct vm_struct *area;
> +	struct page *page;
> +	unsigned long area_start;
> +	unsigned long area_end;
> +	unsigned long object_start;
> +	unsigned long object_end;
> +
> +
> +	/* is_pmalloc_object gets called pretty late, so chances are high
> +	 * that the object is indeed of vmalloc type
> +	 */

Multi-line comment style is
	/*
	 * comment1
	 * comment..N
	 */

> +	if (unlikely(!is_vmalloc_addr(ptr)))
> +		return NOT_PMALLOC_OBJECT;
> +
> +	page = vmalloc_to_page(ptr);
> +	if (unlikely(!page))
> +		return NOT_PMALLOC_OBJECT;
> +
> +	area = page->area;
> +
> +	if (likely(!(area->flags & VM_PMALLOC)))
> +		return NOT_PMALLOC_OBJECT;
> +
> +	area_start = (unsigned long)area->addr;
> +	area_end = area_start + area->nr_pages * PAGE_SIZE - 1;
> +	object_start = (unsigned long)ptr;
> +	object_end = object_start + n - 1;
> +
> +	if (likely((area_start <= object_start) &&
> +		   (object_end <= area_end)))
> +		return VALID_PMALLOC_OBJECT;
> +	else
> +		return INVALID_PMALLOC_OBJECT;
> +}
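The final bounds check in is_pmalloc_object() uses inclusive end addresses on both sides. Below is a userspace sketch of just that comparison. The numeric values of the three results are an assumption: the usercopy.c hunk below only relies on 0 meaning "not pmalloc", a negative value meaning "invalid" and a positive value meaning "valid". toy_check_range() and TOY_PAGE_SIZE (4 KiB) are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed encodings: usercopy.c only relies on 0 / <0 / >0. */
enum {
	TOY_NOT_PMALLOC = 0,
	TOY_VALID_PMALLOC = 1,
	TOY_INVALID_PMALLOC = -1,
};

#define TOY_PAGE_SIZE 4096UL /* assumption: 4 KiB pages */

/* Mirrors the check: the object [ptr, ptr + n - 1] must fit in the area. */
static int toy_check_range(unsigned long area_addr, unsigned long nr_pages,
			   bool is_pmalloc_area, unsigned long ptr,
			   unsigned long n)
{
	unsigned long area_start = area_addr;
	unsigned long area_end = area_start + nr_pages * TOY_PAGE_SIZE - 1;
	unsigned long object_start = ptr;
	unsigned long object_end;

	/* __check_object_size() returns early for n == 0. */
	assert(n > 0);
	object_end = object_start + n - 1;

	if (!is_pmalloc_area)
		return TOY_NOT_PMALLOC;
	if (area_start <= object_start && object_end <= area_end)
		return TOY_VALID_PMALLOC;
	return TOY_INVALID_PMALLOC;
}
```

With a two-page area at 0x10000, a copy of 8192 bytes from 0x10000 ends exactly at 0x11fff and is valid. A copy of 4097 bytes from 0x11000 ends at 0x12000, one byte past the area, and is reported as invalid.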
> +
> +
> +bool pmalloc_prealloc(struct gen_pool *pool, size_t size)
> +{
> +	void *chunk;
> +	size_t chunk_size;
> +	bool add_error;
> +	unsigned int order;
> +
> +	if (check_alloc_params(pool, size))
> +		return false;
> +
> +	order = (unsigned int)pool->min_alloc_order;
> +
> +	/* Expand pool */
> +	chunk_size = roundup(size, PAGE_SIZE);
> +	chunk = vmalloc(chunk_size);
> +	if (unlikely(chunk == NULL))
> +		return false;
> +
> +	/* Locking is already done inside gen_pool_add */
> +	add_error = gen_pool_add(pool, (unsigned long)chunk, chunk_size,
> +				 NUMA_NO_NODE);
> +	if (unlikely(add_error != 0))
> +		goto abort;
> +
> +	return true;
> +abort:
> +	vfree_atomic(chunk);
> +	return false;
> +
> +}
> +
> +void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp)
> +{
> +	void *chunk;
> +	size_t chunk_size;
> +	bool add_error;
> +	unsigned long retval;
> +	unsigned int order;
> +
> +	if (check_alloc_params(pool, size))
> +		return NULL;
> +
> +	order = (unsigned int)pool->min_alloc_order;
> +
> +retry_alloc_from_pool:
> +	retval = gen_pool_alloc(pool, size);
> +	if (retval)
> +		goto return_allocation;
> +
> +	if (unlikely((gfp & __GFP_ATOMIC))) {
> +		if (unlikely((gfp & __GFP_NOFAIL)))
> +			goto retry_alloc_from_pool;
> +		else
> +			return NULL;
> +	}
> +
> +	/* Expand pool */
> +	chunk_size = roundup(size, PAGE_SIZE);
> +	chunk = vmalloc(chunk_size);
> +	if (unlikely(!chunk)) {
> +		if (unlikely((gfp & __GFP_NOFAIL)))
> +			goto retry_alloc_from_pool;
> +		else
> +			return NULL;
> +	}
> +	if (unlikely(!tag_chunk(chunk)))
> +		goto free;
> +
> +	/* Locking is already done inside gen_pool_add */
> +	add_error = gen_pool_add(pool, (unsigned long)chunk, chunk_size,
> +				 NUMA_NO_NODE);
> +	if (unlikely(add_error))
> +		goto abort;
> +
> +	retval = gen_pool_alloc(pool, size);
> +	if (retval) {
> +return_allocation:
> +		*(size_t *)retval = size;
> +		if (gfp & __GFP_ZERO)
> +			memset((void *)retval, 0, size);
> +		return (void *)retval;
> +	}
> +	/* Here there is no test for __GFP_NO_FAIL because, in case of
> +	 * concurrent allocation, one thread might add a chunk to the
> +	 * pool and this memory could be allocated by another thread,
> +	 * before the first thread gets a chance to use it.
> +	 * As long as vmalloc succeeds, it's ok to retry.
> +	 */

Fix multi-line comment style.

> +	goto retry_alloc_from_pool;
> +abort:
> +	untag_chunk(chunk);
> +free:
> +	vfree_atomic(chunk);
> +	return NULL;
> +}

[snip]

> +/**

Just use /*

> + * When the sysfs is ready to receive registrations, connect all the
> + * pools previously created. Also enable further pools to be connected
> + * right away.
> + */
> +static int __init pmalloc_late_init(void)
> +{
> +	struct pmalloc_data *data, *n;
> +
> +	pmalloc_kobject = kobject_create_and_add("pmalloc", kernel_kobj);
> +
> +	mutex_lock(&pmalloc_mutex);
> +	pmalloc_list = &pmalloc_final_list;
> +
> +	if (likely(pmalloc_kobject != NULL)) {
> +		list_for_each_entry_safe(data, n, &pmalloc_tmp_list, node) {
> +			list_move(&data->node, &pmalloc_final_list);
> +			pmalloc_connect(data);
> +		}
> +	}
> +	mutex_unlock(&pmalloc_mutex);
> +	pmalloc_selftest();
> +	return 0;
> +}
> +late_initcall(pmalloc_late_init);
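The tmp-list / final-list handover above amounts to deferred registration. Pools created before sysfs is available are parked, and late init flips the switch and drains them. After that, new pools are exposed immediately. The sketch below is a single-threaded, hypothetical model (toy_registry and its helpers are not from the patch). In the patch, pmalloc_mutex covers both the pointer switch and the drain, which this model does not need.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_POOLS 8

/* Hypothetical model of the pmalloc_tmp_list / pmalloc_final_list pair. */
struct toy_registry {
	const char *pending[TOY_MAX_POOLS]; /* created before sysfs is up */
	size_t nr_pending;
	const char *visible[TOY_MAX_POOLS]; /* exposed through "sysfs" */
	size_t nr_visible;
	bool sysfs_ready;
};

/* Stand-in for pmalloc_connect(). */
static void toy_expose(struct toy_registry *r, const char *name)
{
	assert(r->nr_visible < TOY_MAX_POOLS);
	r->visible[r->nr_visible++] = name;
}

/* Pool creation: expose right away if possible, otherwise defer. */
static void toy_register_pool(struct toy_registry *r, const char *name)
{
	if (r->sysfs_ready) {
		toy_expose(r, name);
		return;
	}
	assert(r->nr_pending < TOY_MAX_POOLS);
	r->pending[r->nr_pending++] = name;
}

/* Late init: flip the switch, then drain everything that was deferred. */
static void toy_late_init(struct toy_registry *r)
{
	size_t i;

	r->sysfs_ready = true;
	for (i = 0; i < r->nr_pending; i++)
		toy_expose(r, r->pending[i]);
	r->nr_pending = 0;
}
```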

> diff --git a/mm/usercopy.c b/mm/usercopy.c
> index a9852b24715d..c3b10298d808 100644
> --- a/mm/usercopy.c
> +++ b/mm/usercopy.c
> @@ -15,6 +15,7 @@
>  #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>  
>  #include <linux/mm.h>
> +#include <linux/pmalloc.h>
>  #include <linux/slab.h>
>  #include <linux/sched.h>
>  #include <linux/sched/task.h>
> @@ -222,6 +223,7 @@ static inline const char *check_heap_object(const void *ptr, unsigned long n,
>  void __check_object_size(const void *ptr, unsigned long n, bool to_user)
>  {
>  	const char *err;
> +	int retv;
>  
>  	/* Skip all tests if size is zero. */
>  	if (!n)
> @@ -229,12 +231,12 @@ void __check_object_size(const void *ptr, unsigned long n, bool to_user)
>  
>  	/* Check for invalid addresses. */
>  	err = check_bogus_address(ptr, n);
> -	if (err)
> +	if (unlikely(err))
>  		goto report;
>  
>  	/* Check for bad heap object. */
>  	err = check_heap_object(ptr, n, to_user);
> -	if (err)
> +	if (unlikely(err))
>  		goto report;
>  
>  	/* Check for bad stack object. */
> @@ -257,8 +259,23 @@ void __check_object_size(const void *ptr, unsigned long n, bool to_user)
>  
>  	/* Check for object in kernel to avoid text exposure. */
>  	err = check_kernel_text_object(ptr, n);
> -	if (!err)
> -		return;
> +	if (unlikely(err))
> +		goto report;
> +
> +	/* Check if object is from a pmalloc chunk.
> +	 */

Use kernel multi-line comment style.

> +	retv = is_pmalloc_object(ptr, n);
> +	if (unlikely(retv)) {
> +		if (unlikely(!to_user)) {
> +			err = "<trying to write to pmalloc object>";
> +			goto report;
> +		}
> +		if (retv < 0) {
> +			err = "<invalid pmalloc object>";
> +			goto report;
> +		}
> +	}
> +	return;
>  
>  report:
>  	report_usercopy(ptr, n, to_user, err);
> 
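The new pmalloc branch in __check_object_size() reduces to a small decision table on the is_pmalloc_object() result and the copy direction. The function below is a hypothetical userspace restatement of that branch, not part of the patch. It returns the error string that would be reported, or NULL when the copy is allowed. As in the hunk, any copy into a pmalloc object (to_user == false) is rejected, and a copy out of it is rejected only when the range is invalid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * retv is the is_pmalloc_object() result:
 * 0 = not pmalloc, < 0 = invalid range, > 0 = valid range.
 */
static const char *toy_pmalloc_usercopy_err(int retv, bool to_user)
{
	if (retv) {
		if (!to_user)
			return "<trying to write to pmalloc object>";
		if (retv < 0)
			return "<invalid pmalloc object>";
	}
	return NULL; /* copy allowed */
}
```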


-- 
~Randy


* [PATCH 4/6] Protectable Memory
@ 2018-02-04 22:06     ` Randy Dunlap
From: Randy Dunlap @ 2018-02-04 22:06 UTC
  To: linux-security-module

On 02/04/2018 08:47 AM, Igor Stoppa wrote:
> The MMU available in many systems running Linux can often provide R/O
> protection to the memory pages it handles.
> 
> However, the MMU-based protection works efficiently only when said pages
> contain exclusively data that will not need further modifications.
> 
> Statically allocated variables can be segregated into a dedicated
> section, but this does not sit very well with dynamically allocated
> ones.
> 
> Dynamic allocation does not provide, currently, any means for grouping
> variables in memory pages that would contain exclusively data suitable
> for conversion to read only access mode.
> 
> The allocator here provided (pmalloc - protectable memory allocator)
> introduces the concept of pools of protectable memory.
> 
> A module can request a pool and then refer any allocation request to the
> pool handler it has received.
> 
> Once all the chunks of memory associated to a specific pool are
> initialized, the pool can be protected.
> 
> After this point, the pool can only be destroyed (it is up to the module
> to avoid any further references to the memory from the pool, after
> the destruction is invoked).
> 
> The latter case is mainly meant for releasing memory, when a module is
> unloaded.
> 
> A module can have as many pools as needed, for example to support the
> protection of data that is initialized in sufficiently distinct phases.
> 
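The pool lifecycle described in this commit message can be sketched as a small state machine. Pools start writable, protection is one-way, allocation is only possible before protection, and destruction ends the pool. The explicit states are hypothetical, since pmalloc_data only carries a "protected" flag. Allowing destruction of a still-writable pool is also an assumption: the header documents pmalloc_destroy_pool() without restricting it to protected pools.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical states; pmalloc itself only tracks a "protected" flag. */
enum toy_pool_state { TOY_POOL_RW, TOY_POOL_RO, TOY_POOL_DESTROYED };

/* RW -> RO is the only protection transition; there is no way back. */
static bool toy_protect(enum toy_pool_state *s)
{
	if (*s != TOY_POOL_RW)
		return false;
	*s = TOY_POOL_RO;
	return true;
}

/* Allocation is only possible while the pool is still writable. */
static bool toy_can_alloc(enum toy_pool_state s)
{
	return s == TOY_POOL_RW;
}

/* Destruction releases the memory; assumed allowed from either live state. */
static bool toy_destroy(enum toy_pool_state *s)
{
	if (*s == TOY_POOL_DESTROYED)
		return false;
	*s = TOY_POOL_DESTROYED;
	return true;
}
```

A module unloading would typically follow the RW, RO, destroyed path, with the module itself responsible for dropping any references into the pool before destroying it.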
> Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com>
> ---
>  include/linux/genalloc.h |   3 +
>  include/linux/pmalloc.h  | 213 ++++++++++++++++++++
>  include/linux/vmalloc.h  |   1 +
>  lib/genalloc.c           |  27 +++
>  mm/Makefile              |   1 +
>  mm/pmalloc.c             | 514 +++++++++++++++++++++++++++++++++++++++++++++++
>  mm/usercopy.c            |  25 ++-
>  7 files changed, 780 insertions(+), 4 deletions(-)
>  create mode 100644 include/linux/pmalloc.h
>  create mode 100644 mm/pmalloc.c
> 
> diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
> index dcaa33e74b1c..b6c4cea9fbd8 100644
> --- a/include/linux/genalloc.h
> +++ b/include/linux/genalloc.h
> @@ -121,6 +121,9 @@ extern unsigned long gen_pool_alloc_algo(struct gen_pool *, size_t,
>  extern void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size,
>  		dma_addr_t *dma);
>  extern void gen_pool_free(struct gen_pool *, unsigned long, size_t);
> +
> +extern void gen_pool_flush_chunk(struct gen_pool *pool,
> +				 struct gen_pool_chunk *chunk);
>  extern void gen_pool_for_each_chunk(struct gen_pool *,
>  	void (*)(struct gen_pool *, struct gen_pool_chunk *, void *), void *);
>  extern size_t gen_pool_avail(struct gen_pool *);
> diff --git a/include/linux/pmalloc.h b/include/linux/pmalloc.h
> new file mode 100644
> index 000000000000..5fa8a78be819
> --- /dev/null
> +++ b/include/linux/pmalloc.h
> @@ -0,0 +1,213 @@
> +/* SPDX-License-Identifier: GPL-2.0
> + *
> + * pmalloc.h: Header for Protectable Memory Allocator
> + *
> + * (C) Copyright 2017 Huawei Technologies Co. Ltd.
> + * Author: Igor Stoppa <igor.stoppa@huawei.com>
> + */
> +
> +#ifndef _PMALLOC_H
> +#define _PMALLOC_H

use        _LINUX_PMALLOC_H_

> +
> +
> +#include <linux/genalloc.h>
> +#include <linux/string.h>
> +
> +#define PMALLOC_DEFAULT_ALLOC_ORDER (-1)
> +
> +/*
> + * Library for dynamic allocation of pools of memory that can be,
> + * after initialization, marked as read-only.
> + *
> + * This is intended to complement __read_only_after_init, for those cases
> + * where either it is not possible to know the initialization value before
> + * init is completed, or the amount of data is variable and can be
> + * determined only at run-time.
> + *
> + * ***WARNING***
> + * The user of the API is expected to synchronize:
> + * 1) allocation,
> + * 2) writes to the allocated memory,
> + * 3) write protection of the pool,
> + * 4) freeing of the allocated memory, and
> + * 5) destruction of the pool.
> + *
> + * For a non-threaded scenario, this type of locking is not even required.
> + *
> + * Even if the library were to provide support for locking, point 2)
> + * would still depend on the user taking the lock.
> + */
> +
> +
> +/**
> + * pmalloc_create_pool - create a new protectable memory pool -

Drop trailing " -".

> + * @name: the name of the pool, must be unique

Is that enforced?  Will return NULL if @name is duplicated?

> + * @min_alloc_order: log2 of the minimum allocation size obtainable
> + *                   from the pool
> + *
> + * Creates a new (empty) memory pool for allocation of protectable
> + * memory. Memory will be allocated upon request (through pmalloc).
> + *
> + * Returns a pointer to the new pool upon success, otherwise a NULL.
> + */
> +struct gen_pool *pmalloc_create_pool(const char *name,
> +					 int min_alloc_order);
> +
> +
> +int is_pmalloc_object(const void *ptr, const unsigned long n);
> +
> +/**
> + * pmalloc_prealloc - tries to allocate a memory chunk of the requested size
> + * @pool: handler to the pool to be used for memory allocation

             handle (I think)

> + * @size: amount of memory (in bytes) requested
> + *
> + * Prepares a chunk of the requested size.
> + * This is intended to both minimize latency in later memory requests and
> + * avoid sleping during allocation.

            sleeping

> + * Memory allocated with prealloc is stored in one single chunk, as

                       with pmalloc_prealloc()

> + * opposite to what is allocated on-demand when pmalloc runs out of free

      opposed to

> + * space already existing in the pool and has to invoke vmalloc.
> + *
> + * Returns true if the vmalloc call was successful, false otherwise.

Where is the allocated memory (pointer)?  I.e., how does the caller know
where that memory is?
Oh, that memory isn't yet available to the caller until it calls pmalloc(), right?


> + */
> +bool pmalloc_prealloc(struct gen_pool *pool, size_t size);
> +
> +/**
> + * pmalloc - allocate protectable memory from a pool
> + * @pool: handler to the pool to be used for memory allocation

             handle (?)

> + * @size: amount of memory (in bytes) requested
> + * @gfp: flags for page allocation
> + *
> + * Allocates memory from an unprotected pool. If the pool doesn't have
> + * enough memory, and the request did not include GFP_ATOMIC, an attempt
> + * is made to add a new chunk of memory to the pool
> + * (a multiple of PAGE_SIZE), in order to fit the new request.

                                             fill
What if @size is > PAGE_SIZE?

> + * Otherwise, NULL is returned.
> + *
> + * Returns the pointer to the memory requested upon success,
> + * NULL otherwise (either no memory available or pool already read-only).

It would be good to use the
    * Return:
kernel-doc notation for return values.

> + */
> +void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp);
> +
> +
> +/**
> + * pzalloc - zero-initialized version of pmalloc
> + * @pool: handler to the pool to be used for memory allocation

             handle (?)

> + * @size: amount of memory (in bytes) requested
> + * @gfp: flags for page allocation
> + *
> + * Executes pmalloc, initializing the memory requested to 0,
> + * before returning the pointer to it.
> + *
> + * Returns the pointer to the zeroed memory requested, upon success,
> + * NULL otherwise (either no memory available or pool already read-only).
> + */
> +static inline void *pzalloc(struct gen_pool *pool, size_t size, gfp_t gfp)
> +{
> +	return pmalloc(pool, size, gfp | __GFP_ZERO);
> +}
> +
> +/**
> + * pmalloc_array - allocates an array according to the parameters
> + * @pool: handler to the pool to be used for memory allocation

             handle

> + * @n: number of elements in the array
> + * @size: amount of memory (in bytes) requested for each element
> + * @flags: flags for page allocation
> + *
> + * Executes pmalloc, if it has a chance to succeed.
> + *
> + * Returns either NULL or the pmalloc result.
> + */
> +static inline void *pmalloc_array(struct gen_pool *pool, size_t n,
> +				  size_t size, gfp_t flags)
> +{
> +	if (unlikely(!(pool && n && size)))
> +		return NULL;
> +	return pmalloc(pool, n * size, flags);
> +}
> +
> +/**
> + * pcalloc - allocates a 0-initialized array according to the parameters
> + * @pool: handler to the pool to be used for memory allocation

             handle

> + * @n: number of elements in the array
> + * @size: amount of memory (in bytes) requested
> + * @flags: flags for page allocation
> + *
> + * Executes pmalloc_array, if it has a chance to succeed.
> + *
> + * Returns either NULL or the pmalloc result.
> + */
> +static inline void *pcalloc(struct gen_pool *pool, size_t n,
> +			    size_t size, gfp_t flags)
> +{
> +	return pmalloc_array(pool, n, size, flags | __GFP_ZERO);
> +}
> +
> +/**
> + * pstrdup - duplicate a string, using pmalloc as allocator
> + * @pool: handler to the pool to be used for memory allocation

             handle

> + * @s: string to duplicate
> + * @gfp: flags for page allocation
> + *
> + * Generates a copy of the given string, allocating sufficient memory
> + * from the given pmalloc pool.
> + *
> + * Returns a pointer to the replica, NULL in case of recoverable error.
> + */
> +static inline char *pstrdup(struct gen_pool *pool, const char *s, gfp_t gfp)
> +{
> +	size_t len;
> +	char *buf;
> +
> +	if (unlikely(pool == NULL || s == NULL))
> +		return NULL;
> +
> +	len = strlen(s) + 1;
> +	buf = pmalloc(pool, len, gfp);
> +	if (likely(buf))
> +		strncpy(buf, s, len);
> +	return buf;
> +}
> +
> +/**
> + * pmalloc_protect_pool - turn a read/write pool read-only
> + * @pool: the pool to protect
> + *
> + * Write-protects all the memory chunks assigned to the pool.
> + * This prevents any further allocation.
> + *
> + * Returns 0 upon success, -EINVAL in abnormal cases.
> + */
> +int pmalloc_protect_pool(struct gen_pool *pool);
> +
> +/**
> + * pfree - mark as unused memory that was previously in use
> + * @pool: handler to the pool to be used for memory allocation

             handle

> + * @addr: the beginning of the memory area to be freed
> + *
> + * The behavior of pfree is different, depending on the state of the
> + * protection.
> + * If the pool is not yet protected, the memory is marked as unused and
> + * will be availabel for further allocations.

              available

> + * If the pool is already protected, the memory is marked as unused, but
> + * it will still be impossible to perform further allocation, because of
> + * the existing protection.
> + * The freed memory, in this case, will be truly released only when the
> + * pool is destroyed.
> + */
> +static inline void pfree(struct gen_pool *pool, const void *addr)
> +{
> +	gen_pool_free(pool, (unsigned long)addr, 0);
> +}
> +
> +/**
> + * pmalloc_destroy_pool - destroys a pool and all the associated memory
> + * @pool: the pool to destroy
> + *
> + * All the memory that was allocated through pmalloc in the pool will be freed.
> + *
> + * Returns 0 upon success, -EINVAL in abnormal cases.
> + */
> +int pmalloc_destroy_pool(struct gen_pool *pool);
> +
> +#endif


> diff --git a/mm/pmalloc.c b/mm/pmalloc.c
> new file mode 100644
> index 000000000000..11daca252589
> --- /dev/null
> +++ b/mm/pmalloc.c
> @@ -0,0 +1,514 @@
> +/* SPDX-License-Identifier: GPL-2.0
> + *
> + * pmalloc.c: Protectable Memory Allocator
> + *
> + * (C) Copyright 2017 Huawei Technologies Co. Ltd.
> + * Author: Igor Stoppa <igor.stoppa@huawei.com>
> + */
> +
> +#include <linux/printk.h>
> +#include <linux/init.h>
> +#include <linux/mm.h>
> +#include <linux/vmalloc.h>
> +#include <linux/genalloc.h>
> +#include <linux/kernel.h>
> +#include <linux/log2.h>
> +#include <linux/slab.h>
> +#include <linux/device.h>
> +#include <linux/atomic.h>
> +#include <linux/rculist.h>
> +#include <linux/set_memory.h>
> +#include <asm/cacheflush.h>
> +#include <asm/page.h>
> +
> +#include "pmalloc-selftest.h"
> +
> +/**

/** means that the following comments are kernel-doc notation, but these
comments are not, so just use /* there, please.

> + * pmalloc_data contains the data specific to a pmalloc pool,
> + * in a format compatible with the design of gen_alloc.
> + * Some of the fields are used for exposing the corresponding parameter
> + * to userspace, through sysfs.
> + */
> +struct pmalloc_data {
> +	struct gen_pool *pool;  /* Link back to the associated pool. */
> +	bool protected;     /* Status of the pool: RO or RW. */
> +	struct kobj_attribute attr_protected; /* Sysfs attribute. */
> +	struct kobj_attribute attr_avail;     /* Sysfs attribute. */
> +	struct kobj_attribute attr_size;      /* Sysfs attribute. */
> +	struct kobj_attribute attr_chunks;    /* Sysfs attribute. */
> +	struct kobject *pool_kobject;
> +	struct list_head node; /* list of pools */
> +};
> +
> +static LIST_HEAD(pmalloc_final_list);
> +static LIST_HEAD(pmalloc_tmp_list);
> +static struct list_head *pmalloc_list = &pmalloc_tmp_list;
> +static DEFINE_MUTEX(pmalloc_mutex);
> +static struct kobject *pmalloc_kobject;

[snip]

> +/**

Just use /* since this is not kernel-doc notation.

> + * Exposes the pool and its attributes through sysfs.
> + */
> +static struct kobject *pmalloc_connect(struct pmalloc_data *data)
> +{
> +	const struct attribute *attrs[] = {
> +		&data->attr_protected.attr,
> +		&data->attr_avail.attr,
> +		&data->attr_size.attr,
> +		&data->attr_chunks.attr,
> +		NULL
> +	};
> +	struct kobject *kobj;
> +
> +	kobj = kobject_create_and_add(data->pool->name, pmalloc_kobject);
> +	if (unlikely(!kobj))
> +		return NULL;
> +
> +	if (unlikely(sysfs_create_files(kobj, attrs) < 0)) {
> +		kobject_put(kobj);
> +		kobj = NULL;
> +	}
> +	return kobj;
> +}
> +
> +/**

Ditto.

> + * Removes the pool and its attributes from sysfs.
> + */
> +static void pmalloc_disconnect(struct pmalloc_data *data,
> +			       struct kobject *kobj)
> +{
> +	const struct attribute *attrs[] = {
> +		&data->attr_protected.attr,
> +		&data->attr_avail.attr,
> +		&data->attr_size.attr,
> +		&data->attr_chunks.attr,
> +		NULL
> +	};
> +
> +	sysfs_remove_files(kobj, attrs);
> +	kobject_put(kobj);
> +}
> +
> +/**

Same.

> + * Declares an attribute of the pool.
> + */
> +
> +#define pmalloc_attr_init(data, attr_name) \
> +do { \
> +	sysfs_attr_init(&data->attr_##attr_name.attr); \
> +	data->attr_##attr_name.attr.name = #attr_name; \
> +	data->attr_##attr_name.attr.mode = VERIFY_OCTAL_PERMISSIONS(0400); \
> +	data->attr_##attr_name.show = pmalloc_pool_show_##attr_name; \
> +} while (0)

[snip]


> +int is_pmalloc_object(const void *ptr, const unsigned long n)
> +{
> +	struct vm_struct *area;
> +	struct page *page;
> +	unsigned long area_start;
> +	unsigned long area_end;
> +	unsigned long object_start;
> +	unsigned long object_end;
> +
> +
> +	/* is_pmalloc_object gets called pretty late, so chances are high
> +	 * that the object is indeed of vmalloc type
> +	 */

Multi-line comment style is
	/*
	 * comment1
	 * comment..N
	 */

> +	if (unlikely(!is_vmalloc_addr(ptr)))
> +		return NOT_PMALLOC_OBJECT;
> +
> +	page = vmalloc_to_page(ptr);
> +	if (unlikely(!page))
> +		return NOT_PMALLOC_OBJECT;
> +
> +	area = page->area;
> +
> +	if (likely(!(area->flags & VM_PMALLOC)))
> +		return NOT_PMALLOC_OBJECT;
> +
> +	area_start = (unsigned long)area->addr;
> +	area_end = area_start + area->nr_pages * PAGE_SIZE - 1;
> +	object_start = (unsigned long)ptr;
> +	object_end = object_start + n - 1;
> +
> +	if (likely((area_start <= object_start) &&
> +		   (object_end <= area_end)))
> +		return VALID_PMALLOC_OBJECT;
> +	else
> +		return INVALID_PMALLOC_OBJECT;
> +}
> +
> +
> +bool pmalloc_prealloc(struct gen_pool *pool, size_t size)
> +{
> +	void *chunk;
> +	size_t chunk_size;
> +	bool add_error;
> +	unsigned int order;
> +
> +	if (check_alloc_params(pool, size))
> +		return false;
> +
> +	order = (unsigned int)pool->min_alloc_order;
> +
> +	/* Expand pool */
> +	chunk_size = roundup(size, PAGE_SIZE);
> +	chunk = vmalloc(chunk_size);
> +	if (unlikely(chunk == NULL))
> +		return false;
> +
> +	/* Locking is already done inside gen_pool_add */
> +	add_error = gen_pool_add(pool, (unsigned long)chunk, chunk_size,
> +				 NUMA_NO_NODE);
> +	if (unlikely(add_error != 0))
> +		goto abort;
> +
> +	return true;
> +abort:
> +	vfree_atomic(chunk);
> +	return false;
> +
> +}
> +
> +void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp)
> +{
> +	void *chunk;
> +	size_t chunk_size;
> +	bool add_error;
> +	unsigned long retval;
> +	unsigned int order;
> +
> +	if (check_alloc_params(pool, size))
> +		return NULL;
> +
> +	order = (unsigned int)pool->min_alloc_order;
> +
> +retry_alloc_from_pool:
> +	retval = gen_pool_alloc(pool, size);
> +	if (retval)
> +		goto return_allocation;
> +
> +	if (unlikely((gfp & __GFP_ATOMIC))) {
> +		if (unlikely((gfp & __GFP_NOFAIL)))
> +			goto retry_alloc_from_pool;
> +		else
> +			return NULL;
> +	}
> +
> +	/* Expand pool */
> +	chunk_size = roundup(size, PAGE_SIZE);
> +	chunk = vmalloc(chunk_size);
> +	if (unlikely(!chunk)) {
> +		if (unlikely((gfp & __GFP_NOFAIL)))
> +			goto retry_alloc_from_pool;
> +		else
> +			return NULL;
> +	}
> +	if (unlikely(!tag_chunk(chunk)))
> +		goto free;
> +
> +	/* Locking is already done inside gen_pool_add */
> +	add_error = gen_pool_add(pool, (unsigned long)chunk, chunk_size,
> +				 NUMA_NO_NODE);
> +	if (unlikely(add_error))
> +		goto abort;
> +
> +	retval = gen_pool_alloc(pool, size);
> +	if (retval) {
> +return_allocation:
> +		*(size_t *)retval = size;
> +		if (gfp & __GFP_ZERO)
> +			memset((void *)retval, 0, size);
> +		return (void *)retval;
> +	}
> +	/* Here there is no test for __GFP_NO_FAIL because, in case of
> +	 * concurrent allocation, one thread might add a chunk to the
> +	 * pool and this memory could be allocated by another thread,
> +	 * before the first thread gets a chance to use it.
> +	 * As long as vmalloc succeeds, it's ok to retry.
> +	 */

Fix multi-line comment style.

> +	goto retry_alloc_from_pool;
> +abort:
> +	untag_chunk(chunk);
> +free:
> +	vfree_atomic(chunk);
> +	return NULL;
> +}

[snip]

> +/**

Just use /*

> + * When the sysfs is ready to receive registrations, connect all the
> + * pools previously created. Also enable further pools to be connected
> + * right away.
> + */
> +static int __init pmalloc_late_init(void)
> +{
> +	struct pmalloc_data *data, *n;
> +
> +	pmalloc_kobject = kobject_create_and_add("pmalloc", kernel_kobj);
> +
> +	mutex_lock(&pmalloc_mutex);
> +	pmalloc_list = &pmalloc_final_list;
> +
> +	if (likely(pmalloc_kobject != NULL)) {
> +		list_for_each_entry_safe(data, n, &pmalloc_tmp_list, node) {
> +			list_move(&data->node, &pmalloc_final_list);
> +			pmalloc_connect(data);
> +		}
> +	}
> +	mutex_unlock(&pmalloc_mutex);
> +	pmalloc_selftest();
> +	return 0;
> +}
> +late_initcall(pmalloc_late_init);
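
pmalloc_late_init() follows a deferred-registration pattern. Pools created before sysfs is available are parked on pmalloc_tmp_list. Once pmalloc_kobject exists, those pools are moved to pmalloc_final_list and connected, and any pool created later is connected right away. The following is a single-threaded userspace model of that idea. All names are hypothetical, fixed-size arrays replace the kernel lists, and the locking done with pmalloc_mutex is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_POOLS 8

static const char *pending[MAX_POOLS];   /* stands in for pmalloc_tmp_list */
static size_t n_pending;
static const char *connected[MAX_POOLS]; /* stands in for pmalloc_final_list */
static size_t n_connected;
static bool sysfs_ready;

/* Stand-in for pmalloc_connect(): here it only records the pool. */
static void connect_pool(const char *name)
{
	connected[n_connected++] = name;
}

static bool register_pool(const char *name)
{
	if (sysfs_ready) {
		if (n_connected == MAX_POOLS)
			return false;
		connect_pool(name);           /* sysfs is up: connect now */
	} else {
		if (n_pending == MAX_POOLS)
			return false;
		pending[n_pending++] = name;  /* too early: park it */
	}
	return true;
}

/* Stand-in for pmalloc_late_init(): flush the parked pools, then
 * let every later registration connect directly. */
static void late_init(void)
{
	size_t i;

	sysfs_ready = true;
	for (i = 0; i < n_pending; i++)
		connect_pool(pending[i]);
	n_pending = 0;
}
```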

> diff --git a/mm/usercopy.c b/mm/usercopy.c
> index a9852b24715d..c3b10298d808 100644
> --- a/mm/usercopy.c
> +++ b/mm/usercopy.c
> @@ -15,6 +15,7 @@
>  #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>  
>  #include <linux/mm.h>
> +#include <linux/pmalloc.h>
>  #include <linux/slab.h>
>  #include <linux/sched.h>
>  #include <linux/sched/task.h>
> @@ -222,6 +223,7 @@ static inline const char *check_heap_object(const void *ptr, unsigned long n,
>  void __check_object_size(const void *ptr, unsigned long n, bool to_user)
>  {
>  	const char *err;
> +	int retv;
>  
>  	/* Skip all tests if size is zero. */
>  	if (!n)
> @@ -229,12 +231,12 @@ void __check_object_size(const void *ptr, unsigned long n, bool to_user)
>  
>  	/* Check for invalid addresses. */
>  	err = check_bogus_address(ptr, n);
> -	if (err)
> +	if (unlikely(err))
>  		goto report;
>  
>  	/* Check for bad heap object. */
>  	err = check_heap_object(ptr, n, to_user);
> -	if (err)
> +	if (unlikely(err))
>  		goto report;
>  
>  	/* Check for bad stack object. */
> @@ -257,8 +259,23 @@ void __check_object_size(const void *ptr, unsigned long n, bool to_user)
>  
>  	/* Check for object in kernel to avoid text exposure. */
>  	err = check_kernel_text_object(ptr, n);
> -	if (!err)
> -		return;
> +	if (unlikely(err))
> +		goto report;
> +
> +	/* Check if object is from a pmalloc chunk.
> +	 */

Use kernel multi-line comment style.

> +	retv = is_pmalloc_object(ptr, n);
> +	if (unlikely(retv)) {
> +		if (unlikely(!to_user)) {
> +			err = "<trying to write to pmalloc object>";
> +			goto report;
> +		}
> +		if (retv < 0) {
> +			err = "<invalid pmalloc object>";
> +			goto report;
> +		}
> +	}
> +	return;
>  
>  report:
>  	report_usercopy(ptr, n, to_user, err);
> 
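
The new tail of __check_object_size() reduces to a small decision table, modeled below in userspace. The enum values are assumptions. The definitions of NOT_PMALLOC_OBJECT, VALID_PMALLOC_OBJECT and INVALID_PMALLOC_OBJECT are not in the quoted hunks, and the code above only implies that the first is zero and the last is negative. Note that to_user == false means copy_from_user(), which is a write into the kernel object.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed values: the quoted code only implies NOT == 0 and INVALID < 0. */
enum {
	INVALID_PMALLOC_OBJECT = -1,
	NOT_PMALLOC_OBJECT = 0,
	VALID_PMALLOC_OBJECT = 1,
};

/* Returns the report string, or NULL when the copy is allowed. */
static const char *pmalloc_usercopy_err(int retv, bool to_user)
{
	if (!retv)
		return NULL;   /* not a pmalloc object: nothing to report */
	if (!to_user)
		return "<trying to write to pmalloc object>";
	if (retv < 0)
		return "<invalid pmalloc object>";
	return NULL;           /* valid pmalloc object, read by userspace */
}
```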


-- 
~Randy
--
To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
the body of a message to majordomo at vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [PATCH 4/6] Protectable Memory
@ 2018-02-04 22:06     ` Randy Dunlap
  0 siblings, 0 replies; 146+ messages in thread
From: Randy Dunlap @ 2018-02-04 22:06 UTC (permalink / raw)
  To: Igor Stoppa, jglisse, keescook, mhocko, labbott, hch, willy
  Cc: cl, linux-security-module, linux-mm, linux-kernel, kernel-hardening

On 02/04/2018 08:47 AM, Igor Stoppa wrote:
> The MMU available in many systems running Linux can often provide R/O
> protection to the memory pages it handles.
> 
> However, the MMU-based protection works efficiently only when said pages
> contain exclusively data that will not need further modifications.
> 
> Statically allocated variables can be segregated into a dedicated
> section, but this does not sit very well with dynamically allocated
> ones.
> 
> Dynamic allocation does not provide, currently, any means for grouping
> variables in memory pages that would contain exclusively data suitable
> for conversion to read only access mode.
> 
> The allocator here provided (pmalloc - protectable memory allocator)
> introduces the concept of pools of protectable memory.
> 
> A module can request a pool and then refer any allocation request to the
> pool handler it has received.
> 
> Once all the chunks of memory associated to a specific pool are
> initialized, the pool can be protected.
> 
> After this point, the pool can only be destroyed (it is up to the module
> to avoid any further references to the memory from the pool, after
> the destruction is invoked).
> 
> The latter case is mainly meant for releasing memory, when a module is
> unloaded.
> 
> A module can have as many pools as needed, for example to support the
> protection of data that is initialized in sufficiently distinct phases.
> 
> Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com>
> ---
>  include/linux/genalloc.h |   3 +
>  include/linux/pmalloc.h  | 213 ++++++++++++++++++++
>  include/linux/vmalloc.h  |   1 +
>  lib/genalloc.c           |  27 +++
>  mm/Makefile              |   1 +
>  mm/pmalloc.c             | 514 +++++++++++++++++++++++++++++++++++++++++++++++
>  mm/usercopy.c            |  25 ++-
>  7 files changed, 780 insertions(+), 4 deletions(-)
>  create mode 100644 include/linux/pmalloc.h
>  create mode 100644 mm/pmalloc.c
> 
> diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
> index dcaa33e74b1c..b6c4cea9fbd8 100644
> --- a/include/linux/genalloc.h
> +++ b/include/linux/genalloc.h
> @@ -121,6 +121,9 @@ extern unsigned long gen_pool_alloc_algo(struct gen_pool *, size_t,
>  extern void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size,
>  		dma_addr_t *dma);
>  extern void gen_pool_free(struct gen_pool *, unsigned long, size_t);
> +
> +extern void gen_pool_flush_chunk(struct gen_pool *pool,
> +				 struct gen_pool_chunk *chunk);
>  extern void gen_pool_for_each_chunk(struct gen_pool *,
>  	void (*)(struct gen_pool *, struct gen_pool_chunk *, void *), void *);
>  extern size_t gen_pool_avail(struct gen_pool *);
> diff --git a/include/linux/pmalloc.h b/include/linux/pmalloc.h
> new file mode 100644
> index 000000000000..5fa8a78be819
> --- /dev/null
> +++ b/include/linux/pmalloc.h
> @@ -0,0 +1,213 @@
> +/* SPDX-License-Identifier: GPL-2.0
> + *
> + * pmalloc.h: Header for Protectable Memory Allocator
> + *
> + * (C) Copyright 2017 Huawei Technologies Co. Ltd.
> + * Author: Igor Stoppa <igor.stoppa@huawei.com>
> + */
> +
> +#ifndef _PMALLOC_H
> +#define _PMALLOC_H

use        _LINUX_PMALLOC_H_

> +
> +
> +#include <linux/genalloc.h>
> +#include <linux/string.h>
> +
> +#define PMALLOC_DEFAULT_ALLOC_ORDER (-1)
> +
> +/*
> + * Library for dynamic allocation of pools of memory that can be,
> + * after initialization, marked as read-only.
> + *
> + * This is intended to complement __read_only_after_init, for those cases
> + * where either it is not possible to know the initialization value before
> + * init is completed, or the amount of data is variable and can be
> + * determined only at run-time.
> + *
> + * ***WARNING***
> + * The user of the API is expected to synchronize:
> + * 1) allocation,
> + * 2) writes to the allocated memory,
> + * 3) write protection of the pool,
> + * 4) freeing of the allocated memory, and
> + * 5) destruction of the pool.
> + *
> + * For a non-threaded scenario, this type of locking is not even required.
> + *
> + * Even if the library were to provide support for locking, point 2)
> + * would still depend on the user taking the lock.
> + */
> +
> +
> +/**
> + * pmalloc_create_pool - create a new protectable memory pool -

Drop trailing " -".

> + * @name: the name of the pool, must be unique

Is that enforced?  Will return NULL if @name is duplicated?

> + * @min_alloc_order: log2 of the minimum allocation size obtainable
> + *                   from the pool
> + *
> + * Creates a new (empty) memory pool for allocation of protectable
> + * memory. Memory will be allocated upon request (through pmalloc).
> + *
> + * Returns a pointer to the new pool upon success, otherwise a NULL.
> + */
> +struct gen_pool *pmalloc_create_pool(const char *name,
> +					 int min_alloc_order);
> +
> +
> +int is_pmalloc_object(const void *ptr, const unsigned long n);
> +
> +/**
> + * pmalloc_prealloc - tries to allocate a memory chunk of the requested size
> + * @pool: handler to the pool to be used for memory allocation

             handle (I think)

> + * @size: amount of memory (in bytes) requested
> + *
> + * Prepares a chunk of the requested size.
> + * This is intended to both minimize latency in later memory requests and
> + * avoid sleping during allocation.

            sleeping

> + * Memory allocated with prealloc is stored in one single chunk, as

                       with pmalloc_prealloc()

> + * opposite to what is allocated on-demand when pmalloc runs out of free

      opposed to

> + * space already existing in the pool and has to invoke vmalloc.
> + *
> + * Returns true if the vmalloc call was successful, false otherwise.

Where is the allocated memory (pointer)?  I.e., how does the caller know
where that memory is?
Oh, that memory isn't yet available to the caller until it calls pmalloc(), right?
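
The quoted code suggests that pmalloc_prealloc() only hands a page-rounded vmalloc chunk to gen_pool_add(). The caller then reaches that memory later through pmalloc(). Below is a toy userspace model of the two steps. The names are hypothetical, malloc() replaces vmalloc(), and each pool holds a single chunk, carved out with a bump allocator instead of genalloc's bitmap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_PAGE_SIZE 4096UL

/* Hypothetical toy pool: at most one chunk, carved out front to back. */
struct toy_pool {
	unsigned char *chunk;
	size_t chunk_size;
	size_t used;
};

/* Like pmalloc_prealloc(): add a page-rounded chunk, return no pointer. */
static bool toy_prealloc(struct toy_pool *pool, size_t size)
{
	if (pool->chunk || size == 0)
		return false;              /* toy limit: one chunk per pool */
	pool->chunk_size = (size + TOY_PAGE_SIZE - 1) / TOY_PAGE_SIZE * TOY_PAGE_SIZE;
	pool->chunk = malloc(pool->chunk_size);
	pool->used = 0;
	return pool->chunk != NULL;
}

/* Like pmalloc() when the pool already has room: no new chunk needed. */
static void *toy_alloc(struct toy_pool *pool, size_t size)
{
	void *p;

	if (!pool->chunk || size == 0 || size > pool->chunk_size - pool->used)
		return NULL;
	p = pool->chunk + pool->used;
	pool->used += size;
	return p;
}
```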


> + */
> +bool pmalloc_prealloc(struct gen_pool *pool, size_t size);
> +
> +/**
> + * pmalloc - allocate protectable memory from a pool
> + * @pool: handler to the pool to be used for memory allocation

             handle (?)

> + * @size: amount of memory (in bytes) requested
> + * @gfp: flags for page allocation
> + *
> + * Allocates memory from an unprotected pool. If the pool doesn't have
> + * enough memory, and the request did not include GFP_ATOMIC, an attempt
> + * is made to add a new chunk of memory to the pool
> + * (a multiple of PAGE_SIZE), in order to fit the new request.

                                             fill
What if @size is > PAGE_SIZE?
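
In the quoted pmalloc(), the new chunk is sized with chunk_size = roundup(size, PAGE_SIZE). A request larger than PAGE_SIZE should therefore get a multi-page chunk rather than fail, although the comment does not say so. The arithmetic is modeled below in userspace. PAGE_SIZE is assumed to be 4096, and toy_roundup() is a stand-in for the kernel's roundup() macro.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096UL   /* assumption: 4 KiB pages */

/* Same arithmetic as roundup(x, y): smallest multiple of y that is >= x. */
static size_t toy_roundup(size_t x, size_t y)
{
	return (x + y - 1) / y * y;
}

/* Number of pages a new chunk would span for a request of size bytes. */
static size_t chunk_pages(size_t size)
{
	return toy_roundup(size, TOY_PAGE_SIZE) / TOY_PAGE_SIZE;
}
```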

> + * Otherwise, NULL is returned.
> + *
> + * Returns the pointer to the memory requested upon success,
> + * NULL otherwise (either no memory available or pool already read-only).

It would be good to use the
    * Return:
kernel-doc notation for return values.
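
For reference, the following shows a kernel-doc comment with a Return: section. The function is a hypothetical stand-in, built in userspace so that the example compiles. Only the comment layout matters here, and the Return: wording is modeled on the existing "Returns ..." sentence.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/**
 * toy_checked_alloc() - allocate memory unless the pool is read-only
 * @size: amount of memory (in bytes) requested
 * @read_only: whether the (hypothetical) pool has already been protected
 *
 * Stand-in body, so that the comment layout can be compiled and tested.
 *
 * Return: pointer to the allocated memory on success, NULL if @size is 0,
 * if the pool is read-only or if no memory is available.
 */
static void *toy_checked_alloc(size_t size, bool read_only)
{
	if (size == 0 || read_only)
		return NULL;
	return malloc(size);
}
```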

> + */
> +void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp);
> +
> +
> +/**
> + * pzalloc - zero-initialized version of pmalloc
> + * @pool: handler to the pool to be used for memory allocation

             handle (?)

> + * @size: amount of memory (in bytes) requested
> + * @gfp: flags for page allocation
> + *
> + * Executes pmalloc, initializing the memory requested to 0,
> + * before returning the pointer to it.
> + *
> + * Returns the pointer to the zeroed memory requested, upon success,
> + * NULL otherwise (either no memory available or pool already read-only).
> + */
> +static inline void *pzalloc(struct gen_pool *pool, size_t size, gfp_t gfp)
> +{
> +	return pmalloc(pool, size, gfp | __GFP_ZERO);
> +}
> +
> +/**
> + * pmalloc_array - allocates an array according to the parameters
> + * @pool: handler to the pool to be used for memory allocation

             handle

> + * @n: number of elements in the array
> + * @size: amount of memory (in bytes) requested for each element
> + * @flags: flags for page allocation
> + *
> + * Executes pmalloc, if it has a chance to succeed.
> + *
> + * Returns either NULL or the pmalloc result.
> + */
> +static inline void *pmalloc_array(struct gen_pool *pool, size_t n,
> +				  size_t size, gfp_t flags)
> +{
> +	if (unlikely(!(pool && n && size)))
> +		return NULL;
> +	return pmalloc(pool, n * size, flags);
> +}
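
Unlike kmalloc_array(), pmalloc_array() passes n * size to pmalloc() without an overflow check. A large n could therefore produce an allocation much smaller than the caller expects. Below is a sketch of a kmalloc_array()-style guard, written as a hypothetical userspace helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: compute n * size, refusing on overflow. */
static bool array_bytes(size_t n, size_t size, size_t *bytes)
{
	if (n == 0 || size == 0)
		return false;  /* mirrors pmalloc_array() rejecting empty arrays */
	if (n > SIZE_MAX / size)
		return false;  /* n * size would not fit in size_t */
	*bytes = n * size;
	return true;
}
```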
> +
> +/**
> + * pcalloc - allocates a 0-initialized array according to the parameters
> + * @pool: handler to the pool to be used for memory allocation

             handle

> + * @n: number of elements in the array
> + * @size: amount of memory (in bytes) requested
> + * @flags: flags for page allocation
> + *
> + * Executes pmalloc_array, if it has a chance to succeed.
> + *
> + * Returns either NULL or the pmalloc result.
> + */
> +static inline void *pcalloc(struct gen_pool *pool, size_t n,
> +			    size_t size, gfp_t flags)
> +{
> +	return pmalloc_array(pool, n, size, flags | __GFP_ZERO);
> +}
> +
> +/**
> + * pstrdup - duplicate a string, using pmalloc as allocator
> + * @pool: handler to the pool to be used for memory allocation

             handle

> + * @s: string to duplicate
> + * @gfp: flags for page allocation
> + *
> + * Generates a copy of the given string, allocating sufficient memory
> + * from the given pmalloc pool.
> + *
> + * Returns a pointer to the replica, NULL in case of recoverable error.
> + */
> +static inline char *pstrdup(struct gen_pool *pool, const char *s, gfp_t gfp)
> +{
> +	size_t len;
> +	char *buf;
> +
> +	if (unlikely(pool == NULL || s == NULL))
> +		return NULL;
> +
> +	len = strlen(s) + 1;
> +	buf = pmalloc(pool, len, gfp);
> +	if (likely(buf))
> +		strncpy(buf, s, len);
> +	return buf;
> +}
> +
> +/**
> + * pmalloc_protect_pool - turn a read/write pool read-only
> + * @pool: the pool to protect
> + *
> + * Write-protects all the memory chunks assigned to the pool.
> + * This prevents any further allocation.
> + *
> + * Returns 0 upon success, -EINVAL in abnormal cases.
> + */
> +int pmalloc_protect_pool(struct gen_pool *pool);
> +
> +/**
> + * pfree - mark as unused memory that was previously in use
> + * @pool: handler to the pool to be used for memory allocation

             handle

> + * @addr: the beginning of the memory area to be freed
> + *
> + * The behavior of pfree is different, depending on the state of the
> + * protection.
> + * If the pool is not yet protected, the memory is marked as unused and
> + * will be availabel for further allocations.

              available

> + * If the pool is already protected, the memory is marked as unused, but
> + * it will still be impossible to perform further allocation, because of
> + * the existing protection.
> + * The freed memory, in this case, will be truly released only when the
> + * pool is destroyed.
> + */
> +static inline void pfree(struct gen_pool *pool, const void *addr)
> +{
> +	gen_pool_free(pool, (unsigned long)addr, 0);
> +}
> +
> +/**
> + * pmalloc_destroy_pool - destroys a pool and all the associated memory
> + * @pool: the pool to destroy
> + *
> + * All the memory that was allocated through pmalloc in the pool will be freed.
> + *
> + * Returns 0 upon success, -EINVAL in abnormal cases.
> + */
> +int pmalloc_destroy_pool(struct gen_pool *pool);
> +
> +#endif


> diff --git a/mm/pmalloc.c b/mm/pmalloc.c
> new file mode 100644
> index 000000000000..11daca252589
> --- /dev/null
> +++ b/mm/pmalloc.c
> @@ -0,0 +1,514 @@
> +/* SPDX-License-Identifier: GPL-2.0
> + *
> + * pmalloc.c: Protectable Memory Allocator
> + *
> + * (C) Copyright 2017 Huawei Technologies Co. Ltd.
> + * Author: Igor Stoppa <igor.stoppa@huawei.com>
> + */
> +
> +#include <linux/printk.h>
> +#include <linux/init.h>
> +#include <linux/mm.h>
> +#include <linux/vmalloc.h>
> +#include <linux/genalloc.h>
> +#include <linux/kernel.h>
> +#include <linux/log2.h>
> +#include <linux/slab.h>
> +#include <linux/device.h>
> +#include <linux/atomic.h>
> +#include <linux/rculist.h>
> +#include <linux/set_memory.h>
> +#include <asm/cacheflush.h>
> +#include <asm/page.h>
> +
> +#include "pmalloc-selftest.h"
> +
> +/**

/** means that the following comments are kernel-doc notation, but these
comments are not, so just use /* there, please.

> + * pmalloc_data contains the data specific to a pmalloc pool,
> + * in a format compatible with the design of gen_alloc.
> + * Some of the fields are used for exposing the corresponding parameter
> + * to userspace, through sysfs.
> + */
> +struct pmalloc_data {
> +	struct gen_pool *pool;  /* Link back to the associated pool. */
> +	bool protected;     /* Status of the pool: RO or RW. */
> +	struct kobj_attribute attr_protected; /* Sysfs attribute. */
> +	struct kobj_attribute attr_avail;     /* Sysfs attribute. */
> +	struct kobj_attribute attr_size;      /* Sysfs attribute. */
> +	struct kobj_attribute attr_chunks;    /* Sysfs attribute. */
> +	struct kobject *pool_kobject;
> +	struct list_head node; /* list of pools */
> +};
> +
> +static LIST_HEAD(pmalloc_final_list);
> +static LIST_HEAD(pmalloc_tmp_list);
> +static struct list_head *pmalloc_list = &pmalloc_tmp_list;
> +static DEFINE_MUTEX(pmalloc_mutex);
> +static struct kobject *pmalloc_kobject;

[snip]

> +/**

Just use /* since this is not kernel-doc notation.

> + * Exposes the pool and its attributes through sysfs.
> + */
> +static struct kobject *pmalloc_connect(struct pmalloc_data *data)
> +{
> +	const struct attribute *attrs[] = {
> +		&data->attr_protected.attr,
> +		&data->attr_avail.attr,
> +		&data->attr_size.attr,
> +		&data->attr_chunks.attr,
> +		NULL
> +	};
> +	struct kobject *kobj;
> +
> +	kobj = kobject_create_and_add(data->pool->name, pmalloc_kobject);
> +	if (unlikely(!kobj))
> +		return NULL;
> +
> +	if (unlikely(sysfs_create_files(kobj, attrs) < 0)) {
> +		kobject_put(kobj);
> +		kobj = NULL;
> +	}
> +	return kobj;
> +}
> +
> +/**

Ditto.

> + * Removes the pool and its attributes from sysfs.
> + */
> +static void pmalloc_disconnect(struct pmalloc_data *data,
> +			       struct kobject *kobj)
> +{
> +	const struct attribute *attrs[] = {
> +		&data->attr_protected.attr,
> +		&data->attr_avail.attr,
> +		&data->attr_size.attr,
> +		&data->attr_chunks.attr,
> +		NULL
> +	};
> +
> +	sysfs_remove_files(kobj, attrs);
> +	kobject_put(kobj);
> +}
> +
> +/**

Same.

> + * Declares an attribute of the pool.
> + */
> +
> +#define pmalloc_attr_init(data, attr_name) \
> +do { \
> +	sysfs_attr_init(&data->attr_##attr_name.attr); \
> +	data->attr_##attr_name.attr.name = #attr_name; \
> +	data->attr_##attr_name.attr.mode = VERIFY_OCTAL_PERMISSIONS(0400); \
> +	data->attr_##attr_name.show = pmalloc_pool_show_##attr_name; \
> +} while (0)

[snip]


> +int is_pmalloc_object(const void *ptr, const unsigned long n)
> +{
> +	struct vm_struct *area;
> +	struct page *page;
> +	unsigned long area_start;
> +	unsigned long area_end;
> +	unsigned long object_start;
> +	unsigned long object_end;
> +
> +
> +	/* is_pmalloc_object gets called pretty late, so chances are high
> +	 * that the object is indeed of vmalloc type
> +	 */

Multi-line comment style is
	/*
	 * comment1
	 * comment..N
	 */

> +	if (unlikely(!is_vmalloc_addr(ptr)))
> +		return NOT_PMALLOC_OBJECT;
> +
> +	page = vmalloc_to_page(ptr);
> +	if (unlikely(!page))
> +		return NOT_PMALLOC_OBJECT;
> +
> +	area = page->area;
> +
> +	if (likely(!(area->flags & VM_PMALLOC)))
> +		return NOT_PMALLOC_OBJECT;
> +
> +	area_start = (unsigned long)area->addr;
> +	area_end = area_start + area->nr_pages * PAGE_SIZE - 1;
> +	object_start = (unsigned long)ptr;
> +	object_end = object_start + n - 1;
> +
> +	if (likely((area_start <= object_start) &&
> +		   (object_end <= area_end)))
> +		return VALID_PMALLOC_OBJECT;
> +	else
> +		return INVALID_PMALLOC_OBJECT;
> +}
> +
> +
> +bool pmalloc_prealloc(struct gen_pool *pool, size_t size)
> +{
> +	void *chunk;
> +	size_t chunk_size;
> +	bool add_error;
> +	unsigned int order;
> +
> +	if (check_alloc_params(pool, size))
> +		return false;
> +
> +	order = (unsigned int)pool->min_alloc_order;
> +
> +	/* Expand pool */
> +	chunk_size = roundup(size, PAGE_SIZE);
> +	chunk = vmalloc(chunk_size);
> +	if (unlikely(chunk == NULL))
> +		return false;
> +
> +	/* Locking is already done inside gen_pool_add */
> +	add_error = gen_pool_add(pool, (unsigned long)chunk, chunk_size,
> +				 NUMA_NO_NODE);
> +	if (unlikely(add_error != 0))
> +		goto abort;
> +
> +	return true;
> +abort:
> +	vfree_atomic(chunk);
> +	return false;
> +
> +}
> +
> +void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp)
> +{
> +	void *chunk;
> +	size_t chunk_size;
> +	bool add_error;
> +	unsigned long retval;
> +	unsigned int order;
> +
> +	if (check_alloc_params(pool, size))
> +		return NULL;
> +
> +	order = (unsigned int)pool->min_alloc_order;
> +
> +retry_alloc_from_pool:
> +	retval = gen_pool_alloc(pool, size);
> +	if (retval)
> +		goto return_allocation;
> +
> +	if (unlikely((gfp & __GFP_ATOMIC))) {
> +		if (unlikely((gfp & __GFP_NOFAIL)))
> +			goto retry_alloc_from_pool;
> +		else
> +			return NULL;
> +	}
> +
> +	/* Expand pool */
> +	chunk_size = roundup(size, PAGE_SIZE);
> +	chunk = vmalloc(chunk_size);
> +	if (unlikely(!chunk)) {
> +		if (unlikely((gfp & __GFP_NOFAIL)))
> +			goto retry_alloc_from_pool;
> +		else
> +			return NULL;
> +	}
> +	if (unlikely(!tag_chunk(chunk)))
> +		goto free;
> +
> +	/* Locking is already done inside gen_pool_add */
> +	add_error = gen_pool_add(pool, (unsigned long)chunk, chunk_size,
> +				 NUMA_NO_NODE);
> +	if (unlikely(add_error))
> +		goto abort;
> +
> +	retval = gen_pool_alloc(pool, size);
> +	if (retval) {
> +return_allocation:
> +		*(size_t *)retval = size;
> +		if (gfp & __GFP_ZERO)
> +			memset((void *)retval, 0, size);
> +		return (void *)retval;
> +	}
> +	/* Here there is no test for __GFP_NO_FAIL because, in case of
> +	 * concurrent allocation, one thread might add a chunk to the
> +	 * pool and this memory could be allocated by another thread,
> +	 * before the first thread gets a chance to use it.
> +	 * As long as vmalloc succeeds, it's ok to retry.
> +	 */

Fix multi-line comment style.

> +	goto retry_alloc_from_pool;
> +abort:
> +	untag_chunk(chunk);
> +free:
> +	vfree_atomic(chunk);
> +	return NULL;
> +}

[snip]

> +/**

Just use /*

> + * When the sysfs is ready to receive registrations, connect all the
> + * pools previously created. Also enable further pools to be connected
> + * right away.
> + */
> +static int __init pmalloc_late_init(void)
> +{
> +	struct pmalloc_data *data, *n;
> +
> +	pmalloc_kobject = kobject_create_and_add("pmalloc", kernel_kobj);
> +
> +	mutex_lock(&pmalloc_mutex);
> +	pmalloc_list = &pmalloc_final_list;
> +
> +	if (likely(pmalloc_kobject != NULL)) {
> +		list_for_each_entry_safe(data, n, &pmalloc_tmp_list, node) {
> +			list_move(&data->node, &pmalloc_final_list);
> +			pmalloc_connect(data);
> +		}
> +	}
> +	mutex_unlock(&pmalloc_mutex);
> +	pmalloc_selftest();
> +	return 0;
> +}
> +late_initcall(pmalloc_late_init);

> diff --git a/mm/usercopy.c b/mm/usercopy.c
> index a9852b24715d..c3b10298d808 100644
> --- a/mm/usercopy.c
> +++ b/mm/usercopy.c
> @@ -15,6 +15,7 @@
>  #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>  
>  #include <linux/mm.h>
> +#include <linux/pmalloc.h>
>  #include <linux/slab.h>
>  #include <linux/sched.h>
>  #include <linux/sched/task.h>
> @@ -222,6 +223,7 @@ static inline const char *check_heap_object(const void *ptr, unsigned long n,
>  void __check_object_size(const void *ptr, unsigned long n, bool to_user)
>  {
>  	const char *err;
> +	int retv;
>  
>  	/* Skip all tests if size is zero. */
>  	if (!n)
> @@ -229,12 +231,12 @@ void __check_object_size(const void *ptr, unsigned long n, bool to_user)
>  
>  	/* Check for invalid addresses. */
>  	err = check_bogus_address(ptr, n);
> -	if (err)
> +	if (unlikely(err))
>  		goto report;
>  
>  	/* Check for bad heap object. */
>  	err = check_heap_object(ptr, n, to_user);
> -	if (err)
> +	if (unlikely(err))
>  		goto report;
>  
>  	/* Check for bad stack object. */
> @@ -257,8 +259,23 @@ void __check_object_size(const void *ptr, unsigned long n, bool to_user)
>  
>  	/* Check for object in kernel to avoid text exposure. */
>  	err = check_kernel_text_object(ptr, n);
> -	if (!err)
> -		return;
> +	if (unlikely(err))
> +		goto report;
> +
> +	/* Check if object is from a pmalloc chunk.
> +	 */

Use kernel multi-line comment style.

> +	retv = is_pmalloc_object(ptr, n);
> +	if (unlikely(retv)) {
> +		if (unlikely(!to_user)) {
> +			err = "<trying to write to pmalloc object>";
> +			goto report;
> +		}
> +		if (retv < 0) {
> +			err = "<invalid pmalloc object>";
> +			goto report;
> +		}
> +	}
> +	return;
>  
>  report:
>  	report_usercopy(ptr, n, to_user, err);
> 


-- 
~Randy


^ permalink raw reply	[flat|nested] 146+ messages in thread

* [PATCH 4/6] Protectable Memory
  2018-02-04 16:47 [RFC PATCH v14 0/6] mm: security: ro protection for dynamic data Igor Stoppa
  2018-02-04 16:47   ` Igor Stoppa
  (?)
@ 2018-02-04 16:47   ` Igor Stoppa
  0 siblings, 0 replies; 146+ messages in thread
From: Igor Stoppa @ 2018-02-04 16:47 UTC (permalink / raw)
  To: jglisse, keescook, mhocko, labbott, hch, willy
  Cc: cl, linux-security-module, linux-mm, linux-kernel,
	kernel-hardening, Igor Stoppa

The MMU available in many systems running Linux can often provide R/O
protection to the memory pages it handles.

However, the MMU-based protection works efficiently only when said pages
contain exclusively data that will not need further modifications.

Statically allocated variables can be segregated into a dedicated
section, but this does not sit very well with dynamically allocated
ones.

Dynamic allocation currently provides no means of grouping variables
into memory pages that would contain exclusively data suitable for
conversion to read-only access mode.

The allocator provided here (pmalloc, the protectable memory allocator)
introduces the concept of pools of protectable memory.

A module can request a pool and then direct any allocation request to
the pool handle it has received.

Once all the chunks of memory associated with a specific pool are
initialized, the pool can be protected.

After this point, the pool can only be destroyed (it is up to the module
to avoid any further references to the memory from the pool once the
destruction is invoked).

Destruction is mainly meant for releasing memory when a module is
unloaded.

A module can have as many pools as needed, for example to support the
protection of data that is initialized in sufficiently distinct phases.
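
The lifecycle described above can be approximated in userspace with mmap() and mprotect(). Memory is allocated, initialized, protected and eventually destroyed, with no way back to R/W. This is only an analogy, not pmalloc itself. The ro_pool names are hypothetical, and a single anonymous mapping stands in for a pool's vmalloc chunks.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical userspace analogue of a pool: one page-aligned mapping. */
struct ro_pool {
	void *base;
	size_t size;
};

/* Like creating a pool and preallocating: memory starts out R/W. */
static int ro_pool_create(struct ro_pool *p, size_t size)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);

	p->size = (size + page - 1) / page * page;
	p->base = mmap(NULL, p->size, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	return p->base == MAP_FAILED ? -1 : 0;
}

/*
 * Like pmalloc_protect_pool(): afterwards a store into base faults with
 * SIGSEGV. There is deliberately no function to make it writable again.
 */
static int ro_pool_protect(struct ro_pool *p)
{
	return mprotect(p->base, p->size, PROT_READ);
}

/* Like pmalloc_destroy_pool(): the only way to recover the memory. */
static int ro_pool_destroy(struct ro_pool *p)
{
	return munmap(p->base, p->size);
}
```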

Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com>
---
 include/linux/genalloc.h |   3 +
 include/linux/pmalloc.h  | 213 ++++++++++++++++++++
 include/linux/vmalloc.h  |   1 +
 lib/genalloc.c           |  27 +++
 mm/Makefile              |   1 +
 mm/pmalloc.c             | 514 +++++++++++++++++++++++++++++++++++++++++++++++
 mm/usercopy.c            |  25 ++-
 7 files changed, 780 insertions(+), 4 deletions(-)
 create mode 100644 include/linux/pmalloc.h
 create mode 100644 mm/pmalloc.c

diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
index dcaa33e74b1c..b6c4cea9fbd8 100644
--- a/include/linux/genalloc.h
+++ b/include/linux/genalloc.h
@@ -121,6 +121,9 @@ extern unsigned long gen_pool_alloc_algo(struct gen_pool *, size_t,
 extern void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size,
 		dma_addr_t *dma);
 extern void gen_pool_free(struct gen_pool *, unsigned long, size_t);
+
+extern void gen_pool_flush_chunk(struct gen_pool *pool,
+				 struct gen_pool_chunk *chunk);
 extern void gen_pool_for_each_chunk(struct gen_pool *,
 	void (*)(struct gen_pool *, struct gen_pool_chunk *, void *), void *);
 extern size_t gen_pool_avail(struct gen_pool *);
diff --git a/include/linux/pmalloc.h b/include/linux/pmalloc.h
new file mode 100644
index 000000000000..5fa8a78be819
--- /dev/null
+++ b/include/linux/pmalloc.h
@@ -0,0 +1,213 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * pmalloc.h: Header for Protectable Memory Allocator
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.stoppa@huawei.com>
+ */
+
+#ifndef _PMALLOC_H
+#define _PMALLOC_H
+
+
+#include <linux/genalloc.h>
+#include <linux/string.h>
+
+#define PMALLOC_DEFAULT_ALLOC_ORDER (-1)
+
+/*
+ * Library for dynamic allocation of pools of memory that can be,
+ * after initialization, marked as read-only.
+ *
+ * This is intended to complement __ro_after_init, for those cases
+ * where either it is not possible to know the initialization value before
+ * init is completed, or the amount of data is variable and can be
+ * determined only at run-time.
+ *
+ * ***WARNING***
+ * The user of the API is expected to synchronize:
+ * 1) allocation,
+ * 2) writes to the allocated memory,
+ * 3) write protection of the pool,
+ * 4) freeing of the allocated memory, and
+ * 5) destruction of the pool.
+ *
+ * For a non-threaded scenario, this type of locking is not even required.
+ *
+ * Even if the library were to provide support for locking, point 2)
+ * would still depend on the user taking the lock.
+ */
+
+
+/**
+ * pmalloc_create_pool - create a new protectable memory pool
+ * @name: the name of the pool, must be unique
+ * @min_alloc_order: log2 of the minimum allocation size obtainable
+ *                   from the pool
+ *
+ * Creates a new (empty) memory pool for allocation of protectable
+ * memory. Memory will be allocated upon request (through pmalloc).
+ *
+ * Returns a pointer to the new pool upon success, otherwise NULL.
+ */
+struct gen_pool *pmalloc_create_pool(const char *name,
+					 int min_alloc_order);
+
+
+int is_pmalloc_object(const void *ptr, const unsigned long n);
+
+/**
+ * pmalloc_prealloc - tries to allocate a memory chunk of the requested size
+ * @pool: handle to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ *
+ * Prepares a chunk of the requested size.
+ * This is intended to both minimize latency in later memory requests and
+ * avoid sleeping during allocation.
+ * Memory allocated with pmalloc_prealloc() is stored in one single chunk,
+ * as opposed to what is allocated on demand when pmalloc runs out of free
+ * space already existing in the pool and has to invoke vmalloc.
+ *
+ * Returns true if the vmalloc call was successful, false otherwise.
+ */
+bool pmalloc_prealloc(struct gen_pool *pool, size_t size);
+
+/**
+ * pmalloc - allocate protectable memory from a pool
+ * @pool: handle to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ * @gfp: flags for page allocation
+ *
+ * Allocates memory from an unprotected pool. If the pool doesn't have
+ * enough memory, and the request did not include GFP_ATOMIC, an attempt
+ * is made to add a new chunk of memory to the pool
+ * (a multiple of PAGE_SIZE), in order to fit the new request.
+ * Otherwise, NULL is returned.
+ *
+ * Returns the pointer to the memory requested upon success,
+ * NULL otherwise (either no memory available or pool already read-only).
+ */
+void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp);
+
+
+/**
+ * pzalloc - zero-initialized version of pmalloc
+ * @pool: handle to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ * @gfp: flags for page allocation
+ *
+ * Executes pmalloc, initializing the memory requested to 0,
+ * before returning the pointer to it.
+ *
+ * Returns the pointer to the zeroed memory requested, upon success,
+ * NULL otherwise (either no memory available or pool already read-only).
+ */
+static inline void *pzalloc(struct gen_pool *pool, size_t size, gfp_t gfp)
+{
+	return pmalloc(pool, size, gfp | __GFP_ZERO);
+}
+
+/**
+ * pmalloc_array - allocates an array according to the parameters
+ * @pool: handle to the pool to be used for memory allocation
+ * @n: number of elements in the array
+ * @size: amount of memory (in bytes) requested for each element
+ * @flags: flags for page allocation
+ *
+ * Calls pmalloc for n * size bytes, unless any parameter is NULL or zero.
+ *
+ * Returns either NULL or the pmalloc result.
+ */
+static inline void *pmalloc_array(struct gen_pool *pool, size_t n,
+				  size_t size, gfp_t flags)
+{
+	if (unlikely(!(pool && n && size)))
+		return NULL;
+	return pmalloc(pool, n * size, flags);
+}
+
+/**
+ * pcalloc - allocates a 0-initialized array according to the parameters
+ * @pool: handle to the pool to be used for memory allocation
+ * @n: number of elements in the array
+ * @size: amount of memory (in bytes) requested for each element
+ * @flags: flags for page allocation
+ *
+ * Calls pmalloc_array with __GFP_ZERO added to @flags.
+ *
+ * Returns either NULL or the pmalloc result.
+ */
+static inline void *pcalloc(struct gen_pool *pool, size_t n,
+			    size_t size, gfp_t flags)
+{
+	return pmalloc_array(pool, n, size, flags | __GFP_ZERO);
+}
+
+/**
+ * pstrdup - duplicate a string, using pmalloc as allocator
+ * @pool: handle to the pool to be used for memory allocation
+ * @s: string to duplicate
+ * @gfp: flags for page allocation
+ *
+ * Generates a copy of the given string, allocating sufficient memory
+ * from the given pmalloc pool.
+ *
+ * Returns a pointer to the copy, or NULL on error.
+ */
+static inline char *pstrdup(struct gen_pool *pool, const char *s, gfp_t gfp)
+{
+	size_t len;
+	char *buf;
+
+	if (unlikely(pool == NULL || s == NULL))
+		return NULL;
+
+	len = strlen(s) + 1;
+	buf = pmalloc(pool, len, gfp);
+	if (likely(buf))
+		strncpy(buf, s, len);
+	return buf;
+}
+
+/**
+ * pmalloc_protect_pool - turn a read/write pool read-only
+ * @pool: the pool to protect
+ *
+ * Write-protects all the memory chunks assigned to the pool.
+ * This prevents any further allocation.
+ *
+ * Returns 0 upon success, -EINVAL in abnormal cases.
+ */
+int pmalloc_protect_pool(struct gen_pool *pool);
+
+/**
+ * pfree - mark previously allocated memory as unused
+ * @pool: handle to the pool the memory was allocated from
+ * @addr: the beginning of the memory area to be freed
+ *
+ * The behavior of pfree depends on whether the pool has already been
+ * protected.
+ * If the pool is not yet protected, the memory is marked as unused and
+ * will be available for further allocations.
+ * If the pool is already protected, the memory is marked as unused, but
+ * it will still be impossible to perform further allocation, because of
+ * the existing protection.
+ * The freed memory, in this case, will be truly released only when the
+ * pool is destroyed.
+ */
+static inline void pfree(struct gen_pool *pool, const void *addr)
+{
+	gen_pool_free(pool, (unsigned long)addr, 0);
+}
+
+/**
+ * pmalloc_destroy_pool - destroys a pool and all the associated memory
+ * @pool: the pool to destroy
+ *
+ * All the memory that was allocated through pmalloc in the pool will be freed.
+ *
+ * Returns 0 upon success, -EINVAL in abnormal cases.
+ */
+int pmalloc_destroy_pool(struct gen_pool *pool);
+
+#endif
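pmalloc_array() above rejects zero-sized requests but does not check whether n * size overflows size_t, so a very large element count could wrap around and yield an allocation that is too small. A minimal userspace sketch of a guard a caller could apply first; array_size_ok() is a hypothetical helper, not part of this patch or of any kernel API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: store n * size in *total and return true, or
 * return false if either factor is zero or the product would not fit
 * in a size_t.
 */
static bool array_size_ok(size_t n, size_t size, size_t *total)
{
	if (n == 0 || size == 0)
		return false;
	if (n > SIZE_MAX / size)	/* n * size would wrap around */
		return false;
	*total = n * size;
	return true;
}
```

With such a guard, a caller would pass the computed total to pmalloc() only when array_size_ok() returns true.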
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 1e5d8c392f15..116d280cca53 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -20,6 +20,7 @@ struct notifier_block;		/* in notifier.h */
 #define VM_UNINITIALIZED	0x00000020	/* vm_struct is not fully initialized */
 #define VM_NO_GUARD		0x00000040      /* don't add guard page */
 #define VM_KASAN		0x00000080      /* has allocated kasan shadow memory */
+#define VM_PMALLOC		0x00000100	/* pmalloc area - see docs */
 /* bits [20..32] reserved for arch specific ioremap internals */
 
 /*
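chunk_tagging() in mm/pmalloc.c sets or clears VM_PMALLOC in the flags of the vm_struct backing each chunk, and is_pmalloc_object() tests that bit. A minimal userspace sketch of those bit operations, reusing the flag values visible in this hunk; struct fake_area is a hypothetical stand-in for struct vm_struct that keeps only the flags field:

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values copied from the include/linux/vmalloc.h hunk. */
#define VM_UNINITIALIZED	0x00000020
#define VM_NO_GUARD		0x00000040
#define VM_KASAN		0x00000080
#define VM_PMALLOC		0x00000100

/* Hypothetical stand-in for struct vm_struct: only the flags field. */
struct fake_area {
	unsigned long flags;
};

/* Like chunk_tagging(): set or clear VM_PMALLOC, leave other bits alone. */
static void area_tag(struct fake_area *area, bool tag)
{
	if (tag)
		area->flags |= VM_PMALLOC;
	else
		area->flags &= ~VM_PMALLOC;
}

/* Like the flag test at the start of is_pmalloc_object(). */
static bool area_is_pmalloc(const struct fake_area *area)
{
	return (area->flags & VM_PMALLOC) != 0;
}
```

Because the new bit does not overlap any of the existing VM_* values shown, tagging and untagging a chunk never disturbs flags set by vmalloc itself.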
diff --git a/lib/genalloc.c b/lib/genalloc.c
index dde78307b093..7ba2ec96c360 100644
--- a/lib/genalloc.c
+++ b/lib/genalloc.c
@@ -519,6 +519,33 @@ void gen_pool_free(struct gen_pool *pool, unsigned long addr, size_t size)
 }
 EXPORT_SYMBOL(gen_pool_free);
 
+
+/**
+ * gen_pool_flush_chunk - drops all the allocations from a specific chunk
+ * @pool:	the generic memory pool
+ * @chunk:	The chunk to wipe clear.
+ *
+ * This is meant to be called only while destroying a pool. It's up to the
+ * caller to avoid races, but really, at this point the pool should have
+ * already been retired and have become unavailable for any other sort of
+ * operation.
+ */
+void gen_pool_flush_chunk(struct gen_pool *pool,
+			  struct gen_pool_chunk *chunk)
+{
+	size_t size;
+
+	if (unlikely(!(pool && chunk)))
+		return;
+
+	size = chunk->end_addr + 1 - chunk->start_addr;
+	memset(chunk->entries, 0,
+	       DIV_ROUND_UP((size >> pool->min_alloc_order) * BITS_PER_ENTRY,
+			    BITS_PER_BYTE));
+	atomic_long_set(&chunk->avail, size);
+}
+
+
 /**
  * gen_pool_for_each_chunk - call func for every chunk of generic memory pool
  * @pool:	the generic memory pool
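gen_pool_flush_chunk() is meant to clear the whole allocation bitmap of a chunk, whose length in bits is (size >> min_alloc_order) * BITS_PER_ENTRY. Written without parentheses, size >> order * BITS_PER_ENTRY is parsed by C as size >> (order * BITS_PER_ENTRY), because * binds more tightly than >>, and then far fewer bytes get cleared. The userspace sketch below compares the two readings. It assumes BITS_PER_ENTRY is 2 (two bitmap bits per allocation unit, an assumption about the genalloc change earlier in this series) and redefines DIV_ROUND_UP locally:

```c
#include <assert.h>
#include <stddef.h>

#define BITS_PER_BYTE	8
#define BITS_PER_ENTRY	2	/* assumption: two bitmap bits per unit */
#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))

/* Bitmap bytes covering @size bytes split into units of 2^@order bytes. */
static size_t bitmap_bytes(size_t size, unsigned int order)
{
	return DIV_ROUND_UP((size >> order) * BITS_PER_ENTRY, BITS_PER_BYTE);
}

/* Unparenthesized form: the shift count becomes order * BITS_PER_ENTRY. */
static size_t bitmap_bytes_unparenthesized(size_t size, unsigned int order)
{
	return DIV_ROUND_UP(size >> order * BITS_PER_ENTRY, BITS_PER_BYTE);
}
```

For a 4096-byte chunk with min_alloc_order 3, the parenthesized form covers 128 bytes of bitmap, while the unparenthesized one covers only 8.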
diff --git a/mm/Makefile b/mm/Makefile
index e669f02c5a54..a6a47e1b6e66 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -65,6 +65,7 @@ obj-$(CONFIG_SPARSEMEM)	+= sparse.o
 obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o
 obj-$(CONFIG_SLOB) += slob.o
 obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o
+obj-$(CONFIG_ARCH_HAS_SET_MEMORY) += pmalloc.o
 obj-$(CONFIG_KSM) += ksm.o
 obj-$(CONFIG_PAGE_POISONING) += page_poison.o
 obj-$(CONFIG_SLAB) += slab.o
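When the pool has no free space for a request, pmalloc() in mm/pmalloc.c vmallocs a new chunk whose size is the request rounded up to a multiple of PAGE_SIZE, unless the request carries __GFP_ATOMIC. A simplified userspace sketch of that sizing decision, assuming 4 KiB pages and ignoring the __GFP_NOFAIL retry path; new_chunk_size() and the SKETCH_* macros are hypothetical names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */

/* Round x up to the next multiple of y (y > 0), like the kernel's roundup(). */
#define SKETCH_ROUNDUP(x, y) ((((x) + (y) - 1) / (y)) * (y))

/*
 * Hypothetical model of the pool-expansion step in pmalloc(): returns
 * the size of the chunk to vmalloc() for a request of @size bytes that
 * the pool could not satisfy, or 0 if the pool must not be expanded
 * because the request is atomic.
 */
static size_t new_chunk_size(size_t size, bool atomic)
{
	if (atomic)
		return 0;
	return SKETCH_ROUNDUP(size, SKETCH_PAGE_SIZE);
}
```

Each expansion therefore adds whole pages, which is what later allows pmalloc_protect_pool() to write-protect every chunk page by page.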
diff --git a/mm/pmalloc.c b/mm/pmalloc.c
new file mode 100644
index 000000000000..11daca252589
--- /dev/null
+++ b/mm/pmalloc.c
@@ -0,0 +1,514 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * pmalloc.c: Protectable Memory Allocator
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.stoppa@huawei.com>
+ */
+
+#include <linux/printk.h>
+#include <linux/init.h>
+#include <linux/mm.h>
+#include <linux/vmalloc.h>
+#include <linux/genalloc.h>
+#include <linux/kernel.h>
+#include <linux/log2.h>
+#include <linux/slab.h>
+#include <linux/device.h>
+#include <linux/atomic.h>
+#include <linux/rculist.h>
+#include <linux/set_memory.h>
+#include <asm/cacheflush.h>
+#include <asm/page.h>
+
+#include "pmalloc-selftest.h"
+
+/*
+ * struct pmalloc_data - data specific to a pmalloc pool, kept in a
+ * format compatible with the design of genalloc.
+ * Some of the fields expose the corresponding pool parameters to
+ * userspace through sysfs.
+ */
+struct pmalloc_data {
+	struct gen_pool *pool;  /* Link back to the associated pool. */
+	bool protected;     /* Status of the pool: RO or RW. */
+	struct kobj_attribute attr_protected; /* Sysfs attribute. */
+	struct kobj_attribute attr_avail;     /* Sysfs attribute. */
+	struct kobj_attribute attr_size;      /* Sysfs attribute. */
+	struct kobj_attribute attr_chunks;    /* Sysfs attribute. */
+	struct kobject *pool_kobject;
+	struct list_head node; /* list of pools */
+};
+
+static LIST_HEAD(pmalloc_final_list);
+static LIST_HEAD(pmalloc_tmp_list);
+static struct list_head *pmalloc_list = &pmalloc_tmp_list;
+static DEFINE_MUTEX(pmalloc_mutex);
+static struct kobject *pmalloc_kobject;
+
+static ssize_t pmalloc_pool_show_protected(struct kobject *dev,
+					   struct kobj_attribute *attr,
+					   char *buf)
+{
+	struct pmalloc_data *data;
+
+	data = container_of(attr, struct pmalloc_data, attr_protected);
+	if (data->protected)
+		return sprintf(buf, "protected\n");
+	else
+		return sprintf(buf, "unprotected\n");
+}
+
+static ssize_t pmalloc_pool_show_avail(struct kobject *dev,
+				       struct kobj_attribute *attr,
+				       char *buf)
+{
+	struct pmalloc_data *data;
+
+	data = container_of(attr, struct pmalloc_data, attr_avail);
+	return sprintf(buf, "%lu\n",
+		       (unsigned long)gen_pool_avail(data->pool));
+}
+
+static ssize_t pmalloc_pool_show_size(struct kobject *dev,
+				      struct kobj_attribute *attr,
+				      char *buf)
+{
+	struct pmalloc_data *data;
+
+	data = container_of(attr, struct pmalloc_data, attr_size);
+	return sprintf(buf, "%lu\n",
+		       (unsigned long)gen_pool_size(data->pool));
+}
+
+static void pool_chunk_number(struct gen_pool *pool,
+			      struct gen_pool_chunk *chunk, void *data)
+{
+	unsigned long *counter = data;
+
+	(*counter)++;
+}
+
+static ssize_t pmalloc_pool_show_chunks(struct kobject *dev,
+					struct kobj_attribute *attr,
+					char *buf)
+{
+	struct pmalloc_data *data;
+	unsigned long chunks_num = 0;
+
+	data = container_of(attr, struct pmalloc_data, attr_chunks);
+	gen_pool_for_each_chunk(data->pool, pool_chunk_number, &chunks_num);
+	return sprintf(buf, "%lu\n", chunks_num);
+}
+
+/*
+ * Exposes the pool and its attributes through sysfs.
+ */
+static struct kobject *pmalloc_connect(struct pmalloc_data *data)
+{
+	const struct attribute *attrs[] = {
+		&data->attr_protected.attr,
+		&data->attr_avail.attr,
+		&data->attr_size.attr,
+		&data->attr_chunks.attr,
+		NULL
+	};
+	struct kobject *kobj;
+
+	kobj = kobject_create_and_add(data->pool->name, pmalloc_kobject);
+	if (unlikely(!kobj))
+		return NULL;
+
+	if (unlikely(sysfs_create_files(kobj, attrs) < 0)) {
+		kobject_put(kobj);
+		kobj = NULL;
+	}
+	return kobj;
+}
+
+/*
+ * Removes the pool and its attributes from sysfs.
+ */
+static void pmalloc_disconnect(struct pmalloc_data *data,
+			       struct kobject *kobj)
+{
+	const struct attribute *attrs[] = {
+		&data->attr_protected.attr,
+		&data->attr_avail.attr,
+		&data->attr_size.attr,
+		&data->attr_chunks.attr,
+		NULL
+	};
+
+	sysfs_remove_files(kobj, attrs);
+	kobject_put(kobj);
+}
+
+/*
+ * Declares an attribute of the pool.
+ */
+
+#define pmalloc_attr_init(data, attr_name) \
+do { \
+	sysfs_attr_init(&data->attr_##attr_name.attr); \
+	data->attr_##attr_name.attr.name = #attr_name; \
+	data->attr_##attr_name.attr.mode = VERIFY_OCTAL_PERMISSIONS(0400); \
+	data->attr_##attr_name.show = pmalloc_pool_show_##attr_name; \
+} while (0)
+
+struct gen_pool *pmalloc_create_pool(const char *name, int min_alloc_order)
+{
+	struct gen_pool *pool;
+	const char *pool_name;
+	struct pmalloc_data *data;
+
+	if (!name) {
+		WARN_ON(1);
+		return NULL;
+	}
+
+	if (min_alloc_order < 0)
+		min_alloc_order = ilog2(sizeof(unsigned long));
+
+	pool = gen_pool_create(min_alloc_order, NUMA_NO_NODE);
+	if (unlikely(!pool))
+		return NULL;
+
+	mutex_lock(&pmalloc_mutex);
+	list_for_each_entry(data, pmalloc_list, node)
+		if (!strcmp(name, data->pool->name))
+			goto same_name_err;
+
+	pool_name = kstrdup(name, GFP_KERNEL);
+	if (unlikely(!pool_name))
+		goto name_alloc_err;
+
+	data = kzalloc(sizeof(struct pmalloc_data), GFP_KERNEL);
+	if (unlikely(!data))
+		goto data_alloc_err;
+
+	data->protected = false;
+	data->pool = pool;
+	pmalloc_attr_init(data, protected);
+	pmalloc_attr_init(data, avail);
+	pmalloc_attr_init(data, size);
+	pmalloc_attr_init(data, chunks);
+	pool->data = data;
+	pool->name = pool_name;
+
+	list_add(&data->node, pmalloc_list);
+	if (pmalloc_list == &pmalloc_final_list)
+		data->pool_kobject = pmalloc_connect(data);
+	mutex_unlock(&pmalloc_mutex);
+	return pool;
+
+data_alloc_err:
+	kfree(pool_name);
+name_alloc_err:
+same_name_err:
+	mutex_unlock(&pmalloc_mutex);
+	gen_pool_destroy(pool);
+	return NULL;
+}
+
+static inline int check_alloc_params(struct gen_pool *pool, size_t req_size)
+{
+	struct pmalloc_data *data;
+	unsigned int order;
+
+	if (unlikely(!req_size || !pool))
+		return -1;
+
+	order = (unsigned int)pool->min_alloc_order;
+	data = pool->data;
+
+	if (data == NULL)
+		return -1;
+
+	if (unlikely(data->protected)) {
+		WARN_ON(1);
+		return -1;
+	}
+	return 0;
+}
+
+
+static inline bool chunk_tagging(void *chunk, bool tag)
+{
+	struct vm_struct *area;
+	struct page *page;
+
+	if (!is_vmalloc_addr(chunk))
+		return false;
+
+	page = vmalloc_to_page(chunk);
+	if (unlikely(!page))
+		return false;
+
+	area = page->area;
+	if (tag)
+		area->flags |= VM_PMALLOC;
+	else
+		area->flags &= ~VM_PMALLOC;
+	return true;
+}
+
+
+static inline bool tag_chunk(void *chunk)
+{
+	return chunk_tagging(chunk, true);
+}
+
+
+static inline bool untag_chunk(void *chunk)
+{
+	return chunk_tagging(chunk, false);
+}
+
+enum {
+	INVALID_PMALLOC_OBJECT = -1,
+	NOT_PMALLOC_OBJECT = 0,
+	VALID_PMALLOC_OBJECT = 1,
+};
+
+int is_pmalloc_object(const void *ptr, const unsigned long n)
+{
+	struct vm_struct *area;
+	struct page *page;
+	unsigned long area_start;
+	unsigned long area_end;
+	unsigned long object_start;
+	unsigned long object_end;
+
+
+	/* is_pmalloc_object gets called pretty late, so chances are high
+	 * that the object is indeed of vmalloc type
+	 */
+	if (unlikely(!is_vmalloc_addr(ptr)))
+		return NOT_PMALLOC_OBJECT;
+
+	page = vmalloc_to_page(ptr);
+	if (unlikely(!page))
+		return NOT_PMALLOC_OBJECT;
+
+	area = page->area;
+
+	if (likely(!(area->flags & VM_PMALLOC)))
+		return NOT_PMALLOC_OBJECT;
+
+	area_start = (unsigned long)area->addr;
+	area_end = area_start + area->nr_pages * PAGE_SIZE - 1;
+	object_start = (unsigned long)ptr;
+	object_end = object_start + n - 1;
+
+	if (likely((area_start <= object_start) &&
+		   (object_end <= area_end)))
+		return VALID_PMALLOC_OBJECT;
+	else
+		return INVALID_PMALLOC_OBJECT;
+}
+
+
+bool pmalloc_prealloc(struct gen_pool *pool, size_t size)
+{
+	void *chunk;
+	size_t chunk_size;
+	int add_error;
+	unsigned int order;
+
+	if (check_alloc_params(pool, size))
+		return false;
+
+	order = (unsigned int)pool->min_alloc_order;
+
+	/* Expand pool */
+	chunk_size = roundup(size, PAGE_SIZE);
+	chunk = vmalloc(chunk_size);
+	if (unlikely(chunk == NULL || !tag_chunk(chunk)))
+		goto abort;
+
+	/* Locking is already done inside gen_pool_add */
+	add_error = gen_pool_add(pool, (unsigned long)chunk, chunk_size,
+				 NUMA_NO_NODE);
+	if (unlikely(add_error != 0))
+		goto abort;
+
+	return true;
+abort:
+	vfree_atomic(chunk);
+	return false;
+
+}
+
+void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp)
+{
+	void *chunk;
+	size_t chunk_size;
+	int add_error;
+	unsigned long retval;
+	unsigned int order;
+
+	if (check_alloc_params(pool, size))
+		return NULL;
+
+	order = (unsigned int)pool->min_alloc_order;
+
+retry_alloc_from_pool:
+	retval = gen_pool_alloc(pool, size);
+	if (retval)
+		goto return_allocation;
+
+	if (unlikely((gfp & __GFP_ATOMIC))) {
+		if (unlikely((gfp & __GFP_NOFAIL)))
+			goto retry_alloc_from_pool;
+		else
+			return NULL;
+	}
+
+	/* Expand pool */
+	chunk_size = roundup(size, PAGE_SIZE);
+	chunk = vmalloc(chunk_size);
+	if (unlikely(!chunk)) {
+		if (unlikely((gfp & __GFP_NOFAIL)))
+			goto retry_alloc_from_pool;
+		else
+			return NULL;
+	}
+	if (unlikely(!tag_chunk(chunk)))
+		goto free;
+
+	/* Locking is already done inside gen_pool_add */
+	add_error = gen_pool_add(pool, (unsigned long)chunk, chunk_size,
+				 NUMA_NO_NODE);
+	if (unlikely(add_error))
+		goto abort;
+
+	retval = gen_pool_alloc(pool, size);
+	if (retval) {
+return_allocation:
+		*(size_t *)retval = size;
+		if (gfp & __GFP_ZERO)
+			memset((void *)retval, 0, size);
+		return (void *)retval;
+	}
+	/* Here there is no test for __GFP_NOFAIL because, in case of
+	 * concurrent allocation, one thread might add a chunk to the
+	 * pool and this memory could be allocated by another thread,
+	 * before the first thread gets a chance to use it.
+	 * As long as vmalloc succeeds, it's ok to retry.
+	 */
+	goto retry_alloc_from_pool;
+abort:
+	untag_chunk(chunk);
+free:
+	vfree_atomic(chunk);
+	return NULL;
+}
+
+static void pmalloc_chunk_set_protection(struct gen_pool *pool,
+
+					 struct gen_pool_chunk *chunk,
+					 void *data)
+{
+	const bool *flag = data;
+	size_t chunk_size = chunk->end_addr + 1 - chunk->start_addr;
+	unsigned long pages = chunk_size / PAGE_SIZE;
+
+	BUG_ON(chunk_size & (PAGE_SIZE - 1));
+
+	if (*flag)
+		set_memory_ro(chunk->start_addr, pages);
+	else
+		set_memory_rw(chunk->start_addr, pages);
+}
+
+static int pmalloc_pool_set_protection(struct gen_pool *pool, bool protection)
+{
+	struct pmalloc_data *data;
+	struct gen_pool_chunk *chunk;
+
+	if (unlikely(!pool))
+		return -EINVAL;
+
+	data = pool->data;
+
+	if (unlikely(!data))
+		return -EINVAL;
+
+	if (unlikely(data->protected == protection)) {
+		WARN_ON(1);
+		return 0;
+	}
+
+	data->protected = protection;
+	list_for_each_entry(chunk, &(pool)->chunks, next_chunk)
+		pmalloc_chunk_set_protection(pool, chunk, &protection);
+	return 0;
+}
+
+int pmalloc_protect_pool(struct gen_pool *pool)
+{
+	return pmalloc_pool_set_protection(pool, true);
+}
+
+
+static void pmalloc_chunk_free(struct gen_pool *pool,
+			       struct gen_pool_chunk *chunk, void *data)
+{
+	untag_chunk((void *)chunk->start_addr);
+	gen_pool_flush_chunk(pool, chunk);
+	vfree_atomic((void *)chunk->start_addr);
+}
+
+
+int pmalloc_destroy_pool(struct gen_pool *pool)
+{
+	struct pmalloc_data *data;
+
+	if (unlikely(pool == NULL))
+		return -EINVAL;
+
+	data = pool->data;
+
+	if (unlikely(data == NULL))
+		return -EINVAL;
+
+	mutex_lock(&pmalloc_mutex);
+	list_del(&data->node);
+	mutex_unlock(&pmalloc_mutex);
+
+	if (likely(data->pool_kobject))
+		pmalloc_disconnect(data, data->pool_kobject);
+
+	pmalloc_pool_set_protection(pool, false);
+	gen_pool_for_each_chunk(pool, pmalloc_chunk_free, NULL);
+	gen_pool_destroy(pool);
+	kfree(data);
+	return 0;
+}
+
+/*
+ * When the sysfs is ready to receive registrations, connect all the
+ * pools previously created. Also enable further pools to be connected
+ * right away.
+ */
+static int __init pmalloc_late_init(void)
+{
+	struct pmalloc_data *data, *n;
+
+	pmalloc_kobject = kobject_create_and_add("pmalloc", kernel_kobj);
+
+	mutex_lock(&pmalloc_mutex);
+	pmalloc_list = &pmalloc_final_list;
+
+	if (likely(pmalloc_kobject != NULL)) {
+		list_for_each_entry_safe(data, n, &pmalloc_tmp_list, node) {
+			list_move(&data->node, &pmalloc_final_list);
+			data->pool_kobject = pmalloc_connect(data);
+		}
+	}
+	mutex_unlock(&pmalloc_mutex);
+	pmalloc_selftest();
+	return 0;
+}
+late_initcall(pmalloc_late_init);
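The bounds check at the end of is_pmalloc_object() accepts an object only if it lies entirely inside the pages of the tagged vm_struct; an object that starts inside the area but runs past its end is reported as invalid, which __check_object_size() then turns into a usercopy report. A userspace sketch of just that range test, assuming 4 KiB pages; classify_object() is a hypothetical name, and the area lookup through vmalloc_to_page() and the VM_PMALLOC check are left out:

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */

enum {
	INVALID_PMALLOC_OBJECT = -1,
	NOT_PMALLOC_OBJECT = 0,
	VALID_PMALLOC_OBJECT = 1,
};

/*
 * Same bounds check as the tail of is_pmalloc_object(): the object
 * [ptr, ptr + n - 1] must lie inside the area's pages. The caller is
 * assumed to have already found the area for ptr and seen VM_PMALLOC.
 */
static int classify_object(uintptr_t area_start, unsigned long nr_pages,
			   uintptr_t ptr, unsigned long n)
{
	uintptr_t area_end = area_start + nr_pages * SKETCH_PAGE_SIZE - 1;
	uintptr_t object_end = ptr + n - 1;

	if (area_start <= ptr && object_end <= area_end)
		return VALID_PMALLOC_OBJECT;
	return INVALID_PMALLOC_OBJECT;
}
```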
diff --git a/mm/usercopy.c b/mm/usercopy.c
index a9852b24715d..c3b10298d808 100644
--- a/mm/usercopy.c
+++ b/mm/usercopy.c
@@ -15,6 +15,7 @@
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
 #include <linux/mm.h>
+#include <linux/pmalloc.h>
 #include <linux/slab.h>
 #include <linux/sched.h>
 #include <linux/sched/task.h>
@@ -222,6 +223,7 @@ static inline const char *check_heap_object(const void *ptr, unsigned long n,
 void __check_object_size(const void *ptr, unsigned long n, bool to_user)
 {
 	const char *err;
+	int retv;
 
 	/* Skip all tests if size is zero. */
 	if (!n)
@@ -229,12 +231,12 @@ void __check_object_size(const void *ptr, unsigned long n, bool to_user)
 
 	/* Check for invalid addresses. */
 	err = check_bogus_address(ptr, n);
-	if (err)
+	if (unlikely(err))
 		goto report;
 
 	/* Check for bad heap object. */
 	err = check_heap_object(ptr, n, to_user);
-	if (err)
+	if (unlikely(err))
 		goto report;
 
 	/* Check for bad stack object. */
@@ -257,8 +259,23 @@ void __check_object_size(const void *ptr, unsigned long n, bool to_user)
 
 	/* Check for object in kernel to avoid text exposure. */
 	err = check_kernel_text_object(ptr, n);
-	if (!err)
-		return;
+	if (unlikely(err))
+		goto report;
+
+	/* Check if object is from a pmalloc chunk.
+	 */
+	retv = is_pmalloc_object(ptr, n);
+	if (unlikely(retv)) {
+		if (unlikely(!to_user)) {
+			err = "<trying to write to pmalloc object>";
+			goto report;
+		}
+		if (retv < 0) {
+			err = "<invalid pmalloc object>";
+			goto report;
+		}
+	}
+	return;
 
 report:
 	report_usercopy(ptr, n, to_user, err);
-- 
2.16.0
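The hunk added to __check_object_size() in mm/usercopy.c allows copies from a pmalloc object to userspace, but rejects any copy from userspace into one (the memory may already be read-only), as well as any copy that does not fit inside its pmalloc area. The decision can be sketched as a pure function of the is_pmalloc_object() result and the copy direction; pmalloc_usercopy_verdict() is a hypothetical name:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Sketch of the pmalloc check in __check_object_size(): retv is the
 * is_pmalloc_object() result (-1 invalid, 0 not pmalloc, 1 valid).
 * Returns NULL when the copy may proceed, else the reported error.
 */
static const char *pmalloc_usercopy_verdict(int retv, bool to_user)
{
	if (retv) {
		if (!to_user)	/* copy_from_user() into protectable memory */
			return "<trying to write to pmalloc object>";
		if (retv < 0)	/* object extends beyond its pmalloc area */
			return "<invalid pmalloc object>";
	}
	return NULL;
}
```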

* [PATCH 4/6] Protectable Memory
@ 2018-02-04 16:47   ` Igor Stoppa
From: Igor Stoppa @ 2018-02-04 16:47 UTC
  To: linux-security-module

The MMU available in many systems running Linux can often provide R/O
protection to the memory pages it handles.

However, MMU-based protection works efficiently only when those pages
contain nothing but data that will not need further modification.

Statically allocated variables can be segregated into a dedicated
section, but this approach does not suit dynamically allocated ones.

Dynamic allocation currently provides no means of grouping variables
into memory pages that contain only data suitable for conversion to
read-only access.

The allocator provided here (pmalloc, the protectable memory allocator)
introduces the concept of pools of protectable memory.

A module can request a pool and then direct any allocation request to
the pool handle it has received.

Once all the chunks of memory associated with a specific pool are
initialized, the pool can be protected.

After this point, the pool can only be destroyed (it is up to the module
to avoid any further references to the pool's memory once destruction
has been invoked).

Destruction is mainly meant for releasing memory when a module is
unloaded.

A module can have as many pools as needed, for example to support the
protection of data that is initialized in sufficiently distinct phases.
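The pool lifecycle described above (allocate, initialize, then make read-only) can be approximated in userspace with mmap() and mprotect(), which here stand in, by analogy only, for the vmalloc() and set_memory_ro() calls used inside pmalloc. The sketch assumes Linux, where writing to a PROT_READ page raises SIGSEGV; write_faults_after_protect() is a hypothetical name:

```c
#define _DEFAULT_SOURCE		/* MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf fault_env;

/* SIGSEGV handler: jump back to the sigsetjmp() point in the probe. */
static void on_segv(int sig)
{
	(void)sig;
	siglongjmp(fault_env, 1);
}

/*
 * Initialize a page, make it read-only, then probe a further write.
 * Returns 1 if the write faulted and the data is intact, 0 otherwise,
 * -1 on setup failure.
 */
static int write_faults_after_protect(void)
{
	long page = sysconf(_SC_PAGESIZE);
	struct sigaction sa, old;
	volatile char *pool;
	volatile int faulted = 0;
	size_t len;

	if (page <= 0)
		return -1;
	len = (size_t)page;
	pool = mmap(NULL, len, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (pool == MAP_FAILED)
		return -1;

	pool[0] = 42;				/* initialization: writable */
	if (mprotect((void *)pool, len, PROT_READ) != 0) {	/* "protect" */
		munmap((void *)pool, len);
		return -1;
	}

	sa.sa_handler = on_segv;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = 0;
	sigaction(SIGSEGV, &sa, &old);

	if (sigsetjmp(fault_env, 1) == 0)
		pool[0] = 7;			/* expected to fault */
	else
		faulted = 1;

	sigaction(SIGSEGV, &old, NULL);
	faulted = faulted && pool[0] == 42;	/* reads still work */
	munmap((void *)pool, len);
	return faulted;
}
```

As with a protected pool, the only way to reclaim the page in this analogy is to release it as a whole (munmap() here, pmalloc_destroy_pool() in the patch).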

Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com>
---
 include/linux/genalloc.h |   3 +
 include/linux/pmalloc.h  | 213 ++++++++++++++++++++
 include/linux/vmalloc.h  |   1 +
 lib/genalloc.c           |  27 +++
 mm/Makefile              |   1 +
 mm/pmalloc.c             | 514 +++++++++++++++++++++++++++++++++++++++++++++++
 mm/usercopy.c            |  25 ++-
 7 files changed, 780 insertions(+), 4 deletions(-)
 create mode 100644 include/linux/pmalloc.h
 create mode 100644 mm/pmalloc.c

diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
index dcaa33e74b1c..b6c4cea9fbd8 100644
--- a/include/linux/genalloc.h
+++ b/include/linux/genalloc.h
@@ -121,6 +121,9 @@ extern unsigned long gen_pool_alloc_algo(struct gen_pool *, size_t,
 extern void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size,
 		dma_addr_t *dma);
 extern void gen_pool_free(struct gen_pool *, unsigned long, size_t);
+
+extern void gen_pool_flush_chunk(struct gen_pool *pool,
+				 struct gen_pool_chunk *chunk);
 extern void gen_pool_for_each_chunk(struct gen_pool *,
 	void (*)(struct gen_pool *, struct gen_pool_chunk *, void *), void *);
 extern size_t gen_pool_avail(struct gen_pool *);
diff --git a/include/linux/pmalloc.h b/include/linux/pmalloc.h
new file mode 100644
index 000000000000..5fa8a78be819
--- /dev/null
+++ b/include/linux/pmalloc.h
@@ -0,0 +1,213 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * pmalloc.h: Header for Protectable Memory Allocator
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.stoppa@huawei.com>
+ */
+
+#ifndef _PMALLOC_H
+#define _PMALLOC_H
+
+
+#include <linux/genalloc.h>
+#include <linux/string.h>
+
+#define PMALLOC_DEFAULT_ALLOC_ORDER (-1)
+
+/*
+ * Library for dynamic allocation of pools of memory that can be,
+ * after initialization, marked as read-only.
+ *
+ * This is intended to complement __ro_after_init, for those cases
+ * where either it is not possible to know the initialization value before
+ * init is completed, or the amount of data is variable and can be
+ * determined only at run-time.
+ *
+ * ***WARNING***
+ * The user of the API is expected to synchronize:
+ * 1) allocation,
+ * 2) writes to the allocated memory,
+ * 3) write protection of the pool,
+ * 4) freeing of the allocated memory, and
+ * 5) destruction of the pool.
+ *
+ * For a non-threaded scenario, this type of locking is not even required.
+ *
+ * Even if the library were to provide support for locking, point 2)
+ * would still depend on the user taking the lock.
+ */
+
+
+/**
+ * pmalloc_create_pool - create a new protectable memory pool
+ * @name: the name of the pool, must be unique
+ * @min_alloc_order: log2 of the minimum allocation size obtainable
+ *                   from the pool, or a negative value for the default
+ *
+ * Creates a new (empty) memory pool for allocation of protectable
+ * memory. Memory will be allocated upon request (through pmalloc).
+ *
+ * Returns a pointer to the new pool upon success, NULL otherwise.
+ */
+struct gen_pool *pmalloc_create_pool(const char *name,
+					 int min_alloc_order);
+
+
+int is_pmalloc_object(const void *ptr, const unsigned long n);
+
+/**
+ * pmalloc_prealloc - tries to allocate a memory chunk of the requested size
+ * @pool: handle to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ *
+ * Prepares a chunk of the requested size.
+ * This is intended both to minimize latency in later memory requests and
+ * to avoid sleeping during those later allocations.
+ * Memory allocated with prealloc is stored in one single chunk, as
+ * opposed to what is allocated on demand when pmalloc runs out of free
+ * space already existing in the pool and has to invoke vmalloc.
+ *
+ * Returns true if the new chunk was added to the pool, false otherwise.
+ */
+bool pmalloc_prealloc(struct gen_pool *pool, size_t size);
+
+/**
+ * pmalloc - allocate protectable memory from a pool
+ * @pool: handle to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ * @gfp: flags for page allocation
+ *
+ * Allocates memory from an unprotected pool. If the pool doesn't have
+ * enough memory, and the request did not include GFP_ATOMIC, an attempt
+ * is made to add a new chunk of memory to the pool
+ * (a multiple of PAGE_SIZE), in order to fit the new request.
+ * Otherwise, NULL is returned.
+ *
+ * Returns the pointer to the memory requested upon success,
+ * NULL otherwise (either no memory available or pool already read-only).
+ */
+void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp);
+
+
+/**
+ * pzalloc - zero-initialized version of pmalloc
+ * @pool: handle to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ * @gfp: flags for page allocation
+ *
+ * Executes pmalloc, initializing the memory requested to 0,
+ * before returning the pointer to it.
+ *
+ * Returns the pointer to the zeroed memory requested, upon success,
+ * NULL otherwise (either no memory available or pool already read-only).
+ */
+static inline void *pzalloc(struct gen_pool *pool, size_t size, gfp_t gfp)
+{
+	return pmalloc(pool, size, gfp | __GFP_ZERO);
+}
+
+/**
+ * pmalloc_array - allocates an array according to the parameters
+ * @pool: handle to the pool to be used for memory allocation
+ * @n: number of elements in the array
+ * @size: amount of memory (in bytes) requested for each element
+ * @flags: flags for page allocation
+ *
+ * Calls pmalloc for n * size bytes, unless any parameter is NULL or zero.
+ *
+ * Returns either NULL or the pmalloc result.
+ */
+static inline void *pmalloc_array(struct gen_pool *pool, size_t n,
+				  size_t size, gfp_t flags)
+{
+	if (unlikely(!(pool && n && size)))
+		return NULL;
+	return pmalloc(pool, n * size, flags);
+}
+
+/**
+ * pcalloc - allocates a 0-initialized array according to the parameters
+ * @pool: handle to the pool to be used for memory allocation
+ * @n: number of elements in the array
+ * @size: amount of memory (in bytes) requested for each element
+ * @flags: flags for page allocation
+ *
+ * Calls pmalloc_array with __GFP_ZERO added to @flags.
+ *
+ * Returns either NULL or the pmalloc result.
+ */
+static inline void *pcalloc(struct gen_pool *pool, size_t n,
+			    size_t size, gfp_t flags)
+{
+	return pmalloc_array(pool, n, size, flags | __GFP_ZERO);
+}
+
+/**
+ * pstrdup - duplicate a string, using pmalloc as allocator
+ * @pool: handle to the pool to be used for memory allocation
+ * @s: string to duplicate
+ * @gfp: flags for page allocation
+ *
+ * Generates a copy of the given string, allocating sufficient memory
+ * from the given pmalloc pool.
+ *
+ * Returns a pointer to the copy, or NULL on error.
+ */
+static inline char *pstrdup(struct gen_pool *pool, const char *s, gfp_t gfp)
+{
+	size_t len;
+	char *buf;
+
+	if (unlikely(pool == NULL || s == NULL))
+		return NULL;
+
+	len = strlen(s) + 1;
+	buf = pmalloc(pool, len, gfp);
+	if (likely(buf))
+		strncpy(buf, s, len);
+	return buf;
+}
+
+/**
+ * pmalloc_protect_pool - turn a read/write pool read-only
+ * @pool: the pool to protect
+ *
+ * Write-protects all the memory chunks assigned to the pool.
+ * This prevents any further allocation.
+ *
+ * Returns 0 upon success, -EINVAL in abnormal cases.
+ */
+int pmalloc_protect_pool(struct gen_pool *pool);
+
+/**
+ * pfree - mark previously allocated memory as unused
+ * @pool: handle to the pool the memory was allocated from
+ * @addr: the beginning of the memory area to be freed
+ *
+ * The behavior of pfree depends on whether the pool has already been
+ * protected.
+ * If the pool is not yet protected, the memory is marked as unused and
+ * will be available for further allocations.
+ * If the pool is already protected, the memory is marked as unused, but
+ * it will still be impossible to perform further allocation, because of
+ * the existing protection.
+ * The freed memory, in this case, will be truly released only when the
+ * pool is destroyed.
+ */
+static inline void pfree(struct gen_pool *pool, const void *addr)
+{
+	gen_pool_free(pool, (unsigned long)addr, 0);
+}
+
+/**
+ * pmalloc_destroy_pool - destroys a pool and all the associated memory
+ * @pool: the pool to destroy
+ *
+ * All the memory that was allocated through pmalloc in the pool will be freed.
+ *
+ * Returns 0 upon success, -EINVAL in abnormal cases.
+ */
+int pmalloc_destroy_pool(struct gen_pool *pool);
+
+#endif
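pfree() passes a size of 0 to gen_pool_free(). This relies on the earlier patch in the series, "genalloc: track beginning of allocations", which lets genalloc work out on its own where an allocation ends. The toy allocator below illustrates that idea under simplifying assumptions: one byte per allocation unit instead of packed bitmap bits, the first unit of each allocation marked as a head, and freeing done by walking forward from the head until the next non-body entry. All names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

#define UNITS 16

enum entry { ENTRY_UNUSED = 0, ENTRY_HEAD, ENTRY_BODY };

static unsigned char entries[UNITS];	/* one entry per allocation unit */

/* First-fit allocation of @n units; returns the first unit or -1. */
static int toy_alloc(size_t n)
{
	size_t start, i;

	if (n == 0 || n > UNITS)
		return -1;
	for (start = 0; start + n <= UNITS; start++) {
		for (i = 0; i < n && entries[start + i] == ENTRY_UNUSED; i++)
			;
		if (i == n) {
			entries[start] = ENTRY_HEAD;
			for (i = 1; i < n; i++)
				entries[start + i] = ENTRY_BODY;
			return (int)start;
		}
	}
	return -1;
}

/* Free without a size: clear the head, then every following body unit. */
static size_t toy_free(size_t start)
{
	size_t n = 1;

	if (start >= UNITS || entries[start] != ENTRY_HEAD)
		return 0;
	entries[start] = ENTRY_UNUSED;
	while (start + n < UNITS && entries[start + n] == ENTRY_BODY)
		entries[start + n++] = ENTRY_UNUSED;
	return n;
}
```

The head marker is what keeps two adjacent allocations apart: freeing the first one stops at the head of the second instead of releasing both.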
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 1e5d8c392f15..116d280cca53 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -20,6 +20,7 @@ struct notifier_block;		/* in notifier.h */
 #define VM_UNINITIALIZED	0x00000020	/* vm_struct is not fully initialized */
 #define VM_NO_GUARD		0x00000040      /* don't add guard page */
 #define VM_KASAN		0x00000080      /* has allocated kasan shadow memory */
+#define VM_PMALLOC		0x00000100	/* pmalloc area - see docs */
 /* bits [20..32] reserved for arch specific ioremap internals */
 
 /*
diff --git a/lib/genalloc.c b/lib/genalloc.c
index dde78307b093..7ba2ec96c360 100644
--- a/lib/genalloc.c
+++ b/lib/genalloc.c
@@ -519,6 +519,33 @@ void gen_pool_free(struct gen_pool *pool, unsigned long addr, size_t size)
 }
 EXPORT_SYMBOL(gen_pool_free);
 
+
+/**
+ * gen_pool_flush_chunk - drops all the allocations from a specific chunk
+ * @pool:	the generic memory pool
+ * @chunk:	The chunk to wipe clear.
+ *
+ * This is meant to be called only while destroying a pool. It's up to the
+ * caller to avoid races, but really, at this point the pool should have
+ * already been retired and have become unavailable for any other sort of
+ * operation.
+ */
+void gen_pool_flush_chunk(struct gen_pool *pool,
+			  struct gen_pool_chunk *chunk)
+{
+	size_t size;
+
+	if (unlikely(!(pool && chunk)))
+		return;
+
+	size = chunk->end_addr + 1 - chunk->start_addr;
+	memset(chunk->entries, 0,
+	       DIV_ROUND_UP((size >> pool->min_alloc_order) * BITS_PER_ENTRY,
+			    BITS_PER_BYTE));
+	atomic_long_set(&chunk->avail, size);
+}
+
+
 /**
  * gen_pool_for_each_chunk - call func for every chunk of generic memory pool
  * @pool:	the generic memory pool
diff --git a/mm/Makefile b/mm/Makefile
index e669f02c5a54..a6a47e1b6e66 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -65,6 +65,7 @@ obj-$(CONFIG_SPARSEMEM)	+= sparse.o
 obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o
 obj-$(CONFIG_SLOB) += slob.o
 obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o
+obj-$(CONFIG_ARCH_HAS_SET_MEMORY) += pmalloc.o
 obj-$(CONFIG_KSM) += ksm.o
 obj-$(CONFIG_PAGE_POISONING) += page_poison.o
 obj-$(CONFIG_SLAB) += slab.o
diff --git a/mm/pmalloc.c b/mm/pmalloc.c
new file mode 100644
index 000000000000..11daca252589
--- /dev/null
+++ b/mm/pmalloc.c
@@ -0,0 +1,514 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * pmalloc.c: Protectable Memory Allocator
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.stoppa@huawei.com>
+ */
+
+#include <linux/printk.h>
+#include <linux/init.h>
+#include <linux/mm.h>
+#include <linux/vmalloc.h>
+#include <linux/genalloc.h>
+#include <linux/kernel.h>
+#include <linux/log2.h>
+#include <linux/slab.h>
+#include <linux/device.h>
+#include <linux/atomic.h>
+#include <linux/rculist.h>
+#include <linux/set_memory.h>
+#include <asm/cacheflush.h>
+#include <asm/page.h>
+
+#include "pmalloc-selftest.h"
+
+/**
+ * pmalloc_data contains the data specific to a pmalloc pool,
+ * in a format compatible with the design of gen_alloc.
+ * Some of the fields are used for exposing the corresponding parameter
+ * to userspace, through sysfs.
+ */
+struct pmalloc_data {
+	struct gen_pool *pool;  /* Link back to the associated pool. */
+	bool protected;     /* Status of the pool: RO or RW. */
+	struct kobj_attribute attr_protected; /* Sysfs attribute. */
+	struct kobj_attribute attr_avail;     /* Sysfs attribute. */
+	struct kobj_attribute attr_size;      /* Sysfs attribute. */
+	struct kobj_attribute attr_chunks;    /* Sysfs attribute. */
+	struct kobject *pool_kobject;
+	struct list_head node; /* list of pools */
+};
+
+static LIST_HEAD(pmalloc_final_list);
+static LIST_HEAD(pmalloc_tmp_list);
+static struct list_head *pmalloc_list = &pmalloc_tmp_list;
+static DEFINE_MUTEX(pmalloc_mutex);
+static struct kobject *pmalloc_kobject;
+
+static ssize_t pmalloc_pool_show_protected(struct kobject *dev,
+					   struct kobj_attribute *attr,
+					   char *buf)
+{
+	struct pmalloc_data *data;
+
+	data = container_of(attr, struct pmalloc_data, attr_protected);
+	if (data->protected)
+		return sprintf(buf, "protected\n");
+	else
+		return sprintf(buf, "unprotected\n");
+}
+
+static ssize_t pmalloc_pool_show_avail(struct kobject *dev,
+				       struct kobj_attribute *attr,
+				       char *buf)
+{
+	struct pmalloc_data *data;
+
+	data = container_of(attr, struct pmalloc_data, attr_avail);
+	return sprintf(buf, "%lu\n",
+		       (unsigned long)gen_pool_avail(data->pool));
+}
+
+static ssize_t pmalloc_pool_show_size(struct kobject *dev,
+				      struct kobj_attribute *attr,
+				      char *buf)
+{
+	struct pmalloc_data *data;
+
+	data = container_of(attr, struct pmalloc_data, attr_size);
+	return sprintf(buf, "%lu\n",
+		       (unsigned long)gen_pool_size(data->pool));
+}
+
+static void pool_chunk_number(struct gen_pool *pool,
+			      struct gen_pool_chunk *chunk, void *data)
+{
+	unsigned long *counter = data;
+
+	(*counter)++;
+}
+
+static ssize_t pmalloc_pool_show_chunks(struct kobject *dev,
+					struct kobj_attribute *attr,
+					char *buf)
+{
+	struct pmalloc_data *data;
+	unsigned long chunks_num = 0;
+
+	data = container_of(attr, struct pmalloc_data, attr_chunks);
+	gen_pool_for_each_chunk(data->pool, pool_chunk_number, &chunks_num);
+	return sprintf(buf, "%lu\n", chunks_num);
+}
+
+/**
+ * Exposes the pool and its attributes through sysfs.
+ */
+static struct kobject *pmalloc_connect(struct pmalloc_data *data)
+{
+	const struct attribute *attrs[] = {
+		&data->attr_protected.attr,
+		&data->attr_avail.attr,
+		&data->attr_size.attr,
+		&data->attr_chunks.attr,
+		NULL
+	};
+	struct kobject *kobj;
+
+	kobj = kobject_create_and_add(data->pool->name, pmalloc_kobject);
+	if (unlikely(!kobj))
+		return NULL;
+
+	if (unlikely(sysfs_create_files(kobj, attrs) < 0)) {
+		kobject_put(kobj);
+		kobj = NULL;
+	}
+	return kobj;
+}
+
+/**
+ * Removes the pool and its attributes from sysfs.
+ */
+static void pmalloc_disconnect(struct pmalloc_data *data,
+			       struct kobject *kobj)
+{
+	const struct attribute *attrs[] = {
+		&data->attr_protected.attr,
+		&data->attr_avail.attr,
+		&data->attr_size.attr,
+		&data->attr_chunks.attr,
+		NULL
+	};
+
+	sysfs_remove_files(kobj, attrs);
+	kobject_put(kobj);
+}
+
+/**
+ * Declares an attribute of the pool.
+ */
+
+#define pmalloc_attr_init(data, attr_name) \
+do { \
+	sysfs_attr_init(&data->attr_##attr_name.attr); \
+	data->attr_##attr_name.attr.name = #attr_name; \
+	data->attr_##attr_name.attr.mode = VERIFY_OCTAL_PERMISSIONS(0400); \
+	data->attr_##attr_name.show = pmalloc_pool_show_##attr_name; \
+} while (0)
+
+struct gen_pool *pmalloc_create_pool(const char *name, int min_alloc_order)
+{
+	struct gen_pool *pool;
+	const char *pool_name;
+	struct pmalloc_data *data;
+
+	if (!name) {
+		WARN_ON(1);
+		return NULL;
+	}
+
+	if (min_alloc_order < 0)
+		min_alloc_order = ilog2(sizeof(unsigned long));
+
+	pool = gen_pool_create(min_alloc_order, NUMA_NO_NODE);
+	if (unlikely(!pool))
+		return NULL;
+
+	mutex_lock(&pmalloc_mutex);
+	list_for_each_entry(data, pmalloc_list, node)
+		if (!strcmp(name, data->pool->name))
+			goto same_name_err;
+
+	pool_name = kstrdup(name, GFP_KERNEL);
+	if (unlikely(!pool_name))
+		goto name_alloc_err;
+
+	data = kzalloc(sizeof(struct pmalloc_data), GFP_KERNEL);
+	if (unlikely(!data))
+		goto data_alloc_err;
+
+	data->protected = false;
+	data->pool = pool;
+	pmalloc_attr_init(data, protected);
+	pmalloc_attr_init(data, avail);
+	pmalloc_attr_init(data, size);
+	pmalloc_attr_init(data, chunks);
+	pool->data = data;
+	pool->name = pool_name;
+
+	list_add(&data->node, pmalloc_list);
+	if (pmalloc_list == &pmalloc_final_list)
+		data->pool_kobject = pmalloc_connect(data);
+	mutex_unlock(&pmalloc_mutex);
+	return pool;
+
+data_alloc_err:
+	kfree(pool_name);
+name_alloc_err:
+same_name_err:
+	mutex_unlock(&pmalloc_mutex);
+	gen_pool_destroy(pool);
+	return NULL;
+}
+
+static inline int check_alloc_params(struct gen_pool *pool, size_t req_size)
+{
+	struct pmalloc_data *data;
+	unsigned int order;
+
+	if (unlikely(!req_size || !pool))
+		return -1;
+
+	order = (unsigned int)pool->min_alloc_order;
+	data = pool->data;
+
+	if (data == NULL)
+		return -1;
+
+	if (unlikely(data->protected)) {
+		WARN_ON(1);
+		return -1;
+	}
+	return 0;
+}
+
+
+static inline bool chunk_tagging(void *chunk, bool tag)
+{
+	struct vm_struct *area;
+	struct page *page;
+
+	if (!is_vmalloc_addr(chunk))
+		return false;
+
+	page = vmalloc_to_page(chunk);
+	if (unlikely(!page))
+		return false;
+
+	area = page->area;
+	if (tag)
+		area->flags |= VM_PMALLOC;
+	else
+		area->flags &= ~VM_PMALLOC;
+	return true;
+}
+
+
+static inline bool tag_chunk(void *chunk)
+{
+	return chunk_tagging(chunk, true);
+}
+
+
+static inline bool untag_chunk(void *chunk)
+{
+	return chunk_tagging(chunk, false);
+}
+
+enum {
+	INVALID_PMALLOC_OBJECT = -1,
+	NOT_PMALLOC_OBJECT = 0,
+	VALID_PMALLOC_OBJECT = 1,
+};
+
+int is_pmalloc_object(const void *ptr, const unsigned long n)
+{
+	struct vm_struct *area;
+	struct page *page;
+	unsigned long area_start;
+	unsigned long area_end;
+	unsigned long object_start;
+	unsigned long object_end;
+
+
+	/* is_pmalloc_object gets called pretty late, so chances are high
+	 * that the object is indeed of vmalloc type
+	 */
+	if (unlikely(!is_vmalloc_addr(ptr)))
+		return NOT_PMALLOC_OBJECT;
+
+	page = vmalloc_to_page(ptr);
+	if (unlikely(!page))
+		return NOT_PMALLOC_OBJECT;
+
+	area = page->area;
+
+	if (likely(!(area->flags & VM_PMALLOC)))
+		return NOT_PMALLOC_OBJECT;
+
+	area_start = (unsigned long)area->addr;
+	area_end = area_start + area->nr_pages * PAGE_SIZE - 1;
+	object_start = (unsigned long)ptr;
+	object_end = object_start + n - 1;
+
+	if (likely((area_start <= object_start) &&
+		   (object_end <= area_end)))
+		return VALID_PMALLOC_OBJECT;
+	else
+		return INVALID_PMALLOC_OBJECT;
+}
+
+
+bool pmalloc_prealloc(struct gen_pool *pool, size_t size)
+{
+	void *chunk;
+	size_t chunk_size;
+	bool add_error;
+	unsigned int order;
+
+	if (check_alloc_params(pool, size))
+		return false;
+
+	order = (unsigned int)pool->min_alloc_order;
+
+	/* Expand pool */
+	chunk_size = roundup(size, PAGE_SIZE);
+	chunk = vmalloc(chunk_size);
+	if (unlikely(chunk == NULL))
+		return false;
+
+	/* Locking is already done inside gen_pool_add */
+	add_error = gen_pool_add(pool, (unsigned long)chunk, chunk_size,
+				 NUMA_NO_NODE);
+	if (unlikely(add_error != 0))
+		goto abort;
+
+	return true;
+abort:
+	vfree_atomic(chunk);
+	return false;
+
+}
+
+void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp)
+{
+	void *chunk;
+	size_t chunk_size;
+	bool add_error;
+	unsigned long retval;
+	unsigned int order;
+
+	if (check_alloc_params(pool, size))
+		return NULL;
+
+	order = (unsigned int)pool->min_alloc_order;
+
+retry_alloc_from_pool:
+	retval = gen_pool_alloc(pool, size);
+	if (retval)
+		goto return_allocation;
+
+	if (unlikely((gfp & __GFP_ATOMIC))) {
+		if (unlikely((gfp & __GFP_NOFAIL)))
+			goto retry_alloc_from_pool;
+		else
+			return NULL;
+	}
+
+	/* Expand pool */
+	chunk_size = roundup(size, PAGE_SIZE);
+	chunk = vmalloc(chunk_size);
+	if (unlikely(!chunk)) {
+		if (unlikely((gfp & __GFP_NOFAIL)))
+			goto retry_alloc_from_pool;
+		else
+			return NULL;
+	}
+	if (unlikely(!tag_chunk(chunk)))
+		goto free;
+
+	/* Locking is already done inside gen_pool_add */
+	add_error = gen_pool_add(pool, (unsigned long)chunk, chunk_size,
+				 NUMA_NO_NODE);
+	if (unlikely(add_error))
+		goto abort;
+
+	retval = gen_pool_alloc(pool, size);
+	if (retval) {
+return_allocation:
+		*(size_t *)retval = size;
+		if (gfp & __GFP_ZERO)
+			memset((void *)retval, 0, size);
+		return (void *)retval;
+	}
+	/* Here there is no test for __GFP_NOFAIL because, in case of
+	 * concurrent allocation, one thread might add a chunk to the
+	 * pool and this memory could be allocated by another thread,
+	 * before the first thread gets a chance to use it.
+	 * As long as vmalloc succeeds, it's ok to retry.
+	 */
+	goto retry_alloc_from_pool;
+abort:
+	untag_chunk(chunk);
+free:
+	vfree_atomic(chunk);
+	return NULL;
+}
+
+static void pmalloc_chunk_set_protection(struct gen_pool *pool,
+
+					 struct gen_pool_chunk *chunk,
+					 void *data)
+{
+	const bool *flag = data;
+	size_t chunk_size = chunk->end_addr + 1 - chunk->start_addr;
+	unsigned long pages = chunk_size / PAGE_SIZE;
+
+	BUG_ON(chunk_size & (PAGE_SIZE - 1));
+
+	if (*flag)
+		set_memory_ro(chunk->start_addr, pages);
+	else
+		set_memory_rw(chunk->start_addr, pages);
+}
+
+static int pmalloc_pool_set_protection(struct gen_pool *pool, bool protection)
+{
+	struct pmalloc_data *data;
+	struct gen_pool_chunk *chunk;
+
+	if (unlikely(!pool))
+		return -EINVAL;
+
+	data = pool->data;
+
+	if (unlikely(!data))
+		return -EINVAL;
+
+	if (unlikely(data->protected == protection)) {
+		WARN_ON(1);
+		return 0;
+	}
+
+	data->protected = protection;
+	list_for_each_entry(chunk, &(pool)->chunks, next_chunk)
+		pmalloc_chunk_set_protection(pool, chunk, &protection);
+	return 0;
+}
+
+int pmalloc_protect_pool(struct gen_pool *pool)
+{
+	return pmalloc_pool_set_protection(pool, true);
+}
+
+
+static void pmalloc_chunk_free(struct gen_pool *pool,
+			       struct gen_pool_chunk *chunk, void *data)
+{
+	untag_chunk((void *)chunk->start_addr);
+	gen_pool_flush_chunk(pool, chunk);
+	vfree_atomic((void *)chunk->start_addr);
+}
+
+
+int pmalloc_destroy_pool(struct gen_pool *pool)
+{
+	struct pmalloc_data *data;
+
+	if (unlikely(pool == NULL))
+		return -EINVAL;
+
+	data = pool->data;
+
+	if (unlikely(data == NULL))
+		return -EINVAL;
+
+	mutex_lock(&pmalloc_mutex);
+	list_del(&data->node);
+	mutex_unlock(&pmalloc_mutex);
+
+	if (likely(data->pool_kobject))
+		pmalloc_disconnect(data, data->pool_kobject);
+
+	pmalloc_pool_set_protection(pool, false);
+	gen_pool_for_each_chunk(pool, pmalloc_chunk_free, NULL);
+	gen_pool_destroy(pool);
+	kfree(data);
+	return 0;
+}
+
+/**
+ * Once sysfs is ready to accept registrations, connect all the pools
+ * created earlier, and let any further pool be connected as soon as it
+ * is created.
+ */
+static int __init pmalloc_late_init(void)
+{
+	struct pmalloc_data *data, *n;
+
+	pmalloc_kobject = kobject_create_and_add("pmalloc", kernel_kobj);
+
+	mutex_lock(&pmalloc_mutex);
+	pmalloc_list = &pmalloc_final_list;
+
+	if (likely(pmalloc_kobject != NULL)) {
+		list_for_each_entry_safe(data, n, &pmalloc_tmp_list, node) {
+			list_move(&data->node, &pmalloc_final_list);
+			data->pool_kobject = pmalloc_connect(data);
+		}
+	}
+	mutex_unlock(&pmalloc_mutex);
+	pmalloc_selftest();
+	return 0;
+}
+late_initcall(pmalloc_late_init);
diff --git a/mm/usercopy.c b/mm/usercopy.c
index a9852b24715d..c3b10298d808 100644
--- a/mm/usercopy.c
+++ b/mm/usercopy.c
@@ -15,6 +15,7 @@
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
 #include <linux/mm.h>
+#include <linux/pmalloc.h>
 #include <linux/slab.h>
 #include <linux/sched.h>
 #include <linux/sched/task.h>
@@ -222,6 +223,7 @@ static inline const char *check_heap_object(const void *ptr, unsigned long n,
 void __check_object_size(const void *ptr, unsigned long n, bool to_user)
 {
 	const char *err;
+	int retv;
 
 	/* Skip all tests if size is zero. */
 	if (!n)
@@ -229,12 +231,12 @@ void __check_object_size(const void *ptr, unsigned long n, bool to_user)
 
 	/* Check for invalid addresses. */
 	err = check_bogus_address(ptr, n);
-	if (err)
+	if (unlikely(err))
 		goto report;
 
 	/* Check for bad heap object. */
 	err = check_heap_object(ptr, n, to_user);
-	if (err)
+	if (unlikely(err))
 		goto report;
 
 	/* Check for bad stack object. */
@@ -257,8 +259,23 @@ void __check_object_size(const void *ptr, unsigned long n, bool to_user)
 
 	/* Check for object in kernel to avoid text exposure. */
 	err = check_kernel_text_object(ptr, n);
-	if (!err)
-		return;
+	if (unlikely(err))
+		goto report;
+
+	/* Check if object is from a pmalloc chunk.
+	 */
+	retv = is_pmalloc_object(ptr, n);
+	if (unlikely(retv)) {
+		if (unlikely(!to_user)) {
+			err = "<trying to write to pmalloc object>";
+			goto report;
+		}
+		if (retv < 0) {
+			err = "<invalid pmalloc object>";
+			goto report;
+		}
+	}
+	return;
 
 report:
 	report_usercopy(ptr, n, to_user, err);
-- 
2.16.0

* [PATCH 4/6] Protectable Memory
@ 2018-02-04 16:47   ` Igor Stoppa
  0 siblings, 0 replies; 146+ messages in thread
From: Igor Stoppa @ 2018-02-04 16:47 UTC (permalink / raw)
  To: jglisse, keescook, mhocko, labbott, hch, willy
  Cc: cl, linux-security-module, linux-mm, linux-kernel,
	kernel-hardening, Igor Stoppa

The MMU available in many systems running Linux can often provide R/O
protection to the memory pages it handles.

However, MMU-based protection works efficiently only when those pages
contain exclusively data that will not need further modification.

Statically allocated variables can be segregated into a dedicated
section, but this approach does not fit dynamically allocated ones.

Dynamic allocation currently provides no means of grouping variables
into memory pages that contain only data suitable for conversion to
read-only access.
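To make the page-granularity argument concrete, here is a small userspace C sketch; the helpers `share_page()` and `page_rounded()` are hypothetical names, not part of this patch. Two objects handed out back to back by a general-purpose allocator, e.g. at addresses 0x1000 and 0x1040, land in the same 4 KiB page, so write-protecting that page would freeze both. Rounding each chunk up to whole pages, as `pmalloc()` in this patch does with `roundup(size, PAGE_SIZE)`, keeps protectable data on pages of its own.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Userspace illustration (hypothetical helper names): MMU protection is
 * all-or-nothing per page, so an object can be made read-only without
 * side effects only if none of its pages also hold writable data.
 */

/* Index of the page that contains byte address addr. */
static size_t page_index(size_t addr, size_t page_size)
{
	return addr / page_size;
}

/*
 * Returns true if the byte ranges [a, a + a_len) and [b, b + b_len)
 * touch at least one common page. Both lengths must be non-zero.
 */
static bool share_page(size_t a, size_t a_len, size_t b, size_t b_len,
		       size_t page_size)
{
	size_t a_first = page_index(a, page_size);
	size_t a_last = page_index(a + a_len - 1, page_size);
	size_t b_first = page_index(b, page_size);
	size_t b_last = page_index(b + b_len - 1, page_size);

	/* Two closed intervals of page indices overlap. */
	return a_first <= b_last && b_first <= a_last;
}

/* Bytes actually consumed when a request is rounded up to whole pages. */
static size_t page_rounded(size_t size, size_t page_size)
{
	return (size + page_size - 1) / page_size * page_size;
}
```

With a 4096-byte page, `share_page(0x1000, 64, 0x1040, 64, 4096)` is true, while a page-rounded chunk at 0x1000 and an object at 0x2000 do not overlap; the cost is that a 100-byte request consumes a full 4096-byte page.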

The allocator provided here (pmalloc - protectable memory allocator)
introduces the concept of pools of protectable memory.

A module can request a pool and then direct any allocation request to
the pool handle it has received.

Once all the chunks of memory associated with a specific pool are
initialized, the pool can be protected.

After this point, the pool can only be destroyed (it is up to the module
to avoid any further reference to the pool's memory once destruction
has been invoked).

Destruction is mainly meant for releasing memory when a module is
unloaded.
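The pool lifecycle described above (create, allocate and initialize, protect, destroy) can be mimicked in userspace with mmap() and mprotect(). This is only an analogy under stated assumptions: `struct upool` and the `upool_*()` helpers are hypothetical names, a pool is a single fixed-size region rather than a growing set of vmalloc'd chunks, and mprotect(PROT_READ) stands in for set_memory_ro().

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict C modes */
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical userspace analogue of a protectable pool: one page-aligned
 * region, bump-allocated while writable, then sealed read-only.
 * This is NOT the kernel pmalloc API.
 */
struct upool {
	unsigned char *base;
	size_t size;    /* total bytes, a multiple of the page size */
	size_t used;    /* bump-allocation offset */
	bool protected;
};

static int upool_create(struct upool *p, size_t size)
{
	long ps = sysconf(_SC_PAGESIZE);
	size_t page, rounded;
	void *mem;

	if (ps <= 0 || size == 0)
		return -1;
	page = (size_t)ps;
	/* Round up so the pool never shares a page with unrelated data. */
	rounded = (size + page - 1) / page * page;
	mem = mmap(NULL, rounded, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (mem == MAP_FAILED)
		return -1;
	p->base = mem;
	p->size = rounded;
	p->used = 0;
	p->protected = false;
	return 0;
}

/* Pointer-aligned bump allocation; refused once the pool is read-only. */
static void *upool_alloc(struct upool *p, size_t size)
{
	size_t align = sizeof(void *);
	size_t off = (p->used + align - 1) / align * align;

	if (p->protected || size == 0 || off > p->size ||
	    size > p->size - off)
		return NULL;
	p->used = off + size;
	return p->base + off;
}

/* One-way transition: there is deliberately no "unprotect" helper. */
static int upool_protect(struct upool *p)
{
	if (mprotect(p->base, p->size, PROT_READ) != 0)
		return -1;
	p->protected = true;
	return 0;
}

/* Destruction releases all memory, whether protected or not. */
static int upool_destroy(struct upool *p)
{
	int ret = munmap(p->base, p->size);

	p->base = NULL;
	p->size = 0;
	p->used = 0;
	return ret;
}
```

A typical sequence is `upool_create()`, a few `upool_alloc()` calls whose results are filled in, `upool_protect()`, and finally `upool_destroy()` at teardown. Any write to the region after `upool_protect()` faults with SIGSEGV, and later allocation requests simply fail, much like the one-way transition performed by `pmalloc_protect_pool()`.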

A module can have as many pools as needed, for example to support the
protection of data that is initialized in sufficiently distinct phases.

Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com>
---
 include/linux/genalloc.h |   3 +
 include/linux/pmalloc.h  | 213 ++++++++++++++++++++
 include/linux/vmalloc.h  |   1 +
 lib/genalloc.c           |  27 +++
 mm/Makefile              |   1 +
 mm/pmalloc.c             | 514 +++++++++++++++++++++++++++++++++++++++++++++++
 mm/usercopy.c            |  25 ++-
 7 files changed, 780 insertions(+), 4 deletions(-)
 create mode 100644 include/linux/pmalloc.h
 create mode 100644 mm/pmalloc.c

diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
index dcaa33e74b1c..b6c4cea9fbd8 100644
--- a/include/linux/genalloc.h
+++ b/include/linux/genalloc.h
@@ -121,6 +121,9 @@ extern unsigned long gen_pool_alloc_algo(struct gen_pool *, size_t,
 extern void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size,
 		dma_addr_t *dma);
 extern void gen_pool_free(struct gen_pool *, unsigned long, size_t);
+
+extern void gen_pool_flush_chunk(struct gen_pool *pool,
+				 struct gen_pool_chunk *chunk);
 extern void gen_pool_for_each_chunk(struct gen_pool *,
 	void (*)(struct gen_pool *, struct gen_pool_chunk *, void *), void *);
 extern size_t gen_pool_avail(struct gen_pool *);
diff --git a/include/linux/pmalloc.h b/include/linux/pmalloc.h
new file mode 100644
index 000000000000..5fa8a78be819
--- /dev/null
+++ b/include/linux/pmalloc.h
@@ -0,0 +1,213 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * pmalloc.h: Header for Protectable Memory Allocator
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.stoppa@huawei.com>
+ */
+
+#ifndef _PMALLOC_H
+#define _PMALLOC_H
+
+
+#include <linux/genalloc.h>
+#include <linux/string.h>
+
+#define PMALLOC_DEFAULT_ALLOC_ORDER (-1)
+
+/*
+ * Library for dynamic allocation of pools of memory that can be,
+ * after initialization, marked as read-only.
+ *
+ * This is intended to complement __ro_after_init, for those cases
+ * where either it is not possible to know the initialization value before
+ * init is completed, or the amount of data is variable and can be
+ * determined only at run-time.
+ *
+ * ***WARNING***
+ * The user of the API is expected to synchronize:
+ * 1) allocation,
+ * 2) writes to the allocated memory,
+ * 3) write protection of the pool,
+ * 4) freeing of the allocated memory, and
+ * 5) destruction of the pool.
+ *
+ * For a non-threaded scenario, this type of locking is not even required.
+ *
+ * Even if the library were to provide support for locking, point 2)
+ * would still depend on the user taking the lock.
+ */
+
+
+/**
+ * pmalloc_create_pool - create a new protectable memory pool
+ * @name: the name of the pool, must be unique
+ * @min_alloc_order: log2 of the minimum allocation size obtainable
+ *                   from the pool
+ *
+ * Creates a new (empty) memory pool for allocation of protectable
+ * memory. Memory will be allocated upon request (through pmalloc).
+ *
+ * Returns a pointer to the new pool upon success, otherwise a NULL.
+ */
+struct gen_pool *pmalloc_create_pool(const char *name,
+					 int min_alloc_order);
+
+
+int is_pmalloc_object(const void *ptr, const unsigned long n);
+
+/**
+ * pmalloc_prealloc - tries to allocate a memory chunk of the requested size
+ * @pool: handler to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ *
+ * Prepares a chunk of the requested size.
+ * This is intended to both minimize latency in later memory requests and
+ * avoid sleeping during allocation.
+ * Memory allocated with prealloc is stored in one single chunk, as
+ * opposed to what is allocated on demand, when pmalloc runs out of free
+ * space already existing in the pool and has to invoke vmalloc.
+ *
+ * Returns true if the vmalloc call was successful, false otherwise.
+ */
+bool pmalloc_prealloc(struct gen_pool *pool, size_t size);
+
+/**
+ * pmalloc - allocate protectable memory from a pool
+ * @pool: handler to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ * @gfp: flags for page allocation
+ *
+ * Allocates memory from an unprotected pool. If the pool doesn't have
+ * enough memory, and the request did not include GFP_ATOMIC, an attempt
+ * is made to add a new chunk of memory to the pool
+ * (a multiple of PAGE_SIZE), in order to fit the new request.
+ * Otherwise, NULL is returned.
+ *
+ * Returns the pointer to the memory requested upon success,
+ * NULL otherwise (either no memory available or pool already read-only).
+ */
+void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp);
+
+
+/**
+ * pzalloc - zero-initialized version of pmalloc
+ * @pool: handler to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ * @gfp: flags for page allocation
+ *
+ * Executes pmalloc, initializing the memory requested to 0,
+ * before returning the pointer to it.
+ *
+ * Returns the pointer to the zeroed memory requested, upon success,
+ * NULL otherwise (either no memory available or pool already read-only).
+ */
+static inline void *pzalloc(struct gen_pool *pool, size_t size, gfp_t gfp)
+{
+	return pmalloc(pool, size, gfp | __GFP_ZERO);
+}
+
+/**
+ * pmalloc_array - allocates an array according to the parameters
+ * @pool: handler to the pool to be used for memory allocation
+ * @n: number of elements in the array
+ * @size: amount of memory (in bytes) requested for each element
+ * @flags: flags for page allocation
+ *
+ * Executes pmalloc, if it has a chance to succeed.
+ *
+ * Returns either NULL or the pmalloc result.
+ */
+static inline void *pmalloc_array(struct gen_pool *pool, size_t n,
+				  size_t size, gfp_t flags)
+{
+	if (unlikely(!(pool && n && size) || n > (size_t)-1 / size))
+		return NULL;
+	return pmalloc(pool, n * size, flags);
+}
+
+/**
+ * pcalloc - allocates a 0-initialized array according to the parameters
+ * @pool: handler to the pool to be used for memory allocation
+ * @n: number of elements in the array
+ * @size: amount of memory (in bytes) requested
+ * @flags: flags for page allocation
+ *
+ * Executes pmalloc_array, if it has a chance to succeed.
+ *
+ * Returns either NULL or the pmalloc result.
+ */
+static inline void *pcalloc(struct gen_pool *pool, size_t n,
+			    size_t size, gfp_t flags)
+{
+	return pmalloc_array(pool, n, size, flags | __GFP_ZERO);
+}
+
+/**
+ * pstrdup - duplicate a string, using pmalloc as allocator
+ * @pool: handler to the pool to be used for memory allocation
+ * @s: string to duplicate
+ * @gfp: flags for page allocation
+ *
+ * Generates a copy of the given string, allocating sufficient memory
+ * from the given pmalloc pool.
+ *
+ * Returns a pointer to the replica, NULL in case of recoverable error.
+ */
+static inline char *pstrdup(struct gen_pool *pool, const char *s, gfp_t gfp)
+{
+	size_t len;
+	char *buf;
+
+	if (unlikely(pool == NULL || s == NULL))
+		return NULL;
+
+	len = strlen(s) + 1;
+	buf = pmalloc(pool, len, gfp);
+	if (likely(buf))
+		strncpy(buf, s, len);
+	return buf;
+}
+
+/**
+ * pmalloc_protect_pool - turn a read/write pool read-only
+ * @pool: the pool to protect
+ *
+ * Write-protects all the memory chunks assigned to the pool.
+ * This prevents any further allocation.
+ *
+ * Returns 0 upon success, -EINVAL in abnormal cases.
+ */
+int pmalloc_protect_pool(struct gen_pool *pool);
+
+/**
+ * pfree - mark as unused memory that was previously in use
+ * @pool: handler to the pool to be used for memory allocation
+ * @addr: the beginning of the memory area to be freed
+ *
+ * The behavior of pfree is different, depending on the state of the
+ * protection.
+ * If the pool is not yet protected, the memory is marked as unused and
+ * will be available for further allocations.
+ * If the pool is already protected, the memory is marked as unused, but
+ * it will still be impossible to perform further allocation, because of
+ * the existing protection.
+ * The freed memory, in this case, will be truly released only when the
+ * pool is destroyed.
+ */
+static inline void pfree(struct gen_pool *pool, const void *addr)
+{
+	gen_pool_free(pool, (unsigned long)addr, 0);
+}
+
+/**
+ * pmalloc_destroy_pool - destroys a pool and all the associated memory
+ * @pool: the pool to destroy
+ *
+ * All the memory that was allocated through pmalloc in the pool will be freed.
+ *
+ * Returns 0 upon success, -EINVAL in abnormal cases.
+ */
+int pmalloc_destroy_pool(struct gen_pool *pool);
+
+#endif
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 1e5d8c392f15..116d280cca53 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -20,6 +20,7 @@ struct notifier_block;		/* in notifier.h */
 #define VM_UNINITIALIZED	0x00000020	/* vm_struct is not fully initialized */
 #define VM_NO_GUARD		0x00000040      /* don't add guard page */
 #define VM_KASAN		0x00000080      /* has allocated kasan shadow memory */
+#define VM_PMALLOC		0x00000100	/* pmalloc area - see docs */
 /* bits [20..32] reserved for arch specific ioremap internals */
 
 /*
diff --git a/lib/genalloc.c b/lib/genalloc.c
index dde78307b093..7ba2ec96c360 100644
--- a/lib/genalloc.c
+++ b/lib/genalloc.c
@@ -519,6 +519,33 @@ void gen_pool_free(struct gen_pool *pool, unsigned long addr, size_t size)
 }
 EXPORT_SYMBOL(gen_pool_free);
 
+
+/**
+ * gen_pool_flush_chunk - drops all the allocations from a specific chunk
+ * @pool:	the generic memory pool
+ * @chunk:	The chunk to wipe clear.
+ *
+ * This is meant to be called only while destroying a pool. It is up to
+ * the caller to avoid races; at this point, however, the pool should
+ * already have been retired and made unavailable for any other kind of
+ * operation.
+ */
+void gen_pool_flush_chunk(struct gen_pool *pool,
+			  struct gen_pool_chunk *chunk)
+{
+	size_t size;
+
+	if (unlikely(!(pool && chunk)))
+		return;
+
+	size = chunk->end_addr + 1 - chunk->start_addr;
+	memset(chunk->entries, 0,
+	       DIV_ROUND_UP((size >> pool->min_alloc_order) * BITS_PER_ENTRY,
+			    BITS_PER_BYTE));
+	atomic_long_set(&chunk->avail, size);
+}
+
+
 /**
  * gen_pool_for_each_chunk - call func for every chunk of generic memory pool
  * @pool:	the generic memory pool
diff --git a/mm/Makefile b/mm/Makefile
index e669f02c5a54..a6a47e1b6e66 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -65,6 +65,7 @@ obj-$(CONFIG_SPARSEMEM)	+= sparse.o
 obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o
 obj-$(CONFIG_SLOB) += slob.o
 obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o
+obj-$(CONFIG_ARCH_HAS_SET_MEMORY) += pmalloc.o
 obj-$(CONFIG_KSM) += ksm.o
 obj-$(CONFIG_PAGE_POISONING) += page_poison.o
 obj-$(CONFIG_SLAB) += slab.o
diff --git a/mm/pmalloc.c b/mm/pmalloc.c
new file mode 100644
index 000000000000..11daca252589
--- /dev/null
+++ b/mm/pmalloc.c
@@ -0,0 +1,514 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * pmalloc.c: Protectable Memory Allocator
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.stoppa@huawei.com>
+ */
+
+#include <linux/printk.h>
+#include <linux/init.h>
+#include <linux/mm.h>
+#include <linux/vmalloc.h>
+#include <linux/genalloc.h>
+#include <linux/kernel.h>
+#include <linux/log2.h>
+#include <linux/slab.h>
+#include <linux/device.h>
+#include <linux/atomic.h>
+#include <linux/rculist.h>
+#include <linux/set_memory.h>
+#include <asm/cacheflush.h>
+#include <asm/page.h>
+
+#include "pmalloc-selftest.h"
+
+/**
+ * pmalloc_data contains the data specific to a pmalloc pool,
+ * in a format compatible with the design of gen_alloc.
+ * Some of the fields are used for exposing the corresponding parameter
+ * to userspace, through sysfs.
+ */
+struct pmalloc_data {
+	struct gen_pool *pool;  /* Link back to the associated pool. */
+	bool protected;     /* Status of the pool: RO or RW. */
+	struct kobj_attribute attr_protected; /* Sysfs attribute. */
+	struct kobj_attribute attr_avail;     /* Sysfs attribute. */
+	struct kobj_attribute attr_size;      /* Sysfs attribute. */
+	struct kobj_attribute attr_chunks;    /* Sysfs attribute. */
+	struct kobject *pool_kobject;
+	struct list_head node; /* list of pools */
+};
+
+static LIST_HEAD(pmalloc_final_list);
+static LIST_HEAD(pmalloc_tmp_list);
+static struct list_head *pmalloc_list = &pmalloc_tmp_list;
+static DEFINE_MUTEX(pmalloc_mutex);
+static struct kobject *pmalloc_kobject;
+
+static ssize_t pmalloc_pool_show_protected(struct kobject *dev,
+					   struct kobj_attribute *attr,
+					   char *buf)
+{
+	struct pmalloc_data *data;
+
+	data = container_of(attr, struct pmalloc_data, attr_protected);
+	if (data->protected)
+		return sprintf(buf, "protected\n");
+	else
+		return sprintf(buf, "unprotected\n");
+}
+
+static ssize_t pmalloc_pool_show_avail(struct kobject *dev,
+				       struct kobj_attribute *attr,
+				       char *buf)
+{
+	struct pmalloc_data *data;
+
+	data = container_of(attr, struct pmalloc_data, attr_avail);
+	return sprintf(buf, "%lu\n",
+		       (unsigned long)gen_pool_avail(data->pool));
+}
+
+static ssize_t pmalloc_pool_show_size(struct kobject *dev,
+				      struct kobj_attribute *attr,
+				      char *buf)
+{
+	struct pmalloc_data *data;
+
+	data = container_of(attr, struct pmalloc_data, attr_size);
+	return sprintf(buf, "%lu\n",
+		       (unsigned long)gen_pool_size(data->pool));
+}
+
+static void pool_chunk_number(struct gen_pool *pool,
+			      struct gen_pool_chunk *chunk, void *data)
+{
+	unsigned long *counter = data;
+
+	(*counter)++;
+}
+
+static ssize_t pmalloc_pool_show_chunks(struct kobject *dev,
+					struct kobj_attribute *attr,
+					char *buf)
+{
+	struct pmalloc_data *data;
+	unsigned long chunks_num = 0;
+
+	data = container_of(attr, struct pmalloc_data, attr_chunks);
+	gen_pool_for_each_chunk(data->pool, pool_chunk_number, &chunks_num);
+	return sprintf(buf, "%lu\n", chunks_num);
+}
+
+/**
+ * Exposes the pool and its attributes through sysfs.
+ */
+static struct kobject *pmalloc_connect(struct pmalloc_data *data)
+{
+	const struct attribute *attrs[] = {
+		&data->attr_protected.attr,
+		&data->attr_avail.attr,
+		&data->attr_size.attr,
+		&data->attr_chunks.attr,
+		NULL
+	};
+	struct kobject *kobj;
+
+	kobj = kobject_create_and_add(data->pool->name, pmalloc_kobject);
+	if (unlikely(!kobj))
+		return NULL;
+
+	if (unlikely(sysfs_create_files(kobj, attrs) < 0)) {
+		kobject_put(kobj);
+		kobj = NULL;
+	}
+	return kobj;
+}
+
+/**
+ * Removes the pool and its attributes from sysfs.
+ */
+static void pmalloc_disconnect(struct pmalloc_data *data,
+			       struct kobject *kobj)
+{
+	const struct attribute *attrs[] = {
+		&data->attr_protected.attr,
+		&data->attr_avail.attr,
+		&data->attr_size.attr,
+		&data->attr_chunks.attr,
+		NULL
+	};
+
+	sysfs_remove_files(kobj, attrs);
+	kobject_put(kobj);
+}
+
+/**
+ * Declares an attribute of the pool.
+ */
+
+#define pmalloc_attr_init(data, attr_name) \
+do { \
+	sysfs_attr_init(&data->attr_##attr_name.attr); \
+	data->attr_##attr_name.attr.name = #attr_name; \
+	data->attr_##attr_name.attr.mode = VERIFY_OCTAL_PERMISSIONS(0400); \
+	data->attr_##attr_name.show = pmalloc_pool_show_##attr_name; \
+} while (0)
+
+struct gen_pool *pmalloc_create_pool(const char *name, int min_alloc_order)
+{
+	struct gen_pool *pool;
+	const char *pool_name;
+	struct pmalloc_data *data;
+
+	if (!name) {
+		WARN_ON(1);
+		return NULL;
+	}
+
+	if (min_alloc_order < 0)
+		min_alloc_order = ilog2(sizeof(unsigned long));
+
+	pool = gen_pool_create(min_alloc_order, NUMA_NO_NODE);
+	if (unlikely(!pool))
+		return NULL;
+
+	mutex_lock(&pmalloc_mutex);
+	list_for_each_entry(data, pmalloc_list, node)
+		if (!strcmp(name, data->pool->name))
+			goto same_name_err;
+
+	pool_name = kstrdup(name, GFP_KERNEL);
+	if (unlikely(!pool_name))
+		goto name_alloc_err;
+
+	data = kzalloc(sizeof(struct pmalloc_data), GFP_KERNEL);
+	if (unlikely(!data))
+		goto data_alloc_err;
+
+	data->protected = false;
+	data->pool = pool;
+	pmalloc_attr_init(data, protected);
+	pmalloc_attr_init(data, avail);
+	pmalloc_attr_init(data, size);
+	pmalloc_attr_init(data, chunks);
+	pool->data = data;
+	pool->name = pool_name;
+
+	list_add(&data->node, pmalloc_list);
+	if (pmalloc_list == &pmalloc_final_list)
+		data->pool_kobject = pmalloc_connect(data);
+	mutex_unlock(&pmalloc_mutex);
+	return pool;
+
+data_alloc_err:
+	kfree(pool_name);
+name_alloc_err:
+same_name_err:
+	mutex_unlock(&pmalloc_mutex);
+	gen_pool_destroy(pool);
+	return NULL;
+}
+
+static inline int check_alloc_params(struct gen_pool *pool, size_t req_size)
+{
+	struct pmalloc_data *data;
+	unsigned int order;
+
+	if (unlikely(!req_size || !pool))
+		return -1;
+
+	order = (unsigned int)pool->min_alloc_order;
+	data = pool->data;
+
+	if (data == NULL)
+		return -1;
+
+	if (unlikely(data->protected)) {
+		WARN_ON(1);
+		return -1;
+	}
+	return 0;
+}
+
+
+static inline bool chunk_tagging(void *chunk, bool tag)
+{
+	struct vm_struct *area;
+	struct page *page;
+
+	if (!is_vmalloc_addr(chunk))
+		return false;
+
+	page = vmalloc_to_page(chunk);
+	if (unlikely(!page))
+		return false;
+
+	area = page->area;
+	if (tag)
+		area->flags |= VM_PMALLOC;
+	else
+		area->flags &= ~VM_PMALLOC;
+	return true;
+}
+
+
+static inline bool tag_chunk(void *chunk)
+{
+	return chunk_tagging(chunk, true);
+}
+
+
+static inline bool untag_chunk(void *chunk)
+{
+	return chunk_tagging(chunk, false);
+}
+
+enum {
+	INVALID_PMALLOC_OBJECT = -1,
+	NOT_PMALLOC_OBJECT = 0,
+	VALID_PMALLOC_OBJECT = 1,
+};
+
+int is_pmalloc_object(const void *ptr, const unsigned long n)
+{
+	struct vm_struct *area;
+	struct page *page;
+	unsigned long area_start;
+	unsigned long area_end;
+	unsigned long object_start;
+	unsigned long object_end;
+
+
+	/* is_pmalloc_object gets called pretty late, so chances are high
+	 * that the object is indeed of vmalloc type
+	 */
+	if (unlikely(!is_vmalloc_addr(ptr)))
+		return NOT_PMALLOC_OBJECT;
+
+	page = vmalloc_to_page(ptr);
+	if (unlikely(!page))
+		return NOT_PMALLOC_OBJECT;
+
+	area = page->area;
+
+	if (likely(!(area->flags & VM_PMALLOC)))
+		return NOT_PMALLOC_OBJECT;
+
+	area_start = (unsigned long)area->addr;
+	area_end = area_start + area->nr_pages * PAGE_SIZE - 1;
+	object_start = (unsigned long)ptr;
+	object_end = object_start + n - 1;
+
+	if (likely((area_start <= object_start) &&
+		   (object_end <= area_end)))
+		return VALID_PMALLOC_OBJECT;
+	else
+		return INVALID_PMALLOC_OBJECT;
+}
+
+
+bool pmalloc_prealloc(struct gen_pool *pool, size_t size)
+{
+	void *chunk;
+	size_t chunk_size;
+	int add_error;
+
+	if (check_alloc_params(pool, size))
+		return false;
+
+	/* Expand pool */
+	chunk_size = roundup(size, PAGE_SIZE);
+	chunk = vmalloc(chunk_size);
+	if (unlikely(chunk == NULL))
+		return false;
+	if (unlikely(!tag_chunk(chunk)))
+		goto free;
+
+	/* Locking is already done inside gen_pool_add */
+	add_error = gen_pool_add(pool, (unsigned long)chunk, chunk_size,
+				 NUMA_NO_NODE);
+	if (unlikely(add_error != 0))
+		goto abort;
+
+	return true;
+abort:
+	untag_chunk(chunk);
+free:
+	vfree_atomic(chunk);
+	return false;
+}
+
+void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp)
+{
+	void *chunk;
+	size_t chunk_size;
+	bool add_error;
+	unsigned long retval;
+	unsigned int order;
+
+	if (check_alloc_params(pool, size))
+		return NULL;
+
+	order = (unsigned int)pool->min_alloc_order;
+
+retry_alloc_from_pool:
+	retval = gen_pool_alloc(pool, size);
+	if (retval)
+		goto return_allocation;
+
+	if (unlikely((gfp & __GFP_ATOMIC))) {
+		if (unlikely((gfp & __GFP_NOFAIL)))
+			goto retry_alloc_from_pool;
+		else
+			return NULL;
+	}
+
+	/* Expand pool */
+	chunk_size = roundup(size, PAGE_SIZE);
+	chunk = vmalloc(chunk_size);
+	if (unlikely(!chunk)) {
+		if (unlikely((gfp & __GFP_NOFAIL)))
+			goto retry_alloc_from_pool;
+		else
+			return NULL;
+	}
+	if (unlikely(!tag_chunk(chunk)))
+		goto free;
+
+	/* Locking is already done inside gen_pool_add */
+	add_error = gen_pool_add(pool, (unsigned long)chunk, chunk_size,
+				 NUMA_NO_NODE);
+	if (unlikely(add_error))
+		goto abort;
+
+	retval = gen_pool_alloc(pool, size);
+	if (retval) {
+return_allocation:
+		*(size_t *)retval = size;
+		if (gfp & __GFP_ZERO)
+			memset((void *)retval, 0, size);
+		return (void *)retval;
+	}
+	/* Here there is no test for __GFP_NOFAIL because, in case of
+	 * concurrent allocation, one thread might add a chunk to the
+	 * pool and this memory could be allocated by another thread,
+	 * before the first thread gets a chance to use it.
+	 * As long as vmalloc succeeds, it's ok to retry.
+	 */
+	goto retry_alloc_from_pool;
+abort:
+	untag_chunk(chunk);
+free:
+	vfree_atomic(chunk);
+	return NULL;
+}
+
+static void pmalloc_chunk_set_protection(struct gen_pool *pool,
+
+					 struct gen_pool_chunk *chunk,
+					 void *data)
+{
+	const bool *flag = data;
+	size_t chunk_size = chunk->end_addr + 1 - chunk->start_addr;
+	unsigned long pages = chunk_size / PAGE_SIZE;
+
+	BUG_ON(chunk_size & (PAGE_SIZE - 1));
+
+	if (*flag)
+		set_memory_ro(chunk->start_addr, pages);
+	else
+		set_memory_rw(chunk->start_addr, pages);
+}
+
+static int pmalloc_pool_set_protection(struct gen_pool *pool, bool protection)
+{
+	struct pmalloc_data *data;
+	struct gen_pool_chunk *chunk;
+
+	if (unlikely(!pool))
+		return -EINVAL;
+
+	data = pool->data;
+
+	if (unlikely(!data))
+		return -EINVAL;
+
+	if (unlikely(data->protected == protection)) {
+		WARN_ON(1);
+		return 0;
+	}
+
+	data->protected = protection;
+	list_for_each_entry(chunk, &(pool)->chunks, next_chunk)
+		pmalloc_chunk_set_protection(pool, chunk, &protection);
+	return 0;
+}
+
+int pmalloc_protect_pool(struct gen_pool *pool)
+{
+	return pmalloc_pool_set_protection(pool, true);
+}
+
+
+static void pmalloc_chunk_free(struct gen_pool *pool,
+			       struct gen_pool_chunk *chunk, void *data)
+{
+	untag_chunk((void *)chunk->start_addr);
+	gen_pool_flush_chunk(pool, chunk);
+	vfree_atomic((void *)chunk->start_addr);
+}
+
+
+int pmalloc_destroy_pool(struct gen_pool *pool)
+{
+	struct pmalloc_data *data;
+
+	if (unlikely(pool == NULL))
+		return -EINVAL;
+
+	data = pool->data;
+
+	if (unlikely(data == NULL))
+		return -EINVAL;
+
+	mutex_lock(&pmalloc_mutex);
+	list_del(&data->node);
+	mutex_unlock(&pmalloc_mutex);
+
+	if (likely(data->pool_kobject))
+		pmalloc_disconnect(data, data->pool_kobject);
+	if (data->protected)
+		pmalloc_pool_set_protection(pool, false);
+	gen_pool_for_each_chunk(pool, pmalloc_chunk_free, NULL);
+	gen_pool_destroy(pool);
+	kfree(data);
+	return 0;
+}
+
+/*
+ * When the sysfs is ready to receive registrations, connect all the
+ * pools previously created. Also enable further pools to be connected
+ * right away.
+ */
+static int __init pmalloc_late_init(void)
+{
+	struct pmalloc_data *data, *n;
+
+	pmalloc_kobject = kobject_create_and_add("pmalloc", kernel_kobj);
+
+	mutex_lock(&pmalloc_mutex);
+	pmalloc_list = &pmalloc_final_list;
+
+	if (likely(pmalloc_kobject != NULL)) {
+		list_for_each_entry_safe(data, n, &pmalloc_tmp_list, node) {
+			list_move(&data->node, &pmalloc_final_list);
+			data->pool_kobject = pmalloc_connect(data);
+		}
+	}
+	mutex_unlock(&pmalloc_mutex);
+	pmalloc_selftest();
+	return 0;
+}
+late_initcall(pmalloc_late_init);
diff --git a/mm/usercopy.c b/mm/usercopy.c
index a9852b24715d..c3b10298d808 100644
--- a/mm/usercopy.c
+++ b/mm/usercopy.c
@@ -15,6 +15,7 @@
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
 #include <linux/mm.h>
+#include <linux/pmalloc.h>
 #include <linux/slab.h>
 #include <linux/sched.h>
 #include <linux/sched/task.h>
@@ -222,6 +223,7 @@ static inline const char *check_heap_object(const void *ptr, unsigned long n,
 void __check_object_size(const void *ptr, unsigned long n, bool to_user)
 {
 	const char *err;
+	int retv;
 
 	/* Skip all tests if size is zero. */
 	if (!n)
@@ -229,12 +231,12 @@ void __check_object_size(const void *ptr, unsigned long n, bool to_user)
 
 	/* Check for invalid addresses. */
 	err = check_bogus_address(ptr, n);
-	if (err)
+	if (unlikely(err))
 		goto report;
 
 	/* Check for bad heap object. */
 	err = check_heap_object(ptr, n, to_user);
-	if (err)
+	if (unlikely(err))
 		goto report;
 
 	/* Check for bad stack object. */
@@ -257,8 +259,23 @@ void __check_object_size(const void *ptr, unsigned long n, bool to_user)
 
 	/* Check for object in kernel to avoid text exposure. */
 	err = check_kernel_text_object(ptr, n);
-	if (!err)
-		return;
+	if (unlikely(err))
+		goto report;
+
+	/* Check if object is from a pmalloc chunk.
+	 */
+	retv = is_pmalloc_object(ptr, n);
+	if (unlikely(retv)) {
+		if (unlikely(!to_user)) {
+			err = "<trying to write to pmalloc object>";
+			goto report;
+		}
+		if (retv < 0) {
+			err = "<invalid pmalloc object>";
+			goto report;
+		}
+	}
+	return;
 
 report:
 	report_usercopy(ptr, n, to_user, err);
-- 
2.16.0

* [PATCH 4/6] Protectable Memory
@ 2018-02-04 16:47   ` Igor Stoppa
  0 siblings, 0 replies; 146+ messages in thread
From: Igor Stoppa @ 2018-02-04 16:47 UTC (permalink / raw)
  To: jglisse, keescook, mhocko, labbott, hch, willy
  Cc: cl, linux-security-module, linux-mm, linux-kernel,
	kernel-hardening, Igor Stoppa

The MMU available in many systems running Linux can often provide R/O
protection to the memory pages it handles.

However, MMU-based protection works efficiently only when those pages
contain exclusively data that will not need further modification.

Statically allocated variables can be segregated into a dedicated
section (for example with __ro_after_init), but no equivalent mechanism
exists for dynamically allocated ones.

Dynamic allocation currently provides no means of grouping variables
into memory pages that contain exclusively data suitable for conversion
to read-only access.
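Because the MMU protects whole pages, any chunk meant to hold only protectable data has to be sized in whole pages; this is why pmalloc rounds each new chunk up with roundup(size, PAGE_SIZE) before calling vmalloc(). The userspace sketch below reproduces only that arithmetic. It assumes a fixed 4096-byte page, and chunk_size_for() and chunk_pages_for() are hypothetical helpers, not functions from this patch.

```c
#include <stddef.h>

/*
 * Assumption for illustration: a 4096-byte page. Kernel code uses
 * PAGE_SIZE; userspace code would query sysconf(_SC_PAGESIZE).
 */
#define SKETCH_PAGE_SIZE ((size_t)4096)

/*
 * Hypothetical helper: round a request up to whole pages, as
 * roundup(size, PAGE_SIZE) does before pmalloc expands a pool.
 */
static size_t chunk_size_for(size_t size)
{
	return (size + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE *
	       SKETCH_PAGE_SIZE;
}

/*
 * Hypothetical helper: number of pages covered by such a chunk, i.e.
 * the page count a call like set_memory_ro() would be given for it.
 */
static size_t chunk_pages_for(size_t size)
{
	return chunk_size_for(size) / SKETCH_PAGE_SIZE;
}
```

With these assumptions a 100-byte request still occupies a full 4096-byte chunk and a 4097-byte request needs two pages; that per-page cost is what makes sharing pages among many protectable allocations, via a pool, worthwhile.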

The allocator provided here (pmalloc, the protectable memory allocator)
introduces the concept of pools of protectable memory.

A module can request a pool and then direct its allocation requests to
the pool handle it has received.

Once all the chunks of memory associated with a specific pool are
initialized, the pool can be protected.

After this point, the pool can only be destroyed; it is up to the module
to avoid any further reference to the pool's memory once destruction has
been invoked.

Destruction is mainly meant for releasing memory when a module is
unloaded.

A module can have as many pools as needed, for example to support the
protection of data that is initialized in sufficiently distinct phases.
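The lifecycle described above (allocate and initialize while the pool is writable, protect it, and from then on only destroy it), together with the use of separate pools for distinct initialization phases, can be approximated in userspace with mmap() and mprotect(). This is a sketch under stated assumptions, not the pmalloc implementation: struct sketch_pool and the sketch_*() functions are hypothetical, each pool is a single fixed-size mapping instead of a growable set of vmalloc chunks, allocations are aligned to sizeof(unsigned long) to mirror pmalloc's default minimum allocation order, and mprotect() stands in for set_memory_ro().

```c
#define _DEFAULT_SOURCE
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical userspace analogue of a pmalloc pool: one page-aligned
 * mapping, bump-allocated while writable and then sealed with mprotect().
 */
struct sketch_pool {
	unsigned char *base;	/* start of the mmap()ed region */
	size_t size;		/* region size, a multiple of the page size */
	size_t used;		/* bytes handed out so far */
	bool ro;		/* true once the pool is read-only */
};

/* Map at least min_size bytes, rounded up to whole pages. */
static bool sketch_pool_create(struct sketch_pool *p, size_t min_size)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	void *mem;

	p->size = (min_size + page - 1) / page * page;
	mem = mmap(NULL, p->size, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (mem == MAP_FAILED)
		return false;
	p->base = mem;
	p->used = 0;
	p->ro = false;
	return true;
}

/* Hand out n bytes aligned to unsigned long; refuse once protected. */
static void *sketch_alloc(struct sketch_pool *p, size_t n)
{
	size_t align = sizeof(unsigned long);
	size_t start = (p->used + align - 1) & ~(align - 1);

	if (p->ro || n == 0 || start > p->size || n > p->size - start)
		return NULL;
	p->used = start + n;
	return p->base + start;
}

/* Make the whole pool read-only; nothing turns it back to read/write. */
static int sketch_pool_protect(struct sketch_pool *p)
{
	if (p->ro)
		return 0;
	if (mprotect(p->base, p->size, PROT_READ) != 0)
		return -1;
	p->ro = true;
	return 0;
}

/* Release the pool together with every allocation taken from it. */
static void sketch_pool_destroy(struct sketch_pool *p)
{
	munmap(p->base, p->size);
	p->base = NULL;
	p->size = 0;
	p->used = 0;
}
```

Using two such pools lets data from an early phase be sealed while a later phase is still filling its own pool. Once sketch_pool_protect() has succeeded, any write into that pool's memory raises SIGSEGV, the userspace counterpart of the fault the kernel takes when protected pmalloc memory is written.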

Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com>
---
 include/linux/genalloc.h |   3 +
 include/linux/pmalloc.h  | 213 ++++++++++++++++++++
 include/linux/vmalloc.h  |   1 +
 lib/genalloc.c           |  27 +++
 mm/Makefile              |   1 +
 mm/pmalloc.c             | 514 +++++++++++++++++++++++++++++++++++++++++++++++
 mm/usercopy.c            |  25 ++-
 7 files changed, 780 insertions(+), 4 deletions(-)
 create mode 100644 include/linux/pmalloc.h
 create mode 100644 mm/pmalloc.c

diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
index dcaa33e74b1c..b6c4cea9fbd8 100644
--- a/include/linux/genalloc.h
+++ b/include/linux/genalloc.h
@@ -121,6 +121,9 @@ extern unsigned long gen_pool_alloc_algo(struct gen_pool *, size_t,
 extern void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size,
 		dma_addr_t *dma);
 extern void gen_pool_free(struct gen_pool *, unsigned long, size_t);
+
+extern void gen_pool_flush_chunk(struct gen_pool *pool,
+				 struct gen_pool_chunk *chunk);
 extern void gen_pool_for_each_chunk(struct gen_pool *,
 	void (*)(struct gen_pool *, struct gen_pool_chunk *, void *), void *);
 extern size_t gen_pool_avail(struct gen_pool *);
diff --git a/include/linux/pmalloc.h b/include/linux/pmalloc.h
new file mode 100644
index 000000000000..5fa8a78be819
--- /dev/null
+++ b/include/linux/pmalloc.h
@@ -0,0 +1,213 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * pmalloc.h: Header for Protectable Memory Allocator
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.stoppa@huawei.com>
+ */
+
+#ifndef _PMALLOC_H
+#define _PMALLOC_H
+
+
+#include <linux/genalloc.h>
+#include <linux/string.h>
+
+#define PMALLOC_DEFAULT_ALLOC_ORDER (-1)
+
+/*
+ * Library for dynamic allocation of pools of memory that can be,
+ * after initialization, marked as read-only.
+ *
+ * This is intended to complement __ro_after_init, for those cases
+ * where either it is not possible to know the initialization value before
+ * init is completed, or the amount of data is variable and can be
+ * determined only at run-time.
+ *
+ * ***WARNING***
+ * The user of the API is expected to synchronize:
+ * 1) allocation,
+ * 2) writes to the allocated memory,
+ * 3) write protection of the pool,
+ * 4) freeing of the allocated memory, and
+ * 5) destruction of the pool.
+ *
+ * For a non-threaded scenario, this type of locking is not even required.
+ *
+ * Even if the library were to provide support for locking, point 2)
+ * would still depend on the user taking the lock.
+ */
+
+
+/**
+ * pmalloc_create_pool - create a new protectable memory pool
+ * @name: the name of the pool, must be unique
+ * @min_alloc_order: log2 of the minimum allocation size obtainable
+ *                   from the pool
+ *
+ * Creates a new (empty) memory pool for allocation of protectable
+ * memory. Memory will be allocated upon request (through pmalloc).
+ *
+ * Returns a pointer to the new pool upon success, otherwise NULL.
+ */
+struct gen_pool *pmalloc_create_pool(const char *name,
+					 int min_alloc_order);
+
+
+int is_pmalloc_object(const void *ptr, const unsigned long n);
+
+/**
+ * pmalloc_prealloc - tries to allocate a memory chunk of the requested size
+ * @pool: handle to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ *
+ * Prepares a chunk of the requested size.
+ * This is intended both to minimize latency in later memory requests and
+ * to avoid sleeping during allocation.
+ * Memory allocated with prealloc is stored in one single chunk, as
+ * opposed to what is allocated on demand when pmalloc runs out of free
+ * space already existing in the pool and has to invoke vmalloc.
+ *
+ * Returns true if the vmalloc call was successful, false otherwise.
+ */
+bool pmalloc_prealloc(struct gen_pool *pool, size_t size);
+
+/**
+ * pmalloc - allocate protectable memory from a pool
+ * @pool: handle to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ * @gfp: flags for page allocation
+ *
+ * Allocates memory from an unprotected pool. If the pool doesn't have
+ * enough memory, and the request did not include GFP_ATOMIC, an attempt
+ * is made to add a new chunk of memory to the pool
+ * (a multiple of PAGE_SIZE), in order to fit the new request.
+ * Otherwise, NULL is returned.
+ *
+ * Returns the pointer to the memory requested upon success,
+ * NULL otherwise (either no memory available or pool already read-only).
+ */
+void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp);
+
+
+/**
+ * pzalloc - zero-initialized version of pmalloc
+ * @pool: handle to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ * @gfp: flags for page allocation
+ *
+ * Executes pmalloc, initializing the memory requested to 0,
+ * before returning the pointer to it.
+ *
+ * Returns the pointer to the zeroed memory requested, upon success,
+ * NULL otherwise (either no memory available or pool already read-only).
+ */
+static inline void *pzalloc(struct gen_pool *pool, size_t size, gfp_t gfp)
+{
+	return pmalloc(pool, size, gfp | __GFP_ZERO);
+}
+
+/**
+ * pmalloc_array - allocates an array according to the parameters
+ * @pool: handle to the pool to be used for memory allocation
+ * @n: number of elements in the array
+ * @size: amount of memory (in bytes) requested for each element
+ * @flags: flags for page allocation
+ *
+ * Executes pmalloc, if it has a chance to succeed.
+ *
+ * Returns either NULL or the pmalloc result.
+ */
+static inline void *pmalloc_array(struct gen_pool *pool, size_t n,
+				  size_t size, gfp_t flags)
+{
+	if (unlikely(!(pool && n && size) || n > (size_t)-1 / size))
+		return NULL;
+	return pmalloc(pool, n * size, flags);
+}
+
+/**
+ * pcalloc - allocates a 0-initialized array according to the parameters
+ * @pool: handle to the pool to be used for memory allocation
+ * @n: number of elements in the array
+ * @size: amount of memory (in bytes) requested for each element
+ * @flags: flags for page allocation
+ *
+ * Executes pmalloc_array, if it has a chance to succeed.
+ *
+ * Returns either NULL or the pmalloc result.
+ */
+static inline void *pcalloc(struct gen_pool *pool, size_t n,
+			    size_t size, gfp_t flags)
+{
+	return pmalloc_array(pool, n, size, flags | __GFP_ZERO);
+}
+
+/**
+ * pstrdup - duplicate a string, using pmalloc as allocator
+ * @pool: handle to the pool to be used for memory allocation
+ * @s: string to duplicate
+ * @gfp: flags for page allocation
+ *
+ * Generates a copy of the given string, allocating sufficient memory
+ * from the given pmalloc pool.
+ *
+ * Returns a pointer to the replica, NULL in case of recoverable error.
+ */
+static inline char *pstrdup(struct gen_pool *pool, const char *s, gfp_t gfp)
+{
+	size_t len;
+	char *buf;
+
+	if (unlikely(pool == NULL || s == NULL))
+		return NULL;
+
+	len = strlen(s) + 1;
+	buf = pmalloc(pool, len, gfp);
+	if (likely(buf))
+		strncpy(buf, s, len);
+	return buf;
+}
+
+/**
+ * pmalloc_protect_pool - turn a read/write pool read-only
+ * @pool: the pool to protect
+ *
+ * Write-protects all the memory chunks assigned to the pool.
+ * This prevents any further allocation.
+ *
+ * Returns 0 upon success, -EINVAL in abnormal cases.
+ */
+int pmalloc_protect_pool(struct gen_pool *pool);
+
+/**
+ * pfree - mark previously allocated memory as unused
+ * @pool: handle to the pool the memory was allocated from
+ * @addr: the beginning of the memory area to be freed
+ *
+ * The behavior of pfree depends on the protection state of the
+ * pool.
+ * If the pool is not yet protected, the memory is marked as unused and
+ * will be available for further allocations.
+ * If the pool is already protected, the memory is marked as unused, but
+ * it will still be impossible to perform further allocation, because of
+ * the existing protection.
+ * The freed memory, in this case, will be truly released only when the
+ * pool is destroyed.
+ */
+static inline void pfree(struct gen_pool *pool, const void *addr)
+{
+	gen_pool_free(pool, (unsigned long)addr, 0);
+}
+
+/**
+ * pmalloc_destroy_pool - destroys a pool and all the associated memory
+ * @pool: the pool to destroy
+ *
+ * All the memory that was allocated through pmalloc in the pool will be freed.
+ *
+ * Returns 0 upon success, -EINVAL in abnormal cases.
+ */
+int pmalloc_destroy_pool(struct gen_pool *pool);
+
+#endif
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 1e5d8c392f15..116d280cca53 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -20,6 +20,7 @@ struct notifier_block;		/* in notifier.h */
 #define VM_UNINITIALIZED	0x00000020	/* vm_struct is not fully initialized */
 #define VM_NO_GUARD		0x00000040      /* don't add guard page */
 #define VM_KASAN		0x00000080      /* has allocated kasan shadow memory */
+#define VM_PMALLOC		0x00000100	/* pmalloc area - see docs */
 /* bits [20..32] reserved for arch specific ioremap internals */
 
 /*
diff --git a/lib/genalloc.c b/lib/genalloc.c
index dde78307b093..7ba2ec96c360 100644
--- a/lib/genalloc.c
+++ b/lib/genalloc.c
@@ -519,6 +519,33 @@ void gen_pool_free(struct gen_pool *pool, unsigned long addr, size_t size)
 }
 EXPORT_SYMBOL(gen_pool_free);
 
+
+/**
+ * gen_pool_flush_chunk - drops all the allocations from a specific chunk
+ * @pool:	the generic memory pool
+ * @chunk:	The chunk to wipe clear.
+ *
+ * This is meant to be called only while destroying a pool. It's up to the
+ * caller to avoid races, but really, at this point the pool should have
+ * already been retired and have become unavailable for any other sort of
+ * operation.
+ */
+void gen_pool_flush_chunk(struct gen_pool *pool,
+			  struct gen_pool_chunk *chunk)
+{
+	size_t size;
+
+	if (unlikely(!(pool && chunk)))
+		return;
+
+	size = chunk->end_addr + 1 - chunk->start_addr;
+	memset(chunk->entries, 0,
+	       DIV_ROUND_UP((size >> pool->min_alloc_order) * BITS_PER_ENTRY,
+			    BITS_PER_BYTE));
+	atomic_long_set(&chunk->avail, size);
+}
+
+
 /**
  * gen_pool_for_each_chunk - call func for every chunk of generic memory pool
  * @pool:	the generic memory pool
diff --git a/mm/Makefile b/mm/Makefile
index e669f02c5a54..a6a47e1b6e66 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -65,6 +65,7 @@ obj-$(CONFIG_SPARSEMEM)	+= sparse.o
 obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o
 obj-$(CONFIG_SLOB) += slob.o
 obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o
+obj-$(CONFIG_ARCH_HAS_SET_MEMORY) += pmalloc.o
 obj-$(CONFIG_KSM) += ksm.o
 obj-$(CONFIG_PAGE_POISONING) += page_poison.o
 obj-$(CONFIG_SLAB) += slab.o
diff --git a/mm/pmalloc.c b/mm/pmalloc.c
new file mode 100644
index 000000000000..11daca252589
--- /dev/null
+++ b/mm/pmalloc.c
@@ -0,0 +1,514 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * pmalloc.c: Protectable Memory Allocator
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.stoppa@huawei.com>
+ */
+
+#include <linux/printk.h>
+#include <linux/init.h>
+#include <linux/mm.h>
+#include <linux/vmalloc.h>
+#include <linux/genalloc.h>
+#include <linux/kernel.h>
+#include <linux/log2.h>
+#include <linux/slab.h>
+#include <linux/device.h>
+#include <linux/atomic.h>
+#include <linux/rculist.h>
+#include <linux/set_memory.h>
+#include <asm/cacheflush.h>
+#include <asm/page.h>
+
+#include "pmalloc-selftest.h"
+
+/*
+ * pmalloc_data contains the data specific to a pmalloc pool,
+ * in a format compatible with the design of gen_alloc.
+ * Some of the fields are used for exposing the corresponding parameter
+ * to userspace, through sysfs.
+ */
+struct pmalloc_data {
+	struct gen_pool *pool;  /* Link back to the associated pool. */
+	bool protected;     /* Status of the pool: RO or RW. */
+	struct kobj_attribute attr_protected; /* Sysfs attribute. */
+	struct kobj_attribute attr_avail;     /* Sysfs attribute. */
+	struct kobj_attribute attr_size;      /* Sysfs attribute. */
+	struct kobj_attribute attr_chunks;    /* Sysfs attribute. */
+	struct kobject *pool_kobject;
+	struct list_head node; /* list of pools */
+};
+
+static LIST_HEAD(pmalloc_final_list);
+static LIST_HEAD(pmalloc_tmp_list);
+static struct list_head *pmalloc_list = &pmalloc_tmp_list;
+static DEFINE_MUTEX(pmalloc_mutex);
+static struct kobject *pmalloc_kobject;
+
+static ssize_t pmalloc_pool_show_protected(struct kobject *dev,
+					   struct kobj_attribute *attr,
+					   char *buf)
+{
+	struct pmalloc_data *data;
+
+	data = container_of(attr, struct pmalloc_data, attr_protected);
+	if (data->protected)
+		return sprintf(buf, "protected\n");
+	else
+		return sprintf(buf, "unprotected\n");
+}
+
+static ssize_t pmalloc_pool_show_avail(struct kobject *dev,
+				       struct kobj_attribute *attr,
+				       char *buf)
+{
+	struct pmalloc_data *data;
+
+	data = container_of(attr, struct pmalloc_data, attr_avail);
+	return sprintf(buf, "%lu\n",
+		       (unsigned long)gen_pool_avail(data->pool));
+}
+
+static ssize_t pmalloc_pool_show_size(struct kobject *dev,
+				      struct kobj_attribute *attr,
+				      char *buf)
+{
+	struct pmalloc_data *data;
+
+	data = container_of(attr, struct pmalloc_data, attr_size);
+	return sprintf(buf, "%lu\n",
+		       (unsigned long)gen_pool_size(data->pool));
+}
+
+static void pool_chunk_number(struct gen_pool *pool,
+			      struct gen_pool_chunk *chunk, void *data)
+{
+	unsigned long *counter = data;
+
+	(*counter)++;
+}
+
+static ssize_t pmalloc_pool_show_chunks(struct kobject *dev,
+					struct kobj_attribute *attr,
+					char *buf)
+{
+	struct pmalloc_data *data;
+	unsigned long chunks_num = 0;
+
+	data = container_of(attr, struct pmalloc_data, attr_chunks);
+	gen_pool_for_each_chunk(data->pool, pool_chunk_number, &chunks_num);
+	return sprintf(buf, "%lu\n", chunks_num);
+}
+
+/*
+ * Exposes the pool and its attributes through sysfs.
+ */
+static struct kobject *pmalloc_connect(struct pmalloc_data *data)
+{
+	const struct attribute *attrs[] = {
+		&data->attr_protected.attr,
+		&data->attr_avail.attr,
+		&data->attr_size.attr,
+		&data->attr_chunks.attr,
+		NULL
+	};
+	struct kobject *kobj;
+
+	kobj = kobject_create_and_add(data->pool->name, pmalloc_kobject);
+	if (unlikely(!kobj))
+		return NULL;
+
+	if (unlikely(sysfs_create_files(kobj, attrs) < 0)) {
+		kobject_put(kobj);
+		kobj = NULL;
+	}
+	return kobj;
+}
+
+/*
+ * Removes the pool and its attributes from sysfs.
+ */
+static void pmalloc_disconnect(struct pmalloc_data *data,
+			       struct kobject *kobj)
+{
+	const struct attribute *attrs[] = {
+		&data->attr_protected.attr,
+		&data->attr_avail.attr,
+		&data->attr_size.attr,
+		&data->attr_chunks.attr,
+		NULL
+	};
+
+	sysfs_remove_files(kobj, attrs);
+	kobject_put(kobj);
+}
+
+/*
+ * Initializes one read-only sysfs attribute of the pool.
+ */
+
+#define pmalloc_attr_init(data, attr_name) \
+do { \
+	sysfs_attr_init(&data->attr_##attr_name.attr); \
+	data->attr_##attr_name.attr.name = #attr_name; \
+	data->attr_##attr_name.attr.mode = VERIFY_OCTAL_PERMISSIONS(0400); \
+	data->attr_##attr_name.show = pmalloc_pool_show_##attr_name; \
+} while (0)
+
+struct gen_pool *pmalloc_create_pool(const char *name, int min_alloc_order)
+{
+	struct gen_pool *pool;
+	const char *pool_name;
+	struct pmalloc_data *data;
+
+	if (!name) {
+		WARN_ON(1);
+		return NULL;
+	}
+
+	if (min_alloc_order < 0)
+		min_alloc_order = ilog2(sizeof(unsigned long));
+
+	pool = gen_pool_create(min_alloc_order, NUMA_NO_NODE);
+	if (unlikely(!pool))
+		return NULL;
+
+	mutex_lock(&pmalloc_mutex);
+	list_for_each_entry(data, pmalloc_list, node)
+		if (!strcmp(name, data->pool->name))
+			goto same_name_err;
+
+	pool_name = kstrdup(name, GFP_KERNEL);
+	if (unlikely(!pool_name))
+		goto name_alloc_err;
+
+	data = kzalloc(sizeof(struct pmalloc_data), GFP_KERNEL);
+	if (unlikely(!data))
+		goto data_alloc_err;
+
+	data->protected = false;
+	data->pool = pool;
+	pmalloc_attr_init(data, protected);
+	pmalloc_attr_init(data, avail);
+	pmalloc_attr_init(data, size);
+	pmalloc_attr_init(data, chunks);
+	pool->data = data;
+	pool->name = pool_name;
+
+	list_add(&data->node, pmalloc_list);
+	if (pmalloc_list == &pmalloc_final_list)
+		data->pool_kobject = pmalloc_connect(data);
+	mutex_unlock(&pmalloc_mutex);
+	return pool;
+
+data_alloc_err:
+	kfree(pool_name);
+name_alloc_err:
+same_name_err:
+	mutex_unlock(&pmalloc_mutex);
+	gen_pool_destroy(pool);
+	return NULL;
+}
+
+static inline int check_alloc_params(struct gen_pool *pool, size_t req_size)
+{
+	struct pmalloc_data *data;
+	unsigned int order;
+
+	if (unlikely(!req_size || !pool))
+		return -1;
+
+	order = (unsigned int)pool->min_alloc_order;
+	data = pool->data;
+
+	if (data == NULL)
+		return -1;
+
+	if (unlikely(data->protected)) {
+		WARN_ON(1);
+		return -1;
+	}
+	return 0;
+}
+
+
+static inline bool chunk_tagging(void *chunk, bool tag)
+{
+	struct vm_struct *area;
+	struct page *page;
+
+	if (!is_vmalloc_addr(chunk))
+		return false;
+
+	page = vmalloc_to_page(chunk);
+	if (unlikely(!page))
+		return false;
+
+	area = page->area;
+	if (tag)
+		area->flags |= VM_PMALLOC;
+	else
+		area->flags &= ~VM_PMALLOC;
+	return true;
+}
+
+
+static inline bool tag_chunk(void *chunk)
+{
+	return chunk_tagging(chunk, true);
+}
+
+
+static inline bool untag_chunk(void *chunk)
+{
+	return chunk_tagging(chunk, false);
+}
+
+enum {
+	INVALID_PMALLOC_OBJECT = -1,
+	NOT_PMALLOC_OBJECT = 0,
+	VALID_PMALLOC_OBJECT = 1,
+};
+
+int is_pmalloc_object(const void *ptr, const unsigned long n)
+{
+	struct vm_struct *area;
+	struct page *page;
+	unsigned long area_start;
+	unsigned long area_end;
+	unsigned long object_start;
+	unsigned long object_end;
+
+
+	/* is_pmalloc_object gets called pretty late, so chances are high
+	 * that the object is indeed of vmalloc type
+	 */
+	if (unlikely(!is_vmalloc_addr(ptr)))
+		return NOT_PMALLOC_OBJECT;
+
+	page = vmalloc_to_page(ptr);
+	if (unlikely(!page))
+		return NOT_PMALLOC_OBJECT;
+
+	area = page->area;
+
+	if (likely(!(area->flags & VM_PMALLOC)))
+		return NOT_PMALLOC_OBJECT;
+
+	area_start = (unsigned long)area->addr;
+	area_end = area_start + area->nr_pages * PAGE_SIZE - 1;
+	object_start = (unsigned long)ptr;
+	object_end = object_start + n - 1;
+
+	if (likely((area_start <= object_start) &&
+		   (object_end <= area_end)))
+		return VALID_PMALLOC_OBJECT;
+	else
+		return INVALID_PMALLOC_OBJECT;
+}
+
+
+bool pmalloc_prealloc(struct gen_pool *pool, size_t size)
+{
+	void *chunk;
+	size_t chunk_size;
+	int add_error;
+
+	if (check_alloc_params(pool, size))
+		return false;
+
+	/* Expand pool */
+	chunk_size = roundup(size, PAGE_SIZE);
+	chunk = vmalloc(chunk_size);
+	if (unlikely(chunk == NULL))
+		return false;
+	if (unlikely(!tag_chunk(chunk)))
+		goto free;
+
+	/* Locking is already done inside gen_pool_add */
+	add_error = gen_pool_add(pool, (unsigned long)chunk, chunk_size,
+				 NUMA_NO_NODE);
+	if (unlikely(add_error != 0))
+		goto abort;
+
+	return true;
+abort:
+	untag_chunk(chunk);
+free:
+	vfree_atomic(chunk);
+	return false;
+}
+
+void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp)
+{
+	void *chunk;
+	size_t chunk_size;
+	bool add_error;
+	unsigned long retval;
+	unsigned int order;
+
+	if (check_alloc_params(pool, size))
+		return NULL;
+
+	order = (unsigned int)pool->min_alloc_order;
+
+retry_alloc_from_pool:
+	retval = gen_pool_alloc(pool, size);
+	if (retval)
+		goto return_allocation;
+
+	if (unlikely((gfp & __GFP_ATOMIC))) {
+		if (unlikely((gfp & __GFP_NOFAIL)))
+			goto retry_alloc_from_pool;
+		else
+			return NULL;
+	}
+
+	/* Expand pool */
+	chunk_size = roundup(size, PAGE_SIZE);
+	chunk = vmalloc(chunk_size);
+	if (unlikely(!chunk)) {
+		if (unlikely((gfp & __GFP_NOFAIL)))
+			goto retry_alloc_from_pool;
+		else
+			return NULL;
+	}
+	if (unlikely(!tag_chunk(chunk)))
+		goto free;
+
+	/* Locking is already done inside gen_pool_add */
+	add_error = gen_pool_add(pool, (unsigned long)chunk, chunk_size,
+				 NUMA_NO_NODE);
+	if (unlikely(add_error))
+		goto abort;
+
+	retval = gen_pool_alloc(pool, size);
+	if (retval) {
+return_allocation:
+		*(size_t *)retval = size;
+		if (gfp & __GFP_ZERO)
+			memset((void *)retval, 0, size);
+		return (void *)retval;
+	}
+	/* Here there is no test for __GFP_NOFAIL because, in case of
+	 * concurrent allocation, one thread might add a chunk to the
+	 * pool and this memory could be allocated by another thread,
+	 * before the first thread gets a chance to use it.
+	 * As long as vmalloc succeeds, it's ok to retry.
+	 */
+	goto retry_alloc_from_pool;
+abort:
+	untag_chunk(chunk);
+free:
+	vfree_atomic(chunk);
+	return NULL;
+}
+
+static void pmalloc_chunk_set_protection(struct gen_pool *pool,
+
+					 struct gen_pool_chunk *chunk,
+					 void *data)
+{
+	const bool *flag = data;
+	size_t chunk_size = chunk->end_addr + 1 - chunk->start_addr;
+	unsigned long pages = chunk_size / PAGE_SIZE;
+
+	BUG_ON(chunk_size & (PAGE_SIZE - 1));
+
+	if (*flag)
+		set_memory_ro(chunk->start_addr, pages);
+	else
+		set_memory_rw(chunk->start_addr, pages);
+}
+
+static int pmalloc_pool_set_protection(struct gen_pool *pool, bool protection)
+{
+	struct pmalloc_data *data;
+	struct gen_pool_chunk *chunk;
+
+	if (unlikely(!pool))
+		return -EINVAL;
+
+	data = pool->data;
+
+	if (unlikely(!data))
+		return -EINVAL;
+
+	if (unlikely(data->protected == protection)) {
+		WARN_ON(1);
+		return 0;
+	}
+
+	data->protected = protection;
+	list_for_each_entry(chunk, &(pool)->chunks, next_chunk)
+		pmalloc_chunk_set_protection(pool, chunk, &protection);
+	return 0;
+}
+
+int pmalloc_protect_pool(struct gen_pool *pool)
+{
+	return pmalloc_pool_set_protection(pool, true);
+}
+
+
+static void pmalloc_chunk_free(struct gen_pool *pool,
+			       struct gen_pool_chunk *chunk, void *data)
+{
+	untag_chunk((void *)chunk->start_addr);
+	gen_pool_flush_chunk(pool, chunk);
+	vfree_atomic((void *)chunk->start_addr);
+}
+
+
+int pmalloc_destroy_pool(struct gen_pool *pool)
+{
+	struct pmalloc_data *data;
+
+	if (unlikely(pool == NULL))
+		return -EINVAL;
+
+	data = pool->data;
+
+	if (unlikely(data == NULL))
+		return -EINVAL;
+
+	mutex_lock(&pmalloc_mutex);
+	list_del(&data->node);
+	mutex_unlock(&pmalloc_mutex);
+
+	if (likely(data->pool_kobject))
+		pmalloc_disconnect(data, data->pool_kobject);
+
+	pmalloc_pool_set_protection(pool, false);
+	gen_pool_for_each_chunk(pool, pmalloc_chunk_free, NULL);
+	gen_pool_destroy(pool);
+	kfree(data);
+	return 0;
+}
+
+/**
+ * When the sysfs is ready to receive registrations, connect all the
+ * pools previously created. Also enable further pools to be connected
+ * right away.
+ */
+static int __init pmalloc_late_init(void)
+{
+	struct pmalloc_data *data, *n;
+
+	pmalloc_kobject = kobject_create_and_add("pmalloc", kernel_kobj);
+
+	mutex_lock(&pmalloc_mutex);
+	pmalloc_list = &pmalloc_final_list;
+
+	if (likely(pmalloc_kobject != NULL)) {
+		list_for_each_entry_safe(data, n, &pmalloc_tmp_list, node) {
+			list_move(&data->node, &pmalloc_final_list);
+			data->pool_kobject = pmalloc_connect(data);
+		}
+	}
+	mutex_unlock(&pmalloc_mutex);
+	pmalloc_selftest();
+	return 0;
+}
+late_initcall(pmalloc_late_init);
diff --git a/mm/usercopy.c b/mm/usercopy.c
index a9852b24715d..c3b10298d808 100644
--- a/mm/usercopy.c
+++ b/mm/usercopy.c
@@ -15,6 +15,7 @@
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
 #include <linux/mm.h>
+#include <linux/pmalloc.h>
 #include <linux/slab.h>
 #include <linux/sched.h>
 #include <linux/sched/task.h>
@@ -222,6 +223,7 @@ static inline const char *check_heap_object(const void *ptr, unsigned long n,
 void __check_object_size(const void *ptr, unsigned long n, bool to_user)
 {
 	const char *err;
+	int retv;
 
 	/* Skip all tests if size is zero. */
 	if (!n)
@@ -229,12 +231,12 @@ void __check_object_size(const void *ptr, unsigned long n, bool to_user)
 
 	/* Check for invalid addresses. */
 	err = check_bogus_address(ptr, n);
-	if (err)
+	if (unlikely(err))
 		goto report;
 
 	/* Check for bad heap object. */
 	err = check_heap_object(ptr, n, to_user);
-	if (err)
+	if (unlikely(err))
 		goto report;
 
 	/* Check for bad stack object. */
@@ -257,8 +259,23 @@ void __check_object_size(const void *ptr, unsigned long n, bool to_user)
 
 	/* Check for object in kernel to avoid text exposure. */
 	err = check_kernel_text_object(ptr, n);
-	if (!err)
-		return;
+	if (unlikely(err))
+		goto report;
+
+	/* Check if object is from a pmalloc chunk.
+	 */
+	retv = is_pmalloc_object(ptr, n);
+	if (unlikely(retv)) {
+		if (unlikely(!to_user)) {
+			err = "<trying to write to pmalloc object>";
+			goto report;
+		}
+		if (retv < 0) {
+			err = "<invalid pmalloc object>";
+			goto report;
+		}
+	}
+	return;
 
 report:
 	report_usercopy(ptr, n, to_user, err);
-- 
2.16.0


* [PATCH 4/6] Protectable Memory
  2018-02-03 19:42 [RFC PATCH v13 0/6] mm: security: ro protection for dynamic data Igor Stoppa
@ 2018-02-03 19:42   ` Igor Stoppa
  0 siblings, 0 replies; 146+ messages in thread
From: Igor Stoppa @ 2018-02-03 19:42 UTC (permalink / raw)
  To: jglisse, keescook, mhocko, labbott, hch, willy
  Cc: cl, linux-security-module, linux-mm, linux-kernel,
	kernel-hardening, Igor Stoppa

The MMU available in many systems running Linux can often provide R/O
protection to the memory pages it handles.

However, MMU-based protection operates at page granularity, so it is
effective only when those pages contain exclusively data that will not
need further modifications.
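The page granularity behind this constraint can be observed from userspace. The following is only an analogy, not part of pmalloc: it assumes a Linux host and uses POSIX mmap()/mprotect(); the helpers map_two_pages() and protect_range() are hypothetical. On Linux, mprotect() rejects a start address that is not page-aligned with EINVAL, so part of a page cannot be protected on its own, while a neighbouring page keeps its own permissions.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS with strict -std= modes */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map two anonymous, writable pages. */
static unsigned char *map_two_pages(size_t *page_out)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	void *base = mmap(NULL, 2 * page, PROT_READ | PROT_WRITE,
			  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (base == MAP_FAILED)
		return NULL;
	*page_out = page;
	return base;
}

/*
 * Hypothetical helper: make [addr, addr + len) read-only.
 * Returns 0 on success or -errno on failure; Linux fails with -EINVAL
 * when addr is not page-aligned, because protection is applied per page.
 */
static int protect_range(unsigned char *addr, size_t len)
{
	assert(len > 0);
	return mprotect(addr, len, PROT_READ) == 0 ? 0 : -errno;
}
```

Protecting the first page leaves the second one writable, which is why an allocator for protectable data has to hand out memory from pages that hold nothing else.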

Statically allocated variables can be segregated into a dedicated
section, but no equivalent mechanism exists for dynamically allocated
ones.

Currently, dynamic allocation provides no means of grouping variables
into memory pages that contain exclusively data suitable for conversion
to read-only access.

The allocator provided here (pmalloc, the protectable memory allocator)
introduces the concept of pools of protectable memory.

A module can request a pool and then direct any allocation request to
the pool handle it has received.

Once all the chunks of memory associated with a specific pool are
initialized, the pool can be protected.

After this point, the pool can only be destroyed; it is up to the module
to avoid any further reference to the pool's memory once destruction
has been invoked.

Destruction is mainly meant for releasing memory when a module is
unloaded.
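The pool lifecycle described above (allocate, initialize, protect, destroy) can be approximated in userspace. This is a minimal sketch under stated assumptions, not the pmalloc API: struct demo_pool and the demo_pool_*() functions are hypothetical, a single page-rounded mmap() region stands in for the vmalloc chunks that pmalloc adds with roundup(size, PAGE_SIZE), a bump pointer stands in for genalloc, and mprotect()/munmap() stand in for set_memory_ro() and vfree(). Unlike pmalloc, the sketch does not grow the pool when it runs out of space.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS with strict -std= modes */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical analogue of a pmalloc pool, limited to a single chunk. */
struct demo_pool {
	unsigned char *base; /* start of the page-aligned chunk */
	size_t size;         /* chunk size, a whole number of pages */
	size_t used;         /* bytes handed out so far */
	bool protected;      /* set once the chunk is read-only */
};

/* Create the pool, rounding the request up to whole pages. */
static bool demo_pool_init(struct demo_pool *pool, size_t size)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	void *base;

	pool->size = (size + page - 1) / page * page;
	pool->used = 0;
	pool->protected = false;
	base = mmap(NULL, pool->size, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	pool->base = (base == MAP_FAILED) ? NULL : base;
	return pool->base != NULL;
}

/* Bump allocation with 8-byte alignment; refused once protected. */
static void *demo_pool_alloc(struct demo_pool *pool, size_t size)
{
	size_t aligned;
	void *p;

	assert(pool != NULL);
	if (pool->protected || size == 0 || size > pool->size - pool->used)
		return NULL;
	aligned = (size + 7) & ~(size_t)7;
	if (aligned > pool->size - pool->used)
		return NULL;
	p = pool->base + pool->used;
	pool->used += aligned;
	return p;
}

/* Counterpart of pstrdup(): copy a string into the pool. */
static char *demo_pool_strdup(struct demo_pool *pool, const char *s)
{
	size_t len = strlen(s) + 1;
	char *buf = demo_pool_alloc(pool, len);

	if (buf)
		memcpy(buf, s, len);
	return buf;
}

/* Make the whole chunk read-only; the sketch offers no way back. */
static int demo_pool_protect(struct demo_pool *pool)
{
	if (mprotect(pool->base, pool->size, PROT_READ) != 0)
		return -1;
	pool->protected = true;
	return 0;
}

/* Release the chunk; munmap() does not need write permission. */
static void demo_pool_destroy(struct demo_pool *pool)
{
	if (pool->base)
		munmap(pool->base, pool->size);
	pool->base = NULL;
	pool->size = pool->used = 0;
}
```

After demo_pool_protect() succeeds, the copied string stays readable, any write to it faults, and further allocations are refused, mirroring how pmalloc refuses to allocate from a protected pool. The read-only region can still be released, which corresponds to the destroy-on-unload case.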

A module can have as many pools as needed, for example to support the
protection of data that is initialized in sufficiently distinct phases.

Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com>
---
 include/linux/genalloc.h |   3 +
 include/linux/pmalloc.h  | 211 +++++++++++++++++++
 include/linux/vmalloc.h  |   1 +
 lib/genalloc.c           |  27 +++
 mm/Makefile              |   1 +
 mm/pmalloc.c             | 514 +++++++++++++++++++++++++++++++++++++++++++++++
 mm/usercopy.c            |  25 ++-
 7 files changed, 778 insertions(+), 4 deletions(-)
 create mode 100644 include/linux/pmalloc.h
 create mode 100644 mm/pmalloc.c

diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
index dcaa33e..b6c4cea 100644
--- a/include/linux/genalloc.h
+++ b/include/linux/genalloc.h
@@ -121,6 +121,9 @@ extern unsigned long gen_pool_alloc_algo(struct gen_pool *, size_t,
 extern void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size,
 		dma_addr_t *dma);
 extern void gen_pool_free(struct gen_pool *, unsigned long, size_t);
+
+extern void gen_pool_flush_chunk(struct gen_pool *pool,
+				 struct gen_pool_chunk *chunk);
 extern void gen_pool_for_each_chunk(struct gen_pool *,
 	void (*)(struct gen_pool *, struct gen_pool_chunk *, void *), void *);
 extern size_t gen_pool_avail(struct gen_pool *);
diff --git a/include/linux/pmalloc.h b/include/linux/pmalloc.h
new file mode 100644
index 0000000..6d4a24e
--- /dev/null
+++ b/include/linux/pmalloc.h
@@ -0,0 +1,211 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * pmalloc.h: Header for Protectable Memory Allocator
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.stoppa@huawei.com>
+ */
+
+#ifndef _PMALLOC_H
+#define _PMALLOC_H
+
+
+#include <linux/genalloc.h>
+#include <linux/string.h>
+
+#define PMALLOC_DEFAULT_ALLOC_ORDER (-1)
+
+/*
+ * Library for dynamic allocation of pools of memory that can be,
+ * after initialization, marked as read-only.
+ *
+ * This is intended to complement __ro_after_init, for those cases
+ * where either it is not possible to know the initialization value before
+ * init is completed, or the amount of data is variable and can be
+ * determined only at run-time.
+ *
+ * ***WARNING***
+ * The user of the API is expected to synchronize:
+ * 1) allocation,
+ * 2) writes to the allocated memory,
+ * 3) write protection of the pool,
+ * 4) freeing of the allocated memory, and
+ * 5) destruction of the pool.
+ *
+ * For a non-threaded scenario, this type of locking is not even required.
+ *
+ * Even if the library were to provide support for locking, point 2)
+ * would still depend on the user taking the lock.
+ */
+
+
+/**
+ * pmalloc_create_pool - create a new protectable memory pool
+ * @name: the name of the pool, must be unique
+ * @min_alloc_order: log2 of the minimum allocation size obtainable
+ *                   from the pool
+ *
+ * Creates a new (empty) memory pool for allocation of protectable
+ * memory. Memory will be allocated upon request (through pmalloc).
+ *
+ * Returns a pointer to the new pool upon success, otherwise a NULL.
+ */
+struct gen_pool *pmalloc_create_pool(const char *name,
+					 int min_alloc_order);
+
+
+int is_pmalloc_object(const void *ptr, const unsigned long n);
+
+/**
+ * pmalloc_prealloc - tries to allocate a memory chunk of the requested size
+ * @pool: handler to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ *
+ * Prepares a chunk of the requested size.
+ * This is intended to both minimize latency in later memory requests and
+ * avoid sleeping during allocation.
+ * Memory allocated with prealloc is stored in one single chunk, as
+ * opposed to what is allocated on demand when pmalloc runs out of free
+ * space already existing in the pool and has to invoke vmalloc.
+ *
+ * Returns true if the vmalloc call was successful, false otherwise.
+ */
+bool pmalloc_prealloc(struct gen_pool *pool, size_t size);
+
+/**
+ * pmalloc - allocate protectable memory from a pool
+ * @pool: handler to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ * @gfp: flags for page allocation
+ *
+ * Allocates memory from an unprotected pool. If the pool doesn't have
+ * enough memory, and the request did not include GFP_ATOMIC, an attempt
+ * is made to add a new chunk of memory to the pool
+ * (a multiple of PAGE_SIZE), in order to fit the new request.
+ * Otherwise, NULL is returned.
+ *
+ * Returns the pointer to the memory requested upon success,
+ * NULL otherwise (either no memory available or pool already read-only).
+ */
+void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp);
+
+
+/**
+ * pzalloc - zero-initialized version of pmalloc
+ * @pool: handler to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ * @gfp: flags for page allocation
+ *
+ * Executes pmalloc, initializing the memory requested to 0,
+ * before returning the pointer to it.
+ *
+ * Returns the pointer to the zeroed memory requested, upon success,
+ * NULL otherwise (either no memory available or pool already read-only).
+ */
+static inline void *pzalloc(struct gen_pool *pool, size_t size, gfp_t gfp)
+{
+	return pmalloc(pool, size, gfp | __GFP_ZERO);
+}
+
+/**
+ * pmalloc_array - allocates an array according to the parameters
+ * @pool: handler to the pool to be used for memory allocation
+ * @n: number of elements in the array
+ * @size: size (in bytes) of each element
+ * @flags: flags for page allocation
+ *
+ * Executes pmalloc for n * size bytes, if it has a chance to succeed.
+ * Returns either NULL or the pmalloc result.
+ */
+static inline void *pmalloc_array(struct gen_pool *pool, size_t n,
+				  size_t size, gfp_t flags)
+{
+	if (unlikely(!(pool && n && size)))
+		return NULL;
+	return pmalloc(pool, n * size, flags);
+}
+
+/**
+ * pcalloc - allocates a 0-initialized array according to the parameters
+ * @pool: handler to the pool to be used for memory allocation
+ * @n: number of elements in the array
+ * @size: size (in bytes) of each element
+ * @flags: flags for page allocation
+ *
+ * Executes pmalloc_array with __GFP_ZERO, if it has a chance to succeed.
+ * Returns either NULL or the zeroed pmalloc result.
+ */
+static inline void *pcalloc(struct gen_pool *pool, size_t n,
+			    size_t size, gfp_t flags)
+{
+	return pmalloc_array(pool, n, size, flags | __GFP_ZERO);
+}
+
+/**
+ * pstrdup - duplicate a string, using pmalloc as allocator
+ * @pool: handler to the pool to be used for memory allocation
+ * @s: string to duplicate
+ * @gfp: flags for page allocation
+ *
+ * Generates a copy of the given string, allocating sufficient memory
+ * from the given pmalloc pool.
+ *
+ * Returns a pointer to the replica, NULL in case of recoverable error.
+ */
+static inline char *pstrdup(struct gen_pool *pool, const char *s, gfp_t gfp)
+{
+	size_t len;
+	char *buf;
+
+	if (unlikely(pool == NULL || s == NULL))
+		return NULL;
+
+	len = strlen(s) + 1;
+	buf = pmalloc(pool, len, gfp);
+	if (likely(buf))
+		strncpy(buf, s, len);
+	return buf;
+}
+
+/**
+ * pmalloc_protect_pool - turn a read/write pool read-only
+ * @pool: the pool to protect
+ *
+ * Write-protects all the memory chunks assigned to the pool.
+ * This prevents any further allocation.
+ *
+ * Returns 0 upon success, -EINVAL in abnormal cases.
+ */
+int pmalloc_protect_pool(struct gen_pool *pool);
+
+/**
+ * pfree - mark as unused memory that was previously in use
+ * @pool: handler to the pool to be used for memory allocation
+ * @addr: the beginning of the memory area to be freed
+ *
+ * The behavior of pfree is different, depending on the state of the
+ * protection.
+ * If the pool is not yet protected, the memory is marked as unused and
+ * will be available for further allocations.
+ * If the pool is already protected, the memory is marked as unused, but
+ * it will still be impossible to perform further allocation, because of
+ * the existing protection.
+ * The freed memory, in this case, will be truly released only when the
+ * pool is destroyed.
+ */
+static inline void pfree(struct gen_pool *pool, const void *addr)
+{
+	gen_pool_free(pool, (unsigned long)addr, 0);
+}
+
+/**
+ * pmalloc_destroy_pool - destroys a pool and all the associated memory
+ * @pool: the pool to destroy
+ *
+ * All the memory that was allocated through pmalloc in the pool will be freed.
+ *
+ * Returns 0 upon success, -EINVAL in abnormal cases.
+ */
+int pmalloc_destroy_pool(struct gen_pool *pool);
+
+#endif
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 1e5d8c3..116d280 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -20,6 +20,7 @@ struct notifier_block;		/* in notifier.h */
 #define VM_UNINITIALIZED	0x00000020	/* vm_struct is not fully initialized */
 #define VM_NO_GUARD		0x00000040      /* don't add guard page */
 #define VM_KASAN		0x00000080      /* has allocated kasan shadow memory */
+#define VM_PMALLOC		0x00000100	/* pmalloc area - see docs */
 /* bits [20..32] reserved for arch specific ioremap internals */
 
 /*
diff --git a/lib/genalloc.c b/lib/genalloc.c
index dde7830..7ba2ec9 100644
--- a/lib/genalloc.c
+++ b/lib/genalloc.c
@@ -519,6 +519,33 @@ void gen_pool_free(struct gen_pool *pool, unsigned long addr, size_t size)
 }
 EXPORT_SYMBOL(gen_pool_free);
 
+
+/**
+ * gen_pool_flush_chunk - drops all the allocations from a specific chunk
+ * @pool:	the generic memory pool
+ * @chunk:	The chunk to wipe clear.
+ *
+ * This is meant to be called only while destroying a pool. It is up to
+ * the caller to avoid races; by this point, the pool should already
+ * have been retired and made unavailable for any other kind of
+ * operation.
+ */
+void gen_pool_flush_chunk(struct gen_pool *pool,
+			  struct gen_pool_chunk *chunk)
+{
+	size_t size;
+
+	if (unlikely(!(pool && chunk)))
+		return;
+
+	size = chunk->end_addr + 1 - chunk->start_addr;
+	memset(chunk->entries, 0,
+	       DIV_ROUND_UP((size >> pool->min_alloc_order) * BITS_PER_ENTRY,
+			    BITS_PER_BYTE));
+	atomic_long_set(&chunk->avail, size);
+}
+
+
 /**
  * gen_pool_for_each_chunk - call func for every chunk of generic memory pool
  * @pool:	the generic memory pool
diff --git a/mm/Makefile b/mm/Makefile
index e669f02..a6a47e1 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -65,6 +65,7 @@ obj-$(CONFIG_SPARSEMEM)	+= sparse.o
 obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o
 obj-$(CONFIG_SLOB) += slob.o
 obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o
+obj-$(CONFIG_ARCH_HAS_SET_MEMORY) += pmalloc.o
 obj-$(CONFIG_KSM) += ksm.o
 obj-$(CONFIG_PAGE_POISONING) += page_poison.o
 obj-$(CONFIG_SLAB) += slab.o
diff --git a/mm/pmalloc.c b/mm/pmalloc.c
new file mode 100644
index 0000000..11daca2
--- /dev/null
+++ b/mm/pmalloc.c
@@ -0,0 +1,514 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * pmalloc.c: Protectable Memory Allocator
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.stoppa@huawei.com>
+ */
+
+#include <linux/printk.h>
+#include <linux/init.h>
+#include <linux/mm.h>
+#include <linux/vmalloc.h>
+#include <linux/genalloc.h>
+#include <linux/kernel.h>
+#include <linux/log2.h>
+#include <linux/slab.h>
+#include <linux/device.h>
+#include <linux/atomic.h>
+#include <linux/rculist.h>
+#include <linux/set_memory.h>
+#include <asm/cacheflush.h>
+#include <asm/page.h>
+
+#include "pmalloc-selftest.h"
+
+/**
+ * pmalloc_data contains the data specific to a pmalloc pool,
+ * in a format compatible with the design of gen_alloc.
+ * Some of the fields are used for exposing the corresponding parameter
+ * to userspace, through sysfs.
+ */
+struct pmalloc_data {
+	struct gen_pool *pool;  /* Link back to the associated pool. */
+	bool protected;     /* Status of the pool: RO or RW. */
+	struct kobj_attribute attr_protected; /* Sysfs attribute. */
+	struct kobj_attribute attr_avail;     /* Sysfs attribute. */
+	struct kobj_attribute attr_size;      /* Sysfs attribute. */
+	struct kobj_attribute attr_chunks;    /* Sysfs attribute. */
+	struct kobject *pool_kobject;
+	struct list_head node; /* list of pools */
+};
+
+static LIST_HEAD(pmalloc_final_list);
+static LIST_HEAD(pmalloc_tmp_list);
+static struct list_head *pmalloc_list = &pmalloc_tmp_list;
+static DEFINE_MUTEX(pmalloc_mutex);
+static struct kobject *pmalloc_kobject;
+
+static ssize_t pmalloc_pool_show_protected(struct kobject *dev,
+					   struct kobj_attribute *attr,
+					   char *buf)
+{
+	struct pmalloc_data *data;
+
+	data = container_of(attr, struct pmalloc_data, attr_protected);
+	if (data->protected)
+		return sprintf(buf, "protected\n");
+	else
+		return sprintf(buf, "unprotected\n");
+}
+
+static ssize_t pmalloc_pool_show_avail(struct kobject *dev,
+				       struct kobj_attribute *attr,
+				       char *buf)
+{
+	struct pmalloc_data *data;
+
+	data = container_of(attr, struct pmalloc_data, attr_avail);
+	return sprintf(buf, "%lu\n",
+		       (unsigned long)gen_pool_avail(data->pool));
+}
+
+static ssize_t pmalloc_pool_show_size(struct kobject *dev,
+				      struct kobj_attribute *attr,
+				      char *buf)
+{
+	struct pmalloc_data *data;
+
+	data = container_of(attr, struct pmalloc_data, attr_size);
+	return sprintf(buf, "%lu\n",
+		       (unsigned long)gen_pool_size(data->pool));
+}
+
+static void pool_chunk_number(struct gen_pool *pool,
+			      struct gen_pool_chunk *chunk, void *data)
+{
+	unsigned long *counter = data;
+
+	(*counter)++;
+}
+
+static ssize_t pmalloc_pool_show_chunks(struct kobject *dev,
+					struct kobj_attribute *attr,
+					char *buf)
+{
+	struct pmalloc_data *data;
+	unsigned long chunks_num = 0;
+
+	data = container_of(attr, struct pmalloc_data, attr_chunks);
+	gen_pool_for_each_chunk(data->pool, pool_chunk_number, &chunks_num);
+	return sprintf(buf, "%lu\n", chunks_num);
+}
+
+/**
+ * Exposes the pool and its attributes through sysfs.
+ */
+static struct kobject *pmalloc_connect(struct pmalloc_data *data)
+{
+	const struct attribute *attrs[] = {
+		&data->attr_protected.attr,
+		&data->attr_avail.attr,
+		&data->attr_size.attr,
+		&data->attr_chunks.attr,
+		NULL
+	};
+	struct kobject *kobj;
+
+	kobj = kobject_create_and_add(data->pool->name, pmalloc_kobject);
+	if (unlikely(!kobj))
+		return NULL;
+
+	if (unlikely(sysfs_create_files(kobj, attrs) < 0)) {
+		kobject_put(kobj);
+		kobj = NULL;
+	}
+	return kobj;
+}
+
+/**
+ * Removes the pool and its attributes from sysfs.
+ */
+static void pmalloc_disconnect(struct pmalloc_data *data,
+			       struct kobject *kobj)
+{
+	const struct attribute *attrs[] = {
+		&data->attr_protected.attr,
+		&data->attr_avail.attr,
+		&data->attr_size.attr,
+		&data->attr_chunks.attr,
+		NULL
+	};
+
+	sysfs_remove_files(kobj, attrs);
+	kobject_put(kobj);
+}
+
+/**
+ * Declares an attribute of the pool.
+ */
+
+#define pmalloc_attr_init(data, attr_name) \
+do { \
+	sysfs_attr_init(&data->attr_##attr_name.attr); \
+	data->attr_##attr_name.attr.name = #attr_name; \
+	data->attr_##attr_name.attr.mode = VERIFY_OCTAL_PERMISSIONS(0400); \
+	data->attr_##attr_name.show = pmalloc_pool_show_##attr_name; \
+} while (0)
+
+struct gen_pool *pmalloc_create_pool(const char *name, int min_alloc_order)
+{
+	struct gen_pool *pool;
+	const char *pool_name;
+	struct pmalloc_data *data;
+
+	if (!name) {
+		WARN_ON(1);
+		return NULL;
+	}
+
+	if (min_alloc_order < 0)
+		min_alloc_order = ilog2(sizeof(unsigned long));
+
+	pool = gen_pool_create(min_alloc_order, NUMA_NO_NODE);
+	if (unlikely(!pool))
+		return NULL;
+
+	mutex_lock(&pmalloc_mutex);
+	list_for_each_entry(data, pmalloc_list, node)
+		if (!strcmp(name, data->pool->name))
+			goto same_name_err;
+
+	pool_name = kstrdup(name, GFP_KERNEL);
+	if (unlikely(!pool_name))
+		goto name_alloc_err;
+
+	data = kzalloc(sizeof(struct pmalloc_data), GFP_KERNEL);
+	if (unlikely(!data))
+		goto data_alloc_err;
+
+	data->protected = false;
+	data->pool = pool;
+	pmalloc_attr_init(data, protected);
+	pmalloc_attr_init(data, avail);
+	pmalloc_attr_init(data, size);
+	pmalloc_attr_init(data, chunks);
+	pool->data = data;
+	pool->name = pool_name;
+
+	list_add(&data->node, pmalloc_list);
+	if (pmalloc_list == &pmalloc_final_list)
+		data->pool_kobject = pmalloc_connect(data);
+	mutex_unlock(&pmalloc_mutex);
+	return pool;
+
+data_alloc_err:
+	kfree(pool_name);
+name_alloc_err:
+same_name_err:
+	mutex_unlock(&pmalloc_mutex);
+	gen_pool_destroy(pool);
+	return NULL;
+}
+
+static inline int check_alloc_params(struct gen_pool *pool, size_t req_size)
+{
+	struct pmalloc_data *data;
+	unsigned int order;
+
+	if (unlikely(!req_size || !pool))
+		return -1;
+
+	order = (unsigned int)pool->min_alloc_order;
+	data = pool->data;
+
+	if (data == NULL)
+		return -1;
+
+	if (unlikely(data->protected)) {
+		WARN_ON(1);
+		return -1;
+	}
+	return 0;
+}
+
+
+static inline bool chunk_tagging(void *chunk, bool tag)
+{
+	struct vm_struct *area;
+	struct page *page;
+
+	if (!is_vmalloc_addr(chunk))
+		return false;
+
+	page = vmalloc_to_page(chunk);
+	if (unlikely(!page))
+		return false;
+
+	area = page->area;
+	if (tag)
+		area->flags |= VM_PMALLOC;
+	else
+		area->flags &= ~VM_PMALLOC;
+	return true;
+}
+
+
+static inline bool tag_chunk(void *chunk)
+{
+	return chunk_tagging(chunk, true);
+}
+
+
+static inline bool untag_chunk(void *chunk)
+{
+	return chunk_tagging(chunk, false);
+}
+
+enum {
+	INVALID_PMALLOC_OBJECT = -1,
+	NOT_PMALLOC_OBJECT = 0,
+	VALID_PMALLOC_OBJECT = 1,
+};
+
+int is_pmalloc_object(const void *ptr, const unsigned long n)
+{
+	struct vm_struct *area;
+	struct page *page;
+	unsigned long area_start;
+	unsigned long area_end;
+	unsigned long object_start;
+	unsigned long object_end;
+
+
+	/* is_pmalloc_object gets called pretty late, so chances are high
+	 * that the object is indeed of vmalloc type
+	 */
+	if (unlikely(!is_vmalloc_addr(ptr)))
+		return NOT_PMALLOC_OBJECT;
+
+	page = vmalloc_to_page(ptr);
+	if (unlikely(!page))
+		return NOT_PMALLOC_OBJECT;
+
+	area = page->area;
+
+	if (likely(!(area->flags & VM_PMALLOC)))
+		return NOT_PMALLOC_OBJECT;
+
+	area_start = (unsigned long)area->addr;
+	area_end = area_start + area->nr_pages * PAGE_SIZE - 1;
+	object_start = (unsigned long)ptr;
+	object_end = object_start + n - 1;
+
+	if (likely((area_start <= object_start) &&
+		   (object_end <= area_end)))
+		return VALID_PMALLOC_OBJECT;
+	else
+		return INVALID_PMALLOC_OBJECT;
+}
+
+
+bool pmalloc_prealloc(struct gen_pool *pool, size_t size)
+{
+	void *chunk;
+	size_t chunk_size;
+	bool add_error;
+	unsigned int order;
+
+	if (check_alloc_params(pool, size))
+		return false;
+
+	order = (unsigned int)pool->min_alloc_order;
+
+	/* Expand pool */
+	chunk_size = roundup(size, PAGE_SIZE);
+	chunk = vmalloc(chunk_size);
+	if (unlikely(chunk == NULL))
+		return false;
+
+	/* Locking is already done inside gen_pool_add */
+	add_error = gen_pool_add(pool, (unsigned long)chunk, chunk_size,
+				 NUMA_NO_NODE);
+	if (unlikely(add_error != 0))
+		goto abort;
+
+	return true;
+abort:
+	vfree_atomic(chunk);
+	return false;
+
+}
+
+void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp)
+{
+	void *chunk;
+	size_t chunk_size;
+	bool add_error;
+	unsigned long retval;
+	unsigned int order;
+
+	if (check_alloc_params(pool, size))
+		return NULL;
+
+	order = (unsigned int)pool->min_alloc_order;
+
+retry_alloc_from_pool:
+	retval = gen_pool_alloc(pool, size);
+	if (retval)
+		goto return_allocation;
+
+	if (unlikely((gfp & __GFP_ATOMIC))) {
+		if (unlikely((gfp & __GFP_NOFAIL)))
+			goto retry_alloc_from_pool;
+		else
+			return NULL;
+	}
+
+	/* Expand pool */
+	chunk_size = roundup(size, PAGE_SIZE);
+	chunk = vmalloc(chunk_size);
+	if (unlikely(!chunk)) {
+		if (unlikely((gfp & __GFP_NOFAIL)))
+			goto retry_alloc_from_pool;
+		else
+			return NULL;
+	}
+	if (unlikely(!tag_chunk(chunk)))
+		goto free;
+
+	/* Locking is already done inside gen_pool_add */
+	add_error = gen_pool_add(pool, (unsigned long)chunk, chunk_size,
+				 NUMA_NO_NODE);
+	if (unlikely(add_error))
+		goto abort;
+
+	retval = gen_pool_alloc(pool, size);
+	if (retval) {
+return_allocation:
+		*(size_t *)retval = size;
+		if (gfp & __GFP_ZERO)
+			memset((void *)retval, 0, size);
+		return (void *)retval;
+	}
+	/* Here there is no test for __GFP_NOFAIL because, in case of
+	 * concurrent allocation, one thread might add a chunk to the
+	 * pool and this memory could be allocated by another thread,
+	 * before the first thread gets a chance to use it.
+	 * As long as vmalloc succeeds, it's ok to retry.
+	 */
+	goto retry_alloc_from_pool;
+abort:
+	untag_chunk(chunk);
+free:
+	vfree_atomic(chunk);
+	return NULL;
+}
+
+static void pmalloc_chunk_set_protection(struct gen_pool *pool,
+
+					 struct gen_pool_chunk *chunk,
+					 void *data)
+{
+	const bool *flag = data;
+	size_t chunk_size = chunk->end_addr + 1 - chunk->start_addr;
+	unsigned long pages = chunk_size / PAGE_SIZE;
+
+	BUG_ON(chunk_size & (PAGE_SIZE - 1));
+
+	if (*flag)
+		set_memory_ro(chunk->start_addr, pages);
+	else
+		set_memory_rw(chunk->start_addr, pages);
+}
+
+static int pmalloc_pool_set_protection(struct gen_pool *pool, bool protection)
+{
+	struct pmalloc_data *data;
+	struct gen_pool_chunk *chunk;
+
+	if (unlikely(!pool))
+		return -EINVAL;
+
+	data = pool->data;
+
+	if (unlikely(!data))
+		return -EINVAL;
+
+	if (unlikely(data->protected == protection)) {
+		WARN_ON(1);
+		return 0;
+	}
+
+	data->protected = protection;
+	list_for_each_entry(chunk, &(pool)->chunks, next_chunk)
+		pmalloc_chunk_set_protection(pool, chunk, &protection);
+	return 0;
+}
+
+int pmalloc_protect_pool(struct gen_pool *pool)
+{
+	return pmalloc_pool_set_protection(pool, true);
+}
+
+
+static void pmalloc_chunk_free(struct gen_pool *pool,
+			       struct gen_pool_chunk *chunk, void *data)
+{
+	untag_chunk((void *)chunk->start_addr);
+	gen_pool_flush_chunk(pool, chunk);
+	vfree_atomic((void *)chunk->start_addr);
+}
+
+
+int pmalloc_destroy_pool(struct gen_pool *pool)
+{
+	struct pmalloc_data *data;
+
+	if (unlikely(pool == NULL))
+		return -EINVAL;
+
+	data = pool->data;
+
+	if (unlikely(data == NULL))
+		return -EINVAL;
+
+	mutex_lock(&pmalloc_mutex);
+	list_del(&data->node);
+	mutex_unlock(&pmalloc_mutex);
+
+	if (likely(data->pool_kobject))
+		pmalloc_disconnect(data, data->pool_kobject);
+
+	pmalloc_pool_set_protection(pool, false);
+	gen_pool_for_each_chunk(pool, pmalloc_chunk_free, NULL);
+	gen_pool_destroy(pool);
+	kfree(data);
+	return 0;
+}
+
+/**
+ * When the sysfs is ready to receive registrations, connect all the
+ * pools previously created. Also enable further pools to be connected
+ * right away.
+ */
+static int __init pmalloc_late_init(void)
+{
+	struct pmalloc_data *data, *n;
+
+	pmalloc_kobject = kobject_create_and_add("pmalloc", kernel_kobj);
+
+	mutex_lock(&pmalloc_mutex);
+	pmalloc_list = &pmalloc_final_list;
+
+	if (likely(pmalloc_kobject != NULL)) {
+		list_for_each_entry_safe(data, n, &pmalloc_tmp_list, node) {
+			list_move(&data->node, &pmalloc_final_list);
+			data->pool_kobject = pmalloc_connect(data);
+		}
+	}
+	mutex_unlock(&pmalloc_mutex);
+	pmalloc_selftest();
+	return 0;
+}
+late_initcall(pmalloc_late_init);
diff --git a/mm/usercopy.c b/mm/usercopy.c
index a9852b2..c3b1029 100644
--- a/mm/usercopy.c
+++ b/mm/usercopy.c
@@ -15,6 +15,7 @@
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
 #include <linux/mm.h>
+#include <linux/pmalloc.h>
 #include <linux/slab.h>
 #include <linux/sched.h>
 #include <linux/sched/task.h>
@@ -222,6 +223,7 @@ static inline const char *check_heap_object(const void *ptr, unsigned long n,
 void __check_object_size(const void *ptr, unsigned long n, bool to_user)
 {
 	const char *err;
+	int retv;
 
 	/* Skip all tests if size is zero. */
 	if (!n)
@@ -229,12 +231,12 @@ void __check_object_size(const void *ptr, unsigned long n, bool to_user)
 
 	/* Check for invalid addresses. */
 	err = check_bogus_address(ptr, n);
-	if (err)
+	if (unlikely(err))
 		goto report;
 
 	/* Check for bad heap object. */
 	err = check_heap_object(ptr, n, to_user);
-	if (err)
+	if (unlikely(err))
 		goto report;
 
 	/* Check for bad stack object. */
@@ -257,8 +259,23 @@ void __check_object_size(const void *ptr, unsigned long n, bool to_user)
 
 	/* Check for object in kernel to avoid text exposure. */
 	err = check_kernel_text_object(ptr, n);
-	if (!err)
-		return;
+	if (unlikely(err))
+		goto report;
+
+	/* Check if object is from a pmalloc chunk.
+	 */
+	retv = is_pmalloc_object(ptr, n);
+	if (unlikely(retv)) {
+		if (unlikely(!to_user)) {
+			err = "<trying to write to pmalloc object>";
+			goto report;
+		}
+		if (retv < 0) {
+			err = "<invalid pmalloc object>";
+			goto report;
+		}
+	}
+	return;
 
 report:
 	report_usercopy(ptr, n, to_user, err);
-- 
2.9.3


* [PATCH 4/6] Protectable Memory
@ 2018-02-03 19:42   ` Igor Stoppa
  0 siblings, 0 replies; 146+ messages in thread
From: Igor Stoppa @ 2018-02-03 19:42 UTC (permalink / raw)
  To: linux-security-module

The MMU available in many systems running Linux can often provide R/O
protection to the memory pages it handles.

However, MMU-based protection operates at page granularity, so it is
effective only when those pages contain exclusively data that will not
need further modifications.

Statically allocated variables can be segregated into a dedicated
section, but no equivalent mechanism exists for dynamically allocated
ones.

Currently, dynamic allocation provides no means of grouping variables
into memory pages that contain exclusively data suitable for conversion
to read-only access.

The allocator provided here (pmalloc, the protectable memory allocator)
introduces the concept of pools of protectable memory.

A module can request a pool and then direct any allocation request to
the pool handle it has received.

Once all the chunks of memory associated with a specific pool are
initialized, the pool can be protected.

After this point, the pool can only be destroyed; it is up to the module
to avoid any further reference to the pool's memory once destruction
has been invoked.

Destruction is mainly meant for releasing memory when a module is
unloaded.

A module can have as many pools as needed, for example to support the
protection of data that is initialized in sufficiently distinct phases.

Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com>
---
 include/linux/genalloc.h |   3 +
 include/linux/pmalloc.h  | 211 +++++++++++++++++++
 include/linux/vmalloc.h  |   1 +
 lib/genalloc.c           |  27 +++
 mm/Makefile              |   1 +
 mm/pmalloc.c             | 514 +++++++++++++++++++++++++++++++++++++++++++++++
 mm/usercopy.c            |  25 ++-
 7 files changed, 778 insertions(+), 4 deletions(-)
 create mode 100644 include/linux/pmalloc.h
 create mode 100644 mm/pmalloc.c

diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
index dcaa33e..b6c4cea 100644
--- a/include/linux/genalloc.h
+++ b/include/linux/genalloc.h
@@ -121,6 +121,9 @@ extern unsigned long gen_pool_alloc_algo(struct gen_pool *, size_t,
 extern void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size,
 		dma_addr_t *dma);
 extern void gen_pool_free(struct gen_pool *, unsigned long, size_t);
+
+extern void gen_pool_flush_chunk(struct gen_pool *pool,
+				 struct gen_pool_chunk *chunk);
 extern void gen_pool_for_each_chunk(struct gen_pool *,
 	void (*)(struct gen_pool *, struct gen_pool_chunk *, void *), void *);
 extern size_t gen_pool_avail(struct gen_pool *);
diff --git a/include/linux/pmalloc.h b/include/linux/pmalloc.h
new file mode 100644
index 0000000..6d4a24e
--- /dev/null
+++ b/include/linux/pmalloc.h
@@ -0,0 +1,211 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * pmalloc.h: Header for Protectable Memory Allocator
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.stoppa@huawei.com>
+ */
+
+#ifndef _PMALLOC_H
+#define _PMALLOC_H
+
+
+#include <linux/genalloc.h>
+#include <linux/string.h>
+
+#define PMALLOC_DEFAULT_ALLOC_ORDER (-1)
+
+/*
+ * Library for dynamic allocation of pools of memory that can be,
+ * after initialization, marked as read-only.
+ *
+ * This is intended to complement __ro_after_init, for those cases
+ * where either it is not possible to know the initialization value before
+ * init is completed, or the amount of data is variable and can be
+ * determined only at run-time.
+ *
+ * ***WARNING***
+ * The user of the API is expected to synchronize:
+ * 1) allocation,
+ * 2) writes to the allocated memory,
+ * 3) write protection of the pool,
+ * 4) freeing of the allocated memory, and
+ * 5) destruction of the pool.
+ *
+ * For a non-threaded scenario, this type of locking is not even required.
+ *
+ * Even if the library were to provide support for locking, point 2)
+ * would still depend on the user taking the lock.
+ */
+
+
+/**
+ * pmalloc_create_pool - create a new protectable memory pool
+ * @name: the name of the pool, must be unique
+ * @min_alloc_order: log2 of the minimum allocation size obtainable
+ *                   from the pool
+ *
+ * Creates a new (empty) memory pool for allocation of protectable
+ * memory. Memory will be allocated upon request (through pmalloc).
+ *
+ * Returns a pointer to the new pool upon success, NULL otherwise.
+ */
+struct gen_pool *pmalloc_create_pool(const char *name,
+					 int min_alloc_order);
+
+
+int is_pmalloc_object(const void *ptr, const unsigned long n);
+
+/**
+ * pmalloc_prealloc - tries to allocate a memory chunk of the requested size
+ * @pool: handle to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ *
+ * Prepares a chunk of the requested size.
+ * This is intended both to minimize latency in later memory requests and
+ * to avoid sleeping during allocation.
+ * Memory allocated with prealloc is stored in one single chunk, as
+ * opposed to what is allocated on-demand when pmalloc runs out of free
+ * space already existing in the pool and has to invoke vmalloc.
+ *
+ * Returns true if the new chunk was added to the pool, false otherwise.
+ */
+bool pmalloc_prealloc(struct gen_pool *pool, size_t size);
+
+/**
+ * pmalloc - allocate protectable memory from a pool
+ * @pool: handle to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ * @gfp: flags for page allocation
+ *
+ * Allocates memory from an unprotected pool. If the pool doesn't have
+ * enough memory, and the request did not include GFP_ATOMIC, an attempt
+ * is made to add a new chunk of memory to the pool
+ * (a multiple of PAGE_SIZE), in order to fit the new request.
+ * Otherwise, NULL is returned.
+ *
+ * Returns the pointer to the memory requested upon success,
+ * NULL otherwise (either no memory available or pool already read-only).
+ */
+void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp);
+
+
+/**
+ * pzalloc - zero-initialized version of pmalloc
+ * @pool: handle to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ * @gfp: flags for page allocation
+ *
+ * Executes pmalloc, initializing the memory requested to 0,
+ * before returning the pointer to it.
+ *
+ * Returns the pointer to the zeroed memory requested, upon success,
+ * NULL otherwise (either no memory available or pool already read-only).
+ */
+static inline void *pzalloc(struct gen_pool *pool, size_t size, gfp_t gfp)
+{
+	return pmalloc(pool, size, gfp | __GFP_ZERO);
+}
+
+/**
+ * pmalloc_array - allocates an array according to the parameters
+ * @pool: handle to the pool to be used for memory allocation
+ * @n: number of elements in the array
+ * @size: size (in bytes) of each element
+ * @flags: flags for page allocation
+ *
+ * Executes pmalloc, if it has a chance to succeed.
+ *
+ * Returns either NULL or the pmalloc result.
+ */
+static inline void *pmalloc_array(struct gen_pool *pool, size_t n,
+				  size_t size, gfp_t flags)
+{
+	if (unlikely(!(pool && n && size)))
+		return NULL;
+	return pmalloc(pool, n * size, flags);
+}
+
+/**
+ * pcalloc - allocates a 0-initialized array according to the parameters
+ * @pool: handle to the pool to be used for memory allocation
+ * @n: number of elements in the array
+ * @size: size (in bytes) of each element
+ * @flags: flags for page allocation
+ *
+ * Executes pmalloc, if it has a chance to succeed, zeroing the memory.
+ *
+ * Returns either NULL or the pmalloc result.
+ */
+static inline void *pcalloc(struct gen_pool *pool, size_t n,
+			    size_t size, gfp_t flags)
+{
+	return pmalloc_array(pool, n, size, flags | __GFP_ZERO);
+}
+
+/**
+ * pstrdup - duplicate a string, using pmalloc as allocator
+ * @pool: handle to the pool to be used for memory allocation
+ * @s: string to duplicate
+ * @gfp: flags for page allocation
+ *
+ * Generates a copy of the given string, allocating sufficient memory
+ * from the given pmalloc pool.
+ *
+ * Returns a pointer to the replica, NULL in case of recoverable error.
+ */
+static inline char *pstrdup(struct gen_pool *pool, const char *s, gfp_t gfp)
+{
+	size_t len;
+	char *buf;
+
+	if (unlikely(pool == NULL || s == NULL))
+		return NULL;
+
+	len = strlen(s) + 1;
+	buf = pmalloc(pool, len, gfp);
+	if (likely(buf))
+		strncpy(buf, s, len);
+	return buf;
+}
+
+/**
+ * pmalloc_protect_pool - turn a read/write pool read-only
+ * @pool: the pool to protect
+ *
+ * Write-protects all the memory chunks assigned to the pool.
+ * This prevents any further allocation.
+ *
+ * Returns 0 upon success, -EINVAL in abnormal cases.
+ */
+int pmalloc_protect_pool(struct gen_pool *pool);
+
+/**
+ * pfree - mark as unused memory that was previously in use
+ * @pool: handle to the pool to be used for memory allocation
+ * @addr: the beginning of the memory area to be freed
+ *
+ * The behavior of pfree depends on the state of the protection.
+ * If the pool is not yet protected, the memory is marked as unused and
+ * will be available for further allocations.
+ * If the pool is already protected, the memory is marked as unused, but
+ * no further allocation is possible, because of the existing protection.
+ * In this case, the freed memory is truly released only when the
+ * pool is destroyed.
+ */
+static inline void pfree(struct gen_pool *pool, const void *addr)
+{
+	gen_pool_free(pool, (unsigned long)addr, 0);
+}
+
+/**
+ * pmalloc_destroy_pool - destroys a pool and all the associated memory
+ * @pool: the pool to destroy
+ *
+ * All the memory that was allocated through pmalloc in the pool will be freed.
+ *
+ * Returns 0 upon success, -EINVAL in abnormal cases.
+ */
+int pmalloc_destroy_pool(struct gen_pool *pool);
+
+#endif
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 1e5d8c3..116d280 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -20,6 +20,7 @@ struct notifier_block;		/* in notifier.h */
 #define VM_UNINITIALIZED	0x00000020	/* vm_struct is not fully initialized */
 #define VM_NO_GUARD		0x00000040      /* don't add guard page */
 #define VM_KASAN		0x00000080      /* has allocated kasan shadow memory */
+#define VM_PMALLOC		0x00000100	/* pmalloc area - see docs */
 /* bits [20..32] reserved for arch specific ioremap internals */
 
 /*
diff --git a/lib/genalloc.c b/lib/genalloc.c
index dde7830..7ba2ec9 100644
--- a/lib/genalloc.c
+++ b/lib/genalloc.c
@@ -519,6 +519,33 @@ void gen_pool_free(struct gen_pool *pool, unsigned long addr, size_t size)
 }
 EXPORT_SYMBOL(gen_pool_free);
 
+
+/**
+ * gen_pool_flush_chunk - drops all the allocations from a specific chunk
+ * @pool:	the generic memory pool
+ * @chunk:	The chunk to wipe clear.
+ *
+ * This is meant to be called only while destroying a pool. It's up to the
+ * caller to avoid races, but really, at this point the pool should have
+ * already been retired and have become unavailable for any other sort of
+ * operation.
+ */
+void gen_pool_flush_chunk(struct gen_pool *pool,
+			  struct gen_pool_chunk *chunk)
+{
+	size_t size;
+
+	if (unlikely(!(pool && chunk)))
+		return;
+
+	size = chunk->end_addr + 1 - chunk->start_addr;
+	memset(chunk->entries, 0,
+	       DIV_ROUND_UP((size >> pool->min_alloc_order) * BITS_PER_ENTRY,
+			    BITS_PER_BYTE));
+	atomic_long_set(&chunk->avail, size);
+}
+
+
 /**
  * gen_pool_for_each_chunk - call func for every chunk of generic memory pool
  * @pool:	the generic memory pool
diff --git a/mm/Makefile b/mm/Makefile
index e669f02..a6a47e1 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -65,6 +65,7 @@ obj-$(CONFIG_SPARSEMEM)	+= sparse.o
 obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o
 obj-$(CONFIG_SLOB) += slob.o
 obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o
+obj-$(CONFIG_ARCH_HAS_SET_MEMORY) += pmalloc.o
 obj-$(CONFIG_KSM) += ksm.o
 obj-$(CONFIG_PAGE_POISONING) += page_poison.o
 obj-$(CONFIG_SLAB) += slab.o
diff --git a/mm/pmalloc.c b/mm/pmalloc.c
new file mode 100644
index 0000000..11daca2
--- /dev/null
+++ b/mm/pmalloc.c
@@ -0,0 +1,514 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * pmalloc.c: Protectable Memory Allocator
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.stoppa@huawei.com>
+ */
+
+#include <linux/printk.h>
+#include <linux/init.h>
+#include <linux/mm.h>
+#include <linux/vmalloc.h>
+#include <linux/genalloc.h>
+#include <linux/kernel.h>
+#include <linux/log2.h>
+#include <linux/slab.h>
+#include <linux/device.h>
+#include <linux/atomic.h>
+#include <linux/rculist.h>
+#include <linux/set_memory.h>
+#include <asm/cacheflush.h>
+#include <asm/page.h>
+
+#include "pmalloc-selftest.h"
+
+/**
+ * pmalloc_data contains the data specific to a pmalloc pool,
+ * in a format compatible with the design of gen_alloc.
+ * Some of the fields are used for exposing the corresponding parameter
+ * to userspace, through sysfs.
+ */
+struct pmalloc_data {
+	struct gen_pool *pool;  /* Link back to the associated pool. */
+	bool protected;     /* Status of the pool: RO or RW. */
+	struct kobj_attribute attr_protected; /* Sysfs attribute. */
+	struct kobj_attribute attr_avail;     /* Sysfs attribute. */
+	struct kobj_attribute attr_size;      /* Sysfs attribute. */
+	struct kobj_attribute attr_chunks;    /* Sysfs attribute. */
+	struct kobject *pool_kobject;
+	struct list_head node; /* list of pools */
+};
+
+static LIST_HEAD(pmalloc_final_list);
+static LIST_HEAD(pmalloc_tmp_list);
+static struct list_head *pmalloc_list = &pmalloc_tmp_list;
+static DEFINE_MUTEX(pmalloc_mutex);
+static struct kobject *pmalloc_kobject;
+
+static ssize_t pmalloc_pool_show_protected(struct kobject *dev,
+					   struct kobj_attribute *attr,
+					   char *buf)
+{
+	struct pmalloc_data *data;
+
+	data = container_of(attr, struct pmalloc_data, attr_protected);
+	if (data->protected)
+		return sprintf(buf, "protected\n");
+	else
+		return sprintf(buf, "unprotected\n");
+}
+
+static ssize_t pmalloc_pool_show_avail(struct kobject *dev,
+				       struct kobj_attribute *attr,
+				       char *buf)
+{
+	struct pmalloc_data *data;
+
+	data = container_of(attr, struct pmalloc_data, attr_avail);
+	return sprintf(buf, "%lu\n",
+		       (unsigned long)gen_pool_avail(data->pool));
+}
+
+static ssize_t pmalloc_pool_show_size(struct kobject *dev,
+				      struct kobj_attribute *attr,
+				      char *buf)
+{
+	struct pmalloc_data *data;
+
+	data = container_of(attr, struct pmalloc_data, attr_size);
+	return sprintf(buf, "%lu\n",
+		       (unsigned long)gen_pool_size(data->pool));
+}
+
+static void pool_chunk_number(struct gen_pool *pool,
+			      struct gen_pool_chunk *chunk, void *data)
+{
+	unsigned long *counter = data;
+
+	(*counter)++;
+}
+
+static ssize_t pmalloc_pool_show_chunks(struct kobject *dev,
+					struct kobj_attribute *attr,
+					char *buf)
+{
+	struct pmalloc_data *data;
+	unsigned long chunks_num = 0;
+
+	data = container_of(attr, struct pmalloc_data, attr_chunks);
+	gen_pool_for_each_chunk(data->pool, pool_chunk_number, &chunks_num);
+	return sprintf(buf, "%lu\n", chunks_num);
+}
+
+/**
+ * Exposes the pool and its attributes through sysfs.
+ */
+static struct kobject *pmalloc_connect(struct pmalloc_data *data)
+{
+	const struct attribute *attrs[] = {
+		&data->attr_protected.attr,
+		&data->attr_avail.attr,
+		&data->attr_size.attr,
+		&data->attr_chunks.attr,
+		NULL
+	};
+	struct kobject *kobj;
+
+	kobj = kobject_create_and_add(data->pool->name, pmalloc_kobject);
+	if (unlikely(!kobj))
+		return NULL;
+
+	if (unlikely(sysfs_create_files(kobj, attrs) < 0)) {
+		kobject_put(kobj);
+		kobj = NULL;
+	}
+	return kobj;
+}
+
+/**
+ * Removes the pool and its attributes from sysfs.
+ */
+static void pmalloc_disconnect(struct pmalloc_data *data,
+			       struct kobject *kobj)
+{
+	const struct attribute *attrs[] = {
+		&data->attr_protected.attr,
+		&data->attr_avail.attr,
+		&data->attr_size.attr,
+		&data->attr_chunks.attr,
+		NULL
+	};
+
+	sysfs_remove_files(kobj, attrs);
+	kobject_put(kobj);
+}
+
+/**
+ * Declares an attribute of the pool.
+ */
+
+#define pmalloc_attr_init(data, attr_name) \
+do { \
+	sysfs_attr_init(&data->attr_##attr_name.attr); \
+	data->attr_##attr_name.attr.name = #attr_name; \
+	data->attr_##attr_name.attr.mode = VERIFY_OCTAL_PERMISSIONS(0400); \
+	data->attr_##attr_name.show = pmalloc_pool_show_##attr_name; \
+} while (0)
+
+struct gen_pool *pmalloc_create_pool(const char *name, int min_alloc_order)
+{
+	struct gen_pool *pool;
+	const char *pool_name;
+	struct pmalloc_data *data;
+
+	if (!name) {
+		WARN_ON(1);
+		return NULL;
+	}
+
+	if (min_alloc_order < 0)
+		min_alloc_order = ilog2(sizeof(unsigned long));
+
+	pool = gen_pool_create(min_alloc_order, NUMA_NO_NODE);
+	if (unlikely(!pool))
+		return NULL;
+
+	mutex_lock(&pmalloc_mutex);
+	list_for_each_entry(data, pmalloc_list, node)
+		if (!strcmp(name, data->pool->name))
+			goto same_name_err;
+
+	pool_name = kstrdup(name, GFP_KERNEL);
+	if (unlikely(!pool_name))
+		goto name_alloc_err;
+
+	data = kzalloc(sizeof(struct pmalloc_data), GFP_KERNEL);
+	if (unlikely(!data))
+		goto data_alloc_err;
+
+	data->protected = false;
+	data->pool = pool;
+	pmalloc_attr_init(data, protected);
+	pmalloc_attr_init(data, avail);
+	pmalloc_attr_init(data, size);
+	pmalloc_attr_init(data, chunks);
+	pool->data = data;
+	pool->name = pool_name;
+
+	list_add(&data->node, pmalloc_list);
+	if (pmalloc_list == &pmalloc_final_list)
+		data->pool_kobject = pmalloc_connect(data);
+	mutex_unlock(&pmalloc_mutex);
+	return pool;
+
+data_alloc_err:
+	kfree(pool_name);
+name_alloc_err:
+same_name_err:
+	mutex_unlock(&pmalloc_mutex);
+	gen_pool_destroy(pool);
+	return NULL;
+}
+
+static inline int check_alloc_params(struct gen_pool *pool, size_t req_size)
+{
+	struct pmalloc_data *data;
+
+	if (unlikely(!req_size || !pool))
+		return -1;
+
+	data = pool->data;
+	if (data == NULL)
+		return -1;
+
+	if (unlikely(data->protected)) {
+		WARN_ON(1);
+		return -1;
+	}
+	return 0;
+}
+
+
+static inline bool chunk_tagging(void *chunk, bool tag)
+{
+	struct vm_struct *area;
+	struct page *page;
+
+	if (!is_vmalloc_addr(chunk))
+		return false;
+
+	page = vmalloc_to_page(chunk);
+	if (unlikely(!page))
+		return false;
+
+	area = page->area;
+	if (tag)
+		area->flags |= VM_PMALLOC;
+	else
+		area->flags &= ~VM_PMALLOC;
+	return true;
+}
+
+
+static inline bool tag_chunk(void *chunk)
+{
+	return chunk_tagging(chunk, true);
+}
+
+
+static inline bool untag_chunk(void *chunk)
+{
+	return chunk_tagging(chunk, false);
+}
+
+enum {
+	INVALID_PMALLOC_OBJECT = -1,
+	NOT_PMALLOC_OBJECT = 0,
+	VALID_PMALLOC_OBJECT = 1,
+};
+
+int is_pmalloc_object(const void *ptr, const unsigned long n)
+{
+	struct vm_struct *area;
+	struct page *page;
+	unsigned long area_start;
+	unsigned long area_end;
+	unsigned long object_start;
+	unsigned long object_end;
+
+
+	/* is_pmalloc_object gets called pretty late, so chances are high
+	 * that the object is indeed of vmalloc type
+	 */
+	if (unlikely(!is_vmalloc_addr(ptr)))
+		return NOT_PMALLOC_OBJECT;
+
+	page = vmalloc_to_page(ptr);
+	if (unlikely(!page))
+		return NOT_PMALLOC_OBJECT;
+
+	area = page->area;
+
+	if (likely(!(area->flags & VM_PMALLOC)))
+		return NOT_PMALLOC_OBJECT;
+
+	area_start = (unsigned long)area->addr;
+	area_end = area_start + area->nr_pages * PAGE_SIZE - 1;
+	object_start = (unsigned long)ptr;
+	object_end = object_start + n - 1;
+
+	if (likely((area_start <= object_start) &&
+		   (object_end <= area_end)))
+		return VALID_PMALLOC_OBJECT;
+	else
+		return INVALID_PMALLOC_OBJECT;
+}
+
+
+bool pmalloc_prealloc(struct gen_pool *pool, size_t size)
+{
+	void *chunk;
+	size_t chunk_size;
+	bool add_error;
+
+	if (check_alloc_params(pool, size))
+		return false;
+
+	/* Expand pool */
+	chunk_size = roundup(size, PAGE_SIZE);
+	chunk = vmalloc(chunk_size);
+	if (unlikely(chunk == NULL))
+		return false;
+
+	/* Tag the chunk, as pmalloc() does, so usercopy can recognize it. */
+	if (unlikely(!tag_chunk(chunk)))
+		goto free;
+
+	/* Locking is already done inside gen_pool_add */
+	add_error = gen_pool_add(pool, (unsigned long)chunk, chunk_size,
+				 NUMA_NO_NODE);
+	if (unlikely(add_error != 0))
+		goto abort;
+
+	return true;
+abort:
+	untag_chunk(chunk);
+free:
+	vfree_atomic(chunk);
+	return false;
+}
+
+void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp)
+{
+	void *chunk;
+	size_t chunk_size;
+	bool add_error;
+	unsigned long retval;
+	unsigned int order;
+
+	if (check_alloc_params(pool, size))
+		return NULL;
+
+	order = (unsigned int)pool->min_alloc_order;
+
+retry_alloc_from_pool:
+	retval = gen_pool_alloc(pool, size);
+	if (retval)
+		goto return_allocation;
+
+	if (unlikely((gfp & __GFP_ATOMIC))) {
+		if (unlikely((gfp & __GFP_NOFAIL)))
+			goto retry_alloc_from_pool;
+		else
+			return NULL;
+	}
+
+	/* Expand pool */
+	chunk_size = roundup(size, PAGE_SIZE);
+	chunk = vmalloc(chunk_size);
+	if (unlikely(!chunk)) {
+		if (unlikely((gfp & __GFP_NOFAIL)))
+			goto retry_alloc_from_pool;
+		else
+			return NULL;
+	}
+	if (unlikely(!tag_chunk(chunk)))
+		goto free;
+
+	/* Locking is already done inside gen_pool_add */
+	add_error = gen_pool_add(pool, (unsigned long)chunk, chunk_size,
+				 NUMA_NO_NODE);
+	if (unlikely(add_error))
+		goto abort;
+
+	retval = gen_pool_alloc(pool, size);
+	if (retval) {
+return_allocation:
+		if (gfp & __GFP_ZERO)
+			memset((void *)retval, 0, size);
+		return (void *)retval;
+	}
+	/* Here there is no test for __GFP_NOFAIL because, in case of
+	 * concurrent allocation, one thread might add a chunk to the
+	 * pool and this memory could be allocated by another thread,
+	 * before the first thread gets a chance to use it.
+	 * As long as vmalloc succeeds, it's ok to retry.
+	 */
+	goto retry_alloc_from_pool;
+abort:
+	untag_chunk(chunk);
+free:
+	vfree_atomic(chunk);
+	return NULL;
+}
+
+static void pmalloc_chunk_set_protection(struct gen_pool *pool,
+					 struct gen_pool_chunk *chunk,
+					 void *data)
+{
+	const bool *flag = data;
+	size_t chunk_size = chunk->end_addr + 1 - chunk->start_addr;
+	unsigned long pages = chunk_size / PAGE_SIZE;
+
+	BUG_ON(chunk_size & (PAGE_SIZE - 1));
+
+	if (*flag)
+		set_memory_ro(chunk->start_addr, pages);
+	else
+		set_memory_rw(chunk->start_addr, pages);
+}
+
+static int pmalloc_pool_set_protection(struct gen_pool *pool, bool protection)
+{
+	struct pmalloc_data *data;
+	struct gen_pool_chunk *chunk;
+
+	if (unlikely(!pool))
+		return -EINVAL;
+
+	data = pool->data;
+
+	if (unlikely(!data))
+		return -EINVAL;
+
+	if (unlikely(data->protected == protection)) {
+		WARN_ON(1);
+		return 0;
+	}
+
+	data->protected = protection;
+	list_for_each_entry(chunk, &(pool)->chunks, next_chunk)
+		pmalloc_chunk_set_protection(pool, chunk, &protection);
+	return 0;
+}
+
+int pmalloc_protect_pool(struct gen_pool *pool)
+{
+	return pmalloc_pool_set_protection(pool, true);
+}
+
+
+static void pmalloc_chunk_free(struct gen_pool *pool,
+			       struct gen_pool_chunk *chunk, void *data)
+{
+	untag_chunk((void *)chunk->start_addr);
+	gen_pool_flush_chunk(pool, chunk);
+	vfree_atomic((void *)chunk->start_addr);
+}
+
+
+int pmalloc_destroy_pool(struct gen_pool *pool)
+{
+	struct pmalloc_data *data;
+
+	if (unlikely(pool == NULL))
+		return -EINVAL;
+
+	data = pool->data;
+
+	if (unlikely(data == NULL))
+		return -EINVAL;
+
+	mutex_lock(&pmalloc_mutex);
+	list_del(&data->node);
+	mutex_unlock(&pmalloc_mutex);
+
+	if (likely(data->pool_kobject))
+		pmalloc_disconnect(data, data->pool_kobject);
+
+	if (data->protected)
+		pmalloc_pool_set_protection(pool, false);
+	gen_pool_for_each_chunk(pool, pmalloc_chunk_free, NULL);
+	gen_pool_destroy(pool);
+	kfree(data);
+	return 0;
+}
+
+/**
+ * When the sysfs is ready to receive registrations, connect all the
+ * pools previously created. Also enable further pools to be connected
+ * right away.
+ */
+static int __init pmalloc_late_init(void)
+{
+	struct pmalloc_data *data, *n;
+
+	pmalloc_kobject = kobject_create_and_add("pmalloc", kernel_kobj);
+
+	mutex_lock(&pmalloc_mutex);
+	pmalloc_list = &pmalloc_final_list;
+
+	if (likely(pmalloc_kobject != NULL)) {
+		list_for_each_entry_safe(data, n, &pmalloc_tmp_list, node) {
+			list_move(&data->node, &pmalloc_final_list);
+			data->pool_kobject = pmalloc_connect(data);
+		}
+	}
+	mutex_unlock(&pmalloc_mutex);
+	pmalloc_selftest();
+	return 0;
+}
+late_initcall(pmalloc_late_init);
diff --git a/mm/usercopy.c b/mm/usercopy.c
index a9852b2..c3b1029 100644
--- a/mm/usercopy.c
+++ b/mm/usercopy.c
@@ -15,6 +15,7 @@
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
 #include <linux/mm.h>
+#include <linux/pmalloc.h>
 #include <linux/slab.h>
 #include <linux/sched.h>
 #include <linux/sched/task.h>
@@ -222,6 +223,7 @@ static inline const char *check_heap_object(const void *ptr, unsigned long n,
 void __check_object_size(const void *ptr, unsigned long n, bool to_user)
 {
 	const char *err;
+	int retv;
 
 	/* Skip all tests if size is zero. */
 	if (!n)
@@ -229,12 +231,12 @@ void __check_object_size(const void *ptr, unsigned long n, bool to_user)
 
 	/* Check for invalid addresses. */
 	err = check_bogus_address(ptr, n);
-	if (err)
+	if (unlikely(err))
 		goto report;
 
 	/* Check for bad heap object. */
 	err = check_heap_object(ptr, n, to_user);
-	if (err)
+	if (unlikely(err))
 		goto report;
 
 	/* Check for bad stack object. */
@@ -257,8 +259,23 @@ void __check_object_size(const void *ptr, unsigned long n, bool to_user)
 
 	/* Check for object in kernel to avoid text exposure. */
 	err = check_kernel_text_object(ptr, n);
-	if (!err)
-		return;
+	if (unlikely(err))
+		goto report;
+
+	/* Check if object is from a pmalloc chunk.
+	 */
+	retv = is_pmalloc_object(ptr, n);
+	if (unlikely(retv)) {
+		if (unlikely(!to_user)) {
+			err = "<trying to write to pmalloc object>";
+			goto report;
+		}
+		if (retv < 0) {
+			err = "<invalid pmalloc object>";
+			goto report;
+		}
+	}
+	return;
 
 report:
 	report_usercopy(ptr, n, to_user, err);
-- 
2.9.3
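The classification performed by is_pmalloc_object(), and the way __check_object_size() in mm/usercopy.c acts on it, can be exercised outside the kernel with a simplified model. In this sketch struct toy_area is a hypothetical stand-in for the vm_struct fields the check reads, is_pmalloc stands in for the VM_PMALLOC flag, 4 KiB pages are assumed, and the wrap-around guard is an extra not present in the patch.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */

enum {
	INVALID_PMALLOC_OBJECT = -1,
	NOT_PMALLOC_OBJECT = 0,
	VALID_PMALLOC_OBJECT = 1,
};

/* Hypothetical stand-in for the vm_struct fields the check relies on. */
struct toy_area {
	unsigned long addr;	/* start of the vmalloc area */
	unsigned long nr_pages;	/* pages backing it */
	int is_pmalloc;		/* stands in for VM_PMALLOC */
};

static int toy_classify(const struct toy_area *area,
			unsigned long ptr, unsigned long n)
{
	unsigned long area_start = area->addr;
	unsigned long area_end = area_start + area->nr_pages * TOY_PAGE_SIZE - 1;
	unsigned long object_end = ptr + n - 1;	/* inclusive, as in the patch */

	if (!area->is_pmalloc)
		return NOT_PMALLOC_OBJECT;
	/* Extra guard, not in the patch: reject empty or wrapping ranges. */
	if (n == 0 || object_end < ptr)
		return INVALID_PMALLOC_OBJECT;
	if (area_start <= ptr && object_end <= area_end)
		return VALID_PMALLOC_OBJECT;
	return INVALID_PMALLOC_OBJECT;
}

/* Mirrors the usercopy hunk: NULL means "no error to report". */
static const char *toy_usercopy_verdict(int retv, int to_user)
{
	if (retv == NOT_PMALLOC_OBJECT)
		return NULL;
	if (!to_user)
		return "<trying to write to pmalloc object>";
	if (retv < 0)
		return "<invalid pmalloc object>";
	return NULL;
}
```

A copy to user space from a valid pmalloc object passes, while any copy from user space into a pmalloc object is reported, regardless of whether the pool has been protected yet.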



* [PATCH 4/6] Protectable Memory
@ 2018-02-03 19:42   ` Igor Stoppa
  0 siblings, 0 replies; 146+ messages in thread
From: Igor Stoppa @ 2018-02-03 19:42 UTC (permalink / raw)
  To: jglisse, keescook, mhocko, labbott, hch, willy
  Cc: cl, linux-security-module, linux-mm, linux-kernel,
	kernel-hardening, Igor Stoppa

The MMU available in many systems running Linux can often provide R/O
protection to the memory pages it handles.

However, the MMU-based protection works efficiently only when those pages
contain exclusively data that will not need further modification.

Statically allocated variables can be segregated into a dedicated
section, but this approach does not fit dynamically allocated ones.

Dynamic allocation currently provides no means of grouping variables
into memory pages that contain exclusively data suitable for conversion
to read-only access mode.

The allocator provided here (pmalloc - protectable memory allocator)
introduces the concept of pools of protectable memory.

A module can request a pool and then direct any allocation request to
the pool handle it has received.

Once all the chunks of memory associated with a specific pool are
initialized, the pool can be protected.

After this point, the pool can only be destroyed; it is up to the module
to avoid any further reference to the memory from the pool once the
destruction is invoked.

Destruction is mainly meant for releasing memory when a module is
unloaded.

A module can have as many pools as needed, for example to support the
protection of data that is initialized in sufficiently distinct phases.

Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com>
---
 include/linux/genalloc.h |   3 +
 include/linux/pmalloc.h  | 211 +++++++++++++++++++
 include/linux/vmalloc.h  |   1 +
 lib/genalloc.c           |  27 +++
 mm/Makefile              |   1 +
 mm/pmalloc.c             | 514 +++++++++++++++++++++++++++++++++++++++++++++++
 mm/usercopy.c            |  25 ++-
 7 files changed, 778 insertions(+), 4 deletions(-)
 create mode 100644 include/linux/pmalloc.h
 create mode 100644 mm/pmalloc.c
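pmalloc_array() in the header multiplies n * size before calling pmalloc(), and pmalloc() grows the pool by a chunk rounded up to a multiple of PAGE_SIZE. The helper as posted does not check the multiplication for overflow; the sketch below shows one conventional way to do so in plain C, together with the page rounding. The names array_bytes() and chunk_bytes() are hypothetical, and the page size is passed in rather than taken from PAGE_SIZE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stores n * size in *bytes and returns true only if the product fits. */
static bool array_bytes(size_t n, size_t size, size_t *bytes)
{
	if (n == 0 || size == 0)
		return false;	/* mirrors the helper's rejection of zero */
	if (n > SIZE_MAX / size)
		return false;	/* n * size would wrap around */
	*bytes = n * size;
	return true;
}

/* Size of the chunk added when the pool is exhausted: the request
 * rounded up to a whole number of pages. */
static size_t chunk_bytes(size_t req, size_t page_size)
{
	return (req + page_size - 1) / page_size * page_size;
}
```

Rejecting the request before multiplying keeps a wrapped, too-small size from ever reaching the allocator.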

diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
index dcaa33e..b6c4cea 100644
--- a/include/linux/genalloc.h
+++ b/include/linux/genalloc.h
@@ -121,6 +121,9 @@ extern unsigned long gen_pool_alloc_algo(struct gen_pool *, size_t,
 extern void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size,
 		dma_addr_t *dma);
 extern void gen_pool_free(struct gen_pool *, unsigned long, size_t);
+
+extern void gen_pool_flush_chunk(struct gen_pool *pool,
+				 struct gen_pool_chunk *chunk);
 extern void gen_pool_for_each_chunk(struct gen_pool *,
 	void (*)(struct gen_pool *, struct gen_pool_chunk *, void *), void *);
 extern size_t gen_pool_avail(struct gen_pool *);
diff --git a/include/linux/pmalloc.h b/include/linux/pmalloc.h
new file mode 100644
index 0000000..6d4a24e
--- /dev/null
+++ b/include/linux/pmalloc.h
@@ -0,0 +1,211 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * pmalloc.h: Header for Protectable Memory Allocator
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.stoppa@huawei.com>
+ */
+
+#ifndef _PMALLOC_H
+#define _PMALLOC_H
+
+
+#include <linux/genalloc.h>
+#include <linux/string.h>
+
+#define PMALLOC_DEFAULT_ALLOC_ORDER (-1)
+
+/*
+ * Library for dynamic allocation of pools of memory that can be,
+ * after initialization, marked as read-only.
+ *
+ * This is intended to complement __ro_after_init, for those cases
+ * where either it is not possible to know the initialization value before
+ * init is completed, or the amount of data is variable and can be
+ * determined only at run-time.
+ *
+ * ***WARNING***
+ * The user of the API is expected to synchronize:
+ * 1) allocation,
+ * 2) writes to the allocated memory,
+ * 3) write protection of the pool,
+ * 4) freeing of the allocated memory, and
+ * 5) destruction of the pool.
+ *
+ * For a non-threaded scenario, this type of locking is not even required.
+ *
+ * Even if the library were to provide support for locking, point 2)
+ * would still depend on the user taking the lock.
+ */
+
+
+/**
+ * pmalloc_create_pool - create a new protectable memory pool
+ * @name: the name of the pool, must be unique
+ * @min_alloc_order: log2 of the minimum allocation size obtainable
+ *                   from the pool
+ *
+ * Creates a new (empty) memory pool for allocation of protectable
+ * memory. Memory will be allocated upon request (through pmalloc).
+ *
+ * Returns a pointer to the new pool upon success, NULL otherwise.
+ */
+struct gen_pool *pmalloc_create_pool(const char *name,
+					 int min_alloc_order);
+
+
+int is_pmalloc_object(const void *ptr, const unsigned long n);
+
+/**
+ * pmalloc_prealloc - tries to allocate a memory chunk of the requested size
+ * @pool: handle to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ *
+ * Prepares a chunk of the requested size.
+ * This is intended both to minimize latency in later memory requests and
+ * to avoid sleeping during allocation.
+ * Memory allocated with prealloc is stored in one single chunk, as
+ * opposed to what is allocated on-demand when pmalloc runs out of free
+ * space already existing in the pool and has to invoke vmalloc.
+ *
+ * Returns true if the new chunk was added to the pool, false otherwise.
+ */
+bool pmalloc_prealloc(struct gen_pool *pool, size_t size);
+
+/**
+ * pmalloc - allocate protectable memory from a pool
+ * @pool: handle to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ * @gfp: flags for page allocation
+ *
+ * Allocates memory from an unprotected pool. If the pool doesn't have
+ * enough memory, and the request did not include GFP_ATOMIC, an attempt
+ * is made to add a new chunk of memory to the pool
+ * (a multiple of PAGE_SIZE), in order to fit the new request.
+ * Otherwise, NULL is returned.
+ *
+ * Returns the pointer to the memory requested upon success,
+ * NULL otherwise (either no memory available or pool already read-only).
+ */
+void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp);
+
+
+/**
+ * pzalloc - zero-initialized version of pmalloc
+ * @pool: handle to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ * @gfp: flags for page allocation
+ *
+ * Executes pmalloc, initializing the memory requested to 0,
+ * before returning the pointer to it.
+ *
+ * Returns the pointer to the zeroed memory requested, upon success,
+ * NULL otherwise (either no memory available or pool already read-only).
+ */
+static inline void *pzalloc(struct gen_pool *pool, size_t size, gfp_t gfp)
+{
+	return pmalloc(pool, size, gfp | __GFP_ZERO);
+}
+
+/**
+ * pmalloc_array - allocates an array according to the parameters
+ * @pool: handle to the pool to be used for memory allocation
+ * @n: number of elements in the array
+ * @size: size (in bytes) of each element
+ * @flags: flags for page allocation
+ *
+ * Executes pmalloc, if it has a chance to succeed.
+ *
+ * Returns either NULL or the pmalloc result.
+ */
+static inline void *pmalloc_array(struct gen_pool *pool, size_t n,
+				  size_t size, gfp_t flags)
+{
+	if (unlikely(!(pool && n && size)))
+		return NULL;
+	return pmalloc(pool, n * size, flags);
+}
+
+/**
+ * pcalloc - allocates a 0-initialized array according to the parameters
+ * @pool: handle to the pool to be used for memory allocation
+ * @n: number of elements in the array
+ * @size: size (in bytes) of each element
+ * @flags: flags for page allocation
+ *
+ * Executes pmalloc, if it has a chance to succeed, zeroing the memory.
+ *
+ * Returns either NULL or the pmalloc result.
+ */
+static inline void *pcalloc(struct gen_pool *pool, size_t n,
+			    size_t size, gfp_t flags)
+{
+	return pmalloc_array(pool, n, size, flags | __GFP_ZERO);
+}
+
+/**
+ * pstrdup - duplicate a string, using pmalloc as allocator
+ * @pool: handle to the pool to be used for memory allocation
+ * @s: string to duplicate
+ * @gfp: flags for page allocation
+ *
+ * Generates a copy of the given string, allocating sufficient memory
+ * from the given pmalloc pool.
+ *
+ * Returns a pointer to the replica, NULL in case of recoverable error.
+ */
+static inline char *pstrdup(struct gen_pool *pool, const char *s, gfp_t gfp)
+{
+	size_t len;
+	char *buf;
+
+	if (unlikely(pool == NULL || s == NULL))
+		return NULL;
+
+	len = strlen(s) + 1;
+	buf = pmalloc(pool, len, gfp);
+	if (likely(buf))
+		strncpy(buf, s, len);
+	return buf;
+}
+
+/**
+ * pmalloc_protect_pool - turn a read/write pool read-only
+ * @pool: the pool to protect
+ *
+ * Write-protects all the memory chunks assigned to the pool.
+ * This prevents any further allocation.
+ *
+ * Returns 0 upon success, -EINVAL in abnormal cases.
+ */
+int pmalloc_protect_pool(struct gen_pool *pool);
+
+/**
+ * pfree - mark as unused memory that was previously in use
+ * @pool: handle to the pool the memory was allocated from
+ * @addr: the beginning of the memory area to be freed
+ *
+ * The behavior of pfree depends on the protection state of the pool.
+ *
+ * If the pool is not yet protected, the memory is marked as unused and
+ * becomes available for further allocations.
+ *
+ * If the pool is already protected, the memory is marked as unused, but
+ * no further allocation is possible, because of the existing protection.
+ * The freed memory, in this case, is truly released only when the pool
+ * is destroyed.
+ */
+static inline void pfree(struct gen_pool *pool, const void *addr)
+{
+	gen_pool_free(pool, (unsigned long)addr, 0);
+}
+
+/**
+ * pmalloc_destroy_pool - destroys a pool and all the associated memory
+ * @pool: the pool to destroy
+ *
+ * All the memory that was allocated through pmalloc in the pool will be freed.
+ *
+ * Returns 0 upon success, -EINVAL in abnormal cases.
+ */
+int pmalloc_destroy_pool(struct gen_pool *pool);
+
+#endif
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 1e5d8c3..116d280 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -20,6 +20,7 @@ struct notifier_block;		/* in notifier.h */
 #define VM_UNINITIALIZED	0x00000020	/* vm_struct is not fully initialized */
 #define VM_NO_GUARD		0x00000040      /* don't add guard page */
 #define VM_KASAN		0x00000080      /* has allocated kasan shadow memory */
+#define VM_PMALLOC		0x00000100	/* pmalloc area - see docs */
 /* bits [20..32] reserved for arch specific ioremap internals */
 
 /*
diff --git a/lib/genalloc.c b/lib/genalloc.c
index dde7830..7ba2ec9 100644
--- a/lib/genalloc.c
+++ b/lib/genalloc.c
@@ -519,6 +519,33 @@ void gen_pool_free(struct gen_pool *pool, unsigned long addr, size_t size)
 }
 EXPORT_SYMBOL(gen_pool_free);
 
+
+/**
+ * gen_pool_flush_chunk - drops all the allocations from a specific chunk
+ * @pool:	the generic memory pool
+ * @chunk:	The chunk to wipe clear.
+ *
+ * This is meant to be called only while destroying a pool. It's up to the
+ * caller to avoid races, but really, at this point the pool should have
+ * already been retired and have become unavailable for any other sort of
+ * operation.
+ */
+void gen_pool_flush_chunk(struct gen_pool *pool,
+			  struct gen_pool_chunk *chunk)
+{
+	size_t size;
+
+	if (unlikely(!(pool && chunk)))
+		return;
+
+	size = chunk->end_addr + 1 - chunk->start_addr;
+	memset(chunk->entries, 0,
+	       DIV_ROUND_UP((size >> pool->min_alloc_order) * BITS_PER_ENTRY,
+			    BITS_PER_BYTE));
+	atomic_long_set(&chunk->avail, size);
+}
+
+
 /**
  * gen_pool_for_each_chunk - call func for every chunk of generic memory pool
  * @pool:	the generic memory pool
diff --git a/mm/Makefile b/mm/Makefile
index e669f02..a6a47e1 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -65,6 +65,7 @@ obj-$(CONFIG_SPARSEMEM)	+= sparse.o
 obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o
 obj-$(CONFIG_SLOB) += slob.o
 obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o
+obj-$(CONFIG_ARCH_HAS_SET_MEMORY) += pmalloc.o
 obj-$(CONFIG_KSM) += ksm.o
 obj-$(CONFIG_PAGE_POISONING) += page_poison.o
 obj-$(CONFIG_SLAB) += slab.o
diff --git a/mm/pmalloc.c b/mm/pmalloc.c
new file mode 100644
index 0000000..11daca2
--- /dev/null
+++ b/mm/pmalloc.c
@@ -0,0 +1,514 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * pmalloc.c: Protectable Memory Allocator
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.stoppa@huawei.com>
+ */
+
+#include <linux/printk.h>
+#include <linux/init.h>
+#include <linux/mm.h>
+#include <linux/vmalloc.h>
+#include <linux/genalloc.h>
+#include <linux/kernel.h>
+#include <linux/log2.h>
+#include <linux/slab.h>
+#include <linux/device.h>
+#include <linux/atomic.h>
+#include <linux/rculist.h>
+#include <linux/set_memory.h>
+#include <asm/cacheflush.h>
+#include <asm/page.h>
+
+#include "pmalloc-selftest.h"
+
+/**
+ * pmalloc_data contains the data specific to a pmalloc pool,
+ * in a format compatible with the design of gen_alloc.
+ * Some of the fields are used for exposing the corresponding parameter
+ * to userspace, through sysfs.
+ */
+struct pmalloc_data {
+	struct gen_pool *pool;  /* Link back to the associated pool. */
+	bool protected;     /* Status of the pool: RO or RW. */
+	struct kobj_attribute attr_protected; /* Sysfs attribute. */
+	struct kobj_attribute attr_avail;     /* Sysfs attribute. */
+	struct kobj_attribute attr_size;      /* Sysfs attribute. */
+	struct kobj_attribute attr_chunks;    /* Sysfs attribute. */
+	struct kobject *pool_kobject;
+	struct list_head node; /* list of pools */
+};
+
+static LIST_HEAD(pmalloc_final_list);
+static LIST_HEAD(pmalloc_tmp_list);
+static struct list_head *pmalloc_list = &pmalloc_tmp_list;
+static DEFINE_MUTEX(pmalloc_mutex);
+static struct kobject *pmalloc_kobject;
+
+static ssize_t pmalloc_pool_show_protected(struct kobject *dev,
+					   struct kobj_attribute *attr,
+					   char *buf)
+{
+	struct pmalloc_data *data;
+
+	data = container_of(attr, struct pmalloc_data, attr_protected);
+	if (data->protected)
+		return sprintf(buf, "protected\n");
+	else
+		return sprintf(buf, "unprotected\n");
+}
+
+static ssize_t pmalloc_pool_show_avail(struct kobject *dev,
+				       struct kobj_attribute *attr,
+				       char *buf)
+{
+	struct pmalloc_data *data;
+
+	data = container_of(attr, struct pmalloc_data, attr_avail);
+	return sprintf(buf, "%lu\n",
+		       (unsigned long)gen_pool_avail(data->pool));
+}
+
+static ssize_t pmalloc_pool_show_size(struct kobject *dev,
+				      struct kobj_attribute *attr,
+				      char *buf)
+{
+	struct pmalloc_data *data;
+
+	data = container_of(attr, struct pmalloc_data, attr_size);
+	return sprintf(buf, "%lu\n",
+		       (unsigned long)gen_pool_size(data->pool));
+}
+
+static void pool_chunk_number(struct gen_pool *pool,
+			      struct gen_pool_chunk *chunk, void *data)
+{
+	unsigned long *counter = data;
+
+	(*counter)++;
+}
+
+static ssize_t pmalloc_pool_show_chunks(struct kobject *dev,
+					struct kobj_attribute *attr,
+					char *buf)
+{
+	struct pmalloc_data *data;
+	unsigned long chunks_num = 0;
+
+	data = container_of(attr, struct pmalloc_data, attr_chunks);
+	gen_pool_for_each_chunk(data->pool, pool_chunk_number, &chunks_num);
+	return sprintf(buf, "%lu\n", chunks_num);
+}
+
+/**
+ * Exposes the pool and its attributes through sysfs.
+ */
+static struct kobject *pmalloc_connect(struct pmalloc_data *data)
+{
+	const struct attribute *attrs[] = {
+		&data->attr_protected.attr,
+		&data->attr_avail.attr,
+		&data->attr_size.attr,
+		&data->attr_chunks.attr,
+		NULL
+	};
+	struct kobject *kobj;
+
+	kobj = kobject_create_and_add(data->pool->name, pmalloc_kobject);
+	if (unlikely(!kobj))
+		return NULL;
+
+	if (unlikely(sysfs_create_files(kobj, attrs) < 0)) {
+		kobject_put(kobj);
+		kobj = NULL;
+	}
+	return kobj;
+}
+
+/**
+ * Removes the pool and its attributes from sysfs.
+ */
+static void pmalloc_disconnect(struct pmalloc_data *data,
+			       struct kobject *kobj)
+{
+	const struct attribute *attrs[] = {
+		&data->attr_protected.attr,
+		&data->attr_avail.attr,
+		&data->attr_size.attr,
+		&data->attr_chunks.attr,
+		NULL
+	};
+
+	sysfs_remove_files(kobj, attrs);
+	kobject_put(kobj);
+}
+
+/**
+ * Declares an attribute of the pool.
+ */
+
+#define pmalloc_attr_init(data, attr_name) \
+do { \
+	sysfs_attr_init(&data->attr_##attr_name.attr); \
+	data->attr_##attr_name.attr.name = #attr_name; \
+	data->attr_##attr_name.attr.mode = VERIFY_OCTAL_PERMISSIONS(0400); \
+	data->attr_##attr_name.show = pmalloc_pool_show_##attr_name; \
+} while (0)
+
+struct gen_pool *pmalloc_create_pool(const char *name, int min_alloc_order)
+{
+	struct gen_pool *pool;
+	const char *pool_name;
+	struct pmalloc_data *data;
+
+	if (!name) {
+		WARN_ON(1);
+		return NULL;
+	}
+
+	if (min_alloc_order < 0)
+		min_alloc_order = ilog2(sizeof(unsigned long));
+
+	pool = gen_pool_create(min_alloc_order, NUMA_NO_NODE);
+	if (unlikely(!pool))
+		return NULL;
+
+	mutex_lock(&pmalloc_mutex);
+	list_for_each_entry(data, pmalloc_list, node)
+		if (!strcmp(name, data->pool->name))
+			goto same_name_err;
+
+	pool_name = kstrdup(name, GFP_KERNEL);
+	if (unlikely(!pool_name))
+		goto name_alloc_err;
+
+	data = kzalloc(sizeof(struct pmalloc_data), GFP_KERNEL);
+	if (unlikely(!data))
+		goto data_alloc_err;
+
+	data->protected = false;
+	data->pool = pool;
+	pmalloc_attr_init(data, protected);
+	pmalloc_attr_init(data, avail);
+	pmalloc_attr_init(data, size);
+	pmalloc_attr_init(data, chunks);
+	pool->data = data;
+	pool->name = pool_name;
+
+	list_add(&data->node, pmalloc_list);
+	if (pmalloc_list == &pmalloc_final_list)
+		data->pool_kobject = pmalloc_connect(data);
+	mutex_unlock(&pmalloc_mutex);
+	return pool;
+
+data_alloc_err:
+	kfree(pool_name);
+name_alloc_err:
+same_name_err:
+	mutex_unlock(&pmalloc_mutex);
+	gen_pool_destroy(pool);
+	return NULL;
+}
+
+static inline int check_alloc_params(struct gen_pool *pool, size_t req_size)
+{
+	struct pmalloc_data *data;
+	unsigned int order;
+
+	if (unlikely(!req_size || !pool))
+		return -1;
+
+	order = (unsigned int)pool->min_alloc_order;
+	data = pool->data;
+
+	if (data == NULL)
+		return -1;
+
+	if (unlikely(data->protected)) {
+		WARN_ON(1);
+		return -1;
+	}
+	return 0;
+}
+
+
+static inline bool chunk_tagging(void *chunk, bool tag)
+{
+	struct vm_struct *area;
+	struct page *page;
+
+	if (!is_vmalloc_addr(chunk))
+		return false;
+
+	page = vmalloc_to_page(chunk);
+	if (unlikely(!page))
+		return false;
+
+	area = page->area;
+	if (tag)
+		area->flags |= VM_PMALLOC;
+	else
+		area->flags &= ~VM_PMALLOC;
+	return true;
+}
+
+
+static inline bool tag_chunk(void *chunk)
+{
+	return chunk_tagging(chunk, true);
+}
+
+
+static inline bool untag_chunk(void *chunk)
+{
+	return chunk_tagging(chunk, false);
+}
+
+enum {
+	INVALID_PMALLOC_OBJECT = -1,
+	NOT_PMALLOC_OBJECT = 0,
+	VALID_PMALLOC_OBJECT = 1,
+};
+
+int is_pmalloc_object(const void *ptr, const unsigned long n)
+{
+	struct vm_struct *area;
+	struct page *page;
+	unsigned long area_start;
+	unsigned long area_end;
+	unsigned long object_start;
+	unsigned long object_end;
+
+
+	/* is_pmalloc_object gets called pretty late, so chances are high
+	 * that the object is indeed of vmalloc type
+	 */
+	if (unlikely(!is_vmalloc_addr(ptr)))
+		return NOT_PMALLOC_OBJECT;
+
+	page = vmalloc_to_page(ptr);
+	if (unlikely(!page))
+		return NOT_PMALLOC_OBJECT;
+
+	area = page->area;
+
+	if (likely(!(area->flags & VM_PMALLOC)))
+		return NOT_PMALLOC_OBJECT;
+
+	area_start = (unsigned long)area->addr;
+	area_end = area_start + area->nr_pages * PAGE_SIZE - 1;
+	object_start = (unsigned long)ptr;
+	object_end = object_start + n - 1;
+
+	if (likely((area_start <= object_start) &&
+		   (object_end <= area_end)))
+		return VALID_PMALLOC_OBJECT;
+	else
+		return INVALID_PMALLOC_OBJECT;
+}
+
+
+bool pmalloc_prealloc(struct gen_pool *pool, size_t size)
+{
+	void *chunk;
+	size_t chunk_size;
+	int add_error;
+	unsigned int order;
+
+	if (check_alloc_params(pool, size))
+		return false;
+
+	order = (unsigned int)pool->min_alloc_order;
+
+	/* Expand pool */
+	chunk_size = roundup(size, PAGE_SIZE);
+	chunk = vmalloc(chunk_size);
+	if (unlikely(chunk == NULL))
+		return false;
+
+	/* Locking is already done inside gen_pool_add */
+	add_error = gen_pool_add(pool, (unsigned long)chunk, chunk_size,
+				 NUMA_NO_NODE);
+	if (unlikely(add_error != 0))
+		goto abort;
+
+	return true;
+abort:
+	vfree_atomic(chunk);
+	return false;
+
+}
+
+void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp)
+{
+	void *chunk;
+	size_t chunk_size;
+	int add_error;
+	unsigned long retval;
+	unsigned int order;
+
+	if (check_alloc_params(pool, size))
+		return NULL;
+
+	order = (unsigned int)pool->min_alloc_order;
+
+retry_alloc_from_pool:
+	retval = gen_pool_alloc(pool, size);
+	if (retval)
+		goto return_allocation;
+
+	if (unlikely((gfp & __GFP_ATOMIC))) {
+		if (unlikely((gfp & __GFP_NOFAIL)))
+			goto retry_alloc_from_pool;
+		else
+			return NULL;
+	}
+
+	/* Expand pool */
+	chunk_size = roundup(size, PAGE_SIZE);
+	chunk = vmalloc(chunk_size);
+	if (unlikely(!chunk)) {
+		if (unlikely((gfp & __GFP_NOFAIL)))
+			goto retry_alloc_from_pool;
+		else
+			return NULL;
+	}
+	if (unlikely(!tag_chunk(chunk)))
+		goto free;
+
+	/* Locking is already done inside gen_pool_add */
+	add_error = gen_pool_add(pool, (unsigned long)chunk, chunk_size,
+				 NUMA_NO_NODE);
+	if (unlikely(add_error))
+		goto abort;
+
+	retval = gen_pool_alloc(pool, size);
+	if (retval) {
+return_allocation:
+		*(size_t *)retval = size;
+		if (gfp & __GFP_ZERO)
+			memset((void *)retval, 0, size);
+		return (void *)retval;
+	}
+	/* Here there is no test for __GFP_NOFAIL because, in case of
+	 * concurrent allocation, one thread might add a chunk to the
+	 * pool and this memory could be allocated by another thread,
+	 * before the first thread gets a chance to use it.
+	 * As long as vmalloc succeeds, it's ok to retry.
+	 */
+	goto retry_alloc_from_pool;
+abort:
+	untag_chunk(chunk);
+free:
+	vfree_atomic(chunk);
+	return NULL;
+}
+
+static void pmalloc_chunk_set_protection(struct gen_pool *pool,
+
+					 struct gen_pool_chunk *chunk,
+					 void *data)
+{
+	const bool *flag = data;
+	size_t chunk_size = chunk->end_addr + 1 - chunk->start_addr;
+	unsigned long pages = chunk_size / PAGE_SIZE;
+
+	BUG_ON(chunk_size & (PAGE_SIZE - 1));
+
+	if (*flag)
+		set_memory_ro(chunk->start_addr, pages);
+	else
+		set_memory_rw(chunk->start_addr, pages);
+}
+
+static int pmalloc_pool_set_protection(struct gen_pool *pool, bool protection)
+{
+	struct pmalloc_data *data;
+	struct gen_pool_chunk *chunk;
+
+	if (unlikely(!pool))
+		return -EINVAL;
+
+	data = pool->data;
+
+	if (unlikely(!data))
+		return -EINVAL;
+
+	if (unlikely(data->protected == protection)) {
+		WARN_ON(1);
+		return 0;
+	}
+
+	data->protected = protection;
+	list_for_each_entry(chunk, &(pool)->chunks, next_chunk)
+		pmalloc_chunk_set_protection(pool, chunk, &protection);
+	return 0;
+}
+
+int pmalloc_protect_pool(struct gen_pool *pool)
+{
+	return pmalloc_pool_set_protection(pool, true);
+}
+
+
+static void pmalloc_chunk_free(struct gen_pool *pool,
+			       struct gen_pool_chunk *chunk, void *data)
+{
+	untag_chunk((void *)chunk->start_addr);
+	gen_pool_flush_chunk(pool, chunk);
+	vfree_atomic((void *)chunk->start_addr);
+}
+
+
+int pmalloc_destroy_pool(struct gen_pool *pool)
+{
+	struct pmalloc_data *data;
+
+	if (unlikely(pool == NULL))
+		return -EINVAL;
+
+	data = pool->data;
+
+	if (unlikely(data == NULL))
+		return -EINVAL;
+
+	mutex_lock(&pmalloc_mutex);
+	list_del(&data->node);
+	mutex_unlock(&pmalloc_mutex);
+
+	if (likely(data->pool_kobject))
+		pmalloc_disconnect(data, data->pool_kobject);
+
+	pmalloc_pool_set_protection(pool, false);
+	gen_pool_for_each_chunk(pool, pmalloc_chunk_free, NULL);
+	gen_pool_destroy(pool);
+	kfree(data);
+	return 0;
+}
+
+/**
+ * When sysfs is ready to accept registrations, connect all the
+ * pools previously created. Also enable further pools to be connected
+ * right away.
+ */
+static int __init pmalloc_late_init(void)
+{
+	struct pmalloc_data *data, *n;
+
+	pmalloc_kobject = kobject_create_and_add("pmalloc", kernel_kobj);
+
+	mutex_lock(&pmalloc_mutex);
+	pmalloc_list = &pmalloc_final_list;
+
+	if (likely(pmalloc_kobject != NULL)) {
+		list_for_each_entry_safe(data, n, &pmalloc_tmp_list, node) {
+			list_move(&data->node, &pmalloc_final_list);
+			data->pool_kobject = pmalloc_connect(data);
+		}
+	}
+	mutex_unlock(&pmalloc_mutex);
+	pmalloc_selftest();
+	return 0;
+}
+late_initcall(pmalloc_late_init);
diff --git a/mm/usercopy.c b/mm/usercopy.c
index a9852b2..c3b1029 100644
--- a/mm/usercopy.c
+++ b/mm/usercopy.c
@@ -15,6 +15,7 @@
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
 #include <linux/mm.h>
+#include <linux/pmalloc.h>
 #include <linux/slab.h>
 #include <linux/sched.h>
 #include <linux/sched/task.h>
@@ -222,6 +223,7 @@ static inline const char *check_heap_object(const void *ptr, unsigned long n,
 void __check_object_size(const void *ptr, unsigned long n, bool to_user)
 {
 	const char *err;
+	int retv;
 
 	/* Skip all tests if size is zero. */
 	if (!n)
@@ -229,12 +231,12 @@ void __check_object_size(const void *ptr, unsigned long n, bool to_user)
 
 	/* Check for invalid addresses. */
 	err = check_bogus_address(ptr, n);
-	if (err)
+	if (unlikely(err))
 		goto report;
 
 	/* Check for bad heap object. */
 	err = check_heap_object(ptr, n, to_user);
-	if (err)
+	if (unlikely(err))
 		goto report;
 
 	/* Check for bad stack object. */
@@ -257,8 +259,23 @@ void __check_object_size(const void *ptr, unsigned long n, bool to_user)
 
 	/* Check for object in kernel to avoid text exposure. */
 	err = check_kernel_text_object(ptr, n);
-	if (!err)
-		return;
+	if (unlikely(err))
+		goto report;
+
+	/* Check if object is from a pmalloc chunk.
+	 */
+	retv = is_pmalloc_object(ptr, n);
+	if (unlikely(retv)) {
+		if (unlikely(!to_user)) {
+			err = "<trying to write to pmalloc object>";
+			goto report;
+		}
+		if (retv < 0) {
+			err = "<invalid pmalloc object>";
+			goto report;
+		}
+	}
+	return;
 
 report:
 	report_usercopy(ptr, n, to_user, err);
-- 
2.9.3

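As a worked check of the size cleared by gen_pool_flush_chunk: each allocation unit of (1 << min_alloc_order) bytes is tracked by BITS_PER_ENTRY bits of bitmap, so a chunk of size bytes needs DIV_ROUND_UP((size >> min_alloc_order) * BITS_PER_ENTRY, BITS_PER_BYTE) bytes. Because * binds more tightly than >> in C, writing the expression without the inner parentheses shifts by min_alloc_order * BITS_PER_ENTRY instead. The userspace sketch below is not part of the patch; it assumes BITS_PER_ENTRY is 2 and BITS_PER_BYTE is 8, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values: two bitmap bits per allocation unit, eight bits per byte. */
#define BITS_PER_ENTRY 2
#define BITS_PER_BYTE 8
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical helper: bytes of bitmap tracking a chunk of chunk_size bytes. */
static size_t bitmap_bytes(size_t chunk_size, unsigned int min_alloc_order)
{
	size_t units = chunk_size >> min_alloc_order; /* allocation units */

	return DIV_ROUND_UP(units * BITS_PER_ENTRY, BITS_PER_BYTE);
}

/*
 * Same expression without the inner parentheses: C parses it as
 * chunk_size >> (min_alloc_order * BITS_PER_ENTRY).
 */
static size_t bitmap_bytes_unparenthesized(size_t chunk_size,
					   unsigned int min_alloc_order)
{
	return DIV_ROUND_UP(chunk_size >> min_alloc_order * BITS_PER_ENTRY,
			    BITS_PER_BYTE);
}
```

With a 4096-byte chunk and min_alloc_order 3, bitmap_bytes() gives 128 bytes (512 units at 2 bits each), while the unparenthesized form gives 8, so only a small prefix of the bitmap would be cleared.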

* [PATCH 4/6] Protectable Memory
@ 2018-02-03 19:42   ` Igor Stoppa
  0 siblings, 0 replies; 146+ messages in thread
From: Igor Stoppa @ 2018-02-03 19:42 UTC (permalink / raw)
  To: jglisse, keescook, mhocko, labbott, hch, willy
  Cc: cl, linux-security-module, linux-mm, linux-kernel,
	kernel-hardening, Igor Stoppa

The MMU present in many systems running Linux can often provide R/O
protection for the memory pages it handles.

However, MMU-based protection works efficiently only when those pages
contain exclusively data that will not need further modifications.

Statically allocated variables can be segregated into a dedicated
section, but this approach does not suit dynamically allocated
ones.

Dynamic allocation currently provides no means of grouping variables
into memory pages that contain exclusively data suitable for
conversion to read-only access mode.
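The page-granularity constraint described above can be observed from userspace. The following is an illustration, not part of pmalloc: it assumes Linux, where mprotect() rejects an address that is not page-aligned with EINVAL, and page_of() and protect_object() are hypothetical helpers. An object sitting in the middle of a page cannot be write-protected on its own; protection can only be applied to the whole page, including any neighboring data.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: start of the page containing addr. */
static void *page_of(void *addr)
{
	uintptr_t page = (uintptr_t)sysconf(_SC_PAGESIZE);

	return (void *)((uintptr_t)addr & ~(page - 1));
}

/*
 * Hypothetical helper: try to make only [obj, obj + size) read-only.
 * mprotect() works on whole pages: it fails with EINVAL when obj is not
 * page-aligned, and when it succeeds the rest of the page is protected too.
 */
static int protect_object(void *obj, size_t size)
{
	return mprotect(obj, size, PROT_READ);
}
```

Dedicating whole pages to data that will become read-only, as pmalloc does with its vmalloc()ed chunks, avoids this conflict.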

The allocator provided here (pmalloc, the protectable memory
allocator) introduces the concept of pools of protectable memory.

A module can request a pool and then direct any allocation request to
the pool handle it has received.

Once all the chunks of memory associated with a specific pool are
initialized, the pool can be protected.

After this point, the pool can only be destroyed (it is up to the module
to avoid any further references to memory from the pool once
destruction has been invoked).

Destruction is mainly meant for releasing memory when a module is
unloaded.

A module can have as many pools as needed, for example to support the
protection of data that is initialized in sufficiently distinct phases.
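The life cycle described above (create, allocate and initialize, protect, then destroy) can be modeled in userspace. This is a hypothetical toy, not the pmalloc API: struct toy_pool and the toy_* functions are invented for illustration, a single mmap()ed chunk with bump allocation stands in for the pool, and mprotect() stands in for set_memory_ro(). It assumes a Linux/POSIX system.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical single-chunk pool modeling the pmalloc life cycle. */
struct toy_pool {
	unsigned char *base;
	size_t len;      /* mapped length, a multiple of the page size */
	size_t used;     /* bump-allocation offset, kept long-aligned */
	bool protected;  /* once true, never reset */
};

/* Map a page-rounded, writable chunk for the pool. */
static int toy_create(struct toy_pool *p, size_t size)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	void *mem;

	p->len = (size + page - 1) / page * page;
	p->used = 0;
	p->protected = false;
	mem = mmap(NULL, p->len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (mem == MAP_FAILED)
		return -1;
	p->base = mem;
	return 0;
}

/* Allocation is refused once the pool is read-only, as with pmalloc. */
static void *toy_alloc(struct toy_pool *p, size_t size)
{
	size_t aligned;

	/* len and used are multiples of sizeof(long), so if size fits,
	 * its long-aligned round-up fits too. */
	if (p->protected || size == 0 || size > p->len - p->used)
		return NULL;
	aligned = (size + sizeof(long) - 1) & ~(sizeof(long) - 1);
	p->used += aligned;
	return p->base + p->used - aligned;
}

/* One-way transition: there is deliberately no toy_unprotect(). */
static int toy_protect(struct toy_pool *p)
{
	if (mprotect(p->base, p->len, PROT_READ) != 0)
		return -1;
	p->protected = true;
	return 0;
}

/* Destroying the pool is the only way to get its memory back. */
static int toy_destroy(struct toy_pool *p)
{
	return munmap(p->base, p->len);
}
```

Unlike pmalloc, the toy never grows: when its pool runs out of space, pmalloc adds a new vmalloc()ed chunk, rounded up to PAGE_SIZE. The toy also has no pfree() equivalent.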

Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com>
---
 include/linux/genalloc.h |   3 +
 include/linux/pmalloc.h  | 211 +++++++++++++++++++
 include/linux/vmalloc.h  |   1 +
 lib/genalloc.c           |  27 +++
 mm/Makefile              |   1 +
 mm/pmalloc.c             | 514 +++++++++++++++++++++++++++++++++++++++++++++++
 mm/usercopy.c            |  25 ++-
 7 files changed, 778 insertions(+), 4 deletions(-)
 create mode 100644 include/linux/pmalloc.h
 create mode 100644 mm/pmalloc.c

diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
index dcaa33e..b6c4cea 100644
--- a/include/linux/genalloc.h
+++ b/include/linux/genalloc.h
@@ -121,6 +121,9 @@ extern unsigned long gen_pool_alloc_algo(struct gen_pool *, size_t,
 extern void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size,
 		dma_addr_t *dma);
 extern void gen_pool_free(struct gen_pool *, unsigned long, size_t);
+
+extern void gen_pool_flush_chunk(struct gen_pool *pool,
+				 struct gen_pool_chunk *chunk);
 extern void gen_pool_for_each_chunk(struct gen_pool *,
 	void (*)(struct gen_pool *, struct gen_pool_chunk *, void *), void *);
 extern size_t gen_pool_avail(struct gen_pool *);
diff --git a/include/linux/pmalloc.h b/include/linux/pmalloc.h
new file mode 100644
index 0000000..6d4a24e
--- /dev/null
+++ b/include/linux/pmalloc.h
@@ -0,0 +1,211 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * pmalloc.h: Header for Protectable Memory Allocator
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.stoppa@huawei.com>
+ */
+
+#ifndef _PMALLOC_H
+#define _PMALLOC_H
+
+
+#include <linux/genalloc.h>
+#include <linux/string.h>
+
+#define PMALLOC_DEFAULT_ALLOC_ORDER (-1)
+
+/*
+ * Library for dynamic allocation of pools of memory that can be,
+ * after initialization, marked as read-only.
+ *
+ * This is intended to complement __ro_after_init, for those cases
+ * where either it is not possible to know the initialization value before
+ * init is completed, or the amount of data is variable and can be
+ * determined only at run-time.
+ *
+ * ***WARNING***
+ * The user of the API is expected to synchronize:
+ * 1) allocation,
+ * 2) writes to the allocated memory,
+ * 3) write protection of the pool,
+ * 4) freeing of the allocated memory, and
+ * 5) destruction of the pool.
+ *
+ * For a non-threaded scenario, this type of locking is not even required.
+ *
+ * Even if the library were to provide support for locking, point 2)
+ * would still depend on the user taking the lock.
+ */
+
+
+/**
+ * pmalloc_create_pool - create a new protectable memory pool
+ * @name: the name of the pool, must be unique
+ * @min_alloc_order: log2 of the minimum allocation size obtainable
+ *                   from the pool
+ *
+ * Creates a new (empty) memory pool for allocation of protectable
+ * memory. Memory will be allocated upon request (through pmalloc).
+ *
+ * Returns a pointer to the new pool upon success, otherwise a NULL.
+ */
+struct gen_pool *pmalloc_create_pool(const char *name,
+					 int min_alloc_order);
+
+
+int is_pmalloc_object(const void *ptr, const unsigned long n);
+
+/**
+ * pmalloc_prealloc - tries to allocate a memory chunk of the requested size
+ * @pool: handle to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ *
+ * Prepares a chunk of the requested size.
+ * This is intended both to minimize latency in later memory requests and
+ * to avoid sleeping during allocation.
+ * Memory allocated with prealloc is stored in one single chunk, as
+ * opposed to what is allocated on demand when pmalloc runs out of free
+ * space already existing in the pool and has to invoke vmalloc.
+ *
+ * Returns true if the vmalloc call was successful, false otherwise.
+ */
+bool pmalloc_prealloc(struct gen_pool *pool, size_t size);
+
+/**
+ * pmalloc - allocate protectable memory from a pool
+ * @pool: handle to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ * @gfp: flags for page allocation
+ *
+ * Allocates memory from an unprotected pool. If the pool doesn't have
+ * enough memory, and the request did not include GFP_ATOMIC, an attempt
+ * is made to add a new chunk of memory to the pool
+ * (a multiple of PAGE_SIZE), in order to fit the new request.
+ * Otherwise, NULL is returned.
+ *
+ * Returns the pointer to the memory requested upon success,
+ * NULL otherwise (either no memory available or pool already read-only).
+ */
+void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp);
+
+
+/**
+ * pzalloc - zero-initialized version of pmalloc
+ * @pool: handle to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ * @gfp: flags for page allocation
+ *
+ * Executes pmalloc, initializing the memory requested to 0,
+ * before returning the pointer to it.
+ *
+ * Returns the pointer to the zeroed memory requested, upon success,
+ * NULL otherwise (either no memory available or pool already read-only).
+ */
+static inline void *pzalloc(struct gen_pool *pool, size_t size, gfp_t gfp)
+{
+	return pmalloc(pool, size, gfp | __GFP_ZERO);
+}
+
+/**
+ * pmalloc_array - allocate an array from a pmalloc pool
+ * @pool: handle to the pool to be used for memory allocation
+ * @n: number of elements in the array
+ * @size: size of each element, in bytes
+ * @flags: flags for page allocation
+ *
+ * Executes pmalloc, unless an argument is invalid or n * size overflows.
+ * Returns either NULL or the pmalloc result.
+ */
+static inline void *pmalloc_array(struct gen_pool *pool, size_t n,
+				  size_t size, gfp_t flags)
+{
+	if (unlikely(!(pool && n && size) || n > (size_t)-1 / size))
+		return NULL;
+	return pmalloc(pool, n * size, flags);
+}
+
+/**
+ * pcalloc - allocate a zero-initialized array from a pmalloc pool
+ * @pool: handle to the pool to be used for memory allocation
+ * @n: number of elements in the array
+ * @size: size of each element, in bytes
+ * @flags: flags for page allocation
+ *
+ * Same as pmalloc_array, with the memory initialized to 0.
+ * Returns either NULL or the pmalloc result.
+ */
+static inline void *pcalloc(struct gen_pool *pool, size_t n,
+			    size_t size, gfp_t flags)
+{
+	return pmalloc_array(pool, n, size, flags | __GFP_ZERO);
+}
+
+/**
+ * pstrdup - duplicate a string, using pmalloc as allocator
+ * @pool: handle to the pool to be used for memory allocation
+ * @s: string to duplicate
+ * @gfp: flags for page allocation
+ *
+ * Generates a copy of the given string, allocating sufficient memory
+ * from the given pmalloc pool.
+ *
+ * Returns a pointer to the replica, NULL in case of recoverable error.
+ */
+static inline char *pstrdup(struct gen_pool *pool, const char *s, gfp_t gfp)
+{
+	size_t len;
+	char *buf;
+
+	if (unlikely(pool == NULL || s == NULL))
+		return NULL;
+
+	len = strlen(s) + 1;
+	buf = pmalloc(pool, len, gfp);
+	if (likely(buf))
+		strncpy(buf, s, len);
+	return buf;
+}
+
+/**
+ * pmalloc_protect_pool - turn a read/write pool read-only
+ * @pool: the pool to protect
+ *
+ * Write-protects all the memory chunks assigned to the pool.
+ * This prevents any further allocation.
+ *
+ * Returns 0 upon success, -EINVAL in abnormal cases.
+ */
+int pmalloc_protect_pool(struct gen_pool *pool);
+
+/**
+ * pfree - mark as unused memory that was previously in use
+ * @pool: handle to the pool the memory was allocated from
+ * @addr: the beginning of the memory area to be freed
+ *
+ * The behavior of pfree depends on the protection state of the pool.
+ *
+ * If the pool is not yet protected, the memory is marked as unused and
+ * becomes available for further allocations.
+ *
+ * If the pool is already protected, the memory is marked as unused, but
+ * no further allocation is possible, because of the existing protection.
+ * The freed memory, in this case, is truly released only when the pool
+ * is destroyed.
+ */
+static inline void pfree(struct gen_pool *pool, const void *addr)
+{
+	gen_pool_free(pool, (unsigned long)addr, 0);
+}
+
+/**
+ * pmalloc_destroy_pool - destroys a pool and all the associated memory
+ * @pool: the pool to destroy
+ *
+ * All the memory that was allocated through pmalloc in the pool will be freed.
+ *
+ * Returns 0 upon success, -EINVAL in abnormal cases.
+ */
+int pmalloc_destroy_pool(struct gen_pool *pool);
+
+#endif
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 1e5d8c3..116d280 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -20,6 +20,7 @@ struct notifier_block;		/* in notifier.h */
 #define VM_UNINITIALIZED	0x00000020	/* vm_struct is not fully initialized */
 #define VM_NO_GUARD		0x00000040      /* don't add guard page */
 #define VM_KASAN		0x00000080      /* has allocated kasan shadow memory */
+#define VM_PMALLOC		0x00000100	/* pmalloc area - see docs */
 /* bits [20..32] reserved for arch specific ioremap internals */
 
 /*
diff --git a/lib/genalloc.c b/lib/genalloc.c
index dde7830..7ba2ec9 100644
--- a/lib/genalloc.c
+++ b/lib/genalloc.c
@@ -519,6 +519,33 @@ void gen_pool_free(struct gen_pool *pool, unsigned long addr, size_t size)
 }
 EXPORT_SYMBOL(gen_pool_free);
 
+
+/**
+ * gen_pool_flush_chunk - drops all the allocations from a specific chunk
+ * @pool:	the generic memory pool
+ * @chunk:	The chunk to wipe clear.
+ *
+ * This is meant to be called only while destroying a pool. It's up to the
+ * caller to avoid races, but really, at this point the pool should have
+ * already been retired and have become unavailable for any other sort of
+ * operation.
+ */
+void gen_pool_flush_chunk(struct gen_pool *pool,
+			  struct gen_pool_chunk *chunk)
+{
+	size_t size;
+
+	if (unlikely(!(pool && chunk)))
+		return;
+
+	size = chunk->end_addr + 1 - chunk->start_addr;
+	memset(chunk->entries, 0,
+	       DIV_ROUND_UP((size >> pool->min_alloc_order) * BITS_PER_ENTRY,
+			    BITS_PER_BYTE));
+	atomic_long_set(&chunk->avail, size);
+}
+
+
 /**
  * gen_pool_for_each_chunk - call func for every chunk of generic memory pool
  * @pool:	the generic memory pool
diff --git a/mm/Makefile b/mm/Makefile
index e669f02..a6a47e1 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -65,6 +65,7 @@ obj-$(CONFIG_SPARSEMEM)	+= sparse.o
 obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o
 obj-$(CONFIG_SLOB) += slob.o
 obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o
+obj-$(CONFIG_ARCH_HAS_SET_MEMORY) += pmalloc.o
 obj-$(CONFIG_KSM) += ksm.o
 obj-$(CONFIG_PAGE_POISONING) += page_poison.o
 obj-$(CONFIG_SLAB) += slab.o
diff --git a/mm/pmalloc.c b/mm/pmalloc.c
new file mode 100644
index 0000000..11daca2
--- /dev/null
+++ b/mm/pmalloc.c
@@ -0,0 +1,514 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * pmalloc.c: Protectable Memory Allocator
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.stoppa@huawei.com>
+ */
+
+#include <linux/printk.h>
+#include <linux/init.h>
+#include <linux/mm.h>
+#include <linux/vmalloc.h>
+#include <linux/genalloc.h>
+#include <linux/kernel.h>
+#include <linux/log2.h>
+#include <linux/slab.h>
+#include <linux/device.h>
+#include <linux/atomic.h>
+#include <linux/rculist.h>
+#include <linux/set_memory.h>
+#include <asm/cacheflush.h>
+#include <asm/page.h>
+
+#include "pmalloc-selftest.h"
+
+/**
+ * pmalloc_data contains the data specific to a pmalloc pool,
+ * in a format compatible with the design of gen_alloc.
+ * Some of the fields are used for exposing the corresponding parameter
+ * to userspace, through sysfs.
+ */
+struct pmalloc_data {
+	struct gen_pool *pool;  /* Link back to the associated pool. */
+	bool protected;     /* Status of the pool: RO or RW. */
+	struct kobj_attribute attr_protected; /* Sysfs attribute. */
+	struct kobj_attribute attr_avail;     /* Sysfs attribute. */
+	struct kobj_attribute attr_size;      /* Sysfs attribute. */
+	struct kobj_attribute attr_chunks;    /* Sysfs attribute. */
+	struct kobject *pool_kobject;
+	struct list_head node; /* list of pools */
+};
+
+static LIST_HEAD(pmalloc_final_list);
+static LIST_HEAD(pmalloc_tmp_list);
+static struct list_head *pmalloc_list = &pmalloc_tmp_list;
+static DEFINE_MUTEX(pmalloc_mutex);
+static struct kobject *pmalloc_kobject;
+
+static ssize_t pmalloc_pool_show_protected(struct kobject *dev,
+					   struct kobj_attribute *attr,
+					   char *buf)
+{
+	struct pmalloc_data *data;
+
+	data = container_of(attr, struct pmalloc_data, attr_protected);
+	if (data->protected)
+		return sprintf(buf, "protected\n");
+	else
+		return sprintf(buf, "unprotected\n");
+}
+
+static ssize_t pmalloc_pool_show_avail(struct kobject *dev,
+				       struct kobj_attribute *attr,
+				       char *buf)
+{
+	struct pmalloc_data *data;
+
+	data = container_of(attr, struct pmalloc_data, attr_avail);
+	return sprintf(buf, "%lu\n",
+		       (unsigned long)gen_pool_avail(data->pool));
+}
+
+static ssize_t pmalloc_pool_show_size(struct kobject *dev,
+				      struct kobj_attribute *attr,
+				      char *buf)
+{
+	struct pmalloc_data *data;
+
+	data = container_of(attr, struct pmalloc_data, attr_size);
+	return sprintf(buf, "%lu\n",
+		       (unsigned long)gen_pool_size(data->pool));
+}
+
+static void pool_chunk_number(struct gen_pool *pool,
+			      struct gen_pool_chunk *chunk, void *data)
+{
+	unsigned long *counter = data;
+
+	(*counter)++;
+}
+
+static ssize_t pmalloc_pool_show_chunks(struct kobject *dev,
+					struct kobj_attribute *attr,
+					char *buf)
+{
+	struct pmalloc_data *data;
+	unsigned long chunks_num = 0;
+
+	data = container_of(attr, struct pmalloc_data, attr_chunks);
+	gen_pool_for_each_chunk(data->pool, pool_chunk_number, &chunks_num);
+	return sprintf(buf, "%lu\n", chunks_num);
+}
+
+/**
+ * Exposes the pool and its attributes through sysfs.
+ */
+static struct kobject *pmalloc_connect(struct pmalloc_data *data)
+{
+	const struct attribute *attrs[] = {
+		&data->attr_protected.attr,
+		&data->attr_avail.attr,
+		&data->attr_size.attr,
+		&data->attr_chunks.attr,
+		NULL
+	};
+	struct kobject *kobj;
+
+	kobj = kobject_create_and_add(data->pool->name, pmalloc_kobject);
+	if (unlikely(!kobj))
+		return NULL;
+
+	if (unlikely(sysfs_create_files(kobj, attrs) < 0)) {
+		kobject_put(kobj);
+		kobj = NULL;
+	}
+	return kobj;
+}
+
+/**
+ * Removes the pool and its attributes from sysfs.
+ */
+static void pmalloc_disconnect(struct pmalloc_data *data,
+			       struct kobject *kobj)
+{
+	const struct attribute *attrs[] = {
+		&data->attr_protected.attr,
+		&data->attr_avail.attr,
+		&data->attr_size.attr,
+		&data->attr_chunks.attr,
+		NULL
+	};
+
+	sysfs_remove_files(kobj, attrs);
+	kobject_put(kobj);
+}
+
+/**
+ * Declares an attribute of the pool.
+ */
+
+#define pmalloc_attr_init(data, attr_name) \
+do { \
+	sysfs_attr_init(&data->attr_##attr_name.attr); \
+	data->attr_##attr_name.attr.name = #attr_name; \
+	data->attr_##attr_name.attr.mode = VERIFY_OCTAL_PERMISSIONS(0400); \
+	data->attr_##attr_name.show = pmalloc_pool_show_##attr_name; \
+} while (0)
+
+struct gen_pool *pmalloc_create_pool(const char *name, int min_alloc_order)
+{
+	struct gen_pool *pool;
+	const char *pool_name;
+	struct pmalloc_data *data;
+
+	if (!name) {
+		WARN_ON(1);
+		return NULL;
+	}
+
+	if (min_alloc_order < 0)
+		min_alloc_order = ilog2(sizeof(unsigned long));
+
+	pool = gen_pool_create(min_alloc_order, NUMA_NO_NODE);
+	if (unlikely(!pool))
+		return NULL;
+
+	mutex_lock(&pmalloc_mutex);
+	list_for_each_entry(data, pmalloc_list, node)
+		if (!strcmp(name, data->pool->name))
+			goto same_name_err;
+
+	pool_name = kstrdup(name, GFP_KERNEL);
+	if (unlikely(!pool_name))
+		goto name_alloc_err;
+
+	data = kzalloc(sizeof(struct pmalloc_data), GFP_KERNEL);
+	if (unlikely(!data))
+		goto data_alloc_err;
+
+	data->protected = false;
+	data->pool = pool;
+	pmalloc_attr_init(data, protected);
+	pmalloc_attr_init(data, avail);
+	pmalloc_attr_init(data, size);
+	pmalloc_attr_init(data, chunks);
+	pool->data = data;
+	pool->name = pool_name;
+
+	list_add(&data->node, pmalloc_list);
+	if (pmalloc_list == &pmalloc_final_list)
+		data->pool_kobject = pmalloc_connect(data);
+	mutex_unlock(&pmalloc_mutex);
+	return pool;
+
+data_alloc_err:
+	kfree(pool_name);
+name_alloc_err:
+same_name_err:
+	mutex_unlock(&pmalloc_mutex);
+	gen_pool_destroy(pool);
+	return NULL;
+}
+
+static inline int check_alloc_params(struct gen_pool *pool, size_t req_size)
+{
+	struct pmalloc_data *data;
+	unsigned int order;
+
+	if (unlikely(!req_size || !pool))
+		return -1;
+
+	order = (unsigned int)pool->min_alloc_order;
+	data = pool->data;
+
+	if (data == NULL)
+		return -1;
+
+	if (unlikely(data->protected)) {
+		WARN_ON(1);
+		return -1;
+	}
+	return 0;
+}
+
+
+static inline bool chunk_tagging(void *chunk, bool tag)
+{
+	struct vm_struct *area;
+	struct page *page;
+
+	if (!is_vmalloc_addr(chunk))
+		return false;
+
+	page = vmalloc_to_page(chunk);
+	if (unlikely(!page))
+		return false;
+
+	area = page->area;
+	if (tag)
+		area->flags |= VM_PMALLOC;
+	else
+		area->flags &= ~VM_PMALLOC;
+	return true;
+}
+
+
+static inline bool tag_chunk(void *chunk)
+{
+	return chunk_tagging(chunk, true);
+}
+
+
+static inline bool untag_chunk(void *chunk)
+{
+	return chunk_tagging(chunk, false);
+}
+
+enum {
+	INVALID_PMALLOC_OBJECT = -1,
+	NOT_PMALLOC_OBJECT = 0,
+	VALID_PMALLOC_OBJECT = 1,
+};
+
+int is_pmalloc_object(const void *ptr, const unsigned long n)
+{
+	struct vm_struct *area;
+	struct page *page;
+	unsigned long area_start;
+	unsigned long area_end;
+	unsigned long object_start;
+	unsigned long object_end;
+
+
+	/* is_pmalloc_object gets called pretty late, so chances are high
+	 * that the object is indeed of vmalloc type
+	 */
+	if (unlikely(!is_vmalloc_addr(ptr)))
+		return NOT_PMALLOC_OBJECT;
+
+	page = vmalloc_to_page(ptr);
+	if (unlikely(!page))
+		return NOT_PMALLOC_OBJECT;
+
+	area = page->area;
+
+	if (likely(!(area->flags & VM_PMALLOC)))
+		return NOT_PMALLOC_OBJECT;
+
+	area_start = (unsigned long)area->addr;
+	area_end = area_start + area->nr_pages * PAGE_SIZE - 1;
+	object_start = (unsigned long)ptr;
+	object_end = object_start + n - 1;
+
+	if (likely((area_start <= object_start) &&
+		   (object_end <= area_end)))
+		return VALID_PMALLOC_OBJECT;
+	else
+		return INVALID_PMALLOC_OBJECT;
+}
+
+
+bool pmalloc_prealloc(struct gen_pool *pool, size_t size)
+{
+	void *chunk;
+	size_t chunk_size;
+	bool add_error;
+	unsigned int order;
+
+	if (check_alloc_params(pool, size))
+		return false;
+
+	order = (unsigned int)pool->min_alloc_order;
+
+	/* Expand pool */
+	chunk_size = roundup(size, PAGE_SIZE);
+	chunk = vmalloc(chunk_size);
+	if (unlikely(chunk == NULL))
+		return false;
+
+	/* Locking is already done inside gen_pool_add */
+	add_error = gen_pool_add(pool, (unsigned long)chunk, chunk_size,
+				 NUMA_NO_NODE);
+	if (unlikely(add_error != 0))
+		goto abort;
+
+	return true;
+abort:
+	vfree_atomic(chunk);
+	return false;
+
+}
+
+void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp)
+{
+	void *chunk;
+	size_t chunk_size;
+	bool add_error;
+	unsigned long retval;
+	unsigned int order;
+
+	if (check_alloc_params(pool, size))
+		return NULL;
+
+	order = (unsigned int)pool->min_alloc_order;
+
+retry_alloc_from_pool:
+	retval = gen_pool_alloc(pool, size);
+	if (retval)
+		goto return_allocation;
+
+	if (unlikely((gfp & __GFP_ATOMIC))) {
+		if (unlikely((gfp & __GFP_NOFAIL)))
+			goto retry_alloc_from_pool;
+		else
+			return NULL;
+	}
+
+	/* Expand pool */
+	chunk_size = roundup(size, PAGE_SIZE);
+	chunk = vmalloc(chunk_size);
+	if (unlikely(!chunk)) {
+		if (unlikely((gfp & __GFP_NOFAIL)))
+			goto retry_alloc_from_pool;
+		else
+			return NULL;
+	}
+	if (unlikely(!tag_chunk(chunk)))
+		goto free;
+
+	/* Locking is already done inside gen_pool_add */
+	add_error = gen_pool_add(pool, (unsigned long)chunk, chunk_size,
+				 NUMA_NO_NODE);
+	if (unlikely(add_error))
+		goto abort;
+
+	retval = gen_pool_alloc(pool, size);
+	if (retval) {
+return_allocation:
+		*(size_t *)retval = size;
+		if (gfp & __GFP_ZERO)
+			memset((void *)retval, 0, size);
+		return (void *)retval;
+	}
+	/* Here there is no test for __GFP_NOFAIL because, in case of
+	 * concurrent allocation, one thread might add a chunk to the
+	 * pool and this memory could be allocated by another thread,
+	 * before the first thread gets a chance to use it.
+	 * As long as vmalloc succeeds, it's ok to retry.
+	 */
+	goto retry_alloc_from_pool;
+abort:
+	untag_chunk(chunk);
+free:
+	vfree_atomic(chunk);
+	return NULL;
+}
+
+static void pmalloc_chunk_set_protection(struct gen_pool *pool,
+
+					 struct gen_pool_chunk *chunk,
+					 void *data)
+{
+	const bool *flag = data;
+	size_t chunk_size = chunk->end_addr + 1 - chunk->start_addr;
+	unsigned long pages = chunk_size / PAGE_SIZE;
+
+	BUG_ON(chunk_size & (PAGE_SIZE - 1));
+
+	if (*flag)
+		set_memory_ro(chunk->start_addr, pages);
+	else
+		set_memory_rw(chunk->start_addr, pages);
+}
+
+static int pmalloc_pool_set_protection(struct gen_pool *pool, bool protection)
+{
+	struct pmalloc_data *data;
+	struct gen_pool_chunk *chunk;
+
+	if (unlikely(!pool))
+		return -EINVAL;
+
+	data = pool->data;
+
+	if (unlikely(!data))
+		return -EINVAL;
+
+	if (unlikely(data->protected == protection)) {
+		WARN_ON(1);
+		return 0;
+	}
+
+	data->protected = protection;
+	list_for_each_entry(chunk, &(pool)->chunks, next_chunk)
+		pmalloc_chunk_set_protection(pool, chunk, &protection);
+	return 0;
+}
+
+int pmalloc_protect_pool(struct gen_pool *pool)
+{
+	return pmalloc_pool_set_protection(pool, true);
+}
+
+
+static void pmalloc_chunk_free(struct gen_pool *pool,
+			       struct gen_pool_chunk *chunk, void *data)
+{
+	untag_chunk((void *)chunk->start_addr);
+	gen_pool_flush_chunk(pool, chunk);
+	vfree_atomic((void *)chunk->start_addr);
+}
+
+
+int pmalloc_destroy_pool(struct gen_pool *pool)
+{
+	struct pmalloc_data *data;
+
+	if (unlikely(pool == NULL))
+		return -EINVAL;
+
+	data = pool->data;
+
+	if (unlikely(data == NULL))
+		return -EINVAL;
+
+	mutex_lock(&pmalloc_mutex);
+	list_del(&data->node);
+	mutex_unlock(&pmalloc_mutex);
+
+	if (likely(data->pool_kobject))
+		pmalloc_disconnect(data, data->pool_kobject);
+
+	pmalloc_pool_set_protection(pool, false);
+	gen_pool_for_each_chunk(pool, pmalloc_chunk_free, NULL);
+	gen_pool_destroy(pool);
+	kfree(data);
+	return 0;
+}
+
+/**
+ * When the sysfs is ready to receive registrations, connect all the
+ * pools previously created. Also enable further pools to be connected
+ * right away.
+ */
+static int __init pmalloc_late_init(void)
+{
+	struct pmalloc_data *data, *n;
+
+	pmalloc_kobject = kobject_create_and_add("pmalloc", kernel_kobj);
+
+	mutex_lock(&pmalloc_mutex);
+	pmalloc_list = &pmalloc_final_list;
+
+	if (likely(pmalloc_kobject != NULL)) {
+		list_for_each_entry_safe(data, n, &pmalloc_tmp_list, node) {
+			list_move(&data->node, &pmalloc_final_list);
+			data->pool_kobject = pmalloc_connect(data);
+		}
+	}
+	mutex_unlock(&pmalloc_mutex);
+	pmalloc_selftest();
+	return 0;
+}
+late_initcall(pmalloc_late_init);
diff --git a/mm/usercopy.c b/mm/usercopy.c
index a9852b2..c3b1029 100644
--- a/mm/usercopy.c
+++ b/mm/usercopy.c
@@ -15,6 +15,7 @@
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
 #include <linux/mm.h>
+#include <linux/pmalloc.h>
 #include <linux/slab.h>
 #include <linux/sched.h>
 #include <linux/sched/task.h>
@@ -222,6 +223,7 @@ static inline const char *check_heap_object(const void *ptr, unsigned long n,
 void __check_object_size(const void *ptr, unsigned long n, bool to_user)
 {
 	const char *err;
+	int retv;
 
 	/* Skip all tests if size is zero. */
 	if (!n)
@@ -229,12 +231,12 @@ void __check_object_size(const void *ptr, unsigned long n, bool to_user)
 
 	/* Check for invalid addresses. */
 	err = check_bogus_address(ptr, n);
-	if (err)
+	if (unlikely(err))
 		goto report;
 
 	/* Check for bad heap object. */
 	err = check_heap_object(ptr, n, to_user);
-	if (err)
+	if (unlikely(err))
 		goto report;
 
 	/* Check for bad stack object. */
@@ -257,8 +259,23 @@ void __check_object_size(const void *ptr, unsigned long n, bool to_user)
 
 	/* Check for object in kernel to avoid text exposure. */
 	err = check_kernel_text_object(ptr, n);
-	if (!err)
-		return;
+	if (unlikely(err))
+		goto report;
+
+	/* Check if object is from a pmalloc chunk.
+	 */
+	retv = is_pmalloc_object(ptr, n);
+	if (unlikely(retv)) {
+		if (unlikely(!to_user)) {
+			err = "<trying to write to pmalloc object>";
+			goto report;
+		}
+		if (retv < 0) {
+			err = "<invalid pmalloc object>";
+			goto report;
+		}
+	}
+	return;
 
 report:
 	report_usercopy(ptr, n, to_user, err);
-- 
2.9.3
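The bounds check in is_pmalloc_object() and the way __check_object_size() reacts to its result can be modelled as two plain functions. What follows is a userspace sketch under stated assumptions, not kernel code: model_pmalloc_range_check() and model_usercopy_pmalloc() are hypothetical names, the page->area lookup is replaced by explicit area parameters, and the VM_PMALLOC flag test is replaced by a boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same values as the enum in mm/pmalloc.c. */
enum {
	INVALID_PMALLOC_OBJECT = -1,
	NOT_PMALLOC_OBJECT = 0,
	VALID_PMALLOC_OBJECT = 1,
};

/*
 * Hypothetical model of is_pmalloc_object(): "tagged" stands in for the
 * VM_PMALLOC flag, and the area is described by its start and page count.
 * As in the patch, both end addresses are inclusive.
 */
static int model_pmalloc_range_check(uintptr_t area_start, size_t nr_pages,
				     size_t page_size, bool tagged,
				     uintptr_t ptr, size_t n)
{
	uintptr_t area_end, object_end;

	/* usercopy returns early for n == 0 and rejects wrapped addresses. */
	assert(n != 0);

	if (!tagged)
		return NOT_PMALLOC_OBJECT;

	area_end = area_start + nr_pages * page_size - 1;
	object_end = ptr + n - 1;

	if (area_start <= ptr && object_end <= area_end)
		return VALID_PMALLOC_OBJECT;
	return INVALID_PMALLOC_OBJECT;
}

/*
 * Hypothetical model of the check added to __check_object_size():
 * returns NULL if the copy may proceed, else the reason string.
 * to_user == false means data is being copied *into* the object.
 */
static const char *model_usercopy_pmalloc(int retv, bool to_user)
{
	if (retv) {
		if (!to_user)
			return "<trying to write to pmalloc object>";
		if (retv < 0)
			return "<invalid pmalloc object>";
	}
	return NULL;
}
```

Because both end addresses are inclusive, an object ending on the last byte of the area is accepted, while one extending a single byte past it is reported as invalid. In this model, as in the patch, a copy from userspace into a tagged area is rejected before the validity of the range is even considered.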

^ permalink raw reply related	[flat|nested] 146+ messages in thread

* Re: [PATCH 4/6] Protectable Memory
  2018-01-30 15:14   ` Igor Stoppa
  (?)
@ 2018-02-02  5:53     ` kbuild test robot
  -1 siblings, 0 replies; 146+ messages in thread
From: kbuild test robot @ 2018-02-02  5:53 UTC (permalink / raw)
  To: Igor Stoppa
  Cc: kbuild-all, jglisse, keescook, mhocko, labbott, hch, willy, cl,
	linux-security-module, linux-mm, linux-kernel, kernel-hardening,
	Igor Stoppa

[-- Attachment #1: Type: text/plain, Size: 2034 bytes --]

Hi Igor,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.15]
[cannot apply to next-20180201]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Igor-Stoppa/mm-security-ro-protection-for-dynamic-data/20180202-123437
config: i386-tinyconfig (attached as .config)
compiler: gcc-7 (Debian 7.2.0-12) 7.2.1 20171025
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   mm/pmalloc.o: In function `pmalloc_pool_show_chunks':
>> pmalloc.c:(.text+0x50): undefined reference to `gen_pool_for_each_chunk'
   mm/pmalloc.o: In function `pmalloc_pool_show_size':
>> pmalloc.c:(.text+0x6e): undefined reference to `gen_pool_size'
   mm/pmalloc.o: In function `pmalloc_pool_show_avail':
>> pmalloc.c:(.text+0x8a): undefined reference to `gen_pool_avail'
   mm/pmalloc.o: In function `pmalloc_chunk_free':
>> pmalloc.c:(.text+0x171): undefined reference to `gen_pool_flush_chunk'
   mm/pmalloc.o: In function `pmalloc_create_pool':
>> pmalloc.c:(.text+0x19b): undefined reference to `gen_pool_create'
>> pmalloc.c:(.text+0x2bb): undefined reference to `gen_pool_destroy'
   mm/pmalloc.o: In function `pmalloc_prealloc':
>> pmalloc.c:(.text+0x350): undefined reference to `gen_pool_add_virt'
   mm/pmalloc.o: In function `pmalloc':
>> pmalloc.c:(.text+0x3a7): undefined reference to `gen_pool_alloc'
   pmalloc.c:(.text+0x3f1): undefined reference to `gen_pool_add_virt'
   pmalloc.c:(.text+0x401): undefined reference to `gen_pool_alloc'
   mm/pmalloc.o: In function `pmalloc_destroy_pool':
   pmalloc.c:(.text+0x4a1): undefined reference to `gen_pool_for_each_chunk'
   pmalloc.c:(.text+0x4a8): undefined reference to `gen_pool_destroy'

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 6802 bytes --]

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [PATCH 4/6] Protectable Memory
  2018-01-30 15:14   ` Igor Stoppa
  (?)
@ 2018-02-02  5:41     ` kbuild test robot
  -1 siblings, 0 replies; 146+ messages in thread
From: kbuild test robot @ 2018-02-02  5:41 UTC (permalink / raw)
  To: Igor Stoppa
  Cc: kbuild-all, jglisse, keescook, mhocko, labbott, hch, willy, cl,
	linux-security-module, linux-mm, linux-kernel, kernel-hardening,
	Igor Stoppa

[-- Attachment #1: Type: text/plain, Size: 1898 bytes --]

Hi Igor,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v4.15]
[cannot apply to next-20180201]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Igor-Stoppa/mm-security-ro-protection-for-dynamic-data/20180202-123437
config: i386-randconfig-x071-201804 (attached as .config)
compiler: gcc-7 (Debian 7.2.0-12) 7.2.1 20171025
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All warnings (new ones prefixed by >>):

   mm/pmalloc.c: In function 'pmalloc_pool_show_avail':
>> mm/pmalloc.c:71:25: warning: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'size_t {aka unsigned int}' [-Wformat=]
     return sprintf(buf, "%lu\n", gen_pool_avail(data->pool));
                          ~~^     ~~~~~~~~~~~~~~~~~~~~~~~~~~
                          %u
   mm/pmalloc.c: In function 'pmalloc_pool_show_size':
   mm/pmalloc.c:81:25: warning: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'size_t {aka unsigned int}' [-Wformat=]
     return sprintf(buf, "%lu\n", gen_pool_size(data->pool));
                          ~~^     ~~~~~~~~~~~~~~~~~~~~~~~~~
                          %u

vim +71 mm/pmalloc.c

    63	
    64	static ssize_t pmalloc_pool_show_avail(struct kobject *dev,
    65					       struct kobj_attribute *attr,
    66					       char *buf)
    67	{
    68		struct pmalloc_data *data;
    69	
    70		data = container_of(attr, struct pmalloc_data, attr_avail);
  > 71		return sprintf(buf, "%lu\n", gen_pool_avail(data->pool));
    72	}
    73	
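On i386, size_t is unsigned int, so passing the result of gen_pool_avail() to "%lu" is a type mismatch even though both types are 32 bits wide there. Two conventional fixes are sketched below in userspace C with snprintf(): the %zu length modifier, which the kernel's vsprintf also accepts, or an explicit cast to unsigned long, which is what the later v15 revision of pmalloc_pool_show_avail() does. The helper names show_size_zu() and show_size_cast() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Portable: the 'z' length modifier always matches size_t. */
static int show_size_zu(char *buf, size_t bufsz, size_t value)
{
	assert(buf != NULL && bufsz > 0);
	return snprintf(buf, bufsz, "%zu\n", value);
}

/* Also well-formed: widen explicitly, then print as unsigned long. */
static int show_size_cast(char *buf, size_t bufsz, size_t value)
{
	assert(buf != NULL && bufsz > 0);
	return snprintf(buf, bufsz, "%lu\n", (unsigned long)value);
}
```

Either form silences -Wformat on both 32-bit and 64-bit builds; %zu avoids the cast, while the cast keeps the format string identical to the one used for other unsigned long values.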

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 31940 bytes --]

^ permalink raw reply	[flat|nested] 146+ messages in thread

* [PATCH 4/6] Protectable Memory
  2018-01-30 15:14 [RFC PATCH v12 0/6] mm: security: ro protection for dynamic data Igor Stoppa
  2018-01-30 15:14   ` Igor Stoppa
  (?)
@ 2018-01-30 15:14   ` Igor Stoppa
  0 siblings, 0 replies; 146+ messages in thread
From: Igor Stoppa @ 2018-01-30 15:14 UTC (permalink / raw)
  To: jglisse, keescook, mhocko, labbott, hch, willy
  Cc: cl, linux-security-module, linux-mm, linux-kernel,
	kernel-hardening, Igor Stoppa

The MMU available in many systems running Linux can often provide R/O
protection to the memory pages it handles.

However, the MMU-based protection works efficiently only when said pages
contain exclusively data that will not need further modifications.

Statically allocated variables can be segregated into a dedicated
section, but this approach does not extend to dynamically allocated
ones.

Dynamic allocation currently provides no means of grouping variables
into memory pages that contain exclusively data suitable for conversion
to read-only access mode.

The allocator provided here (pmalloc - protectable memory allocator)
introduces the concept of pools of protectable memory.

A module can request a pool and then direct any allocation request to
the pool handle it has received.

Once all the chunks of memory associated with a specific pool are
initialized, the pool can be protected.

After this point, the pool can only be destroyed (it is up to the module
to avoid any further references to the memory from the pool after
the destruction is invoked).

Destruction is mainly meant for releasing memory when a module is
unloaded.

A module can have as many pools as needed, for example to support the
protection of data that is initialized in sufficiently distinct phases.
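The lifecycle described above (request a pool, allocate from it, initialize the data, protect the pool, and eventually destroy it) can be sketched in userspace. This is a hypothetical single-chunk model rather than the pmalloc API: the toy_pool_* names are invented for illustration, mprotect() stands in for the kernel's set_memory_ro(), a simple bump allocator replaces genalloc, and a Linux/glibc-like system is assumed.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

struct toy_pool {
	unsigned char *base; /* page-aligned chunk obtained from mmap() */
	size_t size;         /* chunk size, a multiple of the page size */
	size_t used;         /* bump-allocator offset */
	bool read_only;      /* once true, no further allocations */
};

static bool toy_pool_create(struct toy_pool *p, size_t bytes)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	void *mem;

	/* Round up to whole pages, like roundup(size, PAGE_SIZE) in pmalloc(). */
	p->size = (bytes + page - 1) / page * page;
	mem = mmap(NULL, p->size, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (mem == MAP_FAILED)
		return false;
	p->base = mem;
	p->used = 0;
	p->read_only = false;
	return true;
}

/* Allocations are aligned to sizeof(unsigned long) and fail once protected. */
static void *toy_pool_alloc(struct toy_pool *p, size_t n)
{
	size_t align = sizeof(unsigned long);
	size_t off = (p->used + align - 1) / align * align;

	assert(p->base != NULL); /* the pool must have been created */
	if (p->read_only || n == 0 || off > p->size || n > p->size - off)
		return NULL;
	p->used = off + n;
	return p->base + off;
}

/* One-way switch to read-only; a repeat call is a no-op (the patch WARNs). */
static int toy_pool_protect(struct toy_pool *p)
{
	if (p->read_only)
		return 0;
	if (mprotect(p->base, p->size, PROT_READ) != 0)
		return -1;
	p->read_only = true;
	return 0;
}

/* Destroying the pool releases the whole chunk, protected or not. */
static int toy_pool_destroy(struct toy_pool *p)
{
	int ret = munmap(p->base, p->size);

	p->base = NULL;
	p->size = p->used = 0;
	return ret;
}
```

As in the patch, there is no path from read-only back to read-write other than destroying the pool, and allocation requests against a protected pool fail. A write through a pointer into the chunk after toy_pool_protect() would typically raise SIGSEGV instead of succeeding.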

Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com>
---
 include/linux/genalloc.h |   3 +
 include/linux/pmalloc.h  | 216 ++++++++++++++++++++
 include/linux/vmalloc.h  |   1 +
 lib/genalloc.c           |  27 +++
 mm/Makefile              |   1 +
 mm/pmalloc.c             | 513 +++++++++++++++++++++++++++++++++++++++++++++++
 mm/usercopy.c            |  25 ++-
 7 files changed, 782 insertions(+), 4 deletions(-)
 create mode 100644 include/linux/pmalloc.h
 create mode 100644 mm/pmalloc.c

diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
index 0377681..a486a26 100644
--- a/include/linux/genalloc.h
+++ b/include/linux/genalloc.h
@@ -121,6 +121,9 @@ extern unsigned long gen_pool_alloc_algo(struct gen_pool *, size_t,
 extern void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size,
 		dma_addr_t *dma);
 extern void gen_pool_free(struct gen_pool *, unsigned long, size_t);
+
+extern void gen_pool_flush_chunk(struct gen_pool *pool,
+				 struct gen_pool_chunk *chunk);
 extern void gen_pool_for_each_chunk(struct gen_pool *,
 	void (*)(struct gen_pool *, struct gen_pool_chunk *, void *), void *);
 extern size_t gen_pool_avail(struct gen_pool *);
diff --git a/include/linux/pmalloc.h b/include/linux/pmalloc.h
new file mode 100644
index 0000000..ad7d557
--- /dev/null
+++ b/include/linux/pmalloc.h
@@ -0,0 +1,216 @@
+/*
+ * pmalloc.h: Header for Protectable Memory Allocator
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.stoppa@huawei.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; version 2
+ * of the License.
+ */
+
+#ifndef _PMALLOC_H
+#define _PMALLOC_H
+
+
+#include <linux/genalloc.h>
+#include <linux/string.h>
+#include <linux/gfp.h>
+
+#define PMALLOC_DEFAULT_ALLOC_ORDER (-1)
+
+/*
+ * Library for dynamic allocation of pools of memory that can be,
+ * after initialization, marked as read-only.
+ *
+ * This is intended to complement __ro_after_init, for those cases
+ * where either it is not possible to know the initialization value before
+ * init is completed, or the amount of data is variable and can be
+ * determined only at run-time.
+ *
+ * ***WARNING***
+ * The user of the API is expected to synchronize:
+ * 1) allocation,
+ * 2) writes to the allocated memory,
+ * 3) write protection of the pool,
+ * 4) freeing of the allocated memory, and
+ * 5) destruction of the pool.
+ *
+ * For a non-threaded scenario, this type of locking is not even required.
+ *
+ * Even if the library were to provide support for locking, point 2)
+ * would still depend on the user taking the lock.
+ */
+
+
+/**
+ * pmalloc_create_pool - create a new protectable memory pool
+ * @name: the name of the pool, must be unique
+ * @min_alloc_order: log2 of the minimum allocation size obtainable
+ *                   from the pool
+ *
+ * Creates a new (empty) memory pool for allocation of protectable
+ * memory. Memory will be allocated upon request (through pmalloc).
+ *
+ * Returns a pointer to the new pool upon success, otherwise a NULL.
+ */
+struct gen_pool *pmalloc_create_pool(const char *name,
+					 int min_alloc_order);
+
+
+int is_pmalloc_object(const void *ptr, const unsigned long n);
+
+/**
+ * pmalloc_prealloc - tries to allocate a memory chunk of the requested size
+ * @pool: handler to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ *
+ * Prepares a chunk of the requested size.
+ * This is intended both to minimize latency in later memory requests and
+ * to avoid sleeping during allocation.
+ * Memory allocated with prealloc is stored in one single chunk, as
+ * opposed to what is allocated on demand when pmalloc runs out of free
+ * space already existing in the pool and has to invoke vmalloc.
+ *
+ * Returns true if the chunk was added to the pool, false otherwise.
+ */
+bool pmalloc_prealloc(struct gen_pool *pool, size_t size);
+
+/**
+ * pmalloc - allocate protectable memory from a pool
+ * @pool: handler to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ * @gfp: flags for page allocation
+ *
+ * Allocates memory from an unprotected pool. If the pool doesn't have
+ * enough memory, and the request did not include GFP_ATOMIC, an attempt
+ * is made to add a new chunk of memory to the pool
+ * (a multiple of PAGE_SIZE), in order to fit the new request.
+ * Otherwise, NULL is returned.
+ *
+ * Returns the pointer to the memory requested upon success,
+ * NULL otherwise (either no memory available or pool already read-only).
+ */
+void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp);
+
+
+/**
+ * pzalloc - zero-initialized version of pmalloc
+ * @pool: handler to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ * @gfp: flags for page allocation
+ *
+ * Executes pmalloc, initializing the memory requested to 0,
+ * before returning the pointer to it.
+ *
+ * Returns the pointer to the zeroed memory requested, upon success,
+ * NULL otherwise (either no memory available or pool already read-only).
+ */
+static inline void *pzalloc(struct gen_pool *pool, size_t size, gfp_t gfp)
+{
+	return pmalloc(pool, size, gfp | __GFP_ZERO);
+}
+
+/**
+ * pmalloc_array - allocates an array according to the parameters
+ * @pool: handler to the pool to be used for memory allocation
+ * @n: number of elements in the array
+ * @size: size of each element, in bytes
+ * @flags: flags for page allocation
+ *
+ * Executes pmalloc, unless a parameter is NULL/zero or n * size overflows.
+ *
+ * Returns either NULL or the pmalloc result.
+ */
+static inline void *pmalloc_array(struct gen_pool *pool, size_t n,
+				  size_t size, gfp_t flags)
+{
+	if (unlikely(!(pool && n && size) || n > SIZE_MAX / size))
+		return NULL;
+	return pmalloc(pool, n * size, flags);
+}
+
+/**
+ * pcalloc - allocates a 0-initialized array according to the parameters
+ * @pool: handler to the pool to be used for memory allocation
+ * @n: number of elements in the array
+ * @size: size of each element, in bytes
+ * @flags: flags for page allocation
+ *
+ * Executes pmalloc_array with __GFP_ZERO added to @flags.
+ *
+ * Returns either NULL or the pmalloc result.
+ */
+static inline void *pcalloc(struct gen_pool *pool, size_t n,
+			    size_t size, gfp_t flags)
+{
+	return pmalloc_array(pool, n, size, flags | __GFP_ZERO);
+}
+
+/**
+ * pstrdup - duplicate a string, using pmalloc as allocator
+ * @pool: handler to the pool to be used for memory allocation
+ * @s: string to duplicate
+ * @gfp: flags for page allocation
+ *
+ * Generates a copy of the given string, allocating sufficient memory
+ * from the given pmalloc pool.
+ *
+ * Returns a pointer to the replica, NULL in case of recoverable error.
+ */
+static inline char *pstrdup(struct gen_pool *pool, const char *s, gfp_t gfp)
+{
+	size_t len;
+	char *buf;
+
+	if (unlikely(pool == NULL || s == NULL))
+		return NULL;
+
+	len = strlen(s) + 1;
+	buf = pmalloc(pool, len, gfp);
+	if (likely(buf))
+		memcpy(buf, s, len);
+	return buf;
+}
+
+/**
+ * pmalloc_protect_pool - turn a read/write pool read-only
+ * @pool: the pool to protect
+ *
+ * Write-protects all the memory chunks assigned to the pool.
+ * This prevents any further allocation.
+ *
+ * Returns 0 upon success, -EINVAL in abnormal cases.
+ */
+int pmalloc_protect_pool(struct gen_pool *pool);
+
+/**
+ * pfree - mark as unused memory that was previously in use
+ * @pool: handler to the pool to be used for memory allocation
+ * @addr: the beginning of the memory area to be freed
+ *
+ * The behavior of pfree is different, depending on the state of the
+ * protection.
+ * If the pool is not yet protected, the memory is marked as unused and
+ * will be available for further allocations.
+ * If the pool is already protected, the memory is marked as unused, but
+ * it will still be impossible to perform further allocation, because of
+ * the existing protection.
+ * The freed memory, in this case, will be truly released only when the
+ * pool is destroyed.
+ */
+static inline void pfree(struct gen_pool *pool, const void *addr)
+{
+	gen_pool_free(pool, (unsigned long)addr, 0);
+}
+
+/**
+ * pmalloc_destroy_pool - destroys a pool and all the associated memory
+ * @pool: the pool to destroy
+ *
+ * All the memory that was allocated through pmalloc in the pool will be freed.
+ *
+ * Returns 0 upon success, -EINVAL in abnormal cases.
+ */
+int pmalloc_destroy_pool(struct gen_pool *pool);
+
+#endif
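The header above describes a strict lifecycle: allocate and initialize while the pool is writable, protect it once (one-way), then destroy it. The userspace sketch below models only that state machine, the page-granular growth that pmalloc() performs with roundup(size, PAGE_SIZE), and the overflow guard an array allocator needs. It is not the kernel implementation: `model_pool`, `model_alloc` and the other names are hypothetical, no memory is actually allocated, PAGE_SIZE is assumed to be 4096, and "protection" is a flag rather than set_memory_ro().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096UL /* assumption: 4 KiB pages */

/*
 * Hypothetical bookkeeping-only model of a pmalloc pool: no memory is
 * allocated. Simplification: only the newest chunk is searched for space,
 * whereas genalloc searches every chunk of the pool.
 */
struct model_pool {
	bool protected; /* one-way: false -> true */
	size_t chunks;  /* number of chunks added so far */
	size_t size;    /* total bytes in all chunks */
	size_t avail;   /* bytes still free in the newest chunk */
};

/* Same result as roundup(size, PAGE_SIZE) in pmalloc(). */
static size_t model_chunk_size(size_t size)
{
	return (size + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE * MODEL_PAGE_SIZE;
}

/* Fails once the pool is protected, like pmalloc() returning NULL. */
static bool model_alloc(struct model_pool *pool, size_t size)
{
	if (!pool || !size || pool->protected)
		return false;
	if (pool->avail < size) {
		/* Grow the pool by a page-multiple chunk that fits the request. */
		size_t chunk = model_chunk_size(size);

		pool->chunks++;
		pool->size += chunk;
		pool->avail = chunk;
	}
	pool->avail -= size;
	return true;
}

/* Array variant: refuse requests whose n * size would overflow size_t. */
static bool model_alloc_array(struct model_pool *pool, size_t n, size_t size)
{
	if (!n || !size || n > SIZE_MAX / size)
		return false;
	return model_alloc(pool, n * size);
}

/* Protection is one-way: there is no model_unprotect(). */
static void model_protect(struct model_pool *pool)
{
	assert(pool != NULL);
	pool->protected = true;
}
```

The real allocator additionally lets pfree() hand memory back to a still-writable pool and releases all chunks only in pmalloc_destroy_pool(); both are omitted here.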
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 1e5d8c3..e8171b6 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -20,6 +20,7 @@ struct notifier_block;		/* in notifier.h */
 #define VM_UNINITIALIZED	0x00000020	/* vm_struct is not fully initialized */
 #define VM_NO_GUARD		0x00000040      /* don't add guard page */
 #define VM_KASAN		0x00000080      /* has allocated kasan shadow memory */
+#define VM_PMALLOC		0x00000100	/* pmalloc area - see pmalloc.rst */
 /* bits [20..32] reserved for arch specific ioremap internals */
 
 /*
diff --git a/lib/genalloc.c b/lib/genalloc.c
index dde7830..62f69b3 100644
--- a/lib/genalloc.c
+++ b/lib/genalloc.c
@@ -519,6 +519,33 @@ void gen_pool_free(struct gen_pool *pool, unsigned long addr, size_t size)
 }
 EXPORT_SYMBOL(gen_pool_free);
 
+
+/**
+ * gen_pool_flush_chunk - drops all the allocations from a specific chunk
+ * @pool:	the generic memory pool
+ * @chunk:	The chunk to wipe clear.
+ *
+ * This is meant to be called only while destroying a pool. It's up to the
+ * caller to avoid races, but really, at this point the pool should have
+ * already been retired and have become unavailable for any other sort of
+ * operation.
+ */
+void gen_pool_flush_chunk(struct gen_pool *pool,
+			  struct gen_pool_chunk *chunk)
+{
+	size_t size;
+
+	if (unlikely(!(pool && chunk)))
+		return;
+
+	size = chunk->end_addr + 1 - chunk->start_addr;
+	memset(chunk->entries, 0,
+	       DIV_ROUND_UP((size >> pool->min_alloc_order) * BITS_PER_ENTRY,
+			    BITS_PER_BYTE));
+	atomic_set(&chunk->avail, size);
+}
+
+
 /**
  * gen_pool_for_each_chunk - call func for every chunk of generic memory pool
  * @pool:	the generic memory pool
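gen_pool_flush_chunk() must clear the whole allocation bitmap of a chunk. Assuming, as the rest of this series does, that each allocation unit of `1 << min_alloc_order` bytes is tracked with BITS_PER_ENTRY bits, the bitmap length in bytes is the number of units times the bits per unit, rounded up to whole bytes. The parentheses matter: in C, `*` binds tighter than `>>`, so `size >> order * BITS_PER_ENTRY` shifts by the product instead. The standalone sketch below compares the two forms; BITS_PER_ENTRY == 2 is an assumption for illustration, and both helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define BITS_PER_BYTE 8
#define BITS_PER_ENTRY 2 /* assumption: 2 bitmap bits per allocation unit */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Intended: (number of units) * (bits per unit), rounded up to bytes. */
static size_t bitmap_bytes(size_t chunk_size, unsigned int min_alloc_order)
{
	return DIV_ROUND_UP((chunk_size >> min_alloc_order) * BITS_PER_ENTRY,
			    BITS_PER_BYTE);
}

/* Unparenthesized: parsed as chunk_size >> (order * BITS_PER_ENTRY). */
static size_t bitmap_bytes_unparenthesized(size_t chunk_size,
					   unsigned int min_alloc_order)
{
	return DIV_ROUND_UP(chunk_size >> min_alloc_order * BITS_PER_ENTRY,
			    BITS_PER_BYTE);
}
```

For a 4096-byte chunk with min_alloc_order 3, the first form covers 128 bytes of bitmap while the second covers only 8, which would leave stale allocation bits behind.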
diff --git a/mm/Makefile b/mm/Makefile
index e669f02..a6a47e1 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -65,6 +65,7 @@ obj-$(CONFIG_SPARSEMEM)	+= sparse.o
 obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o
 obj-$(CONFIG_SLOB) += slob.o
 obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o
+obj-$(CONFIG_ARCH_HAS_SET_MEMORY) += pmalloc.o
 obj-$(CONFIG_KSM) += ksm.o
 obj-$(CONFIG_PAGE_POISONING) += page_poison.o
 obj-$(CONFIG_SLAB) += slab.o
diff --git a/mm/pmalloc.c b/mm/pmalloc.c
new file mode 100644
index 0000000..a64ac49
--- /dev/null
+++ b/mm/pmalloc.c
@@ -0,0 +1,513 @@
+/*
+ * pmalloc.c: Protectable Memory Allocator
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.stoppa@huawei.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; version 2
+ * of the License.
+ */
+
+#include <linux/printk.h>
+#include <linux/init.h>
+#include <linux/mm.h>
+#include <linux/vmalloc.h>
+#include <linux/genalloc.h>
+#include <linux/kernel.h>
+#include <linux/log2.h>
+#include <linux/slab.h>
+#include <linux/device.h>
+#include <linux/atomic.h>
+#include <linux/rculist.h>
+#include <linux/set_memory.h>
+#include <asm/cacheflush.h>
+#include <asm/page.h>
+
+/*
+ * struct pmalloc_data - data specific to a pmalloc pool, attached to
+ * the genalloc pool through its ->data field.
+ * Some of the fields are used for exposing the corresponding parameter
+ * to userspace, through sysfs.
+ */
+struct pmalloc_data {
+	struct gen_pool *pool;  /* Link back to the associated pool. */
+	bool protected;     /* Status of the pool: RO or RW. */
+	struct kobj_attribute attr_protected; /* Sysfs attribute. */
+	struct kobj_attribute attr_avail;     /* Sysfs attribute. */
+	struct kobj_attribute attr_size;      /* Sysfs attribute. */
+	struct kobj_attribute attr_chunks;    /* Sysfs attribute. */
+	struct kobject *pool_kobject;
+	struct list_head node; /* list of pools */
+};
+
+static LIST_HEAD(pmalloc_final_list);
+static LIST_HEAD(pmalloc_tmp_list);
+static struct list_head *pmalloc_list = &pmalloc_tmp_list;
+static DEFINE_MUTEX(pmalloc_mutex);
+static struct kobject *pmalloc_kobject;
+
+static ssize_t pmalloc_pool_show_protected(struct kobject *dev,
+					   struct kobj_attribute *attr,
+					   char *buf)
+{
+	struct pmalloc_data *data;
+
+	data = container_of(attr, struct pmalloc_data, attr_protected);
+	if (data->protected)
+		return sprintf(buf, "protected\n");
+	else
+		return sprintf(buf, "unprotected\n");
+}
+
+static ssize_t pmalloc_pool_show_avail(struct kobject *dev,
+				       struct kobj_attribute *attr,
+				       char *buf)
+{
+	struct pmalloc_data *data;
+
+	data = container_of(attr, struct pmalloc_data, attr_avail);
+	return sprintf(buf, "%zu\n", gen_pool_avail(data->pool));
+}
+
+static ssize_t pmalloc_pool_show_size(struct kobject *dev,
+				      struct kobj_attribute *attr,
+				      char *buf)
+{
+	struct pmalloc_data *data;
+
+	data = container_of(attr, struct pmalloc_data, attr_size);
+	return sprintf(buf, "%zu\n", gen_pool_size(data->pool));
+}
+
+static void pool_chunk_number(struct gen_pool *pool,
+			      struct gen_pool_chunk *chunk, void *data)
+{
+	unsigned long *counter = data;
+
+	(*counter)++;
+}
+
+static ssize_t pmalloc_pool_show_chunks(struct kobject *dev,
+					struct kobj_attribute *attr,
+					char *buf)
+{
+	struct pmalloc_data *data;
+	unsigned long chunks_num = 0;
+
+	data = container_of(attr, struct pmalloc_data, attr_chunks);
+	gen_pool_for_each_chunk(data->pool, pool_chunk_number, &chunks_num);
+	return sprintf(buf, "%lu\n", chunks_num);
+}
+
+/*
+ * Exposes the pool and its attributes through sysfs.
+ */
+static struct kobject *pmalloc_connect(struct pmalloc_data *data)
+{
+	const struct attribute *attrs[] = {
+		&data->attr_protected.attr,
+		&data->attr_avail.attr,
+		&data->attr_size.attr,
+		&data->attr_chunks.attr,
+		NULL
+	};
+	struct kobject *kobj;
+
+	kobj = kobject_create_and_add(data->pool->name, pmalloc_kobject);
+	if (unlikely(!kobj))
+		return NULL;
+
+	if (unlikely(sysfs_create_files(kobj, attrs) < 0)) {
+		kobject_put(kobj);
+		kobj = NULL;
+	}
+	return kobj;
+}
+
+/*
+ * Removes the pool and its attributes from sysfs.
+ */
+static void pmalloc_disconnect(struct pmalloc_data *data,
+			       struct kobject *kobj)
+{
+	const struct attribute *attrs[] = {
+		&data->attr_protected.attr,
+		&data->attr_avail.attr,
+		&data->attr_size.attr,
+		&data->attr_chunks.attr,
+		NULL
+	};
+
+	sysfs_remove_files(kobj, attrs);
+	kobject_put(kobj);
+}
+
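+/*
+ * Initializes the read-only sysfs attribute attr_<attr_name> of @data,
+ * binding it to the matching pmalloc_pool_show_<attr_name>() handler.
+ */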
+#define pmalloc_attr_init(data, attr_name) \
+do { \
+	sysfs_attr_init(&data->attr_##attr_name.attr); \
+	data->attr_##attr_name.attr.name = #attr_name; \
+	data->attr_##attr_name.attr.mode = VERIFY_OCTAL_PERMISSIONS(0444); \
+	data->attr_##attr_name.show = pmalloc_pool_show_##attr_name; \
+} while (0)
+
+struct gen_pool *pmalloc_create_pool(const char *name, int min_alloc_order)
+{
+	struct gen_pool *pool;
+	const char *pool_name;
+	struct pmalloc_data *data;
+
+	if (!name) {
+		WARN_ON(1);
+		return NULL;
+	}
+
+	if (min_alloc_order < 0)
+		min_alloc_order = ilog2(sizeof(unsigned long));
+
+	pool = gen_pool_create(min_alloc_order, NUMA_NO_NODE);
+	if (unlikely(!pool))
+		return NULL;
+
+	mutex_lock(&pmalloc_mutex);
+	list_for_each_entry(data, pmalloc_list, node)
+		if (!strcmp(name, data->pool->name))
+			goto same_name_err;
+
+	pool_name = kstrdup(name, GFP_KERNEL);
+	if (unlikely(!pool_name))
+		goto name_alloc_err;
+
+	data = kzalloc(sizeof(struct pmalloc_data), GFP_KERNEL);
+	if (unlikely(!data))
+		goto data_alloc_err;
+
+	data->protected = false;
+	data->pool = pool;
+	pmalloc_attr_init(data, protected);
+	pmalloc_attr_init(data, avail);
+	pmalloc_attr_init(data, size);
+	pmalloc_attr_init(data, chunks);
+	pool->data = data;
+	pool->name = pool_name;
+
+	list_add(&data->node, pmalloc_list);
+	if (pmalloc_list == &pmalloc_final_list)
+		data->pool_kobject = pmalloc_connect(data);
+	mutex_unlock(&pmalloc_mutex);
+	return pool;
+
+data_alloc_err:
+	kfree(pool_name);
+name_alloc_err:
+same_name_err:
+	mutex_unlock(&pmalloc_mutex);
+	gen_pool_destroy(pool);
+	return NULL;
+}
+
+static inline int check_alloc_params(struct gen_pool *pool, size_t req_size)
+{
+	struct pmalloc_data *data;
+
+	if (unlikely(!req_size || !pool))
+		return -1;
+
+	data = pool->data;
+
+	if (data == NULL)
+		return -1;
+
+	if (unlikely(data->protected)) {
+		WARN_ON(1);
+		return -1;
+	}
+	return 0;
+}
+
+
+static inline bool chunk_tagging(void *chunk, bool tag)
+{
+	struct vm_struct *area;
+	struct page *page;
+
+	if (!is_vmalloc_addr(chunk))
+		return false;
+
+	page = vmalloc_to_page(chunk);
+	if (unlikely(!page))
+		return false;
+
+	area = page->area;
+	if (tag)
+		area->flags |= VM_PMALLOC;
+	else
+		area->flags &= ~VM_PMALLOC;
+	return true;
+}
+
+
+static inline bool tag_chunk(void *chunk)
+{
+	return chunk_tagging(chunk, true);
+}
+
+
+static inline bool untag_chunk(void *chunk)
+{
+	return chunk_tagging(chunk, false);
+}
+
+enum {
+	INVALID_PMALLOC_OBJECT = -1,
+	NOT_PMALLOC_OBJECT = 0,
+	VALID_PMALLOC_OBJECT = 1,
+};
+
+int is_pmalloc_object(const void *ptr, const unsigned long n)
+{
+	struct vm_struct *area;
+	struct page *page;
+	unsigned long area_start;
+	unsigned long area_end;
+	unsigned long object_start;
+	unsigned long object_end;
+
+
+	/* is_pmalloc_object gets called pretty late, so chances are high
+	 * that the object is indeed of vmalloc type
+	 */
+	if (unlikely(!is_vmalloc_addr(ptr)))
+		return NOT_PMALLOC_OBJECT;
+
+	page = vmalloc_to_page(ptr);
+	if (unlikely(!page))
+		return NOT_PMALLOC_OBJECT;
+
+	area = page->area;
+
+	if (likely(!(area->flags & VM_PMALLOC)))
+		return NOT_PMALLOC_OBJECT;
+
+	area_start = (unsigned long)area->addr;
+	area_end = area_start + area->nr_pages * PAGE_SIZE - 1;
+	object_start = (unsigned long)ptr;
+	object_end = object_start + n - 1;
+
+	if (likely((area_start <= object_start) &&
+		   (object_end <= area_end)))
+		return VALID_PMALLOC_OBJECT;
+	else
+		return INVALID_PMALLOC_OBJECT;
+}
+
+
+bool pmalloc_prealloc(struct gen_pool *pool, size_t size)
+{
+	void *chunk;
+	size_t chunk_size;
+	int add_error;
+
+	if (check_alloc_params(pool, size))
+		return false;
+
+	/* Expand pool */
+	chunk_size = roundup(size, PAGE_SIZE);
+	chunk = vmalloc(chunk_size);
+	if (unlikely(chunk == NULL))
+		return false;
+	if (unlikely(!tag_chunk(chunk)))
+		goto free;
+
+	/* Locking is already done inside gen_pool_add */
+	add_error = gen_pool_add(pool, (unsigned long)chunk, chunk_size,
+				 NUMA_NO_NODE);
+	if (unlikely(add_error != 0))
+		goto abort;
+
+	return true;
+abort:
+	untag_chunk(chunk);
+free:
+	vfree(chunk);
+	return false;
+}
+
+void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp)
+{
+	void *chunk;
+	size_t chunk_size;
+	int add_error;
+	unsigned long retval;
+
+	if (check_alloc_params(pool, size))
+		return NULL;
+
+retry_alloc_from_pool:
+	retval = gen_pool_alloc(pool, size);
+	if (retval)
+		goto return_allocation;
+
+	if (unlikely((gfp & __GFP_ATOMIC))) {
+		if (unlikely((gfp & __GFP_NOFAIL)))
+			goto retry_alloc_from_pool;
+		else
+			return NULL;
+	}
+
+	/* Expand pool */
+	chunk_size = roundup(size, PAGE_SIZE);
+	chunk = vmalloc(chunk_size);
+	if (unlikely(!chunk)) {
+		if (unlikely((gfp & __GFP_NOFAIL)))
+			goto retry_alloc_from_pool;
+		else
+			return NULL;
+	}
+	if (unlikely(!tag_chunk(chunk)))
+		goto free;
+
+	/* Locking is already done inside gen_pool_add */
+	add_error = gen_pool_add(pool, (unsigned long)chunk, chunk_size,
+				 NUMA_NO_NODE);
+	if (unlikely(add_error))
+		goto abort;
+
+	retval = gen_pool_alloc(pool, size);
+	if (retval) {
+return_allocation:
+		if (gfp & __GFP_ZERO)
+			memset((void *)retval, 0, size);
+		return (void *)retval;
+	}
+	/* Here there is no test for __GFP_NOFAIL because, in case of
+	 * concurrent allocation, one thread might add a chunk to the
+	 * pool and this memory could be allocated by another thread,
+	 * before the first thread gets a chance to use it.
+	 * As long as vmalloc succeeds, it's ok to retry.
+	 */
+	goto retry_alloc_from_pool;
+abort:
+	untag_chunk(chunk);
+free:
+	vfree(chunk);
+	return NULL;
+}
+
+static void pmalloc_chunk_set_protection(struct gen_pool *pool,
+					 struct gen_pool_chunk *chunk,
+					 void *data)
+{
+	const bool *flag = data;
+	size_t chunk_size = chunk->end_addr + 1 - chunk->start_addr;
+	unsigned long pages = chunk_size / PAGE_SIZE;
+
+	BUG_ON(chunk_size & (PAGE_SIZE - 1));
+
+	if (*flag)
+		set_memory_ro(chunk->start_addr, pages);
+	else
+		set_memory_rw(chunk->start_addr, pages);
+}
+
+static int pmalloc_pool_set_protection(struct gen_pool *pool, bool protection)
+{
+	struct pmalloc_data *data;
+	struct gen_pool_chunk *chunk;
+
+	if (unlikely(!pool))
+		return -EINVAL;
+
+	data = pool->data;
+
+	if (unlikely(!data))
+		return -EINVAL;
+
+	if (unlikely(data->protected == protection)) {
+		WARN_ON(1);
+		return 0;
+	}
+
+	data->protected = protection;
+	list_for_each_entry(chunk, &(pool)->chunks, next_chunk)
+		pmalloc_chunk_set_protection(pool, chunk, &protection);
+	return 0;
+}
+
+int pmalloc_protect_pool(struct gen_pool *pool)
+{
+	return pmalloc_pool_set_protection(pool, true);
+}
+
+
+static void pmalloc_chunk_free(struct gen_pool *pool,
+			       struct gen_pool_chunk *chunk, void *data)
+{
+	untag_chunk((void *)chunk->start_addr);
+	gen_pool_flush_chunk(pool, chunk);
+	vfree_atomic((void *)chunk->start_addr);
+}
+
+
+int pmalloc_destroy_pool(struct gen_pool *pool)
+{
+	struct pmalloc_data *data;
+
+	if (unlikely(pool == NULL))
+		return -EINVAL;
+
+	data = pool->data;
+
+	if (unlikely(data == NULL))
+		return -EINVAL;
+
+	mutex_lock(&pmalloc_mutex);
+	list_del(&data->node);
+	mutex_unlock(&pmalloc_mutex);
+
+	if (likely(data->pool_kobject))
+		pmalloc_disconnect(data, data->pool_kobject);
+
+	if (data->protected)
+		pmalloc_pool_set_protection(pool, false);
+	gen_pool_for_each_chunk(pool, pmalloc_chunk_free, NULL);
+	gen_pool_destroy(pool);
+	kfree(data);
+	return 0;
+}
+
+/*
+ * When sysfs is ready to receive registrations, connect all the
+ * pools previously created. Also enable further pools to be connected
+ * right away.
+ */
+static int __init pmalloc_late_init(void)
+{
+	struct pmalloc_data *data, *n;
+
+	pmalloc_kobject = kobject_create_and_add("pmalloc", kernel_kobj);
+
+	mutex_lock(&pmalloc_mutex);
+	pmalloc_list = &pmalloc_final_list;
+
+	if (likely(pmalloc_kobject != NULL)) {
+		list_for_each_entry_safe(data, n, &pmalloc_tmp_list, node) {
+			list_move(&data->node, &pmalloc_final_list);
+			data->pool_kobject = pmalloc_connect(data);
+		}
+	}
+	mutex_unlock(&pmalloc_mutex);
+	return 0;
+}
+late_initcall(pmalloc_late_init);
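is_pmalloc_object() classifies a [ptr, ptr + n) range against the vmalloc area that backs it: an area not tagged VM_PMALLOC means "not a pmalloc object", full containment means "valid", and a range that crosses the area boundary means "invalid". The userspace sketch below reproduces only that final range test on plain integers; `struct model_area` and `classify_range` are hypothetical names, and the page lookup (is_vmalloc_addr(), vmalloc_to_page()) is left out. Unlike the code above, it also rejects ranges whose end wraps around the address space, which is an added assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096UL /* assumption: 4 KiB pages */

enum {
	INVALID_PMALLOC_OBJECT = -1,
	NOT_PMALLOC_OBJECT = 0,
	VALID_PMALLOC_OBJECT = 1,
};

/* Hypothetical stand-in for the vm_struct fields used by the check. */
struct model_area {
	uintptr_t addr;          /* start of the area */
	unsigned long nr_pages;  /* size of the area, in pages */
	bool pmalloc_tagged;     /* models (area->flags & VM_PMALLOC) */
};

static int classify_range(const struct model_area *area,
			  uintptr_t ptr, unsigned long n)
{
	uintptr_t area_start, area_end, object_end;

	if (!area->pmalloc_tagged)
		return NOT_PMALLOC_OBJECT;

	area_start = area->addr;
	area_end = area_start + area->nr_pages * MODEL_PAGE_SIZE - 1;
	object_end = ptr + n - 1;

	/* Wrap-around (also hit by n == 0, which callers never pass). */
	if (object_end < ptr)
		return INVALID_PMALLOC_OBJECT;
	if (area_start <= ptr && object_end <= area_end)
		return VALID_PMALLOC_OBJECT;
	return INVALID_PMALLOC_OBJECT;
}
```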
diff --git a/mm/usercopy.c b/mm/usercopy.c
index a9852b2..c3b1029 100644
--- a/mm/usercopy.c
+++ b/mm/usercopy.c
@@ -15,6 +15,7 @@
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
 #include <linux/mm.h>
+#include <linux/pmalloc.h>
 #include <linux/slab.h>
 #include <linux/sched.h>
 #include <linux/sched/task.h>
@@ -222,6 +223,7 @@ static inline const char *check_heap_object(const void *ptr, unsigned long n,
 void __check_object_size(const void *ptr, unsigned long n, bool to_user)
 {
 	const char *err;
+	int retv;
 
 	/* Skip all tests if size is zero. */
 	if (!n)
@@ -229,12 +231,12 @@ void __check_object_size(const void *ptr, unsigned long n, bool to_user)
 
 	/* Check for invalid addresses. */
 	err = check_bogus_address(ptr, n);
-	if (err)
+	if (unlikely(err))
 		goto report;
 
 	/* Check for bad heap object. */
 	err = check_heap_object(ptr, n, to_user);
-	if (err)
+	if (unlikely(err))
 		goto report;
 
 	/* Check for bad stack object. */
@@ -257,8 +259,23 @@ void __check_object_size(const void *ptr, unsigned long n, bool to_user)
 
 	/* Check for object in kernel to avoid text exposure. */
 	err = check_kernel_text_object(ptr, n);
-	if (!err)
-		return;
+	if (unlikely(err))
+		goto report;
+
+	/* Check if object is from a pmalloc chunk. */
+	retv = is_pmalloc_object(ptr, n);
+	if (unlikely(retv)) {
+		if (unlikely(!to_user)) {
+			err = "<trying to write to pmalloc object>";
+			goto report;
+		}
+		if (retv < 0) {
+			err = "<invalid pmalloc object>";
+			goto report;
+		}
+	}
+	return;
 
 report:
 	report_usercopy(ptr, n, to_user, err);
-- 
2.9.3
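The new tail of __check_object_size() reduces to a small decision table on the is_pmalloc_object() result and the copy direction. The sketch below restates it as a pure function so the cases can be read at a glance; `pmalloc_usercopy_error` is a hypothetical name, while the error strings are the ones used in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * retv: result of is_pmalloc_object() (-1 invalid, 0 not pmalloc, 1 valid).
 * to_user: true when kernel memory is being copied out to userspace.
 * Returns the error string the patch reports, or NULL if the copy is allowed.
 */
static const char *pmalloc_usercopy_error(int retv, bool to_user)
{
	if (!retv)
		return NULL; /* not pmalloc memory: no extra check */
	if (!to_user)
		return "<trying to write to pmalloc object>";
	if (retv < 0)
		return "<invalid pmalloc object>";
	return NULL;         /* reading from a valid pmalloc object */
}
```

Note that, as written in the patch, a copy from userspace into a pmalloc object is refused regardless of whether its pool has been protected yet.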


* [PATCH 4/6] Protectable Memory
@ 2018-01-30 15:14   ` Igor Stoppa
  0 siblings, 0 replies; 146+ messages in thread
From: Igor Stoppa @ 2018-01-30 15:14 UTC (permalink / raw)
  To: linux-security-module

The MMU available in many systems running Linux can often provide R/O
protection to the memory pages it handles.

However, the MMU-based protection works efficiently only when said pages
contain exclusively data that will not need further modifications.
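The page-granular R/O protection the MMU offers can be tried from userspace with the POSIX mmap()/mprotect() calls, which play a role analogous to the set_memory_ro()/set_memory_rw() calls pmalloc uses in the kernel. A minimal sketch, assuming a POSIX system that supports MAP_ANONYMOUS (e.g. Linux); the function name `make_readonly_page` is hypothetical:

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Maps one anonymous page, initializes it while it is writable, then makes
 * it read-only. Returns the page (and its size via page_size), or NULL on
 * failure. Afterwards reads still work, but any write raises SIGSEGV.
 */
static char *make_readonly_page(const char *msg, size_t *page_size)
{
	long ps = sysconf(_SC_PAGESIZE);
	char *page;

	if (ps <= 0)
		return NULL;
	page = mmap(NULL, (size_t)ps, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (page == MAP_FAILED)
		return NULL;

	/* Initialization phase: the page is still writable (and zero-filled). */
	strncpy(page, msg, (size_t)ps - 1);

	/* Protection phase: applies to whole pages only. */
	if (mprotect(page, (size_t)ps, PROT_READ) != 0) {
		munmap(page, (size_t)ps);
		return NULL;
	}
	*page_size = (size_t)ps;
	return page;
}
```

Because protection is per page, anything else sharing that page would become read-only too, which is why protectable data needs pages of its own.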

Statically allocated variables can be segregated into a dedicated
section, but this does not sit very well with dynamically allocated
ones.

Dynamic allocation currently provides no means of grouping variables
into memory pages that contain only data suitable for conversion to
read-only access mode.
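Whether two objects can be protected independently depends on whether they touch a common page. A small sketch of that check, assuming 4 KiB pages and plain integer addresses; `first_page`, `last_page` and `share_page` are hypothetical helpers:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Page index of the first byte of a range. */
static uintptr_t first_page(uintptr_t addr)
{
	return addr >> MODEL_PAGE_SHIFT;
}

/* Page index of the last byte of the non-empty range [addr, addr + len). */
static uintptr_t last_page(uintptr_t addr, size_t len)
{
	return (addr + len - 1) >> MODEL_PAGE_SHIFT;
}

/* True if the two non-empty ranges touch at least one common page. */
static bool share_page(uintptr_t a, size_t alen, uintptr_t b, size_t blen)
{
	return first_page(a) <= last_page(b, blen) &&
	       first_page(b) <= last_page(a, alen);
}
```

A read-only object placed next to a writable one, as a general-purpose allocator may do, can fail this test; keeping each pool in chunks of its own avoids the problem.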

The allocator provided here (pmalloc - protectable memory allocator)
introduces the concept of pools of protectable memory.

A module can request a pool and then direct any allocation request to
the pool handle it has received.

Once all the chunks of memory associated with a specific pool are
initialized, the pool can be protected.

After this point, the pool can only be destroyed (it is up to the module
to avoid any further references to the memory from the pool, once
the destruction is invoked).

Destruction is mainly meant for releasing memory when a module is
unloaded.

A module can have as many pools as needed, for example to support the
protection of data that is initialized in sufficiently distinct phases.

Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com>
---
 include/linux/genalloc.h |   3 +
 include/linux/pmalloc.h  | 216 ++++++++++++++++++++
 include/linux/vmalloc.h  |   1 +
 lib/genalloc.c           |  27 +++
 mm/Makefile              |   1 +
 mm/pmalloc.c             | 513 +++++++++++++++++++++++++++++++++++++++++++++++
 mm/usercopy.c            |  25 ++-
 7 files changed, 782 insertions(+), 4 deletions(-)
 create mode 100644 include/linux/pmalloc.h
 create mode 100644 mm/pmalloc.c

diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
index 0377681..a486a26 100644
--- a/include/linux/genalloc.h
+++ b/include/linux/genalloc.h
@@ -121,6 +121,9 @@ extern unsigned long gen_pool_alloc_algo(struct gen_pool *, size_t,
 extern void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size,
 		dma_addr_t *dma);
 extern void gen_pool_free(struct gen_pool *, unsigned long, size_t);
+
+extern void gen_pool_flush_chunk(struct gen_pool *pool,
+				 struct gen_pool_chunk *chunk);
 extern void gen_pool_for_each_chunk(struct gen_pool *,
 	void (*)(struct gen_pool *, struct gen_pool_chunk *, void *), void *);
 extern size_t gen_pool_avail(struct gen_pool *);
diff --git a/include/linux/pmalloc.h b/include/linux/pmalloc.h
new file mode 100644
index 0000000..ad7d557
--- /dev/null
+++ b/include/linux/pmalloc.h
@@ -0,0 +1,216 @@
+/*
+ * pmalloc.h: Header for Protectable Memory Allocator
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.stoppa@huawei.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; version 2
+ * of the License.
+ */
+
+#ifndef _PMALLOC_H
+#define _PMALLOC_H
+
+
+#include <linux/genalloc.h>
+#include <linux/string.h>
+#include <linux/gfp.h>
+
+#define PMALLOC_DEFAULT_ALLOC_ORDER (-1)
+
+/*
+ * Library for dynamic allocation of pools of memory that can be,
+ * after initialization, marked as read-only.
+ *
+ * This is intended to complement __ro_after_init, for those cases
+ * where either it is not possible to know the initialization value before
+ * init is completed, or the amount of data is variable and can be
+ * determined only at run-time.
+ *
+ * ***WARNING***
+ * The user of the API is expected to synchronize:
+ * 1) allocation,
+ * 2) writes to the allocated memory,
+ * 3) write protection of the pool,
+ * 4) freeing of the allocated memory, and
+ * 5) destruction of the pool.
+ *
+ * For a non-threaded scenario, this type of locking is not even required.
+ *
+ * Even if the library were to provide support for locking, point 2)
+ * would still depend on the user taking the lock.
+ */
+
+
+/**
+ * pmalloc_create_pool - create a new protectable memory pool
+ * @name: the name of the pool, must be unique
+ * @min_alloc_order: log2 of the minimum allocation size obtainable
+ *                   from the pool, or PMALLOC_DEFAULT_ALLOC_ORDER
+ *
+ * Creates a new (empty) memory pool for allocation of protectable
+ * memory. Memory will be allocated upon request (through pmalloc).
+ *
+ * Returns a pointer to the new pool upon success, NULL otherwise.
+ */
+struct gen_pool *pmalloc_create_pool(const char *name,
+					 int min_alloc_order);
+
+
+int is_pmalloc_object(const void *ptr, const unsigned long n);
+
+/**
+ * pmalloc_prealloc - tries to allocate a memory chunk of the requested size
+ * @pool: handler to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ *
+ * Prepares a chunk of the requested size.
+ * This is intended both to minimize latency in later memory requests and
+ * to avoid sleeping during allocation.
+ * Memory allocated with prealloc is stored in one single chunk, as
+ * opposed to what is allocated on demand when pmalloc runs out of free
+ * space already existing in the pool and has to invoke vmalloc.
+ *
+ * Returns true if the chunk was added to the pool, false otherwise.
+ */
+bool pmalloc_prealloc(struct gen_pool *pool, size_t size);
+
+/**
+ * pmalloc - allocate protectable memory from a pool
+ * @pool: handler to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ * @gfp: flags for page allocation
+ *
+ * Allocates memory from an unprotected pool. If the pool doesn't have
+ * enough memory, and the request did not include GFP_ATOMIC, an attempt
+ * is made to add a new chunk of memory to the pool
+ * (a multiple of PAGE_SIZE), in order to fit the new request.
+ * Otherwise, NULL is returned.
+ *
+ * Returns the pointer to the memory requested upon success,
+ * NULL otherwise (either no memory available or pool already read-only).
+ */
+void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp);
+
+
+/**
+ * pzalloc - zero-initialized version of pmalloc
+ * @pool: handler to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ * @gfp: flags for page allocation
+ *
+ * Executes pmalloc, initializing the memory requested to 0,
+ * before returning the pointer to it.
+ *
+ * Returns the pointer to the zeroed memory requested, upon success,
+ * NULL otherwise (either no memory available or pool already read-only).
+ */
+static inline void *pzalloc(struct gen_pool *pool, size_t size, gfp_t gfp)
+{
+	return pmalloc(pool, size, gfp | __GFP_ZERO);
+}
+
+/**
+ * pmalloc_array - allocates an array according to the parameters
+ * @pool: handler to the pool to be used for memory allocation
+ * @n: number of elements in the array
+ * @size: size of each element, in bytes
+ * @flags: flags for page allocation
+ *
+ * Executes pmalloc, unless a parameter is NULL/zero or n * size overflows.
+ *
+ * Returns either NULL or the pmalloc result.
+ */
+static inline void *pmalloc_array(struct gen_pool *pool, size_t n,
+				  size_t size, gfp_t flags)
+{
+	if (unlikely(!(pool && n && size) || n > SIZE_MAX / size))
+		return NULL;
+	return pmalloc(pool, n * size, flags);
+}
+
+/**
+ * pcalloc - allocates a 0-initialized array according to the parameters
+ * @pool: handler to the pool to be used for memory allocation
+ * @n: number of elements in the array
+ * @size: size of each element, in bytes
+ * @flags: flags for page allocation
+ *
+ * Executes pmalloc_array with __GFP_ZERO added to @flags.
+ *
+ * Returns either NULL or the pmalloc result.
+ */
+static inline void *pcalloc(struct gen_pool *pool, size_t n,
+			    size_t size, gfp_t flags)
+{
+	return pmalloc_array(pool, n, size, flags | __GFP_ZERO);
+}
+
+/**
+ * pstrdup - duplicate a string, using pmalloc as allocator
+ * @pool: handler to the pool to be used for memory allocation
+ * @s: string to duplicate
+ * @gfp: flags for page allocation
+ *
+ * Generates a copy of the given string, allocating sufficient memory
+ * from the given pmalloc pool.
+ *
+ * Returns a pointer to the replica, NULL in case of recoverable error.
+ */
+static inline char *pstrdup(struct gen_pool *pool, const char *s, gfp_t gfp)
+{
+	size_t len;
+	char *buf;
+
+	if (unlikely(pool == NULL || s == NULL))
+		return NULL;
+
+	len = strlen(s) + 1;
+	buf = pmalloc(pool, len, gfp);
+	if (likely(buf))
+		memcpy(buf, s, len);
+	return buf;
+}
+
+/**
+ * pmalloc_protect_pool - turn a read/write pool read-only
+ * @pool: the pool to protect
+ *
+ * Write-protects all the memory chunks assigned to the pool.
+ * This prevents any further allocation.
+ *
+ * Returns 0 upon success, -EINVAL in abnormal cases.
+ */
+int pmalloc_protect_pool(struct gen_pool *pool);
+
+/**
+ * pfree - mark as unused memory that was previously in use
+ * @pool: handler to the pool to be used for memory allocation
+ * @addr: the beginning of the memory area to be freed
+ *
+ * The behavior of pfree is different, depending on the state of the
+ * protection.
+ * If the pool is not yet protected, the memory is marked as unused and
+ * will be available for further allocations.
+ * If the pool is already protected, the memory is marked as unused, but
+ * it will still be impossible to perform further allocation, because of
+ * the existing protection.
+ * The freed memory, in this case, will be truly released only when the
+ * pool is destroyed.
+ */
+static inline void pfree(struct gen_pool *pool, const void *addr)
+{
+	gen_pool_free(pool, (unsigned long)addr, 0);
+}
+
+/**
+ * pmalloc_destroy_pool - destroys a pool and all the associated memory
+ * @pool: the pool to destroy
+ *
+ * All the memory that was allocated through pmalloc in the pool will be freed.
+ *
+ * Returns 0 upon success, -EINVAL in abnormal cases.
+ */
+int pmalloc_destroy_pool(struct gen_pool *pool);
+
+#endif
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 1e5d8c3..e8171b6 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -20,6 +20,7 @@ struct notifier_block;		/* in notifier.h */
 #define VM_UNINITIALIZED	0x00000020	/* vm_struct is not fully initialized */
 #define VM_NO_GUARD		0x00000040      /* don't add guard page */
 #define VM_KASAN		0x00000080      /* has allocated kasan shadow memory */
+#define VM_PMALLOC		0x00000100	/* pmalloc area - see pmalloc.txt */
 /* bits [20..32] reserved for arch specific ioremap internals */
 
 /*
diff --git a/lib/genalloc.c b/lib/genalloc.c
index dde7830..62f69b3 100644
--- a/lib/genalloc.c
+++ b/lib/genalloc.c
@@ -519,6 +519,33 @@ void gen_pool_free(struct gen_pool *pool, unsigned long addr, size_t size)
 }
 EXPORT_SYMBOL(gen_pool_free);
 
+
+/**
+ * gen_pool_flush_chunk - drops all the allocations from a specific chunk
+ * @pool:	the generic memory pool
+ * @chunk:	The chunk to wipe clear.
+ *
+ * This is meant to be called only while destroying a pool. It's up to the
+ * caller to avoid races, but really, at this point the pool should have
+ * already been retired and have become unavailable for any other sort of
+ * operation.
+ */
+void gen_pool_flush_chunk(struct gen_pool *pool,
+			  struct gen_pool_chunk *chunk)
+{
+	size_t size;
+
+	if (unlikely(!(pool && chunk)))
+		return;
+
+	size = chunk->end_addr + 1 - chunk->start_addr;
+	memset(chunk->entries, 0,
+	       DIV_ROUND_UP((size >> pool->min_alloc_order) * BITS_PER_ENTRY,
+			    BITS_PER_BYTE));
+	atomic_set(&chunk->avail, size);
+}
+
+
 /**
  * gen_pool_for_each_chunk - call func for every chunk of generic memory pool
  * @pool:	the generic memory pool
diff --git a/mm/Makefile b/mm/Makefile
index e669f02..a6a47e1 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -65,6 +65,7 @@ obj-$(CONFIG_SPARSEMEM)	+= sparse.o
 obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o
 obj-$(CONFIG_SLOB) += slob.o
 obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o
+obj-$(CONFIG_ARCH_HAS_SET_MEMORY) += pmalloc.o
 obj-$(CONFIG_KSM) += ksm.o
 obj-$(CONFIG_PAGE_POISONING) += page_poison.o
 obj-$(CONFIG_SLAB) += slab.o
diff --git a/mm/pmalloc.c b/mm/pmalloc.c
new file mode 100644
index 0000000..a64ac49
--- /dev/null
+++ b/mm/pmalloc.c
@@ -0,0 +1,513 @@
+/*
+ * pmalloc.c: Protectable Memory Allocator
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.stoppa@huawei.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; version 2
+ * of the License.
+ */
+
+#include <linux/printk.h>
+#include <linux/init.h>
+#include <linux/mm.h>
+#include <linux/vmalloc.h>
+#include <linux/genalloc.h>
+#include <linux/kernel.h>
+#include <linux/log2.h>
+#include <linux/slab.h>
+#include <linux/device.h>
+#include <linux/atomic.h>
+#include <linux/rculist.h>
+#include <linux/set_memory.h>
+#include <asm/cacheflush.h>
+#include <asm/page.h>
+
+/**
+ * pmalloc_data contains the data specific to a pmalloc pool,
+ * in a format compatible with the design of gen_alloc.
+ * Some of the fields are used for exposing the corresponding parameter
+ * to userspace, through sysfs.
+ */
+struct pmalloc_data {
+	struct gen_pool *pool;  /* Link back to the associated pool. */
+	bool protected;     /* Status of the pool: RO or RW. */
+	struct kobj_attribute attr_protected; /* Sysfs attribute. */
+	struct kobj_attribute attr_avail;     /* Sysfs attribute. */
+	struct kobj_attribute attr_size;      /* Sysfs attribute. */
+	struct kobj_attribute attr_chunks;    /* Sysfs attribute. */
+	struct kobject *pool_kobject;
+	struct list_head node; /* list of pools */
+};
+
+static LIST_HEAD(pmalloc_final_list);
+static LIST_HEAD(pmalloc_tmp_list);
+static struct list_head *pmalloc_list = &pmalloc_tmp_list;
+static DEFINE_MUTEX(pmalloc_mutex);
+static struct kobject *pmalloc_kobject;
+
+static ssize_t pmalloc_pool_show_protected(struct kobject *dev,
+					   struct kobj_attribute *attr,
+					   char *buf)
+{
+	struct pmalloc_data *data;
+
+	data = container_of(attr, struct pmalloc_data, attr_protected);
+	if (data->protected)
+		return sprintf(buf, "protected\n");
+	else
+		return sprintf(buf, "unprotected\n");
+}
+
+static ssize_t pmalloc_pool_show_avail(struct kobject *dev,
+				       struct kobj_attribute *attr,
+				       char *buf)
+{
+	struct pmalloc_data *data;
+
+	data = container_of(attr, struct pmalloc_data, attr_avail);
+	return sprintf(buf, "%lu\n", gen_pool_avail(data->pool));
+}
+
+static ssize_t pmalloc_pool_show_size(struct kobject *dev,
+				      struct kobj_attribute *attr,
+				      char *buf)
+{
+	struct pmalloc_data *data;
+
+	data = container_of(attr, struct pmalloc_data, attr_size);
+	return sprintf(buf, "%lu\n", gen_pool_size(data->pool));
+}
+
+static void pool_chunk_number(struct gen_pool *pool,
+			      struct gen_pool_chunk *chunk, void *data)
+{
+	unsigned long *counter = data;
+
+	(*counter)++;
+}
+
+static ssize_t pmalloc_pool_show_chunks(struct kobject *dev,
+					struct kobj_attribute *attr,
+					char *buf)
+{
+	struct pmalloc_data *data;
+	unsigned long chunks_num = 0;
+
+	data = container_of(attr, struct pmalloc_data, attr_chunks);
+	gen_pool_for_each_chunk(data->pool, pool_chunk_number, &chunks_num);
+	return sprintf(buf, "%lu\n", chunks_num);
+}
+
+/**
+ * Exposes the pool and its attributes through sysfs.
+ */
+static struct kobject *pmalloc_connect(struct pmalloc_data *data)
+{
+	const struct attribute *attrs[] = {
+		&data->attr_protected.attr,
+		&data->attr_avail.attr,
+		&data->attr_size.attr,
+		&data->attr_chunks.attr,
+		NULL
+	};
+	struct kobject *kobj;
+
+	kobj = kobject_create_and_add(data->pool->name, pmalloc_kobject);
+	if (unlikely(!kobj))
+		return NULL;
+
+	if (unlikely(sysfs_create_files(kobj, attrs) < 0)) {
+		kobject_put(kobj);
+		kobj = NULL;
+	}
+	return kobj;
+}
+
+/**
+ * Removes the pool and its attributes from sysfs.
+ */
+static void pmalloc_disconnect(struct pmalloc_data *data,
+			       struct kobject *kobj)
+{
+	const struct attribute *attrs[] = {
+		&data->attr_protected.attr,
+		&data->attr_avail.attr,
+		&data->attr_size.attr,
+		&data->attr_chunks.attr,
+		NULL
+	};
+
+	sysfs_remove_files(kobj, attrs);
+	kobject_put(kobj);
+}
+
+/**
+ * Declares an attribute of the pool.
+ */
+
+#define pmalloc_attr_init(data, attr_name) \
+do { \
+	sysfs_attr_init(&data->attr_##attr_name.attr); \
+	data->attr_##attr_name.attr.name = #attr_name; \
+	data->attr_##attr_name.attr.mode = VERIFY_OCTAL_PERMISSIONS(0444); \
+	data->attr_##attr_name.show = pmalloc_pool_show_##attr_name; \
+} while (0)
+
+struct gen_pool *pmalloc_create_pool(const char *name, int min_alloc_order)
+{
+	struct gen_pool *pool;
+	const char *pool_name;
+	struct pmalloc_data *data;
+
+	if (!name) {
+		WARN_ON(1);
+		return NULL;
+	}
+
+	if (min_alloc_order < 0)
+		min_alloc_order = ilog2(sizeof(unsigned long));
+
+	pool = gen_pool_create(min_alloc_order, NUMA_NO_NODE);
+	if (unlikely(!pool))
+		return NULL;
+
+	mutex_lock(&pmalloc_mutex);
+	list_for_each_entry(data, pmalloc_list, node)
+		if (!strcmp(name, data->pool->name))
+			goto same_name_err;
+
+	pool_name = kstrdup(name, GFP_KERNEL);
+	if (unlikely(!pool_name))
+		goto name_alloc_err;
+
+	data = kzalloc(sizeof(struct pmalloc_data), GFP_KERNEL);
+	if (unlikely(!data))
+		goto data_alloc_err;
+
+	data->protected = false;
+	data->pool = pool;
+	pmalloc_attr_init(data, protected);
+	pmalloc_attr_init(data, avail);
+	pmalloc_attr_init(data, size);
+	pmalloc_attr_init(data, chunks);
+	pool->data = data;
+	pool->name = pool_name;
+
+	list_add(&data->node, pmalloc_list);
+	if (pmalloc_list == &pmalloc_final_list)
+		data->pool_kobject = pmalloc_connect(data);
+	mutex_unlock(&pmalloc_mutex);
+	return pool;
+
+data_alloc_err:
+	kfree(pool_name);
+name_alloc_err:
+same_name_err:
+	mutex_unlock(&pmalloc_mutex);
+	gen_pool_destroy(pool);
+	return NULL;
+}
+
+static inline int check_alloc_params(struct gen_pool *pool, size_t req_size)
+{
+	struct pmalloc_data *data;
+
+	if (unlikely(!req_size || !pool))
+		return -1;
+
+	data = pool->data;
+
+	if (unlikely(data == NULL))
+		return -1;
+
+	if (unlikely(data->protected)) {
+		WARN_ON(1);
+		return -1;
+	}
+	return 0;
+}
+
+
+static inline bool chunk_tagging(void *chunk, bool tag)
+{
+	struct vm_struct *area;
+	struct page *page;
+
+	if (!is_vmalloc_addr(chunk))
+		return false;
+
+	page = vmalloc_to_page(chunk);
+	if (unlikely(!page))
+		return false;
+
+	area = page->area;
+	if (tag)
+		area->flags |= VM_PMALLOC;
+	else
+		area->flags &= ~VM_PMALLOC;
+	return true;
+}
+
+
+static inline bool tag_chunk(void *chunk)
+{
+	return chunk_tagging(chunk, true);
+}
+
+
+static inline bool untag_chunk(void *chunk)
+{
+	return chunk_tagging(chunk, false);
+}
+
+enum {
+	INVALID_PMALLOC_OBJECT = -1,
+	NOT_PMALLOC_OBJECT = 0,
+	VALID_PMALLOC_OBJECT = 1,
+};
+
+int is_pmalloc_object(const void *ptr, const unsigned long n)
+{
+	struct vm_struct *area;
+	struct page *page;
+	unsigned long area_start;
+	unsigned long area_end;
+	unsigned long object_start;
+	unsigned long object_end;
+
+	/*
+	 * is_pmalloc_object gets called pretty late, so chances are high
+	 * that the object is indeed of vmalloc type
+	 */
+	if (unlikely(!is_vmalloc_addr(ptr)))
+		return NOT_PMALLOC_OBJECT;
+
+	page = vmalloc_to_page(ptr);
+	if (unlikely(!page))
+		return NOT_PMALLOC_OBJECT;
+
+	area = page->area;
+
+	if (likely(!(area->flags & VM_PMALLOC)))
+		return NOT_PMALLOC_OBJECT;
+
+	area_start = (unsigned long)area->addr;
+	area_end = area_start + area->nr_pages * PAGE_SIZE - 1;
+	object_start = (unsigned long)ptr;
+	object_end = object_start + n - 1;
+
+	if (likely((area_start <= object_start) &&
+		   (object_end <= area_end)))
+		return VALID_PMALLOC_OBJECT;
+	else
+		return INVALID_PMALLOC_OBJECT;
+}
+
+
+bool pmalloc_prealloc(struct gen_pool *pool, size_t size)
+{
+	void *chunk;
+	size_t chunk_size;
+	int add_error;
+
+	if (check_alloc_params(pool, size))
+		return false;
+
+	/* Expand pool */
+	chunk_size = roundup(size, PAGE_SIZE);
+	chunk = vmalloc(chunk_size);
+	if (unlikely(chunk == NULL))
+		return false;
+
+	/* Tag the chunk, as pmalloc does, so hardened usercopy detects it. */
+	if (unlikely(!tag_chunk(chunk)))
+		goto free;
+
+	/* Locking is already done inside gen_pool_add */
+	add_error = gen_pool_add(pool, (unsigned long)chunk, chunk_size,
+				 NUMA_NO_NODE);
+	if (unlikely(add_error != 0))
+		goto abort;
+
+	return true;
+abort:
+	untag_chunk(chunk);
+free:
+	vfree(chunk);
+	return false;
+}
+
+void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp)
+{
+	void *chunk;
+	size_t chunk_size;
+	int add_error;
+	unsigned long retval;
+
+	if (check_alloc_params(pool, size))
+		return NULL;
+
+retry_alloc_from_pool:
+	retval = gen_pool_alloc(pool, size);
+	if (retval)
+		goto return_allocation;
+
+	if (unlikely((gfp & __GFP_ATOMIC))) {
+		if (unlikely((gfp & __GFP_NOFAIL)))
+			goto retry_alloc_from_pool;
+		else
+			return NULL;
+	}
+
+	/* Expand pool */
+	chunk_size = roundup(size, PAGE_SIZE);
+	chunk = vmalloc(chunk_size);
+	if (unlikely(!chunk)) {
+		if (unlikely((gfp & __GFP_NOFAIL)))
+			goto retry_alloc_from_pool;
+		else
+			return NULL;
+	}
+	if (unlikely(!tag_chunk(chunk)))
+		goto free;
+
+	/* Locking is already done inside gen_pool_add */
+	add_error = gen_pool_add(pool, (unsigned long)chunk, chunk_size,
+				 NUMA_NO_NODE);
+	if (unlikely(add_error))
+		goto abort;
+
+	retval = gen_pool_alloc(pool, size);
+	if (retval) {
+return_allocation:
+		if (gfp & __GFP_ZERO)
+			memset((void *)retval, 0, size);
+		return (void *)retval;
+	}
+	/*
+	 * Here there is no test for __GFP_NOFAIL because, in case of
+	 * concurrent allocation, one thread might add a chunk to the
+	 * pool and this memory could be allocated by another thread,
+	 * before the first thread gets a chance to use it.
+	 * As long as vmalloc succeeds, it's ok to retry.
+	 */
+	goto retry_alloc_from_pool;
+abort:
+	untag_chunk(chunk);
+free:
+	vfree(chunk);
+	return NULL;
+}
+
+static void pmalloc_chunk_set_protection(struct gen_pool *pool,
+					 struct gen_pool_chunk *chunk,
+					 void *data)
+{
+	const bool *flag = data;
+	size_t chunk_size = chunk->end_addr + 1 - chunk->start_addr;
+	unsigned long pages = chunk_size / PAGE_SIZE;
+
+	BUG_ON(chunk_size & (PAGE_SIZE - 1));
+
+	if (*flag)
+		set_memory_ro(chunk->start_addr, pages);
+	else
+		set_memory_rw(chunk->start_addr, pages);
+}
+
+static int pmalloc_pool_set_protection(struct gen_pool *pool, bool protection)
+{
+	struct pmalloc_data *data;
+	struct gen_pool_chunk *chunk;
+
+	if (unlikely(!pool))
+		return -EINVAL;
+
+	data = pool->data;
+
+	if (unlikely(!data))
+		return -EINVAL;
+
+	if (unlikely(data->protected == protection)) {
+		WARN_ON(1);
+		return 0;
+	}
+
+	data->protected = protection;
+	list_for_each_entry(chunk, &(pool)->chunks, next_chunk)
+		pmalloc_chunk_set_protection(pool, chunk, &protection);
+	return 0;
+}
+
+int pmalloc_protect_pool(struct gen_pool *pool)
+{
+	return pmalloc_pool_set_protection(pool, true);
+}
+
+
+static void pmalloc_chunk_free(struct gen_pool *pool,
+			       struct gen_pool_chunk *chunk, void *data)
+{
+	untag_chunk((void *)chunk->start_addr);
+	gen_pool_flush_chunk(pool, chunk);
+	vfree_atomic((void *)chunk->start_addr);
+}
+
+
+int pmalloc_destroy_pool(struct gen_pool *pool)
+{
+	struct pmalloc_data *data;
+
+	if (unlikely(pool == NULL))
+		return -EINVAL;
+
+	data = pool->data;
+
+	if (unlikely(data == NULL))
+		return -EINVAL;
+
+	mutex_lock(&pmalloc_mutex);
+	list_del(&data->node);
+	mutex_unlock(&pmalloc_mutex);
+
+	if (likely(data->pool_kobject))
+		pmalloc_disconnect(data, data->pool_kobject);
+
+	if (data->protected)
+		pmalloc_pool_set_protection(pool, false);
+	gen_pool_for_each_chunk(pool, pmalloc_chunk_free, NULL);
+	gen_pool_destroy(pool);
+	kfree(data);
+	return 0;
+}
+
+/**
+ * When the sysfs is ready to receive registrations, connect all the
+ * pools previously created. Also enable further pools to be connected
+ * right away.
+ */
+static int __init pmalloc_late_init(void)
+{
+	struct pmalloc_data *data, *n;
+
+	pmalloc_kobject = kobject_create_and_add("pmalloc", kernel_kobj);
+
+	mutex_lock(&pmalloc_mutex);
+	pmalloc_list = &pmalloc_final_list;
+
+	if (likely(pmalloc_kobject != NULL)) {
+		list_for_each_entry_safe(data, n, &pmalloc_tmp_list, node) {
+			list_move(&data->node, &pmalloc_final_list);
+			data->pool_kobject = pmalloc_connect(data);
+		}
+	}
+	mutex_unlock(&pmalloc_mutex);
+	return 0;
+}
+late_initcall(pmalloc_late_init);
diff --git a/mm/usercopy.c b/mm/usercopy.c
index a9852b2..c3b1029 100644
--- a/mm/usercopy.c
+++ b/mm/usercopy.c
@@ -15,6 +15,7 @@
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
 #include <linux/mm.h>
+#include <linux/pmalloc.h>
 #include <linux/slab.h>
 #include <linux/sched.h>
 #include <linux/sched/task.h>
@@ -222,6 +223,7 @@ static inline const char *check_heap_object(const void *ptr, unsigned long n,
 void __check_object_size(const void *ptr, unsigned long n, bool to_user)
 {
 	const char *err;
+	int retv;
 
 	/* Skip all tests if size is zero. */
 	if (!n)
@@ -229,12 +231,12 @@ void __check_object_size(const void *ptr, unsigned long n, bool to_user)
 
 	/* Check for invalid addresses. */
 	err = check_bogus_address(ptr, n);
-	if (err)
+	if (unlikely(err))
 		goto report;
 
 	/* Check for bad heap object. */
 	err = check_heap_object(ptr, n, to_user);
-	if (err)
+	if (unlikely(err))
 		goto report;
 
 	/* Check for bad stack object. */
@@ -257,8 +259,23 @@ void __check_object_size(const void *ptr, unsigned long n, bool to_user)
 
 	/* Check for object in kernel to avoid text exposure. */
 	err = check_kernel_text_object(ptr, n);
-	if (!err)
-		return;
+	if (unlikely(err))
+		goto report;
+
+	/* Check if object is from a pmalloc chunk. */
+	retv = is_pmalloc_object(ptr, n);
+	if (unlikely(retv)) {
+		if (unlikely(!to_user)) {
+			err = "<trying to write to pmalloc object>";
+			goto report;
+		}
+		if (retv < 0) {
+			err = "<invalid pmalloc object>";
+			goto report;
+		}
+	}
+	return;
 
 report:
 	report_usercopy(ptr, n, to_user, err);
-- 
2.9.3

* [PATCH 4/6] Protectable Memory
@ 2018-01-30 15:14   ` Igor Stoppa
  0 siblings, 0 replies; 146+ messages in thread
From: Igor Stoppa @ 2018-01-30 15:14 UTC (permalink / raw)
  To: jglisse, keescook, mhocko, labbott, hch, willy
  Cc: cl, linux-security-module, linux-mm, linux-kernel,
	kernel-hardening, Igor Stoppa

The MMU available in many systems running Linux can often provide R/O
protection to the memory pages it handles.

However, MMU-based protection works efficiently only when those pages
contain exclusively data that will not need further modification.

Statically allocated variables can be segregated into a dedicated
section, but this approach is not available for dynamically allocated
ones.

Dynamic allocation currently provides no means of grouping variables
into memory pages that contain exclusively data suitable for
conversion to read-only access mode.

The allocator provided here (pmalloc - protectable memory allocator)
introduces the concept of pools of protectable memory.

A module can request a pool and then direct any allocation request to
the pool handle it has received.

Once all the chunks of memory associated with a specific pool are
initialized, the pool can be protected.

After this point, the pool can only be destroyed (once destruction is
invoked, it is up to the module to avoid any further reference to the
memory from the pool).

Destruction is mainly meant for releasing memory when a module is
unloaded.

A module can have as many pools as needed, for example to support the
protection of data that is initialized in sufficiently distinct phases.

Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com>
---
 include/linux/genalloc.h |   3 +
 include/linux/pmalloc.h  | 216 ++++++++++++++++++++
 include/linux/vmalloc.h  |   1 +
 lib/genalloc.c           |  27 +++
 mm/Makefile              |   1 +
 mm/pmalloc.c             | 513 +++++++++++++++++++++++++++++++++++++++++++++++
 mm/usercopy.c            |  25 ++-
 7 files changed, 782 insertions(+), 4 deletions(-)
 create mode 100644 include/linux/pmalloc.h
 create mode 100644 mm/pmalloc.c

diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
index 0377681..a486a26 100644
--- a/include/linux/genalloc.h
+++ b/include/linux/genalloc.h
@@ -121,6 +121,9 @@ extern unsigned long gen_pool_alloc_algo(struct gen_pool *, size_t,
 extern void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size,
 		dma_addr_t *dma);
 extern void gen_pool_free(struct gen_pool *, unsigned long, size_t);
+
+extern void gen_pool_flush_chunk(struct gen_pool *pool,
+				 struct gen_pool_chunk *chunk);
 extern void gen_pool_for_each_chunk(struct gen_pool *,
 	void (*)(struct gen_pool *, struct gen_pool_chunk *, void *), void *);
 extern size_t gen_pool_avail(struct gen_pool *);
diff --git a/include/linux/pmalloc.h b/include/linux/pmalloc.h
new file mode 100644
index 0000000..ad7d557
--- /dev/null
+++ b/include/linux/pmalloc.h
@@ -0,0 +1,216 @@
+/*
+ * pmalloc.h: Header for Protectable Memory Allocator
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.stoppa@huawei.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; version 2
+ * of the License.
+ */
+
+#ifndef _PMALLOC_H
+#define _PMALLOC_H
+
+
+#include <linux/genalloc.h>
+#include <linux/string.h>
+#include <linux/gfp.h>
+
+#define PMALLOC_DEFAULT_ALLOC_ORDER (-1)
+
+/*
+ * Library for dynamic allocation of pools of memory that can be,
+ * after initialization, marked as read-only.
+ *
+ * This is intended to complement __ro_after_init, for those cases
+ * where either it is not possible to know the initialization value before
+ * init is completed, or the amount of data is variable and can be
+ * determined only at run-time.
+ *
+ * ***WARNING***
+ * The user of the API is expected to synchronize:
+ * 1) allocation,
+ * 2) writes to the allocated memory,
+ * 3) write protection of the pool,
+ * 4) freeing of the allocated memory, and
+ * 5) destruction of the pool.
+ *
+ * For a non-threaded scenario, this type of locking is not even required.
+ *
+ * Even if the library were to provide support for locking, point 2)
+ * would still depend on the user taking the lock.
+ */
+
+
+/**
+ * pmalloc_create_pool - create a new protectable memory pool
+ * @name: the name of the pool, must be unique
+ * @min_alloc_order: log2 of the minimum allocation size obtainable
+ *                   from the pool
+ *
+ * Creates a new (empty) memory pool for allocation of protectable
+ * memory. Memory will be allocated upon request (through pmalloc).
+ *
+ * Returns a pointer to the new pool upon success, otherwise a NULL.
+ */
+struct gen_pool *pmalloc_create_pool(const char *name,
+					 int min_alloc_order);
+
+
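+/**
+ * is_pmalloc_object - validates an object against the pmalloc chunks
+ * @ptr: the beginning of the object
+ * @n: the size of the object
+ *
+ * Returns 1 if the object lies entirely within a pmalloc chunk, 0 if it
+ * does not belong to pmalloc, -1 if it extends beyond its pmalloc chunk.
+ */
+int is_pmalloc_object(const void *ptr, const unsigned long n);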
+
+/**
+ * pmalloc_prealloc - tries to allocate a memory chunk of the requested size
+ * @pool: handler to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ *
+ * Prepares a chunk of the requested size.
+ * This is intended to both minimize latency in later memory requests and
+ * avoid sleeping during allocation.
+ * Memory allocated with prealloc is stored in one single chunk, as
+ * opposed to what is allocated on-demand when pmalloc runs out of free
+ * space already existing in the pool and has to invoke vmalloc.
+ *
+ * Returns true if the vmalloc call was successful, false otherwise.
+ */
+bool pmalloc_prealloc(struct gen_pool *pool, size_t size);
+
+/**
+ * pmalloc - allocate protectable memory from a pool
+ * @pool: handler to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ * @gfp: flags for page allocation
+ *
+ * Allocates memory from an unprotected pool. If the pool doesn't have
+ * enough memory, and the request did not include GFP_ATOMIC, an attempt
+ * is made to add a new chunk of memory to the pool
+ * (a multiple of PAGE_SIZE), in order to fit the new request.
+ * Otherwise, NULL is returned.
+ *
+ * Returns the pointer to the memory requested upon success,
+ * NULL otherwise (either no memory available or pool already read-only).
+ */
+void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp);
+
+
+/**
+ * pzalloc - zero-initialized version of pmalloc
+ * @pool: handler to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ * @gfp: flags for page allocation
+ *
+ * Executes pmalloc, initializing the memory requested to 0,
+ * before returning the pointer to it.
+ *
+ * Returns the pointer to the zeroed memory requested, upon success,
+ * NULL otherwise (either no memory available or pool already read-only).
+ */
+static inline void *pzalloc(struct gen_pool *pool, size_t size, gfp_t gfp)
+{
+	return pmalloc(pool, size, gfp | __GFP_ZERO);
+}
+
+/**
+ * pmalloc_array - allocates an array according to the parameters
+ * @pool: handler to the pool to be used for memory allocation
+ * @n: number of elements in the array
+ * @size: size (in bytes) of each element
+ * @flags: flags for page allocation
+ *
+ * Executes pmalloc, if it has a chance to succeed: all the parameters
+ * must be valid and n * size must not overflow.
+ *
+ * Returns either NULL or the pmalloc result.
+ */
+static inline void *pmalloc_array(struct gen_pool *pool, size_t n,
+				  size_t size, gfp_t flags)
+{
+	if (unlikely(!(pool && n && size) || n > SIZE_MAX / size))
+		return NULL;
+	return pmalloc(pool, n * size, flags);
+}
+
+/**
+ * pcalloc - allocates a 0-initialized array according to the parameters
+ * @pool: handler to the pool to be used for memory allocation
+ * @n: number of elements in the array
+ * @size: size (in bytes) of each element
+ * @flags: flags for page allocation
+ *
+ * Executes pmalloc_array, requesting zero-initialized memory.
+ *
+ * Returns either NULL or the pmalloc result.
+ */
+static inline void *pcalloc(struct gen_pool *pool, size_t n,
+			    size_t size, gfp_t flags)
+{
+	return pmalloc_array(pool, n, size, flags | __GFP_ZERO);
+}
+
+/**
+ * pstrdup - duplicate a string, using pmalloc as allocator
+ * @pool: handler to the pool to be used for memory allocation
+ * @s: string to duplicate
+ * @gfp: flags for page allocation
+ *
+ * Generates a copy of the given string, allocating sufficient memory
+ * from the given pmalloc pool.
+ *
+ * Returns a pointer to the replica, NULL in case of recoverable error.
+ */
+static inline char *pstrdup(struct gen_pool *pool, const char *s, gfp_t gfp)
+{
+	size_t len;
+	char *buf;
+
+	if (unlikely(pool == NULL || s == NULL))
+		return NULL;
+
+	len = strlen(s) + 1;
+	buf = pmalloc(pool, len, gfp);
+	if (likely(buf))
+		strncpy(buf, s, len);
+	return buf;
+}
+
+/**
+ * pmalloc_protect_pool - turn a read/write pool read-only
+ * @pool: the pool to protect
+ *
+ * Write-protects all the memory chunks assigned to the pool.
+ * This prevents any further allocation.
+ *
+ * Returns 0 upon success, -EINVAL in abnormal cases.
+ */
+int pmalloc_protect_pool(struct gen_pool *pool);
+
+/**
+ * pfree - mark as unused memory that was previously in use
+ * @pool: handler to the pool to be used for memory allocation
+ * @addr: the beginning of the memory area to be freed
+ *
+ * The behavior of pfree is different, depending on the state of the
+ * protection.
+ * If the pool is not yet protected, the memory is marked as unused and
+ * will be available for further allocations.
+ * If the pool is already protected, the memory is marked as unused, but
+ * it will still be impossible to perform further allocation, because of
+ * the existing protection.
+ * The freed memory, in this case, will be truly released only when the
+ * pool is destroyed.
+ */
+static inline void pfree(struct gen_pool *pool, const void *addr)
+{
+	gen_pool_free(pool, (unsigned long)addr, 0);
+}
+
+/**
+ * pmalloc_destroy_pool - destroys a pool and all the associated memory
+ * @pool: the pool to destroy
+ *
+ * All the memory that was allocated through pmalloc in the pool will be freed.
+ *
+ * Returns 0 upon success, -EINVAL in abnormal cases.
+ */
+int pmalloc_destroy_pool(struct gen_pool *pool);
+
+#endif
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 1e5d8c3..e8171b6 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -20,6 +20,7 @@ struct notifier_block;		/* in notifier.h */
 #define VM_UNINITIALIZED	0x00000020	/* vm_struct is not fully initialized */
 #define VM_NO_GUARD		0x00000040      /* don't add guard page */
 #define VM_KASAN		0x00000080      /* has allocated kasan shadow memory */
+#define VM_PMALLOC		0x00000100	/* pmalloc area - see pmalloc.txt */
 /* bits [20..32] reserved for arch specific ioremap internals */
 
 /*
diff --git a/lib/genalloc.c b/lib/genalloc.c
index dde7830..62f69b3 100644
--- a/lib/genalloc.c
+++ b/lib/genalloc.c
@@ -519,6 +519,33 @@ void gen_pool_free(struct gen_pool *pool, unsigned long addr, size_t size)
 }
 EXPORT_SYMBOL(gen_pool_free);
 
+
+/**
+ * gen_pool_flush_chunk - drops all the allocations from a specific chunk
+ * @pool:	the generic memory pool
+ * @chunk:	The chunk to wipe clear.
+ *
+ * This is meant to be called only while destroying a pool. It's up to the
+ * caller to avoid races, but really, at this point the pool should have
+ * already been retired and have become unavailable for any other sort of
+ * operation.
+ */
+void gen_pool_flush_chunk(struct gen_pool *pool,
+			  struct gen_pool_chunk *chunk)
+{
+	size_t size;
+
+	if (unlikely(!(pool && chunk)))
+		return;
+
+	size = chunk->end_addr + 1 - chunk->start_addr;
+	memset(chunk->entries, 0,
+	       DIV_ROUND_UP((size >> pool->min_alloc_order) * BITS_PER_ENTRY,
+			    BITS_PER_BYTE));
+	atomic_set(&chunk->avail, size);
+}
+
+
 /**
  * gen_pool_for_each_chunk - call func for every chunk of generic memory pool
  * @pool:	the generic memory pool
diff --git a/mm/Makefile b/mm/Makefile
index e669f02..a6a47e1 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -65,6 +65,7 @@ obj-$(CONFIG_SPARSEMEM)	+= sparse.o
 obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o
 obj-$(CONFIG_SLOB) += slob.o
 obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o
+obj-$(CONFIG_ARCH_HAS_SET_MEMORY) += pmalloc.o
 obj-$(CONFIG_KSM) += ksm.o
 obj-$(CONFIG_PAGE_POISONING) += page_poison.o
 obj-$(CONFIG_SLAB) += slab.o
diff --git a/mm/pmalloc.c b/mm/pmalloc.c
new file mode 100644
index 0000000..a64ac49
--- /dev/null
+++ b/mm/pmalloc.c
@@ -0,0 +1,513 @@
+/*
+ * pmalloc.c: Protectable Memory Allocator
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.stoppa@huawei.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; version 2
+ * of the License.
+ */
+
+#include <linux/printk.h>
+#include <linux/init.h>
+#include <linux/mm.h>
+#include <linux/vmalloc.h>
+#include <linux/genalloc.h>
+#include <linux/kernel.h>
+#include <linux/log2.h>
+#include <linux/slab.h>
+#include <linux/device.h>
+#include <linux/atomic.h>
+#include <linux/rculist.h>
+#include <linux/set_memory.h>
+#include <asm/cacheflush.h>
+#include <asm/page.h>
+
+/**
+ * pmalloc_data contains the data specific to a pmalloc pool,
+ * in a format compatible with the design of gen_alloc.
+ * Some of the fields are used for exposing the corresponding parameter
+ * to userspace, through sysfs.
+ */
+struct pmalloc_data {
+	struct gen_pool *pool;  /* Link back to the associated pool. */
+	bool protected;     /* Status of the pool: RO or RW. */
+	struct kobj_attribute attr_protected; /* Sysfs attribute. */
+	struct kobj_attribute attr_avail;     /* Sysfs attribute. */
+	struct kobj_attribute attr_size;      /* Sysfs attribute. */
+	struct kobj_attribute attr_chunks;    /* Sysfs attribute. */
+	struct kobject *pool_kobject;
+	struct list_head node; /* list of pools */
+};
+
+static LIST_HEAD(pmalloc_final_list);
+static LIST_HEAD(pmalloc_tmp_list);
+static struct list_head *pmalloc_list = &pmalloc_tmp_list;
+static DEFINE_MUTEX(pmalloc_mutex);
+static struct kobject *pmalloc_kobject;
+
+static ssize_t pmalloc_pool_show_protected(struct kobject *dev,
+					   struct kobj_attribute *attr,
+					   char *buf)
+{
+	struct pmalloc_data *data;
+
+	data = container_of(attr, struct pmalloc_data, attr_protected);
+	if (data->protected)
+		return sprintf(buf, "protected\n");
+	else
+		return sprintf(buf, "unprotected\n");
+}
+
+static ssize_t pmalloc_pool_show_avail(struct kobject *dev,
+				       struct kobj_attribute *attr,
+				       char *buf)
+{
+	struct pmalloc_data *data;
+
+	data = container_of(attr, struct pmalloc_data, attr_avail);
+	return sprintf(buf, "%lu\n", gen_pool_avail(data->pool));
+}
+
+static ssize_t pmalloc_pool_show_size(struct kobject *dev,
+				      struct kobj_attribute *attr,
+				      char *buf)
+{
+	struct pmalloc_data *data;
+
+	data = container_of(attr, struct pmalloc_data, attr_size);
+	return sprintf(buf, "%lu\n", gen_pool_size(data->pool));
+}
+
+static void pool_chunk_number(struct gen_pool *pool,
+			      struct gen_pool_chunk *chunk, void *data)
+{
+	unsigned long *counter = data;
+
+	(*counter)++;
+}
+
+static ssize_t pmalloc_pool_show_chunks(struct kobject *dev,
+					struct kobj_attribute *attr,
+					char *buf)
+{
+	struct pmalloc_data *data;
+	unsigned long chunks_num = 0;
+
+	data = container_of(attr, struct pmalloc_data, attr_chunks);
+	gen_pool_for_each_chunk(data->pool, pool_chunk_number, &chunks_num);
+	return sprintf(buf, "%lu\n", chunks_num);
+}
+
+/**
+ * Exposes the pool and its attributes through sysfs.
+ */
+static struct kobject *pmalloc_connect(struct pmalloc_data *data)
+{
+	const struct attribute *attrs[] = {
+		&data->attr_protected.attr,
+		&data->attr_avail.attr,
+		&data->attr_size.attr,
+		&data->attr_chunks.attr,
+		NULL
+	};
+	struct kobject *kobj;
+
+	kobj = kobject_create_and_add(data->pool->name, pmalloc_kobject);
+	if (unlikely(!kobj))
+		return NULL;
+
+	if (unlikely(sysfs_create_files(kobj, attrs) < 0)) {
+		kobject_put(kobj);
+		kobj = NULL;
+	}
+	return kobj;
+}
+
+/**
+ * Removes the pool and its attributes from sysfs.
+ */
+static void pmalloc_disconnect(struct pmalloc_data *data,
+			       struct kobject *kobj)
+{
+	const struct attribute *attrs[] = {
+		&data->attr_protected.attr,
+		&data->attr_avail.attr,
+		&data->attr_size.attr,
+		&data->attr_chunks.attr,
+		NULL
+	};
+
+	sysfs_remove_files(kobj, attrs);
+	kobject_put(kobj);
+}
+
+/**
+ * Declares an attribute of the pool.
+ */
+
+#define pmalloc_attr_init(data, attr_name) \
+do { \
+	sysfs_attr_init(&data->attr_##attr_name.attr); \
+	data->attr_##attr_name.attr.name = #attr_name; \
+	data->attr_##attr_name.attr.mode = VERIFY_OCTAL_PERMISSIONS(0444); \
+	data->attr_##attr_name.show = pmalloc_pool_show_##attr_name; \
+} while (0)
+
+struct gen_pool *pmalloc_create_pool(const char *name, int min_alloc_order)
+{
+	struct gen_pool *pool;
+	const char *pool_name;
+	struct pmalloc_data *data;
+
+	if (!name) {
+		WARN_ON(1);
+		return NULL;
+	}
+
+	if (min_alloc_order < 0)
+		min_alloc_order = ilog2(sizeof(unsigned long));
+
+	pool = gen_pool_create(min_alloc_order, NUMA_NO_NODE);
+	if (unlikely(!pool))
+		return NULL;
+
+	mutex_lock(&pmalloc_mutex);
+	list_for_each_entry(data, pmalloc_list, node)
+		if (!strcmp(name, data->pool->name))
+			goto same_name_err;
+
+	pool_name = kstrdup(name, GFP_KERNEL);
+	if (unlikely(!pool_name))
+		goto name_alloc_err;
+
+	data = kzalloc(sizeof(struct pmalloc_data), GFP_KERNEL);
+	if (unlikely(!data))
+		goto data_alloc_err;
+
+	data->protected = false;
+	data->pool = pool;
+	pmalloc_attr_init(data, protected);
+	pmalloc_attr_init(data, avail);
+	pmalloc_attr_init(data, size);
+	pmalloc_attr_init(data, chunks);
+	pool->data = data;
+	pool->name = pool_name;
+
+	list_add(&data->node, pmalloc_list);
+	if (pmalloc_list == &pmalloc_final_list)
+		data->pool_kobject = pmalloc_connect(data);
+	mutex_unlock(&pmalloc_mutex);
+	return pool;
+
+data_alloc_err:
+	kfree(pool_name);
+name_alloc_err:
+same_name_err:
+	mutex_unlock(&pmalloc_mutex);
+	gen_pool_destroy(pool);
+	return NULL;
+}
+
+static inline int check_alloc_params(struct gen_pool *pool, size_t req_size)
+{
+	struct pmalloc_data *data;
+	unsigned int order;
+
+	if (unlikely(!req_size || !pool))
+		return -1;
+
+	order = (unsigned int)pool->min_alloc_order;
+	data = pool->data;
+
+	if (data == NULL)
+		return -1;
+
+	if (unlikely(data->protected)) {
+		WARN_ON(1);
+		return -1;
+	}
+	return 0;
+}
+
+
+static inline bool chunk_tagging(void *chunk, bool tag)
+{
+	struct vm_struct *area;
+	struct page *page;
+
+	if (!is_vmalloc_addr(chunk))
+		return false;
+
+	page = vmalloc_to_page(chunk);
+	if (unlikely(!page))
+		return false;
+
+	area = page->area;
+	if (tag)
+		area->flags |= VM_PMALLOC;
+	else
+		area->flags &= ~VM_PMALLOC;
+	return true;
+}
+
+
+static inline bool tag_chunk(void *chunk)
+{
+	return chunk_tagging(chunk, true);
+}
+
+
+static inline bool untag_chunk(void *chunk)
+{
+	return chunk_tagging(chunk, false);
+}
+
+enum {
+	INVALID_PMALLOC_OBJECT = -1,
+	NOT_PMALLOC_OBJECT = 0,
+	VALID_PMALLOC_OBJECT = 1,
+};
+
+int is_pmalloc_object(const void *ptr, const unsigned long n)
+{
+	struct vm_struct *area;
+	struct page *page;
+	unsigned long area_start;
+	unsigned long area_end;
+	unsigned long object_start;
+	unsigned long object_end;
+
+
+	/* is_pmalloc_object gets called pretty late, so chances are high
+	 * that the object is indeed of vmalloc type
+	 */
+	if (unlikely(!is_vmalloc_addr(ptr)))
+		return NOT_PMALLOC_OBJECT;
+
+	page = vmalloc_to_page(ptr);
+	if (unlikely(!page))
+		return NOT_PMALLOC_OBJECT;
+
+	area = page->area;
+
+	if (likely(!(area->flags & VM_PMALLOC)))
+		return NOT_PMALLOC_OBJECT;
+
+	area_start = (unsigned long)area->addr;
+	area_end = area_start + area->nr_pages * PAGE_SIZE - 1;
+	object_start = (unsigned long)ptr;
+	object_end = object_start + n - 1;
+
+	if (likely((area_start <= object_start) &&
+		   (object_end <= area_end)))
+		return VALID_PMALLOC_OBJECT;
+	else
+		return INVALID_PMALLOC_OBJECT;
+}
+
+
+bool pmalloc_prealloc(struct gen_pool *pool, size_t size)
+{
+	void *chunk;
+	size_t chunk_size;
+	int add_error;
+	unsigned int order;
+
+	if (check_alloc_params(pool, size))
+		return false;
+
+	order = (unsigned int)pool->min_alloc_order;
+
+	/* Expand pool */
+	chunk_size = roundup(size, PAGE_SIZE);
+	chunk = vmalloc(chunk_size);
+	if (unlikely(chunk == NULL))
+		return false;
+	if (unlikely(!tag_chunk(chunk)))
+		goto abort;
+
+	/* Locking is already done inside gen_pool_add */
+	add_error = gen_pool_add(pool, (unsigned long)chunk, chunk_size,
+				 NUMA_NO_NODE);
+	if (unlikely(add_error != 0))
+		goto abort;
+
+	return true;
+abort:
+	vfree(chunk);
+	return false;
+
+}
+
+void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp)
+{
+	void *chunk;
+	size_t chunk_size;
+	int add_error;
+	unsigned long retval;
+	unsigned int order;
+
+	if (check_alloc_params(pool, size))
+		return NULL;
+
+	order = (unsigned int)pool->min_alloc_order;
+
+retry_alloc_from_pool:
+	retval = gen_pool_alloc(pool, size);
+	if (retval)
+		goto return_allocation;
+
+	if (unlikely((gfp & __GFP_ATOMIC))) {
+		if (unlikely((gfp & __GFP_NOFAIL)))
+			goto retry_alloc_from_pool;
+		else
+			return NULL;
+	}
+
+	/* Expand pool */
+	chunk_size = roundup(size, PAGE_SIZE);
+	chunk = vmalloc(chunk_size);
+	if (unlikely(!chunk)) {
+		if (unlikely((gfp & __GFP_NOFAIL)))
+			goto retry_alloc_from_pool;
+		else
+			return NULL;
+	}
+	if (unlikely(!tag_chunk(chunk)))
+		goto free;
+
+	/* Locking is already done inside gen_pool_add */
+	add_error = gen_pool_add(pool, (unsigned long)chunk, chunk_size,
+				 NUMA_NO_NODE);
+	if (unlikely(add_error))
+		goto abort;
+
+	retval = gen_pool_alloc(pool, size);
+	if (retval) {
+return_allocation:
+		*(size_t *)retval = size;
+		if (gfp & __GFP_ZERO)
+			memset((void *)retval, 0, size);
+		return (void *)retval;
+	}
+	/* Here there is no test for __GFP_NOFAIL because, in case of
+	 * concurrent allocation, one thread might add a chunk to the
+	 * pool and this memory could be allocated by another thread,
+	 * before the first thread gets a chance to use it.
+	 * As long as vmalloc succeeds, it's ok to retry.
+	 */
+	goto retry_alloc_from_pool;
+abort:
+	untag_chunk(chunk);
+free:
+	vfree(chunk);
+	return NULL;
+}
+
+static void pmalloc_chunk_set_protection(struct gen_pool *pool,
+					 struct gen_pool_chunk *chunk,
+					 void *data)
+{
+	const bool *flag = data;
+	size_t chunk_size = chunk->end_addr + 1 - chunk->start_addr;
+	unsigned long pages = chunk_size / PAGE_SIZE;
+
+	BUG_ON(chunk_size & (PAGE_SIZE - 1));
+
+	if (*flag)
+		set_memory_ro(chunk->start_addr, pages);
+	else
+		set_memory_rw(chunk->start_addr, pages);
+}
+
+static int pmalloc_pool_set_protection(struct gen_pool *pool, bool protection)
+{
+	struct pmalloc_data *data;
+	struct gen_pool_chunk *chunk;
+
+	if (unlikely(!pool))
+		return -EINVAL;
+
+	data = pool->data;
+
+	if (unlikely(!data))
+		return -EINVAL;
+
+	if (unlikely(data->protected == protection)) {
+		WARN_ON(1);
+		return 0;
+	}
+
+	data->protected = protection;
+	list_for_each_entry(chunk, &(pool)->chunks, next_chunk)
+		pmalloc_chunk_set_protection(pool, chunk, &protection);
+	return 0;
+}
+
+int pmalloc_protect_pool(struct gen_pool *pool)
+{
+	return pmalloc_pool_set_protection(pool, true);
+}
+
+
+static void pmalloc_chunk_free(struct gen_pool *pool,
+			       struct gen_pool_chunk *chunk, void *data)
+{
+	untag_chunk((void *)chunk->start_addr);
+	gen_pool_flush_chunk(pool, chunk);
+	vfree_atomic((void *)chunk->start_addr);
+}
+
+
+int pmalloc_destroy_pool(struct gen_pool *pool)
+{
+	struct pmalloc_data *data;
+
+	if (unlikely(pool == NULL))
+		return -EINVAL;
+
+	data = pool->data;
+
+	if (unlikely(data == NULL))
+		return -EINVAL;
+
+	mutex_lock(&pmalloc_mutex);
+	list_del(&data->node);
+	mutex_unlock(&pmalloc_mutex);
+
+	if (likely(data->pool_kobject))
+		pmalloc_disconnect(data, data->pool_kobject);
+
+	if (data->protected)
+		pmalloc_pool_set_protection(pool, false);
+	gen_pool_for_each_chunk(pool, pmalloc_chunk_free, NULL);
+	gen_pool_destroy(pool);
+	kfree(data);
+	return 0;
+}
+
+/**
+ * When the sysfs is ready to receive registrations, connect all the
+ * pools previously created. Also enable further pools to be connected
+ * right away.
+ */
+static int __init pmalloc_late_init(void)
+{
+	struct pmalloc_data *data, *n;
+
+	pmalloc_kobject = kobject_create_and_add("pmalloc", kernel_kobj);
+
+	mutex_lock(&pmalloc_mutex);
+	pmalloc_list = &pmalloc_final_list;
+
+	if (likely(pmalloc_kobject != NULL)) {
+		list_for_each_entry_safe(data, n, &pmalloc_tmp_list, node) {
+			list_move(&data->node, &pmalloc_final_list);
+			data->pool_kobject = pmalloc_connect(data);
+		}
+	}
+	mutex_unlock(&pmalloc_mutex);
+	return 0;
+}
+late_initcall(pmalloc_late_init);
diff --git a/mm/usercopy.c b/mm/usercopy.c
index a9852b2..c3b1029 100644
--- a/mm/usercopy.c
+++ b/mm/usercopy.c
@@ -15,6 +15,7 @@
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
 #include <linux/mm.h>
+#include <linux/pmalloc.h>
 #include <linux/slab.h>
 #include <linux/sched.h>
 #include <linux/sched/task.h>
@@ -222,6 +223,7 @@ static inline const char *check_heap_object(const void *ptr, unsigned long n,
 void __check_object_size(const void *ptr, unsigned long n, bool to_user)
 {
 	const char *err;
+	int retv;
 
 	/* Skip all tests if size is zero. */
 	if (!n)
@@ -229,12 +231,12 @@ void __check_object_size(const void *ptr, unsigned long n, bool to_user)
 
 	/* Check for invalid addresses. */
 	err = check_bogus_address(ptr, n);
-	if (err)
+	if (unlikely(err))
 		goto report;
 
 	/* Check for bad heap object. */
 	err = check_heap_object(ptr, n, to_user);
-	if (err)
+	if (unlikely(err))
 		goto report;
 
 	/* Check for bad stack object. */
@@ -257,8 +259,23 @@ void __check_object_size(const void *ptr, unsigned long n, bool to_user)
 
 	/* Check for object in kernel to avoid text exposure. */
 	err = check_kernel_text_object(ptr, n);
-	if (!err)
-		return;
+	if (unlikely(err))
+		goto report;
+
+	/* Check if object is from a pmalloc chunk.
+	 */
+	retv = is_pmalloc_object(ptr, n);
+	if (unlikely(retv)) {
+		if (unlikely(!to_user)) {
+			err = "<trying to write to pmalloc object>";
+			goto report;
+		}
+		if (retv < 0) {
+			err = "<invalid pmalloc object>";
+			goto report;
+		}
+	}
+	return;
 
 report:
 	report_usercopy(ptr, n, to_user, err);
-- 
2.9.3
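
The bounds test performed by is_pmalloc_object() and the way the new tail of __check_object_size() consumes its result can be modelled in plain userspace C. This is a minimal sketch, not kernel code: the vmalloc_to_page()/page->area lookup is replaced by passing the area's start address, page count and VM_PMALLOC tag explicitly, and classify_object(), usercopy_verdict() and MODEL_PAGE_SIZE are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for PAGE_SIZE. */
#define MODEL_PAGE_SIZE 4096UL

/* Same values as the enum in mm/pmalloc.c. */
enum {
	INVALID_PMALLOC_OBJECT = -1,
	NOT_PMALLOC_OBJECT = 0,
	VALID_PMALLOC_OBJECT = 1,
};

/*
 * Classify the object [ptr, ptr + n) against an area starting at
 * area_addr and spanning nr_pages pages; is_tagged models VM_PMALLOC.
 * n must be non-zero (the kernel skips all checks when n == 0).
 */
static int classify_object(unsigned long area_addr, unsigned long nr_pages,
			   bool is_tagged, unsigned long ptr, unsigned long n)
{
	unsigned long area_end;
	unsigned long object_end;

	assert(n != 0);
	if (!is_tagged)
		return NOT_PMALLOC_OBJECT;

	area_end = area_addr + nr_pages * MODEL_PAGE_SIZE - 1;
	object_end = ptr + n - 1;
	if (area_addr <= ptr && object_end <= area_end)
		return VALID_PMALLOC_OBJECT;
	return INVALID_PMALLOC_OBJECT;
}

/*
 * Mirror of the pmalloc branch added to __check_object_size():
 * returns the error string that would be reported, or NULL if allowed.
 */
static const char *usercopy_verdict(int retv, bool to_user)
{
	if (retv) {
		if (!to_user)
			return "<trying to write to pmalloc object>";
		if (retv < 0)
			return "<invalid pmalloc object>";
	}
	return NULL;
}
```

For example, with a two-page tagged area at 0x10000 the last valid byte is 0x11fff, so a 32-byte object starting at 0x11ff0 (ending at 0x1200f) is classified as invalid, and copying it to user space would be reported.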



* [PATCH 4/6] Protectable Memory
@ 2018-01-30 15:14   ` Igor Stoppa
From: Igor Stoppa @ 2018-01-30 15:14 UTC
  To: jglisse, keescook, mhocko, labbott, hch, willy
  Cc: cl, linux-security-module, linux-mm, linux-kernel,
	kernel-hardening, Igor Stoppa

The MMU available in many systems running Linux can often provide R/O
protection to the memory pages it handles.

However, MMU-based protection works efficiently only when those pages
contain exclusively data that will not need further modifications.

Statically allocated variables can be segregated into a dedicated
section, but this approach does not fit dynamically allocated ones.

Dynamic allocation currently provides no means of grouping variables
into memory pages that contain exclusively data suitable for conversion
to read-only access mode.
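
The page-granularity constraint behind this can be illustrated with a short userspace sketch: MMU protection applies to whole pages, so if one page holds bytes of both a read-only candidate and data that still needs writing, that page cannot be protected without breaking the writable data. The 4 KiB page size and the helper names page_index() and share_a_page() are assumptions for illustration only.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed page size for the illustration. */
#define EXAMPLE_PAGE_SIZE 4096UL

/* Index of the page containing byte address addr. */
static unsigned long page_index(unsigned long addr)
{
	return addr / EXAMPLE_PAGE_SIZE;
}

/*
 * Return true if [a, a + a_len) and [b, b + b_len) touch at least one
 * common page, i.e. write-protecting one range through the MMU would
 * also write-protect part of the other. Lengths must be non-zero.
 */
static bool share_a_page(unsigned long a, unsigned long a_len,
			 unsigned long b, unsigned long b_len)
{
	unsigned long a_first, a_last, b_first, b_last;

	assert(a_len != 0 && b_len != 0);
	a_first = page_index(a);
	a_last = page_index(a + a_len - 1);
	b_first = page_index(b);
	b_last = page_index(b + b_len - 1);
	return a_first <= b_last && b_first <= a_last;
}
```

Two 64-byte objects at 0x1000 and 0x1040 share page 1, so a generic allocator that places them side by side makes it impossible to protect one without the other; a dedicated pool avoids this by construction.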

The allocator provided here (pmalloc, the protectable memory allocator)
introduces the concept of pools of protectable memory.

A module can request a pool and then direct any allocation request to
the pool handle it has received.

Once all the chunks of memory associated with a specific pool are
initialized, the pool can be protected.

After this point, the pool can only be destroyed (it is up to the module
to avoid any further reference to the pool's memory once the destruction
has been invoked).
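
The lifecycle described above (allocate while writable, protect once, then only destroy) can be approximated in userspace with POSIX mmap() and mprotect(). This is only an analogy under stated assumptions: a single page-aligned anonymous mapping stands in for the pool's vmalloc chunks, and the toy_pool type and its functions are hypothetical, not the pmalloc API.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical single-region pool: a bump allocator over whole pages. */
struct toy_pool {
	unsigned char *base;
	size_t size;      /* bytes mapped, a multiple of the page size */
	size_t used;      /* bytes handed out so far */
	bool protected;   /* once true, never goes back to false */
};

static bool toy_pool_create(struct toy_pool *p, size_t pages)
{
	p->size = pages * (size_t)sysconf(_SC_PAGESIZE);
	p->base = mmap(NULL, p->size, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	p->used = 0;
	p->protected = false;
	return p->base != MAP_FAILED;
}

/* Allocation is refused once the pool is protected. */
static void *toy_alloc(struct toy_pool *p, size_t n)
{
	void *ret;

	if (p->protected || n == 0 || n > p->size - p->used)
		return NULL;
	ret = p->base + p->used;
	p->used += n;
	return ret;
}

/* One-way transition: the whole region becomes read-only. */
static int toy_protect(struct toy_pool *p)
{
	if (p->protected)
		return 0;
	if (mprotect(p->base, p->size, PROT_READ) != 0)
		return -1;
	p->protected = true;
	return 0;
}

/* The only way to get the memory back is to tear the pool down. */
static void toy_destroy(struct toy_pool *p)
{
	assert(p->base != NULL && p->base != MAP_FAILED);
	munmap(p->base, p->size);
	p->base = NULL;
	p->size = 0;
	p->used = 0;
}
```

There is deliberately no "unprotect" function: after toy_protect() a write through any pointer obtained from toy_alloc() faults, and the memory is only recovered by toy_destroy().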

Destruction is mainly meant for releasing memory when a module is
unloaded.

A module can have as many pools as needed, for example to support the
protection of data that is initialized in sufficiently distinct phases.

Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com>
---
 include/linux/genalloc.h |   3 +
 include/linux/pmalloc.h  | 216 ++++++++++++++++++++
 include/linux/vmalloc.h  |   1 +
 lib/genalloc.c           |  27 +++
 mm/Makefile              |   1 +
 mm/pmalloc.c             | 513 +++++++++++++++++++++++++++++++++++++++++++++++
 mm/usercopy.c            |  25 ++-
 7 files changed, 782 insertions(+), 4 deletions(-)
 create mode 100644 include/linux/pmalloc.h
 create mode 100644 mm/pmalloc.c
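
The bitmap area cleared by gen_pool_flush_chunk() below is the number of allocation units in the chunk, times the bits used per unit, rounded up to whole bytes. Because `*` binds more tightly than `>>` in C, the shift must be parenthesized explicitly. A small userspace sketch: the value of BITS_PER_ENTRY is not shown in this patch, so it is passed in as a parameter (the value 2 used in the example is an assumption), and bitmap_bytes() and bitmap_bytes_unparenthesized() are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Local copies of the kernel helpers, for a standalone build. */
#define BITS_PER_BYTE 8
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Bytes of bitmap needed for a chunk of chunk_size bytes, when each
 * allocation unit of (1 << order) bytes uses bits_per_entry bits.
 */
static size_t bitmap_bytes(size_t chunk_size, unsigned int order,
			   unsigned int bits_per_entry)
{
	size_t units;

	assert(order < sizeof(size_t) * BITS_PER_BYTE);
	units = chunk_size >> order;
	return DIV_ROUND_UP(units * bits_per_entry, BITS_PER_BYTE);
}

/*
 * What "chunk_size >> order * bits_per_entry" means in C: the product
 * is computed first and used as the shift count.
 */
static size_t bitmap_bytes_unparenthesized(size_t chunk_size,
					   unsigned int order,
					   unsigned int bits_per_entry)
{
	unsigned int shift = order * bits_per_entry;

	assert(shift < sizeof(size_t) * BITS_PER_BYTE);
	return DIV_ROUND_UP(chunk_size >> shift, BITS_PER_BYTE);
}
```

With a 4096-byte chunk, min_alloc_order 3 and 2 bits per entry, the chunk holds 512 units needing 1024 bits, i.e. 128 bytes; the unparenthesized expression shifts by 6 instead and covers only 8 bytes, leaving most of the bitmap untouched.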

diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
index 0377681..a486a26 100644
--- a/include/linux/genalloc.h
+++ b/include/linux/genalloc.h
@@ -121,6 +121,9 @@ extern unsigned long gen_pool_alloc_algo(struct gen_pool *, size_t,
 extern void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size,
 		dma_addr_t *dma);
 extern void gen_pool_free(struct gen_pool *, unsigned long, size_t);
+
+extern void gen_pool_flush_chunk(struct gen_pool *pool,
+				 struct gen_pool_chunk *chunk);
 extern void gen_pool_for_each_chunk(struct gen_pool *,
 	void (*)(struct gen_pool *, struct gen_pool_chunk *, void *), void *);
 extern size_t gen_pool_avail(struct gen_pool *);
diff --git a/include/linux/pmalloc.h b/include/linux/pmalloc.h
new file mode 100644
index 0000000..ad7d557
--- /dev/null
+++ b/include/linux/pmalloc.h
@@ -0,0 +1,216 @@
+/*
+ * pmalloc.h: Header for Protectable Memory Allocator
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.stoppa@huawei.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; version 2
+ * of the License.
+ */
+
+#ifndef _PMALLOC_H
+#define _PMALLOC_H
+
+
+#include <linux/genalloc.h>
+#include <linux/string.h>
+#include <linux/gfp.h>
+
+#define PMALLOC_DEFAULT_ALLOC_ORDER (-1)
+
+/*
+ * Library for dynamic allocation of pools of memory that can be,
+ * after initialization, marked as read-only.
+ *
+ * This is intended to complement __ro_after_init, for those cases
+ * where either it is not possible to know the initialization value before
+ * init is completed, or the amount of data is variable and can be
+ * determined only at run-time.
+ *
+ * ***WARNING***
+ * The user of the API is expected to synchronize:
+ * 1) allocation,
+ * 2) writes to the allocated memory,
+ * 3) write protection of the pool,
+ * 4) freeing of the allocated memory, and
+ * 5) destruction of the pool.
+ *
+ * For a non-threaded scenario, this type of locking is not even required.
+ *
+ * Even if the library were to provide support for locking, point 2)
+ * would still depend on the user taking the lock.
+ */
+
+
+/**
+ * pmalloc_create_pool - create a new protectable memory pool
+ * @name: the name of the pool, must be unique
+ * @min_alloc_order: log2 of the minimum allocation size obtainable
+ *                   from the pool
+ *
+ * Creates a new (empty) memory pool for allocation of protectable
+ * memory. Memory will be allocated upon request (through pmalloc).
+ *
+ * Returns a pointer to the new pool upon success, otherwise NULL.
+ */
+struct gen_pool *pmalloc_create_pool(const char *name,
+					 int min_alloc_order);
+
+
+int is_pmalloc_object(const void *ptr, const unsigned long n);
+
+/**
+ * pmalloc_prealloc - tries to allocate a memory chunk of the requested size
+ * @pool: handler to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ *
+ * Prepares a chunk of the requested size.
+ * This is intended to both minimize latency in later memory requests and
+ * avoid sleeping during allocation.
+ * Memory allocated with prealloc is stored in one single chunk, as
+ * opposed to what is allocated on demand when pmalloc runs out of free
+ * space already existing in the pool and has to invoke vmalloc.
+ *
+ * Returns true if the vmalloc call was successful, false otherwise.
+ */
+bool pmalloc_prealloc(struct gen_pool *pool, size_t size);
+
+/**
+ * pmalloc - allocate protectable memory from a pool
+ * @pool: handler to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ * @gfp: flags for page allocation
+ *
+ * Allocates memory from an unprotected pool. If the pool doesn't have
+ * enough memory, and the request did not include GFP_ATOMIC, an attempt
+ * is made to add a new chunk of memory to the pool
+ * (a multiple of PAGE_SIZE), in order to fit the new request.
+ * Otherwise, NULL is returned.
+ *
+ * Returns the pointer to the memory requested upon success,
+ * NULL otherwise (either no memory available or pool already read-only).
+ */
+void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp);
+
+
+/**
+ * pzalloc - zero-initialized version of pmalloc
+ * @pool: handler to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ * @gfp: flags for page allocation
+ *
+ * Executes pmalloc, initializing the memory requested to 0,
+ * before returning the pointer to it.
+ *
+ * Returns the pointer to the zeroed memory requested, upon success,
+ * NULL otherwise (either no memory available or pool already read-only).
+ */
+static inline void *pzalloc(struct gen_pool *pool, size_t size, gfp_t gfp)
+{
+	return pmalloc(pool, size, gfp | __GFP_ZERO);
+}
+
+/**
+ * pmalloc_array - allocates an array according to the parameters
+ * @pool: handler to the pool to be used for memory allocation
+ * @n: number of elements in the array
+ * @size: size (in bytes) of each element
+ * @flags: flags for page allocation
+ *
+ * Executes pmalloc, if it has a chance to succeed, i.e. if all the
+ * parameters are valid and n * size does not overflow.
+ *
+ * Returns either NULL or the pmalloc result.
+ */
+static inline void *pmalloc_array(struct gen_pool *pool, size_t n,
+				  size_t size, gfp_t flags)
+{
+	if (unlikely(!(pool && n && size) || n > SIZE_MAX / size))
+		return NULL;
+	return pmalloc(pool, n * size, flags);
+}
+
+/**
+ * pcalloc - allocates a 0-initialized array according to the parameters
+ * @pool: handler to the pool to be used for memory allocation
+ * @n: number of elements in the array
+ * @size: size (in bytes) of each element
+ * @flags: flags for page allocation
+ *
+ * Executes pmalloc_array with __GFP_ZERO added to @flags.
+ *
+ * Returns either NULL or the pmalloc result.
+ */
+static inline void *pcalloc(struct gen_pool *pool, size_t n,
+			    size_t size, gfp_t flags)
+{
+	return pmalloc_array(pool, n, size, flags | __GFP_ZERO);
+}
+
+/**
+ * pstrdup - duplicate a string, using pmalloc as allocator
+ * @pool: handler to the pool to be used for memory allocation
+ * @s: string to duplicate
+ * @gfp: flags for page allocation
+ *
+ * Generates a copy of the given string, allocating sufficient memory
+ * from the given pmalloc pool.
+ *
+ * Returns a pointer to the replica, NULL in case of recoverable error.
+ */
+static inline char *pstrdup(struct gen_pool *pool, const char *s, gfp_t gfp)
+{
+	size_t len;
+	char *buf;
+
+	if (unlikely(pool == NULL || s == NULL))
+		return NULL;
+
+	len = strlen(s) + 1;
+	buf = pmalloc(pool, len, gfp);
+	if (likely(buf))
+		strncpy(buf, s, len);
+	return buf;
+}
+
+/**
+ * pmalloc_protect_pool - turn a read/write pool read-only
+ * @pool: the pool to protect
+ *
+ * Write-protects all the memory chunks assigned to the pool.
+ * This prevents any further allocation.
+ *
+ * Returns 0 upon success, -EINVAL in abnormal cases.
+ */
+int pmalloc_protect_pool(struct gen_pool *pool);
+
+/**
+ * pfree - mark previously allocated memory as unused
+ * @pool: handler to the pool the memory was allocated from
+ * @addr: the beginning of the memory area to be freed
+ *
+ * The behavior of pfree is different, depending on the state of the
+ * protection.
+ * If the pool is not yet protected, the memory is marked as unused and
+ * will be available for further allocations.
+ * If the pool is already protected, the memory is marked as unused, but
+ * it will still be impossible to perform further allocation, because of
+ * the existing protection.
+ * The freed memory, in this case, will be truly released only when the
+ * pool is destroyed.
+ */
+static inline void pfree(struct gen_pool *pool, const void *addr)
+{
+	gen_pool_free(pool, (unsigned long)addr, 0);
+}
+
+/**
+ * pmalloc_destroy_pool - destroys a pool and all the associated memory
+ * @pool: the pool to destroy
+ *
+ * All the memory that was allocated through pmalloc in the pool will be freed.
+ *
+ * Returns 0 upon success, -EINVAL in abnormal cases.
+ */
+int pmalloc_destroy_pool(struct gen_pool *pool);
+
+#endif
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 1e5d8c3..e8171b6 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -20,6 +20,7 @@ struct notifier_block;		/* in notifier.h */
 #define VM_UNINITIALIZED	0x00000020	/* vm_struct is not fully initialized */
 #define VM_NO_GUARD		0x00000040      /* don't add guard page */
 #define VM_KASAN		0x00000080      /* has allocated kasan shadow memory */
+#define VM_PMALLOC		0x00000100	/* pmalloc area - see pmalloc.txt */
 /* bits [20..32] reserved for arch specific ioremap internals */
 
 /*
diff --git a/lib/genalloc.c b/lib/genalloc.c
index dde7830..62f69b3 100644
--- a/lib/genalloc.c
+++ b/lib/genalloc.c
@@ -519,6 +519,33 @@ void gen_pool_free(struct gen_pool *pool, unsigned long addr, size_t size)
 }
 EXPORT_SYMBOL(gen_pool_free);
 
+
+/**
+ * gen_pool_flush_chunk - drops all the allocations from a specific chunk
+ * @pool:	the generic memory pool
+ * @chunk:	The chunk to wipe clear.
+ *
+ * This is meant to be called only while destroying a pool. It's up to the
+ * caller to avoid races, but really, at this point the pool should have
+ * already been retired and have become unavailable for any other sort of
+ * operation.
+ */
+void gen_pool_flush_chunk(struct gen_pool *pool,
+			  struct gen_pool_chunk *chunk)
+{
+	size_t size;
+
+	if (unlikely(!(pool && chunk)))
+		return;
+
+	size = chunk->end_addr + 1 - chunk->start_addr;
+	memset(chunk->entries, 0,
+	       DIV_ROUND_UP((size >> pool->min_alloc_order) * BITS_PER_ENTRY,
+			    BITS_PER_BYTE));
+	atomic_set(&chunk->avail, size);
+}
+
+
 /**
  * gen_pool_for_each_chunk - call func for every chunk of generic memory pool
  * @pool:	the generic memory pool
diff --git a/mm/Makefile b/mm/Makefile
index e669f02..a6a47e1 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -65,6 +65,7 @@ obj-$(CONFIG_SPARSEMEM)	+= sparse.o
 obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o
 obj-$(CONFIG_SLOB) += slob.o
 obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o
+obj-$(CONFIG_ARCH_HAS_SET_MEMORY) += pmalloc.o
 obj-$(CONFIG_KSM) += ksm.o
 obj-$(CONFIG_PAGE_POISONING) += page_poison.o
 obj-$(CONFIG_SLAB) += slab.o
diff --git a/mm/pmalloc.c b/mm/pmalloc.c
new file mode 100644
index 0000000..a64ac49
--- /dev/null
+++ b/mm/pmalloc.c
@@ -0,0 +1,513 @@
+/*
+ * pmalloc.c: Protectable Memory Allocator
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.stoppa@huawei.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; version 2
+ * of the License.
+ */
+
+#include <linux/printk.h>
+#include <linux/init.h>
+#include <linux/mm.h>
+#include <linux/vmalloc.h>
+#include <linux/genalloc.h>
+#include <linux/kernel.h>
+#include <linux/log2.h>
+#include <linux/slab.h>
+#include <linux/device.h>
+#include <linux/atomic.h>
+#include <linux/rculist.h>
+#include <linux/set_memory.h>
+#include <asm/cacheflush.h>
+#include <asm/page.h>
+
+/**
+ * pmalloc_data contains the data specific to a pmalloc pool,
+ * in a format compatible with the design of gen_alloc.
+ * Some of the fields are used for exposing the corresponding parameter
+ * to userspace, through sysfs.
+ */
+struct pmalloc_data {
+	struct gen_pool *pool;  /* Link back to the associated pool. */
+	bool protected;     /* Status of the pool: RO or RW. */
+	struct kobj_attribute attr_protected; /* Sysfs attribute. */
+	struct kobj_attribute attr_avail;     /* Sysfs attribute. */
+	struct kobj_attribute attr_size;      /* Sysfs attribute. */
+	struct kobj_attribute attr_chunks;    /* Sysfs attribute. */
+	struct kobject *pool_kobject;
+	struct list_head node; /* list of pools */
+};
+
+static LIST_HEAD(pmalloc_final_list);
+static LIST_HEAD(pmalloc_tmp_list);
+static struct list_head *pmalloc_list = &pmalloc_tmp_list;
+static DEFINE_MUTEX(pmalloc_mutex);
+static struct kobject *pmalloc_kobject;
+
+static ssize_t pmalloc_pool_show_protected(struct kobject *dev,
+					   struct kobj_attribute *attr,
+					   char *buf)
+{
+	struct pmalloc_data *data;
+
+	data = container_of(attr, struct pmalloc_data, attr_protected);
+	if (data->protected)
+		return sprintf(buf, "protected\n");
+	else
+		return sprintf(buf, "unprotected\n");
+}
+
+static ssize_t pmalloc_pool_show_avail(struct kobject *dev,
+				       struct kobj_attribute *attr,
+				       char *buf)
+{
+	struct pmalloc_data *data;
+
+	data = container_of(attr, struct pmalloc_data, attr_avail);
+	return sprintf(buf, "%lu\n", gen_pool_avail(data->pool));
+}
+
+static ssize_t pmalloc_pool_show_size(struct kobject *dev,
+				      struct kobj_attribute *attr,
+				      char *buf)
+{
+	struct pmalloc_data *data;
+
+	data = container_of(attr, struct pmalloc_data, attr_size);
+	return sprintf(buf, "%lu\n", gen_pool_size(data->pool));
+}
+
+static void pool_chunk_number(struct gen_pool *pool,
+			      struct gen_pool_chunk *chunk, void *data)
+{
+	unsigned long *counter = data;
+
+	(*counter)++;
+}
+
+static ssize_t pmalloc_pool_show_chunks(struct kobject *dev,
+					struct kobj_attribute *attr,
+					char *buf)
+{
+	struct pmalloc_data *data;
+	unsigned long chunks_num = 0;
+
+	data = container_of(attr, struct pmalloc_data, attr_chunks);
+	gen_pool_for_each_chunk(data->pool, pool_chunk_number, &chunks_num);
+	return sprintf(buf, "%lu\n", chunks_num);
+}
+
+/**
+ * Exposes the pool and its attributes through sysfs.
+ */
+static struct kobject *pmalloc_connect(struct pmalloc_data *data)
+{
+	const struct attribute *attrs[] = {
+		&data->attr_protected.attr,
+		&data->attr_avail.attr,
+		&data->attr_size.attr,
+		&data->attr_chunks.attr,
+		NULL
+	};
+	struct kobject *kobj;
+
+	kobj = kobject_create_and_add(data->pool->name, pmalloc_kobject);
+	if (unlikely(!kobj))
+		return NULL;
+
+	if (unlikely(sysfs_create_files(kobj, attrs) < 0)) {
+		kobject_put(kobj);
+		kobj = NULL;
+	}
+	return kobj;
+}
+
+/**
+ * Removes the pool and its attributes from sysfs.
+ */
+static void pmalloc_disconnect(struct pmalloc_data *data,
+			       struct kobject *kobj)
+{
+	const struct attribute *attrs[] = {
+		&data->attr_protected.attr,
+		&data->attr_avail.attr,
+		&data->attr_size.attr,
+		&data->attr_chunks.attr,
+		NULL
+	};
+
+	sysfs_remove_files(kobj, attrs);
+	kobject_put(kobj);
+}
+
+/**
+ * Declares an attribute of the pool.
+ */
+
+#define pmalloc_attr_init(data, attr_name) \
+do { \
+	sysfs_attr_init(&data->attr_##attr_name.attr); \
+	data->attr_##attr_name.attr.name = #attr_name; \
+	data->attr_##attr_name.attr.mode = VERIFY_OCTAL_PERMISSIONS(0444); \
+	data->attr_##attr_name.show = pmalloc_pool_show_##attr_name; \
+} while (0)
+
+struct gen_pool *pmalloc_create_pool(const char *name, int min_alloc_order)
+{
+	struct gen_pool *pool;
+	const char *pool_name;
+	struct pmalloc_data *data;
+
+	if (!name) {
+		WARN_ON(1);
+		return NULL;
+	}
+
+	if (min_alloc_order < 0)
+		min_alloc_order = ilog2(sizeof(unsigned long));
+
+	pool = gen_pool_create(min_alloc_order, NUMA_NO_NODE);
+	if (unlikely(!pool))
+		return NULL;
+
+	mutex_lock(&pmalloc_mutex);
+	list_for_each_entry(data, pmalloc_list, node)
+		if (!strcmp(name, data->pool->name))
+			goto same_name_err;
+
+	pool_name = kstrdup(name, GFP_KERNEL);
+	if (unlikely(!pool_name))
+		goto name_alloc_err;
+
+	data = kzalloc(sizeof(struct pmalloc_data), GFP_KERNEL);
+	if (unlikely(!data))
+		goto data_alloc_err;
+
+	data->protected = false;
+	data->pool = pool;
+	pmalloc_attr_init(data, protected);
+	pmalloc_attr_init(data, avail);
+	pmalloc_attr_init(data, size);
+	pmalloc_attr_init(data, chunks);
+	pool->data = data;
+	pool->name = pool_name;
+
+	list_add(&data->node, pmalloc_list);
+	if (pmalloc_list == &pmalloc_final_list)
+		data->pool_kobject = pmalloc_connect(data);
+	mutex_unlock(&pmalloc_mutex);
+	return pool;
+
+data_alloc_err:
+	kfree(pool_name);
+name_alloc_err:
+same_name_err:
+	mutex_unlock(&pmalloc_mutex);
+	gen_pool_destroy(pool);
+	return NULL;
+}
+
+static inline int check_alloc_params(struct gen_pool *pool, size_t req_size)
+{
+	struct pmalloc_data *data;
+	unsigned int order;
+
+	if (unlikely(!req_size || !pool))
+		return -1;
+
+	order = (unsigned int)pool->min_alloc_order;
+	data = pool->data;
+
+	if (data == NULL)
+		return -1;
+
+	if (unlikely(data->protected)) {
+		WARN_ON(1);
+		return -1;
+	}
+	return 0;
+}
+
+
+static inline bool chunk_tagging(void *chunk, bool tag)
+{
+	struct vm_struct *area;
+	struct page *page;
+
+	if (!is_vmalloc_addr(chunk))
+		return false;
+
+	page = vmalloc_to_page(chunk);
+	if (unlikely(!page))
+		return false;
+
+	area = page->area;
+	if (tag)
+		area->flags |= VM_PMALLOC;
+	else
+		area->flags &= ~VM_PMALLOC;
+	return true;
+}
+
+
+static inline bool tag_chunk(void *chunk)
+{
+	return chunk_tagging(chunk, true);
+}
+
+
+static inline bool untag_chunk(void *chunk)
+{
+	return chunk_tagging(chunk, false);
+}
+
+enum {
+	INVALID_PMALLOC_OBJECT = -1,
+	NOT_PMALLOC_OBJECT = 0,
+	VALID_PMALLOC_OBJECT = 1,
+};
+
+int is_pmalloc_object(const void *ptr, const unsigned long n)
+{
+	struct vm_struct *area;
+	struct page *page;
+	unsigned long area_start;
+	unsigned long area_end;
+	unsigned long object_start;
+	unsigned long object_end;
+
+
+	/* is_pmalloc_object gets called pretty late, so chances are high
+	 * that the object is indeed of vmalloc type
+	 */
+	if (unlikely(!is_vmalloc_addr(ptr)))
+		return NOT_PMALLOC_OBJECT;
+
+	page = vmalloc_to_page(ptr);
+	if (unlikely(!page))
+		return NOT_PMALLOC_OBJECT;
+
+	area = page->area;
+
+	if (likely(!(area->flags & VM_PMALLOC)))
+		return NOT_PMALLOC_OBJECT;
+
+	area_start = (unsigned long)area->addr;
+	area_end = area_start + area->nr_pages * PAGE_SIZE - 1;
+	object_start = (unsigned long)ptr;
+	object_end = object_start + n - 1;
+
+	if (likely((area_start <= object_start) &&
+		   (object_end <= area_end)))
+		return VALID_PMALLOC_OBJECT;
+	else
+		return INVALID_PMALLOC_OBJECT;
+}
+
+
+bool pmalloc_prealloc(struct gen_pool *pool, size_t size)
+{
+	void *chunk;
+	size_t chunk_size;
+	int add_error;
+	unsigned int order;
+
+	if (check_alloc_params(pool, size))
+		return false;
+
+	order = (unsigned int)pool->min_alloc_order;
+
+	/* Expand pool */
+	chunk_size = roundup(size, PAGE_SIZE);
+	chunk = vmalloc(chunk_size);
+	if (unlikely(chunk == NULL))
+		return false;
+	if (unlikely(!tag_chunk(chunk)))
+		goto abort;
+
+	/* Locking is already done inside gen_pool_add */
+	add_error = gen_pool_add(pool, (unsigned long)chunk, chunk_size,
+				 NUMA_NO_NODE);
+	if (unlikely(add_error != 0))
+		goto abort;
+
+	return true;
+abort:
+	vfree(chunk);
+	return false;
+
+}
+
+void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp)
+{
+	void *chunk;
+	size_t chunk_size;
+	int add_error;
+	unsigned long retval;
+	unsigned int order;
+
+	if (check_alloc_params(pool, size))
+		return NULL;
+
+	order = (unsigned int)pool->min_alloc_order;
+
+retry_alloc_from_pool:
+	retval = gen_pool_alloc(pool, size);
+	if (retval)
+		goto return_allocation;
+
+	if (unlikely((gfp & __GFP_ATOMIC))) {
+		if (unlikely((gfp & __GFP_NOFAIL)))
+			goto retry_alloc_from_pool;
+		else
+			return NULL;
+	}
+
+	/* Expand pool */
+	chunk_size = roundup(size, PAGE_SIZE);
+	chunk = vmalloc(chunk_size);
+	if (unlikely(!chunk)) {
+		if (unlikely((gfp & __GFP_NOFAIL)))
+			goto retry_alloc_from_pool;
+		else
+			return NULL;
+	}
+	if (unlikely(!tag_chunk(chunk)))
+		goto free;
+
+	/* Locking is already done inside gen_pool_add */
+	add_error = gen_pool_add(pool, (unsigned long)chunk, chunk_size,
+				 NUMA_NO_NODE);
+	if (unlikely(add_error))
+		goto abort;
+
+	retval = gen_pool_alloc(pool, size);
+	if (retval) {
+return_allocation:
+		*(size_t *)retval = size;
+		if (gfp & __GFP_ZERO)
+			memset((void *)retval, 0, size);
+		return (void *)retval;
+	}
+	/* Here there is no test for __GFP_NOFAIL because, in case of
+	 * concurrent allocation, one thread might add a chunk to the
+	 * pool and this memory could be allocated by another thread,
+	 * before the first thread gets a chance to use it.
+	 * As long as vmalloc succeeds, it's ok to retry.
+	 */
+	goto retry_alloc_from_pool;
+abort:
+	untag_chunk(chunk);
+free:
+	vfree(chunk);
+	return NULL;
+}
+
+static void pmalloc_chunk_set_protection(struct gen_pool *pool,
+
+					 struct gen_pool_chunk *chunk,
+					 void *data)
+{
+	const bool *flag = data;
+	size_t chunk_size = chunk->end_addr + 1 - chunk->start_addr;
+	unsigned long pages = chunk_size / PAGE_SIZE;
+
+	BUG_ON(chunk_size & (PAGE_SIZE - 1));
+
+	if (*flag)
+		set_memory_ro(chunk->start_addr, pages);
+	else
+		set_memory_rw(chunk->start_addr, pages);
+}
+
+static int pmalloc_pool_set_protection(struct gen_pool *pool, bool protection)
+{
+	struct pmalloc_data *data;
+	struct gen_pool_chunk *chunk;
+
+	if (unlikely(!pool))
+		return -EINVAL;
+
+	data = pool->data;
+
+	if (unlikely(!data))
+		return -EINVAL;
+
+	if (unlikely(data->protected == protection)) {
+		WARN_ON(1);
+		return 0;
+	}
+
+	data->protected = protection;
+	list_for_each_entry(chunk, &(pool)->chunks, next_chunk)
+		pmalloc_chunk_set_protection(pool, chunk, &protection);
+	return 0;
+}
+
+int pmalloc_protect_pool(struct gen_pool *pool)
+{
+	return pmalloc_pool_set_protection(pool, true);
+}
+
+
+static void pmalloc_chunk_free(struct gen_pool *pool,
+			       struct gen_pool_chunk *chunk, void *data)
+{
+	untag_chunk((void *)chunk->start_addr);
+	gen_pool_flush_chunk(pool, chunk);
+	vfree_atomic((void *)chunk->start_addr);
+}
+
+
+int pmalloc_destroy_pool(struct gen_pool *pool)
+{
+	struct pmalloc_data *data;
+
+	if (unlikely(pool == NULL))
+		return -EINVAL;
+
+	data = pool->data;
+
+	if (unlikely(data == NULL))
+		return -EINVAL;
+
+	mutex_lock(&pmalloc_mutex);
+	list_del(&data->node);
+	mutex_unlock(&pmalloc_mutex);
+
+	if (likely(data->pool_kobject))
+		pmalloc_disconnect(data, data->pool_kobject);
+
+	pmalloc_pool_set_protection(pool, false);
+	gen_pool_for_each_chunk(pool, pmalloc_chunk_free, NULL);
+	gen_pool_destroy(pool);
+	kfree(data);
+	return 0;
+}
+
+/**
+ * When the sysfs is ready to receive registrations, connect all the
+ * pools previously created. Also enable further pools to be connected
+ * right away.
+ */
+static int __init pmalloc_late_init(void)
+{
+	struct pmalloc_data *data, *n;
+
+	pmalloc_kobject = kobject_create_and_add("pmalloc", kernel_kobj);
+
+	mutex_lock(&pmalloc_mutex);
+	pmalloc_list = &pmalloc_final_list;
+
+	if (likely(pmalloc_kobject != NULL)) {
+		list_for_each_entry_safe(data, n, &pmalloc_tmp_list, node) {
+			list_move(&data->node, &pmalloc_final_list);
+			data->pool_kobject = pmalloc_connect(data);
+		}
+	}
+	mutex_unlock(&pmalloc_mutex);
+	return 0;
+}
+late_initcall(pmalloc_late_init);
diff --git a/mm/usercopy.c b/mm/usercopy.c
index a9852b2..c3b1029 100644
--- a/mm/usercopy.c
+++ b/mm/usercopy.c
@@ -15,6 +15,7 @@
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
 #include <linux/mm.h>
+#include <linux/pmalloc.h>
 #include <linux/slab.h>
 #include <linux/sched.h>
 #include <linux/sched/task.h>
@@ -222,6 +223,7 @@ static inline const char *check_heap_object(const void *ptr, unsigned long n,
 void __check_object_size(const void *ptr, unsigned long n, bool to_user)
 {
 	const char *err;
+	int retv;
 
 	/* Skip all tests if size is zero. */
 	if (!n)
@@ -229,12 +231,12 @@ void __check_object_size(const void *ptr, unsigned long n, bool to_user)
 
 	/* Check for invalid addresses. */
 	err = check_bogus_address(ptr, n);
-	if (err)
+	if (unlikely(err))
 		goto report;
 
 	/* Check for bad heap object. */
 	err = check_heap_object(ptr, n, to_user);
-	if (err)
+	if (unlikely(err))
 		goto report;
 
 	/* Check for bad stack object. */
@@ -257,8 +259,23 @@ void __check_object_size(const void *ptr, unsigned long n, bool to_user)
 
 	/* Check for object in kernel to avoid text exposure. */
 	err = check_kernel_text_object(ptr, n);
-	if (!err)
-		return;
+	if (unlikely(err))
+		goto report;
+
+	/* Check if object is from a pmalloc chunk.
+	 */
+	retv = is_pmalloc_object(ptr, n);
+	if (unlikely(retv)) {
+		if (unlikely(!to_user)) {
+			err = "<trying to write to pmalloc object>";
+			goto report;
+		}
+		if (retv < 0) {
+			err = "<invalid pmalloc object>";
+			goto report;
+		}
+	}
+	return;
 
 report:
 	report_usercopy(ptr, n, to_user, err);
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 146+ messages in thread

* [PATCH 4/6] Protectable Memory
  2018-01-24 17:56   ` Igor Stoppa
@ 2018-01-26 19:41     ` Igor Stoppa
  -1 siblings, 0 replies; 146+ messages in thread
From: Igor Stoppa @ 2018-01-26 19:41 UTC (permalink / raw)
  To: linux-security-module

On 24/01/18 19:56, Igor Stoppa wrote:

[...]

> +bool pmalloc_prealloc(struct gen_pool *pool, size_t size)
> +{

[...]

> +abort:
> +	vfree(chunk);

this should be vfree_atomic()

[...]

> +void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp)
> +{

[...]

> +free:
> +	vfree(chunk);

and this one too

I will fix them in the next iteration.
I am waiting to see if any more comments arrive.
Otherwise, I'll send it out probably next Tuesday.

--
igor
--
To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
the body of a message to majordomo at vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 146+ messages in thread

* [PATCH 4/6] Protectable Memory
  2018-01-24 17:56 [kernel-hardening] [RFC PATCH v11 0/6] mm: security: ro protection for dynamic data Igor Stoppa
@ 2018-01-24 17:56   ` Igor Stoppa
  0 siblings, 0 replies; 146+ messages in thread
From: Igor Stoppa @ 2018-01-24 17:56 UTC (permalink / raw)
  To: linux-security-module

The MMU available in many systems running Linux can often provide R/O
protection to the memory pages it handles.

However, the MMU-based protection works efficiently only when said pages
contain exclusively data that will not need further modifications.

Statically allocated variables can be segregated into a dedicated
section, but this approach does not apply to dynamically allocated
ones.

Dynamic allocation currently provides no means of grouping variables
into memory pages that contain only data suitable for conversion to
read-only access mode.

The allocator provided here (pmalloc, the protectable memory allocator)
introduces the concept of pools of protectable memory.

A module can request a pool and then direct any allocation request to
the pool handle it has received.

Once all the chunks of memory associated with a specific pool are
initialized, the pool can be protected.

After this point, the pool can only be destroyed (it is up to the module
to avoid any further reference to the pool's memory once destruction
has been invoked).

Destruction is mainly meant for releasing memory when a module is
unloaded.

A module can have as many pools as needed, for example to support the
protection of data that is initialized in sufficiently distinct phases.

Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com>
---
 include/linux/genalloc.h |   3 +
 include/linux/pmalloc.h  | 215 ++++++++++++++++++++
 include/linux/vmalloc.h  |   1 +
 lib/genalloc.c           |  27 +++
 mm/Makefile              |   1 +
 mm/pmalloc.c             | 513 +++++++++++++++++++++++++++++++++++++++++++++++
 mm/usercopy.c            |  25 ++-
 7 files changed, 781 insertions(+), 4 deletions(-)
 create mode 100644 include/linux/pmalloc.h
 create mode 100644 mm/pmalloc.c

diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
index a8fdabf..9f2974f 100644
--- a/include/linux/genalloc.h
+++ b/include/linux/genalloc.h
@@ -121,6 +121,9 @@ extern unsigned long gen_pool_alloc_algo(struct gen_pool *, size_t,
 extern void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size,
 		dma_addr_t *dma);
 extern void gen_pool_free(struct gen_pool *, unsigned long, size_t);
+
+extern void gen_pool_flush_chunk(struct gen_pool *pool,
+				 struct gen_pool_chunk *chunk);
 extern void gen_pool_for_each_chunk(struct gen_pool *,
 	void (*)(struct gen_pool *, struct gen_pool_chunk *, void *), void *);
 extern size_t gen_pool_avail(struct gen_pool *);
diff --git a/include/linux/pmalloc.h b/include/linux/pmalloc.h
new file mode 100644
index 0000000..cb18739
--- /dev/null
+++ b/include/linux/pmalloc.h
@@ -0,0 +1,215 @@
+/*
+ * pmalloc.h: Header for Protectable Memory Allocator
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.stoppa@huawei.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; version 2
+ * of the License.
+ */
+
+#ifndef _PMALLOC_H
+#define _PMALLOC_H
+
+
+#include <linux/genalloc.h>
+#include <linux/string.h>
+
+#define PMALLOC_DEFAULT_ALLOC_ORDER (-1)
+
+/*
+ * Library for dynamic allocation of pools of memory that can be,
+ * after initialization, marked as read-only.
+ *
+ * This is intended to complement __ro_after_init, for those cases
+ * where either it is not possible to know the initialization value before
+ * init is completed, or the amount of data is variable and can be
+ * determined only at run-time.
+ *
+ * ***WARNING***
+ * The user of the API is expected to synchronize:
+ * 1) allocation,
+ * 2) writes to the allocated memory,
+ * 3) write protection of the pool,
+ * 4) freeing of the allocated memory, and
+ * 5) destruction of the pool.
+ *
+ * In a single-threaded scenario, no such locking is required.
+ *
+ * Even if the library were to provide support for locking, point 2)
+ * would still depend on the user taking the lock.
+ */
+
+
+/**
+ * pmalloc_create_pool - create a new protectable memory pool
+ * @name: the name of the pool, must be unique
+ * @min_alloc_order: log2 of the minimum allocation size obtainable
+ *                   from the pool
+ *
+ * Creates a new (empty) memory pool for allocation of protectable
+ * memory. Memory will be allocated upon request (through pmalloc).
+ *
+ * Returns a pointer to the new pool upon success, otherwise NULL.
+ */
+struct gen_pool *pmalloc_create_pool(const char *name,
+					 int min_alloc_order);
+
+
+int is_pmalloc_object(const void *ptr, const unsigned long n);
+
+/**
+ * pmalloc_prealloc - tries to allocate a memory chunk of the requested size
+ * @pool: handler to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ *
+ * Prepares a chunk of the requested size.
+ * This is intended to both minimize latency in later memory requests and
+ * avoid sleeping during allocation.
+ * Memory allocated with prealloc is stored in one single chunk, as
+ * opposed to what is allocated on demand, when pmalloc runs out of free
+ * space already existing in the pool and has to invoke vmalloc.
+ *
+ * Returns true if the vmalloc call was successful, false otherwise.
+ */
+bool pmalloc_prealloc(struct gen_pool *pool, size_t size);
+
+/**
+ * pmalloc - allocate protectable memory from a pool
+ * @pool: handler to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ * @gfp: flags for page allocation
+ *
+ * Allocates memory from an unprotected pool. If the pool doesn't have
+ * enough memory, and the request did not include GFP_ATOMIC, an attempt
+ * is made to add a new chunk of memory to the pool
+ * (a multiple of PAGE_SIZE), in order to fit the new request.
+ * Otherwise, NULL is returned.
+ *
+ * Returns the pointer to the memory requested upon success,
+ * NULL otherwise (either no memory available or pool already read-only).
+ */
+void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp);
+
+
+/**
+ * pzalloc - zero-initialized version of pmalloc
+ * @pool: handler to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ * @gfp: flags for page allocation
+ *
+ * Executes pmalloc, initializing the memory requested to 0,
+ * before returning the pointer to it.
+ *
+ * Returns the pointer to the zeroed memory requested, upon success,
+ * NULL otherwise (either no memory available or pool already read-only).
+ */
+static inline void *pzalloc(struct gen_pool *pool, size_t size, gfp_t gfp)
+{
+	return pmalloc(pool, size, gfp | __GFP_ZERO);
+}
+
+/**
+ * pmalloc_array - allocates an array according to the parameters
+ * @pool: handler to the pool to be used for memory allocation
+ * @n: number of elements in the array
+ * @size: size (in bytes) of each element
+ * @flags: flags for page allocation
+ *
+ * Executes pmalloc, if it has a chance to succeed.
+ * Returns either NULL or the pmalloc result.
+ */
+static inline void *pmalloc_array(struct gen_pool *pool, size_t n,
+				  size_t size, gfp_t flags)
+{
+	if (unlikely(!(pool && n && size)))
+		return NULL;
+	return pmalloc(pool, n * size, flags);
+}
+
+/**
+ * pcalloc - allocates a 0-initialized array according to the parameters
+ * @pool: handler to the pool to be used for memory allocation
+ * @n: number of elements in the array
+ * @size: size (in bytes) of each element
+ * @flags: flags for page allocation
+ *
+ * Executes pmalloc, if it has a chance to succeed.
+ * Returns either NULL or the pmalloc result.
+ */
+static inline void *pcalloc(struct gen_pool *pool, size_t n,
+			    size_t size, gfp_t flags)
+{
+	return pmalloc_array(pool, n, size, flags | __GFP_ZERO);
+}
+
+/**
+ * pstrdup - duplicate a string, using pmalloc as allocator
+ * @pool: handler to the pool to be used for memory allocation
+ * @s: string to duplicate
+ * @gfp: flags for page allocation
+ *
+ * Generates a copy of the given string, allocating sufficient memory
+ * from the given pmalloc pool.
+ *
+ * Returns a pointer to the replica, NULL in case of recoverable error.
+ */
+static inline char *pstrdup(struct gen_pool *pool, const char *s, gfp_t gfp)
+{
+	size_t len;
+	char *buf;
+
+	if (unlikely(pool == NULL || s == NULL))
+		return NULL;
+
+	len = strlen(s) + 1;
+	buf = pmalloc(pool, len, gfp);
+	if (likely(buf))
+		strncpy(buf, s, len);
+	return buf;
+}
+
+/**
+ * pmalloc_protect_pool - turn a read/write pool read-only
+ * @pool: the pool to protect
+ *
+ * Write-protects all the memory chunks assigned to the pool.
+ * This prevents any further allocation.
+ *
+ * Returns 0 upon success, -EINVAL in abnormal cases.
+ */
+int pmalloc_protect_pool(struct gen_pool *pool);
+
+/**
+ * pfree - mark as unused memory that was previously in use
+ * @pool: handler to the pool to be used for memory allocation
+ * @addr: the beginning of the memory area to be freed
+ *
+ * The behavior of pfree is different, depending on the state of the
+ * protection.
+ * If the pool is not yet protected, the memory is marked as unused and
+ * will be available for further allocations.
+ * If the pool is already protected, the memory is marked as unused, but
+ * it will still be impossible to perform further allocation, because of
+ * the existing protection.
+ * The freed memory, in this case, will be truly released only when the
+ * pool is destroyed.
+ */
+static inline void pfree(struct gen_pool *pool, const void *addr)
+{
+	gen_pool_free(pool, (unsigned long)addr, 0);
+}
+
+/**
+ * pmalloc_destroy_pool - destroys a pool and all the associated memory
+ * @pool: the pool to destroy
+ *
+ * All the memory that was allocated through pmalloc in the pool will be freed.
+ *
+ * Returns 0 upon success, -EINVAL in abnormal cases.
+ */
+int pmalloc_destroy_pool(struct gen_pool *pool);
+
+#endif
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 1e5d8c3..116d280 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -20,6 +20,7 @@ struct notifier_block;		/* in notifier.h */
 #define VM_UNINITIALIZED	0x00000020	/* vm_struct is not fully initialized */
 #define VM_NO_GUARD		0x00000040      /* don't add guard page */
 #define VM_KASAN		0x00000080      /* has allocated kasan shadow memory */
+#define VM_PMALLOC		0x00000100	/* pmalloc area - see docs */
 /* bits [20..32] reserved for arch specific ioremap internals */
 
 /*
diff --git a/lib/genalloc.c b/lib/genalloc.c
index 13bc8cf..8ce616fb 100644
--- a/lib/genalloc.c
+++ b/lib/genalloc.c
@@ -519,6 +519,33 @@ void gen_pool_free(struct gen_pool *pool, unsigned long addr, size_t size)
 }
 EXPORT_SYMBOL(gen_pool_free);
 
+
+/**
+ * gen_pool_flush_chunk - drops all the allocations from a specific chunk
+ * @pool:	the generic memory pool
+ * @chunk:	The chunk to wipe clear.
+ *
+ * This is meant to be called only while destroying a pool. It's up to the
+ * caller to avoid races, but really, at this point the pool should have
+ * already been retired and have become unavailable for any other sort of
+ * operation.
+ */
+void gen_pool_flush_chunk(struct gen_pool *pool,
+			  struct gen_pool_chunk *chunk)
+{
+	size_t size;
+
+	if (unlikely(!(pool && chunk)))
+		return;
+
+	size = chunk->end_addr + 1 - chunk->start_addr;
+	memset(chunk->entries, 0,
+	       DIV_ROUND_UP((size >> pool->min_alloc_order) * BITS_PER_ENTRY,
+			    BITS_PER_BYTE));
+	atomic_long_set(&chunk->avail, size);
+}
+
+
 /**
  * gen_pool_for_each_chunk - call func for every chunk of generic memory pool
  * @pool:	the generic memory pool
diff --git a/mm/Makefile b/mm/Makefile
index e669f02..a6a47e1 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -65,6 +65,7 @@ obj-$(CONFIG_SPARSEMEM)	+= sparse.o
 obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o
 obj-$(CONFIG_SLOB) += slob.o
 obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o
+obj-$(CONFIG_ARCH_HAS_SET_MEMORY) += pmalloc.o
 obj-$(CONFIG_KSM) += ksm.o
 obj-$(CONFIG_PAGE_POISONING) += page_poison.o
 obj-$(CONFIG_SLAB) += slab.o
diff --git a/mm/pmalloc.c b/mm/pmalloc.c
new file mode 100644
index 0000000..a64ac49
--- /dev/null
+++ b/mm/pmalloc.c
@@ -0,0 +1,513 @@
+/*
+ * pmalloc.c: Protectable Memory Allocator
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.stoppa@huawei.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; version 2
+ * of the License.
+ */
+
+#include <linux/printk.h>
+#include <linux/init.h>
+#include <linux/mm.h>
+#include <linux/vmalloc.h>
+#include <linux/genalloc.h>
+#include <linux/kernel.h>
+#include <linux/log2.h>
+#include <linux/slab.h>
+#include <linux/device.h>
+#include <linux/atomic.h>
+#include <linux/rculist.h>
+#include <linux/set_memory.h>
+#include <asm/cacheflush.h>
+#include <asm/page.h>
+
+/**
+ * pmalloc_data contains the data specific to a pmalloc pool,
+ * in a format compatible with the design of gen_alloc.
+ * Some of the fields are used for exposing the corresponding parameter
+ * to userspace, through sysfs.
+ */
+struct pmalloc_data {
+	struct gen_pool *pool;  /* Link back to the associated pool. */
+	bool protected;     /* Status of the pool: RO or RW. */
+	struct kobj_attribute attr_protected; /* Sysfs attribute. */
+	struct kobj_attribute attr_avail;     /* Sysfs attribute. */
+	struct kobj_attribute attr_size;      /* Sysfs attribute. */
+	struct kobj_attribute attr_chunks;    /* Sysfs attribute. */
+	struct kobject *pool_kobject;
+	struct list_head node; /* list of pools */
+};
+
+static LIST_HEAD(pmalloc_final_list);
+static LIST_HEAD(pmalloc_tmp_list);
+static struct list_head *pmalloc_list = &pmalloc_tmp_list;
+static DEFINE_MUTEX(pmalloc_mutex);
+static struct kobject *pmalloc_kobject;
+
+static ssize_t pmalloc_pool_show_protected(struct kobject *dev,
+					   struct kobj_attribute *attr,
+					   char *buf)
+{
+	struct pmalloc_data *data;
+
+	data = container_of(attr, struct pmalloc_data, attr_protected);
+	if (data->protected)
+		return sprintf(buf, "protected\n");
+	else
+		return sprintf(buf, "unprotected\n");
+}
+
+static ssize_t pmalloc_pool_show_avail(struct kobject *dev,
+				       struct kobj_attribute *attr,
+				       char *buf)
+{
+	struct pmalloc_data *data;
+
+	data = container_of(attr, struct pmalloc_data, attr_avail);
+	return sprintf(buf, "%lu\n", gen_pool_avail(data->pool));
+}
+
+static ssize_t pmalloc_pool_show_size(struct kobject *dev,
+				      struct kobj_attribute *attr,
+				      char *buf)
+{
+	struct pmalloc_data *data;
+
+	data = container_of(attr, struct pmalloc_data, attr_size);
+	return sprintf(buf, "%lu\n", gen_pool_size(data->pool));
+}
+
+static void pool_chunk_number(struct gen_pool *pool,
+			      struct gen_pool_chunk *chunk, void *data)
+{
+	unsigned long *counter = data;
+
+	(*counter)++;
+}
+
+static ssize_t pmalloc_pool_show_chunks(struct kobject *dev,
+					struct kobj_attribute *attr,
+					char *buf)
+{
+	struct pmalloc_data *data;
+	unsigned long chunks_num = 0;
+
+	data = container_of(attr, struct pmalloc_data, attr_chunks);
+	gen_pool_for_each_chunk(data->pool, pool_chunk_number, &chunks_num);
+	return sprintf(buf, "%lu\n", chunks_num);
+}
+
+/**
+ * Exposes the pool and its attributes through sysfs.
+ */
+static struct kobject *pmalloc_connect(struct pmalloc_data *data)
+{
+	const struct attribute *attrs[] = {
+		&data->attr_protected.attr,
+		&data->attr_avail.attr,
+		&data->attr_size.attr,
+		&data->attr_chunks.attr,
+		NULL
+	};
+	struct kobject *kobj;
+
+	kobj = kobject_create_and_add(data->pool->name, pmalloc_kobject);
+	if (unlikely(!kobj))
+		return NULL;
+
+	if (unlikely(sysfs_create_files(kobj, attrs) < 0)) {
+		kobject_put(kobj);
+		kobj = NULL;
+	}
+	return kobj;
+}
+
+/**
+ * Removes the pool and its attributes from sysfs.
+ */
+static void pmalloc_disconnect(struct pmalloc_data *data,
+			       struct kobject *kobj)
+{
+	const struct attribute *attrs[] = {
+		&data->attr_protected.attr,
+		&data->attr_avail.attr,
+		&data->attr_size.attr,
+		&data->attr_chunks.attr,
+		NULL
+	};
+
+	sysfs_remove_files(kobj, attrs);
+	kobject_put(kobj);
+}
+
+/**
+ * Declares an attribute of the pool.
+ */
+
+#define pmalloc_attr_init(data, attr_name) \
+do { \
+	sysfs_attr_init(&data->attr_##attr_name.attr); \
+	data->attr_##attr_name.attr.name = #attr_name; \
+	data->attr_##attr_name.attr.mode = VERIFY_OCTAL_PERMISSIONS(0444); \
+	data->attr_##attr_name.show = pmalloc_pool_show_##attr_name; \
+} while (0)
+
+struct gen_pool *pmalloc_create_pool(const char *name, int min_alloc_order)
+{
+	struct gen_pool *pool;
+	const char *pool_name;
+	struct pmalloc_data *data;
+
+	if (!name) {
+		WARN_ON(1);
+		return NULL;
+	}
+
+	if (min_alloc_order < 0)
+		min_alloc_order = ilog2(sizeof(unsigned long));
+
+	pool = gen_pool_create(min_alloc_order, NUMA_NO_NODE);
+	if (unlikely(!pool))
+		return NULL;
+
+	mutex_lock(&pmalloc_mutex);
+	list_for_each_entry(data, pmalloc_list, node)
+		if (!strcmp(name, data->pool->name))
+			goto same_name_err;
+
+	pool_name = kstrdup(name, GFP_KERNEL);
+	if (unlikely(!pool_name))
+		goto name_alloc_err;
+
+	data = kzalloc(sizeof(struct pmalloc_data), GFP_KERNEL);
+	if (unlikely(!data))
+		goto data_alloc_err;
+
+	data->protected = false;
+	data->pool = pool;
+	pmalloc_attr_init(data, protected);
+	pmalloc_attr_init(data, avail);
+	pmalloc_attr_init(data, size);
+	pmalloc_attr_init(data, chunks);
+	pool->data = data;
+	pool->name = pool_name;
+
+	list_add(&data->node, pmalloc_list);
+	if (pmalloc_list == &pmalloc_final_list)
+		data->pool_kobject = pmalloc_connect(data);
+	mutex_unlock(&pmalloc_mutex);
+	return pool;
+
+data_alloc_err:
+	kfree(pool_name);
+name_alloc_err:
+same_name_err:
+	mutex_unlock(&pmalloc_mutex);
+	gen_pool_destroy(pool);
+	return NULL;
+}
+
+static inline int check_alloc_params(struct gen_pool *pool, size_t req_size)
+{
+	struct pmalloc_data *data;
+	unsigned int order;
+
+	if (unlikely(!req_size || !pool))
+		return -1;
+
+	order = (unsigned int)pool->min_alloc_order;
+	data = pool->data;
+
+	if (data == NULL)
+		return -1;
+
+	if (unlikely(data->protected)) {
+		WARN_ON(1);
+		return -1;
+	}
+	return 0;
+}
+
+
+static inline bool chunk_tagging(void *chunk, bool tag)
+{
+	struct vm_struct *area;
+	struct page *page;
+
+	if (!is_vmalloc_addr(chunk))
+		return false;
+
+	page = vmalloc_to_page(chunk);
+	if (unlikely(!page))
+		return false;
+
+	area = page->area;
+	if (tag)
+		area->flags |= VM_PMALLOC;
+	else
+		area->flags &= ~VM_PMALLOC;
+	return true;
+}
+
+
+static inline bool tag_chunk(void *chunk)
+{
+	return chunk_tagging(chunk, true);
+}
+
+
+static inline bool untag_chunk(void *chunk)
+{
+	return chunk_tagging(chunk, false);
+}
+
+enum {
+	INVALID_PMALLOC_OBJECT = -1,
+	NOT_PMALLOC_OBJECT = 0,
+	VALID_PMALLOC_OBJECT = 1,
+};
+
+int is_pmalloc_object(const void *ptr, const unsigned long n)
+{
+	struct vm_struct *area;
+	struct page *page;
+	unsigned long area_start;
+	unsigned long area_end;
+	unsigned long object_start;
+	unsigned long object_end;
+
+
+	/* is_pmalloc_object gets called pretty late, so chances are high
+	 * that the object is indeed of vmalloc type
+	 */
+	if (unlikely(!is_vmalloc_addr(ptr)))
+		return NOT_PMALLOC_OBJECT;
+
+	page = vmalloc_to_page(ptr);
+	if (unlikely(!page))
+		return NOT_PMALLOC_OBJECT;
+
+	area = page->area;
+
+	if (likely(!(area->flags & VM_PMALLOC)))
+		return NOT_PMALLOC_OBJECT;
+
+	area_start = (unsigned long)area->addr;
+	area_end = area_start + area->nr_pages * PAGE_SIZE - 1;
+	object_start = (unsigned long)ptr;
+	object_end = object_start + n - 1;
+
+	if (likely((area_start <= object_start) &&
+		   (object_end <= area_end)))
+		return VALID_PMALLOC_OBJECT;
+	else
+		return INVALID_PMALLOC_OBJECT;
+}
+
+
+bool pmalloc_prealloc(struct gen_pool *pool, size_t size)
+{
+	void *chunk;
+	size_t chunk_size;
+	bool add_error;
+	unsigned int order;
+
+	if (check_alloc_params(pool, size))
+		return false;
+
+	order = (unsigned int)pool->min_alloc_order;
+
+	/* Expand pool */
+	chunk_size = roundup(size, PAGE_SIZE);
+	chunk = vmalloc(chunk_size);
+	if (unlikely(chunk == NULL))
+		return false;
+
+	/* Locking is already done inside gen_pool_add */
+	add_error = gen_pool_add(pool, (unsigned long)chunk, chunk_size,
+				 NUMA_NO_NODE);
+	if (unlikely(add_error != 0))
+		goto abort;
+
+	return true;
+abort:
+	vfree(chunk);
+	return false;
+
+}
+
+void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp)
+{
+	void *chunk;
+	size_t chunk_size;
+	bool add_error;
+	unsigned long retval;
+	unsigned int order;
+
+	if (check_alloc_params(pool, size))
+		return NULL;
+
+	order = (unsigned int)pool->min_alloc_order;
+
+retry_alloc_from_pool:
+	retval = gen_pool_alloc(pool, size);
+	if (retval)
+		goto return_allocation;
+
+	if (unlikely((gfp & __GFP_ATOMIC))) {
+		if (unlikely((gfp & __GFP_NOFAIL)))
+			goto retry_alloc_from_pool;
+		else
+			return NULL;
+	}
+
+	/* Expand pool */
+	chunk_size = roundup(size, PAGE_SIZE);
+	chunk = vmalloc(chunk_size);
+	if (unlikely(!chunk)) {
+		if (unlikely((gfp & __GFP_NOFAIL)))
+			goto retry_alloc_from_pool;
+		else
+			return NULL;
+	}
+	if (unlikely(!tag_chunk(chunk)))
+		goto free;
+
+	/* Locking is already done inside gen_pool_add */
+	add_error = gen_pool_add(pool, (unsigned long)chunk, chunk_size,
+				 NUMA_NO_NODE);
+	if (unlikely(add_error))
+		goto abort;
+
+	retval = gen_pool_alloc(pool, size);
+	if (retval) {
+return_allocation:
+		*(size_t *)retval = size;
+		if (gfp & __GFP_ZERO)
+			memset((void *)retval, 0, size);
+		return (void *)retval;
+	}
+	/* Here there is no test for __GFP_NOFAIL because, in case of
+	 * concurrent allocation, one thread might add a chunk to the
+	 * pool and this memory could be allocated by another thread,
+	 * before the first thread gets a chance to use it.
+	 * As long as vmalloc succeeds, it's ok to retry.
+	 */
+	goto retry_alloc_from_pool;
+abort:
+	untag_chunk(chunk);
+free:
+	vfree(chunk);
+	return NULL;
+}
+
+static void pmalloc_chunk_set_protection(struct gen_pool *pool,
+
+					 struct gen_pool_chunk *chunk,
+					 void *data)
+{
+	const bool *flag = data;
+	size_t chunk_size = chunk->end_addr + 1 - chunk->start_addr;
+	unsigned long pages = chunk_size / PAGE_SIZE;
+
+	BUG_ON(chunk_size & (PAGE_SIZE - 1));
+
+	if (*flag)
+		set_memory_ro(chunk->start_addr, pages);
+	else
+		set_memory_rw(chunk->start_addr, pages);
+}
+
+static int pmalloc_pool_set_protection(struct gen_pool *pool, bool protection)
+{
+	struct pmalloc_data *data;
+	struct gen_pool_chunk *chunk;
+
+	if (unlikely(!pool))
+		return -EINVAL;
+
+	data = pool->data;
+
+	if (unlikely(!data))
+		return -EINVAL;
+
+	if (unlikely(data->protected == protection)) {
+		WARN_ON(1);
+		return 0;
+	}
+
+	data->protected = protection;
+	list_for_each_entry(chunk, &(pool)->chunks, next_chunk)
+		pmalloc_chunk_set_protection(pool, chunk, &protection);
+	return 0;
+}
+
+int pmalloc_protect_pool(struct gen_pool *pool)
+{
+	return pmalloc_pool_set_protection(pool, true);
+}
+
+
+static void pmalloc_chunk_free(struct gen_pool *pool,
+			       struct gen_pool_chunk *chunk, void *data)
+{
+	untag_chunk((void *)chunk->start_addr);
+	gen_pool_flush_chunk(pool, chunk);
+	vfree_atomic((void *)chunk->start_addr);
+}
+
+
+int pmalloc_destroy_pool(struct gen_pool *pool)
+{
+	struct pmalloc_data *data;
+
+	if (unlikely(pool == NULL))
+		return -EINVAL;
+
+	data = pool->data;
+
+	if (unlikely(data == NULL))
+		return -EINVAL;
+
+	mutex_lock(&pmalloc_mutex);
+	list_del(&data->node);
+	mutex_unlock(&pmalloc_mutex);
+
+	if (likely(data->pool_kobject))
+		pmalloc_disconnect(data, data->pool_kobject);
+
+	pmalloc_pool_set_protection(pool, false);
+	gen_pool_for_each_chunk(pool, pmalloc_chunk_free, NULL);
+	gen_pool_destroy(pool);
+	kfree(data);
+	return 0;
+}
+
+/**
+ * When the sysfs is ready to receive registrations, connect all the
+ * pools previously created. Also enable further pools to be connected
+ * right away.
+ */
+static int __init pmalloc_late_init(void)
+{
+	struct pmalloc_data *data, *n;
+
+	pmalloc_kobject = kobject_create_and_add("pmalloc", kernel_kobj);
+
+	mutex_lock(&pmalloc_mutex);
+	pmalloc_list = &pmalloc_final_list;
+
+	if (likely(pmalloc_kobject != NULL)) {
+		list_for_each_entry_safe(data, n, &pmalloc_tmp_list, node) {
+			list_move(&data->node, &pmalloc_final_list);
+			data->pool_kobject = pmalloc_connect(data);
+		}
+	}
+	mutex_unlock(&pmalloc_mutex);
+	return 0;
+}
+late_initcall(pmalloc_late_init);
diff --git a/mm/usercopy.c b/mm/usercopy.c
index a9852b2..c3b1029 100644
--- a/mm/usercopy.c
+++ b/mm/usercopy.c
@@ -15,6 +15,7 @@
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
 #include <linux/mm.h>
+#include <linux/pmalloc.h>
 #include <linux/slab.h>
 #include <linux/sched.h>
 #include <linux/sched/task.h>
@@ -222,6 +223,7 @@ static inline const char *check_heap_object(const void *ptr, unsigned long n,
 void __check_object_size(const void *ptr, unsigned long n, bool to_user)
 {
 	const char *err;
+	int retv;
 
 	/* Skip all tests if size is zero. */
 	if (!n)
@@ -229,12 +231,12 @@ void __check_object_size(const void *ptr, unsigned long n, bool to_user)
 
 	/* Check for invalid addresses. */
 	err = check_bogus_address(ptr, n);
-	if (err)
+	if (unlikely(err))
 		goto report;
 
 	/* Check for bad heap object. */
 	err = check_heap_object(ptr, n, to_user);
-	if (err)
+	if (unlikely(err))
 		goto report;
 
 	/* Check for bad stack object. */
@@ -257,8 +259,23 @@ void __check_object_size(const void *ptr, unsigned long n, bool to_user)
 
 	/* Check for object in kernel to avoid text exposure. */
 	err = check_kernel_text_object(ptr, n);
-	if (!err)
-		return;
+	if (unlikely(err))
+		goto report;
+
+	/* Check if object is from a pmalloc chunk.
+	 */
+	retv = is_pmalloc_object(ptr, n);
+	if (unlikely(retv)) {
+		if (unlikely(!to_user)) {
+			err = "<trying to write to pmalloc object>";
+			goto report;
+		}
+		if (retv < 0) {
+			err = "<invalid pmalloc object>";
+			goto report;
+		}
+	}
+	return;
 
 report:
 	report_usercopy(ptr, n, to_user, err);
-- 
2.9.3


* [PATCH 4/6] Protectable Memory
@ 2018-01-24 17:56   ` Igor Stoppa
  0 siblings, 0 replies; 146+ messages in thread
From: Igor Stoppa @ 2018-01-24 17:56 UTC (permalink / raw)
  To: jglisse, keescook, mhocko, labbott, hch, willy
  Cc: cl, linux-security-module, linux-mm, linux-kernel,
	kernel-hardening, Igor Stoppa

The MMU available in many systems running Linux can often provide R/O
protection to the memory pages it handles.

However, MMU-based protection works efficiently only when those pages
contain exclusively data that will not need further modification.

Statically allocated variables can be segregated into a dedicated
section, but this approach does not carry over to dynamically allocated
ones.

Dynamic allocation currently provides no means of grouping variables
into memory pages that contain only data suitable for conversion to
read-only access mode.

The allocator provided here (pmalloc, the protectable memory allocator)
introduces the concept of pools of protectable memory.

A module can request a pool and then direct all its allocation requests
to the pool handle it has received.

Once all the chunks of memory associated with a specific pool are
initialized, the pool can be protected.

After this point, the pool can only be destroyed (it is up to the module
to avoid any further reference to the pool's memory once destruction has
been invoked).

Destruction is mainly meant for releasing memory when a module is
unloaded.

A module can have as many pools as needed, for example to support the
protection of data that is initialized in sufficiently distinct phases.

Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com>
---
 include/linux/genalloc.h |   3 +
 include/linux/pmalloc.h  | 215 ++++++++++++++++++++
 include/linux/vmalloc.h  |   1 +
 lib/genalloc.c           |  27 +++
 mm/Makefile              |   1 +
 mm/pmalloc.c             | 513 +++++++++++++++++++++++++++++++++++++++++++++++
 mm/usercopy.c            |  25 ++-
 7 files changed, 781 insertions(+), 4 deletions(-)
 create mode 100644 include/linux/pmalloc.h
 create mode 100644 mm/pmalloc.c

diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
index a8fdabf..9f2974f 100644
--- a/include/linux/genalloc.h
+++ b/include/linux/genalloc.h
@@ -121,6 +121,9 @@ extern unsigned long gen_pool_alloc_algo(struct gen_pool *, size_t,
 extern void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size,
 		dma_addr_t *dma);
 extern void gen_pool_free(struct gen_pool *, unsigned long, size_t);
+
+extern void gen_pool_flush_chunk(struct gen_pool *pool,
+				 struct gen_pool_chunk *chunk);
 extern void gen_pool_for_each_chunk(struct gen_pool *,
 	void (*)(struct gen_pool *, struct gen_pool_chunk *, void *), void *);
 extern size_t gen_pool_avail(struct gen_pool *);
diff --git a/include/linux/pmalloc.h b/include/linux/pmalloc.h
new file mode 100644
index 0000000..cb18739
--- /dev/null
+++ b/include/linux/pmalloc.h
@@ -0,0 +1,215 @@
+/*
+ * pmalloc.h: Header for Protectable Memory Allocator
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.stoppa@huawei.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; version 2
+ * of the License.
+ */
+
+#ifndef _PMALLOC_H
+#define _PMALLOC_H
+
+
+#include <linux/genalloc.h>
+#include <linux/string.h>
+
+#define PMALLOC_DEFAULT_ALLOC_ORDER (-1)
+
+/*
+ * Library for dynamic allocation of pools of memory that can be,
+ * after initialization, marked as read-only.
+ *
+ * This is intended to complement __ro_after_init, for those cases
+ * where either it is not possible to know the initialization value before
+ * init is completed, or the amount of data is variable and can be
+ * determined only at run-time.
+ *
+ * ***WARNING***
+ * The user of the API is expected to synchronize:
+ * 1) allocation,
+ * 2) writes to the allocated memory,
+ * 3) write protection of the pool,
+ * 4) freeing of the allocated memory, and
+ * 5) destruction of the pool.
+ *
+ * For a non-threaded scenario, this type of locking is not even required.
+ *
+ * Even if the library were to provide support for locking, point 2)
+ * would still depend on the user taking the lock.
+ */
+
+
+/**
+ * pmalloc_create_pool - create a new protectable memory pool
+ * @name: the name of the pool, must be unique
+ * @min_alloc_order: log2 of the minimum allocation size obtainable
+ *                   from the pool
+ *
+ * Creates a new (empty) memory pool for allocation of protectable
+ * memory. Memory will be allocated upon request (through pmalloc).
+ *
+ * Returns a pointer to the new pool upon success, otherwise NULL.
+ */
+struct gen_pool *pmalloc_create_pool(const char *name,
+					 int min_alloc_order);
+
+
+int is_pmalloc_object(const void *ptr, const unsigned long n);
+
+/**
+ * pmalloc_prealloc - tries to allocate a memory chunk of the requested size
+ * @pool: handler to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ *
+ * Prepares a chunk of the requested size.
+ * This is intended to both minimize latency in later memory requests and
+ * avoid sleeping during allocation.
+ * Memory allocated with prealloc is stored in a single chunk, as
+ * opposed to memory allocated on demand when pmalloc runs out of free
+ * space in the pool and has to invoke vmalloc.
+ *
+ * Returns true if the vmalloc call was successful, false otherwise.
+ */
+bool pmalloc_prealloc(struct gen_pool *pool, size_t size);
+
+/**
+ * pmalloc - allocate protectable memory from a pool
+ * @pool: handler to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ * @gfp: flags for page allocation
+ *
+ * Allocates memory from an unprotected pool. If the pool doesn't have
+ * enough memory, and the request did not include GFP_ATOMIC, an attempt
+ * is made to add a new chunk of memory to the pool
+ * (a multiple of PAGE_SIZE), in order to fit the new request.
+ * Otherwise, NULL is returned.
+ *
+ * Returns the pointer to the memory requested upon success,
+ * NULL otherwise (either no memory available or pool already read-only).
+ */
+void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp);
+
+
+/**
+ * pzalloc - zero-initialized version of pmalloc
+ * @pool: handler to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ * @gfp: flags for page allocation
+ *
+ * Executes pmalloc, initializing the memory requested to 0,
+ * before returning the pointer to it.
+ *
+ * Returns the pointer to the zeroed memory requested, upon success,
+ * NULL otherwise (either no memory available or pool already read-only).
+ */
+static inline void *pzalloc(struct gen_pool *pool, size_t size, gfp_t gfp)
+{
+	return pmalloc(pool, size, gfp | __GFP_ZERO);
+}
+
+/**
+ * pmalloc_array - allocates an array according to the parameters
+ * @pool: handler to the pool to be used for memory allocation
+ * @n: number of elements in the array
+ * @size: size of each element, in bytes
+ * @flags: flags for page allocation
+ *
+ * Executes pmalloc for @n * @size bytes, unless a parameter is 0/NULL.
+ * Returns either NULL or the pmalloc result.
+ */
+static inline void *pmalloc_array(struct gen_pool *pool, size_t n,
+				  size_t size, gfp_t flags)
+{
+	if (unlikely(!(pool && n && size)))
+		return NULL;
+	return pmalloc(pool, n * size, flags);
+}
+
+/**
+ * pcalloc - allocates a 0-initialized array according to the parameters
+ * @pool: handler to the pool to be used for memory allocation
+ * @n: number of elements in the array
+ * @size: size of each element, in bytes
+ * @flags: flags for page allocation
+ *
+ * Executes pmalloc_array with __GFP_ZERO added to @flags.
+ * Returns either NULL or the zeroed pmalloc result.
+ */
+static inline void *pcalloc(struct gen_pool *pool, size_t n,
+			    size_t size, gfp_t flags)
+{
+	return pmalloc_array(pool, n, size, flags | __GFP_ZERO);
+}
+
+/**
+ * pstrdup - duplicate a string, using pmalloc as allocator
+ * @pool: handler to the pool to be used for memory allocation
+ * @s: string to duplicate
+ * @gfp: flags for page allocation
+ *
+ * Generates a copy of the given string, allocating sufficient memory
+ * from the given pmalloc pool.
+ *
+ * Returns a pointer to the replica, NULL in case of recoverable error.
+ */
+static inline char *pstrdup(struct gen_pool *pool, const char *s, gfp_t gfp)
+{
+	size_t len;
+	char *buf;
+
+	if (unlikely(pool == NULL || s == NULL))
+		return NULL;
+
+	len = strlen(s) + 1;
+	buf = pmalloc(pool, len, gfp);
+	if (likely(buf))
+		strncpy(buf, s, len);
+	return buf;
+}
+
+/**
+ * pmalloc_protect_pool - turn a read/write pool read-only
+ * @pool: the pool to protect
+ *
+ * Write-protects all the memory chunks assigned to the pool.
+ * This prevents any further allocation.
+ *
+ * Returns 0 upon success, -EINVAL in abnormal cases.
+ */
+int pmalloc_protect_pool(struct gen_pool *pool);
+
+/**
+ * pfree - mark as unused memory that was previously in use
+ * @pool: handler to the pool to be used for memory allocation
+ * @addr: the beginning of the memory area to be freed
+ *
+ * The behavior of pfree is different, depending on the state of the
+ * protection.
+ * If the pool is not yet protected, the memory is marked as unused and
+ * will be available for further allocations.
+ * If the pool is already protected, the memory is marked as unused, but
+ * it will still be impossible to perform further allocation, because of
+ * the existing protection.
+ * The freed memory, in this case, will be truly released only when the
+ * pool is destroyed.
+ */
+static inline void pfree(struct gen_pool *pool, const void *addr)
+{
+	gen_pool_free(pool, (unsigned long)addr, 0);
+}
+
+/**
+ * pmalloc_destroy_pool - destroys a pool and all the associated memory
+ * @pool: the pool to destroy
+ *
+ * All the memory that was allocated through pmalloc in the pool will be freed.
+ *
+ * Returns 0 upon success, -EINVAL in abnormal cases.
+ */
+int pmalloc_destroy_pool(struct gen_pool *pool);
+
+#endif
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 1e5d8c3..116d280 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -20,6 +20,7 @@ struct notifier_block;		/* in notifier.h */
 #define VM_UNINITIALIZED	0x00000020	/* vm_struct is not fully initialized */
 #define VM_NO_GUARD		0x00000040      /* don't add guard page */
 #define VM_KASAN		0x00000080      /* has allocated kasan shadow memory */
+#define VM_PMALLOC		0x00000100	/* pmalloc area - see docs */
 /* bits [20..32] reserved for arch specific ioremap internals */
 
 /*
diff --git a/lib/genalloc.c b/lib/genalloc.c
index 13bc8cf..8ce616fb 100644
--- a/lib/genalloc.c
+++ b/lib/genalloc.c
@@ -519,6 +519,33 @@ void gen_pool_free(struct gen_pool *pool, unsigned long addr, size_t size)
 }
 EXPORT_SYMBOL(gen_pool_free);
 
+
+/**
+ * gen_pool_flush_chunk - drops all the allocations from a specific chunk
+ * @pool:	the generic memory pool
+ * @chunk:	The chunk to wipe clear.
+ *
+ * This is meant to be called only while destroying a pool. It's up to the
+ * caller to avoid races, but really, at this point the pool should have
+ * already been retired and have become unavailable for any other sort of
+ * operation.
+ */
+void gen_pool_flush_chunk(struct gen_pool *pool,
+			  struct gen_pool_chunk *chunk)
+{
+	size_t size;
+
+	if (unlikely(!(pool && chunk)))
+		return;
+
+	size = chunk->end_addr + 1 - chunk->start_addr;
+	memset(chunk->entries, 0,
+	       DIV_ROUND_UP((size >> pool->min_alloc_order) * BITS_PER_ENTRY,
+			    BITS_PER_BYTE));
+	atomic_set(&chunk->avail, size);
+}
+
+
 /**
  * gen_pool_for_each_chunk - call func for every chunk of generic memory pool
  * @pool:	the generic memory pool
diff --git a/mm/Makefile b/mm/Makefile
index e669f02..a6a47e1 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -65,6 +65,7 @@ obj-$(CONFIG_SPARSEMEM)	+= sparse.o
 obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o
 obj-$(CONFIG_SLOB) += slob.o
 obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o
+obj-$(CONFIG_ARCH_HAS_SET_MEMORY) += pmalloc.o
 obj-$(CONFIG_KSM) += ksm.o
 obj-$(CONFIG_PAGE_POISONING) += page_poison.o
 obj-$(CONFIG_SLAB) += slab.o
diff --git a/mm/pmalloc.c b/mm/pmalloc.c
new file mode 100644
index 0000000..a64ac49
--- /dev/null
+++ b/mm/pmalloc.c
@@ -0,0 +1,513 @@
+/*
+ * pmalloc.c: Protectable Memory Allocator
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.stoppa@huawei.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; version 2
+ * of the License.
+ */
+
+#include <linux/printk.h>
+#include <linux/init.h>
+#include <linux/mm.h>
+#include <linux/vmalloc.h>
+#include <linux/genalloc.h>
+#include <linux/kernel.h>
+#include <linux/log2.h>
+#include <linux/slab.h>
+#include <linux/device.h>
+#include <linux/atomic.h>
+#include <linux/rculist.h>
+#include <linux/set_memory.h>
+#include <asm/cacheflush.h>
+#include <asm/page.h>
+
+/**
+ * pmalloc_data contains the data specific to a pmalloc pool,
+ * in a format compatible with the design of gen_alloc.
+ * Some of the fields are used for exposing the corresponding parameter
+ * to userspace, through sysfs.
+ */
+struct pmalloc_data {
+	struct gen_pool *pool;  /* Link back to the associated pool. */
+	bool protected;     /* Status of the pool: RO or RW. */
+	struct kobj_attribute attr_protected; /* Sysfs attribute. */
+	struct kobj_attribute attr_avail;     /* Sysfs attribute. */
+	struct kobj_attribute attr_size;      /* Sysfs attribute. */
+	struct kobj_attribute attr_chunks;    /* Sysfs attribute. */
+	struct kobject *pool_kobject;
+	struct list_head node; /* list of pools */
+};
+
+static LIST_HEAD(pmalloc_final_list);
+static LIST_HEAD(pmalloc_tmp_list);
+static struct list_head *pmalloc_list = &pmalloc_tmp_list;
+static DEFINE_MUTEX(pmalloc_mutex);
+static struct kobject *pmalloc_kobject;
+
+static ssize_t pmalloc_pool_show_protected(struct kobject *dev,
+					   struct kobj_attribute *attr,
+					   char *buf)
+{
+	struct pmalloc_data *data;
+
+	data = container_of(attr, struct pmalloc_data, attr_protected);
+	if (data->protected)
+		return sprintf(buf, "protected\n");
+	else
+		return sprintf(buf, "unprotected\n");
+}
+
+static ssize_t pmalloc_pool_show_avail(struct kobject *dev,
+				       struct kobj_attribute *attr,
+				       char *buf)
+{
+	struct pmalloc_data *data;
+
+	data = container_of(attr, struct pmalloc_data, attr_avail);
+	return sprintf(buf, "%zu\n", gen_pool_avail(data->pool));
+}
+
+static ssize_t pmalloc_pool_show_size(struct kobject *dev,
+				      struct kobj_attribute *attr,
+				      char *buf)
+{
+	struct pmalloc_data *data;
+
+	data = container_of(attr, struct pmalloc_data, attr_size);
+	return sprintf(buf, "%zu\n", gen_pool_size(data->pool));
+}
+
+static void pool_chunk_number(struct gen_pool *pool,
+			      struct gen_pool_chunk *chunk, void *data)
+{
+	unsigned long *counter = data;
+
+	(*counter)++;
+}
+
+static ssize_t pmalloc_pool_show_chunks(struct kobject *dev,
+					struct kobj_attribute *attr,
+					char *buf)
+{
+	struct pmalloc_data *data;
+	unsigned long chunks_num = 0;
+
+	data = container_of(attr, struct pmalloc_data, attr_chunks);
+	gen_pool_for_each_chunk(data->pool, pool_chunk_number, &chunks_num);
+	return sprintf(buf, "%lu\n", chunks_num);
+}
+
+/**
+ * Exposes the pool and its attributes through sysfs.
+ */
+static struct kobject *pmalloc_connect(struct pmalloc_data *data)
+{
+	const struct attribute *attrs[] = {
+		&data->attr_protected.attr,
+		&data->attr_avail.attr,
+		&data->attr_size.attr,
+		&data->attr_chunks.attr,
+		NULL
+	};
+	struct kobject *kobj;
+
+	kobj = kobject_create_and_add(data->pool->name, pmalloc_kobject);
+	if (unlikely(!kobj))
+		return NULL;
+
+	if (unlikely(sysfs_create_files(kobj, attrs) < 0)) {
+		kobject_put(kobj);
+		kobj = NULL;
+	}
+	return kobj;
+}
+
+/**
+ * Removes the pool and its attributes from sysfs.
+ */
+static void pmalloc_disconnect(struct pmalloc_data *data,
+			       struct kobject *kobj)
+{
+	const struct attribute *attrs[] = {
+		&data->attr_protected.attr,
+		&data->attr_avail.attr,
+		&data->attr_size.attr,
+		&data->attr_chunks.attr,
+		NULL
+	};
+
+	sysfs_remove_files(kobj, attrs);
+	kobject_put(kobj);
+}
+
+/**
+ * Declares an attribute of the pool.
+ */
+
+#define pmalloc_attr_init(data, attr_name) \
+do { \
+	sysfs_attr_init(&data->attr_##attr_name.attr); \
+	data->attr_##attr_name.attr.name = #attr_name; \
+	data->attr_##attr_name.attr.mode = VERIFY_OCTAL_PERMISSIONS(0444); \
+	data->attr_##attr_name.show = pmalloc_pool_show_##attr_name; \
+} while (0)
+
+struct gen_pool *pmalloc_create_pool(const char *name, int min_alloc_order)
+{
+	struct gen_pool *pool;
+	const char *pool_name;
+	struct pmalloc_data *data;
+
+	if (!name) {
+		WARN_ON(1);
+		return NULL;
+	}
+
+	if (min_alloc_order < 0)
+		min_alloc_order = ilog2(sizeof(unsigned long));
+
+	pool = gen_pool_create(min_alloc_order, NUMA_NO_NODE);
+	if (unlikely(!pool))
+		return NULL;
+
+	mutex_lock(&pmalloc_mutex);
+	list_for_each_entry(data, pmalloc_list, node)
+		if (!strcmp(name, data->pool->name))
+			goto same_name_err;
+
+	pool_name = kstrdup(name, GFP_KERNEL);
+	if (unlikely(!pool_name))
+		goto name_alloc_err;
+
+	data = kzalloc(sizeof(struct pmalloc_data), GFP_KERNEL);
+	if (unlikely(!data))
+		goto data_alloc_err;
+
+	data->protected = false;
+	data->pool = pool;
+	pmalloc_attr_init(data, protected);
+	pmalloc_attr_init(data, avail);
+	pmalloc_attr_init(data, size);
+	pmalloc_attr_init(data, chunks);
+	pool->data = data;
+	pool->name = pool_name;
+
+	list_add(&data->node, pmalloc_list);
+	if (pmalloc_list == &pmalloc_final_list)
+		data->pool_kobject = pmalloc_connect(data);
+	mutex_unlock(&pmalloc_mutex);
+	return pool;
+
+data_alloc_err:
+	kfree(pool_name);
+name_alloc_err:
+same_name_err:
+	mutex_unlock(&pmalloc_mutex);
+	gen_pool_destroy(pool);
+	return NULL;
+}
+
+static inline int check_alloc_params(struct gen_pool *pool, size_t req_size)
+{
+	struct pmalloc_data *data;
+	unsigned int order;
+
+	if (unlikely(!req_size || !pool))
+		return -1;
+
+	order = (unsigned int)pool->min_alloc_order;
+	data = pool->data;
+
+	if (data == NULL)
+		return -1;
+
+	if (unlikely(data->protected)) {
+		WARN_ON(1);
+		return -1;
+	}
+	return 0;
+}
+
+
+static inline bool chunk_tagging(void *chunk, bool tag)
+{
+	struct vm_struct *area;
+	struct page *page;
+
+	if (!is_vmalloc_addr(chunk))
+		return false;
+
+	page = vmalloc_to_page(chunk);
+	if (unlikely(!page))
+		return false;
+
+	area = page->area;
+	if (tag)
+		area->flags |= VM_PMALLOC;
+	else
+		area->flags &= ~VM_PMALLOC;
+	return true;
+}
+
+
+static inline bool tag_chunk(void *chunk)
+{
+	return chunk_tagging(chunk, true);
+}
+
+
+static inline bool untag_chunk(void *chunk)
+{
+	return chunk_tagging(chunk, false);
+}
+
+enum {
+	INVALID_PMALLOC_OBJECT = -1,
+	NOT_PMALLOC_OBJECT = 0,
+	VALID_PMALLOC_OBJECT = 1,
+};
+
+int is_pmalloc_object(const void *ptr, const unsigned long n)
+{
+	struct vm_struct *area;
+	struct page *page;
+	unsigned long area_start;
+	unsigned long area_end;
+	unsigned long object_start;
+	unsigned long object_end;
+
+
+	/* is_pmalloc_object gets called pretty late, so chances are high
+	 * that the object is indeed of vmalloc type
+	 */
+	if (unlikely(!is_vmalloc_addr(ptr)))
+		return NOT_PMALLOC_OBJECT;
+
+	page = vmalloc_to_page(ptr);
+	if (unlikely(!page))
+		return NOT_PMALLOC_OBJECT;
+
+	area = page->area;
+
+	if (likely(!(area->flags & VM_PMALLOC)))
+		return NOT_PMALLOC_OBJECT;
+
+	area_start = (unsigned long)area->addr;
+	area_end = area_start + area->nr_pages * PAGE_SIZE - 1;
+	object_start = (unsigned long)ptr;
+	object_end = object_start + n - 1;
+
+	if (likely((area_start <= object_start) &&
+		   (object_end <= area_end)))
+		return VALID_PMALLOC_OBJECT;
+	else
+		return INVALID_PMALLOC_OBJECT;
+}
+
+
+bool pmalloc_prealloc(struct gen_pool *pool, size_t size)
+{
+	void *chunk;
+	size_t chunk_size;
+	bool add_error;
+	unsigned int order;
+
+	if (check_alloc_params(pool, size))
+		return false;
+
+	order = (unsigned int)pool->min_alloc_order;
+
+	/* Expand pool */
+	chunk_size = roundup(size, PAGE_SIZE);
+	chunk = vmalloc(chunk_size);
+	if (unlikely(chunk == NULL || !tag_chunk(chunk)))
+		goto abort;	/* vfree(NULL) is a no-op */
+
+	/* Locking is already done inside gen_pool_add */
+	add_error = gen_pool_add(pool, (unsigned long)chunk, chunk_size,
+				 NUMA_NO_NODE);
+	if (unlikely(add_error != 0))
+		goto abort;
+
+	return true;
+abort:
+	vfree(chunk);
+	return false;
+
+}
+
+void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp)
+{
+	void *chunk;
+	size_t chunk_size;
+	bool add_error;
+	unsigned long retval;
+	unsigned int order;
+
+	if (check_alloc_params(pool, size))
+		return NULL;
+
+	order = (unsigned int)pool->min_alloc_order;
+
+retry_alloc_from_pool:
+	retval = gen_pool_alloc(pool, size);
+	if (retval)
+		goto return_allocation;
+
+	if (unlikely((gfp & __GFP_ATOMIC))) {
+		if (unlikely((gfp & __GFP_NOFAIL)))
+			goto retry_alloc_from_pool;
+		else
+			return NULL;
+	}
+
+	/* Expand pool */
+	chunk_size = roundup(size, PAGE_SIZE);
+	chunk = vmalloc(chunk_size);
+	if (unlikely(!chunk)) {
+		if (unlikely((gfp & __GFP_NOFAIL)))
+			goto retry_alloc_from_pool;
+		else
+			return NULL;
+	}
+	if (unlikely(!tag_chunk(chunk)))
+		goto free;
+
+	/* Locking is already done inside gen_pool_add */
+	add_error = gen_pool_add(pool, (unsigned long)chunk, chunk_size,
+				 NUMA_NO_NODE);
+	if (unlikely(add_error))
+		goto abort;
+
+	retval = gen_pool_alloc(pool, size);
+	if (retval) {
+return_allocation:
+		*(size_t *)retval = size;
+		if (gfp & __GFP_ZERO)
+			memset((void *)retval, 0, size);
+		return (void *)retval;
+	}
+	/* Here there is no test for __GFP_NOFAIL because, in case of
+	 * concurrent allocation, one thread might add a chunk to the
+	 * pool and this memory could be allocated by another thread,
+	 * before the first thread gets a chance to use it.
+	 * As long as vmalloc succeeds, it's ok to retry.
+	 */
+	goto retry_alloc_from_pool;
+abort:
+	untag_chunk(chunk);
+free:
+	vfree(chunk);
+	return NULL;
+}
+
+static void pmalloc_chunk_set_protection(struct gen_pool *pool,
+
+					 struct gen_pool_chunk *chunk,
+					 void *data)
+{
+	const bool *flag = data;
+	size_t chunk_size = chunk->end_addr + 1 - chunk->start_addr;
+	unsigned long pages = chunk_size / PAGE_SIZE;
+
+	BUG_ON(chunk_size & (PAGE_SIZE - 1));
+
+	if (*flag)
+		set_memory_ro(chunk->start_addr, pages);
+	else
+		set_memory_rw(chunk->start_addr, pages);
+}
+
+static int pmalloc_pool_set_protection(struct gen_pool *pool, bool protection)
+{
+	struct pmalloc_data *data;
+	struct gen_pool_chunk *chunk;
+
+	if (unlikely(!pool))
+		return -EINVAL;
+
+	data = pool->data;
+
+	if (unlikely(!data))
+		return -EINVAL;
+
+	if (unlikely(data->protected == protection)) {
+		WARN_ON(1);
+		return 0;
+	}
+
+	data->protected = protection;
+	list_for_each_entry(chunk, &(pool)->chunks, next_chunk)
+		pmalloc_chunk_set_protection(pool, chunk, &protection);
+	return 0;
+}
+
+int pmalloc_protect_pool(struct gen_pool *pool)
+{
+	return pmalloc_pool_set_protection(pool, true);
+}
+
+
+static void pmalloc_chunk_free(struct gen_pool *pool,
+			       struct gen_pool_chunk *chunk, void *data)
+{
+	untag_chunk((void *)chunk->start_addr);
+	gen_pool_flush_chunk(pool, chunk);
+	vfree_atomic((void *)chunk->start_addr);
+}
+
+
+int pmalloc_destroy_pool(struct gen_pool *pool)
+{
+	struct pmalloc_data *data;
+
+	if (unlikely(pool == NULL))
+		return -EINVAL;
+
+	data = pool->data;
+
+	if (unlikely(data == NULL))
+		return -EINVAL;
+
+	mutex_lock(&pmalloc_mutex);
+	list_del(&data->node);
+	mutex_unlock(&pmalloc_mutex);
+
+	if (likely(data->pool_kobject))
+		pmalloc_disconnect(data, data->pool_kobject);
+
+	pmalloc_pool_set_protection(pool, false);
+	gen_pool_for_each_chunk(pool, pmalloc_chunk_free, NULL);
+	gen_pool_destroy(pool);
+	kfree(data);
+	return 0;
+}
+
+/**
+ * When the sysfs is ready to receive registrations, connect all the
+ * pools previously created. Also enable further pools to be connected
+ * right away.
+ */
+static int __init pmalloc_late_init(void)
+{
+	struct pmalloc_data *data, *n;
+
+	pmalloc_kobject = kobject_create_and_add("pmalloc", kernel_kobj);
+
+	mutex_lock(&pmalloc_mutex);
+	pmalloc_list = &pmalloc_final_list;
+
+	if (likely(pmalloc_kobject != NULL)) {
+		list_for_each_entry_safe(data, n, &pmalloc_tmp_list, node) {
+			list_move(&data->node, &pmalloc_final_list);
+			data->pool_kobject = pmalloc_connect(data);
+		}
+	}
+	mutex_unlock(&pmalloc_mutex);
+	return 0;
+}
+late_initcall(pmalloc_late_init);
diff --git a/mm/usercopy.c b/mm/usercopy.c
index a9852b2..c3b1029 100644
--- a/mm/usercopy.c
+++ b/mm/usercopy.c
@@ -15,6 +15,7 @@
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
 #include <linux/mm.h>
+#include <linux/pmalloc.h>
 #include <linux/slab.h>
 #include <linux/sched.h>
 #include <linux/sched/task.h>
@@ -222,6 +223,7 @@ static inline const char *check_heap_object(const void *ptr, unsigned long n,
 void __check_object_size(const void *ptr, unsigned long n, bool to_user)
 {
 	const char *err;
+	int retv;
 
 	/* Skip all tests if size is zero. */
 	if (!n)
@@ -229,12 +231,12 @@ void __check_object_size(const void *ptr, unsigned long n, bool to_user)
 
 	/* Check for invalid addresses. */
 	err = check_bogus_address(ptr, n);
-	if (err)
+	if (unlikely(err))
 		goto report;
 
 	/* Check for bad heap object. */
 	err = check_heap_object(ptr, n, to_user);
-	if (err)
+	if (unlikely(err))
 		goto report;
 
 	/* Check for bad stack object. */
@@ -257,8 +259,23 @@ void __check_object_size(const void *ptr, unsigned long n, bool to_user)
 
 	/* Check for object in kernel to avoid text exposure. */
 	err = check_kernel_text_object(ptr, n);
-	if (!err)
-		return;
+	if (unlikely(err))
+		goto report;
+
+	/* Check if object is from a pmalloc chunk.
+	 */
+	retv = is_pmalloc_object(ptr, n);
+	if (unlikely(retv)) {
+		if (unlikely(!to_user)) {
+			err = "<trying to write to pmalloc object>";
+			goto report;
+		}
+		if (retv < 0) {
+			err = "<invalid pmalloc object>";
+			goto report;
+		}
+	}
+	return;
 
 report:
 	report_usercopy(ptr, n, to_user, err);
-- 
2.9.3


end of thread

Thread overview: 146+ messages
2018-02-11  3:19 [RFC PATCH v15 0/6] mm: security: ro protection for dynamic data Igor Stoppa
2018-02-11  3:19 ` [PATCH 1/6] genalloc: track beginning of allocations Igor Stoppa
2018-02-11 12:24   ` Mike Rapoport
2018-02-12 11:17     ` Igor Stoppa
2018-02-12 11:36       ` Mike Rapoport
2018-02-13  0:43   ` kbuild test robot
2018-02-11  3:19 ` [PATCH 2/6] genalloc: selftest Igor Stoppa
2018-02-11 20:22   ` Philippe Ombredanne
2018-02-11 20:27     ` Randy Dunlap
2018-02-11 21:01       ` Matthew Wilcox
2018-02-11  3:19 ` [PATCH 3/6] struct page: add field for vm_struct Igor Stoppa
2018-02-11 21:16   ` Matthew Wilcox
2018-02-12 16:24     ` Igor Stoppa
2018-02-20 19:53       ` Igor Stoppa
2018-02-20 20:54         ` Matthew Wilcox
2018-02-21 12:01           ` Igor Stoppa
2018-02-22 14:20             ` Igor Stoppa
2018-02-11  3:19 ` [PATCH 4/6] Protectable Memory Igor Stoppa
2018-02-11 12:37   ` Mike Rapoport
2018-02-12 11:26     ` Igor Stoppa
2018-02-12 11:43       ` Mike Rapoport
2018-02-12 12:53         ` Mike Rapoport
2018-02-12 13:41           ` Igor Stoppa
2018-02-12 15:31             ` Mike Rapoport
2018-02-12 15:41               ` Igor Stoppa
2018-02-11  3:19 ` [PATCH 5/6] Pmalloc: self-test Igor Stoppa
2018-02-13  2:43   ` kbuild test robot
2018-02-11  3:19 ` [PATCH 6/6] Documentation for Pmalloc Igor Stoppa
2018-02-11 21:17   ` Matthew Wilcox
2018-02-12 11:28     ` Igor Stoppa
  -- strict thread matches above, loose matches on Subject: below --
2018-02-12 16:52 [RFC PATCH v16 0/6] mm: security: ro protection for dynamic data Igor Stoppa
2018-02-12 16:52 ` [PATCH 4/6] Protectable Memory Igor Stoppa
2018-02-04 16:47 [RFC PATCH v14 0/6] mm: security: ro protection for dynamic data Igor Stoppa
2018-02-04 16:47 ` [PATCH 4/6] Protectable Memory Igor Stoppa
2018-02-04 22:06   ` Randy Dunlap
2018-02-11  1:04     ` Igor Stoppa
2018-02-07 10:03   ` kbuild test robot
2018-02-07 22:21   ` kbuild test robot
2018-02-03 19:42 [RFC PATCH v13 0/6] mm: security: ro protection for dynamic data Igor Stoppa
2018-02-03 19:42 ` [PATCH 4/6] Protectable Memory Igor Stoppa
2018-01-30 15:14 [RFC PATCH v12 0/6] mm: security: ro protection for dynamic data Igor Stoppa
2018-01-30 15:14 ` [PATCH 4/6] Protectable Memory Igor Stoppa
2018-02-02  5:41   ` kbuild test robot
2018-02-02  5:53   ` kbuild test robot
2018-01-24 17:56 [kernel-hardening] [RFC PATCH v11 0/6] mm: security: ro protection for dynamic data Igor Stoppa
2018-01-24 17:56 ` [PATCH 4/6] Protectable Memory Igor Stoppa
2018-01-24 19:10   ` [kernel-hardening] " Jann Horn
2018-01-26  5:35     ` Matthew Wilcox
2018-02-02 18:39       ` Christopher Lameter
2018-02-03 15:38         ` Igor Stoppa
2018-02-03 19:57           ` Igor Stoppa
2018-02-03 20:12             ` Boris Lukashev
2018-02-03 20:32               ` Igor Stoppa
2018-02-03 22:29                 ` Boris Lukashev
2018-02-04 15:05                   ` Igor Stoppa
2018-02-12 23:27                     ` Kees Cook
2018-02-13  0:40                       ` Laura Abbott
2018-02-13 15:20                         ` Igor Stoppa
2018-01-26 19:41   ` Igor Stoppa

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.