* [RFC v4][PATCH part-2 00/13] ASI - Part II (Decorated Page-Table)
From: Alexandre Chartre @ 2020-05-04 14:57 UTC
  To: tglx, mingo, bp, hpa, dave.hansen, luto, peterz, x86, linux-mm,
	linux-kernel
  Cc: pbonzini, konrad.wilk, jan.setjeeilers, liran.alon, junaids,
	graf, rppt, kuzuno, mgross, alexandre.chartre

This is part II of ASI RFC v4. Please refer to the cover letter of
part I for an overview of the ASI RFC.

  https://lore.kernel.org/lkml/20200504144939.11318-1-alexandre.chartre@oracle.com/

This part introduces the decorated page-table (dpt), which encapsulates
a native page-table (e.g. a PGD) in order to provide convenient
page-table management functions, such as tracking the address ranges
mapped in a page-table or safely handling references to another
page-table.

Decorated page-tables can then be used to easily create and manage page
tables to be used with ASI. They will be used by the ASI test driver (see
part III) and later by KVM ASI.

The decorated page-table is independent of ASI and can potentially be
used anywhere a page-table is needed.
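
For illustration, here is a minimal sketch of the intended usage (buf
and buf_size are made up for the example; see the individual patches
for the actual API):

  struct dpt *dpt;
  int err;

  dpt = dpt_create(0);		/* no extra page-table alignment */
  if (!dpt)
	  return -ENOMEM;

  /* copy the kernel PTE entries covering buf into the page-table */
  err = dpt_map(dpt, buf, buf_size);
  if (err) {
	  dpt_destroy(dpt);
	  return err;
  }

  /* ... use dpt->pagetable as an isolated page-table (e.g. with ASI) ... */

  dpt_unmap(dpt, buf);
  dpt_destroy(dpt);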

Thanks,

alex.

-----

Alexandre Chartre (13):
  mm/x86: Introduce decorated page-table (dpt)
  mm/dpt: Track buffers allocated for a decorated page-table
  mm/dpt: Add decorated page-table entry offset functions
  mm/dpt: Add decorated page-table entry allocation functions
  mm/dpt: Add decorated page-table entry set functions
  mm/dpt: Functions to populate a decorated page-table from a VA range
  mm/dpt: Helper functions to map module into a decorated page-table
  mm/dpt: Keep track of VA ranges mapped in a decorated page-table
  mm/dpt: Functions to clear decorated page-table entries for a VA range
  mm/dpt: Function to copy page-table entries for percpu buffer
  mm/dpt: Add decorated page-table remap function
  mm/dpt: Handle decorated page-table mapped range leaks and overlaps
  mm/asi: Function to init decorated page-table with ASI core mappings

 arch/x86/include/asm/asi.h |    2 +
 arch/x86/include/asm/dpt.h |   89 +++
 arch/x86/mm/Makefile       |    2 +-
 arch/x86/mm/asi.c          |   57 ++
 arch/x86/mm/dpt.c          | 1051 ++++++++++++++++++++++++++++++++++++
 5 files changed, 1200 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/include/asm/dpt.h
 create mode 100644 arch/x86/mm/dpt.c

-- 
2.18.2



* [RFC v4][PATCH part-2 01/13] mm/x86: Introduce decorated page-table (dpt)
From: Alexandre Chartre @ 2020-05-04 14:57 UTC
  To: tglx, mingo, bp, hpa, dave.hansen, luto, peterz, x86, linux-mm,
	linux-kernel
  Cc: pbonzini, konrad.wilk, jan.setjeeilers, liran.alon, junaids,
	graf, rppt, kuzuno, mgross, alexandre.chartre

A decorated page-table (dpt) encapsulates a native page-table (e.g.
a PGD) and maintains additional attributes related to this page-table.
It aims to be the base structure for providing useful functions to
manage a page-table, such as tracking the VA ranges mapped in a
page-table or safely handling references to another page-table.
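
For example, a minimal create/destroy cycle looks like this (a sketch;
the PAGE_SIZE alignment is just illustrative):

  struct dpt *dpt;

  dpt = dpt_create(PAGE_SIZE);	/* pgt_alignment, multiple of PAGE_SIZE */
  if (!dpt)
	  return -ENOMEM;

  /* dpt->pagetable is the PGD, placed PAGE_SIZE bytes into the
   * underlying allocation; dpt->lock protects the other attributes */

  dpt_destroy(dpt);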

Signed-off-by: Alexandre Chartre <alexandre.chartre@oracle.com>
---
 arch/x86/include/asm/dpt.h | 23 +++++++++++++
 arch/x86/mm/Makefile       |  2 +-
 arch/x86/mm/dpt.c          | 67 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 91 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/include/asm/dpt.h
 create mode 100644 arch/x86/mm/dpt.c

diff --git a/arch/x86/include/asm/dpt.h b/arch/x86/include/asm/dpt.h
new file mode 100644
index 000000000000..1da4d43d5e94
--- /dev/null
+++ b/arch/x86/include/asm/dpt.h
@@ -0,0 +1,23 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef ARCH_X86_MM_DPT_H
+#define ARCH_X86_MM_DPT_H
+
+#include <linux/spinlock.h>
+
+#include <asm/pgtable.h>
+
+/*
+ * A decorated page-table (dpt) encapsulates a native page-table (e.g.
+ * a PGD) and maintains additional attributes related to this page-table.
+ */
+struct dpt {
+	spinlock_t		lock;		/* protect all attributes */
+	pgd_t			*pagetable;	/* the actual page-table */
+	unsigned int		alignment;	/* page-table alignment */
+
+};
+
+extern struct dpt *dpt_create(unsigned int pgt_alignment);
+extern void dpt_destroy(struct dpt *dpt);
+
+#endif
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index e57af263e870..5b52d854a030 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -48,7 +48,7 @@ obj-$(CONFIG_NUMA_EMU)		+= numa_emulation.o
 obj-$(CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS)	+= pkeys.o
 obj-$(CONFIG_RANDOMIZE_MEMORY)			+= kaslr.o
 obj-$(CONFIG_PAGE_TABLE_ISOLATION)		+= pti.o
-obj-$(CONFIG_ADDRESS_SPACE_ISOLATION)		+= asi.o
+obj-$(CONFIG_ADDRESS_SPACE_ISOLATION)		+= asi.o dpt.o
 
 obj-$(CONFIG_AMD_MEM_ENCRYPT)	+= mem_encrypt.o
 obj-$(CONFIG_AMD_MEM_ENCRYPT)	+= mem_encrypt_identity.o
diff --git a/arch/x86/mm/dpt.c b/arch/x86/mm/dpt.c
new file mode 100644
index 000000000000..333e259c5b7f
--- /dev/null
+++ b/arch/x86/mm/dpt.c
@@ -0,0 +1,67 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2019, 2020, Oracle and/or its affiliates.
+ *
+ */
+
+#include <linux/slab.h>
+
+#include <asm/dpt.h>
+
+/*
+ * dpt_create - allocate a page-table and create a corresponding
+ * decorated page-table. The page-table is allocated and aligned
+ * at the specified alignment (pgt_alignment) which should be a
+ * multiple of PAGE_SIZE.
+ */
+struct dpt *dpt_create(unsigned int pgt_alignment)
+{
+	unsigned int alloc_order;
+	unsigned long pagetable;
+	struct dpt *dpt;
+
+	if (!IS_ALIGNED(pgt_alignment, PAGE_SIZE))
+		return NULL;
+
+	alloc_order = round_up(PAGE_SIZE + pgt_alignment,
+			       PAGE_SIZE) >> PAGE_SHIFT;
+
+	dpt = kzalloc(sizeof(*dpt), GFP_KERNEL);
+	if (!dpt)
+		return NULL;
+
+	pagetable = (unsigned long)__get_free_pages(GFP_KERNEL_ACCOUNT |
+						    __GFP_ZERO,
+						    alloc_order);
+	if (!pagetable) {
+		kfree(dpt);
+		return NULL;
+	}
+	dpt->pagetable = (pgd_t *)(pagetable + pgt_alignment);
+	dpt->alignment = pgt_alignment;
+
+	spin_lock_init(&dpt->lock);
+
+	return dpt;
+}
+EXPORT_SYMBOL(dpt_create);
+
+void dpt_destroy(struct dpt *dpt)
+{
+	unsigned int pgt_alignment;
+	unsigned int alloc_order;
+
+	if (!dpt)
+		return;
+
+	if (dpt->pagetable) {
+		pgt_alignment = dpt->alignment;
+		alloc_order = round_up(PAGE_SIZE + pgt_alignment,
+				       PAGE_SIZE) >> PAGE_SHIFT;
+		free_pages((unsigned long)(dpt->pagetable) - pgt_alignment,
+			   alloc_order);
+	}
+
+	kfree(dpt);
+}
+EXPORT_SYMBOL(dpt_destroy);
-- 
2.18.2



* [RFC v4][PATCH part-2 02/13] mm/dpt: Track buffers allocated for a decorated page-table
From: Alexandre Chartre @ 2020-05-04 14:57 UTC
  To: tglx, mingo, bp, hpa, dave.hansen, luto, peterz, x86, linux-mm,
	linux-kernel
  Cc: pbonzini, konrad.wilk, jan.setjeeilers, liran.alon, junaids,
	graf, rppt, kuzuno, mgross, alexandre.chartre

Add functions to track buffers allocated for a decorated page-table.
A decorated page-table can have direct references to the kernel
page-table, at different levels (PGD, P4D, PUD, PMD). When freeing a
decorated page-table, we should make sure that we only free the parts
actually allocated for it, and not the parts of the kernel page-table
it merely references. To do so, keep track of the buffers allocated
when building the page-table.
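
Since page addresses are PAGE_SIZE-aligned, the page-table level is
encoded in the low bits of each XArray entry; a sketch of the encoding
introduced below:

  /* PGT_LEVEL_PMD is 1, so for a page at addr the entry is addr | 1 */
  entry = DPT_BACKEND_PAGE_ENTRY(addr, PGT_LEVEL_PMD);
  addr  = DPT_BACKEND_PAGE_ADDR(entry);		/* entry & PAGE_MASK  */
  level = DPT_BACKEND_PAGE_LEVEL(entry);	/* entry & ~PAGE_MASK */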

Signed-off-by: Alexandre Chartre <alexandre.chartre@oracle.com>
---
 arch/x86/include/asm/dpt.h | 21 ++++++++++
 arch/x86/mm/dpt.c          | 82 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 103 insertions(+)

diff --git a/arch/x86/include/asm/dpt.h b/arch/x86/include/asm/dpt.h
index 1da4d43d5e94..b9cba051ebf2 100644
--- a/arch/x86/include/asm/dpt.h
+++ b/arch/x86/include/asm/dpt.h
@@ -3,9 +3,18 @@
 #define ARCH_X86_MM_DPT_H
 
 #include <linux/spinlock.h>
+#include <linux/xarray.h>
 
 #include <asm/pgtable.h>
 
+enum page_table_level {
+	PGT_LEVEL_PTE,
+	PGT_LEVEL_PMD,
+	PGT_LEVEL_PUD,
+	PGT_LEVEL_P4D,
+	PGT_LEVEL_PGD
+};
+
 /*
  * A decorated page-table (dpt) encapsulates a native page-table (e.g.
 * a PGD) and maintains additional attributes related to this page-table.
@@ -15,6 +24,18 @@ struct dpt {
 	pgd_t			*pagetable;	/* the actual page-table */
 	unsigned int		alignment;	/* page-table alignment */
 
+	/*
+	 * A page-table can have direct references to another page-table,
+	 * at different levels (PGD, P4D, PUD, PMD). When freeing or
+	 * modifying a page-table, we should make sure that we free/modify
+	 * parts effectively allocated to the actual page-table, and not
+	 * parts of another page-table referenced from this page-table.
+	 *
+	 * To do so, the backend_pages XArray is used to keep track of pages
+	 * used for this page-table.
+	 */
+	struct xarray		backend_pages;		/* page-table pages */
+	unsigned long		backend_pages_count;	/* pages count */
 };
 
 extern struct dpt *dpt_create(unsigned int pgt_alignment);
diff --git a/arch/x86/mm/dpt.c b/arch/x86/mm/dpt.c
index 333e259c5b7f..6df2d4fde8ec 100644
--- a/arch/x86/mm/dpt.c
+++ b/arch/x86/mm/dpt.c
@@ -8,6 +8,80 @@
 
 #include <asm/dpt.h>
 
+/*
+ * Get the pointer to the beginning of a page table directory from a page
+ * table directory entry.
+ */
+#define DPT_BACKEND_PAGE_ALIGN(entry)	\
+	((typeof(entry))(((unsigned long)(entry)) & PAGE_MASK))
+
+/*
+ * Pages used to build a page-table are stored in the backend_pages XArray.
+ * Each entry in the array is a logical OR of the page address and the page
+ * table level (PTE, PMD, PUD, P4D) this page is used for in the page-table.
+ *
+ * As a page address is aligned with PAGE_SIZE, we have plenty of space
+ * for storing the page table level (which is a value between 0 and 4) in
+ * the low bits of the page address.
+ *
+ */
+
+#define DPT_BACKEND_PAGE_ENTRY(addr, level)	\
+	((typeof(addr))(((unsigned long)(addr)) | ((unsigned long)(level))))
+#define DPT_BACKEND_PAGE_ADDR(entry)		\
+	((void *)(((unsigned long)(entry)) & PAGE_MASK))
+#define DPT_BACKEND_PAGE_LEVEL(entry)		\
+	((enum page_table_level)(((unsigned long)(entry)) & ~PAGE_MASK))
+
+static int dpt_add_backend_page(struct dpt *dpt, void *addr,
+				enum page_table_level level)
+{
+	unsigned long index;
+	void *old_entry;
+
+	if ((!addr) || ((unsigned long)addr) & ~PAGE_MASK)
+		return -EINVAL;
+
+	lockdep_assert_held(&dpt->lock);
+	index = dpt->backend_pages_count;
+
+	old_entry = xa_store(&dpt->backend_pages, index,
+			     DPT_BACKEND_PAGE_ENTRY(addr, level),
+			     GFP_KERNEL);
+	if (xa_is_err(old_entry))
+		return xa_err(old_entry);
+	if (old_entry)
+		return -EBUSY;
+
+	dpt->backend_pages_count++;
+
+	return 0;
+}
+
+/*
+ * Check if an offset in the page-table is valid, i.e. check that the
+ * offset is on a page effectively belonging to the page-table.
+ */
+static bool dpt_valid_offset(struct dpt *dpt, void *offset)
+{
+	unsigned long index;
+	void *addr, *entry;
+	bool valid;
+
+	addr = DPT_BACKEND_PAGE_ALIGN(offset);
+	valid = false;
+
+	lockdep_assert_held(&dpt->lock);
+	xa_for_each(&dpt->backend_pages, index, entry) {
+		if (DPT_BACKEND_PAGE_ADDR(entry) == addr) {
+			valid = true;
+			break;
+		}
+	}
+
+	return valid;
+}
+
 /*
  * dpt_create - allocate a page-table and create a corresponding
  * decorated page-table. The page-table is allocated and aligned
@@ -41,6 +115,7 @@ struct dpt *dpt_create(unsigned int pgt_alignment)
 	dpt->alignment = pgt_alignment;
 
 	spin_lock_init(&dpt->lock);
+	xa_init(&dpt->backend_pages);
 
 	return dpt;
 }
@@ -50,10 +125,17 @@ void dpt_destroy(struct dpt *dpt)
 {
 	unsigned int pgt_alignment;
 	unsigned int alloc_order;
+	unsigned long index;
+	void *entry;
 
 	if (!dpt)
 		return;
 
+	if (dpt->backend_pages_count) {
+		xa_for_each(&dpt->backend_pages, index, entry)
+			free_page((unsigned long)DPT_BACKEND_PAGE_ADDR(entry));
+	}
+
 	if (dpt->pagetable) {
 		pgt_alignment = dpt->alignment;
 		alloc_order = round_up(PAGE_SIZE + pgt_alignment,
-- 
2.18.2



* [RFC v4][PATCH part-2 03/13] mm/dpt: Add decorated page-table entry offset functions
From: Alexandre Chartre @ 2020-05-04 14:58 UTC
  To: tglx, mingo, bp, hpa, dave.hansen, luto, peterz, x86, linux-mm,
	linux-kernel
  Cc: pbonzini, konrad.wilk, jan.setjeeilers, liran.alon, junaids,
	graf, rppt, kuzuno, mgross, alexandre.chartre

Add wrappers around the p4d/pud/pmd/pte offset kernel functions which
ensure that page-table pointers are in the specified decorated page-table.
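
A page-table walk in the dpt code is then expected to look like this
(a sketch; each wrapper returns an ERR_PTR on failure):

  p4d = dpt_p4d_offset(dpt, pgd, addr);
  if (IS_ERR(p4d))
	  return PTR_ERR(p4d);

  pud = dpt_pud_offset(dpt, p4d, addr);
  if (IS_ERR(pud))
	  return PTR_ERR(pud);

  /* ... and so on with dpt_pmd_offset() and dpt_pte_offset() */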

Signed-off-by: Alexandre Chartre <alexandre.chartre@oracle.com>
---
 arch/x86/mm/dpt.c | 66 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 66 insertions(+)

diff --git a/arch/x86/mm/dpt.c b/arch/x86/mm/dpt.c
index 6df2d4fde8ec..44aad99bc3dc 100644
--- a/arch/x86/mm/dpt.c
+++ b/arch/x86/mm/dpt.c
@@ -82,6 +82,72 @@ static bool dpt_valid_offset(struct dpt *dpt, void *offset)
 	return valid;
 }
 
+/*
+ * dpt_pXX_offset() functions are equivalent to kernel pXX_offset()
+ * functions but, in addition, they ensure that page table pointers
+ * are in the specified decorated page table. Otherwise an error is
+ * returned.
+ */
+
+static pte_t *dpt_pte_offset(struct dpt *dpt,
+			     pmd_t *pmd, unsigned long addr)
+{
+	pte_t *pte;
+
+	pte = pte_offset_map(pmd, addr);
+	if (!dpt_valid_offset(dpt, pte)) {
+		pr_err("DPT %p: PTE %px not found\n", dpt, pte);
+		return ERR_PTR(-EINVAL);
+	}
+
+	return pte;
+}
+
+static pmd_t *dpt_pmd_offset(struct dpt *dpt,
+			     pud_t *pud, unsigned long addr)
+{
+	pmd_t *pmd;
+
+	pmd = pmd_offset(pud, addr);
+	if (!dpt_valid_offset(dpt, pmd)) {
+		pr_err("DPT %p: PMD %px not found\n", dpt, pmd);
+		return ERR_PTR(-EINVAL);
+	}
+
+	return pmd;
+}
+
+static pud_t *dpt_pud_offset(struct dpt *dpt,
+			     p4d_t *p4d, unsigned long addr)
+{
+	pud_t *pud;
+
+	pud = pud_offset(p4d, addr);
+	if (!dpt_valid_offset(dpt, pud)) {
+		pr_err("DPT %p: PUD %px not found\n", dpt, pud);
+		return ERR_PTR(-EINVAL);
+	}
+
+	return pud;
+}
+
+static p4d_t *dpt_p4d_offset(struct dpt *dpt,
+			     pgd_t *pgd, unsigned long addr)
+{
+	p4d_t *p4d;
+
+	p4d = p4d_offset(pgd, addr);
+	/*
+	 * p4d is the same as pgd if we don't have a 5-level page table.
+	 */
+	if ((p4d != (p4d_t *)pgd) && !dpt_valid_offset(dpt, p4d)) {
+		pr_err("DPT %p: P4D %px not found\n", dpt, p4d);
+		return ERR_PTR(-EINVAL);
+	}
+
+	return p4d;
+}
+
 /*
  * dpt_create - allocate a page-table and create a corresponding
  * decorated page-table. The page-table is allocated and aligned
-- 
2.18.2



* [RFC v4][PATCH part-2 04/13] mm/dpt: Add decorated page-table entry allocation functions
From: Alexandre Chartre @ 2020-05-04 14:58 UTC
  To: tglx, mingo, bp, hpa, dave.hansen, luto, peterz, x86, linux-mm,
	linux-kernel
  Cc: pbonzini, konrad.wilk, jan.setjeeilers, liran.alon, junaids,
	graf, rppt, kuzuno, mgross, alexandre.chartre

Add functions to allocate p4d/pud/pmd/pte pages for a decorated
page-table and keep track of them.
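
Each function allocates a zeroed page for the next level if the entry
is empty, records it as a backend page of the dpt, and links it into
the page-table; building entries down to the PTE level looks like this
(a sketch):

  p4d = dpt_p4d_alloc(dpt, pgd, addr);
  if (IS_ERR(p4d))
	  return PTR_ERR(p4d);
  pud = dpt_pud_alloc(dpt, p4d, addr);
  if (IS_ERR(pud))
	  return PTR_ERR(pud);
  pmd = dpt_pmd_alloc(dpt, pud, addr);
  if (IS_ERR(pmd))
	  return PTR_ERR(pmd);
  pte = dpt_pte_alloc(dpt, pmd, addr);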

Signed-off-by: Alexandre Chartre <alexandre.chartre@oracle.com>
---
 arch/x86/mm/dpt.c | 110 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 110 insertions(+)

diff --git a/arch/x86/mm/dpt.c b/arch/x86/mm/dpt.c
index 44aad99bc3dc..a2f54ba00255 100644
--- a/arch/x86/mm/dpt.c
+++ b/arch/x86/mm/dpt.c
@@ -4,6 +4,7 @@
  *
  */
 
+#include <linux/mm.h>
 #include <linux/slab.h>
 
 #include <asm/dpt.h>
@@ -148,6 +149,115 @@ static p4d_t *dpt_p4d_offset(struct dpt *dpt,
 	return p4d;
 }
 
+/*
+ * dpt_pXX_alloc() functions are equivalent to kernel pXX_alloc() functions
+ * but, in addition, they keep track of new pages allocated for the specified
+ * decorated page-table.
+ */
+
+static pte_t *dpt_pte_alloc(struct dpt *dpt, pmd_t *pmd, unsigned long addr)
+{
+	struct page *page;
+	pte_t *pte;
+	int err;
+
+	if (pmd_none(*pmd)) {
+		page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+		if (!page)
+			return ERR_PTR(-ENOMEM);
+		pte = (pte_t *)page_address(page);
+		err = dpt_add_backend_page(dpt, pte, PGT_LEVEL_PTE);
+		if (err) {
+			free_page((unsigned long)pte);
+			return ERR_PTR(err);
+		}
+		set_pmd_safe(pmd, __pmd(__pa(pte) | _KERNPG_TABLE));
+		pte = pte_offset_map(pmd, addr);
+	} else {
+		pte = dpt_pte_offset(dpt, pmd, addr);
+	}
+
+	return pte;
+}
+
+static pmd_t *dpt_pmd_alloc(struct dpt *dpt, pud_t *pud, unsigned long addr)
+{
+	struct page *page;
+	pmd_t *pmd;
+	int err;
+
+	if (pud_none(*pud)) {
+		page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+		if (!page)
+			return ERR_PTR(-ENOMEM);
+		pmd = (pmd_t *)page_address(page);
+		err = dpt_add_backend_page(dpt, pmd, PGT_LEVEL_PMD);
+		if (err) {
+			free_page((unsigned long)pmd);
+			return ERR_PTR(err);
+		}
+		set_pud_safe(pud, __pud(__pa(pmd) | _KERNPG_TABLE));
+		pmd = pmd_offset(pud, addr);
+	} else {
+		pmd = dpt_pmd_offset(dpt, pud, addr);
+	}
+
+	return pmd;
+}
+
+static pud_t *dpt_pud_alloc(struct dpt *dpt, p4d_t *p4d, unsigned long addr)
+{
+	struct page *page;
+	pud_t *pud;
+	int err;
+
+	if (p4d_none(*p4d)) {
+		page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+		if (!page)
+			return ERR_PTR(-ENOMEM);
+		pud = (pud_t *)page_address(page);
+		err = dpt_add_backend_page(dpt, pud, PGT_LEVEL_PUD);
+		if (err) {
+			free_page((unsigned long)pud);
+			return ERR_PTR(err);
+		}
+		set_p4d_safe(p4d, __p4d(__pa(pud) | _KERNPG_TABLE));
+		pud = pud_offset(p4d, addr);
+	} else {
+		pud = dpt_pud_offset(dpt, p4d, addr);
+	}
+
+	return pud;
+}
+
+static p4d_t *dpt_p4d_alloc(struct dpt *dpt, pgd_t *pgd, unsigned long addr)
+{
+	struct page *page;
+	p4d_t *p4d;
+	int err;
+
+	if (!pgtable_l5_enabled())
+		return (p4d_t *)pgd;
+
+	if (pgd_none(*pgd)) {
+		page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+		if (!page)
+			return ERR_PTR(-ENOMEM);
+		p4d = (p4d_t *)page_address(page);
+		err = dpt_add_backend_page(dpt, p4d, PGT_LEVEL_P4D);
+		if (err) {
+			free_page((unsigned long)p4d);
+			return ERR_PTR(err);
+		}
+		set_pgd_safe(pgd, __pgd(__pa(p4d) | _KERNPG_TABLE));
+		p4d = p4d_offset(pgd, addr);
+	} else {
+		p4d = dpt_p4d_offset(dpt, pgd, addr);
+	}
+
+	return p4d;
+}
+
 /*
  * dpt_create - allocate a page-table and create a corresponding
  * decorated page-table. The page-table is allocated and aligned
-- 
2.18.2



* [RFC v4][PATCH part-2 05/13] mm/dpt: Add decorated page-table entry set functions
From: Alexandre Chartre @ 2020-05-04 14:58 UTC
  To: tglx, mingo, bp, hpa, dave.hansen, luto, peterz, x86, linux-mm,
	linux-kernel
  Cc: pbonzini, konrad.wilk, jan.setjeeilers, liran.alon, junaids,
	graf, rppt, kuzuno, mgross, alexandre.chartre

Add wrappers around the page table entry (pgd/p4d/pud/pmd) set
functions which check that an existing entry is not being
overwritten.
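
For example, pointing a dpt PMD entry at a kernel PMD value is refused
if the entry is already populated with a different value (a sketch):

  err = dpt_set_pmd(dpt, dst_pmd, *src_pmd);
  if (err)	/* -EBUSY: dst_pmd already references something else */
	  return err;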

Signed-off-by: Alexandre Chartre <alexandre.chartre@oracle.com>
---
 arch/x86/mm/dpt.c | 126 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 126 insertions(+)

diff --git a/arch/x86/mm/dpt.c b/arch/x86/mm/dpt.c
index a2f54ba00255..7a1b4cd53b03 100644
--- a/arch/x86/mm/dpt.c
+++ b/arch/x86/mm/dpt.c
@@ -258,6 +258,132 @@ static p4d_t *dpt_p4d_alloc(struct dpt *dpt, pgd_t *pgd, unsigned long addr)
 	return p4d;
 }
 
+/*
+ * dpt_set_pXX() functions are equivalent to kernel set_pXX() functions
+ * but, in addition, they ensure that they are not overwriting an already
+ * existing reference in the decorated page table. Otherwise an error is
+ * returned.
+ */
+
+static int dpt_set_pte(struct dpt *dpt, pte_t *pte, pte_t pte_value)
+{
+#ifdef DEBUG
+	/*
+	 * The pte pointer should come from dpt_pte_alloc() or dpt_pte_offset()
+	 * both of which check if the pointer is in the decorated page table.
+	 * So this is a paranoid check to ensure the pointer is really in the
+	 * decorated page table.
+	 */
+	if (!dpt_valid_offset(dpt, pte)) {
+		pr_err("DPT %p: PTE %px not found\n", dpt, pte);
+		return -EINVAL;
+	}
+#endif
+	set_pte(pte, pte_value);
+
+	return 0;
+}
+
+static int dpt_set_pmd(struct dpt *dpt, pmd_t *pmd, pmd_t pmd_value)
+{
+#ifdef DEBUG
+	/*
+	 * The pmd pointer should come from dpt_pmd_alloc() or dpt_pmd_offset()
+	 * both of which check if the pointer is in the decorated page table.
+	 * So this is a paranoid check to ensure the pointer is really in the
+	 * decorated page table.
+	 */
+	if (!dpt_valid_offset(dpt, pmd)) {
+		pr_err("DPT %p: PMD %px not found\n", dpt, pmd);
+		return -EINVAL;
+	}
+#endif
+	if (pmd_val(*pmd) == pmd_val(pmd_value))
+		return 0;
+
+	if (!pmd_none(*pmd)) {
+		pr_err("DPT %p: PMD %px overwriting %lx with %lx\n",
+		       dpt, pmd, pmd_val(*pmd), pmd_val(pmd_value));
+		return -EBUSY;
+	}
+
+	set_pmd(pmd, pmd_value);
+
+	return 0;
+}
+
+static int dpt_set_pud(struct dpt *dpt, pud_t *pud, pud_t pud_value)
+{
+#ifdef DEBUG
+	/*
+	 * The pud pointer should come from dpt_pud_alloc() or dpt_pud_offset()
+	 * both of which check if the pointer is in the decorated page table.
+	 * So this is a paranoid check to ensure the pointer is really in the
+	 * decorated page table.
+	 */
+	if (!dpt_valid_offset(dpt, pud)) {
+		pr_err("DPT %p: PUD %px not found\n", dpt, pud);
+		return -EINVAL;
+	}
+#endif
+	if (pud_val(*pud) == pud_val(pud_value))
+		return 0;
+
+	if (!pud_none(*pud)) {
+		pr_err("DPT %p: PUD %px overwriting %lx with %lx\n",
+		       dpt, pud, pud_val(*pud), pud_val(pud_value));
+		return -EBUSY;
+	}
+
+	set_pud(pud, pud_value);
+
+	return 0;
+}
+
+static int dpt_set_p4d(struct dpt *dpt, p4d_t *p4d, p4d_t p4d_value)
+{
+#ifdef DEBUG
+	/*
+	 * The p4d pointer should come from dpt_p4d_alloc() or dpt_p4d_offset()
+	 * both of which check if the pointer is in the decorated page table.
+	 * So this is a paranoid check to ensure the pointer is really in the
+	 * decorated page table.
+	 */
+	if (!dpt_valid_offset(dpt, p4d)) {
+		pr_err("DPT %p: P4D %px not found\n", dpt, p4d);
+		return -EINVAL;
+	}
+#endif
+	if (p4d_val(*p4d) == p4d_val(p4d_value))
+		return 0;
+
+	if (!p4d_none(*p4d)) {
+		pr_err("DPT %p: P4D %px overwriting %lx with %lx\n",
+		       dpt, p4d, p4d_val(*p4d), p4d_val(p4d_value));
+		return -EBUSY;
+	}
+
+	set_p4d(p4d, p4d_value);
+
+	return 0;
+}
+
+static int dpt_set_pgd(struct dpt *dpt, pgd_t *pgd, pgd_t pgd_value)
+{
+	if (pgd_val(*pgd) == pgd_val(pgd_value))
+		return 0;
+
+	if (!pgd_none(*pgd)) {
+		pr_err("DPT %p: PGD %px overwriting %lx with %lx\n",
+		       dpt, pgd, pgd_val(*pgd), pgd_val(pgd_value));
+		return -EBUSY;
+	}
+
+	set_pgd(pgd, pgd_value);
+
+	return 0;
+}
+
 /*
  * dpt_create - allocate a page-table and create a corresponding
  * decorated page-table. The page-table is allocated and aligned
-- 
2.18.2



* [RFC v4][PATCH part-2 06/13] mm/dpt: Functions to populate a decorated page-table from a VA range
From: Alexandre Chartre @ 2020-05-04 14:58 UTC
  To: tglx, mingo, bp, hpa, dave.hansen, luto, peterz, x86, linux-mm,
	linux-kernel
  Cc: pbonzini, konrad.wilk, jan.setjeeilers, liran.alon, junaids,
	graf, rppt, kuzuno, mgross, alexandre.chartre

Provide functions to copy page-table entries from the kernel page-table
to a decorated page-table for a specified VA range. These functions are
based on the copy_pxx_range() functions defined in mm/memory.c. A first
difference is that a level parameter can be specified to indicate the
page-table level (PGD, P4D, PUD, PMD, PTE) at which the copy should be
done. Also, the functions don't rely on an mm or a vma, and they don't
alter the source page-table even if an entry is bad. Finally, the VA
range start and size don't need to be page-aligned.
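
For example (a sketch), the level changes the granularity at which the
kernel page-table is shared:

  /* copy the individual 4K PTE entries covering the buffer */
  err = dpt_map_range(dpt, ptr, size, PGT_LEVEL_PTE);

  /* or reference the whole kernel PMD entries (2M granularity) */
  err = dpt_map_range(dpt, ptr, size, PGT_LEVEL_PMD);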

Signed-off-by: Alexandre Chartre <alexandre.chartre@oracle.com>
---
 arch/x86/include/asm/dpt.h |   3 +
 arch/x86/mm/dpt.c          | 205 +++++++++++++++++++++++++++++++++++++
 2 files changed, 208 insertions(+)

diff --git a/arch/x86/include/asm/dpt.h b/arch/x86/include/asm/dpt.h
index b9cba051ebf2..85d2c5051acb 100644
--- a/arch/x86/include/asm/dpt.h
+++ b/arch/x86/include/asm/dpt.h
@@ -40,5 +40,8 @@ struct dpt {
 
 extern struct dpt *dpt_create(unsigned int pgt_alignment);
 extern void dpt_destroy(struct dpt *dpt);
+extern int dpt_map_range(struct dpt *dpt, void *ptr, size_t size,
+			 enum page_table_level level);
+extern int dpt_map(struct dpt *dpt, void *ptr, unsigned long size);
 
 #endif
diff --git a/arch/x86/mm/dpt.c b/arch/x86/mm/dpt.c
index 7a1b4cd53b03..0e725344b921 100644
--- a/arch/x86/mm/dpt.c
+++ b/arch/x86/mm/dpt.c
@@ -384,6 +384,211 @@ static int dpt_set_pgd(struct dpt *dpt, pgd_t *pgd, pgd_t pgd_value)
 	return 0;
 }
 
+static int dpt_copy_pte_range(struct dpt *dpt, pmd_t *dst_pmd, pmd_t *src_pmd,
+			      unsigned long addr, unsigned long end)
+{
+	pte_t *src_pte, *dst_pte;
+
+	dst_pte = dpt_pte_alloc(dpt, dst_pmd, addr);
+	if (IS_ERR(dst_pte))
+		return PTR_ERR(dst_pte);
+
+	addr &= PAGE_MASK;
+	src_pte = pte_offset_map(src_pmd, addr);
+
+	do {
+		dpt_set_pte(dpt, dst_pte, *src_pte);
+
+	} while (dst_pte++, src_pte++, addr += PAGE_SIZE, addr < end);
+
+	return 0;
+}
+
+static int dpt_copy_pmd_range(struct dpt *dpt, pud_t *dst_pud, pud_t *src_pud,
+			      unsigned long addr, unsigned long end,
+			      enum page_table_level level)
+{
+	pmd_t *src_pmd, *dst_pmd;
+	unsigned long next;
+	int err;
+
+	dst_pmd = dpt_pmd_alloc(dpt, dst_pud, addr);
+	if (IS_ERR(dst_pmd))
+		return PTR_ERR(dst_pmd);
+
+	src_pmd = pmd_offset(src_pud, addr);
+
+	do {
+		next = pmd_addr_end(addr, end);
+		if (level == PGT_LEVEL_PMD || pmd_none(*src_pmd) ||
+		    pmd_trans_huge(*src_pmd) || pmd_devmap(*src_pmd)) {
+			err = dpt_set_pmd(dpt, dst_pmd, *src_pmd);
+			if (err)
+				return err;
+			continue;
+		}
+
+		if (!pmd_present(*src_pmd)) {
+			pr_warn("DPT %p: PMD not present for [%lx,%lx]\n",
+				dpt, addr, next - 1);
+			pmd_clear(dst_pmd);
+			continue;
+		}
+
+		err = dpt_copy_pte_range(dpt, dst_pmd, src_pmd, addr, next);
+		if (err) {
+			pr_err("DPT %p: PMD error copying PTE addr=%lx next=%lx\n",
+			       dpt, addr, next);
+			return err;
+		}
+
+	} while (dst_pmd++, src_pmd++, addr = next, addr < end);
+
+	return 0;
+}
+
+static int dpt_copy_pud_range(struct dpt *dpt, p4d_t *dst_p4d, p4d_t *src_p4d,
+			      unsigned long addr, unsigned long end,
+			      enum page_table_level level)
+{
+	pud_t *src_pud, *dst_pud;
+	unsigned long next;
+	int err;
+
+	dst_pud = dpt_pud_alloc(dpt, dst_p4d, addr);
+	if (IS_ERR(dst_pud))
+		return PTR_ERR(dst_pud);
+
+	src_pud = pud_offset(src_p4d, addr);
+
+	do {
+		next = pud_addr_end(addr, end);
+		if (level == PGT_LEVEL_PUD || pud_none(*src_pud) ||
+		    pud_trans_huge(*src_pud) || pud_devmap(*src_pud)) {
+			err = dpt_set_pud(dpt, dst_pud, *src_pud);
+			if (err)
+				return err;
+			continue;
+		}
+
+		err = dpt_copy_pmd_range(dpt, dst_pud, src_pud, addr, next,
+					 level);
+		if (err) {
+			pr_err("DPT %p: PUD error copying PMD addr=%lx next=%lx\n",
+			       dpt, addr, next);
+			return err;
+		}
+
+	} while (dst_pud++, src_pud++, addr = next, addr < end);
+
+	return 0;
+}
+
+static int dpt_copy_p4d_range(struct dpt *dpt, pgd_t *dst_pgd, pgd_t *src_pgd,
+			      unsigned long addr, unsigned long end,
+			      enum page_table_level level)
+{
+	p4d_t *src_p4d, *dst_p4d;
+	unsigned long next;
+	int err;
+
+	dst_p4d = dpt_p4d_alloc(dpt, dst_pgd, addr);
+	if (IS_ERR(dst_p4d))
+		return PTR_ERR(dst_p4d);
+
+	src_p4d = p4d_offset(src_pgd, addr);
+
+	do {
+		next = p4d_addr_end(addr, end);
+		if (level == PGT_LEVEL_P4D || p4d_none(*src_p4d)) {
+			err = dpt_set_p4d(dpt, dst_p4d, *src_p4d);
+			if (err)
+				return err;
+			continue;
+		}
+
+		err = dpt_copy_pud_range(dpt, dst_p4d, src_p4d, addr, next,
+					 level);
+		if (err) {
+			pr_err("DPT %p: P4D error copying PUD addr=%lx next=%lx\n",
+			       dpt, addr, next);
+			return err;
+		}
+
+	} while (dst_p4d++, src_p4d++, addr = next, addr < end);
+
+	return 0;
+}
+
+static int dpt_copy_pgd_range(struct dpt *dpt,
+			      pgd_t *dst_pagetable, pgd_t *src_pagetable,
+			      unsigned long addr, unsigned long end,
+			      enum page_table_level level)
+{
+	pgd_t *src_pgd, *dst_pgd;
+	unsigned long next;
+	int err;
+
+	dst_pgd = pgd_offset_pgd(dst_pagetable, addr);
+	src_pgd = pgd_offset_pgd(src_pagetable, addr);
+
+	do {
+		next = pgd_addr_end(addr, end);
+		if (level == PGT_LEVEL_PGD || pgd_none(*src_pgd)) {
+			err = dpt_set_pgd(dpt, dst_pgd, *src_pgd);
+			if (err)
+				return err;
+			continue;
+		}
+
+		err = dpt_copy_p4d_range(dpt, dst_pgd, src_pgd, addr, next,
+					 level);
+		if (err) {
+			pr_err("DPT %p: PGD error copying P4D addr=%lx next=%lx\n",
+			       dpt, addr, next);
+			return err;
+		}
+
+	} while (dst_pgd++, src_pgd++, addr = next, addr < end);
+
+	return 0;
+}
+
+/*
+ * Copy page table entries from the current page table (i.e. from the
+ * kernel page table) to the specified decorated page-table. The level
+ * parameter specifies the page-table level (PGD, P4D, PUD, PMD, PTE)
+ * at which the copy should be done.
+ */
+int dpt_map_range(struct dpt *dpt, void *ptr, size_t size,
+		  enum page_table_level level)
+{
+	unsigned long addr = (unsigned long)ptr;
+	unsigned long end = addr + ((unsigned long)size);
+	unsigned long flags;
+	int err;
+
+	pr_debug("DPT %p: MAP %px/%lx/%d\n", dpt, ptr, size, level);
+
+	spin_lock_irqsave(&dpt->lock, flags);
+	err = dpt_copy_pgd_range(dpt, dpt->pagetable, current->mm->pgd,
+				 addr, end, level);
+	spin_unlock_irqrestore(&dpt->lock, flags);
+
+	return err;
+}
+EXPORT_SYMBOL(dpt_map_range);
+
+/*
+ * Copy page-table PTE entries from the current page-table to the
+ * specified decorated page-table.
+ */
+int dpt_map(struct dpt *dpt, void *ptr, unsigned long size)
+{
+	return dpt_map_range(dpt, ptr, size, PGT_LEVEL_PTE);
+}
+EXPORT_SYMBOL(dpt_map);
+
 /*
  * dpt_create - allocate a page-table and create a corresponding
  * decorated page-table. The page-table is allocated and aligned
-- 
2.18.2



* [RFC v4][PATCH part-2 07/13] mm/dpt: Helper functions to map module into a decorated page-table
From: Alexandre Chartre @ 2020-05-04 14:58 UTC
  To: tglx, mingo, bp, hpa, dave.hansen, luto, peterz, x86, linux-mm,
	linux-kernel
  Cc: pbonzini, konrad.wilk, jan.setjeeilers, liran.alon, junaids,
	graf, rppt, kuzuno, mgross, alexandre.chartre

Add helper functions to easily map a module into a decorated page-table.
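
For example, a module can map its own mappings from its init function,
or map another module by name (a sketch; the module name is made up):

  err = DPT_MAP_THIS_MODULE(dpt);
  if (err)
	  return err;

  err = dpt_map_module(dpt, "my_module");	/* hypothetical name */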

Signed-off-by: Alexandre Chartre <alexandre.chartre@oracle.com>
---
 arch/x86/include/asm/dpt.h | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/arch/x86/include/asm/dpt.h b/arch/x86/include/asm/dpt.h
index 85d2c5051acb..5a38d97a70a8 100644
--- a/arch/x86/include/asm/dpt.h
+++ b/arch/x86/include/asm/dpt.h
@@ -2,6 +2,7 @@
 #ifndef ARCH_X86_MM_DPT_H
 #define ARCH_X86_MM_DPT_H
 
+#include <linux/module.h>
 #include <linux/spinlock.h>
 #include <linux/xarray.h>
 
@@ -44,4 +45,24 @@ extern int dpt_map_range(struct dpt *dpt, void *ptr, size_t size,
 			 enum page_table_level level);
 extern int dpt_map(struct dpt *dpt, void *ptr, unsigned long size);
 
+static inline int dpt_map_module(struct dpt *dpt, char *module_name)
+{
+	struct module *module;
+
+	module = find_module(module_name);
+	if (!module)
+		return -ESRCH;
+
+	return dpt_map(dpt, module->core_layout.base, module->core_layout.size);
+}
+
+/*
+ * Copy the memory mapping for the current module. This is defined as a
+ * macro to ensure it is expanded in the module making the call so that
+ * THIS_MODULE has the correct value.
+ */
+#define DPT_MAP_THIS_MODULE(dpt)			\
+	(dpt_map(dpt, THIS_MODULE->core_layout.base,	\
+		 THIS_MODULE->core_layout.size))
+
 #endif
-- 
2.18.2



* [RFC v4][PATCH part-2 08/13] mm/dpt: Keep track of VA ranges mapped in a decorated page-table
From: Alexandre Chartre @ 2020-05-04 14:58 UTC
  To: tglx, mingo, bp, hpa, dave.hansen, luto, peterz, x86, linux-mm,
	linux-kernel
  Cc: pbonzini, konrad.wilk, jan.setjeeilers, liran.alon, junaids,
	graf, rppt, kuzuno, mgross, alexandre.chartre

Add functions to keep track of VA ranges mapped in a decorated page-table.
This will be used when unmapping to ensure the same range is unmapped,
at the same page-table level. This will also be used to handle mapping
and unmapping of overlapping VA ranges.
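
Each successful mapping is recorded in the dpt mapping_list, so that it
can later be looked up by its start address (a sketch):

  range = dpt_get_range_mapping(dpt, ptr);	/* under dpt->lock */
  if (range)
	  pr_debug("mapped: size=%zx level=%d\n",
		   range->size, range->level);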

Signed-off-by: Alexandre Chartre <alexandre.chartre@oracle.com>
---
 arch/x86/include/asm/dpt.h | 12 ++++++++
 arch/x86/mm/dpt.c          | 60 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 72 insertions(+)

diff --git a/arch/x86/include/asm/dpt.h b/arch/x86/include/asm/dpt.h
index 5a38d97a70a8..0d74afb10141 100644
--- a/arch/x86/include/asm/dpt.h
+++ b/arch/x86/include/asm/dpt.h
@@ -16,6 +16,17 @@ enum page_table_level {
 	PGT_LEVEL_PGD
 };
 
+/*
+ * Structure to keep track of address ranges mapped into a decorated
+ * page-table.
+ */
+struct dpt_range_mapping {
+	struct list_head list;
+	void *ptr;			/* range start address */
+	size_t size;			/* range size */
+	enum page_table_level level;	/* mapping level */
+};
+
 /*
  * A decorated page-table (dpt) encapsulates a native page-table (e.g.
  * a PGD) and maintain additional attributes related to this page-table.
@@ -24,6 +35,7 @@ struct dpt {
 	spinlock_t		lock;		/* protect all attributes */
 	pgd_t			*pagetable;	/* the actual page-table */
 	unsigned int		alignment;	/* page-table alignment */
+	struct list_head	mapping_list;	/* list of VA range mapping */
 
 	/*
 	 * A page-table can have direct references to another page-table,
diff --git a/arch/x86/mm/dpt.c b/arch/x86/mm/dpt.c
index 0e725344b921..12eb0d794d84 100644
--- a/arch/x86/mm/dpt.c
+++ b/arch/x86/mm/dpt.c
@@ -59,6 +59,24 @@ static int dpt_add_backend_page(struct dpt *dpt, void *addr,
 	return 0;
 }
 
+/*
+ * Return the range mapping starting at the specified address, or NULL if
+ * no such range is found.
+ */
+static struct dpt_range_mapping *dpt_get_range_mapping(struct dpt *dpt,
+						       void *ptr)
+{
+	struct dpt_range_mapping *range;
+
+	lockdep_assert_held(&dpt->lock);
+	list_for_each_entry(range, &dpt->mapping_list, list) {
+		if (range->ptr == ptr)
+			return range;
+	}
+
+	return NULL;
+}
+
 /*
  * Check if an offset in the page-table is valid, i.e. check that the
  * offset is on a page effectively belonging to the page-table.
@@ -563,6 +581,7 @@ static int dpt_copy_pgd_range(struct dpt *dpt,
 int dpt_map_range(struct dpt *dpt, void *ptr, size_t size,
 		  enum page_table_level level)
 {
+	struct dpt_range_mapping *range_mapping;
 	unsigned long addr = (unsigned long)ptr;
 	unsigned long end = addr + ((unsigned long)size);
 	unsigned long flags;
@@ -571,8 +590,36 @@ int dpt_map_range(struct dpt *dpt, void *ptr, size_t size,
 	pr_debug("DPT %p: MAP %px/%lx/%d\n", dpt, ptr, size, level);
 
 	spin_lock_irqsave(&dpt->lock, flags);
+
+	/* check if the range is already mapped */
+	range_mapping = dpt_get_range_mapping(dpt, ptr);
+	if (range_mapping) {
+		pr_debug("DPT %p: MAP %px/%lx/%d already mapped\n",
+			 dpt, ptr, size, level);
+		err = -EBUSY;
+		goto done;
+	}
+
+	/* map new range */
+	range_mapping = kmalloc(sizeof(*range_mapping), GFP_KERNEL);
+	if (!range_mapping) {
+		err = -ENOMEM;
+		goto done;
+	}
+
 	err = dpt_copy_pgd_range(dpt, dpt->pagetable, current->mm->pgd,
 				 addr, end, level);
+	if (err) {
+		kfree(range_mapping);
+		goto done;
+	}
+
+	INIT_LIST_HEAD(&range_mapping->list);
+	range_mapping->ptr = ptr;
+	range_mapping->size = size;
+	range_mapping->level = level;
+	list_add(&range_mapping->list, &dpt->mapping_list);
+done:
 	spin_unlock_irqrestore(&dpt->lock, flags);
 
 	return err;
@@ -611,6 +658,8 @@ struct dpt *dpt_create(unsigned int pgt_alignment)
 	if (!dpt)
 		return NULL;
 
+	INIT_LIST_HEAD(&dpt->mapping_list);
+
 	pagetable = (unsigned long)__get_free_pages(GFP_KERNEL_ACCOUNT |
 						    __GFP_ZERO,
 						    alloc_order);
@@ -632,6 +681,7 @@ void dpt_destroy(struct dpt *dpt)
 {
 	unsigned int pgt_alignment;
 	unsigned int alloc_order;
+	struct dpt_range_mapping *range, *range_next;
 	unsigned long index;
 	void *entry;
 
@@ -643,6 +693,11 @@ void dpt_destroy(struct dpt *dpt)
 			free_page((unsigned long)DPT_BACKEND_PAGE_ADDR(entry));
 	}
 
+	list_for_each_entry_safe(range, range_next, &dpt->mapping_list, list) {
+		list_del(&range->list);
+		kfree(range);
+	}
+
 	if (dpt->pagetable) {
 		pgt_alignment = dpt->alignment;
 		alloc_order = round_up(PAGE_SIZE + pgt_alignment,
-- 
2.18.2



* [RFC v4][PATCH part-2 09/13] mm/dpt: Functions to clear decorated page-table entries for a VA range
From: Alexandre Chartre @ 2020-05-04 14:58 UTC
  To: tglx, mingo, bp, hpa, dave.hansen, luto, peterz, x86, linux-mm,
	linux-kernel
  Cc: pbonzini, konrad.wilk, jan.setjeeilers, liran.alon, junaids,
	graf, rppt, kuzuno, mgross, alexandre.chartre

Provide functions to clear page-table entries in a decorated page-table
for a specified VA range. The functions also check that the clearing
effectively happens in the decorated page-table and does not cross the
decorated page-table boundary (through references to another
page-table), so that another page-table is not modified by mistake.

As the address, size and page-table level of each VA range mapped into
the decorated page-table are tracked, clearing a range is done by
specifying just its start address.
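
So a buffer mapped with dpt_map() can be unmapped by just its start
address (a sketch):

  err = dpt_map(dpt, ptr, size);
  ...
  dpt_unmap(dpt, ptr);	/* size and level come from the tracked range */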

Signed-off-by: Alexandre Chartre <alexandre.chartre@oracle.com>
---
 arch/x86/include/asm/dpt.h |   1 +
 arch/x86/mm/dpt.c          | 135 +++++++++++++++++++++++++++++++++++++
 2 files changed, 136 insertions(+)

diff --git a/arch/x86/include/asm/dpt.h b/arch/x86/include/asm/dpt.h
index 0d74afb10141..01727ef0577e 100644
--- a/arch/x86/include/asm/dpt.h
+++ b/arch/x86/include/asm/dpt.h
@@ -56,6 +56,7 @@ extern void dpt_destroy(struct dpt *dpt);
 extern int dpt_map_range(struct dpt *dpt, void *ptr, size_t size,
 			 enum page_table_level level);
 extern int dpt_map(struct dpt *dpt, void *ptr, unsigned long size);
+extern void dpt_unmap(struct dpt *dpt, void *ptr);
 
 static inline int dpt_map_module(struct dpt *dpt, char *module_name)
 {
diff --git a/arch/x86/mm/dpt.c b/arch/x86/mm/dpt.c
index 12eb0d794d84..c495c9b59b3e 100644
--- a/arch/x86/mm/dpt.c
+++ b/arch/x86/mm/dpt.c
@@ -636,6 +636,141 @@ int dpt_map(struct dpt *dpt, void *ptr, unsigned long size)
 }
 EXPORT_SYMBOL(dpt_map);
 
+static void dpt_clear_pte_range(struct dpt *dpt, pmd_t *pmd,
+				unsigned long addr, unsigned long end)
+{
+	pte_t *pte;
+
+	pte = dpt_pte_offset(dpt, pmd, addr);
+	if (IS_ERR(pte))
+		return;
+
+	do {
+		pte_clear(NULL, addr, pte);
+	} while (pte++, addr += PAGE_SIZE, addr < end);
+}
+
+static void dpt_clear_pmd_range(struct dpt *dpt, pud_t *pud,
+				unsigned long addr, unsigned long end,
+				enum page_table_level level)
+{
+	unsigned long next;
+	pmd_t *pmd;
+
+	pmd = dpt_pmd_offset(dpt, pud, addr);
+	if (IS_ERR(pmd))
+		return;
+
+	do {
+		next = pmd_addr_end(addr, end);
+		if (pmd_none(*pmd))
+			continue;
+		if (level == PGT_LEVEL_PMD || pmd_trans_huge(*pmd) ||
+		    pmd_devmap(*pmd) || !pmd_present(*pmd)) {
+			pmd_clear(pmd);
+			continue;
+		}
+		dpt_clear_pte_range(dpt, pmd, addr, next);
+	} while (pmd++, addr = next, addr < end);
+}
+
+static void dpt_clear_pud_range(struct dpt *dpt, p4d_t *p4d,
+				unsigned long addr, unsigned long end,
+				enum page_table_level level)
+{
+	unsigned long next;
+	pud_t *pud;
+
+	pud = dpt_pud_offset(dpt, p4d, addr);
+	if (IS_ERR(pud))
+		return;
+
+	do {
+		next = pud_addr_end(addr, end);
+		if (pud_none(*pud))
+			continue;
+		if (level == PGT_LEVEL_PUD || pud_trans_huge(*pud) ||
+		    pud_devmap(*pud)) {
+			pud_clear(pud);
+			continue;
+		}
+		dpt_clear_pmd_range(dpt, pud, addr, next, level);
+	} while (pud++, addr = next, addr < end);
+}
+
+static void dpt_clear_p4d_range(struct dpt *dpt, pgd_t *pgd,
+				unsigned long addr, unsigned long end,
+				enum page_table_level level)
+{
+	unsigned long next;
+	p4d_t *p4d;
+
+	p4d = dpt_p4d_offset(dpt, pgd, addr);
+	if (IS_ERR(p4d))
+		return;
+
+	do {
+		next = p4d_addr_end(addr, end);
+		if (p4d_none(*p4d))
+			continue;
+		if (level == PGT_LEVEL_P4D) {
+			p4d_clear(p4d);
+			continue;
+		}
+		dpt_clear_pud_range(dpt, p4d, addr, next, level);
+	} while (p4d++, addr = next, addr < end);
+}
+
+static void dpt_clear_pgd_range(struct dpt *dpt, pgd_t *pagetable,
+				unsigned long addr, unsigned long end,
+				enum page_table_level level)
+{
+	unsigned long next;
+	pgd_t *pgd;
+
+	pgd = pgd_offset_pgd(pagetable, addr);
+	do {
+		next = pgd_addr_end(addr, end);
+		if (pgd_none(*pgd))
+			continue;
+		if (level == PGT_LEVEL_PGD) {
+			pgd_clear(pgd);
+			continue;
+		}
+		dpt_clear_p4d_range(dpt, pgd, addr, next, level);
+	} while (pgd++, addr = next, addr < end);
+}
+
+/*
+ * Clear page table entries in the specified decorated page-table.
+ */
+void dpt_unmap(struct dpt *dpt, void *ptr)
+{
+	struct dpt_range_mapping *range_mapping;
+	unsigned long addr, end;
+	unsigned long flags;
+
+	spin_lock_irqsave(&dpt->lock, flags);
+
+	range_mapping = dpt_get_range_mapping(dpt, ptr);
+	if (!range_mapping) {
+		pr_debug("DPT %p: UNMAP %px - not mapped\n", dpt, ptr);
+		goto done;
+	}
+
+	addr = (unsigned long)range_mapping->ptr;
+	end = addr + range_mapping->size;
+	pr_debug("DPT %p: UNMAP %px/%lx/%d\n", dpt, ptr,
+		 range_mapping->size, range_mapping->level);
+	dpt_clear_pgd_range(dpt, dpt->pagetable, addr, end,
+			    range_mapping->level);
+	list_del(&range_mapping->list);
+	kfree(range_mapping);
+done:
+	spin_unlock_irqrestore(&dpt->lock, flags);
+}
+EXPORT_SYMBOL(dpt_unmap);
+
 /*
  * dpt_create - allocate a page-table and create a corresponding
  * decorated page-table. The page-table is allocated and aligned
-- 
2.18.2



* [RFC v4][PATCH part-2 10/13] mm/dpt: Function to copy page-table entries for percpu buffer
From: Alexandre Chartre @ 2020-05-04 14:58 UTC
  To: tglx, mingo, bp, hpa, dave.hansen, luto, peterz, x86, linux-mm,
	linux-kernel
  Cc: pbonzini, konrad.wilk, jan.setjeeilers, liran.alon, junaids,
	graf, rppt, kuzuno, mgross, alexandre.chartre

Provide functions to copy page-table entries from the kernel page-table
to a decorated page-table for a percpu buffer. A percpu buffer has a
different VA range for each CPU, and all of them have to be copied.
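
For example, mapping a per-cpu variable maps the instance of every
possible CPU (a sketch; the variable is made up):

  static DEFINE_PER_CPU(struct my_data, my_data);	/* hypothetical */

  err = DPT_MAP_CPUVAR(dpt, my_data);
  if (err)
	  return err;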

Signed-off-by: Alexandre Chartre <alexandre.chartre@oracle.com>
---
 arch/x86/include/asm/dpt.h |  6 ++++++
 arch/x86/mm/dpt.c          | 38 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 44 insertions(+)

diff --git a/arch/x86/include/asm/dpt.h b/arch/x86/include/asm/dpt.h
index 01727ef0577e..fd8c1b84ffe2 100644
--- a/arch/x86/include/asm/dpt.h
+++ b/arch/x86/include/asm/dpt.h
@@ -78,4 +78,10 @@ static inline int dpt_map_module(struct dpt *dpt, char *module_name)
 	(dpt_map(dpt, THIS_MODULE->core_layout.base,	\
 		 THIS_MODULE->core_layout.size))
 
+extern int dpt_map_percpu(struct dpt *dpt, void *percpu_ptr, size_t size);
+extern void dpt_unmap_percpu(struct dpt *dpt, void *percpu_ptr);
+
+#define	DPT_MAP_CPUVAR(dpt, cpuvar)			\
+	dpt_map_percpu(dpt, &cpuvar, sizeof(cpuvar))
+
 #endif
diff --git a/arch/x86/mm/dpt.c b/arch/x86/mm/dpt.c
index c495c9b59b3e..adc59f9ed876 100644
--- a/arch/x86/mm/dpt.c
+++ b/arch/x86/mm/dpt.c
@@ -771,6 +771,44 @@ void dpt_unmap(struct dpt *dpt, void *ptr)
 }
 EXPORT_SYMBOL(dpt_unmap);
 
+void dpt_unmap_percpu(struct dpt *dpt, void *percpu_ptr)
+{
+	void *ptr;
+	int cpu;
+
+	pr_debug("DPT %p: UNMAP PERCPU %px\n", dpt, percpu_ptr);
+	for_each_possible_cpu(cpu) {
+		ptr = per_cpu_ptr(percpu_ptr, cpu);
+		pr_debug("DPT %p: UNMAP PERCPU%d %px\n", dpt, cpu, ptr);
+		dpt_unmap(dpt, ptr);
+	}
+}
+EXPORT_SYMBOL(dpt_unmap_percpu);
+
+int dpt_map_percpu(struct dpt *dpt, void *percpu_ptr, size_t size)
+{
+	int cpu, err;
+	void *ptr;
+
+	pr_debug("DPT %p: MAP PERCPU %px\n", dpt, percpu_ptr);
+	for_each_possible_cpu(cpu) {
+		ptr = per_cpu_ptr(percpu_ptr, cpu);
+		pr_debug("DPT %p: MAP PERCPU%d %px\n", dpt, cpu, ptr);
+		err = dpt_map(dpt, ptr, size);
+		if (err) {
+			/*
+			 * Need to unmap any percpu mapping which has
+			 * succeeded before the failure.
+			 */
+			dpt_unmap_percpu(dpt, percpu_ptr);
+			return err;
+		}
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL(dpt_map_percpu);
+
 /*
  * dpt_create - allocate a page-table and create a corresponding
  * decorated page-table. The page-table is allocated and aligned
-- 
2.18.2



* [RFC v4][PATCH part-2 11/13] mm/dpt: Add decorated page-table remap function
From: Alexandre Chartre @ 2020-05-04 14:58 UTC
  To: tglx, mingo, bp, hpa, dave.hansen, luto, peterz, x86, linux-mm,
	linux-kernel
  Cc: pbonzini, konrad.wilk, jan.setjeeilers, liran.alon, junaids,
	graf, rppt, kuzuno, mgross, alexandre.chartre

Add a function to remap an already mapped buffer at a new address in
a decorated page-table: the already mapped buffer is unmapped, and a
new mapping is added for the specified new address.

This is useful to track and remap a buffer which can be freed and
then reallocated.
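
For example (a sketch; mapped_buf is a hypothetical variable holding
the currently mapped address):

  static void *mapped_buf;	/* hypothetical: currently mapped address */

  buf = krealloc(buf, size, GFP_KERNEL);
  ...
  /* unmap the old address (if any), then map the new one */
  err = dpt_remap(dpt, &mapped_buf, buf, size);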

Signed-off-by: Alexandre Chartre <alexandre.chartre@oracle.com>
---
 arch/x86/include/asm/dpt.h |  1 +
 arch/x86/mm/dpt.c          | 25 +++++++++++++++++++++++++
 2 files changed, 26 insertions(+)

diff --git a/arch/x86/include/asm/dpt.h b/arch/x86/include/asm/dpt.h
index fd8c1b84ffe2..3234ba968d80 100644
--- a/arch/x86/include/asm/dpt.h
+++ b/arch/x86/include/asm/dpt.h
@@ -57,6 +57,7 @@ extern int dpt_map_range(struct dpt *dpt, void *ptr, size_t size,
 			 enum page_table_level level);
 extern int dpt_map(struct dpt *dpt, void *ptr, unsigned long size);
 extern void dpt_unmap(struct dpt *dpt, void *ptr);
+extern int dpt_remap(struct dpt *dpt, void **mapping, void *ptr, size_t size);
 
 static inline int dpt_map_module(struct dpt *dpt, char *module_name)
 {
diff --git a/arch/x86/mm/dpt.c b/arch/x86/mm/dpt.c
index adc59f9ed876..9517e3081716 100644
--- a/arch/x86/mm/dpt.c
+++ b/arch/x86/mm/dpt.c
@@ -809,6 +809,31 @@ int dpt_map_percpu(struct dpt *dpt, void *percpu_ptr, size_t size)
 }
 EXPORT_SYMBOL(dpt_map_percpu);
 
+int dpt_remap(struct dpt *dpt, void **current_ptrp, void *new_ptr, size_t size)
+{
+	void *current_ptr = *current_ptrp;
+	int err;
+
+	if (current_ptr == new_ptr) {
+		/* no change, already mapped */
+		return 0;
+	}
+
+	if (current_ptr) {
+		dpt_unmap(dpt, current_ptr);
+		*current_ptrp = NULL;
+	}
+
+	err = dpt_map(dpt, new_ptr, size);
+	if (err)
+		return err;
+
+	*current_ptrp = new_ptr;
+
+	return 0;
+}
+EXPORT_SYMBOL(dpt_remap);
+
 /*
  * dpt_create - allocate a page-table and create a corresponding
  * decorated page-table. The page-table is allocated and aligned
-- 
2.18.2



* [RFC v4][PATCH part-2 12/13] mm/dpt: Handle decorated page-table mapped range leaks and overlaps
From: Alexandre Chartre @ 2020-05-04 14:58 UTC
  To: tglx, mingo, bp, hpa, dave.hansen, luto, peterz, x86, linux-mm,
	linux-kernel
  Cc: pbonzini, konrad.wilk, jan.setjeeilers, liran.alon, junaids,
	graf, rppt, kuzuno, mgross, alexandre.chartre

When mapping a buffer into a decorated page-table, data around the
buffer can also get mapped if the buffer is not aligned on the
page-directory size used for the mapping, so data can potentially leak
into the decorated page-table. In such a case, print a warning that
data are leaking.

Also, the data effectively mapped can overlap with an already mapped
buffer. This is not an issue when mapping but, when unmapping, make
sure that data from another buffer doesn't get unmapped as a side
effect.
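
For example (assuming 4K pages and 2M PMDs), mapping a small buffer at
the PMD level makes the whole surrounding 2M range visible through the
decorated page-table:

  /* dpt_map_range(dpt, ptr, 100, PGT_LEVEL_PMD) effectively maps: */
  map_addr = round_down((unsigned long)ptr, PMD_SIZE);
  map_end  = round_up((unsigned long)ptr + 100, PMD_SIZE);
  /* i.e. [map_addr, map_end), not just the 100 requested bytes */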

Signed-off-by: Alexandre Chartre <alexandre.chartre@oracle.com>
---
 arch/x86/include/asm/dpt.h |   1 +
 arch/x86/mm/dpt.c          | 197 +++++++++++++++++++++++++++++++------
 2 files changed, 168 insertions(+), 30 deletions(-)

diff --git a/arch/x86/include/asm/dpt.h b/arch/x86/include/asm/dpt.h
index 3234ba968d80..e0adbf69dadf 100644
--- a/arch/x86/include/asm/dpt.h
+++ b/arch/x86/include/asm/dpt.h
@@ -25,6 +25,7 @@ struct dpt_range_mapping {
 	void *ptr;			/* range start address */
 	size_t size;			/* range size */
 	enum page_table_level level;	/* mapping level */
+	int refcnt;			/* reference count (for overlap) */
 };
 
 /*
diff --git a/arch/x86/mm/dpt.c b/arch/x86/mm/dpt.c
index 9517e3081716..d3d3c3de2943 100644
--- a/arch/x86/mm/dpt.c
+++ b/arch/x86/mm/dpt.c
@@ -9,6 +9,22 @@
 
 #include <asm/dpt.h>
 
+
+static unsigned long page_directory_size[] = {
+	[PGT_LEVEL_PTE] = PAGE_SIZE,
+	[PGT_LEVEL_PMD] = PMD_SIZE,
+	[PGT_LEVEL_PUD] = PUD_SIZE,
+	[PGT_LEVEL_P4D] = P4D_SIZE,
+	[PGT_LEVEL_PGD] = PGDIR_SIZE,
+};
+
+#define DPT_RANGE_MAP_ADDR(r)	\
+	round_down((unsigned long)((r)->ptr), page_directory_size[(r)->level])
+
+#define DPT_RANGE_MAP_END(r)	\
+	round_up((unsigned long)((r)->ptr + (r)->size), \
+		 page_directory_size[(r)->level])
+
 /*
  * Get the pointer to the beginning of a page table directory from a page
  * table directory entry.
@@ -572,6 +588,70 @@ static int dpt_copy_pgd_range(struct dpt *dpt,
 	return 0;
 }
 
+/*
+ * Map a VA range, taking into account any overlap with already mapped
+ * VA ranges. On error, return < 0. Otherwise return the number of
+ * ranges the specified range is overlapping with.
+ */
+static int dpt_map_overlap(struct dpt *dpt, void *ptr, size_t size,
+			   enum page_table_level level)
+{
+	unsigned long map_addr, map_end;
+	unsigned long addr, end;
+	struct dpt_range_mapping *range;
+	bool need_mapping;
+	int err, overlap;
+
+	addr = (unsigned long)ptr;
+	end = addr + (unsigned long)size;
+	need_mapping = true;
+	overlap = 0;
+
+	lockdep_assert_held(&dpt->lock);
+	list_for_each_entry(range, &dpt->mapping_list, list) {
+
+		if (range->ptr == ptr && range->size == size) {
+			/* we are mapping the same range again */
+			pr_debug("DPT %p: MAP %px/%lx/%d already mapped\n",
+				 dpt, ptr, size, level);
+			return -EBUSY;
+		}
+
+		/* check overlap with mapped range */
+		map_addr = DPT_RANGE_MAP_ADDR(range);
+		map_end = DPT_RANGE_MAP_END(range);
+		if (end <= map_addr || addr >= map_end) {
+			/* no overlap, continue */
+			continue;
+		}
+
+		pr_debug("DPT %p: MAP %px/%lx/%d overlaps with %px/%lx/%d\n",
+			 dpt, ptr, size, level,
+			 range->ptr, range->size, range->level);
+		range->refcnt++;
+		overlap++;
+
+		/*
+	 * Check if the new range is included in an existing range.
+	 * If so, the new range is already entirely mapped.
+		 */
+		if (addr >= map_addr && end <= map_end) {
+			pr_debug("DPT %p: MAP %px/%lx/%d implicitly mapped\n",
+				 dpt, ptr, size, level);
+			need_mapping = false;
+		}
+	}
+
+	if (need_mapping) {
+		err = dpt_copy_pgd_range(dpt, dpt->pagetable, current->mm->pgd,
+					 addr, end, level);
+		if (err)
+			return err;
+	}
+
+	return overlap;
+}
+
 /*
  * Copy page table entries from the current page table (i.e. from the
  * kernel page table) to the specified decorated page-table. The level
@@ -582,47 +662,48 @@ int dpt_map_range(struct dpt *dpt, void *ptr, size_t size,
 		  enum page_table_level level)
 {
 	struct dpt_range_mapping *range_mapping;
+	unsigned long page_dir_size = page_directory_size[level];
 	unsigned long addr = (unsigned long)ptr;
 	unsigned long end = addr + ((unsigned long)size);
+	unsigned long map_addr, map_end;
 	unsigned long flags;
-	int err;
+	int overlap;
 
-	pr_debug("DPT %p: MAP %px/%lx/%d\n", dpt, ptr, size, level);
+	map_addr = round_down(addr, page_dir_size);
+	map_end = round_up(end, page_dir_size);
 
-	spin_lock_irqsave(&dpt->lock, flags);
-
-	/* check if the range is already mapped */
-	range_mapping = dpt_get_range_mapping(dpt, ptr);
-	if (range_mapping) {
-		pr_debug("DPT %p: MAP %px/%lx/%d already mapped\n",
-			 dpt, ptr, size, level);
-		err = -EBUSY;
-		goto done;
-	}
+	pr_debug("DPT %p: MAP %px/%lx/%d -> %lx-%lx\n", dpt, ptr, size, level,
+		 map_addr, map_end);
+	if (map_addr < addr)
+		pr_debug("DPT %p: MAP LEAK %lx-%lx\n", dpt, map_addr, addr);
+	if (map_end > end)
+		pr_debug("DPT %p: MAP LEAK %lx-%lx\n", dpt, end, map_end);
 
-	/* map new range */
+	/* add new range */
 	range_mapping = kmalloc(sizeof(*range_mapping), GFP_KERNEL);
-	if (!range_mapping) {
-		err = -ENOMEM;
-		goto done;
-	}
+	if (!range_mapping)
+		return -ENOMEM;
 
-	err = dpt_copy_pgd_range(dpt, dpt->pagetable, current->mm->pgd,
-				 addr, end, level);
-	if (err) {
-		kfree(range_mapping);
-		goto done;
+	spin_lock_irqsave(&dpt->lock, flags);
+
+	/*
+	 * Map the new range, taking overlap with already mapped ranges
+	 * into account.
+	 */
+	overlap = dpt_map_overlap(dpt, ptr, size, level);
+	if (overlap < 0) {
+		spin_unlock_irqrestore(&dpt->lock, flags);
+		return overlap;
 	}
 
 	INIT_LIST_HEAD(&range_mapping->list);
 	range_mapping->ptr = ptr;
 	range_mapping->size = size;
 	range_mapping->level = level;
+	range_mapping->refcnt = overlap + 1;
 	list_add(&range_mapping->list, &dpt->mapping_list);
-done:
 	spin_unlock_irqrestore(&dpt->lock, flags);
-
-	return err;
+	return 0;
 }
 EXPORT_SYMBOL(dpt_map_range);
 
@@ -741,13 +822,72 @@ static void dpt_clear_pgd_range(struct dpt *dpt, pgd_t *pagetable,
 	} while (pgd++, addr = next, addr < end);
 }
 
+
+/*
+ * Unmap a VA range, taking into account any overlap with other mapped
+ * VA ranges.
+ */
+static void dpt_unmap_overlap(struct dpt *dpt, struct dpt_range_mapping *range)
+{
+	unsigned long pgdir_size = page_directory_size[range->level];
+	unsigned long chunk_addr, chunk_end;
+	unsigned long map_addr, map_end;
+	struct dpt_range_mapping *r;
+	unsigned long addr, end;
+	bool overlap;
+
+	addr = DPT_RANGE_MAP_ADDR(range);
+	end = DPT_RANGE_MAP_END(range);
+
+	lockdep_assert_held(&dpt->lock);
+
+	/*
+	 * Unmap the VA range chunk by chunk to handle mapping overlaps
+	 * with any other range.
+	 * XXX can be improved with a sorted range list
+	 */
+	chunk_addr = addr;
+	while (chunk_addr < end) {
+		overlap = false;
+		list_for_each_entry(r, &dpt->mapping_list, list) {
+			map_addr = DPT_RANGE_MAP_ADDR(r);
+			map_end = DPT_RANGE_MAP_END(r);
+			/*
+			 * Check if there's an overlap and how far it goes.
+			 */
+			chunk_end = chunk_addr;
+			while (chunk_end >= map_addr && chunk_end < map_end) {
+				overlap = true;
+				chunk_end += pgdir_size;
+				if (chunk_end >= end)
+					break;
+			}
+			if (overlap) {
+				pr_debug("DPT %p: UNMAP %px/%lx/%d overlaps with %px/%lx/%d\n",
+					 dpt, range->ptr, range->size,
+					 range->level,
+					 r->ptr, r->size, r->level);
+				break;
+			}
+		}
+
+		if (!overlap) {
+			pr_debug("DPT %p: UNMAP CHUNK %lx/%lx/%d\n", dpt,
+				 chunk_addr, pgdir_size, range->level);
+			chunk_end = chunk_addr + pgdir_size;
+			dpt_clear_pgd_range(dpt, dpt->pagetable, chunk_addr,
+					    chunk_end, range->level);
+		}
+		chunk_addr = chunk_end;
+	}
+}
+
 /*
  * Clear page table entries in the specified decorated page-table.
  */
 void dpt_unmap(struct dpt *dpt, void *ptr)
 {
 	struct dpt_range_mapping *range_mapping;
-	unsigned long addr, end;
 	unsigned long flags;
 
 	spin_lock_irqsave(&dpt->lock, flags);
@@ -758,13 +898,10 @@ void dpt_unmap(struct dpt *dpt, void *ptr)
 		goto done;
 	}
 
-	addr = (unsigned long)range_mapping->ptr;
-	end = addr + range_mapping->size;
 	pr_debug("DPT %p: UNMAP %px/%lx/%d\n", dpt, ptr,
 		 range_mapping->size, range_mapping->level);
-	dpt_clear_pgd_range(dpt, dpt->pagetable, addr, end,
-			    range_mapping->level);
 	list_del(&range_mapping->list);
+	dpt_unmap_overlap(dpt, range_mapping);
 	kfree(range_mapping);
 done:
 	spin_unlock_irqrestore(&dpt->lock, flags);
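
For illustration, a minimal usage sketch of the resulting behavior
(hypothetical dpt and buffer, error handling elided):

	err = dpt_map_range(dpt, buf, 64, PGT_LEVEL_PTE);
	err = dpt_map_range(dpt, buf + 32, 64, PGT_LEVEL_PTE);
					/* overlaps the first range:
					 * its refcnt is bumped and
					 * nothing new is copied */
	dpt_unmap(dpt, buf);		/* entries kept: still covered
					 * by the second range */
	dpt_unmap(dpt, buf + 32);	/* entries cleared now */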
-- 
2.18.2


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [RFC v4][PATCH part-2 13/13] mm/asi: Function to init decorated page-table with ASI core mappings
  2020-05-04 14:57 [RFC v4][PATCH part-2 00/13] ASI - Part II (Decorated Page-Table) Alexandre Chartre
                   ` (11 preceding siblings ...)
  2020-05-04 14:58 ` [RFC v4][PATCH part-2 12/13] mm/dpt: Handle decorated page-table mapped range leaks and overlaps Alexandre Chartre
@ 2020-05-04 14:58 ` Alexandre Chartre
  2020-05-14  9:29 ` [RFC v4][PATCH part-2 00/13] ASI - Part II (Decorated Page-Table) Mike Rapoport
  13 siblings, 0 replies; 16+ messages in thread
From: Alexandre Chartre @ 2020-05-04 14:58 UTC (permalink / raw)
  To: tglx, mingo, bp, hpa, dave.hansen, luto, peterz, x86, linux-mm,
	linux-kernel
  Cc: pbonzini, konrad.wilk, jan.setjeeilers, liran.alon, junaids,
	graf, rppt, kuzuno, mgross, alexandre.chartre

Core mappings are the minimal mappings we need to be able to enter
isolation and handle an isolation abort or exit. They include the
kernel code, the GDT and the percpu ASI sessions. We also need a
stack, so we map the current task's stack when entering isolation
and unmap it on exit/abort.
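
A minimal sketch of the intended usage (assuming a dpt_create()
constructor from patch 01 and an asi created as in part I; the stack
mapping shown here is the caller's responsibility):

	struct dpt *dpt;
	int err;

	dpt = dpt_create(PGT_LEVEL_PGD);
	if (!dpt)
		return -ENOMEM;

	err = asi_init_dpt(dpt);	/* core mappings only */
	if (err)
		return err;

	/* asi_init_dpt() maps no stack: map ours before entering */
	err = dpt_map(dpt, current->stack, THREAD_SIZE);
	if (err)
		return err;

	asi_set_pagetable(asi, dpt->pagetable);
	err = asi_enter(asi);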

Signed-off-by: Alexandre Chartre <alexandre.chartre@oracle.com>
---
 arch/x86/include/asm/asi.h |  2 ++
 arch/x86/mm/asi.c          | 57 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 59 insertions(+)

diff --git a/arch/x86/include/asm/asi.h b/arch/x86/include/asm/asi.h
index ac0594d4f549..eafed750e07f 100644
--- a/arch/x86/include/asm/asi.h
+++ b/arch/x86/include/asm/asi.h
@@ -45,6 +45,7 @@
 #include <linux/export.h>
 
 #include <asm/asi_session.h>
+#include <asm/dpt.h>
 
 /*
  * ASI_NR_DYN_ASIDS is the same as TLB_NR_DYN_ASIDS. We can't directly
@@ -150,6 +151,7 @@ extern void asi_destroy(struct asi *asi);
 extern void asi_set_pagetable(struct asi *asi, pgd_t *pagetable);
 extern int asi_enter(struct asi *asi);
 extern void asi_exit(struct asi *asi);
+extern int asi_init_dpt(struct dpt *dpt);
 
 #ifdef CONFIG_PAGE_TABLE_ISOLATION
 DECLARE_ASI_TYPE(user);
diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c
index b63a0a883293..8b670ed13729 100644
--- a/arch/x86/mm/asi.c
+++ b/arch/x86/mm/asi.c
@@ -162,6 +162,63 @@ void asi_set_pagetable(struct asi *asi, pgd_t *pagetable)
 }
 EXPORT_SYMBOL(asi_set_pagetable);
 
+/*
+ * asi_init_dpt - Initialize a decorated page-table with the minimum
+ * mappings for using an ASI. Note that this function doesn't map any
+ * stack. If the stack of the task entering an ASI is not mapped then
+ * this will trigger a double-fault as soon as the task tries to access
+ * its stack.
+ */
+int asi_init_dpt(struct dpt *dpt)
+{
+	int err;
+
+	/*
+	 * Map the kernel.
+	 *
+	 * XXX We should check if we can map only kernel text, i.e. map with
+	 * size = _etext - _text
+	 */
+	err = dpt_map(dpt, (void *)__START_KERNEL_map, KERNEL_IMAGE_SIZE);
+	if (err)
+		return err;
+
+	/*
+	 * Map the cpu_entry_area because we need the GDT to be mapped.
+	 * Not sure we need anything else from cpu_entry_area.
+	 */
+	err = dpt_map_range(dpt, (void *)CPU_ENTRY_AREA_PER_CPU, P4D_SIZE,
+			    PGT_LEVEL_P4D);
+	if (err)
+		return err;
+
+	/*
+	 * Map fixed_percpu_data to get the stack canary.
+	 */
+	if (IS_ENABLED(CONFIG_STACKPROTECTOR)) {
+		err = DPT_MAP_CPUVAR(dpt, fixed_percpu_data);
+		if (err)
+			return err;
+	}
+
+	/* Map current_task, we need it for __schedule() */
+	err = DPT_MAP_CPUVAR(dpt, current_task);
+	if (err)
+		return err;
+
+	/*
+	 * Map the percpu ASI tlbstate. This also maps the asi_session,
+	 * which interrupt handlers use to figure out whether we have
+	 * entered isolation and to switch back to the kernel address space.
+	 */
+	err = DPT_MAP_CPUVAR(dpt, cpu_tlbstate);
+	if (err)
+		return err;
+
+	return 0;
+}
+EXPORT_SYMBOL(asi_init_dpt);
+
 /*
  * Update ASI TLB flush information for the specified ASI CR3 value.
  * Return an updated ASI CR3 value which specified if TLB needs to
-- 
2.18.2


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [RFC v4][PATCH part-2 00/13] ASI - Part II (Decorated Page-Table)
  2020-05-04 14:57 [RFC v4][PATCH part-2 00/13] ASI - Part II (Decorated Page-Table) Alexandre Chartre
                   ` (12 preceding siblings ...)
  2020-05-04 14:58 ` [RFC v4][PATCH part-2 13/13] mm/asi: Function to init decorated page-table with ASI core mappings Alexandre Chartre
@ 2020-05-14  9:29 ` Mike Rapoport
  2020-05-14 11:42   ` Alexandre Chartre
  13 siblings, 1 reply; 16+ messages in thread
From: Mike Rapoport @ 2020-05-14  9:29 UTC (permalink / raw)
  To: Alexandre Chartre
  Cc: tglx, mingo, bp, hpa, dave.hansen, luto, peterz, x86, linux-mm,
	linux-kernel, pbonzini, konrad.wilk, jan.setjeeilers, liran.alon,
	junaids, graf, rppt, kuzuno, mgross

Hello Alexandre,

On Mon, May 04, 2020 at 04:57:57PM +0200, Alexandre Chartre wrote:
> This is part II of ASI RFC v4. Please refer to the cover letter of
> part I for an overview the ASI RFC.
> 
>   https://lore.kernel.org/lkml/20200504144939.11318-1-alexandre.chartre@oracle.com/
> 
> This part introduces decorated page-table which encapsulate native page
> table (e.g. a PGD) in order to provide convenient page-table management
> functions, such as tracking address range mapped in a page-table or
> safely handling references to another page-table.
> 
> Decorated page-table can then be used to easily create and manage page
> tables to be used with ASI. It will be used by the ASI test driver (see
> part III) and later by KVM ASI.
> 
> Decorated page-table is independent of ASI, and can potentially be used
> anywhere a page-table is needed.
 
This is very impressive work!

I wonder why you decided to make dpt x86-specific? Unless I've missed
something, the dpt implementation does not rely on anything
architecture-specific and can go straight to linux/mm.

Another thing that comes to mind is that we already have a very
decorated page table, which is mm_struct. I admit that my attempt to
split out the core page table bits from the mm_struct [1] didn't go
far, but I still think we need a first-class abstraction for the page
table that will be used by both user memory management and the
management of the reduced kernel address spaces.


[1] https://git.kernel.org/pub/scm/linux/kernel/git/rppt/linux.git/log/?h=pg_table/v0.0

> Thanks,
> 
> alex.
> 
> -----
> 
> Alexandre Chartre (13):
>   mm/x86: Introduce decorated page-table (dpt)
>   mm/dpt: Track buffers allocated for a decorated page-table
>   mm/dpt: Add decorated page-table entry offset functions
>   mm/dpt: Add decorated page-table entry allocation functions
>   mm/dpt: Add decorated page-table entry set functions
>   mm/dpt: Functions to populate a decorated page-table from a VA range
>   mm/dpt: Helper functions to map module into a decorated page-table
>   mm/dpt: Keep track of VA ranges mapped in a decorated page-table
>   mm/dpt: Functions to clear decorated page-table entries for a VA range
>   mm/dpt: Function to copy page-table entries for percpu buffer
>   mm/dpt: Add decorated page-table remap function
>   mm/dpt: Handle decorated page-table mapped range leaks and overlaps
>   mm/asi: Function to init decorated page-table with ASI core mappings
> 
>  arch/x86/include/asm/asi.h |    2 +
>  arch/x86/include/asm/dpt.h |   89 +++
>  arch/x86/mm/Makefile       |    2 +-
>  arch/x86/mm/asi.c          |   57 ++
>  arch/x86/mm/dpt.c          | 1051 ++++++++++++++++++++++++++++++++++++
>  5 files changed, 1200 insertions(+), 1 deletion(-)
>  create mode 100644 arch/x86/include/asm/dpt.h
>  create mode 100644 arch/x86/mm/dpt.c
> 
> -- 
> 2.18.2
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC v4][PATCH part-2 00/13] ASI - Part II (Decorated Page-Table)
  2020-05-14  9:29 ` [RFC v4][PATCH part-2 00/13] ASI - Part II (Decorated Page-Table) Mike Rapoport
@ 2020-05-14 11:42   ` Alexandre Chartre
  0 siblings, 0 replies; 16+ messages in thread
From: Alexandre Chartre @ 2020-05-14 11:42 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: tglx, mingo, bp, hpa, dave.hansen, luto, peterz, x86, linux-mm,
	linux-kernel, pbonzini, konrad.wilk, jan.setjeeilers, liran.alon,
	junaids, graf, rppt, kuzuno, mgross


On 5/14/20 11:29 AM, Mike Rapoport wrote:
> Hello Alexandre,
> 
> On Mon, May 04, 2020 at 04:57:57PM +0200, Alexandre Chartre wrote:
>> This is part II of ASI RFC v4. Please refer to the cover letter of
>> part I for an overview the ASI RFC.
>>
>>    https://lore.kernel.org/lkml/20200504144939.11318-1-alexandre.chartre@oracle.com/
>>
>> This part introduces decorated page-table which encapsulate native page
>> table (e.g. a PGD) in order to provide convenient page-table management
>> functions, such as tracking address range mapped in a page-table or
>> safely handling references to another page-table.
>>
>> Decorated page-table can then be used to easily create and manage page
>> tables to be used with ASI. It will be used by the ASI test driver (see
>> part III) and later by KVM ASI.
>>
>> Decorated page-table is independent of ASI, and can potentially be used
>> anywhere a page-table is needed.

Hi Mike,

> This is very impressive work!
> 
> I wonder why you decided to make dpt x86-specific? Unless I've missed
> something, the dpt implementation does not rely on anything
> architecture-specific and can go straight to linux/mm.

Correct, this is not x86-specific. I put it in arch/x86 because that's
currently the only place where I use it, but it can be moved to linux/mm.

> Another thing that comes to mind is that we already have a very
> decorated page table, which is mm_struct.

mm_struct doesn't define a generic page-table encapsulation. mm_struct
references a page table (i.e. a PGD) and adds all kinds of attributes
needed for mm management but not necessarily related to the page table.
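
Roughly, for contrast (both abridged; dpt fields per patch 01):

	struct mm_struct {
		...				/* VMAs, counters, locks */
		pgd_t *pgd;			/* the page table itself */
		...				/* more mm-wide state */
	};

	struct dpt {
		spinlock_t lock;		/* protects the dpt */
		pgd_t *pagetable;		/* the page table itself */
		struct list_head mapping_list;	/* mapped VA ranges */
		...				/* backend page tracking */
	};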

> I admit that my attempt to
> split out the core page table bits from the mm_struct [1] didn't go
> far, but I still think we need a first-class abstraction for the page
> table that will be used by both user memory management and the
> management of the reduced kernel address spaces.

Agreed. I remember your attempt to extract the page table from mm_struct;
that is no simple work! For ASI, I didn't need mm, so it was simpler to
build a simple decorated page-table without attempting to use it with mm
(at least for now).

Thanks,

alex.

PS: if you want to play with dpt, there's a bug in dpt_destroy(): patch 08
adds a double free of dpt->backend_pages pages.

> 
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/rppt/linux.git/log/?h=pg_table/v0.0
> 
>> Thanks,
>>
>> alex.
>>
>> -----
>>
>> Alexandre Chartre (13):
>>    mm/x86: Introduce decorated page-table (dpt)
>>    mm/dpt: Track buffers allocated for a decorated page-table
>>    mm/dpt: Add decorated page-table entry offset functions
>>    mm/dpt: Add decorated page-table entry allocation functions
>>    mm/dpt: Add decorated page-table entry set functions
>>    mm/dpt: Functions to populate a decorated page-table from a VA range
>>    mm/dpt: Helper functions to map module into a decorated page-table
>>    mm/dpt: Keep track of VA ranges mapped in a decorated page-table
>>    mm/dpt: Functions to clear decorated page-table entries for a VA range
>>    mm/dpt: Function to copy page-table entries for percpu buffer
>>    mm/dpt: Add decorated page-table remap function
>>    mm/dpt: Handle decorated page-table mapped range leaks and overlaps
>>    mm/asi: Function to init decorated page-table with ASI core mappings
>>
>>   arch/x86/include/asm/asi.h |    2 +
>>   arch/x86/include/asm/dpt.h |   89 +++
>>   arch/x86/mm/Makefile       |    2 +-
>>   arch/x86/mm/asi.c          |   57 ++
>>   arch/x86/mm/dpt.c          | 1051 ++++++++++++++++++++++++++++++++++++
>>   5 files changed, 1200 insertions(+), 1 deletion(-)
>>   create mode 100644 arch/x86/include/asm/dpt.h
>>   create mode 100644 arch/x86/mm/dpt.c
>>
>> -- 
>> 2.18.2
>>
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2020-05-14 11:44 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-04 14:57 [RFC v4][PATCH part-2 00/13] ASI - Part II (Decorated Page-Table) Alexandre Chartre
2020-05-04 14:57 ` [RFC v4][PATCH part-2 01/13] mm/x86: Introduce decorated page-table (dpt) Alexandre Chartre
2020-05-04 14:57 ` [RFC v4][PATCH part-2 02/13] mm/dpt: Track buffers allocated for a decorated page-table Alexandre Chartre
2020-05-04 14:58 ` [RFC v4][PATCH part-2 03/13] mm/dpt: Add decorated page-table entry offset functions Alexandre Chartre
2020-05-04 14:58 ` [RFC v4][PATCH part-2 04/13] mm/dpt: Add decorated page-table entry allocation functions Alexandre Chartre
2020-05-04 14:58 ` [RFC v4][PATCH part-2 05/13] mm/dpt: Add decorated page-table entry set functions Alexandre Chartre
2020-05-04 14:58 ` [RFC v4][PATCH part-2 06/13] mm/dpt: Functions to populate a decorated page-table from a VA range Alexandre Chartre
2020-05-04 14:58 ` [RFC v4][PATCH part-2 07/13] mm/dpt: Helper functions to map module into a decorated page-table Alexandre Chartre
2020-05-04 14:58 ` [RFC v4][PATCH part-2 08/13] mm/dpt: Keep track of VA ranges mapped in " Alexandre Chartre
2020-05-04 14:58 ` [RFC v4][PATCH part-2 09/13] mm/dpt: Functions to clear decorated page-table entries for a VA range Alexandre Chartre
2020-05-04 14:58 ` [RFC v4][PATCH part-2 10/13] mm/dpt: Function to copy page-table entries for percpu buffer Alexandre Chartre
2020-05-04 14:58 ` [RFC v4][PATCH part-2 11/13] mm/dpt: Add decorated page-table remap function Alexandre Chartre
2020-05-04 14:58 ` [RFC v4][PATCH part-2 12/13] mm/dpt: Handle decorated page-table mapped range leaks and overlaps Alexandre Chartre
2020-05-04 14:58 ` [RFC v4][PATCH part-2 13/13] mm/asi: Function to init decorated page-table with ASI core mappings Alexandre Chartre
2020-05-14  9:29 ` [RFC v4][PATCH part-2 00/13] ASI - Part II (Decorated Page-Table) Mike Rapoport
2020-05-14 11:42   ` Alexandre Chartre
