LKML Archive on lore.kernel.org
* [PATCH v3 0/7] mm, x86/vdso: Special IO mapping improvements
@ 2015-12-30  4:12 Andy Lutomirski
  2015-12-30  4:12 ` [PATCH v3 1/7] x86/vsdo: Fix build on PARAVIRT_CLOCK=y, KVM_GUEST=n Andy Lutomirski
                   ` (7 more replies)
  0 siblings, 8 replies; 17+ messages in thread
From: Andy Lutomirski @ 2015-12-30  4:12 UTC (permalink / raw)
  To: x86, Borislav Petkov, Kees Cook
  Cc: linux-kernel, Oleg Nesterov, Andy Lutomirski

This applies on top of the earlier vdso pvclock series I sent out.
Once that lands in -tip, this will apply to -tip.

This series cleans up the hack that is our vvar mapping.  We currently
initialize the vvar mapping as a special mapping vma backed by nothing
whatsoever and then we abuse remap_pfn_range to populate it.

This cheats the mm core, probably breaks under various evil madvise
workloads, and prevents handling faults in more interesting ways.

To clean it up, this series:

 - Adds a special mapping .fault operation
 - Adds a vm_insert_pfn_prot helper
 - Uses the new .fault infrastructure in x86's vdso and vvar mappings
 - Hardens the HPET mapping, mitigating an HW attack surface that bothers me
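
Roughly, the end state looks like this (a condensed sketch only; the
real code is in patches 5 and 6, and "pfn" here stands in for the
per-page selection logic):

static int vvar_fault(const struct vm_special_mapping *sm,
		      struct vm_area_struct *vma, struct vm_fault *vmf)
{
	/* Insert the right pfn lazily, at fault time... */
	if (vm_insert_pfn(vma, (unsigned long)vmf->virtual_address, pfn) == 0)
		return VM_FAULT_NOPAGE;

	/* ...and refuse access to pages that shouldn't be there. */
	return VM_FAULT_SIGBUS;
}

static const struct vm_special_mapping vvar_mapping = {
	.name = "[vvar]",
	.fault = vvar_fault,	/* no .pages array needed */
};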

Changes from v2:
 - Added patch 1, which is needed in -tip to fix the build
 - Fixed -EBUSY handling in vvar's .fault.

Changes from v1:
 - Lots of changelog clarification requested by akpm
 - Minor tweaks to style and comments in the first two patches

Andy Lutomirski (7):
  x86/vsdo: Fix build on PARAVIRT_CLOCK=y, KVM_GUEST=n
  mm: Add a vm_special_mapping .fault method
  mm: Add vm_insert_pfn_prot
  x86/vdso: Track each mm's loaded vdso image as well as its base
  x86,vdso: Use .fault for the vdso text mapping
  x86,vdso: Use .fault instead of remap_pfn_range for the vvar mapping
  x86/vdso: Disallow vvar access to vclock IO for never-used vclocks

 arch/x86/entry/vdso/vdso2c.h            |   7 --
 arch/x86/entry/vdso/vma.c               | 124 ++++++++++++++++++++------------
 arch/x86/entry/vsyscall/vsyscall_gtod.c |   9 ++-
 arch/x86/include/asm/clocksource.h      |   9 +--
 arch/x86/include/asm/mmu.h              |   3 +-
 arch/x86/include/asm/pvclock.h          |   2 +-
 arch/x86/include/asm/vdso.h             |   3 -
 arch/x86/include/asm/vgtod.h            |   6 ++
 include/linux/mm.h                      |   2 +
 include/linux/mm_types.h                |  22 +++++-
 mm/memory.c                             |  25 ++++++-
 mm/mmap.c                               |  13 ++--
 12 files changed, 152 insertions(+), 73 deletions(-)

-- 
2.5.0



* [PATCH v3 1/7] x86/vsdo: Fix build on PARAVIRT_CLOCK=y, KVM_GUEST=n
  2015-12-30  4:12 [PATCH v3 0/7] mm, x86/vdso: Special IO mapping improvements Andy Lutomirski
@ 2015-12-30  4:12 ` Andy Lutomirski
  2016-01-05 19:21   ` Borislav Petkov
  2016-01-06  9:54   ` [tip:x86/asm] x86/vsdo: Fix build on PARAVIRT_CLOCK=y, KVM_GUEST=n tip-bot for Andy Lutomirski
  2015-12-30  4:12 ` [PATCH v3 2/7] mm: Add a vm_special_mapping .fault method Andy Lutomirski
                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 17+ messages in thread
From: Andy Lutomirski @ 2015-12-30  4:12 UTC (permalink / raw)
  To: x86, Borislav Petkov, Kees Cook
  Cc: linux-kernel, Oleg Nesterov, Andy Lutomirski

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/include/asm/pvclock.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/pvclock.h b/arch/x86/include/asm/pvclock.h
index 66df22b2e0c9..fdcc04020636 100644
--- a/arch/x86/include/asm/pvclock.h
+++ b/arch/x86/include/asm/pvclock.h
@@ -4,7 +4,7 @@
 #include <linux/clocksource.h>
 #include <asm/pvclock-abi.h>
 
-#ifdef CONFIG_PARAVIRT_CLOCK
+#ifdef CONFIG_KVM_GUEST
 extern struct pvclock_vsyscall_time_info *pvclock_pvti_cpu0_va(void);
 #else
 static inline struct pvclock_vsyscall_time_info *pvclock_pvti_cpu0_va(void)
-- 
2.5.0



* [PATCH v3 2/7] mm: Add a vm_special_mapping .fault method
  2015-12-30  4:12 [PATCH v3 0/7] mm, x86/vdso: Special IO mapping improvements Andy Lutomirski
  2015-12-30  4:12 ` [PATCH v3 1/7] x86/vsdo: Fix build on PARAVIRT_CLOCK=y, KVM_GUEST=n Andy Lutomirski
@ 2015-12-30  4:12 ` Andy Lutomirski
  2016-01-12 12:02   ` [tip:x86/asm] mm: Add a vm_special_mapping.fault() method tip-bot for Andy Lutomirski
  2015-12-30  4:12 ` [PATCH v3 3/7] mm: Add vm_insert_pfn_prot Andy Lutomirski
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 17+ messages in thread
From: Andy Lutomirski @ 2015-12-30  4:12 UTC (permalink / raw)
  To: x86, Borislav Petkov, Kees Cook
  Cc: linux-kernel, Oleg Nesterov, Andy Lutomirski, Andy Lutomirski

From: Andy Lutomirski <luto@amacapital.net>

Requiring special mappings to give a list of struct pages is
inflexible: it prevents sane use of IO memory in a special mapping,
it's inefficient (it requires arch code to initialize a list of
struct pages, and it requires the mm core to walk the entire list
just to figure out how long it is), and it prevents arch code from
doing anything fancy when a special mapping fault occurs.

Add a .fault method as an alternative to filling in a .pages array.
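
As a sketch of the intended use (illustrative only: example_fault,
example_page, and the "[example]" name are made up; x86's real
users follow later in this series):

static struct page *example_page;	/* assumed to be set up at init */

static int example_fault(const struct vm_special_mapping *sm,
			 struct vm_area_struct *vma, struct vm_fault *vmf)
{
	if (vmf->pgoff != 0)
		return VM_FAULT_SIGBUS;

	vmf->page = example_page;
	get_page(vmf->page);
	return 0;	/* 0 means vmf->page holds the resolved page */
}

static const struct vm_special_mapping example_mapping = {
	.name = "[example]",
	.fault = example_fault,	/* .pages is ignored when .fault is set */
};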

Looks-OK-to: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 include/linux/mm_types.h | 22 +++++++++++++++++++---
 mm/mmap.c                | 13 +++++++++----
 2 files changed, 28 insertions(+), 7 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index f8d1492a114f..c88e48a3c155 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -568,10 +568,26 @@ static inline void clear_tlb_flush_pending(struct mm_struct *mm)
 }
 #endif
 
-struct vm_special_mapping
-{
-	const char *name;
+struct vm_fault;
+
+struct vm_special_mapping {
+	const char *name;	/* The name, e.g. "[vdso]". */
+
+	/*
+	 * If .fault is not provided, this points to a
+	 * NULL-terminated array of pages that back the special mapping.
+	 *
+	 * This must not be NULL unless .fault is provided.
+	 */
 	struct page **pages;
+
+	/*
+	 * If non-NULL, then this is called to resolve page faults
+	 * on the special mapping.  If used, .pages is not checked.
+	 */
+	int (*fault)(const struct vm_special_mapping *sm,
+		     struct vm_area_struct *vma,
+		     struct vm_fault *vmf);
 };
 
 enum tlb_flush_reason {
diff --git a/mm/mmap.c b/mm/mmap.c
index 2ce04a649f6b..f717453b1a57 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -3030,11 +3030,16 @@ static int special_mapping_fault(struct vm_area_struct *vma,
 	pgoff_t pgoff;
 	struct page **pages;
 
-	if (vma->vm_ops == &legacy_special_mapping_vmops)
+	if (vma->vm_ops == &legacy_special_mapping_vmops) {
 		pages = vma->vm_private_data;
-	else
-		pages = ((struct vm_special_mapping *)vma->vm_private_data)->
-			pages;
+	} else {
+		struct vm_special_mapping *sm = vma->vm_private_data;
+
+		if (sm->fault)
+			return sm->fault(sm, vma, vmf);
+
+		pages = sm->pages;
+	}
 
 	for (pgoff = vmf->pgoff; pgoff && *pages; ++pages)
 		pgoff--;
-- 
2.5.0



* [PATCH v3 3/7] mm: Add vm_insert_pfn_prot
  2015-12-30  4:12 [PATCH v3 0/7] mm, x86/vdso: Special IO mapping improvements Andy Lutomirski
  2015-12-30  4:12 ` [PATCH v3 1/7] x86/vsdo: Fix build on PARAVIRT_CLOCK=y, KVM_GUEST=n Andy Lutomirski
  2015-12-30  4:12 ` [PATCH v3 2/7] mm: Add a vm_special_mapping .fault method Andy Lutomirski
@ 2015-12-30  4:12 ` Andy Lutomirski
  2016-01-12 12:02   ` [tip:x86/asm] mm: Add vm_insert_pfn_prot() tip-bot for Andy Lutomirski
  2015-12-30  4:12 ` [PATCH v3 4/7] x86/vdso: Track each mm's loaded vdso image as well as its base Andy Lutomirski
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 17+ messages in thread
From: Andy Lutomirski @ 2015-12-30  4:12 UTC (permalink / raw)
  To: x86, Borislav Petkov, Kees Cook
  Cc: linux-kernel, Oleg Nesterov, Andy Lutomirski

The x86 vvar vma contains pages with differing cacheability
flags.  x86 currently implements this by manually inserting all
the ptes using (io_)remap_pfn_range when the vma is set up.

x86 wants to move to using .fault with VM_FAULT_NOPAGE to set up the
mappings as needed.  The correct API to use to insert a pfn in
.fault is vm_insert_pfn, but vm_insert_pfn can't override the vma's
cache mode, and the HPET page in particular needs to be uncached
despite the fact that the rest of the VMA is cached.

Add vm_insert_pfn_prot to support varying cacheability within the
same non-COW VMA in a more sane manner.

x86 could alternatively use multiple VMAs, but that's messy, would
break CRIU, and would create unnecessary VMAs that would waste
memory.
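
For illustration, a .fault handler can now hand out a single
uncached page like this (sketch only; hpet_example_fault and
hpet_pfn are stand-ins, not part of this patch):

static int hpet_example_fault(const struct vm_special_mapping *sm,
			      struct vm_area_struct *vma,
			      struct vm_fault *vmf)
{
	int ret = vm_insert_pfn_prot(vma,
				     (unsigned long)vmf->virtual_address,
				     hpet_pfn,
				     pgprot_noncached(PAGE_READONLY));

	/* -EBUSY means another thread raced us and the pte is in place. */
	if (ret == 0 || ret == -EBUSY)
		return VM_FAULT_NOPAGE;
	return VM_FAULT_SIGBUS;
}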

Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 include/linux/mm.h |  2 ++
 mm/memory.c        | 25 +++++++++++++++++++++++--
 2 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 00bad7793788..87ef1d7730ba 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2080,6 +2080,8 @@ int remap_pfn_range(struct vm_area_struct *, unsigned long addr,
 int vm_insert_page(struct vm_area_struct *, unsigned long addr, struct page *);
 int vm_insert_pfn(struct vm_area_struct *vma, unsigned long addr,
 			unsigned long pfn);
+int vm_insert_pfn_prot(struct vm_area_struct *vma, unsigned long addr,
+			unsigned long pfn, pgprot_t pgprot);
 int vm_insert_mixed(struct vm_area_struct *vma, unsigned long addr,
 			unsigned long pfn);
 int vm_iomap_memory(struct vm_area_struct *vma, phys_addr_t start, unsigned long len);
diff --git a/mm/memory.c b/mm/memory.c
index c387430f06c3..a29f0b90fc56 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1564,8 +1564,29 @@ out:
 int vm_insert_pfn(struct vm_area_struct *vma, unsigned long addr,
 			unsigned long pfn)
 {
+	return vm_insert_pfn_prot(vma, addr, pfn, vma->vm_page_prot);
+}
+EXPORT_SYMBOL(vm_insert_pfn);
+
+/**
+ * vm_insert_pfn_prot - insert single pfn into user vma with specified pgprot
+ * @vma: user vma to map to
+ * @addr: target user address of this page
+ * @pfn: source kernel pfn
+ * @pgprot: pgprot flags for the inserted page
+ *
+ * This is exactly like vm_insert_pfn, except that it allows drivers
+ * to override pgprot on a per-page basis.
+ *
+ * This only makes sense for IO mappings, and it makes no sense for
+ * cow mappings.  In general, using multiple vmas is preferable;
+ * vm_insert_pfn_prot should only be used if using multiple VMAs is
+ * impractical.
+ */
+int vm_insert_pfn_prot(struct vm_area_struct *vma, unsigned long addr,
+			unsigned long pfn, pgprot_t pgprot)
+{
 	int ret;
-	pgprot_t pgprot = vma->vm_page_prot;
 	/*
 	 * Technically, architectures with pte_special can avoid all these
 	 * restrictions (same for remap_pfn_range).  However we would like
@@ -1587,7 +1608,7 @@ int vm_insert_pfn(struct vm_area_struct *vma, unsigned long addr,
 
 	return ret;
 }
-EXPORT_SYMBOL(vm_insert_pfn);
+EXPORT_SYMBOL(vm_insert_pfn_prot);
 
 int vm_insert_mixed(struct vm_area_struct *vma, unsigned long addr,
 			unsigned long pfn)
-- 
2.5.0



* [PATCH v3 4/7] x86/vdso: Track each mm's loaded vdso image as well as its base
  2015-12-30  4:12 [PATCH v3 0/7] mm, x86/vdso: Special IO mapping improvements Andy Lutomirski
                   ` (2 preceding siblings ...)
  2015-12-30  4:12 ` [PATCH v3 3/7] mm: Add vm_insert_pfn_prot Andy Lutomirski
@ 2015-12-30  4:12 ` Andy Lutomirski
  2016-01-12 12:03   ` [tip:x86/asm] x86/vdso: Track each mm's loaded vDSO image as well as its base tip-bot for Andy Lutomirski
  2015-12-30  4:12 ` [PATCH v3 5/7] x86,vdso: Use .fault for the vdso text mapping Andy Lutomirski
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 17+ messages in thread
From: Andy Lutomirski @ 2015-12-30  4:12 UTC (permalink / raw)
  To: x86, Borislav Petkov, Kees Cook
  Cc: linux-kernel, Oleg Nesterov, Andy Lutomirski

As we start to do more intelligent things with the vdso at runtime
(as opposed to just at mm initialization time), we'll need to know
which vdso is in use.

In principle, we could guess based on the mm type, but that's
over-complicated and error-prone.  Instead, just track it in the mmu
context.
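
Consumers can then recover the image from the mm, as the fault
handlers added later in this series do:

	const struct vdso_image *image = vma->vm_mm->context.vdso_image;

	if (!image)	/* this mm has no vdso mapped */
		return VM_FAULT_SIGBUS;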

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/vdso/vma.c  | 1 +
 arch/x86/include/asm/mmu.h | 3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index b8f69e264ac4..80b021067bd6 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -121,6 +121,7 @@ static int map_vdso(const struct vdso_image *image, bool calculate_addr)
 
 	text_start = addr - image->sym_vvar_start;
 	current->mm->context.vdso = (void __user *)text_start;
+	current->mm->context.vdso_image = image;
 
 	/*
 	 * MAYWRITE to allow gdb to COW and set breakpoints
diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
index 55234d5e7160..1ea0baef1175 100644
--- a/arch/x86/include/asm/mmu.h
+++ b/arch/x86/include/asm/mmu.h
@@ -19,7 +19,8 @@ typedef struct {
 #endif
 
 	struct mutex lock;
-	void __user *vdso;
+	void __user *vdso;			/* vdso base address */
+	const struct vdso_image *vdso_image;	/* vdso image in use */
 
 	atomic_t perf_rdpmc_allowed;	/* nonzero if rdpmc is allowed */
 } mm_context_t;
-- 
2.5.0



* [PATCH v3 5/7] x86,vdso: Use .fault for the vdso text mapping
  2015-12-30  4:12 [PATCH v3 0/7] mm, x86/vdso: Special IO mapping improvements Andy Lutomirski
                   ` (3 preceding siblings ...)
  2015-12-30  4:12 ` [PATCH v3 4/7] x86/vdso: Track each mm's loaded vdso image as well as its base Andy Lutomirski
@ 2015-12-30  4:12 ` Andy Lutomirski
  2016-01-12 12:03   ` [tip:x86/asm] x86/vdso: Use .fault for the vDSO text mapping tip-bot for Andy Lutomirski
  2015-12-30  4:12 ` [PATCH v3 6/7] x86,vdso: Use .fault instead of remap_pfn_range for the vvar mapping Andy Lutomirski
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 17+ messages in thread
From: Andy Lutomirski @ 2015-12-30  4:12 UTC (permalink / raw)
  To: x86, Borislav Petkov, Kees Cook
  Cc: linux-kernel, Oleg Nesterov, Andy Lutomirski

The old scheme for mapping the vdso text is rather complicated.  vdso2c
generates a struct vm_special_mapping and a blank .pages array of the
correct size for each vdso image.  Init code in vdso/vma.c populates
the .pages array for each vdso image, and the mapping code selects
the appropriate struct vm_special_mapping.

With .fault, we can use a less roundabout approach: vdso_fault
just returns the appropriate page for the selected vdso image.
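
Condensed, the new lookup is just offset arithmetic against the
image tracked in the mmu context (the full handler is in the diff
below):

	vmf->page = virt_to_page(image->data + (vmf->pgoff << PAGE_SHIFT));
	get_page(vmf->page);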

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/vdso/vdso2c.h |  7 -------
 arch/x86/entry/vdso/vma.c    | 26 +++++++++++++++++++-------
 arch/x86/include/asm/vdso.h  |  3 ---
 3 files changed, 19 insertions(+), 17 deletions(-)

diff --git a/arch/x86/entry/vdso/vdso2c.h b/arch/x86/entry/vdso/vdso2c.h
index 0224987556ce..abe961c7c71c 100644
--- a/arch/x86/entry/vdso/vdso2c.h
+++ b/arch/x86/entry/vdso/vdso2c.h
@@ -150,16 +150,9 @@ static void BITSFUNC(go)(void *raw_addr, size_t raw_len,
 	}
 	fprintf(outfile, "\n};\n\n");
 
-	fprintf(outfile, "static struct page *pages[%lu];\n\n",
-		mapping_size / 4096);
-
 	fprintf(outfile, "const struct vdso_image %s = {\n", name);
 	fprintf(outfile, "\t.data = raw_data,\n");
 	fprintf(outfile, "\t.size = %lu,\n", mapping_size);
-	fprintf(outfile, "\t.text_mapping = {\n");
-	fprintf(outfile, "\t\t.name = \"[vdso]\",\n");
-	fprintf(outfile, "\t\t.pages = pages,\n");
-	fprintf(outfile, "\t},\n");
 	if (alt_sec) {
 		fprintf(outfile, "\t.alt = %lu,\n",
 			(unsigned long)GET_LE(&alt_sec->sh_offset));
diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index 80b021067bd6..eb50d7c1f161 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -27,13 +27,7 @@ unsigned int __read_mostly vdso64_enabled = 1;
 
 void __init init_vdso_image(const struct vdso_image *image)
 {
-	int i;
-	int npages = (image->size) / PAGE_SIZE;
-
 	BUG_ON(image->size % PAGE_SIZE != 0);
-	for (i = 0; i < npages; i++)
-		image->text_mapping.pages[i] =
-			virt_to_page(image->data + i*PAGE_SIZE);
 
 	apply_alternatives((struct alt_instr *)(image->data + image->alt),
 			   (struct alt_instr *)(image->data + image->alt +
@@ -90,6 +84,24 @@ static unsigned long vdso_addr(unsigned long start, unsigned len)
 #endif
 }
 
+static int vdso_fault(const struct vm_special_mapping *sm,
+		      struct vm_area_struct *vma, struct vm_fault *vmf)
+{
+	const struct vdso_image *image = vma->vm_mm->context.vdso_image;
+
+	if (!image || (vmf->pgoff << PAGE_SHIFT) >= image->size)
+		return VM_FAULT_SIGBUS;
+
+	vmf->page = virt_to_page(image->data + (vmf->pgoff << PAGE_SHIFT));
+	get_page(vmf->page);
+	return 0;
+}
+
+static const struct vm_special_mapping text_mapping = {
+	.name = "[vdso]",
+	.fault = vdso_fault,
+};
+
 static int map_vdso(const struct vdso_image *image, bool calculate_addr)
 {
 	struct mm_struct *mm = current->mm;
@@ -131,7 +143,7 @@ static int map_vdso(const struct vdso_image *image, bool calculate_addr)
 				       image->size,
 				       VM_READ|VM_EXEC|
 				       VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC,
-				       &image->text_mapping);
+				       &text_mapping);
 
 	if (IS_ERR(vma)) {
 		ret = PTR_ERR(vma);
diff --git a/arch/x86/include/asm/vdso.h b/arch/x86/include/asm/vdso.h
index deabaf9759b6..43dc55be524e 100644
--- a/arch/x86/include/asm/vdso.h
+++ b/arch/x86/include/asm/vdso.h
@@ -13,9 +13,6 @@ struct vdso_image {
 	void *data;
 	unsigned long size;   /* Always a multiple of PAGE_SIZE */
 
-	/* text_mapping.pages is big enough for data/size page pointers */
-	struct vm_special_mapping text_mapping;
-
 	unsigned long alt, alt_len;
 
 	long sym_vvar_start;  /* Negative offset to the vvar area */
-- 
2.5.0



* [PATCH v3 6/7] x86,vdso: Use .fault instead of remap_pfn_range for the vvar mapping
  2015-12-30  4:12 [PATCH v3 0/7] mm, x86/vdso: Special IO mapping improvements Andy Lutomirski
                   ` (4 preceding siblings ...)
  2015-12-30  4:12 ` [PATCH v3 5/7] x86,vdso: Use .fault for the vdso text mapping Andy Lutomirski
@ 2015-12-30  4:12 ` Andy Lutomirski
  2016-01-12 12:04   ` [tip:x86/asm] x86/vdso: Use ->fault() instead of remap_pfn_range() for the vvar mapping tip-bot for Andy Lutomirski
  2015-12-30  4:12 ` [PATCH v3 7/7] x86/vdso: Disallow vvar access to vclock IO for never-used vclocks Andy Lutomirski
  2016-01-05  0:02 ` [PATCH v3 0/7] mm, x86/vdso: Special IO mapping improvements Kees Cook
  7 siblings, 1 reply; 17+ messages in thread
From: Andy Lutomirski @ 2015-12-30  4:12 UTC (permalink / raw)
  To: x86, Borislav Petkov, Kees Cook
  Cc: linux-kernel, Oleg Nesterov, Andy Lutomirski

This is IMO much less ugly, and it also opens the door to
disallowing unprivileged userspace HPET access on systems with
usable TSCs.
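
The change in a nutshell: instead of eagerly populating every pte
at mmap time, e.g.

	remap_pfn_range(vma, text_start + image->sym_vvar_page,
			__pa_symbol(&__vvar_page) >> PAGE_SHIFT,
			PAGE_SIZE, PAGE_READONLY);

vvar_fault() now inserts each page on first touch via
vm_insert_pfn{,_prot}(), returning VM_FAULT_NOPAGE on success and
VM_FAULT_SIGBUS for pages that shouldn't exist.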

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/vdso/vma.c | 97 ++++++++++++++++++++++++++++-------------------
 1 file changed, 57 insertions(+), 40 deletions(-)

diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index eb50d7c1f161..4b5461ba4f6b 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -102,18 +102,69 @@ static const struct vm_special_mapping text_mapping = {
 	.fault = vdso_fault,
 };
 
+static int vvar_fault(const struct vm_special_mapping *sm,
+		      struct vm_area_struct *vma, struct vm_fault *vmf)
+{
+	const struct vdso_image *image = vma->vm_mm->context.vdso_image;
+	long sym_offset;
+	int ret = -EFAULT;
+
+	if (!image)
+		return VM_FAULT_SIGBUS;
+
+	sym_offset = (long)(vmf->pgoff << PAGE_SHIFT) +
+		image->sym_vvar_start;
+
+	/*
+	 * Sanity check: a symbol offset of zero means that the page
+	 * does not exist for this vdso image, not that the page is at
+	 * offset zero relative to the text mapping.  This should be
+	 * impossible here, because sym_offset should only be zero for
+	 * the page past the end of the vvar mapping.
+	 */
+	if (sym_offset == 0)
+		return VM_FAULT_SIGBUS;
+
+	if (sym_offset == image->sym_vvar_page) {
+		ret = vm_insert_pfn(vma, (unsigned long)vmf->virtual_address,
+				    __pa_symbol(&__vvar_page) >> PAGE_SHIFT);
+	} else if (sym_offset == image->sym_hpet_page) {
+#ifdef CONFIG_HPET_TIMER
+		if (hpet_address) {
+			ret = vm_insert_pfn_prot(
+				vma,
+				(unsigned long)vmf->virtual_address,
+				hpet_address >> PAGE_SHIFT,
+				pgprot_noncached(PAGE_READONLY));
+		}
+#endif
+	} else if (sym_offset == image->sym_pvclock_page) {
+		struct pvclock_vsyscall_time_info *pvti =
+			pvclock_pvti_cpu0_va();
+		if (pvti) {
+			ret = vm_insert_pfn(
+				vma,
+				(unsigned long)vmf->virtual_address,
+				__pa(pvti) >> PAGE_SHIFT);
+		}
+	}
+
+	if (ret == 0 || ret == -EBUSY)
+		return VM_FAULT_NOPAGE;
+
+	return VM_FAULT_SIGBUS;
+}
+
 static int map_vdso(const struct vdso_image *image, bool calculate_addr)
 {
 	struct mm_struct *mm = current->mm;
 	struct vm_area_struct *vma;
 	unsigned long addr, text_start;
 	int ret = 0;
-	static struct page *no_pages[] = {NULL};
-	static struct vm_special_mapping vvar_mapping = {
+	static const struct vm_special_mapping vvar_mapping = {
 		.name = "[vvar]",
-		.pages = no_pages,
+		.fault = vvar_fault,
 	};
-	struct pvclock_vsyscall_time_info *pvti;
 
 	if (calculate_addr) {
 		addr = vdso_addr(current->mm->start_stack,
@@ -153,7 +204,8 @@ static int map_vdso(const struct vdso_image *image, bool calculate_addr)
 	vma = _install_special_mapping(mm,
 				       addr,
 				       -image->sym_vvar_start,
-				       VM_READ|VM_MAYREAD,
+				       VM_READ|VM_MAYREAD|VM_IO|VM_DONTDUMP|
+				       VM_PFNMAP,
 				       &vvar_mapping);
 
 	if (IS_ERR(vma)) {
@@ -161,41 +213,6 @@ static int map_vdso(const struct vdso_image *image, bool calculate_addr)
 		goto up_fail;
 	}
 
-	if (image->sym_vvar_page)
-		ret = remap_pfn_range(vma,
-				      text_start + image->sym_vvar_page,
-				      __pa_symbol(&__vvar_page) >> PAGE_SHIFT,
-				      PAGE_SIZE,
-				      PAGE_READONLY);
-
-	if (ret)
-		goto up_fail;
-
-#ifdef CONFIG_HPET_TIMER
-	if (hpet_address && image->sym_hpet_page) {
-		ret = io_remap_pfn_range(vma,
-			text_start + image->sym_hpet_page,
-			hpet_address >> PAGE_SHIFT,
-			PAGE_SIZE,
-			pgprot_noncached(PAGE_READONLY));
-
-		if (ret)
-			goto up_fail;
-	}
-#endif
-
-	pvti = pvclock_pvti_cpu0_va();
-	if (pvti && image->sym_pvclock_page) {
-		ret = remap_pfn_range(vma,
-				      text_start + image->sym_pvclock_page,
-				      __pa(pvti) >> PAGE_SHIFT,
-				      PAGE_SIZE,
-				      PAGE_READONLY);
-
-		if (ret)
-			goto up_fail;
-	}
-
 up_fail:
 	if (ret)
 		current->mm->context.vdso = NULL;
-- 
2.5.0



* [PATCH v3 7/7] x86/vdso: Disallow vvar access to vclock IO for never-used vclocks
  2015-12-30  4:12 [PATCH v3 0/7] mm, x86/vdso: Special IO mapping improvements Andy Lutomirski
                   ` (5 preceding siblings ...)
  2015-12-30  4:12 ` [PATCH v3 6/7] x86,vdso: Use .fault instead of remap_pfn_range for the vvar mapping Andy Lutomirski
@ 2015-12-30  4:12 ` Andy Lutomirski
  2016-01-12 12:04   ` [tip:x86/asm] " tip-bot for Andy Lutomirski
  2016-01-05  0:02 ` [PATCH v3 0/7] mm, x86/vdso: Special IO mapping improvements Kees Cook
  7 siblings, 1 reply; 17+ messages in thread
From: Andy Lutomirski @ 2015-12-30  4:12 UTC (permalink / raw)
  To: x86, Borislav Petkov, Kees Cook
  Cc: linux-kernel, Oleg Nesterov, Andy Lutomirski

It makes me uncomfortable that even modern systems grant every
process direct read access to the HPET.

While fixing this for real without regressing anything is a mess
(unmapping the HPET is tricky because we don't adequately track all
the mappings), we can do almost as well by tracking which vclocks
have ever been used and only allowing pages associated with used
vclocks to be faulted in.

This will cause rogue programs that try to peek at the HPET to get
SIGBUS instead on most systems.

We can't restrict faults to vclock pages that are associated with
the currently selected vclock due to a race: a process could start
to access the HPET for the first time and race against a switch away
from the HPET as the current clocksource.  We can't segfault the
process trying to peek at the HPET in this case, even though the
process isn't going to do anything useful with the data.
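
The mechanism is a one-bit-per-mode mask: update_vsyscall() records
each vclock mode as it becomes current, and vvar_fault() refuses IO
pages for modes that were never recorded.  Condensed from the diff
below (arguments abbreviated):

	/* update_vsyscall(): remember that this mode has been used. */
	WRITE_ONCE(vclocks_used, READ_ONCE(vclocks_used) | (1 << vclock_mode));

	/* vvar_fault(): the HPET page stays unmapped otherwise. */
	if (hpet_address && vclock_was_used(VCLOCK_HPET))
		ret = vm_insert_pfn_prot(vma, addr, pfn, prot);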

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/vdso/vma.c               | 4 ++--
 arch/x86/entry/vsyscall/vsyscall_gtod.c | 9 ++++++++-
 arch/x86/include/asm/clocksource.h      | 9 +++++----
 arch/x86/include/asm/vgtod.h            | 6 ++++++
 4 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index 4b5461ba4f6b..7c912fefe79b 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -130,7 +130,7 @@ static int vvar_fault(const struct vm_special_mapping *sm,
 				    __pa_symbol(&__vvar_page) >> PAGE_SHIFT);
 	} else if (sym_offset == image->sym_hpet_page) {
 #ifdef CONFIG_HPET_TIMER
-		if (hpet_address) {
+		if (hpet_address && vclock_was_used(VCLOCK_HPET)) {
 			ret = vm_insert_pfn_prot(
 				vma,
 				(unsigned long)vmf->virtual_address,
@@ -141,7 +141,7 @@ static int vvar_fault(const struct vm_special_mapping *sm,
 	} else if (sym_offset == image->sym_pvclock_page) {
 		struct pvclock_vsyscall_time_info *pvti =
 			pvclock_pvti_cpu0_va();
-		if (pvti) {
+		if (pvti && vclock_was_used(VCLOCK_PVCLOCK)) {
 			ret = vm_insert_pfn(
 				vma,
 				(unsigned long)vmf->virtual_address,
diff --git a/arch/x86/entry/vsyscall/vsyscall_gtod.c b/arch/x86/entry/vsyscall/vsyscall_gtod.c
index 51e330416995..0fb3a104ac62 100644
--- a/arch/x86/entry/vsyscall/vsyscall_gtod.c
+++ b/arch/x86/entry/vsyscall/vsyscall_gtod.c
@@ -16,6 +16,8 @@
 #include <asm/vgtod.h>
 #include <asm/vvar.h>
 
+int vclocks_used __read_mostly;
+
 DEFINE_VVAR(struct vsyscall_gtod_data, vsyscall_gtod_data);
 
 void update_vsyscall_tz(void)
@@ -26,12 +28,17 @@ void update_vsyscall_tz(void)
 
 void update_vsyscall(struct timekeeper *tk)
 {
+	int vclock_mode = tk->tkr_mono.clock->archdata.vclock_mode;
 	struct vsyscall_gtod_data *vdata = &vsyscall_gtod_data;
 
+	/* Mark the new vclock used. */
+	BUILD_BUG_ON(VCLOCK_MAX >= 32);
+	WRITE_ONCE(vclocks_used, READ_ONCE(vclocks_used) | (1 << vclock_mode));
+
 	gtod_write_begin(vdata);
 
 	/* copy vsyscall data */
-	vdata->vclock_mode	= tk->tkr_mono.clock->archdata.vclock_mode;
+	vdata->vclock_mode	= vclock_mode;
 	vdata->cycle_last	= tk->tkr_mono.cycle_last;
 	vdata->mask		= tk->tkr_mono.mask;
 	vdata->mult		= tk->tkr_mono.mult;
diff --git a/arch/x86/include/asm/clocksource.h b/arch/x86/include/asm/clocksource.h
index eda81dc0f4ae..d194266acb28 100644
--- a/arch/x86/include/asm/clocksource.h
+++ b/arch/x86/include/asm/clocksource.h
@@ -3,10 +3,11 @@
 #ifndef _ASM_X86_CLOCKSOURCE_H
 #define _ASM_X86_CLOCKSOURCE_H
 
-#define VCLOCK_NONE 0  /* No vDSO clock available.	*/
-#define VCLOCK_TSC  1  /* vDSO should use vread_tsc.	*/
-#define VCLOCK_HPET 2  /* vDSO should use vread_hpet.	*/
-#define VCLOCK_PVCLOCK 3 /* vDSO should use vread_pvclock. */
+#define VCLOCK_NONE	0  /* No vDSO clock available.	*/
+#define VCLOCK_TSC	1  /* vDSO should use vread_tsc.	*/
+#define VCLOCK_HPET	2  /* vDSO should use vread_hpet.	*/
+#define VCLOCK_PVCLOCK	3 /* vDSO should use vread_pvclock. */
+#define VCLOCK_MAX	3
 
 struct arch_clocksource_data {
 	int vclock_mode;
diff --git a/arch/x86/include/asm/vgtod.h b/arch/x86/include/asm/vgtod.h
index f556c4843aa1..e728699db774 100644
--- a/arch/x86/include/asm/vgtod.h
+++ b/arch/x86/include/asm/vgtod.h
@@ -37,6 +37,12 @@ struct vsyscall_gtod_data {
 };
 extern struct vsyscall_gtod_data vsyscall_gtod_data;
 
+extern int vclocks_used;
+static inline bool vclock_was_used(int vclock)
+{
+	return READ_ONCE(vclocks_used) & (1 << vclock);
+}
+
 static inline unsigned gtod_read_begin(const struct vsyscall_gtod_data *s)
 {
 	unsigned ret;
-- 
2.5.0



* Re: [PATCH v3 0/7] mm, x86/vdso: Special IO mapping improvements
  2015-12-30  4:12 [PATCH v3 0/7] mm, x86/vdso: Special IO mapping improvements Andy Lutomirski
                   ` (6 preceding siblings ...)
  2015-12-30  4:12 ` [PATCH v3 7/7] x86/vdso: Disallow vvar access to vclock IO for never-used vclocks Andy Lutomirski
@ 2016-01-05  0:02 ` Kees Cook
  7 siblings, 0 replies; 17+ messages in thread
From: Kees Cook @ 2016-01-05  0:02 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: x86, Borislav Petkov, LKML, Oleg Nesterov

On Tue, Dec 29, 2015 at 8:12 PM, Andy Lutomirski <luto@kernel.org> wrote:
> This applies on top of the earlier vdso pvclock series I sent out.
> Once that lands in -tip, this will apply to -tip.
>
> This series cleans up the hack that is our vvar mapping.  We currently
> initialize the vvar mapping as a special mapping vma backed by nothing
> whatsoever and then we abuse remap_pfn_range to populate it.
>
> This cheats the mm core, probably breaks under various evil madvise
> workloads, and prevents handling faults in more interesting ways.
>
> To clean it up, this series:
>
>  - Adds a special mapping .fault operation
>  - Adds a vm_insert_pfn_prot helper
>  - Uses the new .fault infrastructure in x86's vdso and vvar mappings
>  - Hardens the HPET mapping, mitigating an HW attack surface that bothers me
>
> Changes from v2:
>  - Added patch 1, which is needed in -tip to fix the build
>  - Fixed -EBUSY handling in vvar's .fault.
>
> Changes from v1:
>  - Lots of changelog clarification requested by akpm
>  - Minor tweaks to style and comments in the first two patches
>
> Andy Lutomirski (7):
>   x86/vsdo: Fix build on PARAVIRT_CLOCK=y, KVM_GUEST=n
>   mm: Add a vm_special_mapping .fault method
>   mm: Add vm_insert_pfn_prot
>   x86/vdso: Track each mm's loaded vdso image as well as its base
>   x86,vdso: Use .fault for the vdso text mapping
>   x86,vdso: Use .fault instead of remap_pfn_range for the vvar mapping
>   x86/vdso: Disallow vvar access to vclock IO for never-used vclocks
>
>  arch/x86/entry/vdso/vdso2c.h            |   7 --
>  arch/x86/entry/vdso/vma.c               | 124 ++++++++++++++++++++------------
>  arch/x86/entry/vsyscall/vsyscall_gtod.c |   9 ++-
>  arch/x86/include/asm/clocksource.h      |   9 +--
>  arch/x86/include/asm/mmu.h              |   3 +-
>  arch/x86/include/asm/pvclock.h          |   2 +-
>  arch/x86/include/asm/vdso.h             |   3 -
>  arch/x86/include/asm/vgtod.h            |   6 ++
>  include/linux/mm.h                      |   2 +
>  include/linux/mm_types.h                |  22 +++++-
>  mm/memory.c                             |  25 ++++++-
>  mm/mmap.c                               |  13 ++--
>  12 files changed, 152 insertions(+), 73 deletions(-)

Reviewed-by: Kees Cook <keescook@chromium.org>

I think this makes things much more readable. :)

-Kees

-- 
Kees Cook
Chrome OS & Brillo Security


* Re: [PATCH v3 1/7] x86/vsdo: Fix build on PARAVIRT_CLOCK=y, KVM_GUEST=n
  2015-12-30  4:12 ` [PATCH v3 1/7] x86/vsdo: Fix build on PARAVIRT_CLOCK=y, KVM_GUEST=n Andy Lutomirski
@ 2016-01-05 19:21   ` Borislav Petkov
  2016-01-06  9:54   ` [tip:x86/asm] x86/vsdo: Fix build on PARAVIRT_CLOCK=y, KVM_GUEST=n tip-bot for Andy Lutomirski
  1 sibling, 0 replies; 17+ messages in thread
From: Borislav Petkov @ 2016-01-05 19:21 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: x86, Kees Cook, linux-kernel, Oleg Nesterov

On Tue, Dec 29, 2015 at 08:12:18PM -0800, Andy Lutomirski wrote:
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> ---
>  arch/x86/include/asm/pvclock.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/pvclock.h b/arch/x86/include/asm/pvclock.h
> index 66df22b2e0c9..fdcc04020636 100644
> --- a/arch/x86/include/asm/pvclock.h
> +++ b/arch/x86/include/asm/pvclock.h
> @@ -4,7 +4,7 @@
>  #include <linux/clocksource.h>
>  #include <asm/pvclock-abi.h>
>  
> -#ifdef CONFIG_PARAVIRT_CLOCK
> +#ifdef CONFIG_KVM_GUEST
>  extern struct pvclock_vsyscall_time_info *pvclock_pvti_cpu0_va(void);
>  #else
>  static inline struct pvclock_vsyscall_time_info *pvclock_pvti_cpu0_va(void)
> -- 

Tested-by: Borislav Petkov <bp@suse.de>

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.


* [tip:x86/asm] x86/vsdo: Fix build on PARAVIRT_CLOCK=y, KVM_GUEST=n
  2015-12-30  4:12 ` [PATCH v3 1/7] x86/vsdo: Fix build on PARAVIRT_CLOCK=y, KVM_GUEST=n Andy Lutomirski
  2016-01-05 19:21   ` Borislav Petkov
@ 2016-01-06  9:54   ` tip-bot for Andy Lutomirski
  1 sibling, 0 replies; 17+ messages in thread
From: tip-bot for Andy Lutomirski @ 2016-01-06  9:54 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: tglx, oleg, bp, keescook, luto, hpa, mingo, linux-kernel

Commit-ID:  8705d603edd49f1cff165cd3b7998f4c7f098d27
Gitweb:     http://git.kernel.org/tip/8705d603edd49f1cff165cd3b7998f4c7f098d27
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Tue, 29 Dec 2015 20:12:18 -0800
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Wed, 6 Jan 2016 10:49:53 +0100

x86/vsdo: Fix build on PARAVIRT_CLOCK=y, KVM_GUEST=n

arch/x86/built-in.o: In function `arch_setup_additional_pages':
 (.text+0x587): undefined reference to `pvclock_pvti_cpu0_va'

KVM_GUEST selects PARAVIRT_CLOCK, so we can make pvclock_pvti_cpu0_va depend
on KVM_GUEST.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Tested-by: Borislav Petkov <bp@alien8.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/444d38a9bcba832685740ea1401b569861d09a72.1451446564.git.luto@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/pvclock.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/pvclock.h b/arch/x86/include/asm/pvclock.h
index 66df22b..fdcc040 100644
--- a/arch/x86/include/asm/pvclock.h
+++ b/arch/x86/include/asm/pvclock.h
@@ -4,7 +4,7 @@
 #include <linux/clocksource.h>
 #include <asm/pvclock-abi.h>
 
-#ifdef CONFIG_PARAVIRT_CLOCK
+#ifdef CONFIG_KVM_GUEST
 extern struct pvclock_vsyscall_time_info *pvclock_pvti_cpu0_va(void);
 #else
 static inline struct pvclock_vsyscall_time_info *pvclock_pvti_cpu0_va(void)


* [tip:x86/asm] mm: Add a vm_special_mapping.fault() method
  2015-12-30  4:12 ` [PATCH v3 2/7] mm: Add a vm_special_mapping .fault method Andy Lutomirski
@ 2016-01-12 12:02   ` tip-bot for Andy Lutomirski
  0 siblings, 0 replies; 17+ messages in thread
From: tip-bot for Andy Lutomirski @ 2016-01-12 12:02 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: luto, tglx, mingo, luto, keescook, hpa, oleg, fenghua.yu,
	torvalds, quentin.casasnovas, linux-kernel, peterz, bp,
	dave.hansen

Commit-ID:  f872f5400cc01373d8e29d9c7a5296ccfaf4ccf3
Gitweb:     http://git.kernel.org/tip/f872f5400cc01373d8e29d9c7a5296ccfaf4ccf3
Author:     Andy Lutomirski <luto@amacapital.net>
AuthorDate: Tue, 29 Dec 2015 20:12:19 -0800
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 12 Jan 2016 11:59:34 +0100

mm: Add a vm_special_mapping.fault() method

Requiring special mappings to give a list of struct pages is
inflexible: it prevents sane use of IO memory in a special
mapping, it's inefficient (it requires arch code to initialize a
list of struct pages, and it requires the mm core to walk the
entire list just to figure out how long it is), and it prevents
arch code from doing anything fancy when a special mapping fault
occurs.

Add a .fault method as an alternative to filling in a .pages
array.

Looks-OK-to: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/a26d1677c0bc7e774c33f469451a78ca31e9e6af.1451446564.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 include/linux/mm_types.h | 22 +++++++++++++++++++---
 mm/mmap.c                | 13 +++++++++----
 2 files changed, 28 insertions(+), 7 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index f8d1492..c88e48a 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -568,10 +568,26 @@ static inline void clear_tlb_flush_pending(struct mm_struct *mm)
 }
 #endif
 
-struct vm_special_mapping
-{
-	const char *name;
+struct vm_fault;
+
+struct vm_special_mapping {
+	const char *name;	/* The name, e.g. "[vdso]". */
+
+	/*
+	 * If .fault is not provided, this points to a
+	 * NULL-terminated array of pages that back the special mapping.
+	 *
+	 * This must not be NULL unless .fault is provided.
+	 */
 	struct page **pages;
+
+	/*
+	 * If non-NULL, then this is called to resolve page faults
+	 * on the special mapping.  If used, .pages is not checked.
+	 */
+	int (*fault)(const struct vm_special_mapping *sm,
+		     struct vm_area_struct *vma,
+		     struct vm_fault *vmf);
 };
 
 enum tlb_flush_reason {
diff --git a/mm/mmap.c b/mm/mmap.c
index 2ce04a6..f717453 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -3030,11 +3030,16 @@ static int special_mapping_fault(struct vm_area_struct *vma,
 	pgoff_t pgoff;
 	struct page **pages;
 
-	if (vma->vm_ops == &legacy_special_mapping_vmops)
+	if (vma->vm_ops == &legacy_special_mapping_vmops) {
 		pages = vma->vm_private_data;
-	else
-		pages = ((struct vm_special_mapping *)vma->vm_private_data)->
-			pages;
+	} else {
+		struct vm_special_mapping *sm = vma->vm_private_data;
+
+		if (sm->fault)
+			return sm->fault(sm, vma, vmf);
+
+		pages = sm->pages;
+	}
 
 	for (pgoff = vmf->pgoff; pgoff && *pages; ++pages)
 		pgoff--;


* [tip:x86/asm] mm: Add vm_insert_pfn_prot()
  2015-12-30  4:12 ` [PATCH v3 3/7] mm: Add vm_insert_pfn_prot Andy Lutomirski
@ 2016-01-12 12:02   ` tip-bot for Andy Lutomirski
  0 siblings, 0 replies; 17+ messages in thread
From: tip-bot for Andy Lutomirski @ 2016-01-12 12:02 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: oleg, mingo, linux-kernel, tglx, keescook, luto, dave.hansen,
	fenghua.yu, luto, peterz, quentin.casasnovas, torvalds, bp, akpm,
	hpa

Commit-ID:  1745cbc5d0dee0749a6bc0ea8e872c5db0074061
Gitweb:     http://git.kernel.org/tip/1745cbc5d0dee0749a6bc0ea8e872c5db0074061
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Tue, 29 Dec 2015 20:12:20 -0800
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 12 Jan 2016 11:59:34 +0100

mm: Add vm_insert_pfn_prot()

The x86 vvar vma contains pages with differing cacheability
flags.  x86 currently implements this by manually inserting all
the ptes using (io_)remap_pfn_range when the vma is set up.

x86 wants to move to using .fault with VM_FAULT_NOPAGE to set up
the mappings as needed.  The correct API to use to insert a pfn
in .fault is vm_insert_pfn(), but vm_insert_pfn() can't override the
vma's cache mode, and the HPET page in particular needs to be
uncached despite the fact that the rest of the VMA is cached.

Add vm_insert_pfn_prot() to support varying cacheability within
the same non-COW VMA in a more sane manner.

x86 could alternatively use multiple VMAs, but that's messy,
would break CRIU, and would create unnecessary VMAs that would
waste memory.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/d2938d1eb37be7a5e4f86182db646551f11e45aa.1451446564.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 include/linux/mm.h |  2 ++
 mm/memory.c        | 25 +++++++++++++++++++++++--
 2 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 00bad77..87ef1d7 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2080,6 +2080,8 @@ int remap_pfn_range(struct vm_area_struct *, unsigned long addr,
 int vm_insert_page(struct vm_area_struct *, unsigned long addr, struct page *);
 int vm_insert_pfn(struct vm_area_struct *vma, unsigned long addr,
 			unsigned long pfn);
+int vm_insert_pfn_prot(struct vm_area_struct *vma, unsigned long addr,
+			unsigned long pfn, pgprot_t pgprot);
 int vm_insert_mixed(struct vm_area_struct *vma, unsigned long addr,
 			unsigned long pfn);
 int vm_iomap_memory(struct vm_area_struct *vma, phys_addr_t start, unsigned long len);
diff --git a/mm/memory.c b/mm/memory.c
index c387430..a29f0b9 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1564,8 +1564,29 @@ out:
 int vm_insert_pfn(struct vm_area_struct *vma, unsigned long addr,
 			unsigned long pfn)
 {
+	return vm_insert_pfn_prot(vma, addr, pfn, vma->vm_page_prot);
+}
+EXPORT_SYMBOL(vm_insert_pfn);
+
+/**
+ * vm_insert_pfn_prot - insert single pfn into user vma with specified pgprot
+ * @vma: user vma to map to
+ * @addr: target user address of this page
+ * @pfn: source kernel pfn
+ * @pgprot: pgprot flags for the inserted page
+ *
+ * This is exactly like vm_insert_pfn, except that it allows drivers
+ * to override pgprot on a per-page basis.
+ *
+ * This only makes sense for IO mappings, and it makes no sense for
+ * cow mappings.  In general, using multiple vmas is preferable;
+ * vm_insert_pfn_prot should only be used if using multiple VMAs is
+ * impractical.
+ */
+int vm_insert_pfn_prot(struct vm_area_struct *vma, unsigned long addr,
+			unsigned long pfn, pgprot_t pgprot)
+{
 	int ret;
-	pgprot_t pgprot = vma->vm_page_prot;
 	/*
 	 * Technically, architectures with pte_special can avoid all these
 	 * restrictions (same for remap_pfn_range).  However we would like
@@ -1587,7 +1608,7 @@ int vm_insert_pfn(struct vm_area_struct *vma, unsigned long addr,
 
 	return ret;
 }
-EXPORT_SYMBOL(vm_insert_pfn);
+EXPORT_SYMBOL(vm_insert_pfn_prot);
 
 int vm_insert_mixed(struct vm_area_struct *vma, unsigned long addr,
 			unsigned long pfn)


* [tip:x86/asm] x86/vdso: Track each mm's loaded vDSO image as well as its base
  2015-12-30  4:12 ` [PATCH v3 4/7] x86/vdso: Track each mm's loaded vdso image as well as its base Andy Lutomirski
@ 2016-01-12 12:03   ` " tip-bot for Andy Lutomirski
  0 siblings, 0 replies; 17+ messages in thread
From: tip-bot for Andy Lutomirski @ 2016-01-12 12:03 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: fenghua.yu, luto, quentin.casasnovas, bp, linux-kernel, luto,
	oleg, torvalds, dave.hansen, keescook, mingo, tglx, peterz, hpa

Commit-ID:  352b78c62f27b356b182008acd3117f3ee03ffd2
Gitweb:     http://git.kernel.org/tip/352b78c62f27b356b182008acd3117f3ee03ffd2
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Tue, 29 Dec 2015 20:12:21 -0800
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 12 Jan 2016 11:59:34 +0100

x86/vdso: Track each mm's loaded vDSO image as well as its base

As we start to do more intelligent things with the vDSO at
runtime (as opposed to just at mm initialization time), we'll
need to know which vDSO is in use.

In principle, we could guess based on the mm type, but that's
over-complicated and error-prone.  Instead, just track it in the
mmu context.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/c99ac48681bad709ca7ad5ee899d9042a3af6b00.1451446564.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/entry/vdso/vma.c  | 1 +
 arch/x86/include/asm/mmu.h | 3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index b8f69e2..80b0210 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -121,6 +121,7 @@ static int map_vdso(const struct vdso_image *image, bool calculate_addr)
 
 	text_start = addr - image->sym_vvar_start;
 	current->mm->context.vdso = (void __user *)text_start;
+	current->mm->context.vdso_image = image;
 
 	/*
 	 * MAYWRITE to allow gdb to COW and set breakpoints
diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
index 55234d5..1ea0bae 100644
--- a/arch/x86/include/asm/mmu.h
+++ b/arch/x86/include/asm/mmu.h
@@ -19,7 +19,8 @@ typedef struct {
 #endif
 
 	struct mutex lock;
-	void __user *vdso;
+	void __user *vdso;			/* vdso base address */
+	const struct vdso_image *vdso_image;	/* vdso image in use */
 
 	atomic_t perf_rdpmc_allowed;	/* nonzero if rdpmc is allowed */
 } mm_context_t;


* [tip:x86/asm] x86/vdso: Use .fault for the vDSO text mapping
  2015-12-30  4:12 ` [PATCH v3 5/7] x86,vdso: Use .fault for the vdso text mapping Andy Lutomirski
@ 2016-01-12 12:03   ` " tip-bot for Andy Lutomirski
  0 siblings, 0 replies; 17+ messages in thread
From: tip-bot for Andy Lutomirski @ 2016-01-12 12:03 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, dave.hansen, quentin.casasnovas, fenghua.yu, mingo,
	oleg, torvalds, tglx, hpa, bp, luto, keescook, peterz, luto

Commit-ID:  05ef76b20fc4297b0d3f8a956f1c809a8a1b3f1d
Gitweb:     http://git.kernel.org/tip/05ef76b20fc4297b0d3f8a956f1c809a8a1b3f1d
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Tue, 29 Dec 2015 20:12:22 -0800
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 12 Jan 2016 11:59:34 +0100

x86/vdso: Use .fault for the vDSO text mapping

The old scheme for mapping the vDSO text is rather complicated.
vdso2c generates a struct vm_special_mapping and a blank .pages
array of the correct size for each vdso image.  Init code in
vdso/vma.c populates the .pages array for each vDSO image, and
the mapping code selects the appropriate struct
vm_special_mapping.

With .fault, we can use a less roundabout approach: vdso_fault()
just returns the appropriate page for the selected vDSO image.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/f886954c186bafd74e1b967c8931d852ae199aa2.1451446564.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/entry/vdso/vdso2c.h |  7 -------
 arch/x86/entry/vdso/vma.c    | 26 +++++++++++++++++++-------
 arch/x86/include/asm/vdso.h  |  3 ---
 3 files changed, 19 insertions(+), 17 deletions(-)

diff --git a/arch/x86/entry/vdso/vdso2c.h b/arch/x86/entry/vdso/vdso2c.h
index 0224987..abe961c 100644
--- a/arch/x86/entry/vdso/vdso2c.h
+++ b/arch/x86/entry/vdso/vdso2c.h
@@ -150,16 +150,9 @@ static void BITSFUNC(go)(void *raw_addr, size_t raw_len,
 	}
 	fprintf(outfile, "\n};\n\n");
 
-	fprintf(outfile, "static struct page *pages[%lu];\n\n",
-		mapping_size / 4096);
-
 	fprintf(outfile, "const struct vdso_image %s = {\n", name);
 	fprintf(outfile, "\t.data = raw_data,\n");
 	fprintf(outfile, "\t.size = %lu,\n", mapping_size);
-	fprintf(outfile, "\t.text_mapping = {\n");
-	fprintf(outfile, "\t\t.name = \"[vdso]\",\n");
-	fprintf(outfile, "\t\t.pages = pages,\n");
-	fprintf(outfile, "\t},\n");
 	if (alt_sec) {
 		fprintf(outfile, "\t.alt = %lu,\n",
 			(unsigned long)GET_LE(&alt_sec->sh_offset));
diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index 80b0210..eb50d7c 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -27,13 +27,7 @@ unsigned int __read_mostly vdso64_enabled = 1;
 
 void __init init_vdso_image(const struct vdso_image *image)
 {
-	int i;
-	int npages = (image->size) / PAGE_SIZE;
-
 	BUG_ON(image->size % PAGE_SIZE != 0);
-	for (i = 0; i < npages; i++)
-		image->text_mapping.pages[i] =
-			virt_to_page(image->data + i*PAGE_SIZE);
 
 	apply_alternatives((struct alt_instr *)(image->data + image->alt),
 			   (struct alt_instr *)(image->data + image->alt +
@@ -90,6 +84,24 @@ static unsigned long vdso_addr(unsigned long start, unsigned len)
 #endif
 }
 
+static int vdso_fault(const struct vm_special_mapping *sm,
+		      struct vm_area_struct *vma, struct vm_fault *vmf)
+{
+	const struct vdso_image *image = vma->vm_mm->context.vdso_image;
+
+	if (!image || (vmf->pgoff << PAGE_SHIFT) >= image->size)
+		return VM_FAULT_SIGBUS;
+
+	vmf->page = virt_to_page(image->data + (vmf->pgoff << PAGE_SHIFT));
+	get_page(vmf->page);
+	return 0;
+}
+
+static const struct vm_special_mapping text_mapping = {
+	.name = "[vdso]",
+	.fault = vdso_fault,
+};
+
 static int map_vdso(const struct vdso_image *image, bool calculate_addr)
 {
 	struct mm_struct *mm = current->mm;
@@ -131,7 +143,7 @@ static int map_vdso(const struct vdso_image *image, bool calculate_addr)
 				       image->size,
 				       VM_READ|VM_EXEC|
 				       VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC,
-				       &image->text_mapping);
+				       &text_mapping);
 
 	if (IS_ERR(vma)) {
 		ret = PTR_ERR(vma);
diff --git a/arch/x86/include/asm/vdso.h b/arch/x86/include/asm/vdso.h
index deabaf9..43dc55b 100644
--- a/arch/x86/include/asm/vdso.h
+++ b/arch/x86/include/asm/vdso.h
@@ -13,9 +13,6 @@ struct vdso_image {
 	void *data;
 	unsigned long size;   /* Always a multiple of PAGE_SIZE */
 
-	/* text_mapping.pages is big enough for data/size page pointers */
-	struct vm_special_mapping text_mapping;
-
 	unsigned long alt, alt_len;
 
 	long sym_vvar_start;  /* Negative offset to the vvar area */


* [tip:x86/asm] x86/vdso: Use ->fault() instead of remap_pfn_range() for the vvar mapping
  2015-12-30  4:12 ` [PATCH v3 6/7] x86,vdso: Use .fault instead of remap_pfn_range for the vvar mapping Andy Lutomirski
@ 2016-01-12 12:04   ` " tip-bot for Andy Lutomirski
  0 siblings, 0 replies; 17+ messages in thread
From: tip-bot for Andy Lutomirski @ 2016-01-12 12:04 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: hpa, linux-kernel, dave.hansen, bp, luto, keescook, tglx, luto,
	peterz, quentin.casasnovas, mingo, oleg, torvalds, fenghua.yu

Commit-ID:  a48a7042613eb1524d18b7b1ed7d3a6b611fd21f
Gitweb:     http://git.kernel.org/tip/a48a7042613eb1524d18b7b1ed7d3a6b611fd21f
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Tue, 29 Dec 2015 20:12:23 -0800
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 12 Jan 2016 11:59:35 +0100

x86/vdso: Use ->fault() instead of remap_pfn_range() for the vvar mapping

This is IMO much less ugly, and it also opens the door to
disallowing unprivileged userspace HPET access on systems with
usable TSCs.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/c19c2909e5ee3c3d8742f916586676bb7c40345f.1451446564.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/entry/vdso/vma.c | 97 ++++++++++++++++++++++++++++-------------------
 1 file changed, 57 insertions(+), 40 deletions(-)

diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index eb50d7c..4b5461b 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -102,18 +102,69 @@ static const struct vm_special_mapping text_mapping = {
 	.fault = vdso_fault,
 };
 
+static int vvar_fault(const struct vm_special_mapping *sm,
+		      struct vm_area_struct *vma, struct vm_fault *vmf)
+{
+	const struct vdso_image *image = vma->vm_mm->context.vdso_image;
+	long sym_offset;
+	int ret = -EFAULT;
+
+	if (!image)
+		return VM_FAULT_SIGBUS;
+
+	sym_offset = (long)(vmf->pgoff << PAGE_SHIFT) +
+		image->sym_vvar_start;
+
+	/*
+	 * Sanity check: a symbol offset of zero means that the page
+	 * does not exist for this vdso image, not that the page is at
+	 * offset zero relative to the text mapping.  This should be
+	 * impossible here, because sym_offset should only be zero for
+	 * the page past the end of the vvar mapping.
+	 */
+	if (sym_offset == 0)
+		return VM_FAULT_SIGBUS;
+
+	if (sym_offset == image->sym_vvar_page) {
+		ret = vm_insert_pfn(vma, (unsigned long)vmf->virtual_address,
+				    __pa_symbol(&__vvar_page) >> PAGE_SHIFT);
+	} else if (sym_offset == image->sym_hpet_page) {
+#ifdef CONFIG_HPET_TIMER
+		if (hpet_address) {
+			ret = vm_insert_pfn_prot(
+				vma,
+				(unsigned long)vmf->virtual_address,
+				hpet_address >> PAGE_SHIFT,
+				pgprot_noncached(PAGE_READONLY));
+		}
+#endif
+	} else if (sym_offset == image->sym_pvclock_page) {
+		struct pvclock_vsyscall_time_info *pvti =
+			pvclock_pvti_cpu0_va();
+		if (pvti) {
+			ret = vm_insert_pfn(
+				vma,
+				(unsigned long)vmf->virtual_address,
+				__pa(pvti) >> PAGE_SHIFT);
+		}
+	}
+
+	if (ret == 0 || ret == -EBUSY)
+		return VM_FAULT_NOPAGE;
+
+	return VM_FAULT_SIGBUS;
+}
+
 static int map_vdso(const struct vdso_image *image, bool calculate_addr)
 {
 	struct mm_struct *mm = current->mm;
 	struct vm_area_struct *vma;
 	unsigned long addr, text_start;
 	int ret = 0;
-	static struct page *no_pages[] = {NULL};
-	static struct vm_special_mapping vvar_mapping = {
+	static const struct vm_special_mapping vvar_mapping = {
 		.name = "[vvar]",
-		.pages = no_pages,
+		.fault = vvar_fault,
 	};
-	struct pvclock_vsyscall_time_info *pvti;
 
 	if (calculate_addr) {
 		addr = vdso_addr(current->mm->start_stack,
@@ -153,7 +204,8 @@ static int map_vdso(const struct vdso_image *image, bool calculate_addr)
 	vma = _install_special_mapping(mm,
 				       addr,
 				       -image->sym_vvar_start,
-				       VM_READ|VM_MAYREAD,
+				       VM_READ|VM_MAYREAD|VM_IO|VM_DONTDUMP|
+				       VM_PFNMAP,
 				       &vvar_mapping);
 
 	if (IS_ERR(vma)) {
@@ -161,41 +213,6 @@ static int map_vdso(const struct vdso_image *image, bool calculate_addr)
 		goto up_fail;
 	}
 
-	if (image->sym_vvar_page)
-		ret = remap_pfn_range(vma,
-				      text_start + image->sym_vvar_page,
-				      __pa_symbol(&__vvar_page) >> PAGE_SHIFT,
-				      PAGE_SIZE,
-				      PAGE_READONLY);
-
-	if (ret)
-		goto up_fail;
-
-#ifdef CONFIG_HPET_TIMER
-	if (hpet_address && image->sym_hpet_page) {
-		ret = io_remap_pfn_range(vma,
-			text_start + image->sym_hpet_page,
-			hpet_address >> PAGE_SHIFT,
-			PAGE_SIZE,
-			pgprot_noncached(PAGE_READONLY));
-
-		if (ret)
-			goto up_fail;
-	}
-#endif
-
-	pvti = pvclock_pvti_cpu0_va();
-	if (pvti && image->sym_pvclock_page) {
-		ret = remap_pfn_range(vma,
-				      text_start + image->sym_pvclock_page,
-				      __pa(pvti) >> PAGE_SHIFT,
-				      PAGE_SIZE,
-				      PAGE_READONLY);
-
-		if (ret)
-			goto up_fail;
-	}
-
 up_fail:
 	if (ret)
 		current->mm->context.vdso = NULL;


* [tip:x86/asm] x86/vdso: Disallow vvar access to vclock IO for never-used vclocks
  2015-12-30  4:12 ` [PATCH v3 7/7] x86/vdso: Disallow vvar access to vclock IO for never-used vclocks Andy Lutomirski
@ 2016-01-12 12:04   ` " tip-bot for Andy Lutomirski
  0 siblings, 0 replies; 17+ messages in thread
From: tip-bot for Andy Lutomirski @ 2016-01-12 12:04 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: hpa, oleg, dave.hansen, quentin.casasnovas, peterz, keescook,
	mingo, luto, fenghua.yu, linux-kernel, tglx, luto, bp, torvalds

Commit-ID:  bd902c536298830e4d126dcf6491b46d3f1bf96e
Gitweb:     http://git.kernel.org/tip/bd902c536298830e4d126dcf6491b46d3f1bf96e
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Tue, 29 Dec 2015 20:12:24 -0800
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 12 Jan 2016 11:59:35 +0100

x86/vdso: Disallow vvar access to vclock IO for never-used vclocks

It makes me uncomfortable that even modern systems grant every
process direct read access to the HPET.

While fixing this for real without regressing anything is a mess
(unmapping the HPET is tricky because we don't adequately track
all the mappings), we can do almost as well by tracking which
vclocks have ever been used and only allowing pages associated
with used vclocks to be faulted in.

This will cause rogue programs that try to peek at the HPET to
get SIGBUS instead on most systems.

We can't restrict faults to vclock pages that are associated
with the currently selected vclock due to a race: a process
could start to access the HPET for the first time and race
against a switch away from the HPET as the current clocksource.
We can't segfault the process trying to peek at the HPET in this
case, even though the process isn't going to do anything useful
with the data.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/e79d06295625c02512277737ab55085a498ac5d8.1451446564.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/entry/vdso/vma.c               | 4 ++--
 arch/x86/entry/vsyscall/vsyscall_gtod.c | 9 ++++++++-
 arch/x86/include/asm/clocksource.h      | 9 +++++----
 arch/x86/include/asm/vgtod.h            | 6 ++++++
 4 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index 4b5461b..7c912fe 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -130,7 +130,7 @@ static int vvar_fault(const struct vm_special_mapping *sm,
 				    __pa_symbol(&__vvar_page) >> PAGE_SHIFT);
 	} else if (sym_offset == image->sym_hpet_page) {
 #ifdef CONFIG_HPET_TIMER
-		if (hpet_address) {
+		if (hpet_address && vclock_was_used(VCLOCK_HPET)) {
 			ret = vm_insert_pfn_prot(
 				vma,
 				(unsigned long)vmf->virtual_address,
@@ -141,7 +141,7 @@ static int vvar_fault(const struct vm_special_mapping *sm,
 	} else if (sym_offset == image->sym_pvclock_page) {
 		struct pvclock_vsyscall_time_info *pvti =
 			pvclock_pvti_cpu0_va();
-		if (pvti) {
+		if (pvti && vclock_was_used(VCLOCK_PVCLOCK)) {
 			ret = vm_insert_pfn(
 				vma,
 				(unsigned long)vmf->virtual_address,
diff --git a/arch/x86/entry/vsyscall/vsyscall_gtod.c b/arch/x86/entry/vsyscall/vsyscall_gtod.c
index 51e3304..0fb3a10 100644
--- a/arch/x86/entry/vsyscall/vsyscall_gtod.c
+++ b/arch/x86/entry/vsyscall/vsyscall_gtod.c
@@ -16,6 +16,8 @@
 #include <asm/vgtod.h>
 #include <asm/vvar.h>
 
+int vclocks_used __read_mostly;
+
 DEFINE_VVAR(struct vsyscall_gtod_data, vsyscall_gtod_data);
 
 void update_vsyscall_tz(void)
@@ -26,12 +28,17 @@ void update_vsyscall_tz(void)
 
 void update_vsyscall(struct timekeeper *tk)
 {
+	int vclock_mode = tk->tkr_mono.clock->archdata.vclock_mode;
 	struct vsyscall_gtod_data *vdata = &vsyscall_gtod_data;
 
+	/* Mark the new vclock used. */
+	BUILD_BUG_ON(VCLOCK_MAX >= 32);
+	WRITE_ONCE(vclocks_used, READ_ONCE(vclocks_used) | (1 << vclock_mode));
+
 	gtod_write_begin(vdata);
 
 	/* copy vsyscall data */
-	vdata->vclock_mode	= tk->tkr_mono.clock->archdata.vclock_mode;
+	vdata->vclock_mode	= vclock_mode;
 	vdata->cycle_last	= tk->tkr_mono.cycle_last;
 	vdata->mask		= tk->tkr_mono.mask;
 	vdata->mult		= tk->tkr_mono.mult;
diff --git a/arch/x86/include/asm/clocksource.h b/arch/x86/include/asm/clocksource.h
index eda81dc..d194266 100644
--- a/arch/x86/include/asm/clocksource.h
+++ b/arch/x86/include/asm/clocksource.h
@@ -3,10 +3,11 @@
 #ifndef _ASM_X86_CLOCKSOURCE_H
 #define _ASM_X86_CLOCKSOURCE_H
 
-#define VCLOCK_NONE 0  /* No vDSO clock available.	*/
-#define VCLOCK_TSC  1  /* vDSO should use vread_tsc.	*/
-#define VCLOCK_HPET 2  /* vDSO should use vread_hpet.	*/
-#define VCLOCK_PVCLOCK 3 /* vDSO should use vread_pvclock. */
+#define VCLOCK_NONE	0  /* No vDSO clock available.	*/
+#define VCLOCK_TSC	1  /* vDSO should use vread_tsc.	*/
+#define VCLOCK_HPET	2  /* vDSO should use vread_hpet.	*/
+#define VCLOCK_PVCLOCK	3 /* vDSO should use vread_pvclock. */
+#define VCLOCK_MAX	3
 
 struct arch_clocksource_data {
 	int vclock_mode;
diff --git a/arch/x86/include/asm/vgtod.h b/arch/x86/include/asm/vgtod.h
index f556c48..e728699 100644
--- a/arch/x86/include/asm/vgtod.h
+++ b/arch/x86/include/asm/vgtod.h
@@ -37,6 +37,12 @@ struct vsyscall_gtod_data {
 };
 extern struct vsyscall_gtod_data vsyscall_gtod_data;
 
+extern int vclocks_used;
+static inline bool vclock_was_used(int vclock)
+{
+	return READ_ONCE(vclocks_used) & (1 << vclock);
+}
+
 static inline unsigned gtod_read_begin(const struct vsyscall_gtod_data *s)
 {
 	unsigned ret;

^ permalink raw reply	[flat|nested] 17+ messages in thread
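
The user-visible effect of this commit can be probed from userspace.
Here is a hedged sketch — which page of the [vvar] region backs the
HPET (if any) is vDSO-image-specific, so this simply touches every
page and the meaning of each offset is illustrative only:

	#include <setjmp.h>
	#include <signal.h>
	#include <stdio.h>
	#include <string.h>

	static sigjmp_buf env;

	static void on_sigbus(int sig)
	{
		siglongjmp(env, 1);
	}

	int main(void)
	{
		unsigned long start = 0, end = 0, p;
		char line[256];
		FILE *f = fopen("/proc/self/maps", "r");

		/* Locate this process's [vvar] region. */
		while (f && fgets(line, sizeof(line), f)) {
			if (strstr(line, "[vvar]")) {
				sscanf(line, "%lx-%lx", &start, &end);
				break;
			}
		}
		if (f)
			fclose(f);
		if (!start)
			return 1;

		signal(SIGBUS, on_sigbus);

		for (p = start; p < end; p += 4096) {	/* x86 page size */
			if (sigsetjmp(env, 1) == 0)
				printf("%#lx: readable (0x%x)\n", p,
				       *(volatile unsigned char *)p);
			else
				printf("%#lx: SIGBUS\n", p);
		}
		return 0;
	}

On a TSC-only system with this commit applied, the HPET page (and the
pvclock page, if the image has one) should report SIGBUS rather than
exposing clock hardware to every process.
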

end of thread, back to index

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-12-30  4:12 [PATCH v3 0/7] mm, x86/vdso: Special IO mapping improvements Andy Lutomirski
2015-12-30  4:12 ` [PATCH v3 1/7] x86/vsdo: Fix build on PARAVIRT_CLOCK=y, KVM_GUEST=n Andy Lutomirski
2016-01-05 19:21   ` Borislav Petkov
2016-01-06  9:54   ` [tip:x86/asm] x86/vsdo: Fix build on PARAVIRT_CLOCK=y, KVM_GUEST=n tip-bot for Andy Lutomirski
2015-12-30  4:12 ` [PATCH v3 2/7] mm: Add a vm_special_mapping .fault method Andy Lutomirski
2016-01-12 12:02   ` [tip:x86/asm] mm: Add a vm_special_mapping.fault() method tip-bot for Andy Lutomirski
2015-12-30  4:12 ` [PATCH v3 3/7] mm: Add vm_insert_pfn_prot Andy Lutomirski
2016-01-12 12:02   ` [tip:x86/asm] mm: Add vm_insert_pfn_prot() tip-bot for Andy Lutomirski
2015-12-30  4:12 ` [PATCH v3 4/7] x86/vdso: Track each mm's loaded vdso image as well as its base Andy Lutomirski
2016-01-12 12:03   ` [tip:x86/asm] x86/vdso: Track each mm's loaded vDSO " as well as its base tip-bot for Andy Lutomirski
2015-12-30  4:12 ` [PATCH v3 5/7] x86,vdso: Use .fault for the vdso text mapping Andy Lutomirski
2016-01-12 12:03   ` [tip:x86/asm] x86/vdso: Use .fault for the vDSO " tip-bot for Andy Lutomirski
2015-12-30  4:12 ` [PATCH v3 6/7] x86,vdso: Use .fault instead of remap_pfn_range for the vvar mapping Andy Lutomirski
2016-01-12 12:04   ` [tip:x86/asm] x86/vdso: Use ->fault() instead of remap_pfn_range() " tip-bot for Andy Lutomirski
2015-12-30  4:12 ` [PATCH v3 7/7] x86/vdso: Disallow vvar access to vclock IO for never-used vclocks Andy Lutomirski
2016-01-12 12:04   ` [tip:x86/asm] " tip-bot for Andy Lutomirski
2016-01-05  0:02 ` [PATCH v3 0/7] mm, x86/vdso: Special IO mapping improvements Kees Cook
