linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/5] [RFC] rework /dev/mem code vs. highmem and DEBUG_VIRTUAL
@ 2013-04-10 23:32 Dave Hansen
  2013-04-10 23:32 ` [PATCH 1/5] clean up checks against "high_memory" variable Dave Hansen
                   ` (4 more replies)
  0 siblings, 5 replies; 12+ messages in thread
From: Dave Hansen @ 2013-04-10 23:32 UTC (permalink / raw)
  To: bp; +Cc: hpa, linux-kernel, x86, Dave Hansen

/dev/[k]mem are a bit of a mess.  They've probably been subtly
broken for a long time on highmem or when the kernel's page
tables have been munged in other ways.

This has only been lightly tested on x86.  Needs some review
and some wider testing before getting applied anywhere.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/5] clean up checks against "high_memory" variable
  2013-04-10 23:32 [PATCH 0/5] [RFC] rework /dev/mem code vs. highmem and DEBUG_VIRTUAL Dave Hansen
@ 2013-04-10 23:32 ` Dave Hansen
  2013-04-11  0:44   ` Borislav Petkov
  2013-04-10 23:32 ` [PATCH 2/5] make /dev/kmem return error for highmem Dave Hansen
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 12+ messages in thread
From: Dave Hansen @ 2013-04-10 23:32 UTC (permalink / raw)
  To: bp; +Cc: hpa, linux-kernel, x86, Dave Hansen


We have a new debugging check on x86 that has caught a number
of long-standing bugs.  However, there is a _bit_ of collateral
damage with things that call __pa(high_memory).

We are now checking that any addresses passed to __pa() are
*valid* and can be dereferenced.

"high_memory", however, is not valid.  It marks the start of
highmem, and isn't itself a valid pointer.  But, those users
are really just asking "is this vaddr mapped"?  So, give them
a helper that does that, plus is also kind to our new
debugging check.

Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
---

 linux.git-davehans/arch/x86/mm/pat.c     |    9 +++++----
 linux.git-davehans/drivers/char/mem.c    |    4 ++--
 linux.git-davehans/drivers/mtd/mtdchar.c |    2 +-
 linux.git-davehans/include/linux/mm.h    |   13 +++++++++++++
 4 files changed, 21 insertions(+), 7 deletions(-)

diff -puN arch/x86/mm/pat.c~clean-up-highmem-checks arch/x86/mm/pat.c
--- linux.git/arch/x86/mm/pat.c~clean-up-highmem-checks	2013-04-10 16:23:44.906086836 -0700
+++ linux.git-davehans/arch/x86/mm/pat.c	2013-04-10 16:23:44.914086844 -0700
@@ -542,7 +542,7 @@ int phys_mem_access_prot_allowed(struct
 	      boot_cpu_has(X86_FEATURE_K6_MTRR) ||
 	      boot_cpu_has(X86_FEATURE_CYRIX_ARR) ||
 	      boot_cpu_has(X86_FEATURE_CENTAUR_MCR)) &&
-	    (pfn << PAGE_SHIFT) >= __pa(high_memory)) {
+	    phys_addr_is_highmem(pfn << PAGE_SHIFT)) {
 		flags = _PAGE_CACHE_UC;
 	}
 #endif
@@ -570,9 +570,10 @@ int kernel_map_sync_memtype(u64 base, un
 	if (!page_is_ram(base >> PAGE_SHIFT))
 		return 0;
 
-	id_sz = (__pa(high_memory-1) <= base + size) ?
-				__pa(high_memory) - base :
-				size;
+	if (phys_addr_is_highmem(base + size - 1))
+		id_sz = last_lowmem_phys_addr() - base + 1;
+	else
+		id_sz = size;
 
 	if (ioremap_change_attr((unsigned long)__va(base), id_sz, flags) < 0) {
 		printk(KERN_INFO "%s:%d ioremap_change_attr failed %s "
diff -puN drivers/char/mem.c~clean-up-highmem-checks drivers/char/mem.c
--- linux.git/drivers/char/mem.c~clean-up-highmem-checks	2013-04-10 16:23:44.907086837 -0700
+++ linux.git-davehans/drivers/char/mem.c	2013-04-10 16:23:44.914086844 -0700
@@ -50,7 +50,7 @@ static inline unsigned long size_inside_
 #ifndef ARCH_HAS_VALID_PHYS_ADDR_RANGE
 static inline int valid_phys_addr_range(phys_addr_t addr, size_t count)
 {
-	return addr + count <= __pa(high_memory);
+	return !phys_addr_is_highmem(addr + count);
 }
 
 static inline int valid_mmap_phys_addr_range(unsigned long pfn, size_t size)
@@ -249,7 +249,7 @@ static int uncached_access(struct file *
 	 */
 	if (file->f_flags & O_DSYNC)
 		return 1;
-	return addr >= __pa(high_memory);
+	return phys_addr_is_highmem(addr);
 #endif
 }
 #endif
diff -puN drivers/mtd/mtdchar.c~clean-up-highmem-checks drivers/mtd/mtdchar.c
--- linux.git/drivers/mtd/mtdchar.c~clean-up-highmem-checks	2013-04-10 16:23:44.909086839 -0700
+++ linux.git-davehans/drivers/mtd/mtdchar.c	2013-04-10 16:23:44.915086845 -0700
@@ -1189,7 +1189,7 @@ static int mtdchar_mmap(struct file *fil
 		vma->vm_flags |= VM_IO | VM_DONTEXPAND | VM_DONTDUMP;
 
 #ifdef pgprot_noncached
-		if (file->f_flags & O_DSYNC || off >= __pa(high_memory))
+		if (file->f_flags & O_DSYNC || phys_addr_is_highmem(off))
 			vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
 #endif
 		if (io_remap_pfn_range(vma, vma->vm_start, off >> PAGE_SHIFT,
diff -puN include/linux/mm.h~clean-up-highmem-checks include/linux/mm.h
--- linux.git/include/linux/mm.h~clean-up-highmem-checks	2013-04-10 16:23:44.910086840 -0700
+++ linux.git-davehans/include/linux/mm.h	2013-04-10 16:23:44.916086846 -0700
@@ -1754,5 +1754,18 @@ static inline unsigned int debug_guardpa
 static inline bool page_is_guard(struct page *page) { return false; }
 #endif /* CONFIG_DEBUG_PAGEALLOC */
 
+static inline phys_addr_t last_lowmem_phys_addr(void)
+{
+	/*
+	 * 'high_memory' is not a pointer that can be dereferenced, so
+	 * avoid calling __pa() on it directly.
+	 */
+	return __pa(high_memory - 1);
+}
+static inline bool phys_addr_is_highmem(phys_addr_t addr)
+{
+	return addr > last_lowmem_phys_addr();
+}
+
 #endif /* __KERNEL__ */
 #endif /* _LINUX_MM_H */
_

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 2/5] make /dev/kmem return error for highmem
  2013-04-10 23:32 [PATCH 0/5] [RFC] rework /dev/mem code vs. highmem and DEBUG_VIRTUAL Dave Hansen
  2013-04-10 23:32 ` [PATCH 1/5] clean up checks against "high_memory" variable Dave Hansen
@ 2013-04-10 23:32 ` Dave Hansen
  2013-04-11  9:58   ` Borislav Petkov
  2013-04-10 23:32 ` [PATCH 3/5] avoid /dev/kmem oopses with DEBUG_VIRTUAL Dave Hansen
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 12+ messages in thread
From: Dave Hansen @ 2013-04-10 23:32 UTC (permalink / raw)
  To: bp; +Cc: hpa, linux-kernel, x86, Dave Hansen


I was auding the /dev/mem code for more questionable uses of
__pa(), and ran across this.

My assumption is that if you use /dev/kmem, you expect to be
able to read the kernel virtual mappings.  However, those
mappings _stop_ as soon as we hit high memory.  The
pfn_valid() check in here is good for memory holes, but since
highmem pages are still valid, it does no good for those.

Also, since we are now checking that __pa() is being done on
valid virtual addresses, this might have tripped the new
check.  Even with the new check, this code would have been
broken with the NUMA remapping code had we not ripped it
out:

	https://patchwork.kernel.org/patch/2075911/

Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
---

 linux.git-davehans/drivers/char/mem.c |   11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff -puN drivers/char/mem.c~make-kmem-return-error-for-highmem drivers/char/mem.c
--- linux.git/drivers/char/mem.c~make-kmem-return-error-for-highmem	2013-04-10 16:23:45.151087081 -0700
+++ linux.git-davehans/drivers/char/mem.c	2013-04-10 16:23:45.154087084 -0700
@@ -336,10 +336,19 @@ static int mmap_mem(struct file *file, s
 #ifdef CONFIG_DEVKMEM
 static int mmap_kmem(struct file *file, struct vm_area_struct *vma)
 {
+	unsigned long kernel_vaddr;
 	unsigned long pfn;
 
+	kernel_vaddr = (u64)vma->vm_pgoff << PAGE_SHIFT;
+	/*
+	 * pfn_valid() (below) does not trip for highmem addresses.  This
+	 * essentially means that we will be mapping gibberish in for them
+	 * instead of what the _kernel_ has mapped at the requested address.
+	 */
+	if (kernel_vaddr >= high_memory)
+		return -EIO;
 	/* Turn a kernel-virtual address into a physical page frame */
-	pfn = __pa((u64)vma->vm_pgoff << PAGE_SHIFT) >> PAGE_SHIFT;
+	pfn = __pa(kernel_vaddr) >> PAGE_SHIFT;
 
 	/*
 	 * RED-PEN: on some architectures there is more mapped memory than
_

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 3/5] avoid /dev/kmem oopses with DEBUG_VIRTUAL
  2013-04-10 23:32 [PATCH 0/5] [RFC] rework /dev/mem code vs. highmem and DEBUG_VIRTUAL Dave Hansen
  2013-04-10 23:32 ` [PATCH 1/5] clean up checks against "high_memory" variable Dave Hansen
  2013-04-10 23:32 ` [PATCH 2/5] make /dev/kmem return error for highmem Dave Hansen
@ 2013-04-10 23:32 ` Dave Hansen
  2013-04-10 23:32 ` [PATCH 4/5] break up slow_virt_to_phys() Dave Hansen
  2013-04-10 23:32 ` [PATCH 5/5] keep /dev/kmem from triggering BUG_ON() with DEBUG_VIRTUAL Dave Hansen
  4 siblings, 0 replies; 12+ messages in thread
From: Dave Hansen @ 2013-04-10 23:32 UTC (permalink / raw)
  To: bp; +Cc: hpa, linux-kernel, x86, Dave Hansen


DEBUG_VIRTUAL, ensures that when __pa() is only called on
addresses for which its results are correct.  For everything
else, we BUG_ON().

This changes the /dev/kmem ordering so that we check pfn_valid()
_before_ we call __pa().  This ensures we're not calling __pa()
with gibberish and avoids the BUG_ON() in those cases, instead
returning the error to userspace _before_ the __pa() call.



Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
---

 linux.git-davehans/drivers/char/mem.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff -puN drivers/char/mem.c~make-devmem-work-on-highmem3a drivers/char/mem.c
--- linux.git/drivers/char/mem.c~make-devmem-work-on-highmem3a	2013-04-10 16:23:45.361087291 -0700
+++ linux.git-davehans/drivers/char/mem.c	2013-04-10 16:23:45.365087295 -0700
@@ -347,9 +347,6 @@ static int mmap_kmem(struct file *file,
 	 */
 	if (kernel_vaddr >= high_memory)
 		return -EIO;
-	/* Turn a kernel-virtual address into a physical page frame */
-	pfn = __pa(kernel_vaddr) >> PAGE_SHIFT;
-
 	/*
 	 * RED-PEN: on some architectures there is more mapped memory than
 	 * available in mem_map which pfn_valid checks for. Perhaps should add a
@@ -360,6 +357,9 @@ static int mmap_kmem(struct file *file,
 	if (!pfn_valid(pfn))
 		return -EIO;
 
+	/* Turn a kernel-virtual address into a physical page frame */
+	pfn = __pa(kernel_vaddr) >> PAGE_SHIFT;
+
 	vma->vm_pgoff = pfn;
 	return mmap_mem(file, vma);
 }
_

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 4/5] break up slow_virt_to_phys()
  2013-04-10 23:32 [PATCH 0/5] [RFC] rework /dev/mem code vs. highmem and DEBUG_VIRTUAL Dave Hansen
                   ` (2 preceding siblings ...)
  2013-04-10 23:32 ` [PATCH 3/5] avoid /dev/kmem oopses with DEBUG_VIRTUAL Dave Hansen
@ 2013-04-10 23:32 ` Dave Hansen
  2013-04-11 12:29   ` Borislav Petkov
  2013-04-10 23:32 ` [PATCH 5/5] keep /dev/kmem from triggering BUG_ON() with DEBUG_VIRTUAL Dave Hansen
  4 siblings, 1 reply; 12+ messages in thread
From: Dave Hansen @ 2013-04-10 23:32 UTC (permalink / raw)
  To: bp; +Cc: hpa, linux-kernel, x86, Dave Hansen


I need to use slow_virt_to_phys()'s functionality for addresses
which might not be valid.  So, I need a copy which can cleanly
return errors instead of doing a BUG_ON().

Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
---

 linux.git-davehans/arch/x86/mm/pageattr.c |   40 +++++++++++++++++++-----------
 1 file changed, 26 insertions(+), 14 deletions(-)

diff -puN arch/x86/mm/pageattr.c~break-up-slow-virt_to_phys arch/x86/mm/pageattr.c
--- linux.git/arch/x86/mm/pageattr.c~break-up-slow-virt_to_phys	2013-04-10 16:23:45.571087500 -0700
+++ linux.git-davehans/arch/x86/mm/pageattr.c	2013-04-10 16:23:45.574087504 -0700
@@ -363,18 +363,7 @@ pte_t *lookup_address(unsigned long addr
 }
 EXPORT_SYMBOL_GPL(lookup_address);
 
-/*
- * This is necessary because __pa() does not work on some
- * kinds of memory, like vmalloc() or the alloc_remap()
- * areas on 32-bit NUMA systems.  The percpu areas can
- * end up in this kind of memory, for instance.
- *
- * This could be optimized, but it is only intended to be
- * used at inititalization time, and keeping it
- * unoptimized should increase the testing coverage for
- * the more obscure platforms.
- */
-phys_addr_t slow_virt_to_phys(void *__virt_addr)
+int kernel_lookup_vaddr(void *__virt_addr, phys_addr_t *result)
 {
 	unsigned long virt_addr = (unsigned long)__virt_addr;
 	phys_addr_t phys_addr;
@@ -385,12 +374,35 @@ phys_addr_t slow_virt_to_phys(void *__vi
 	pte_t *pte;
 
 	pte = lookup_address(virt_addr, &level);
-	BUG_ON(!pte);
+	if (!pte)
+		return -EFAULT;
 	psize = page_level_size(level);
 	pmask = page_level_mask(level);
 	offset = virt_addr & ~pmask;
 	phys_addr = pte_pfn(*pte) << PAGE_SHIFT;
-	return (phys_addr | offset);
+	*result = (phys_addr | offset);
+	return 0;
+}
+
+/*
+ * This is necessary because __pa() does not work on some
+ * kinds of memory, like vmalloc() or the alloc_remap()
+ * areas on 32-bit NUMA systems.  The percpu areas can
+ * end up in this kind of memory, for instance.
+ *
+ * This could be optimized, but it is only intended to be
+ * used at inititalization time, and keeping it
+ * unoptimized should increase the testing coverage for
+ * the more obscure platforms.
+ */
+phys_addr_t slow_virt_to_phys(void *virt_addr)
+{
+	phys_addr_t result;
+	int ret;
+
+	ret = kernel_lookup_vaddr(virt_addr, &result);
+	BUG_ON(ret);
+	return result;
 }
 EXPORT_SYMBOL_GPL(slow_virt_to_phys);
 
_

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 5/5] keep /dev/kmem from triggering BUG_ON() with DEBUG_VIRTUAL
  2013-04-10 23:32 [PATCH 0/5] [RFC] rework /dev/mem code vs. highmem and DEBUG_VIRTUAL Dave Hansen
                   ` (3 preceding siblings ...)
  2013-04-10 23:32 ` [PATCH 4/5] break up slow_virt_to_phys() Dave Hansen
@ 2013-04-10 23:32 ` Dave Hansen
  2013-04-11 12:37   ` Borislav Petkov
  4 siblings, 1 reply; 12+ messages in thread
From: Dave Hansen @ 2013-04-10 23:32 UTC (permalink / raw)
  To: bp; +Cc: hpa, linux-kernel, x86, Dave Hansen


/dev/kmem can essentially be used to make the kernel call __pa()
on any address.  But, with DEBUG_VIRTUAL, we are now ensuring
that when __pa() is only called on addresses for which its
results are correct.  For everything else, we BUG_ON().

What this means is that a hapless /dev/kmem user can
unintentionally make the kernel BUG_ON().

This adds a function, is_kernel_linear_vaddr() to the /dev/kmem
code to ensure that __pa() will not BUG_ON().  It will, instead,
return an error up to the user.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
---

 linux.git-davehans/arch/x86/mm/pageattr.c  |   19 +++++++++++++++++++
 linux.git-davehans/drivers/char/mem.c      |    8 ++++++++
 linux.git-davehans/include/linux/mmdebug.h |    2 ++
 3 files changed, 29 insertions(+)

diff -puN arch/x86/mm/pageattr.c~make-devmem-work-on-highmem3b arch/x86/mm/pageattr.c
--- linux.git/arch/x86/mm/pageattr.c~make-devmem-work-on-highmem3b	2013-04-10 16:23:45.780087708 -0700
+++ linux.git-davehans/arch/x86/mm/pageattr.c	2013-04-10 16:23:45.786087714 -0700
@@ -406,6 +406,25 @@ phys_addr_t slow_virt_to_phys(void *virt
 }
 EXPORT_SYMBOL_GPL(slow_virt_to_phys);
 
+#ifdef CONFIG_DEBUG_VIRTUAL
+int is_kernel_linear_vaddr(void *kernel_vaddr)
+{
+	int err;
+	unsigned long pte_walk_paddr;
+	unsigned long linear_paddr = __pa_nodebug(kernel_vaddr);
+
+	err = kernel_lookup_vaddr(kernel_vaddr, &pte_walk_paddr);
+	/* if there is no pte, it can not be in the normal linear map */
+	if (err)
+		return 0;
+	/* is there is a pte which does not match the linear layout? */
+	if (pte_walk_paddr != linear_paddr)
+		return 0;
+
+	return 1;
+}
+#endif /* CONFIG_DEBUG_VIRTUAL */
+
 /*
  * Set the new pmd in all the pgds we know about:
  */
diff -puN drivers/char/mem.c~make-devmem-work-on-highmem3b drivers/char/mem.c
--- linux.git/drivers/char/mem.c~make-devmem-work-on-highmem3b	2013-04-10 16:23:45.781087709 -0700
+++ linux.git-davehans/drivers/char/mem.c	2013-04-10 16:23:45.787087715 -0700
@@ -357,6 +357,14 @@ static int mmap_kmem(struct file *file,
 	if (!pfn_valid(pfn))
 		return -EIO;
 
+	/*
+	 * the __pa() calculation below only makes sense if the kernel's
+	 * pagetables are set up in the way that __pa() expects.  Otherwise,
+	 * we are effectively mapping random memory in to place.
+	 */
+	if (!is_kernel_linear_vaddr(kernel_vaddr))
+		return -EIO;
+
 	/* Turn a kernel-virtual address into a physical page frame */
 	pfn = __pa(kernel_vaddr) >> PAGE_SHIFT;
 
diff -puN include/linux/mmdebug.h~make-devmem-work-on-highmem3b include/linux/mmdebug.h
--- linux.git/include/linux/mmdebug.h~make-devmem-work-on-highmem3b	2013-04-10 16:23:45.783087711 -0700
+++ linux.git-davehans/include/linux/mmdebug.h	2013-04-10 16:23:45.787087715 -0700
@@ -9,8 +9,10 @@
 
 #ifdef CONFIG_DEBUG_VIRTUAL
 #define VIRTUAL_BUG_ON(cond) BUG_ON(cond)
+extern int is_kernel_linear_vaddr(void *kernel_vaddr);
 #else
 #define VIRTUAL_BUG_ON(cond) do { } while (0)
+static inline int is_kernel_linear_vaddr(void *kernel_vaddr) { return 1; }
 #endif
 
 #endif
_

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/5] clean up checks against "high_memory" variable
  2013-04-10 23:32 ` [PATCH 1/5] clean up checks against "high_memory" variable Dave Hansen
@ 2013-04-11  0:44   ` Borislav Petkov
  0 siblings, 0 replies; 12+ messages in thread
From: Borislav Petkov @ 2013-04-11  0:44 UTC (permalink / raw)
  To: Dave Hansen; +Cc: hpa, linux-kernel, x86

On Wed, Apr 10, 2013 at 04:32:50PM -0700, Dave Hansen wrote:
> 
> We have a new debugging check on x86 that has caught a number
> of long-standing bugs.  However, there is a _bit_ of collateral
> damage with things that call __pa(high_memory).
> 
> We are now checking that any addresses passed to __pa() are
> *valid* and can be dereferenced.
> 
> "high_memory", however, is not valid.  It marks the start of
> highmem, and isn't itself a valid pointer.  But, those users
> are really just asking "is this vaddr mapped"?  So, give them
> a helper that does that, plus is also kind to our new
> debugging check.
> 
> Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>

FWIW,

this one already fixes the BUG below I'm seeing when booting 32-bit
Atom guests in qemu.

Tested-by: Borislav Petkov <bp@suse.de>

More testing tomorrow.

[   57.035880] ------------[ cut here ]------------
[   57.036190] kernel BUG at arch/x86/mm/physaddr.c:79!
[   57.036190] invalid opcode: 0000 [#1] PREEMPT SMP 
[   57.036190] Modules linked in:
[   57.036190] Pid: 2555, comm: wdm Tainted: G S      W    3.9.0-rc6+ #5 Bochs Bochs
[   57.036190] EIP: 0060:[<c10321b1>] EFLAGS: 00010206 CPU: 56
[   57.036190] EIP is at __phys_addr+0x71/0x80
[   57.036190] EAX: 00000000 EBX: 373fe000 ECX: 0000000c EDX: 00000000
[   57.036190] ESI: 00000000 EDI: 00002000 EBP: ec509f40 ESP: ec509f3c
[   57.036190]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[   57.036190] CR0: 8005003b CR2: b7620b48 CR3: 2f5b5000 CR4: 000006d0
[   57.036190] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[   57.036190] DR6: ffff0ff0 DR7: 00000400
[   57.036190] Process wdm (pid: 2555, ti=ec508000 task=ec500000 task.ti=ec508000)
[   57.036190] Stack:
[   57.036190]  00002000 ec509f64 c1304268 ec570000 ec50e300 bf87299c 00002000 ec50e300
[   57.036190]  c1304240 00002000 ec509f8c c1120276 ec509f98 00000000 ed540314 c1304240
[   57.036190]  ec570000 ec50e300 00000000 08065294 ec509fac c1120488 ec509f98 00000000
[   57.036190] Call Trace:
[   57.036190]  [<c1304268>] read_mem+0x28/0xf0
[   57.036190]  [<c1304240>] ? write_mem+0xf0/0xf0
[   57.036190]  [<c1120276>] vfs_read+0x86/0x140
[   57.036190]  [<c1304240>] ? write_mem+0xf0/0xf0
[   57.036190]  [<c1120488>] sys_read+0x48/0x90
[   57.036190]  [<c15231fe>] sysenter_do_call+0x12/0x36
[   57.036190] Code: 39 0b c2 81 c2 00 00 80 00 39 d0 72 cc 8b 15 2c 4c 74 c1 81 ea 00 a0 a4 00 81 e2 00 00 c0 ff 81 ea 00 20 00 00 39 d0 73 b0 0f 0b <0f> 0b 0f 0b 8d 74 26 00 8d bc 27 00 00 00 00 55 89 e5 53 3e 8d
[   57.036190] EIP: [<c10321b1>] __phys_addr+0x71/0x80 SS:ESP 0068:ec509f3c
[   57.619214] ---[ end trace 4e05f3724c460358 ]---

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/5] make /dev/kmem return error for highmem
  2013-04-10 23:32 ` [PATCH 2/5] make /dev/kmem return error for highmem Dave Hansen
@ 2013-04-11  9:58   ` Borislav Petkov
  0 siblings, 0 replies; 12+ messages in thread
From: Borislav Petkov @ 2013-04-11  9:58 UTC (permalink / raw)
  To: Dave Hansen; +Cc: hpa, linux-kernel, x86

Just nitpicks:

On Wed, Apr 10, 2013 at 04:32:52PM -0700, Dave Hansen wrote:
> 
> I was auding the /dev/mem code for more questionable uses of

	auditing

> __pa(), and ran across this.
> 
> My assumption is that if you use /dev/kmem, you expect to be
> able to read the kernel virtual mappings.  However, those
> mappings _stop_ as soon as we hit high memory.  The
> pfn_valid() check in here is good for memory holes, but since
> highmem pages are still valid, it does no good for those.
> 
> Also, since we are now checking that __pa() is being done on
> valid virtual addresses, this might have tripped the new
> check.  Even with the new check, this code would have been
> broken with the NUMA remapping code had we not ripped it
> out:
> 
> 	https://patchwork.kernel.org/patch/2075911/

This is an upstream commit so you probably might want to state the
commit id here instead of some website which may or may not exist in the
future.

> Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
> ---
> 
>  linux.git-davehans/drivers/char/mem.c |   11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff -puN drivers/char/mem.c~make-kmem-return-error-for-highmem drivers/char/mem.c
> --- linux.git/drivers/char/mem.c~make-kmem-return-error-for-highmem	2013-04-10 16:23:45.151087081 -0700
> +++ linux.git-davehans/drivers/char/mem.c	2013-04-10 16:23:45.154087084 -0700
> @@ -336,10 +336,19 @@ static int mmap_mem(struct file *file, s
>  #ifdef CONFIG_DEVKMEM
>  static int mmap_kmem(struct file *file, struct vm_area_struct *vma)
>  {
> +	unsigned long kernel_vaddr;
>  	unsigned long pfn;
>  
> +	kernel_vaddr = (u64)vma->vm_pgoff << PAGE_SHIFT;
> +	/*
> +	 * pfn_valid() (below) does not trip for highmem addresses.  This
> +	 * essentially means that we will be mapping gibberish in for them
> +	 * instead of what the _kernel_ has mapped at the requested address.
> +	 */
> +	if (kernel_vaddr >= high_memory)
> +		return -EIO;

drivers/char/mem.c: In function ‘mmap_kmem’:
drivers/char/mem.c:348:19: warning: comparison between pointer and integer [enabled by default]

>  	/* Turn a kernel-virtual address into a physical page frame */
> -	pfn = __pa((u64)vma->vm_pgoff << PAGE_SHIFT) >> PAGE_SHIFT;
> +	pfn = __pa(kernel_vaddr) >> PAGE_SHIFT;
>  
>  	/*
>  	 * RED-PEN: on some architectures there is more mapped memory than
> _
> 

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 4/5] break up slow_virt_to_phys()
  2013-04-10 23:32 ` [PATCH 4/5] break up slow_virt_to_phys() Dave Hansen
@ 2013-04-11 12:29   ` Borislav Petkov
  2013-04-11 16:28     ` Dave Hansen
  0 siblings, 1 reply; 12+ messages in thread
From: Borislav Petkov @ 2013-04-11 12:29 UTC (permalink / raw)
  To: Dave Hansen; +Cc: hpa, linux-kernel, x86

On Wed, Apr 10, 2013 at 04:32:54PM -0700, Dave Hansen wrote:
> +phys_addr_t slow_virt_to_phys(void *virt_addr)
> +{
> +	phys_addr_t result;
> +	int ret;
> +
> +	ret = kernel_lookup_vaddr(virt_addr, &result);
> +	BUG_ON(ret);

Isn't that BUG_ON still too harsh though? How about WARN_ON instead?
It would still create a lot of noise so that it gets fixed without
bringing down the system.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 5/5] keep /dev/kmem from triggering BUG_ON() with DEBUG_VIRTUAL
  2013-04-10 23:32 ` [PATCH 5/5] keep /dev/kmem from triggering BUG_ON() with DEBUG_VIRTUAL Dave Hansen
@ 2013-04-11 12:37   ` Borislav Petkov
  0 siblings, 0 replies; 12+ messages in thread
From: Borislav Petkov @ 2013-04-11 12:37 UTC (permalink / raw)
  To: Dave Hansen; +Cc: hpa, linux-kernel, x86

Some more compiler warnings. Btw, I'm building 32-bit only for now - no
64-bit testing yet.

On Wed, Apr 10, 2013 at 04:32:56PM -0700, Dave Hansen wrote:
> 
> /dev/kmem can essentially be used to make the kernel call __pa()
> on any address.  But, with DEBUG_VIRTUAL, we are now ensuring
> that when __pa() is only called on addresses for which its
> results are correct.  For everything else, we BUG_ON().
> 
> What this means is that a hapless /dev/kmem user can
> unintentionally make the kernel BUG_ON().
> 
> This adds a function, is_kernel_linear_vaddr() to the /dev/kmem
> code to ensure that __pa() will not BUG_ON().  It will, instead,
> return an error up to the user.
> 
> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
> ---
> 
>  linux.git-davehans/arch/x86/mm/pageattr.c  |   19 +++++++++++++++++++
>  linux.git-davehans/drivers/char/mem.c      |    8 ++++++++
>  linux.git-davehans/include/linux/mmdebug.h |    2 ++
>  3 files changed, 29 insertions(+)
> 
> diff -puN arch/x86/mm/pageattr.c~make-devmem-work-on-highmem3b arch/x86/mm/pageattr.c
> --- linux.git/arch/x86/mm/pageattr.c~make-devmem-work-on-highmem3b	2013-04-10 16:23:45.780087708 -0700
> +++ linux.git-davehans/arch/x86/mm/pageattr.c	2013-04-10 16:23:45.786087714 -0700
> @@ -406,6 +406,25 @@ phys_addr_t slow_virt_to_phys(void *virt
>  }
>  EXPORT_SYMBOL_GPL(slow_virt_to_phys);
>  
> +#ifdef CONFIG_DEBUG_VIRTUAL
> +int is_kernel_linear_vaddr(void *kernel_vaddr)
> +{
> +	int err;
> +	unsigned long pte_walk_paddr;
> +	unsigned long linear_paddr = __pa_nodebug(kernel_vaddr);
> +
> +	err = kernel_lookup_vaddr(kernel_vaddr, &pte_walk_paddr);

arch/x86/mm/pageattr.c: In function ‘is_kernel_linear_vaddr’:
arch/x86/mm/pageattr.c:416:2: warning: passing argument 2 of ‘kernel_lookup_vaddr’ from incompatible pointer type [enabled by default]
arch/x86/mm/pageattr.c:366:5: note: expected ‘phys_addr_t *’ but argument is of type ‘long unsigned int *’


> +	/* if there is no pte, it can not be in the normal linear map */
> +	if (err)
> +		return 0;
> +	/* is there is a pte which does not match the linear layout? */
> +	if (pte_walk_paddr != linear_paddr)
> +		return 0;
> +
> +	return 1;
> +}
> +#endif /* CONFIG_DEBUG_VIRTUAL */
> +
>  /*
>   * Set the new pmd in all the pgds we know about:
>   */
> diff -puN drivers/char/mem.c~make-devmem-work-on-highmem3b drivers/char/mem.c
> --- linux.git/drivers/char/mem.c~make-devmem-work-on-highmem3b	2013-04-10 16:23:45.781087709 -0700
> +++ linux.git-davehans/drivers/char/mem.c	2013-04-10 16:23:45.787087715 -0700
> @@ -357,6 +357,14 @@ static int mmap_kmem(struct file *file,
>  	if (!pfn_valid(pfn))
>  		return -EIO;
>  
> +	/*
> +	 * the __pa() calculation below only makes sense if the kernel's
> +	 * pagetables are set up in the way that __pa() expects.  Otherwise,
> +	 * we are effectively mapping random memory in to place.
> +	 */
> +	if (!is_kernel_linear_vaddr(kernel_vaddr))

Ditto:

drivers/char/mem.c:365:2: warning: passing argument 1 of ‘is_kernel_linear_vaddr’ makes pointer from integer without a cast [enabled by default]


> +		return -EIO;
> +
>  	/* Turn a kernel-virtual address into a physical page frame */
>  	pfn = __pa(kernel_vaddr) >> PAGE_SHIFT;
>  
> diff -puN include/linux/mmdebug.h~make-devmem-work-on-highmem3b include/linux/mmdebug.h
> --- linux.git/include/linux/mmdebug.h~make-devmem-work-on-highmem3b	2013-04-10 16:23:45.783087711 -0700
> +++ linux.git-davehans/include/linux/mmdebug.h	2013-04-10 16:23:45.787087715 -0700
> @@ -9,8 +9,10 @@
>  
>  #ifdef CONFIG_DEBUG_VIRTUAL
>  #define VIRTUAL_BUG_ON(cond) BUG_ON(cond)
> +extern int is_kernel_linear_vaddr(void *kernel_vaddr);
>  #else
>  #define VIRTUAL_BUG_ON(cond) do { } while (0)
> +static inline int is_kernel_linear_vaddr(void *kernel_vaddr) { return 1; }a

In file included from include/linux/gfp.h:8:0,
                 from include/linux/mm.h:8,
                 from drivers/char/mem.c:11:
include/linux/mmdebug.h:12:12: note: expected ‘void *’ but argument is of type ‘long unsigned int’
In file included from include/linux/gfp.h:4:0,
                 from include/linux/mm.h:8,
                 from drivers/char/mem.c:11:
include/linux/mmzone.h:1187:6: warning: ‘pfn’ may be used uninitialized in this function [-Wuninitialized]
drivers/char/mem.c:340:16: note: ‘pfn’ was declared here

HTH.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 4/5] break up slow_virt_to_phys()
  2013-04-11 12:29   ` Borislav Petkov
@ 2013-04-11 16:28     ` Dave Hansen
  2013-04-11 17:12       ` Borislav Petkov
  0 siblings, 1 reply; 12+ messages in thread
From: Dave Hansen @ 2013-04-11 16:28 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: hpa, linux-kernel, x86

On 04/11/2013 05:29 AM, Borislav Petkov wrote:
> On Wed, Apr 10, 2013 at 04:32:54PM -0700, Dave Hansen wrote:
>> +phys_addr_t slow_virt_to_phys(void *virt_addr)
>> +{
>> +	phys_addr_t result;
>> +	int ret;
>> +
>> +	ret = kernel_lookup_vaddr(virt_addr, &result);
>> +	BUG_ON(ret);
> 
> Isn't that BUG_ON still too harsh though? How about WARN_ON instead?
> It would still create a lot of noise so that it gets fixed without
> bringing down the system.

It's harsh for the cases where __pa()'s result never gets used directly,
like for checking against 'high_memory'.  If it gets used (like in
/dev/mem's case) the kernel really is doing something it does not intend
to do.  It's essentially reading or writing garbage.

On the other hand, I guess if we've been letting it do that bad thing
for years, we might as well continue.  We're not causing any _more_ damage.

I'm leaning toward making it a WARN_ON() at the moment. :)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 4/5] break up slow_virt_to_phys()
  2013-04-11 16:28     ` Dave Hansen
@ 2013-04-11 17:12       ` Borislav Petkov
  0 siblings, 0 replies; 12+ messages in thread
From: Borislav Petkov @ 2013-04-11 17:12 UTC (permalink / raw)
  To: Dave Hansen; +Cc: hpa, linux-kernel, x86

On Thu, Apr 11, 2013 at 09:28:59AM -0700, Dave Hansen wrote:
> It's harsh for the cases where __pa()'s result never gets used
> directly, like for checking against 'high_memory'. If it gets used
> (like in /dev/mem's case) the kernel really is doing something it does
> not intend to do. It's essentially reading or writing garbage.
>
> On the other hand, I guess if we've been letting it do that bad thing
> for years, we might as well continue. We're not causing any _more_
> damage.

Well, if it wasn't an already used interface I'd say we forbid accesses
which are void of sense and issue a WARN_ON but who knows who'll come
screaming if we did that.

> I'm leaning toward making it a WARN_ON() at the moment. :)

Right, well, yeah, people would still be able to shoot themselves in the
foot over /dev/mem but now they'll at least be warned about it. Kernel's
job's done. :)

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2013-04-11 17:12 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-04-10 23:32 [PATCH 0/5] [RFC] rework /dev/mem code vs. highmem and DEBUG_VIRTUAL Dave Hansen
2013-04-10 23:32 ` [PATCH 1/5] clean up checks against "high_memory" variable Dave Hansen
2013-04-11  0:44   ` Borislav Petkov
2013-04-10 23:32 ` [PATCH 2/5] make /dev/kmem return error for highmem Dave Hansen
2013-04-11  9:58   ` Borislav Petkov
2013-04-10 23:32 ` [PATCH 3/5] avoid /dev/kmem oopses with DEBUG_VIRTUAL Dave Hansen
2013-04-10 23:32 ` [PATCH 4/5] break up slow_virt_to_phys() Dave Hansen
2013-04-11 12:29   ` Borislav Petkov
2013-04-11 16:28     ` Dave Hansen
2013-04-11 17:12       ` Borislav Petkov
2013-04-10 23:32 ` [PATCH 5/5] keep /dev/kmem from triggering BUG_ON() with DEBUG_VIRTUAL Dave Hansen
2013-04-11 12:37   ` Borislav Petkov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).