All of lore.kernel.org
 help / color / mirror / Atom feed
* [RESEND PATCH 0/4] x86: /dev/mem fixes
@ 2014-09-29 11:32 ` Frantisek Hrbata
  0 siblings, 0 replies; 30+ messages in thread
From: Frantisek Hrbata @ 2014-09-29 11:32 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, tglx, mingo, hpa, x86, oleg, kamaleshb, hechjie, akpm,
	dave.hansen, dvlasenk, prarit, lwoodman, hannsj_uhl, torvalds

Hi all,

this is a combination of two patch sets I sent a while ago
1) Prevent possible PTE corruption with /dev/mem mmap
2) x86: allow read/write /dev/mem to access non-system RAM above high_memory

The original thread with both patch sets can be found here
   https://lkml.org/lkml/2014/8/14/229
   lkml: <1408025927-16826-1-git-send-email-fhrbata@redhat.com>

1) Prevent possible PTE corruption with /dev/mem mmap
x86: add arch_pfn_possible helper
x86: add phys addr validity check for /dev/mem mmap

Many thanks goes to Dave Hansen, who helped with the "final" check. Other than
that it did not get much attention, except H. Peter Anvin's complain that having
two checks for mmap and read/write for /dev/mem access is ridiculous. I for sure
do not object to this, but AFAICT it's not that simple to unify them and it's not
"directly" related to the PTE corruption. Please note that there are other
archs(ia64, arm) using these check. But I for sure can be missing something.

What this patch set does is using the existing interface to implement x86 specific
check in the least invasive way.

Anyway I tried to remove the high_memory check with a follow-up patch set 2)

2) x86: allow read/write /dev/mem to access non-system RAM above high_memory
x86: add high_memory check to (xlate|unxlate)_dev_mem_ptr
x86: remove high_memory check from valid_phys_addr_range

This is an attempt to remove the high_memory limit for the read/write access to
/dev/mem. IMHO there is no reason for this limit on x86. It is presented in
the generic valid_phys_addr_range, which is used only by (read|write)_mem. IIUIC
it's main purpose is for the generic xlate_dev_mem_ptr, which is using only the
direct kernel mapping __va translation. Since the valid_phys_addr_range is
called as the first check in (read|write)_mem, it basically does not allow to
access anything above high_memory on x86.

The first patch adds high_memory check to x86's (xlate|unxlate)_dev_mem_ptr, so
the direct kernel mapping can be safely used for system RAM bellow high_memory.
This is IMHO the only valid reason to use high_memory check in (read|write)_mem.

The second patch removes the high_memory check from valid_phys_addr_range,
allowing read/write to access non-system RAM above high_memory. So far this
was possible only by using mmap.

I hope I haven't overlooked something.

Many thanks

Frantisek Hrbata (4):
  x86: add arch_pfn_possible helper
  x86: add phys addr validity check for /dev/mem mmap
  x86: add high_memory check to (xlate|unxlate)_dev_mem_ptr
  x86: remove high_memory check from valid_phys_addr_range

 arch/x86/include/asm/io.h |  4 ++++
 arch/x86/mm/ioremap.c     |  9 ++++++---
 arch/x86/mm/mmap.c        | 12 ++++++++++++
 arch/x86/mm/physaddr.h    |  9 +++++++--
 4 files changed, 29 insertions(+), 5 deletions(-)

-- 
1.9.3


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [RESEND PATCH 0/4] x86: /dev/mem fixes
@ 2014-09-29 11:32 ` Frantisek Hrbata
  0 siblings, 0 replies; 30+ messages in thread
From: Frantisek Hrbata @ 2014-09-29 11:32 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, tglx, mingo, hpa, x86, oleg, kamaleshb, hechjie, akpm,
	dave.hansen, dvlasenk, prarit, lwoodman, hannsj_uhl, torvalds

Hi all,

this is a combination of two patch sets I sent a while ago
1) Prevent possible PTE corruption with /dev/mem mmap
2) x86: allow read/write /dev/mem to access non-system RAM above high_memory

The original thread with both patch sets can be found here
   https://lkml.org/lkml/2014/8/14/229
   lkml: <1408025927-16826-1-git-send-email-fhrbata@redhat.com>

1) Prevent possible PTE corruption with /dev/mem mmap
x86: add arch_pfn_possible helper
x86: add phys addr validity check for /dev/mem mmap

Many thanks goes to Dave Hansen, who helped with the "final" check. Other than
that it did not get much attention, except H. Peter Anvin's complain that having
two checks for mmap and read/write for /dev/mem access is ridiculous. I for sure
do not object to this, but AFAICT it's not that simple to unify them and it's not
"directly" related to the PTE corruption. Please note that there are other
archs(ia64, arm) using these check. But I for sure can be missing something.

What this patch set does is using the existing interface to implement x86 specific
check in the least invasive way.

Anyway I tried to remove the high_memory check with a follow-up patch set 2)

2) x86: allow read/write /dev/mem to access non-system RAM above high_memory
x86: add high_memory check to (xlate|unxlate)_dev_mem_ptr
x86: remove high_memory check from valid_phys_addr_range

This is an attempt to remove the high_memory limit for the read/write access to
/dev/mem. IMHO there is no reason for this limit on x86. It is presented in
the generic valid_phys_addr_range, which is used only by (read|write)_mem. IIUIC
it's main purpose is for the generic xlate_dev_mem_ptr, which is using only the
direct kernel mapping __va translation. Since the valid_phys_addr_range is
called as the first check in (read|write)_mem, it basically does not allow to
access anything above high_memory on x86.

The first patch adds high_memory check to x86's (xlate|unxlate)_dev_mem_ptr, so
the direct kernel mapping can be safely used for system RAM bellow high_memory.
This is IMHO the only valid reason to use high_memory check in (read|write)_mem.

The second patch removes the high_memory check from valid_phys_addr_range,
allowing read/write to access non-system RAM above high_memory. So far this
was possible only by using mmap.

I hope I haven't overlooked something.

Many thanks

Frantisek Hrbata (4):
  x86: add arch_pfn_possible helper
  x86: add phys addr validity check for /dev/mem mmap
  x86: add high_memory check to (xlate|unxlate)_dev_mem_ptr
  x86: remove high_memory check from valid_phys_addr_range

 arch/x86/include/asm/io.h |  4 ++++
 arch/x86/mm/ioremap.c     |  9 ++++++---
 arch/x86/mm/mmap.c        | 12 ++++++++++++
 arch/x86/mm/physaddr.h    |  9 +++++++--
 4 files changed, 29 insertions(+), 5 deletions(-)

-- 
1.9.3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [RESEND PATCH 1/4] x86: add arch_pfn_possible helper
  2014-09-29 11:32 ` Frantisek Hrbata
@ 2014-09-29 11:32   ` Frantisek Hrbata
  -1 siblings, 0 replies; 30+ messages in thread
From: Frantisek Hrbata @ 2014-09-29 11:32 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, tglx, mingo, hpa, x86, oleg, kamaleshb, hechjie, akpm,
	dave.hansen, dvlasenk, prarit, lwoodman, hannsj_uhl, torvalds

Add helper to check maximum possible pfn on x86. Also make the current
phys_addr_valid helper using it internally.

Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com>
---
 arch/x86/mm/physaddr.h | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/physaddr.h b/arch/x86/mm/physaddr.h
index a3cd5a0..9df8e3a 100644
--- a/arch/x86/mm/physaddr.h
+++ b/arch/x86/mm/physaddr.h
@@ -1,10 +1,15 @@
 #include <asm/processor.h>
 
-static inline int phys_addr_valid(resource_size_t addr)
+static inline int arch_pfn_possible(unsigned long pfn)
 {
 #ifdef CONFIG_PHYS_ADDR_T_64BIT
-	return !(addr >> boot_cpu_data.x86_phys_bits);
+	return !(pfn >> (boot_cpu_data.x86_phys_bits - PAGE_SHIFT));
 #else
 	return 1;
 #endif
 }
+
+static inline int phys_addr_valid(resource_size_t addr)
+{
+	return arch_pfn_possible(addr >> PAGE_SHIFT);
+}
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [RESEND PATCH 1/4] x86: add arch_pfn_possible helper
@ 2014-09-29 11:32   ` Frantisek Hrbata
  0 siblings, 0 replies; 30+ messages in thread
From: Frantisek Hrbata @ 2014-09-29 11:32 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, tglx, mingo, hpa, x86, oleg, kamaleshb, hechjie, akpm,
	dave.hansen, dvlasenk, prarit, lwoodman, hannsj_uhl, torvalds

Add helper to check maximum possible pfn on x86. Also make the current
phys_addr_valid helper using it internally.

Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com>
---
 arch/x86/mm/physaddr.h | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/physaddr.h b/arch/x86/mm/physaddr.h
index a3cd5a0..9df8e3a 100644
--- a/arch/x86/mm/physaddr.h
+++ b/arch/x86/mm/physaddr.h
@@ -1,10 +1,15 @@
 #include <asm/processor.h>
 
-static inline int phys_addr_valid(resource_size_t addr)
+static inline int arch_pfn_possible(unsigned long pfn)
 {
 #ifdef CONFIG_PHYS_ADDR_T_64BIT
-	return !(addr >> boot_cpu_data.x86_phys_bits);
+	return !(pfn >> (boot_cpu_data.x86_phys_bits - PAGE_SHIFT));
 #else
 	return 1;
 #endif
 }
+
+static inline int phys_addr_valid(resource_size_t addr)
+{
+	return arch_pfn_possible(addr >> PAGE_SHIFT);
+}
-- 
1.9.3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [RESEND PATCH 2/4] x86: add phys addr validity check for /dev/mem mmap
  2014-09-29 11:32 ` Frantisek Hrbata
@ 2014-09-29 11:33   ` Frantisek Hrbata
  -1 siblings, 0 replies; 30+ messages in thread
From: Frantisek Hrbata @ 2014-09-29 11:33 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, tglx, mingo, hpa, x86, oleg, kamaleshb, hechjie, akpm,
	dave.hansen, dvlasenk, prarit, lwoodman, hannsj_uhl, torvalds

Prevent possible PTE corruption while calling mmap on /dev/mem with large
offset.

oops info, please note the PTE value 8008000000000225.
---------------------------------8<--------------------------------------
[85739.124496] rep: Corrupted page table at address 7f63852f8000
[85739.130242] PGD ba2eb067 PUD b99c1067 PMD a2fa5067 PTE 8008000000000225
[85739.136941] Bad pagetable: 000d [#1] SMP
[85739.141002] Modules linked in: cfg80211 rfkill x86_pkg_temp_thermal coretemp
kvm_intel kvm bnx2 crct10dif_pclmul crc32_pclmul crc32c_intel
ghash_clmulni_intel microcode iTCO_wdt ipmi_si i2c_i801 iTCO_vendor_support
ipmi_msghandler dcdbas shpchp lpc_ich mfd_core nfsd auth_rpcgss nfs_acl lockd
sunrpc mgag200 i2c_algo_bit drm_kms_helper ttm drm i2c_core
[85739.172620] CPU: 3 PID: 21900 Comm: rep Not tainted 3.15.8-200.fc20.x86_64 #1
[85739.179768] Hardware name: Dell Inc. PowerEdge R210 II/09T7VV, BIOS 2.0.4 02/29/2012
[85739.187512] task: ffff8800b9b3b160 ti: ffff8800ba270000 task.ti: ffff8800ba270000
[85739.194988] RIP: 0033:[<0000000000400773>]  [<0000000000400773>] 0x400773
[85739.201799] RSP: 002b:00007fffe4ca3c80  EFLAGS: 00010213
[85739.207119] RAX: 00007f63852f8000 RBX: 0000000000000000 RCX: 00007f6384e0b8ca
[85739.214249] RDX: 0000000000000001 RSI: 0000000000001000 RDI: 0000000000000000
[85739.221407] RBP: 00007fffe4ca3cc0 R08: 0000000000000003 R09: 0008000000000000
[85739.228545] R10: 0000000000000001 R11: 0000000000000206 R12: 00000000004005b0
[85739.235676] R13: 00007fffe4ca3da0 R14: 0000000000000000 R15: 0000000000000000
[85739.242835] FS:  00007f63852ea740(0000) GS:ffff88013fcc0000(0000) knlGS:0000000000000000
[85739.250925] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[85739.256669] CR2: 00007f63852f8000 CR3: 00000000b9ba0000 CR4: 00000000001407e0
---------------------------------8<--------------------------------------

According to [1] Chapter 4 Paging, some higher bits in 64bit
PTE(X86_64 || X86_32_PAE) are reserved and have to be set to zero. For example,
for IA-32e and 4KB page [1] 4.5 IA-32e Paging: Table 4-19, bits 51-M(MAXPHYADDR)
are reserved. So for a CPU with e.g. 48bit phys addr width, bits 51-48 have to
be zero. If one of the reserved bits is set, [1] 4.7 Page-Fault Exceptions,
the #PF is generated with RSVD error code.

<quote>
RSVD flag (bit 3).
This flag is 1 if there is no valid translation for the linear address because a
reserved bit was set in one of the paging-structure entries used to translate
that address. (Because reserved bits are not checked in a paging-structure entry
whose P flag is 0, bit 3 of the error code can be set only if bit 0 is also
set.)
</quote>

In mmap_mem() the first check is valid_mmap_phys_addr_range(), but it always
returns 1 for x86. So it's possible to use any pgoff we want and
to set the PTE's reserved bits in remap_pfn_range(). Meaning there is a
possibility to use mmap on /dev/mem and cause system panic. It's probably
not that serious, because access to /dev/mem is limited and the system has
to have the panic_on_oops set, but still I think we should check this and
return error.

The path for this problem is:
mmap_mem() => remap_pfn_range() => page present => touch page => tlb miss =>
walk through paging structures => reserved bit set => #pf with rsvd flag

This patch adds check for x86. With this fix mmap returns -EINVAL if the
requested phys addr is larger then the supported phys addr width.

[1] Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3A

x86_64 reproducer
---------------------------------8<--------------------------------------
 #include <stdio.h>
 #include <unistd.h>
 #include <sys/types.h>
 #include <sys/stat.h>
 #include <fcntl.h>
 #include <err.h>
 #include <stdlib.h>
 #include <sys/mman.h>

 #define die(fmt, ...) err(1, fmt, ##__VA_ARGS__)

 #define OFFSET 0x8000000000000LL

int main(int argc, char *argv[])
{
	int fd;
	long ps;
	long pgoff;
	char *map;
	char c;

	ps = sysconf(_SC_PAGE_SIZE);
	if (ps == -1)
		die("cannot get page size");

	fd = open("/dev/mem", O_RDONLY);
	if (fd == -1)
		die("cannot open /dev/mem");

	pgoff = (OFFSET + (ps - 1)) & ~(ps - 1);

	map = mmap(NULL, ps, PROT_READ, MAP_SHARED, fd, pgoff);
	if (map == MAP_FAILED)
		die("cannot mmap");

	c = map[0];

	if (munmap(map, ps) == -1)
		die("cannot munmap");

	if (close(fd) == -1)
		die("cannot close");

	return 0;
}
---------------------------------8<--------------------------------------

x86_32_PAE reproducer
---------------------------------8<--------------------------------------
 #define _GNU_SOURCE
 #define _LARGEFILE64_SOURCE
 #include <unistd.h>
 #include <sys/syscall.h>
 #include <stdio.h>
 #include <unistd.h>
 #include <sys/types.h>
 #include <sys/stat.h>
 #include <fcntl.h>
 #include <err.h>
 #include <stdlib.h>
 #include <sys/mman.h>

 #define die(fmt, ...) err(1, fmt, ##__VA_ARGS__)

 /* 37th bit in PTE */
 #define OFFSET 0x2000000

int main(int argc, char *argv[])
{
	int fd;
	long ps;
	char *map;
	char c;

	ps = sysconf(_SC_PAGE_SIZE);
	if (ps == -1)
		die("cannot get page size");

	fd = open("/dev/mem", O_RDONLY|O_LARGEFILE);
	if (fd == -1)
		die("cannot open /dev/mem");

	map = (char *)syscall(SYS_mmap2, NULL, ps, PROT_READ, MAP_SHARED, fd, OFFSET);
	if (map == MAP_FAILED)
		die("cannot mmap");

	c = map[0];

	if (munmap(map, ps) == -1)
		die("cannot munmap");

	if (close(fd) == -1)
		die("cannot close");

	return 0;
}
---------------------------------8<--------------------------------------

V2: fix pfn check in valid_mmap_phys_addr_range, thanks to Dave Hansen

Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com>
---
 arch/x86/include/asm/io.h |  4 ++++
 arch/x86/mm/mmap.c        | 12 ++++++++++++
 2 files changed, 16 insertions(+)

diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index b8237d8..55c59d5 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -243,6 +243,10 @@ static inline void flush_write_buffers(void)
 #endif
 }
 
+#define ARCH_HAS_VALID_PHYS_ADDR_RANGE
+extern int valid_phys_addr_range(phys_addr_t addr, size_t count);
+extern int valid_mmap_phys_addr_range(unsigned long pfn, size_t count);
+
 #endif /* __KERNEL__ */
 
 extern void native_io_delay(void);
diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
index 919b912..c8acb10 100644
--- a/arch/x86/mm/mmap.c
+++ b/arch/x86/mm/mmap.c
@@ -31,6 +31,8 @@
 #include <linux/sched.h>
 #include <asm/elf.h>
 
+#include "physaddr.h"
+
 struct va_alignment __read_mostly va_align = {
 	.flags = -1,
 };
@@ -122,3 +124,13 @@ void arch_pick_mmap_layout(struct mm_struct *mm)
 		mm->get_unmapped_area = arch_get_unmapped_area_topdown;
 	}
 }
+
+int valid_phys_addr_range(phys_addr_t addr, size_t count)
+{
+	return addr + count <= __pa(high_memory);
+}
+
+int valid_mmap_phys_addr_range(unsigned long pfn, size_t count)
+{
+	return arch_pfn_possible(pfn + (count >> PAGE_SHIFT));
+}
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [RESEND PATCH 2/4] x86: add phys addr validity check for /dev/mem mmap
@ 2014-09-29 11:33   ` Frantisek Hrbata
  0 siblings, 0 replies; 30+ messages in thread
From: Frantisek Hrbata @ 2014-09-29 11:33 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, tglx, mingo, hpa, x86, oleg, kamaleshb, hechjie, akpm,
	dave.hansen, dvlasenk, prarit, lwoodman, hannsj_uhl, torvalds

Prevent possible PTE corruption while calling mmap on /dev/mem with large
offset.

oops info, please note the PTE value 8008000000000225.
---------------------------------8<--------------------------------------
[85739.124496] rep: Corrupted page table at address 7f63852f8000
[85739.130242] PGD ba2eb067 PUD b99c1067 PMD a2fa5067 PTE 8008000000000225
[85739.136941] Bad pagetable: 000d [#1] SMP
[85739.141002] Modules linked in: cfg80211 rfkill x86_pkg_temp_thermal coretemp
kvm_intel kvm bnx2 crct10dif_pclmul crc32_pclmul crc32c_intel
ghash_clmulni_intel microcode iTCO_wdt ipmi_si i2c_i801 iTCO_vendor_support
ipmi_msghandler dcdbas shpchp lpc_ich mfd_core nfsd auth_rpcgss nfs_acl lockd
sunrpc mgag200 i2c_algo_bit drm_kms_helper ttm drm i2c_core
[85739.172620] CPU: 3 PID: 21900 Comm: rep Not tainted 3.15.8-200.fc20.x86_64 #1
[85739.179768] Hardware name: Dell Inc. PowerEdge R210 II/09T7VV, BIOS 2.0.4 02/29/2012
[85739.187512] task: ffff8800b9b3b160 ti: ffff8800ba270000 task.ti: ffff8800ba270000
[85739.194988] RIP: 0033:[<0000000000400773>]  [<0000000000400773>] 0x400773
[85739.201799] RSP: 002b:00007fffe4ca3c80  EFLAGS: 00010213
[85739.207119] RAX: 00007f63852f8000 RBX: 0000000000000000 RCX: 00007f6384e0b8ca
[85739.214249] RDX: 0000000000000001 RSI: 0000000000001000 RDI: 0000000000000000
[85739.221407] RBP: 00007fffe4ca3cc0 R08: 0000000000000003 R09: 0008000000000000
[85739.228545] R10: 0000000000000001 R11: 0000000000000206 R12: 00000000004005b0
[85739.235676] R13: 00007fffe4ca3da0 R14: 0000000000000000 R15: 0000000000000000
[85739.242835] FS:  00007f63852ea740(0000) GS:ffff88013fcc0000(0000) knlGS:0000000000000000
[85739.250925] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[85739.256669] CR2: 00007f63852f8000 CR3: 00000000b9ba0000 CR4: 00000000001407e0
---------------------------------8<--------------------------------------

According to [1] Chapter 4 Paging, some higher bits in 64bit
PTE(X86_64 || X86_32_PAE) are reserved and have to be set to zero. For example,
for IA-32e and 4KB page [1] 4.5 IA-32e Paging: Table 4-19, bits 51-M(MAXPHYADDR)
are reserved. So for a CPU with e.g. 48bit phys addr width, bits 51-48 have to
be zero. If one of the reserved bits is set, [1] 4.7 Page-Fault Exceptions,
the #PF is generated with RSVD error code.

<quote>
RSVD flag (bit 3).
This flag is 1 if there is no valid translation for the linear address because a
reserved bit was set in one of the paging-structure entries used to translate
that address. (Because reserved bits are not checked in a paging-structure entry
whose P flag is 0, bit 3 of the error code can be set only if bit 0 is also
set.)
</quote>

In mmap_mem() the first check is valid_mmap_phys_addr_range(), but it always
returns 1 for x86. So it's possible to use any pgoff we want and
to set the PTE's reserved bits in remap_pfn_range(). Meaning there is a
possibility to use mmap on /dev/mem and cause system panic. It's probably
not that serious, because access to /dev/mem is limited and the system has
to have the panic_on_oops set, but still I think we should check this and
return error.

The path for this problem is:
mmap_mem() => remap_pfn_range() => page present => touch page => tlb miss =>
walk through paging structures => reserved bit set => #pf with rsvd flag

This patch adds check for x86. With this fix mmap returns -EINVAL if the
requested phys addr is larger then the supported phys addr width.

[1] Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3A

x86_64 reproducer
---------------------------------8<--------------------------------------
 #include <stdio.h>
 #include <unistd.h>
 #include <sys/types.h>
 #include <sys/stat.h>
 #include <fcntl.h>
 #include <err.h>
 #include <stdlib.h>
 #include <sys/mman.h>

 #define die(fmt, ...) err(1, fmt, ##__VA_ARGS__)

 #define OFFSET 0x8000000000000LL

int main(int argc, char *argv[])
{
	int fd;
	long ps;
	long pgoff;
	char *map;
	char c;

	ps = sysconf(_SC_PAGE_SIZE);
	if (ps == -1)
		die("cannot get page size");

	fd = open("/dev/mem", O_RDONLY);
	if (fd == -1)
		die("cannot open /dev/mem");

	pgoff = (OFFSET + (ps - 1)) & ~(ps - 1);

	map = mmap(NULL, ps, PROT_READ, MAP_SHARED, fd, pgoff);
	if (map == MAP_FAILED)
		die("cannot mmap");

	c = map[0];

	if (munmap(map, ps) == -1)
		die("cannot munmap");

	if (close(fd) == -1)
		die("cannot close");

	return 0;
}
---------------------------------8<--------------------------------------

x86_32_PAE reproducer
---------------------------------8<--------------------------------------
 #define _GNU_SOURCE
 #define _LARGEFILE64_SOURCE
 #include <unistd.h>
 #include <sys/syscall.h>
 #include <stdio.h>
 #include <unistd.h>
 #include <sys/types.h>
 #include <sys/stat.h>
 #include <fcntl.h>
 #include <err.h>
 #include <stdlib.h>
 #include <sys/mman.h>

 #define die(fmt, ...) err(1, fmt, ##__VA_ARGS__)

 /* 37th bit in PTE */
 #define OFFSET 0x2000000

int main(int argc, char *argv[])
{
	int fd;
	long ps;
	char *map;
	char c;

	ps = sysconf(_SC_PAGE_SIZE);
	if (ps == -1)
		die("cannot get page size");

	fd = open("/dev/mem", O_RDONLY|O_LARGEFILE);
	if (fd == -1)
		die("cannot open /dev/mem");

	map = (char *)syscall(SYS_mmap2, NULL, ps, PROT_READ, MAP_SHARED, fd, OFFSET);
	if (map == MAP_FAILED)
		die("cannot mmap");

	c = map[0];

	if (munmap(map, ps) == -1)
		die("cannot munmap");

	if (close(fd) == -1)
		die("cannot close");

	return 0;
}
---------------------------------8<--------------------------------------

V2: fix pfn check in valid_mmap_phys_addr_range, thanks to Dave Hansen

Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com>
---
 arch/x86/include/asm/io.h |  4 ++++
 arch/x86/mm/mmap.c        | 12 ++++++++++++
 2 files changed, 16 insertions(+)

diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index b8237d8..55c59d5 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -243,6 +243,10 @@ static inline void flush_write_buffers(void)
 #endif
 }
 
+#define ARCH_HAS_VALID_PHYS_ADDR_RANGE
+extern int valid_phys_addr_range(phys_addr_t addr, size_t count);
+extern int valid_mmap_phys_addr_range(unsigned long pfn, size_t count);
+
 #endif /* __KERNEL__ */
 
 extern void native_io_delay(void);
diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
index 919b912..c8acb10 100644
--- a/arch/x86/mm/mmap.c
+++ b/arch/x86/mm/mmap.c
@@ -31,6 +31,8 @@
 #include <linux/sched.h>
 #include <asm/elf.h>
 
+#include "physaddr.h"
+
 struct va_alignment __read_mostly va_align = {
 	.flags = -1,
 };
@@ -122,3 +124,13 @@ void arch_pick_mmap_layout(struct mm_struct *mm)
 		mm->get_unmapped_area = arch_get_unmapped_area_topdown;
 	}
 }
+
+int valid_phys_addr_range(phys_addr_t addr, size_t count)
+{
+	return addr + count <= __pa(high_memory);
+}
+
+int valid_mmap_phys_addr_range(unsigned long pfn, size_t count)
+{
+	return arch_pfn_possible(pfn + (count >> PAGE_SHIFT));
+}
-- 
1.9.3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [RESEND PATCH 3/4] x86: add high_memory check to (xlate|unxlate)_dev_mem_ptr
  2014-09-29 11:32 ` Frantisek Hrbata
@ 2014-09-29 11:33   ` Frantisek Hrbata
  -1 siblings, 0 replies; 30+ messages in thread
From: Frantisek Hrbata @ 2014-09-29 11:33 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, tglx, mingo, hpa, x86, oleg, kamaleshb, hechjie, akpm,
	dave.hansen, dvlasenk, prarit, lwoodman, hannsj_uhl, torvalds

So far (xlate|unxlate)_dev_mem_ptr for read/write /dev/mem relies on a generic
high_memory check in valid_phys_addr_range(), which does not allow to access any
memory above high_memory whatsoever. By adding the high_memory check to
(xlate|unxlate)_dev_mem_ptr, it still will be possible to use __va safely for
kernel mapped memory and it will also allow read/write to access non-system RAM
above high_memory once the high_memory check is removed from
valid_phys_addr_range.

Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com>
---
 arch/x86/mm/ioremap.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index baff1da..1ae7323 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -320,8 +320,11 @@ void *xlate_dev_mem_ptr(unsigned long phys)
 	void *addr;
 	unsigned long start = phys & PAGE_MASK;
 
-	/* If page is RAM, we can use __va. Otherwise ioremap and unmap. */
-	if (page_is_ram(start >> PAGE_SHIFT))
+	/*
+	 * If page is RAM and is mapped by kernel, we can use __va.
+	 * Otherwise ioremap and unmap.
+	 */
+	if (page_is_ram(start >> PAGE_SHIFT) && phys <= __pa(high_memory))
 		return __va(phys);
 
 	addr = (void __force *)ioremap_cache(start, PAGE_SIZE);
@@ -333,7 +336,7 @@ void *xlate_dev_mem_ptr(unsigned long phys)
 
 void unxlate_dev_mem_ptr(unsigned long phys, void *addr)
 {
-	if (page_is_ram(phys >> PAGE_SHIFT))
+	if (page_is_ram(phys >> PAGE_SHIFT) && phys <= __pa(high_memory))
 		return;
 
 	iounmap((void __iomem *)((unsigned long)addr & PAGE_MASK));
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [RESEND PATCH 3/4] x86: add high_memory check to (xlate|unxlate)_dev_mem_ptr
@ 2014-09-29 11:33   ` Frantisek Hrbata
  0 siblings, 0 replies; 30+ messages in thread
From: Frantisek Hrbata @ 2014-09-29 11:33 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, tglx, mingo, hpa, x86, oleg, kamaleshb, hechjie, akpm,
	dave.hansen, dvlasenk, prarit, lwoodman, hannsj_uhl, torvalds

So far (xlate|unxlate)_dev_mem_ptr for read/write /dev/mem relies on a generic
high_memory check in valid_phys_addr_range(), which does not allow to access any
memory above high_memory whatsoever. By adding the high_memory check to
(xlate|unxlate)_dev_mem_ptr, it still will be possible to use __va safely for
kernel mapped memory and it will also allow read/write to access non-system RAM
above high_memory once the high_memory check is removed from
valid_phys_addr_range.

Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com>
---
 arch/x86/mm/ioremap.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index baff1da..1ae7323 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -320,8 +320,11 @@ void *xlate_dev_mem_ptr(unsigned long phys)
 	void *addr;
 	unsigned long start = phys & PAGE_MASK;
 
-	/* If page is RAM, we can use __va. Otherwise ioremap and unmap. */
-	if (page_is_ram(start >> PAGE_SHIFT))
+	/*
+	 * If page is RAM and is mapped by kernel, we can use __va.
+	 * Otherwise ioremap and unmap.
+	 */
+	if (page_is_ram(start >> PAGE_SHIFT) && phys <= __pa(high_memory))
 		return __va(phys);
 
 	addr = (void __force *)ioremap_cache(start, PAGE_SIZE);
@@ -333,7 +336,7 @@ void *xlate_dev_mem_ptr(unsigned long phys)
 
 void unxlate_dev_mem_ptr(unsigned long phys, void *addr)
 {
-	if (page_is_ram(phys >> PAGE_SHIFT))
+	if (page_is_ram(phys >> PAGE_SHIFT) && phys <= __pa(high_memory))
 		return;
 
 	iounmap((void __iomem *)((unsigned long)addr & PAGE_MASK));
-- 
1.9.3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [RESEND PATCH 4/4] x86: remove high_memory check from valid_phys_addr_range
  2014-09-29 11:32 ` Frantisek Hrbata
@ 2014-09-29 11:33   ` Frantisek Hrbata
  -1 siblings, 0 replies; 30+ messages in thread
From: Frantisek Hrbata @ 2014-09-29 11:33 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, tglx, mingo, hpa, x86, oleg, kamaleshb, hechjie, akpm,
	dave.hansen, dvlasenk, prarit, lwoodman, hannsj_uhl, torvalds

There is no need to block read/write access to /dev/mem for phys. addr. above
high_memory for non-system RAM. The only limitation should be
boot_cpu_data.x86_phys_bits(max phys. addr. size).

Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com>
---
 arch/x86/mm/mmap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
index c8acb10..7fa0242 100644
--- a/arch/x86/mm/mmap.c
+++ b/arch/x86/mm/mmap.c
@@ -127,7 +127,7 @@ void arch_pick_mmap_layout(struct mm_struct *mm)
 
 int valid_phys_addr_range(phys_addr_t addr, size_t count)
 {
-	return addr + count <= __pa(high_memory);
+	return arch_pfn_possible((addr + count) >> PAGE_SHIFT);
 }
 
 int valid_mmap_phys_addr_range(unsigned long pfn, size_t count)
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [RESEND PATCH 4/4] x86: remove high_memory check from valid_phys_addr_range
@ 2014-09-29 11:33   ` Frantisek Hrbata
  0 siblings, 0 replies; 30+ messages in thread
From: Frantisek Hrbata @ 2014-09-29 11:33 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, tglx, mingo, hpa, x86, oleg, kamaleshb, hechjie, akpm,
	dave.hansen, dvlasenk, prarit, lwoodman, hannsj_uhl, torvalds

There is no need to block read/write access to /dev/mem for phys. addr. above
high_memory for non-system RAM. The only limitation should be
boot_cpu_data.x86_phys_bits(max phys. addr. size).

Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com>
---
 arch/x86/mm/mmap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
index c8acb10..7fa0242 100644
--- a/arch/x86/mm/mmap.c
+++ b/arch/x86/mm/mmap.c
@@ -127,7 +127,7 @@ void arch_pick_mmap_layout(struct mm_struct *mm)
 
 int valid_phys_addr_range(phys_addr_t addr, size_t count)
 {
-	return addr + count <= __pa(high_memory);
+	return arch_pfn_possible((addr + count) >> PAGE_SHIFT);
 }
 
 int valid_mmap_phys_addr_range(unsigned long pfn, size_t count)
-- 
1.9.3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [RESEND PATCH 2/4] x86: add phys addr validity check for /dev/mem mmap
  2014-09-29 11:33   ` Frantisek Hrbata
@ 2014-09-29 20:15     ` Thomas Gleixner
  -1 siblings, 0 replies; 30+ messages in thread
From: Thomas Gleixner @ 2014-09-29 20:15 UTC (permalink / raw)
  To: Frantisek Hrbata
  Cc: linux-kernel, linux-mm, mingo, hpa, x86, oleg, kamaleshb,
	hechjie, akpm, dave.hansen, dvlasenk, prarit, lwoodman,
	hannsj_uhl, torvalds

On Mon, 29 Sep 2014, Frantisek Hrbata wrote:
> V2: fix pfn check in valid_mmap_phys_addr_range, thanks to Dave Hansen

AFAICT, Dave also asked you to change size_t count into something more
intuitive, i.e. nr_bytes or such.

> +int valid_mmap_phys_addr_range(unsigned long pfn, size_t count)

And right he is. I really had to look twice to see that count is
actually number of bytes and not number of pages, which is what you
expect after pfn.

> +{
> +	return arch_pfn_possible(pfn + (count >> PAGE_SHIFT));
> +}

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RESEND PATCH 2/4] x86: add phys addr validity check for /dev/mem mmap
@ 2014-09-29 20:15     ` Thomas Gleixner
  0 siblings, 0 replies; 30+ messages in thread
From: Thomas Gleixner @ 2014-09-29 20:15 UTC (permalink / raw)
  To: Frantisek Hrbata
  Cc: linux-kernel, linux-mm, mingo, hpa, x86, oleg, kamaleshb,
	hechjie, akpm, dave.hansen, dvlasenk, prarit, lwoodman,
	hannsj_uhl, torvalds

On Mon, 29 Sep 2014, Frantisek Hrbata wrote:
> V2: fix pfn check in valid_mmap_phys_addr_range, thanks to Dave Hansen

AFAICT, Dave also asked you to change size_t count into something more
intuitive, i.e. nr_bytes or such.

> +int valid_mmap_phys_addr_range(unsigned long pfn, size_t count)

And right he is. I really had to look twice to see that count is
actually number of bytes and not number of pages, which is what you
expect after pfn.

> +{
> +	return arch_pfn_possible(pfn + (count >> PAGE_SHIFT));
> +}

Thanks,

	tglx

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RESEND PATCH 2/4] x86: add phys addr validity check for /dev/mem mmap
  2014-09-29 20:15     ` Thomas Gleixner
@ 2014-09-30 12:41       ` Frantisek Hrbata
  -1 siblings, 0 replies; 30+ messages in thread
From: Frantisek Hrbata @ 2014-09-30 12:41 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, linux-mm, mingo, hpa, x86, oleg, kamaleshb,
	hechjie, akpm, dave.hansen, dvlasenk, prarit, lwoodman,
	hannsj_uhl, torvalds

On Mon, Sep 29, 2014 at 10:15:28PM +0200, Thomas Gleixner wrote:
> On Mon, 29 Sep 2014, Frantisek Hrbata wrote:
> > V2: fix pfn check in valid_mmap_phys_addr_range, thanks to Dave Hansen
> 
> AFAICT, Dave also asked you to change size_t count into something more
> intuitive, i.e. nr_bytes or such.

Hi,

mea culpa, I for unknown reason changed it from "size" to "count". I guess some
cut&paste mess. The correct prototype used elsewhere in kernel is

int valid_mmap_phys_addr_range(unsigned long pfn, size_t size)

Does it make sense to replace "count" with "size" so it's consistent with the
rest or do you prefer "nr_bytes" or as Dave proposed "len_bytes"?

I will fix this and I'm sorry Dave I did not change it as discussed. It totally
slipped my mind.

Many thanks Thomas.

> 
> > +int valid_mmap_phys_addr_range(unsigned long pfn, size_t count)
> 
> And right he is. I really had to look twice to see that count is
> actually number of bytes and not number of pages, which is what you
> expect after pfn.
> 
> > +{
> > +	return arch_pfn_possible(pfn + (count >> PAGE_SHIFT));
> > +}
> 
> Thanks,
> 
> 	tglx

-- 
Frantisek Hrbata

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RESEND PATCH 2/4] x86: add phys addr validity check for /dev/mem mmap
@ 2014-09-30 12:41       ` Frantisek Hrbata
  0 siblings, 0 replies; 30+ messages in thread
From: Frantisek Hrbata @ 2014-09-30 12:41 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, linux-mm, mingo, hpa, x86, oleg, kamaleshb,
	hechjie, akpm, dave.hansen, dvlasenk, prarit, lwoodman,
	hannsj_uhl, torvalds

On Mon, Sep 29, 2014 at 10:15:28PM +0200, Thomas Gleixner wrote:
> On Mon, 29 Sep 2014, Frantisek Hrbata wrote:
> > V2: fix pfn check in valid_mmap_phys_addr_range, thanks to Dave Hansen
> 
> AFAICT, Dave also asked you to change size_t count into something more
> intuitive, i.e. nr_bytes or such.

Hi,

mea culpa, I for unknown reason changed it from "size" to "count". I guess some
cut&paste mess. The correct prototype used elsewhere in kernel is

int valid_mmap_phys_addr_range(unsigned long pfn, size_t size)

Does it make sense to replace "count" with "size" so it's consistent with the
rest or do you prefer "nr_bytes" or as Dave proposed "len_bytes"?

I will fix this and I'm sorry Dave I did not change it as discussed. It totally
slipped my mind.

Many thanks Thomas.

> 
> > +int valid_mmap_phys_addr_range(unsigned long pfn, size_t count)
> 
> And right he is. I really had to look twice to see that count is
> actually number of bytes and not number of pages, which is what you
> expect after pfn.
> 
> > +{
> > +	return arch_pfn_possible(pfn + (count >> PAGE_SHIFT));
> > +}
> 
> Thanks,
> 
> 	tglx

-- 
Frantisek Hrbata

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RESEND PATCH 2/4] x86: add phys addr validity check for /dev/mem mmap
  2014-09-30 12:41       ` Frantisek Hrbata
@ 2014-09-30 14:27         ` Dave Hansen
  -1 siblings, 0 replies; 30+ messages in thread
From: Dave Hansen @ 2014-09-30 14:27 UTC (permalink / raw)
  To: Frantisek Hrbata, Thomas Gleixner
  Cc: linux-kernel, linux-mm, mingo, hpa, x86, oleg, kamaleshb,
	hechjie, akpm, dvlasenk, prarit, lwoodman, hannsj_uhl, torvalds

On 09/30/2014 05:41 AM, Frantisek Hrbata wrote:
> Does it make sense to replace "count" with "size" so it's consistent with the
> rest or do you prefer "nr_bytes" or as Dave proposed "len_bytes"?

I don't care what it is as long as it has a unit in it.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RESEND PATCH 2/4] x86: add phys addr validity check for /dev/mem mmap
@ 2014-09-30 14:27         ` Dave Hansen
  0 siblings, 0 replies; 30+ messages in thread
From: Dave Hansen @ 2014-09-30 14:27 UTC (permalink / raw)
  To: Frantisek Hrbata, Thomas Gleixner
  Cc: linux-kernel, linux-mm, mingo, hpa, x86, oleg, kamaleshb,
	hechjie, akpm, dvlasenk, prarit, lwoodman, hannsj_uhl, torvalds

On 09/30/2014 05:41 AM, Frantisek Hrbata wrote:
> Does it make sense to replace "count" with "size" so it's consistent with the
> rest or do you prefer "nr_bytes" or as Dave proposed "len_bytes"?

I don't care what it is as long as it has a unit in it.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v2 0/4] x86: /dev/mem fixes
  2014-09-29 11:32 ` Frantisek Hrbata
@ 2014-09-30 16:24   ` Frantisek Hrbata
  -1 siblings, 0 replies; 30+ messages in thread
From: Frantisek Hrbata @ 2014-09-30 16:24 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, tglx, mingo, hpa, x86, oleg, kamaleshb, hechjie, akpm,
	dave.hansen, dvlasenk, prarit, lwoodman, hannsj_uhl, torvalds

This is a second version of the patch set. The only change is in

2/4 x86: add phys addr validity check for /dev/mem mmap

where the "count" was replaced with "len_bytes" for better readability.

The rest is just a refresh because of this change.

The original message follows ...

Hi all,

this is a combination of two patch sets I sent a while ago
1) Prevent possible PTE corruption with /dev/mem mmap
2) x86: allow read/write /dev/mem to access non-system RAM above high_memory

The original thread with both patch sets can be found here
   https://lkml.org/lkml/2014/8/14/229
   lkml: <1408025927-16826-1-git-send-email-fhrbata@redhat.com>

1) Prevent possible PTE corruption with /dev/mem mmap
x86: add arch_pfn_possible helper
x86: add phys addr validity check for /dev/mem mmap

Many thanks goes to Dave Hansen, who helped with the "final" check. Other than
that it did not get much attention, except H. Peter Anvin's complain that having
two checks for mmap and read/write for /dev/mem access is ridiculous. I for sure
do not object to this, but AFAICT it's not that simple to unify them and it's not
"directly" related to the PTE corruption. Please note that there are other
archs(ia64, arm) using these check. But I for sure can be missing something.

What this patch set does is using the existing interface to implement x86 specific
check in the least invasive way.

Anyway I tried to remove the high_memory check with a follow-up patch set 2)

2) x86: allow read/write /dev/mem to access non-system RAM above high_memory
x86: add high_memory check to (xlate|unxlate)_dev_mem_ptr
x86: remove high_memory check from valid_phys_addr_range

This is an attempt to remove the high_memory limit for the read/write access to
/dev/mem. IMHO there is no reason for this limit on x86. It is presented in
the generic valid_phys_addr_range, which is used only by (read|write)_mem. IIUIC
it's main purpose is for the generic xlate_dev_mem_ptr, which is using only the
direct kernel mapping __va translation. Since the valid_phys_addr_range is
called as the first check in (read|write)_mem, it basically does not allow to
access anything above high_memory on x86.

The first patch adds high_memory check to x86's (xlate|unxlate)_dev_mem_ptr, so
the direct kernel mapping can be safely used for system RAM bellow high_memory.
This is IMHO the only valid reason to use high_memory check in (read|write)_mem.

The second patch removes the high_memory check from valid_phys_addr_range,
allowing read/write to access non-system RAM above high_memory. So far this
was possible only by using mmap.

I hope I haven't overlooked something.

Many thanks

Frantisek Hrbata (4):
  x86: add arch_pfn_possible helper
  x86: add phys addr validity check for /dev/mem mmap
  x86: add high_memory check to (xlate|unxlate)_dev_mem_ptr
  x86: remove high_memory check from valid_phys_addr_range

 arch/x86/include/asm/io.h |  4 ++++
 arch/x86/mm/ioremap.c     |  9 ++++++---
 arch/x86/mm/mmap.c        | 12 ++++++++++++
 arch/x86/mm/physaddr.h    |  9 +++++++--
 4 files changed, 29 insertions(+), 5 deletions(-)

-- 
1.9.3


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v2 0/4] x86: /dev/mem fixes
@ 2014-09-30 16:24   ` Frantisek Hrbata
  0 siblings, 0 replies; 30+ messages in thread
From: Frantisek Hrbata @ 2014-09-30 16:24 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, tglx, mingo, hpa, x86, oleg, kamaleshb, hechjie, akpm,
	dave.hansen, dvlasenk, prarit, lwoodman, hannsj_uhl, torvalds

This is a second version of the patch set. The only change is in

2/4 x86: add phys addr validity check for /dev/mem mmap

where the "count" was replaced with "len_bytes" for better readability.

The rest is just a refresh because of this change.

The original message follows ...

Hi all,

this is a combination of two patch sets I sent a while ago
1) Prevent possible PTE corruption with /dev/mem mmap
2) x86: allow read/write /dev/mem to access non-system RAM above high_memory

The original thread with both patch sets can be found here
   https://lkml.org/lkml/2014/8/14/229
   lkml: <1408025927-16826-1-git-send-email-fhrbata@redhat.com>

1) Prevent possible PTE corruption with /dev/mem mmap
x86: add arch_pfn_possible helper
x86: add phys addr validity check for /dev/mem mmap

Many thanks goes to Dave Hansen, who helped with the "final" check. Other than
that it did not get much attention, except H. Peter Anvin's complain that having
two checks for mmap and read/write for /dev/mem access is ridiculous. I for sure
do not object to this, but AFAICT it's not that simple to unify them and it's not
"directly" related to the PTE corruption. Please note that there are other
archs(ia64, arm) using these check. But I for sure can be missing something.

What this patch set does is using the existing interface to implement x86 specific
check in the least invasive way.

Anyway I tried to remove the high_memory check with a follow-up patch set 2)

2) x86: allow read/write /dev/mem to access non-system RAM above high_memory
x86: add high_memory check to (xlate|unxlate)_dev_mem_ptr
x86: remove high_memory check from valid_phys_addr_range

This is an attempt to remove the high_memory limit for the read/write access to
/dev/mem. IMHO there is no reason for this limit on x86. It is presented in
the generic valid_phys_addr_range, which is used only by (read|write)_mem. IIUIC
it's main purpose is for the generic xlate_dev_mem_ptr, which is using only the
direct kernel mapping __va translation. Since the valid_phys_addr_range is
called as the first check in (read|write)_mem, it basically does not allow to
access anything above high_memory on x86.

The first patch adds high_memory check to x86's (xlate|unxlate)_dev_mem_ptr, so
the direct kernel mapping can be safely used for system RAM bellow high_memory.
This is IMHO the only valid reason to use high_memory check in (read|write)_mem.

The second patch removes the high_memory check from valid_phys_addr_range,
allowing read/write to access non-system RAM above high_memory. So far this
was possible only by using mmap.

I hope I haven't overlooked something.

Many thanks

Frantisek Hrbata (4):
  x86: add arch_pfn_possible helper
  x86: add phys addr validity check for /dev/mem mmap
  x86: add high_memory check to (xlate|unxlate)_dev_mem_ptr
  x86: remove high_memory check from valid_phys_addr_range

 arch/x86/include/asm/io.h |  4 ++++
 arch/x86/mm/ioremap.c     |  9 ++++++---
 arch/x86/mm/mmap.c        | 12 ++++++++++++
 arch/x86/mm/physaddr.h    |  9 +++++++--
 4 files changed, 29 insertions(+), 5 deletions(-)

-- 
1.9.3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v2 1/4] x86: add arch_pfn_possible helper
  2014-09-30 16:24   ` Frantisek Hrbata
@ 2014-09-30 16:25     ` Frantisek Hrbata
  -1 siblings, 0 replies; 30+ messages in thread
From: Frantisek Hrbata @ 2014-09-30 16:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, tglx, mingo, hpa, x86, oleg, kamaleshb, hechjie, akpm,
	dave.hansen, dvlasenk, prarit, lwoodman, hannsj_uhl, torvalds

Add helper to check maximum possible pfn on x86. Also make the current
phys_addr_valid helper using it internally.

Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com>
---
 arch/x86/mm/physaddr.h | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/physaddr.h b/arch/x86/mm/physaddr.h
index a3cd5a0..9df8e3a 100644
--- a/arch/x86/mm/physaddr.h
+++ b/arch/x86/mm/physaddr.h
@@ -1,10 +1,15 @@
 #include <asm/processor.h>
 
-static inline int phys_addr_valid(resource_size_t addr)
+static inline int arch_pfn_possible(unsigned long pfn)
 {
 #ifdef CONFIG_PHYS_ADDR_T_64BIT
-	return !(addr >> boot_cpu_data.x86_phys_bits);
+	return !(pfn >> (boot_cpu_data.x86_phys_bits - PAGE_SHIFT));
 #else
 	return 1;
 #endif
 }
+
+static inline int phys_addr_valid(resource_size_t addr)
+{
+	return arch_pfn_possible(addr >> PAGE_SHIFT);
+}
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 1/4] x86: add arch_pfn_possible helper
@ 2014-09-30 16:25     ` Frantisek Hrbata
  0 siblings, 0 replies; 30+ messages in thread
From: Frantisek Hrbata @ 2014-09-30 16:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, tglx, mingo, hpa, x86, oleg, kamaleshb, hechjie, akpm,
	dave.hansen, dvlasenk, prarit, lwoodman, hannsj_uhl, torvalds

Add helper to check maximum possible pfn on x86. Also make the current
phys_addr_valid helper using it internally.

Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com>
---
 arch/x86/mm/physaddr.h | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/physaddr.h b/arch/x86/mm/physaddr.h
index a3cd5a0..9df8e3a 100644
--- a/arch/x86/mm/physaddr.h
+++ b/arch/x86/mm/physaddr.h
@@ -1,10 +1,15 @@
 #include <asm/processor.h>
 
-static inline int phys_addr_valid(resource_size_t addr)
+static inline int arch_pfn_possible(unsigned long pfn)
 {
 #ifdef CONFIG_PHYS_ADDR_T_64BIT
-	return !(addr >> boot_cpu_data.x86_phys_bits);
+	return !(pfn >> (boot_cpu_data.x86_phys_bits - PAGE_SHIFT));
 #else
 	return 1;
 #endif
 }
+
+static inline int phys_addr_valid(resource_size_t addr)
+{
+	return arch_pfn_possible(addr >> PAGE_SHIFT);
+}
-- 
1.9.3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 2/4] x86: add phys addr validity check for /dev/mem mmap
  2014-09-30 16:24   ` Frantisek Hrbata
@ 2014-09-30 16:25     ` Frantisek Hrbata
  -1 siblings, 0 replies; 30+ messages in thread
From: Frantisek Hrbata @ 2014-09-30 16:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, tglx, mingo, hpa, x86, oleg, kamaleshb, hechjie, akpm,
	dave.hansen, dvlasenk, prarit, lwoodman, hannsj_uhl, torvalds

Prevent possible PTE corruption while calling mmap on /dev/mem with large
offset.

oops info, please note the PTE value 8008000000000225.
---------------------------------8<--------------------------------------
[85739.124496] rep: Corrupted page table at address 7f63852f8000
[85739.130242] PGD ba2eb067 PUD b99c1067 PMD a2fa5067 PTE 8008000000000225
[85739.136941] Bad pagetable: 000d [#1] SMP
[85739.141002] Modules linked in: cfg80211 rfkill x86_pkg_temp_thermal coretemp
kvm_intel kvm bnx2 crct10dif_pclmul crc32_pclmul crc32c_intel
ghash_clmulni_intel microcode iTCO_wdt ipmi_si i2c_i801 iTCO_vendor_support
ipmi_msghandler dcdbas shpchp lpc_ich mfd_core nfsd auth_rpcgss nfs_acl lockd
sunrpc mgag200 i2c_algo_bit drm_kms_helper ttm drm i2c_core
[85739.172620] CPU: 3 PID: 21900 Comm: rep Not tainted 3.15.8-200.fc20.x86_64 #1
[85739.179768] Hardware name: Dell Inc. PowerEdge R210 II/09T7VV, BIOS 2.0.4 02/29/2012
[85739.187512] task: ffff8800b9b3b160 ti: ffff8800ba270000 task.ti: ffff8800ba270000
[85739.194988] RIP: 0033:[<0000000000400773>]  [<0000000000400773>] 0x400773
[85739.201799] RSP: 002b:00007fffe4ca3c80  EFLAGS: 00010213
[85739.207119] RAX: 00007f63852f8000 RBX: 0000000000000000 RCX: 00007f6384e0b8ca
[85739.214249] RDX: 0000000000000001 RSI: 0000000000001000 RDI: 0000000000000000
[85739.221407] RBP: 00007fffe4ca3cc0 R08: 0000000000000003 R09: 0008000000000000
[85739.228545] R10: 0000000000000001 R11: 0000000000000206 R12: 00000000004005b0
[85739.235676] R13: 00007fffe4ca3da0 R14: 0000000000000000 R15: 0000000000000000
[85739.242835] FS:  00007f63852ea740(0000) GS:ffff88013fcc0000(0000) knlGS:0000000000000000
[85739.250925] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[85739.256669] CR2: 00007f63852f8000 CR3: 00000000b9ba0000 CR4: 00000000001407e0
---------------------------------8<--------------------------------------

According to [1] Chapter 4 Paging, some higher bits in 64bit
PTE(X86_64 || X86_32_PAE) are reserved and have to be set to zero. For example,
for IA-32e and 4KB page [1] 4.5 IA-32e Paging: Table 4-19, bits 51-M(MAXPHYADDR)
are reserved. So for a CPU with e.g. 48bit phys addr width, bits 51-48 have to
be zero. If one of the reserved bits is set, [1] 4.7 Page-Fault Exceptions,
the #PF is generated with RSVD error code.

<quote>
RSVD flag (bit 3).
This flag is 1 if there is no valid translation for the linear address because a
reserved bit was set in one of the paging-structure entries used to translate
that address. (Because reserved bits are not checked in a paging-structure entry
whose P flag is 0, bit 3 of the error code can be set only if bit 0 is also
set.)
</quote>

In mmap_mem() the first check is valid_mmap_phys_addr_range(), but it always
returns 1 for x86. So it's possible to use any pgoff we want and
to set the PTE's reserved bits in remap_pfn_range(). Meaning there is a
possibility to use mmap on /dev/mem and cause system panic. It's probably
not that serious, because access to /dev/mem is limited and the system has
to have the panic_on_oops set, but still I think we should check this and
return error.

The path for this problem is:
mmap_mem() => remap_pfn_range() => page present => touch page => tlb miss =>
walk through paging structures => reserved bit set => #pf with rsvd flag

This patch adds check for x86. With this fix mmap returns -EINVAL if the
requested phys addr is larger then the supported phys addr width.

[1] Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3A

x86_64 reproducer
---------------------------------8<--------------------------------------
 #include <stdio.h>
 #include <unistd.h>
 #include <sys/types.h>
 #include <sys/stat.h>
 #include <fcntl.h>
 #include <err.h>
 #include <stdlib.h>
 #include <sys/mman.h>

 #define die(fmt, ...) err(1, fmt, ##__VA_ARGS__)

 #define OFFSET 0x8000000000000LL

int main(int argc, char *argv[])
{
	int fd;
	long ps;
	long pgoff;
	char *map;
	char c;

	ps = sysconf(_SC_PAGE_SIZE);
	if (ps == -1)
		die("cannot get page size");

	fd = open("/dev/mem", O_RDONLY);
	if (fd == -1)
		die("cannot open /dev/mem");

	pgoff = (OFFSET + (ps - 1)) & ~(ps - 1);

	map = mmap(NULL, ps, PROT_READ, MAP_SHARED, fd, pgoff);
	if (map == MAP_FAILED)
		die("cannot mmap");

	c = map[0];

	if (munmap(map, ps) == -1)
		die("cannot munmap");

	if (close(fd) == -1)
		die("cannot close");

	return 0;
}
---------------------------------8<--------------------------------------

x86_32_PAE reproducer
---------------------------------8<--------------------------------------
 #define _GNU_SOURCE
 #define _LARGEFILE64_SOURCE
 #include <unistd.h>
 #include <sys/syscall.h>
 #include <stdio.h>
 #include <unistd.h>
 #include <sys/types.h>
 #include <sys/stat.h>
 #include <fcntl.h>
 #include <err.h>
 #include <stdlib.h>
 #include <sys/mman.h>

 #define die(fmt, ...) err(1, fmt, ##__VA_ARGS__)

 /* 37th bit in PTE */
 #define OFFSET 0x2000000

int main(int argc, char *argv[])
{
	int fd;
	long ps;
	char *map;
	char c;

	ps = sysconf(_SC_PAGE_SIZE);
	if (ps == -1)
		die("cannot get page size");

	fd = open("/dev/mem", O_RDONLY|O_LARGEFILE);
	if (fd == -1)
		die("cannot open /dev/mem");

	map = (char *)syscall(SYS_mmap2, NULL, ps, PROT_READ, MAP_SHARED, fd, OFFSET);
	if (map == MAP_FAILED)
		die("cannot mmap");

	c = map[0];

	if (munmap(map, ps) == -1)
		die("cannot munmap");

	if (close(fd) == -1)
		die("cannot close");

	return 0;
}
---------------------------------8<--------------------------------------

V3: use len_bytes instead of count, thanks to Dave Hansen and Thomas Gleixner
V2: fix pfn check in valid_mmap_phys_addr_range, thanks to Dave Hansen

Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com>
---
 arch/x86/include/asm/io.h |  4 ++++
 arch/x86/mm/mmap.c        | 12 ++++++++++++
 2 files changed, 16 insertions(+)

diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index b8237d8..49ede3c 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -243,6 +243,10 @@ static inline void flush_write_buffers(void)
 #endif
 }
 
+#define ARCH_HAS_VALID_PHYS_ADDR_RANGE
+extern int valid_phys_addr_range(phys_addr_t addr, size_t count);
+extern int valid_mmap_phys_addr_range(unsigned long pfn, size_t len_bytes);
+
 #endif /* __KERNEL__ */
 
 extern void native_io_delay(void);
diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
index 919b912..77a13f8 100644
--- a/arch/x86/mm/mmap.c
+++ b/arch/x86/mm/mmap.c
@@ -31,6 +31,8 @@
 #include <linux/sched.h>
 #include <asm/elf.h>
 
+#include "physaddr.h"
+
 struct va_alignment __read_mostly va_align = {
 	.flags = -1,
 };
@@ -122,3 +124,13 @@ void arch_pick_mmap_layout(struct mm_struct *mm)
 		mm->get_unmapped_area = arch_get_unmapped_area_topdown;
 	}
 }
+
+int valid_phys_addr_range(phys_addr_t addr, size_t count)
+{
+	return addr + count <= __pa(high_memory);
+}
+
+int valid_mmap_phys_addr_range(unsigned long pfn, size_t len_bytes)
+{
+	return arch_pfn_possible(pfn + (len_bytes >> PAGE_SHIFT));
+}
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 2/4] x86: add phys addr validity check for /dev/mem mmap
@ 2014-09-30 16:25     ` Frantisek Hrbata
  0 siblings, 0 replies; 30+ messages in thread
From: Frantisek Hrbata @ 2014-09-30 16:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, tglx, mingo, hpa, x86, oleg, kamaleshb, hechjie, akpm,
	dave.hansen, dvlasenk, prarit, lwoodman, hannsj_uhl, torvalds

Prevent possible PTE corruption while calling mmap on /dev/mem with large
offset.

oops info, please note the PTE value 8008000000000225.
---------------------------------8<--------------------------------------
[85739.124496] rep: Corrupted page table at address 7f63852f8000
[85739.130242] PGD ba2eb067 PUD b99c1067 PMD a2fa5067 PTE 8008000000000225
[85739.136941] Bad pagetable: 000d [#1] SMP
[85739.141002] Modules linked in: cfg80211 rfkill x86_pkg_temp_thermal coretemp
kvm_intel kvm bnx2 crct10dif_pclmul crc32_pclmul crc32c_intel
ghash_clmulni_intel microcode iTCO_wdt ipmi_si i2c_i801 iTCO_vendor_support
ipmi_msghandler dcdbas shpchp lpc_ich mfd_core nfsd auth_rpcgss nfs_acl lockd
sunrpc mgag200 i2c_algo_bit drm_kms_helper ttm drm i2c_core
[85739.172620] CPU: 3 PID: 21900 Comm: rep Not tainted 3.15.8-200.fc20.x86_64 #1
[85739.179768] Hardware name: Dell Inc. PowerEdge R210 II/09T7VV, BIOS 2.0.4 02/29/2012
[85739.187512] task: ffff8800b9b3b160 ti: ffff8800ba270000 task.ti: ffff8800ba270000
[85739.194988] RIP: 0033:[<0000000000400773>]  [<0000000000400773>] 0x400773
[85739.201799] RSP: 002b:00007fffe4ca3c80  EFLAGS: 00010213
[85739.207119] RAX: 00007f63852f8000 RBX: 0000000000000000 RCX: 00007f6384e0b8ca
[85739.214249] RDX: 0000000000000001 RSI: 0000000000001000 RDI: 0000000000000000
[85739.221407] RBP: 00007fffe4ca3cc0 R08: 0000000000000003 R09: 0008000000000000
[85739.228545] R10: 0000000000000001 R11: 0000000000000206 R12: 00000000004005b0
[85739.235676] R13: 00007fffe4ca3da0 R14: 0000000000000000 R15: 0000000000000000
[85739.242835] FS:  00007f63852ea740(0000) GS:ffff88013fcc0000(0000) knlGS:0000000000000000
[85739.250925] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[85739.256669] CR2: 00007f63852f8000 CR3: 00000000b9ba0000 CR4: 00000000001407e0
---------------------------------8<--------------------------------------

According to [1] Chapter 4 Paging, some higher bits in 64bit
PTE(X86_64 || X86_32_PAE) are reserved and have to be set to zero. For example,
for IA-32e and 4KB page [1] 4.5 IA-32e Paging: Table 4-19, bits 51-M(MAXPHYADDR)
are reserved. So for a CPU with e.g. 48bit phys addr width, bits 51-48 have to
be zero. If one of the reserved bits is set, [1] 4.7 Page-Fault Exceptions,
the #PF is generated with RSVD error code.

<quote>
RSVD flag (bit 3).
This flag is 1 if there is no valid translation for the linear address because a
reserved bit was set in one of the paging-structure entries used to translate
that address. (Because reserved bits are not checked in a paging-structure entry
whose P flag is 0, bit 3 of the error code can be set only if bit 0 is also
set.)
</quote>

In mmap_mem() the first check is valid_mmap_phys_addr_range(), but it always
returns 1 for x86. So it's possible to use any pgoff we want and
to set the PTE's reserved bits in remap_pfn_range(). Meaning there is a
possibility to use mmap on /dev/mem and cause system panic. It's probably
not that serious, because access to /dev/mem is limited and the system has
to have the panic_on_oops set, but still I think we should check this and
return error.

The path for this problem is:
mmap_mem() => remap_pfn_range() => page present => touch page => tlb miss =>
walk through paging structures => reserved bit set => #pf with rsvd flag

This patch adds check for x86. With this fix mmap returns -EINVAL if the
requested phys addr is larger then the supported phys addr width.

[1] Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3A

x86_64 reproducer
---------------------------------8<--------------------------------------
 #include <stdio.h>
 #include <unistd.h>
 #include <sys/types.h>
 #include <sys/stat.h>
 #include <fcntl.h>
 #include <err.h>
 #include <stdlib.h>
 #include <sys/mman.h>

 #define die(fmt, ...) err(1, fmt, ##__VA_ARGS__)

 #define OFFSET 0x8000000000000LL

int main(int argc, char *argv[])
{
	int fd;
	long ps;
	long pgoff;
	char *map;
	char c;

	ps = sysconf(_SC_PAGE_SIZE);
	if (ps == -1)
		die("cannot get page size");

	fd = open("/dev/mem", O_RDONLY);
	if (fd == -1)
		die("cannot open /dev/mem");

	pgoff = (OFFSET + (ps - 1)) & ~(ps - 1);

	map = mmap(NULL, ps, PROT_READ, MAP_SHARED, fd, pgoff);
	if (map == MAP_FAILED)
		die("cannot mmap");

	c = map[0];

	if (munmap(map, ps) == -1)
		die("cannot munmap");

	if (close(fd) == -1)
		die("cannot close");

	return 0;
}
---------------------------------8<--------------------------------------

x86_32_PAE reproducer
---------------------------------8<--------------------------------------
 #define _GNU_SOURCE
 #define _LARGEFILE64_SOURCE
 #include <unistd.h>
 #include <sys/syscall.h>
 #include <stdio.h>
 #include <unistd.h>
 #include <sys/types.h>
 #include <sys/stat.h>
 #include <fcntl.h>
 #include <err.h>
 #include <stdlib.h>
 #include <sys/mman.h>

 #define die(fmt, ...) err(1, fmt, ##__VA_ARGS__)

 /* 37th bit in PTE */
 #define OFFSET 0x2000000

int main(int argc, char *argv[])
{
	int fd;
	long ps;
	char *map;
	char c;

	ps = sysconf(_SC_PAGE_SIZE);
	if (ps == -1)
		die("cannot get page size");

	fd = open("/dev/mem", O_RDONLY|O_LARGEFILE);
	if (fd == -1)
		die("cannot open /dev/mem");

	map = (char *)syscall(SYS_mmap2, NULL, ps, PROT_READ, MAP_SHARED, fd, OFFSET);
	if (map == MAP_FAILED)
		die("cannot mmap");

	c = map[0];

	if (munmap(map, ps) == -1)
		die("cannot munmap");

	if (close(fd) == -1)
		die("cannot close");

	return 0;
}
---------------------------------8<--------------------------------------

V3: use len_bytes instead of count, thanks to Dave Hansen and Thomas Gleixner
V2: fix pfn check in valid_mmap_phys_addr_range, thanks to Dave Hansen

Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com>
---
 arch/x86/include/asm/io.h |  4 ++++
 arch/x86/mm/mmap.c        | 12 ++++++++++++
 2 files changed, 16 insertions(+)

diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index b8237d8..49ede3c 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -243,6 +243,10 @@ static inline void flush_write_buffers(void)
 #endif
 }
 
+#define ARCH_HAS_VALID_PHYS_ADDR_RANGE
+extern int valid_phys_addr_range(phys_addr_t addr, size_t count);
+extern int valid_mmap_phys_addr_range(unsigned long pfn, size_t len_bytes);
+
 #endif /* __KERNEL__ */
 
 extern void native_io_delay(void);
diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
index 919b912..77a13f8 100644
--- a/arch/x86/mm/mmap.c
+++ b/arch/x86/mm/mmap.c
@@ -31,6 +31,8 @@
 #include <linux/sched.h>
 #include <asm/elf.h>
 
+#include "physaddr.h"
+
 struct va_alignment __read_mostly va_align = {
 	.flags = -1,
 };
@@ -122,3 +124,13 @@ void arch_pick_mmap_layout(struct mm_struct *mm)
 		mm->get_unmapped_area = arch_get_unmapped_area_topdown;
 	}
 }
+
+int valid_phys_addr_range(phys_addr_t addr, size_t count)
+{
+	return addr + count <= __pa(high_memory);
+}
+
+int valid_mmap_phys_addr_range(unsigned long pfn, size_t len_bytes)
+{
+	return arch_pfn_possible(pfn + (len_bytes >> PAGE_SHIFT));
+}
-- 
1.9.3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 3/4] x86: add high_memory check to (xlate|unxlate)_dev_mem_ptr
  2014-09-30 16:24   ` Frantisek Hrbata
@ 2014-09-30 16:25     ` Frantisek Hrbata
  -1 siblings, 0 replies; 30+ messages in thread
From: Frantisek Hrbata @ 2014-09-30 16:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, tglx, mingo, hpa, x86, oleg, kamaleshb, hechjie, akpm,
	dave.hansen, dvlasenk, prarit, lwoodman, hannsj_uhl, torvalds

So far (xlate|unxlate)_dev_mem_ptr for read/write /dev/mem relies on a generic
high_memory check in valid_phys_addr_range(), which does not allow to access any
memory above high_memory whatsoever. By adding the high_memory check to
(xlate|unxlate)_dev_mem_ptr, it still will be possible to use __va safely for
kernel mapped memory and it will also allow read/write to access non-system RAM
above high_memory once the high_memory check is removed from
valid_phys_addr_range.

Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com>
---
 arch/x86/mm/ioremap.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index baff1da..1ae7323 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -320,8 +320,11 @@ void *xlate_dev_mem_ptr(unsigned long phys)
 	void *addr;
 	unsigned long start = phys & PAGE_MASK;
 
-	/* If page is RAM, we can use __va. Otherwise ioremap and unmap. */
-	if (page_is_ram(start >> PAGE_SHIFT))
+	/*
+	 * If page is RAM and is mapped by kernel, we can use __va.
+	 * Otherwise ioremap and unmap.
+	 */
+	if (page_is_ram(start >> PAGE_SHIFT) && phys <= __pa(high_memory))
 		return __va(phys);
 
 	addr = (void __force *)ioremap_cache(start, PAGE_SIZE);
@@ -333,7 +336,7 @@ void *xlate_dev_mem_ptr(unsigned long phys)
 
 void unxlate_dev_mem_ptr(unsigned long phys, void *addr)
 {
-	if (page_is_ram(phys >> PAGE_SHIFT))
+	if (page_is_ram(phys >> PAGE_SHIFT) && phys <= __pa(high_memory))
 		return;
 
 	iounmap((void __iomem *)((unsigned long)addr & PAGE_MASK));
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 3/4] x86: add high_memory check to (xlate|unxlate)_dev_mem_ptr
@ 2014-09-30 16:25     ` Frantisek Hrbata
  0 siblings, 0 replies; 30+ messages in thread
From: Frantisek Hrbata @ 2014-09-30 16:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, tglx, mingo, hpa, x86, oleg, kamaleshb, hechjie, akpm,
	dave.hansen, dvlasenk, prarit, lwoodman, hannsj_uhl, torvalds

So far (xlate|unxlate)_dev_mem_ptr for read/write /dev/mem relies on a generic
high_memory check in valid_phys_addr_range(), which does not allow to access any
memory above high_memory whatsoever. By adding the high_memory check to
(xlate|unxlate)_dev_mem_ptr, it still will be possible to use __va safely for
kernel mapped memory and it will also allow read/write to access non-system RAM
above high_memory once the high_memory check is removed from
valid_phys_addr_range.

Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com>
---
 arch/x86/mm/ioremap.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index baff1da..1ae7323 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -320,8 +320,11 @@ void *xlate_dev_mem_ptr(unsigned long phys)
 	void *addr;
 	unsigned long start = phys & PAGE_MASK;
 
-	/* If page is RAM, we can use __va. Otherwise ioremap and unmap. */
-	if (page_is_ram(start >> PAGE_SHIFT))
+	/*
+	 * If page is RAM and is mapped by kernel, we can use __va.
+	 * Otherwise ioremap and unmap.
+	 */
+	if (page_is_ram(start >> PAGE_SHIFT) && phys <= __pa(high_memory))
 		return __va(phys);
 
 	addr = (void __force *)ioremap_cache(start, PAGE_SIZE);
@@ -333,7 +336,7 @@ void *xlate_dev_mem_ptr(unsigned long phys)
 
 void unxlate_dev_mem_ptr(unsigned long phys, void *addr)
 {
-	if (page_is_ram(phys >> PAGE_SHIFT))
+	if (page_is_ram(phys >> PAGE_SHIFT) && phys <= __pa(high_memory))
 		return;
 
 	iounmap((void __iomem *)((unsigned long)addr & PAGE_MASK));
-- 
1.9.3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 4/4] x86: remove high_memory check from valid_phys_addr_range
  2014-09-30 16:24   ` Frantisek Hrbata
@ 2014-09-30 16:25     ` Frantisek Hrbata
  -1 siblings, 0 replies; 30+ messages in thread
From: Frantisek Hrbata @ 2014-09-30 16:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, tglx, mingo, hpa, x86, oleg, kamaleshb, hechjie, akpm,
	dave.hansen, dvlasenk, prarit, lwoodman, hannsj_uhl, torvalds

There is no need to block read/write access to /dev/mem for phys. addr. above
high_memory for non-system RAM. The only limitation should be
boot_cpu_data.x86_phys_bits(max phys. addr. size).

Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com>
---
 arch/x86/mm/mmap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
index 77a13f8..788f2b7 100644
--- a/arch/x86/mm/mmap.c
+++ b/arch/x86/mm/mmap.c
@@ -127,7 +127,7 @@ void arch_pick_mmap_layout(struct mm_struct *mm)
 
 int valid_phys_addr_range(phys_addr_t addr, size_t count)
 {
-	return addr + count <= __pa(high_memory);
+	return arch_pfn_possible((addr + count) >> PAGE_SHIFT);
 }
 
 int valid_mmap_phys_addr_range(unsigned long pfn, size_t len_bytes)
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 4/4] x86: remove high_memory check from valid_phys_addr_range
@ 2014-09-30 16:25     ` Frantisek Hrbata
  0 siblings, 0 replies; 30+ messages in thread
From: Frantisek Hrbata @ 2014-09-30 16:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, tglx, mingo, hpa, x86, oleg, kamaleshb, hechjie, akpm,
	dave.hansen, dvlasenk, prarit, lwoodman, hannsj_uhl, torvalds

There is no need to block read/write access to /dev/mem for phys. addr. above
high_memory for non-system RAM. The only limitation should be
boot_cpu_data.x86_phys_bits(max phys. addr. size).

Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com>
---
 arch/x86/mm/mmap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
index 77a13f8..788f2b7 100644
--- a/arch/x86/mm/mmap.c
+++ b/arch/x86/mm/mmap.c
@@ -127,7 +127,7 @@ void arch_pick_mmap_layout(struct mm_struct *mm)
 
 int valid_phys_addr_range(phys_addr_t addr, size_t count)
 {
-	return addr + count <= __pa(high_memory);
+	return arch_pfn_possible((addr + count) >> PAGE_SHIFT);
 }
 
 int valid_mmap_phys_addr_range(unsigned long pfn, size_t len_bytes)
-- 
1.9.3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [RESEND PATCH 2/4] x86: add phys addr validity check for /dev/mem mmap
  2014-09-30 14:27         ` Dave Hansen
@ 2014-09-30 16:33           ` Frantisek Hrbata
  -1 siblings, 0 replies; 30+ messages in thread
From: Frantisek Hrbata @ 2014-09-30 16:33 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Thomas Gleixner, linux-kernel, linux-mm, mingo, hpa, x86, oleg,
	kamaleshb, hechjie, akpm, dvlasenk, prarit, lwoodman, hannsj_uhl,
	torvalds

On Tue, Sep 30, 2014 at 07:27:51AM -0700, Dave Hansen wrote:
> On 09/30/2014 05:41 AM, Frantisek Hrbata wrote:
> > Does it make sense to replace "count" with "size" so it's consistent with the
> > rest or do you prefer "nr_bytes" or as Dave proposed "len_bytes"?
> 
> I don't care what it is as long as it has a unit in it.

Hi Dave/Thomas,

I sent v2 of this patch set, which uses the "len_bytes". Again, I'm sorry I
forgot to fix this the first time.

Many thanks

-- 
Frantisek Hrbata

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RESEND PATCH 2/4] x86: add phys addr validity check for /dev/mem mmap
@ 2014-09-30 16:33           ` Frantisek Hrbata
  0 siblings, 0 replies; 30+ messages in thread
From: Frantisek Hrbata @ 2014-09-30 16:33 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Thomas Gleixner, linux-kernel, linux-mm, mingo, hpa, x86, oleg,
	kamaleshb, hechjie, akpm, dvlasenk, prarit, lwoodman, hannsj_uhl,
	torvalds

On Tue, Sep 30, 2014 at 07:27:51AM -0700, Dave Hansen wrote:
> On 09/30/2014 05:41 AM, Frantisek Hrbata wrote:
> > Does it make sense to replace "count" with "size" so it's consistent with the
> > rest or do you prefer "nr_bytes" or as Dave proposed "len_bytes"?
> 
> I don't care what it is as long as it has a unit in it.

Hi Dave/Thomas,

I sent v2 of this patch set, which uses the "len_bytes". Again, I'm sorry I
forgot to fix this the first time.

Many thanks

-- 
Frantisek Hrbata

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RESEND PATCH 2/4] x86: add phys addr validity check for /dev/mem mmap
  2014-09-30 16:33           ` Frantisek Hrbata
@ 2014-09-30 21:57             ` Thomas Gleixner
  -1 siblings, 0 replies; 30+ messages in thread
From: Thomas Gleixner @ 2014-09-30 21:57 UTC (permalink / raw)
  To: Frantisek Hrbata
  Cc: Dave Hansen, linux-kernel, linux-mm, mingo, hpa, x86, oleg,
	kamaleshb, hechjie, akpm, dvlasenk, prarit, lwoodman, hannsj_uhl,
	torvalds

On Tue, 30 Sep 2014, Frantisek Hrbata wrote:
> On Tue, Sep 30, 2014 at 07:27:51AM -0700, Dave Hansen wrote:
> > On 09/30/2014 05:41 AM, Frantisek Hrbata wrote:
> > > Does it make sense to replace "count" with "size" so it's consistent with the
> > > rest or do you prefer "nr_bytes" or as Dave proposed "len_bytes"?
> > 
> > I don't care what it is as long as it has a unit in it.
> 
> Hi Dave/Thomas,
> 
> I sent v2 of this patch set, which uses the "len_bytes". Again, I'm sorry I
> forgot to fix this the first time.

No problem. Shit happens.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RESEND PATCH 2/4] x86: add phys addr validity check for /dev/mem mmap
@ 2014-09-30 21:57             ` Thomas Gleixner
  0 siblings, 0 replies; 30+ messages in thread
From: Thomas Gleixner @ 2014-09-30 21:57 UTC (permalink / raw)
  To: Frantisek Hrbata
  Cc: Dave Hansen, linux-kernel, linux-mm, mingo, hpa, x86, oleg,
	kamaleshb, hechjie, akpm, dvlasenk, prarit, lwoodman, hannsj_uhl,
	torvalds

On Tue, 30 Sep 2014, Frantisek Hrbata wrote:
> On Tue, Sep 30, 2014 at 07:27:51AM -0700, Dave Hansen wrote:
> > On 09/30/2014 05:41 AM, Frantisek Hrbata wrote:
> > > Does it make sense to replace "count" with "size" so it's consistent with the
> > > rest or do you prefer "nr_bytes" or as Dave proposed "len_bytes"?
> > 
> > I don't care what it is as long as it has a unit in it.
> 
> Hi Dave/Thomas,
> 
> I sent v2 of this patch set, which uses the "len_bytes". Again, I'm sorry I
> forgot to fix this the first time.

No problem. Shit happens.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2014-09-30 21:57 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-09-29 11:32 [RESEND PATCH 0/4] x86: /dev/mem fixes Frantisek Hrbata
2014-09-29 11:32 ` Frantisek Hrbata
2014-09-29 11:32 ` [RESEND PATCH 1/4] x86: add arch_pfn_possible helper Frantisek Hrbata
2014-09-29 11:32   ` Frantisek Hrbata
2014-09-29 11:33 ` [RESEND PATCH 2/4] x86: add phys addr validity check for /dev/mem mmap Frantisek Hrbata
2014-09-29 11:33   ` Frantisek Hrbata
2014-09-29 20:15   ` Thomas Gleixner
2014-09-29 20:15     ` Thomas Gleixner
2014-09-30 12:41     ` Frantisek Hrbata
2014-09-30 12:41       ` Frantisek Hrbata
2014-09-30 14:27       ` Dave Hansen
2014-09-30 14:27         ` Dave Hansen
2014-09-30 16:33         ` Frantisek Hrbata
2014-09-30 16:33           ` Frantisek Hrbata
2014-09-30 21:57           ` Thomas Gleixner
2014-09-30 21:57             ` Thomas Gleixner
2014-09-29 11:33 ` [RESEND PATCH 3/4] x86: add high_memory check to (xlate|unxlate)_dev_mem_ptr Frantisek Hrbata
2014-09-29 11:33   ` Frantisek Hrbata
2014-09-29 11:33 ` [RESEND PATCH 4/4] x86: remove high_memory check from valid_phys_addr_range Frantisek Hrbata
2014-09-29 11:33   ` Frantisek Hrbata
2014-09-30 16:24 ` [PATCH v2 0/4] x86: /dev/mem fixes Frantisek Hrbata
2014-09-30 16:24   ` Frantisek Hrbata
2014-09-30 16:25   ` [PATCH v2 1/4] x86: add arch_pfn_possible helper Frantisek Hrbata
2014-09-30 16:25     ` Frantisek Hrbata
2014-09-30 16:25   ` [PATCH v2 2/4] x86: add phys addr validity check for /dev/mem mmap Frantisek Hrbata
2014-09-30 16:25     ` Frantisek Hrbata
2014-09-30 16:25   ` [PATCH v2 3/4] x86: add high_memory check to (xlate|unxlate)_dev_mem_ptr Frantisek Hrbata
2014-09-30 16:25     ` Frantisek Hrbata
2014-09-30 16:25   ` [PATCH v2 4/4] x86: remove high_memory check from valid_phys_addr_range Frantisek Hrbata
2014-09-30 16:25     ` Frantisek Hrbata

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.