All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/1] Prevent possible PTE corruption with /dev/mem mmap
@ 2014-08-14 14:18 ` Frantisek Hrbata
  0 siblings, 0 replies; 30+ messages in thread
From: Frantisek Hrbata @ 2014-08-14 14:18 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, tglx, mingo, hpa, x86, oleg, kamaleshb, hechjie, akpm,
	dave.hansen, dvlasenk, prarit, lwoodman, hannsj_uhl

Hi all,

after some time this issue popped up again. Please note that the patch was send
to lkml two times.

https://lkml.org/lkml/2013/4/2/297
  lkml: <1364905733-23937-1-git-send-email-fhrbata@redhat.com>
https://lkml.org/lkml/2013/10/2/359
  lkml: <20131002160514.GA25471@localhost.localdomain>

It did not get much attention, except H. Peter Anvin's complain that having two
checks for mmap and read/write for /dev/mem access is ridiculous. I for sure do
not object to this, but AFAICT it's not that simple to unify them and it's not
"directly" related to the PTE corruption. Please note that there are other
archs(ia64, arm) using these check. But I for sure can be missing something.

What the patch does is using the existing interface to implement x86 specific
check in the least invasive way.

Peter: I by no means want to be pushy. Just that after I looked into this a
little bit more, I don't see a better and more straightforward way how to fix
this. I will be grateful for any suggestions and help. If we want/need to fix
this in a different way, I can for sure try, but I will need at least some
guidance.

So I'm posting this once more with a hope it will get more attention or at least
to start the discussion how/if this should be fixed.

The patch is the same except I added a check for phys addr overflow before
calling phys_addr_valid. Maybe this check should be in do_mmap_pgoff.

Many thanks

Frantisek Hrbata (1):
  x86: add phys addr validity check for /dev/mem mmap

 arch/x86/include/asm/io.h |  4 ++++
 arch/x86/mm/mmap.c        | 18 ++++++++++++++++++
 2 files changed, 22 insertions(+)

-- 
1.9.3


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 0/1] Prevent possible PTE corruption with /dev/mem mmap
@ 2014-08-14 14:18 ` Frantisek Hrbata
  0 siblings, 0 replies; 30+ messages in thread
From: Frantisek Hrbata @ 2014-08-14 14:18 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, tglx, mingo, hpa, x86, oleg, kamaleshb, hechjie, akpm,
	dave.hansen, dvlasenk, prarit, lwoodman, hannsj_uhl

Hi all,

after some time this issue popped up again. Please note that the patch was send
to lkml two times.

https://lkml.org/lkml/2013/4/2/297
  lkml: <1364905733-23937-1-git-send-email-fhrbata@redhat.com>
https://lkml.org/lkml/2013/10/2/359
  lkml: <20131002160514.GA25471@localhost.localdomain>

It did not get much attention, except H. Peter Anvin's complain that having two
checks for mmap and read/write for /dev/mem access is ridiculous. I for sure do
not object to this, but AFAICT it's not that simple to unify them and it's not
"directly" related to the PTE corruption. Please note that there are other
archs(ia64, arm) using these check. But I for sure can be missing something.

What the patch does is using the existing interface to implement x86 specific
check in the least invasive way.

Peter: I by no means want to be pushy. Just that after I looked into this a
little bit more, I don't see a better and more straightforward way how to fix
this. I will be grateful for any suggestions and help. If we want/need to fix
this in a different way, I can for sure try, but I will need at least some
guidance.

So I'm posting this once more with a hope it will get more attention or at least
to start the discussion how/if this should be fixed.

The patch is the same except I added a check for phys addr overflow before
calling phys_addr_valid. Maybe this check should be in do_mmap_pgoff.

Many thanks

Frantisek Hrbata (1):
  x86: add phys addr validity check for /dev/mem mmap

 arch/x86/include/asm/io.h |  4 ++++
 arch/x86/mm/mmap.c        | 18 ++++++++++++++++++
 2 files changed, 22 insertions(+)

-- 
1.9.3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 1/1] x86: add phys addr validity check for /dev/mem mmap
  2014-08-14 14:18 ` Frantisek Hrbata
@ 2014-08-14 14:18   ` Frantisek Hrbata
  -1 siblings, 0 replies; 30+ messages in thread
From: Frantisek Hrbata @ 2014-08-14 14:18 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, tglx, mingo, hpa, x86, oleg, kamaleshb, hechjie, akpm,
	dave.hansen, dvlasenk, prarit, lwoodman, hannsj_uhl

Prevent possible PTE corruption while calling mmap on /dev/mem with large
offset.

oops info, please note the PTE value 8008000000000225.
---------------------------------8<--------------------------------------
[85739.124496] rep: Corrupted page table at address 7f63852f8000
[85739.130242] PGD ba2eb067 PUD b99c1067 PMD a2fa5067 PTE 8008000000000225
[85739.136941] Bad pagetable: 000d [#1] SMP
[85739.141002] Modules linked in: cfg80211 rfkill x86_pkg_temp_thermal coretemp
kvm_intel kvm bnx2 crct10dif_pclmul crc32_pclmul crc32c_intel
ghash_clmulni_intel microcode iTCO_wdt ipmi_si i2c_i801 iTCO_vendor_support
ipmi_msghandler dcdbas shpchp lpc_ich mfd_core nfsd auth_rpcgss nfs_acl lockd
sunrpc mgag200 i2c_algo_bit drm_kms_helper ttm drm i2c_core
[85739.172620] CPU: 3 PID: 21900 Comm: rep Not tainted 3.15.8-200.fc20.x86_64 #1
[85739.179768] Hardware name: Dell Inc. PowerEdge R210 II/09T7VV, BIOS 2.0.4 02/29/2012
[85739.187512] task: ffff8800b9b3b160 ti: ffff8800ba270000 task.ti: ffff8800ba270000
[85739.194988] RIP: 0033:[<0000000000400773>]  [<0000000000400773>] 0x400773
[85739.201799] RSP: 002b:00007fffe4ca3c80  EFLAGS: 00010213
[85739.207119] RAX: 00007f63852f8000 RBX: 0000000000000000 RCX: 00007f6384e0b8ca
[85739.214249] RDX: 0000000000000001 RSI: 0000000000001000 RDI: 0000000000000000
[85739.221407] RBP: 00007fffe4ca3cc0 R08: 0000000000000003 R09: 0008000000000000
[85739.228545] R10: 0000000000000001 R11: 0000000000000206 R12: 00000000004005b0
[85739.235676] R13: 00007fffe4ca3da0 R14: 0000000000000000 R15: 0000000000000000
[85739.242835] FS:  00007f63852ea740(0000) GS:ffff88013fcc0000(0000) knlGS:0000000000000000
[85739.250925] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[85739.256669] CR2: 00007f63852f8000 CR3: 00000000b9ba0000 CR4: 00000000001407e0
---------------------------------8<--------------------------------------

According to [1] Chapter 4 Paging, some higher bits in 64bit
PTE(X86_64 || X86_32_PAE) are reserved and have to be set to zero. For example,
for IA-32e and 4KB page [1] 4.5 IA-32e Paging: Table 4-19, bits 51-M(MAXPHYADDR)
are reserved. So for a CPU with e.g. 48bit phys addr width, bits 51-48 have to
be zero. If one of the reserved bits is set, [1] 4.7 Page-Fault Exceptions,
the #PF is generated with RSVD error code.

<quote>
RSVD flag (bit 3).
This flag is 1 if there is no valid translation for the linear address because a
reserved bit was set in one of the paging-structure entries used to translate
that address. (Because reserved bits are not checked in a paging-structure entry
whose P flag is 0, bit 3 of the error code can be set only if bit 0 is also
set.)
</quote>

In mmap_mem() the first check is valid_mmap_phys_addr_range(), but it always
returns 1 for x86. So it's possible to use any pgoff we want and
to set the PTE's reserved bits in remap_pfn_range(). Meaning there is a
possibility to use mmap on /dev/mem and cause system panic. It's probably
not that serious, because access to /dev/mem is limited and the system has
to have the panic_on_oops set, but still I think we should check this and
return error.

The path for this problem is:
mmap_mem() => remap_pfn_range() => page present => touch page => tlb miss =>
walk through paging structures => reserved bit set => #pf with rsvd flag

This patch adds check for x86. With this fix mmap returns -EINVAL if the
requested phys addr is larger then the supported phys addr width.

[1] Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3A

x86_64 reproducer
---------------------------------8<--------------------------------------
 #include <stdio.h>
 #include <unistd.h>
 #include <sys/types.h>
 #include <sys/stat.h>
 #include <fcntl.h>
 #include <err.h>
 #include <stdlib.h>
 #include <sys/mman.h>

 #define die(fmt, ...) err(1, fmt, ##__VA_ARGS__)

 #define OFFSET 0x8000000000000LL

int main(int argc, char *argv[])
{
	int fd;
	long ps;
	long pgoff;
	char *map;
	char c;

	ps = sysconf(_SC_PAGE_SIZE);
	if (ps == -1)
		die("cannot get page size");

	fd = open("/dev/mem", O_RDONLY);
	if (fd == -1)
		die("cannot open /dev/mem");

	pgoff = (OFFSET + (ps - 1)) & ~(ps - 1);

	map = mmap(NULL, ps, PROT_READ, MAP_SHARED, fd, pgoff);
	if (map == MAP_FAILED)
		die("cannot mmap");

	c = map[0];

	if (munmap(map, ps) == -1)
		die("cannot munmap");

	if (close(fd) == -1)
		die("cannot close");

	return 0;
}
---------------------------------8<--------------------------------------

x86_32_PAE reproducer
---------------------------------8<--------------------------------------
 #define _GNU_SOURCE
 #define _LARGEFILE64_SOURCE
 #include <unistd.h>
 #include <sys/syscall.h>
 #include <stdio.h>
 #include <unistd.h>
 #include <sys/types.h>
 #include <sys/stat.h>
 #include <fcntl.h>
 #include <err.h>
 #include <stdlib.h>
 #include <sys/mman.h>

 #define die(fmt, ...) err(1, fmt, ##__VA_ARGS__)

 /* 37th bit in PTE */
 #define OFFSET 0x2000000

int main(int argc, char *argv[])
{
	int fd;
	long ps;
	char *map;
	char c;

	ps = sysconf(_SC_PAGE_SIZE);
	if (ps == -1)
		die("cannot get page size");

	fd = open("/dev/mem", O_RDONLY|O_LARGEFILE);
	if (fd == -1)
		die("cannot open /dev/mem");

	map = (char *)syscall(SYS_mmap2, NULL, ps, PROT_READ, MAP_SHARED, fd, OFFSET);
	if (map == MAP_FAILED)
		die("cannot mmap");

	c = map[0];

	if (munmap(map, ps) == -1)
		die("cannot munmap");

	if (close(fd) == -1)
		die("cannot close");

	return 0;
}
---------------------------------8<--------------------------------------

Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com>
---
 arch/x86/include/asm/io.h |  4 ++++
 arch/x86/mm/mmap.c        | 18 ++++++++++++++++++
 2 files changed, 22 insertions(+)

diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index b8237d8..55c59d5 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -243,6 +243,10 @@ static inline void flush_write_buffers(void)
 #endif
 }
 
+#define ARCH_HAS_VALID_PHYS_ADDR_RANGE
+extern int valid_phys_addr_range(phys_addr_t addr, size_t count);
+extern int valid_mmap_phys_addr_range(unsigned long pfn, size_t count);
+
 #endif /* __KERNEL__ */
 
 extern void native_io_delay(void);
diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
index 25e7e13..b5be2ad 100644
--- a/arch/x86/mm/mmap.c
+++ b/arch/x86/mm/mmap.c
@@ -31,6 +31,8 @@
 #include <linux/sched.h>
 #include <asm/elf.h>
 
+#include "physaddr.h"
+
 struct __read_mostly va_alignment va_align = {
 	.flags = -1,
 };
@@ -122,3 +124,19 @@ void arch_pick_mmap_layout(struct mm_struct *mm)
 		mm->get_unmapped_area = arch_get_unmapped_area_topdown;
 	}
 }
+
+int valid_phys_addr_range(phys_addr_t addr, size_t count)
+{
+	return addr + count <= __pa(high_memory);
+}
+
+int valid_mmap_phys_addr_range(unsigned long pfn, size_t count)
+{
+	/* pgoff + count overflow is checked in do_mmap_pgoff */
+	pfn += count >> PAGE_SHIFT;
+
+	if (pfn >> BITS_PER_LONG - PAGE_SHIFT)
+		return -EOVERFLOW;
+
+	return phys_addr_valid(pfn << PAGE_SHIFT);
+}
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 1/1] x86: add phys addr validity check for /dev/mem mmap
@ 2014-08-14 14:18   ` Frantisek Hrbata
  0 siblings, 0 replies; 30+ messages in thread
From: Frantisek Hrbata @ 2014-08-14 14:18 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, tglx, mingo, hpa, x86, oleg, kamaleshb, hechjie, akpm,
	dave.hansen, dvlasenk, prarit, lwoodman, hannsj_uhl

Prevent possible PTE corruption while calling mmap on /dev/mem with large
offset.

oops info, please note the PTE value 8008000000000225.
---------------------------------8<--------------------------------------
[85739.124496] rep: Corrupted page table at address 7f63852f8000
[85739.130242] PGD ba2eb067 PUD b99c1067 PMD a2fa5067 PTE 8008000000000225
[85739.136941] Bad pagetable: 000d [#1] SMP
[85739.141002] Modules linked in: cfg80211 rfkill x86_pkg_temp_thermal coretemp
kvm_intel kvm bnx2 crct10dif_pclmul crc32_pclmul crc32c_intel
ghash_clmulni_intel microcode iTCO_wdt ipmi_si i2c_i801 iTCO_vendor_support
ipmi_msghandler dcdbas shpchp lpc_ich mfd_core nfsd auth_rpcgss nfs_acl lockd
sunrpc mgag200 i2c_algo_bit drm_kms_helper ttm drm i2c_core
[85739.172620] CPU: 3 PID: 21900 Comm: rep Not tainted 3.15.8-200.fc20.x86_64 #1
[85739.179768] Hardware name: Dell Inc. PowerEdge R210 II/09T7VV, BIOS 2.0.4 02/29/2012
[85739.187512] task: ffff8800b9b3b160 ti: ffff8800ba270000 task.ti: ffff8800ba270000
[85739.194988] RIP: 0033:[<0000000000400773>]  [<0000000000400773>] 0x400773
[85739.201799] RSP: 002b:00007fffe4ca3c80  EFLAGS: 00010213
[85739.207119] RAX: 00007f63852f8000 RBX: 0000000000000000 RCX: 00007f6384e0b8ca
[85739.214249] RDX: 0000000000000001 RSI: 0000000000001000 RDI: 0000000000000000
[85739.221407] RBP: 00007fffe4ca3cc0 R08: 0000000000000003 R09: 0008000000000000
[85739.228545] R10: 0000000000000001 R11: 0000000000000206 R12: 00000000004005b0
[85739.235676] R13: 00007fffe4ca3da0 R14: 0000000000000000 R15: 0000000000000000
[85739.242835] FS:  00007f63852ea740(0000) GS:ffff88013fcc0000(0000) knlGS:0000000000000000
[85739.250925] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[85739.256669] CR2: 00007f63852f8000 CR3: 00000000b9ba0000 CR4: 00000000001407e0
---------------------------------8<--------------------------------------

According to [1] Chapter 4 Paging, some higher bits in 64bit
PTE(X86_64 || X86_32_PAE) are reserved and have to be set to zero. For example,
for IA-32e and 4KB page [1] 4.5 IA-32e Paging: Table 4-19, bits 51-M(MAXPHYADDR)
are reserved. So for a CPU with e.g. 48bit phys addr width, bits 51-48 have to
be zero. If one of the reserved bits is set, [1] 4.7 Page-Fault Exceptions,
the #PF is generated with RSVD error code.

<quote>
RSVD flag (bit 3).
This flag is 1 if there is no valid translation for the linear address because a
reserved bit was set in one of the paging-structure entries used to translate
that address. (Because reserved bits are not checked in a paging-structure entry
whose P flag is 0, bit 3 of the error code can be set only if bit 0 is also
set.)
</quote>

In mmap_mem() the first check is valid_mmap_phys_addr_range(), but it always
returns 1 for x86. So it's possible to use any pgoff we want and
to set the PTE's reserved bits in remap_pfn_range(). Meaning there is a
possibility to use mmap on /dev/mem and cause system panic. It's probably
not that serious, because access to /dev/mem is limited and the system has
to have the panic_on_oops set, but still I think we should check this and
return error.

The path for this problem is:
mmap_mem() => remap_pfn_range() => page present => touch page => tlb miss =>
walk through paging structures => reserved bit set => #pf with rsvd flag

This patch adds check for x86. With this fix mmap returns -EINVAL if the
requested phys addr is larger then the supported phys addr width.

[1] Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3A

x86_64 reproducer
---------------------------------8<--------------------------------------
 #include <stdio.h>
 #include <unistd.h>
 #include <sys/types.h>
 #include <sys/stat.h>
 #include <fcntl.h>
 #include <err.h>
 #include <stdlib.h>
 #include <sys/mman.h>

 #define die(fmt, ...) err(1, fmt, ##__VA_ARGS__)

 #define OFFSET 0x8000000000000LL

int main(int argc, char *argv[])
{
	int fd;
	long ps;
	long pgoff;
	char *map;
	char c;

	ps = sysconf(_SC_PAGE_SIZE);
	if (ps == -1)
		die("cannot get page size");

	fd = open("/dev/mem", O_RDONLY);
	if (fd == -1)
		die("cannot open /dev/mem");

	pgoff = (OFFSET + (ps - 1)) & ~(ps - 1);

	map = mmap(NULL, ps, PROT_READ, MAP_SHARED, fd, pgoff);
	if (map == MAP_FAILED)
		die("cannot mmap");

	c = map[0];

	if (munmap(map, ps) == -1)
		die("cannot munmap");

	if (close(fd) == -1)
		die("cannot close");

	return 0;
}
---------------------------------8<--------------------------------------

x86_32_PAE reproducer
---------------------------------8<--------------------------------------
 #define _GNU_SOURCE
 #define _LARGEFILE64_SOURCE
 #include <unistd.h>
 #include <sys/syscall.h>
 #include <stdio.h>
 #include <unistd.h>
 #include <sys/types.h>
 #include <sys/stat.h>
 #include <fcntl.h>
 #include <err.h>
 #include <stdlib.h>
 #include <sys/mman.h>

 #define die(fmt, ...) err(1, fmt, ##__VA_ARGS__)

 /* 37th bit in PTE */
 #define OFFSET 0x2000000

int main(int argc, char *argv[])
{
	int fd;
	long ps;
	char *map;
	char c;

	ps = sysconf(_SC_PAGE_SIZE);
	if (ps == -1)
		die("cannot get page size");

	fd = open("/dev/mem", O_RDONLY|O_LARGEFILE);
	if (fd == -1)
		die("cannot open /dev/mem");

	map = (char *)syscall(SYS_mmap2, NULL, ps, PROT_READ, MAP_SHARED, fd, OFFSET);
	if (map == MAP_FAILED)
		die("cannot mmap");

	c = map[0];

	if (munmap(map, ps) == -1)
		die("cannot munmap");

	if (close(fd) == -1)
		die("cannot close");

	return 0;
}
---------------------------------8<--------------------------------------

Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com>
---
 arch/x86/include/asm/io.h |  4 ++++
 arch/x86/mm/mmap.c        | 18 ++++++++++++++++++
 2 files changed, 22 insertions(+)

diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index b8237d8..55c59d5 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -243,6 +243,10 @@ static inline void flush_write_buffers(void)
 #endif
 }
 
+#define ARCH_HAS_VALID_PHYS_ADDR_RANGE
+extern int valid_phys_addr_range(phys_addr_t addr, size_t count);
+extern int valid_mmap_phys_addr_range(unsigned long pfn, size_t count);
+
 #endif /* __KERNEL__ */
 
 extern void native_io_delay(void);
diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
index 25e7e13..b5be2ad 100644
--- a/arch/x86/mm/mmap.c
+++ b/arch/x86/mm/mmap.c
@@ -31,6 +31,8 @@
 #include <linux/sched.h>
 #include <asm/elf.h>
 
+#include "physaddr.h"
+
 struct __read_mostly va_alignment va_align = {
 	.flags = -1,
 };
@@ -122,3 +124,19 @@ void arch_pick_mmap_layout(struct mm_struct *mm)
 		mm->get_unmapped_area = arch_get_unmapped_area_topdown;
 	}
 }
+
+int valid_phys_addr_range(phys_addr_t addr, size_t count)
+{
+	return addr + count <= __pa(high_memory);
+}
+
+int valid_mmap_phys_addr_range(unsigned long pfn, size_t count)
+{
+	/* pgoff + count overflow is checked in do_mmap_pgoff */
+	pfn += count >> PAGE_SHIFT;
+
+	if (pfn >> BITS_PER_LONG - PAGE_SHIFT)
+		return -EOVERFLOW;
+
+	return phys_addr_valid(pfn << PAGE_SHIFT);
+}
-- 
1.9.3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/1] x86: add phys addr validity check for /dev/mem mmap
  2014-08-14 14:18   ` Frantisek Hrbata
@ 2014-08-14 16:36     ` Dave Hansen
  -1 siblings, 0 replies; 30+ messages in thread
From: Dave Hansen @ 2014-08-14 16:36 UTC (permalink / raw)
  To: Frantisek Hrbata, linux-kernel
  Cc: linux-mm, tglx, mingo, hpa, x86, oleg, kamaleshb, hechjie, akpm,
	dvlasenk, prarit, lwoodman, hannsj_uhl

Thanks for dredging this back up!

On 08/14/2014 07:18 AM, Frantisek Hrbata wrote:
> +int valid_phys_addr_range(phys_addr_t addr, size_t count)
> +{
> +	return addr + count <= __pa(high_memory);
> +}

Is this correct on 32-bit?  It would limit /dev/mem to memory below 896MB.

> +int valid_mmap_phys_addr_range(unsigned long pfn, size_t count)
> +{

Nit: please add units to things like "count".  len_bytes would be nice
for this kind of thing, especially since it's passed *with* a pfn it
would be easy to think it is a count in pages.

> +	/* pgoff + count overflow is checked in do_mmap_pgoff */
> +	pfn += count >> PAGE_SHIFT;
> +
> +	if (pfn >> BITS_PER_LONG - PAGE_SHIFT)
> +		return -EOVERFLOW;

Is this -EOVERFLOW correct?  It is called like this:

> static int mmap_mem(struct file *file, struct vm_area_struct *vma)
> {
>         if (!valid_mmap_phys_addr_range(vma->vm_pgoff, size))
>                 return -EINVAL;

So I think we need to return true/false:0/1.  -EOVERFLOW would be true,
and that if() would pass.

> +	return phys_addr_valid(pfn << PAGE_SHIFT);
> +}

Maybe I'm dumb, but it took me a minute to figure out what you were
trying to do with the: "(pfn >> BITS_PER_LONG - PAGE_SHIFT)".  In any
case, I think it is wrong on 32-bit.

On 32-bit, BITS_PER_LONG=32, and PAGE_SIZE=12, and a paddr=0x100000000
or pfn=0x100000 (4GB) is perfectly valid with PAE enabled.  But, this
code pfn>>(32-12) would result in 0x1 and return -EOVERFLOW.

I think something like this would be easier to read and actually work on
32-bit:

static inline int arch_pfn_possible(unsigned long pfn)
{
 	unsigned long max_arch_pfn = 1UL << (boot_cpu_data.x86_phys_bits -
PAGE_SHIFT);
	return pfn < max_arch_pfn;
}

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/1] x86: add phys addr validity check for /dev/mem mmap
@ 2014-08-14 16:36     ` Dave Hansen
  0 siblings, 0 replies; 30+ messages in thread
From: Dave Hansen @ 2014-08-14 16:36 UTC (permalink / raw)
  To: Frantisek Hrbata, linux-kernel
  Cc: linux-mm, tglx, mingo, hpa, x86, oleg, kamaleshb, hechjie, akpm,
	dvlasenk, prarit, lwoodman, hannsj_uhl

Thanks for dredging this back up!

On 08/14/2014 07:18 AM, Frantisek Hrbata wrote:
> +int valid_phys_addr_range(phys_addr_t addr, size_t count)
> +{
> +	return addr + count <= __pa(high_memory);
> +}

Is this correct on 32-bit?  It would limit /dev/mem to memory below 896MB.

> +int valid_mmap_phys_addr_range(unsigned long pfn, size_t count)
> +{

Nit: please add units to things like "count".  len_bytes would be nice
for this kind of thing, especially since it's passed *with* a pfn it
would be easy to think it is a count in pages.

> +	/* pgoff + count overflow is checked in do_mmap_pgoff */
> +	pfn += count >> PAGE_SHIFT;
> +
> +	if (pfn >> BITS_PER_LONG - PAGE_SHIFT)
> +		return -EOVERFLOW;

Is this -EOVERFLOW correct?  It is called like this:

> static int mmap_mem(struct file *file, struct vm_area_struct *vma)
> {
>         if (!valid_mmap_phys_addr_range(vma->vm_pgoff, size))
>                 return -EINVAL;

So I think we need to return true/false:0/1.  -EOVERFLOW would be true,
and that if() would pass.

> +	return phys_addr_valid(pfn << PAGE_SHIFT);
> +}

Maybe I'm dumb, but it took me a minute to figure out what you were
trying to do with the: "(pfn >> BITS_PER_LONG - PAGE_SHIFT)".  In any
case, I think it is wrong on 32-bit.

On 32-bit, BITS_PER_LONG=32, and PAGE_SIZE=12, and a paddr=0x100000000
or pfn=0x100000 (4GB) is perfectly valid with PAE enabled.  But, this
code pfn>>(32-12) would result in 0x1 and return -EOVERFLOW.

I think something like this would be easier to read and actually work on
32-bit:

static inline int arch_pfn_possible(unsigned long pfn)
{
 	unsigned long max_arch_pfn = 1UL << (boot_cpu_data.x86_phys_bits -
PAGE_SHIFT);
	return pfn < max_arch_pfn;
}

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/1] x86: add phys addr validity check for /dev/mem mmap
  2014-08-14 16:36     ` Dave Hansen
@ 2014-08-14 17:20       ` H. Peter Anvin
  -1 siblings, 0 replies; 30+ messages in thread
From: H. Peter Anvin @ 2014-08-14 17:20 UTC (permalink / raw)
  To: Dave Hansen, Frantisek Hrbata, linux-kernel
  Cc: linux-mm, tglx, mingo, x86, oleg, kamaleshb, hechjie, akpm,
	dvlasenk, prarit, lwoodman, hannsj_uhl

On 08/14/2014 09:36 AM, Dave Hansen wrote:
> Thanks for dredging this back up!
> 
> On 08/14/2014 07:18 AM, Frantisek Hrbata wrote:
>> +int valid_phys_addr_range(phys_addr_t addr, size_t count)
>> +{
>> +	return addr + count <= __pa(high_memory);
>> +}
> 
> Is this correct on 32-bit?  It would limit /dev/mem to memory below 896MB.
> 

Last I checked, /dev/mem was already broken for highmem... which is
actually a problem.  I would prefer to see it fixed.

	-hpa


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/1] x86: add phys addr validity check for /dev/mem mmap
@ 2014-08-14 17:20       ` H. Peter Anvin
  0 siblings, 0 replies; 30+ messages in thread
From: H. Peter Anvin @ 2014-08-14 17:20 UTC (permalink / raw)
  To: Dave Hansen, Frantisek Hrbata, linux-kernel
  Cc: linux-mm, tglx, mingo, x86, oleg, kamaleshb, hechjie, akpm,
	dvlasenk, prarit, lwoodman, hannsj_uhl

On 08/14/2014 09:36 AM, Dave Hansen wrote:
> Thanks for dredging this back up!
> 
> On 08/14/2014 07:18 AM, Frantisek Hrbata wrote:
>> +int valid_phys_addr_range(phys_addr_t addr, size_t count)
>> +{
>> +	return addr + count <= __pa(high_memory);
>> +}
> 
> Is this correct on 32-bit?  It would limit /dev/mem to memory below 896MB.
> 

Last I checked, /dev/mem was already broken for highmem... which is
actually a problem.  I would prefer to see it fixed.

	-hpa

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/1] x86: add phys addr validity check for /dev/mem mmap
  2014-08-14 16:36     ` Dave Hansen
@ 2014-08-14 17:40       ` Frantisek Hrbata
  -1 siblings, 0 replies; 30+ messages in thread
From: Frantisek Hrbata @ 2014-08-14 17:40 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-kernel, linux-mm, tglx, mingo, hpa, x86, oleg, kamaleshb,
	hechjie, akpm, dvlasenk, prarit, lwoodman, hannsj_uhl

On Thu, Aug 14, 2014 at 09:36:03AM -0700, Dave Hansen wrote:
> Thanks for dredging this back up!
> 
> On 08/14/2014 07:18 AM, Frantisek Hrbata wrote:
> > +int valid_phys_addr_range(phys_addr_t addr, size_t count)
> > +{
> > +	return addr + count <= __pa(high_memory);
> > +}
> 
> Is this correct on 32-bit?  It would limit /dev/mem to memory below 896MB.

Unfortunatelly this is how it works right now. Please note that at this moment
x86 is using the default checks from drivers/char/mem.c. The
valid_phys_addr_range is used just for read/write. Meaning yes, you cannot access
/dev/mem above high_memory via read/write, which is 896MB on x86_32.

I simply copied this generic check. There is no change compared to the current
behaviour.

BTW I think this can be simply fixed by moving the high_memory check directly to
the xlate_dev_mem_ptr function.

Something like the following. Please note this is not even compile tested.

diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index baff1da..7ebc241 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -321,7 +321,7 @@ void *xlate_dev_mem_ptr(unsigned long phys)
 	unsigned long start = phys & PAGE_MASK;
 
 	/* If page is RAM, we can use __va. Otherwise ioremap and unmap. */
-	if (page_is_ram(start >> PAGE_SHIFT))
+	if (page_is_ram(start >> PAGE_SHIFT) && phys <= __pa(high_memory))
 		return __va(phys);
 
 	addr = (void __force *)ioremap_cache(start, PAGE_SIZE);
@@ -333,7 +333,7 @@ void *xlate_dev_mem_ptr(unsigned long phys)
 
 void unxlate_dev_mem_ptr(unsigned long phys, void *addr)
 {
-	if (page_is_ram(phys >> PAGE_SHIFT))
+	if (page_is_ram(phys >> PAGE_SHIFT) && phys <= __pa(high_memory))
 		return;
 
 	iounmap((void __iomem *)((unsigned long)addr & PAGE_MASK));


IIUIC the whole high_memory check is here only because of the kernel identity
mapping and as a generic check because of the generic xlate_dev_mem_ptr.

include/asm-generic/io.h: #define xlate_dev_mem_ptr(p)    __va(p)

> 
> > +int valid_mmap_phys_addr_range(unsigned long pfn, size_t count)
> > +{
> 
> Nit: please add units to things like "count".  len_bytes would be nice
> for this kind of thing, especially since it's passed *with* a pfn it
> would be easy to think it is a count in pages.

Sure I have no problem with this. But please note that I just took the already
used/presented interface from drivers/char/mem.c.

> 
> > +	/* pgoff + count overflow is checked in do_mmap_pgoff */
> > +	pfn += count >> PAGE_SHIFT;
> > +
> > +	if (pfn >> BITS_PER_LONG - PAGE_SHIFT)
> > +		return -EOVERFLOW;
> 
> Is this -EOVERFLOW correct?  It is called like this:
> 
> > static int mmap_mem(struct file *file, struct vm_area_struct *vma)
> > {
> >         if (!valid_mmap_phys_addr_range(vma->vm_pgoff, size))
> >                 return -EINVAL;
> 
> So I think we need to return true/false:0/1.  -EOVERFLOW would be true,
> and that if() would pass.

Facepalm, sure this is completely wrong. We of course need to return zero. I
thought it would be more descriptive to use -EOVERFLOW, even though we get
-EINVAL in the end. I will fix this. Many thanks for pointing this out.

> 
> > +	return phys_addr_valid(pfn << PAGE_SHIFT);
> > +}
> 
> Maybe I'm dumb, but it took me a minute to figure out what you were
> trying to do with the: "(pfn >> BITS_PER_LONG - PAGE_SHIFT)".  In any
> case, I think it is wrong on 32-bit.
> 
> On 32-bit, BITS_PER_LONG=32, and PAGE_SIZE=12, and a paddr=0x100000000
> or pfn=0x100000 (4GB) is perfectly valid with PAE enabled.  But, this
> code pfn>>(32-12) would result in 0x1 and return -EOVERFLOW.

Right, I did not realized this.

> 
> I think something like this would be easier to read and actually work on
> 32-bit:
> 
> static inline int arch_pfn_possible(unsigned long pfn)
> {
>  	unsigned long max_arch_pfn = 1UL << (boot_cpu_data.x86_phys_bits -
> PAGE_SHIFT);
> 	return pfn < max_arch_pfn;
> }

Actually I wanted to use exactly this instead of calling phys_addr_valid, because
we can avoid the whole overflow check, but it seemed natural to use what is
already available. But you are right for sure. This needs to be fixed also.

Dave, many thanks for your feedback, it's really appreciated!

-- 
Frantisek Hrbata

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/1] x86: add phys addr validity check for /dev/mem mmap
@ 2014-08-14 17:40       ` Frantisek Hrbata
  0 siblings, 0 replies; 30+ messages in thread
From: Frantisek Hrbata @ 2014-08-14 17:40 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-kernel, linux-mm, tglx, mingo, hpa, x86, oleg, kamaleshb,
	hechjie, akpm, dvlasenk, prarit, lwoodman, hannsj_uhl

On Thu, Aug 14, 2014 at 09:36:03AM -0700, Dave Hansen wrote:
> Thanks for dredging this back up!
> 
> On 08/14/2014 07:18 AM, Frantisek Hrbata wrote:
> > +int valid_phys_addr_range(phys_addr_t addr, size_t count)
> > +{
> > +	return addr + count <= __pa(high_memory);
> > +}
> 
> Is this correct on 32-bit?  It would limit /dev/mem to memory below 896MB.

Unfortunatelly this is how it works right now. Please note that at this moment
x86 is using the default checks from drivers/char/mem.c. The
valid_phys_addr_range is used just for read/write. Meaning yes, you cannot access
/dev/mem above high_memory via read/write, which is 896MB on x86_32.

I simply copied this generic check. There is no change compared to the current
behaviour.

BTW I think this can be simply fixed by moving the high_memory check directly to
the xlate_dev_mem_ptr function.

Something like the following. Please note this is not even compile tested.

diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index baff1da..7ebc241 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -321,7 +321,7 @@ void *xlate_dev_mem_ptr(unsigned long phys)
 	unsigned long start = phys & PAGE_MASK;
 
 	/* If page is RAM, we can use __va. Otherwise ioremap and unmap. */
-	if (page_is_ram(start >> PAGE_SHIFT))
+	if (page_is_ram(start >> PAGE_SHIFT) && phys <= __pa(high_memory))
 		return __va(phys);
 
 	addr = (void __force *)ioremap_cache(start, PAGE_SIZE);
@@ -333,7 +333,7 @@ void *xlate_dev_mem_ptr(unsigned long phys)
 
 void unxlate_dev_mem_ptr(unsigned long phys, void *addr)
 {
-	if (page_is_ram(phys >> PAGE_SHIFT))
+	if (page_is_ram(phys >> PAGE_SHIFT) && phys <= __pa(high_memory))
 		return;
 
 	iounmap((void __iomem *)((unsigned long)addr & PAGE_MASK));


IIUIC the whole high_memory check is here only because of the kernel identity
mapping and as a generic check because of the generic xlate_dev_mem_ptr.

include/asm-generic/io.h: #define xlate_dev_mem_ptr(p)    __va(p)

> 
> > +int valid_mmap_phys_addr_range(unsigned long pfn, size_t count)
> > +{
> 
> Nit: please add units to things like "count".  len_bytes would be nice
> for this kind of thing, especially since it's passed *with* a pfn it
> would be easy to think it is a count in pages.

Sure I have no problem with this. But please note that I just took the already
used/presented interface from drivers/char/mem.c.

> 
> > +	/* pgoff + count overflow is checked in do_mmap_pgoff */
> > +	pfn += count >> PAGE_SHIFT;
> > +
> > +	if (pfn >> BITS_PER_LONG - PAGE_SHIFT)
> > +		return -EOVERFLOW;
> 
> Is this -EOVERFLOW correct?  It is called like this:
> 
> > static int mmap_mem(struct file *file, struct vm_area_struct *vma)
> > {
> >         if (!valid_mmap_phys_addr_range(vma->vm_pgoff, size))
> >                 return -EINVAL;
> 
> So I think we need to return true/false:0/1.  -EOVERFLOW would be true,
> and that if() would pass.

Facepalm, sure this is completely wrong. We of course need to return zero. I
thought it would be more descriptive to use -EOVERFLOW, even though we get
-EINVAL in the end. I will fix this. Many thanks for pointing this out.

> 
> > +	return phys_addr_valid(pfn << PAGE_SHIFT);
> > +}
> 
> Maybe I'm dumb, but it took me a minute to figure out what you were
> trying to do with the: "(pfn >> BITS_PER_LONG - PAGE_SHIFT)".  In any
> case, I think it is wrong on 32-bit.
> 
> On 32-bit, BITS_PER_LONG=32, and PAGE_SIZE=12, and a paddr=0x100000000
> or pfn=0x100000 (4GB) is perfectly valid with PAE enabled.  But, this
> code pfn>>(32-12) would result in 0x1 and return -EOVERFLOW.

Right, I did not realized this.

> 
> I think something like this would be easier to read and actually work on
> 32-bit:
> 
> static inline int arch_pfn_possible(unsigned long pfn)
> {
>  	unsigned long max_arch_pfn = 1UL << (boot_cpu_data.x86_phys_bits -
> PAGE_SHIFT);
> 	return pfn < max_arch_pfn;
> }

Actually I wanted to use exactly this instead of calling phys_addr_valid, because
we can avoid the whole overflow check, but it seemed natural to use what is
already available. But you are right for sure. This needs to be fixed also.

Dave, many thanks for your feedback, it's really appreciated!

-- 
Frantisek Hrbata

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/1] x86: add phys addr validity check for /dev/mem mmap
  2014-08-14 17:20       ` H. Peter Anvin
@ 2014-08-14 17:53         ` Frantisek Hrbata
  -1 siblings, 0 replies; 30+ messages in thread
From: Frantisek Hrbata @ 2014-08-14 17:53 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Dave Hansen, linux-kernel, linux-mm, tglx, mingo, x86, oleg,
	kamaleshb, hechjie, akpm, dvlasenk, prarit, lwoodman, hannsj_uhl

On Thu, Aug 14, 2014 at 10:20:53AM -0700, H. Peter Anvin wrote:
> On 08/14/2014 09:36 AM, Dave Hansen wrote:
> > Thanks for dredging this back up!
> > 
> > On 08/14/2014 07:18 AM, Frantisek Hrbata wrote:
> >> +int valid_phys_addr_range(phys_addr_t addr, size_t count)
> >> +{
> >> +	return addr + count <= __pa(high_memory);
> >> +}
> > 
> > Is this correct on 32-bit?  It would limit /dev/mem to memory below 896MB.
> > 
> 
> Last I checked, /dev/mem was already broken for highmem... which is
> actually a problem.  I would prefer to see it fixed.
> 
> 	-hpa
> 

Hi Peter,

thank you for jumping in again. I'm not going to pretent I fully understand this
code, as proven by Dave :), but wouldn't it help just to move the high_memory
check directly to the xlate_dev_mem_ptr. Meaning no high_memory check in
valid_phys_addr_range for x86. I sent more info in my reply to Dave's mail.

Again many thanks.


-- 
Frantisek Hrbata

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/1] x86: add phys addr validity check for /dev/mem mmap
@ 2014-08-14 17:53         ` Frantisek Hrbata
  0 siblings, 0 replies; 30+ messages in thread
From: Frantisek Hrbata @ 2014-08-14 17:53 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Dave Hansen, linux-kernel, linux-mm, tglx, mingo, x86, oleg,
	kamaleshb, hechjie, akpm, dvlasenk, prarit, lwoodman, hannsj_uhl

On Thu, Aug 14, 2014 at 10:20:53AM -0700, H. Peter Anvin wrote:
> On 08/14/2014 09:36 AM, Dave Hansen wrote:
> > Thanks for dredging this back up!
> > 
> > On 08/14/2014 07:18 AM, Frantisek Hrbata wrote:
> >> +int valid_phys_addr_range(phys_addr_t addr, size_t count)
> >> +{
> >> +	return addr + count <= __pa(high_memory);
> >> +}
> > 
> > Is this correct on 32-bit?  It would limit /dev/mem to memory below 896MB.
> > 
> 
> Last I checked, /dev/mem was already broken for highmem... which is
> actually a problem.  I would prefer to see it fixed.
> 
> 	-hpa
> 

Hi Peter,

thank you for jumping in again. I'm not going to pretent I fully understand this
code, as proven by Dave :), but wouldn't it help just to move the high_memory
check directly to the xlate_dev_mem_ptr. Meaning no high_memory check in
valid_phys_addr_range for x86. I sent more info in my reply to Dave's mail.

Again many thanks.


-- 
Frantisek Hrbata

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/1] x86: add phys addr validity check for /dev/mem mmap
  2014-08-14 14:18   ` Frantisek Hrbata
@ 2014-08-15 10:17     ` Frantisek Hrbata
  -1 siblings, 0 replies; 30+ messages in thread
From: Frantisek Hrbata @ 2014-08-15 10:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, tglx, mingo, hpa, x86, oleg, kamaleshb, hechjie, akpm,
	dave.hansen, dvlasenk, prarit, lwoodman, hannsj_uhl

self-nack

As pointed by Dave Hansen, the check is just wrong. I will post V2.

Many thanks Dave!

-- 
Frantisek Hrbata

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/1] x86: add phys addr validity check for /dev/mem mmap
@ 2014-08-15 10:17     ` Frantisek Hrbata
  0 siblings, 0 replies; 30+ messages in thread
From: Frantisek Hrbata @ 2014-08-15 10:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, tglx, mingo, hpa, x86, oleg, kamaleshb, hechjie, akpm,
	dave.hansen, dvlasenk, prarit, lwoodman, hannsj_uhl

self-nack

As pointed by Dave Hansen, the check is just wrong. I will post V2.

Many thanks Dave!

-- 
Frantisek Hrbata

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH V2 0/2] Prevent possible PTE corruption with /dev/mem mmap
  2014-08-14 14:18 ` Frantisek Hrbata
@ 2014-08-15 11:44   ` Frantisek Hrbata
  -1 siblings, 0 replies; 30+ messages in thread
From: Frantisek Hrbata @ 2014-08-15 11:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, tglx, mingo, hpa, x86, oleg, kamaleshb, hechjie, akpm,
	dave.hansen, dvlasenk, prarit, lwoodman, hannsj_uhl

Thanks to Dave Hansen for pointing out problems related to the pfn check in the
first version. I tried to fix it by adding a new arch_pfn_possible helper to
arch/x86/mm/physaddr.h. Please note that I'm not quite sure about the name and
the location(physaddr.h). Maybe we can keep the check directly in the
valid_mmap_phys_addr_range. I will leave this to a discussion and fix it if
required.

Question: Do we need the CONFIG_PHYS_ADDR_T_64BIT ifdef? The
boot_cpu_data.x86_phys_bits is set for all x86. So at this point it seems to me
more like an "optimization" for x86_32 or something kept from historic
reasons. I'm just curious and I of course may be missing something.

Many thanks

Frantisek Hrbata (2):
  x86: add arch_pfn_possible helper
  x86: add phys addr validity check for /dev/mem mmap

 arch/x86/include/asm/io.h |  4 ++++
 arch/x86/mm/mmap.c        | 12 ++++++++++++
 arch/x86/mm/physaddr.h    |  9 +++++++--
 3 files changed, 23 insertions(+), 2 deletions(-)

-- 
1.9.3


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH V2 0/2] Prevent possible PTE corruption with /dev/mem mmap
@ 2014-08-15 11:44   ` Frantisek Hrbata
  0 siblings, 0 replies; 30+ messages in thread
From: Frantisek Hrbata @ 2014-08-15 11:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, tglx, mingo, hpa, x86, oleg, kamaleshb, hechjie, akpm,
	dave.hansen, dvlasenk, prarit, lwoodman, hannsj_uhl

Thanks to Dave Hansen for pointing out problems related to the pfn check in the
first version. I tried to fix it by adding a new arch_pfn_possible helper to
arch/x86/mm/physaddr.h. Please note that I'm not quite sure about the name and
the location(physaddr.h). Maybe we can keep the check directly in the
valid_mmap_phys_addr_range. I will leave this to a discussion and fix it if
required.

Question: Do we need the CONFIG_PHYS_ADDR_T_64BIT ifdef? The
boot_cpu_data.x86_phys_bits is set for all x86. So at this point it seems to me
more like an "optimization" for x86_32 or something kept from historic
reasons. I'm just curious and I of course may be missing something.

Many thanks

Frantisek Hrbata (2):
  x86: add arch_pfn_possible helper
  x86: add phys addr validity check for /dev/mem mmap

 arch/x86/include/asm/io.h |  4 ++++
 arch/x86/mm/mmap.c        | 12 ++++++++++++
 arch/x86/mm/physaddr.h    |  9 +++++++--
 3 files changed, 23 insertions(+), 2 deletions(-)

-- 
1.9.3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH V2 1/2] x86: add arch_pfn_possible helper
  2014-08-15 11:44   ` Frantisek Hrbata
@ 2014-08-15 11:44     ` Frantisek Hrbata
  -1 siblings, 0 replies; 30+ messages in thread
From: Frantisek Hrbata @ 2014-08-15 11:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, tglx, mingo, hpa, x86, oleg, kamaleshb, hechjie, akpm,
	dave.hansen, dvlasenk, prarit, lwoodman, hannsj_uhl

Add helper to check maximum possible pfn on x86. Also make the current
phys_addr_valid helper using it internally.

Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com>
---
 arch/x86/mm/physaddr.h | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/physaddr.h b/arch/x86/mm/physaddr.h
index a3cd5a0..9df8e3a 100644
--- a/arch/x86/mm/physaddr.h
+++ b/arch/x86/mm/physaddr.h
@@ -1,10 +1,15 @@
 #include <asm/processor.h>
 
-static inline int phys_addr_valid(resource_size_t addr)
+static inline int arch_pfn_possible(unsigned long pfn)
 {
 #ifdef CONFIG_PHYS_ADDR_T_64BIT
-	return !(addr >> boot_cpu_data.x86_phys_bits);
+	return !(pfn >> (boot_cpu_data.x86_phys_bits - PAGE_SHIFT));
 #else
 	return 1;
 #endif
 }
+
+static inline int phys_addr_valid(resource_size_t addr)
+{
+	return arch_pfn_possible(addr >> PAGE_SHIFT);
+}
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH V2 1/2] x86: add arch_pfn_possible helper
@ 2014-08-15 11:44     ` Frantisek Hrbata
  0 siblings, 0 replies; 30+ messages in thread
From: Frantisek Hrbata @ 2014-08-15 11:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, tglx, mingo, hpa, x86, oleg, kamaleshb, hechjie, akpm,
	dave.hansen, dvlasenk, prarit, lwoodman, hannsj_uhl

Add helper to check maximum possible pfn on x86. Also make the current
phys_addr_valid helper using it internally.

Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com>
---
 arch/x86/mm/physaddr.h | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/physaddr.h b/arch/x86/mm/physaddr.h
index a3cd5a0..9df8e3a 100644
--- a/arch/x86/mm/physaddr.h
+++ b/arch/x86/mm/physaddr.h
@@ -1,10 +1,15 @@
 #include <asm/processor.h>
 
-static inline int phys_addr_valid(resource_size_t addr)
+static inline int arch_pfn_possible(unsigned long pfn)
 {
 #ifdef CONFIG_PHYS_ADDR_T_64BIT
-	return !(addr >> boot_cpu_data.x86_phys_bits);
+	return !(pfn >> (boot_cpu_data.x86_phys_bits - PAGE_SHIFT));
 #else
 	return 1;
 #endif
 }
+
+static inline int phys_addr_valid(resource_size_t addr)
+{
+	return arch_pfn_possible(addr >> PAGE_SHIFT);
+}
-- 
1.9.3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH V2 2/2] x86: add phys addr validity check for /dev/mem mmap
  2014-08-15 11:44   ` Frantisek Hrbata
@ 2014-08-15 11:44     ` Frantisek Hrbata
  -1 siblings, 0 replies; 30+ messages in thread
From: Frantisek Hrbata @ 2014-08-15 11:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, tglx, mingo, hpa, x86, oleg, kamaleshb, hechjie, akpm,
	dave.hansen, dvlasenk, prarit, lwoodman, hannsj_uhl

Prevent possible PTE corruption while calling mmap on /dev/mem with large
offset.

oops info, please note the PTE value 8008000000000225.
---------------------------------8<--------------------------------------
[85739.124496] rep: Corrupted page table at address 7f63852f8000
[85739.130242] PGD ba2eb067 PUD b99c1067 PMD a2fa5067 PTE 8008000000000225
[85739.136941] Bad pagetable: 000d [#1] SMP
[85739.141002] Modules linked in: cfg80211 rfkill x86_pkg_temp_thermal coretemp
kvm_intel kvm bnx2 crct10dif_pclmul crc32_pclmul crc32c_intel
ghash_clmulni_intel microcode iTCO_wdt ipmi_si i2c_i801 iTCO_vendor_support
ipmi_msghandler dcdbas shpchp lpc_ich mfd_core nfsd auth_rpcgss nfs_acl lockd
sunrpc mgag200 i2c_algo_bit drm_kms_helper ttm drm i2c_core
[85739.172620] CPU: 3 PID: 21900 Comm: rep Not tainted 3.15.8-200.fc20.x86_64 #1
[85739.179768] Hardware name: Dell Inc. PowerEdge R210 II/09T7VV, BIOS 2.0.4 02/29/2012
[85739.187512] task: ffff8800b9b3b160 ti: ffff8800ba270000 task.ti: ffff8800ba270000
[85739.194988] RIP: 0033:[<0000000000400773>]  [<0000000000400773>] 0x400773
[85739.201799] RSP: 002b:00007fffe4ca3c80  EFLAGS: 00010213
[85739.207119] RAX: 00007f63852f8000 RBX: 0000000000000000 RCX: 00007f6384e0b8ca
[85739.214249] RDX: 0000000000000001 RSI: 0000000000001000 RDI: 0000000000000000
[85739.221407] RBP: 00007fffe4ca3cc0 R08: 0000000000000003 R09: 0008000000000000
[85739.228545] R10: 0000000000000001 R11: 0000000000000206 R12: 00000000004005b0
[85739.235676] R13: 00007fffe4ca3da0 R14: 0000000000000000 R15: 0000000000000000
[85739.242835] FS:  00007f63852ea740(0000) GS:ffff88013fcc0000(0000) knlGS:0000000000000000
[85739.250925] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[85739.256669] CR2: 00007f63852f8000 CR3: 00000000b9ba0000 CR4: 00000000001407e0
---------------------------------8<--------------------------------------

According to [1] Chapter 4 Paging, some higher bits in 64bit
PTE(X86_64 || X86_32_PAE) are reserved and have to be set to zero. For example,
for IA-32e and 4KB page [1] 4.5 IA-32e Paging: Table 4-19, bits 51-M(MAXPHYADDR)
are reserved. So for a CPU with e.g. 48bit phys addr width, bits 51-48 have to
be zero. If one of the reserved bits is set, [1] 4.7 Page-Fault Exceptions,
the #PF is generated with RSVD error code.

<quote>
RSVD flag (bit 3).
This flag is 1 if there is no valid translation for the linear address because a
reserved bit was set in one of the paging-structure entries used to translate
that address. (Because reserved bits are not checked in a paging-structure entry
whose P flag is 0, bit 3 of the error code can be set only if bit 0 is also
set.)
</quote>

In mmap_mem() the first check is valid_mmap_phys_addr_range(), but it always
returns 1 for x86. So it's possible to use any pgoff we want and
to set the PTE's reserved bits in remap_pfn_range(). Meaning there is a
possibility to use mmap on /dev/mem and cause system panic. It's probably
not that serious, because access to /dev/mem is limited and the system has
to have the panic_on_oops set, but still I think we should check this and
return error.

The path for this problem is:
mmap_mem() => remap_pfn_range() => page present => touch page => tlb miss =>
walk through paging structures => reserved bit set => #pf with rsvd flag

This patch adds check for x86. With this fix mmap returns -EINVAL if the
requested phys addr is larger then the supported phys addr width.

[1] Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3A

x86_64 reproducer
---------------------------------8<--------------------------------------
 #include <stdio.h>
 #include <unistd.h>
 #include <sys/types.h>
 #include <sys/stat.h>
 #include <fcntl.h>
 #include <err.h>
 #include <stdlib.h>
 #include <sys/mman.h>

 #define die(fmt, ...) err(1, fmt, ##__VA_ARGS__)

 #define OFFSET 0x8000000000000LL

int main(int argc, char *argv[])
{
	int fd;
	long ps;
	long pgoff;
	char *map;
	char c;

	ps = sysconf(_SC_PAGE_SIZE);
	if (ps == -1)
		die("cannot get page size");

	fd = open("/dev/mem", O_RDONLY);
	if (fd == -1)
		die("cannot open /dev/mem");

	pgoff = (OFFSET + (ps - 1)) & ~(ps - 1);

	map = mmap(NULL, ps, PROT_READ, MAP_SHARED, fd, pgoff);
	if (map == MAP_FAILED)
		die("cannot mmap");

	c = map[0];

	if (munmap(map, ps) == -1)
		die("cannot munmap");

	if (close(fd) == -1)
		die("cannot close");

	return 0;
}
---------------------------------8<--------------------------------------

x86_32_PAE reproducer
---------------------------------8<--------------------------------------
 #define _GNU_SOURCE
 #define _LARGEFILE64_SOURCE
 #include <unistd.h>
 #include <sys/syscall.h>
 #include <stdio.h>
 #include <unistd.h>
 #include <sys/types.h>
 #include <sys/stat.h>
 #include <fcntl.h>
 #include <err.h>
 #include <stdlib.h>
 #include <sys/mman.h>

 #define die(fmt, ...) err(1, fmt, ##__VA_ARGS__)

 /* 37th bit in PTE */
 #define OFFSET 0x2000000

int main(int argc, char *argv[])
{
	int fd;
	long ps;
	char *map;
	char c;

	ps = sysconf(_SC_PAGE_SIZE);
	if (ps == -1)
		die("cannot get page size");

	fd = open("/dev/mem", O_RDONLY|O_LARGEFILE);
	if (fd == -1)
		die("cannot open /dev/mem");

	map = (char *)syscall(SYS_mmap2, NULL, ps, PROT_READ, MAP_SHARED, fd, OFFSET);
	if (map == MAP_FAILED)
		die("cannot mmap");

	c = map[0];

	if (munmap(map, ps) == -1)
		die("cannot munmap");

	if (close(fd) == -1)
		die("cannot close");

	return 0;
}
---------------------------------8<--------------------------------------

V2: fix pfn check in valid_mmap_phys_addr_range, thanks to Dave Hansen

Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com>
---
 arch/x86/include/asm/io.h |  4 ++++
 arch/x86/mm/mmap.c        | 12 ++++++++++++
 2 files changed, 16 insertions(+)

diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index b8237d8..55c59d5 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -243,6 +243,10 @@ static inline void flush_write_buffers(void)
 #endif
 }
 
+#define ARCH_HAS_VALID_PHYS_ADDR_RANGE
+extern int valid_phys_addr_range(phys_addr_t addr, size_t count);
+extern int valid_mmap_phys_addr_range(unsigned long pfn, size_t count);
+
 #endif /* __KERNEL__ */
 
 extern void native_io_delay(void);
diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
index 25e7e13..110473e 100644
--- a/arch/x86/mm/mmap.c
+++ b/arch/x86/mm/mmap.c
@@ -31,6 +31,8 @@
 #include <linux/sched.h>
 #include <asm/elf.h>
 
+#include "physaddr.h"
+
 struct __read_mostly va_alignment va_align = {
 	.flags = -1,
 };
@@ -122,3 +124,13 @@ void arch_pick_mmap_layout(struct mm_struct *mm)
 		mm->get_unmapped_area = arch_get_unmapped_area_topdown;
 	}
 }
+
+int valid_phys_addr_range(phys_addr_t addr, size_t count)
+{
+	return addr + count <= __pa(high_memory);
+}
+
+int valid_mmap_phys_addr_range(unsigned long pfn, size_t count)
+{
+	return arch_pfn_possible(pfn + (count >> PAGE_SHIFT));
+}
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH V2 2/2] x86: add phys addr validity check for /dev/mem mmap
@ 2014-08-15 11:44     ` Frantisek Hrbata
  0 siblings, 0 replies; 30+ messages in thread
From: Frantisek Hrbata @ 2014-08-15 11:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, tglx, mingo, hpa, x86, oleg, kamaleshb, hechjie, akpm,
	dave.hansen, dvlasenk, prarit, lwoodman, hannsj_uhl

Prevent possible PTE corruption while calling mmap on /dev/mem with large
offset.

oops info, please note the PTE value 8008000000000225.
---------------------------------8<--------------------------------------
[85739.124496] rep: Corrupted page table at address 7f63852f8000
[85739.130242] PGD ba2eb067 PUD b99c1067 PMD a2fa5067 PTE 8008000000000225
[85739.136941] Bad pagetable: 000d [#1] SMP
[85739.141002] Modules linked in: cfg80211 rfkill x86_pkg_temp_thermal coretemp
kvm_intel kvm bnx2 crct10dif_pclmul crc32_pclmul crc32c_intel
ghash_clmulni_intel microcode iTCO_wdt ipmi_si i2c_i801 iTCO_vendor_support
ipmi_msghandler dcdbas shpchp lpc_ich mfd_core nfsd auth_rpcgss nfs_acl lockd
sunrpc mgag200 i2c_algo_bit drm_kms_helper ttm drm i2c_core
[85739.172620] CPU: 3 PID: 21900 Comm: rep Not tainted 3.15.8-200.fc20.x86_64 #1
[85739.179768] Hardware name: Dell Inc. PowerEdge R210 II/09T7VV, BIOS 2.0.4 02/29/2012
[85739.187512] task: ffff8800b9b3b160 ti: ffff8800ba270000 task.ti: ffff8800ba270000
[85739.194988] RIP: 0033:[<0000000000400773>]  [<0000000000400773>] 0x400773
[85739.201799] RSP: 002b:00007fffe4ca3c80  EFLAGS: 00010213
[85739.207119] RAX: 00007f63852f8000 RBX: 0000000000000000 RCX: 00007f6384e0b8ca
[85739.214249] RDX: 0000000000000001 RSI: 0000000000001000 RDI: 0000000000000000
[85739.221407] RBP: 00007fffe4ca3cc0 R08: 0000000000000003 R09: 0008000000000000
[85739.228545] R10: 0000000000000001 R11: 0000000000000206 R12: 00000000004005b0
[85739.235676] R13: 00007fffe4ca3da0 R14: 0000000000000000 R15: 0000000000000000
[85739.242835] FS:  00007f63852ea740(0000) GS:ffff88013fcc0000(0000) knlGS:0000000000000000
[85739.250925] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[85739.256669] CR2: 00007f63852f8000 CR3: 00000000b9ba0000 CR4: 00000000001407e0
---------------------------------8<--------------------------------------

According to [1] Chapter 4 Paging, some higher bits in 64bit
PTE(X86_64 || X86_32_PAE) are reserved and have to be set to zero. For example,
for IA-32e and 4KB page [1] 4.5 IA-32e Paging: Table 4-19, bits 51-M(MAXPHYADDR)
are reserved. So for a CPU with e.g. 48bit phys addr width, bits 51-48 have to
be zero. If one of the reserved bits is set, [1] 4.7 Page-Fault Exceptions,
the #PF is generated with RSVD error code.

<quote>
RSVD flag (bit 3).
This flag is 1 if there is no valid translation for the linear address because a
reserved bit was set in one of the paging-structure entries used to translate
that address. (Because reserved bits are not checked in a paging-structure entry
whose P flag is 0, bit 3 of the error code can be set only if bit 0 is also
set.)
</quote>

In mmap_mem() the first check is valid_mmap_phys_addr_range(), but it always
returns 1 for x86. So it's possible to use any pgoff we want and
to set the PTE's reserved bits in remap_pfn_range(). Meaning there is a
possibility to use mmap on /dev/mem and cause system panic. It's probably
not that serious, because access to /dev/mem is limited and the system has
to have the panic_on_oops set, but still I think we should check this and
return error.

The path for this problem is:
mmap_mem() => remap_pfn_range() => page present => touch page => tlb miss =>
walk through paging structures => reserved bit set => #pf with rsvd flag

This patch adds check for x86. With this fix mmap returns -EINVAL if the
requested phys addr is larger then the supported phys addr width.

[1] Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3A

x86_64 reproducer
---------------------------------8<--------------------------------------
 #include <stdio.h>
 #include <unistd.h>
 #include <sys/types.h>
 #include <sys/stat.h>
 #include <fcntl.h>
 #include <err.h>
 #include <stdlib.h>
 #include <sys/mman.h>

 #define die(fmt, ...) err(1, fmt, ##__VA_ARGS__)

 #define OFFSET 0x8000000000000LL

int main(int argc, char *argv[])
{
	int fd;
	long ps;
	long pgoff;
	char *map;
	char c;

	ps = sysconf(_SC_PAGE_SIZE);
	if (ps == -1)
		die("cannot get page size");

	fd = open("/dev/mem", O_RDONLY);
	if (fd == -1)
		die("cannot open /dev/mem");

	pgoff = (OFFSET + (ps - 1)) & ~(ps - 1);

	map = mmap(NULL, ps, PROT_READ, MAP_SHARED, fd, pgoff);
	if (map == MAP_FAILED)
		die("cannot mmap");

	c = map[0];

	if (munmap(map, ps) == -1)
		die("cannot munmap");

	if (close(fd) == -1)
		die("cannot close");

	return 0;
}
---------------------------------8<--------------------------------------

x86_32_PAE reproducer
---------------------------------8<--------------------------------------
 #define _GNU_SOURCE
 #define _LARGEFILE64_SOURCE
 #include <unistd.h>
 #include <sys/syscall.h>
 #include <stdio.h>
 #include <unistd.h>
 #include <sys/types.h>
 #include <sys/stat.h>
 #include <fcntl.h>
 #include <err.h>
 #include <stdlib.h>
 #include <sys/mman.h>

 #define die(fmt, ...) err(1, fmt, ##__VA_ARGS__)

 /* 37th bit in PTE */
 #define OFFSET 0x2000000

int main(int argc, char *argv[])
{
	int fd;
	long ps;
	char *map;
	char c;

	ps = sysconf(_SC_PAGE_SIZE);
	if (ps == -1)
		die("cannot get page size");

	fd = open("/dev/mem", O_RDONLY|O_LARGEFILE);
	if (fd == -1)
		die("cannot open /dev/mem");

	map = (char *)syscall(SYS_mmap2, NULL, ps, PROT_READ, MAP_SHARED, fd, OFFSET);
	if (map == MAP_FAILED)
		die("cannot mmap");

	c = map[0];

	if (munmap(map, ps) == -1)
		die("cannot munmap");

	if (close(fd) == -1)
		die("cannot close");

	return 0;
}
---------------------------------8<--------------------------------------

V2: fix pfn check in valid_mmap_phys_addr_range, thanks to Dave Hansen

Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com>
---
 arch/x86/include/asm/io.h |  4 ++++
 arch/x86/mm/mmap.c        | 12 ++++++++++++
 2 files changed, 16 insertions(+)

diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index b8237d8..55c59d5 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -243,6 +243,10 @@ static inline void flush_write_buffers(void)
 #endif
 }
 
+#define ARCH_HAS_VALID_PHYS_ADDR_RANGE
+extern int valid_phys_addr_range(phys_addr_t addr, size_t count);
+extern int valid_mmap_phys_addr_range(unsigned long pfn, size_t count);
+
 #endif /* __KERNEL__ */
 
 extern void native_io_delay(void);
diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
index 25e7e13..110473e 100644
--- a/arch/x86/mm/mmap.c
+++ b/arch/x86/mm/mmap.c
@@ -31,6 +31,8 @@
 #include <linux/sched.h>
 #include <asm/elf.h>
 
+#include "physaddr.h"
+
 struct __read_mostly va_alignment va_align = {
 	.flags = -1,
 };
@@ -122,3 +124,13 @@ void arch_pick_mmap_layout(struct mm_struct *mm)
 		mm->get_unmapped_area = arch_get_unmapped_area_topdown;
 	}
 }
+
+int valid_phys_addr_range(phys_addr_t addr, size_t count)
+{
+	return addr + count <= __pa(high_memory);
+}
+
+int valid_mmap_phys_addr_range(unsigned long pfn, size_t count)
+{
+	return arch_pfn_possible(pfn + (count >> PAGE_SHIFT));
+}
-- 
1.9.3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH V2 2/2] x86: add phys addr validity check for /dev/mem mmap
  2014-08-15 11:44     ` Frantisek Hrbata
@ 2014-08-15 18:10       ` Dave Hansen
  -1 siblings, 0 replies; 30+ messages in thread
From: Dave Hansen @ 2014-08-15 18:10 UTC (permalink / raw)
  To: Frantisek Hrbata, linux-kernel
  Cc: linux-mm, tglx, mingo, hpa, x86, oleg, kamaleshb, hechjie, akpm,
	dvlasenk, prarit, lwoodman, hannsj_uhl

On 08/15/2014 04:44 AM, Frantisek Hrbata wrote:
> +int valid_phys_addr_range(phys_addr_t addr, size_t count)
> +{
> +	return addr + count <= __pa(high_memory);
> +}
> +
> +int valid_mmap_phys_addr_range(unsigned long pfn, size_t count)
> +{
> +	return arch_pfn_possible(pfn + (count >> PAGE_SHIFT));
> +}

It definitely fixes the issue as you described it.

It's a bit unfortunate that the highmem check isn't tied in to the
_existing_ /dev/mem limitations in some way, but it's not a deal breaker
for me.

The only other thing is to make sure this doesn't add some limitation to
64-bit where we can't map things above the end of memory (end of memory
== high_memory on 64-bit).  As long as you've done this, I can't see a
downside.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V2 2/2] x86: add phys addr validity check for /dev/mem mmap
@ 2014-08-15 18:10       ` Dave Hansen
  0 siblings, 0 replies; 30+ messages in thread
From: Dave Hansen @ 2014-08-15 18:10 UTC (permalink / raw)
  To: Frantisek Hrbata, linux-kernel
  Cc: linux-mm, tglx, mingo, hpa, x86, oleg, kamaleshb, hechjie, akpm,
	dvlasenk, prarit, lwoodman, hannsj_uhl

On 08/15/2014 04:44 AM, Frantisek Hrbata wrote:
> +int valid_phys_addr_range(phys_addr_t addr, size_t count)
> +{
> +	return addr + count <= __pa(high_memory);
> +}
> +
> +int valid_mmap_phys_addr_range(unsigned long pfn, size_t count)
> +{
> +	return arch_pfn_possible(pfn + (count >> PAGE_SHIFT));
> +}

It definitely fixes the issue as you described it.

It's a bit unfortunate that the highmem check isn't tied in to the
_existing_ /dev/mem limitations in some way, but it's not a deal breaker
for me.

The only other thing is to make sure this doesn't add some limitation to
64-bit where we can't map things above the end of memory (end of memory
== high_memory on 64-bit).  As long as you've done this, I can't see a
downside.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V2 2/2] x86: add phys addr validity check for /dev/mem mmap
  2014-08-15 18:10       ` Dave Hansen
@ 2014-08-18 11:26         ` Frantisek Hrbata
  -1 siblings, 0 replies; 30+ messages in thread
From: Frantisek Hrbata @ 2014-08-18 11:26 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-kernel, linux-mm, tglx, mingo, hpa, x86, oleg, kamaleshb,
	hechjie, akpm, dvlasenk, prarit, lwoodman, hannsj_uhl

On Fri, Aug 15, 2014 at 11:10:25AM -0700, Dave Hansen wrote:
> On 08/15/2014 04:44 AM, Frantisek Hrbata wrote:
> > +int valid_phys_addr_range(phys_addr_t addr, size_t count)
> > +{
> > +	return addr + count <= __pa(high_memory);
> > +}
> > +
> > +int valid_mmap_phys_addr_range(unsigned long pfn, size_t count)
> > +{
> > +	return arch_pfn_possible(pfn + (count >> PAGE_SHIFT));
> > +}
> 
> It definitely fixes the issue as you described it.

Hi Dave,

many thanks for your time and help with this!

> 
> It's a bit unfortunate that the highmem check isn't tied in to the
> _existing_ /dev/mem limitations in some way, but it's not a deal breaker
> for me.

Agreed, I will do some more testing with the "patch" I proposed earlier in our
discussion. Meaning the one moving the high_memory check out of the
valid_phys_addr_range() to the xlate_dev_mem_ptr() for x86. IMHO this should
work fine and it should remove the high_memory limitation. But I for sure can be
missing something. If the testing goes well I will post the patch.

> 
> The only other thing is to make sure this doesn't add some limitation to
> 64-bit where we can't map things above the end of memory (end of memory
> == high_memory on 64-bit).  As long as you've done this, I can't see a
> downside.

Yes, from what I have tested, this patch should not introduce any new
limitation, except fixing the PTE problem. Also please note
that this kind of check is already done in ioremap by calling the
phys_addr_valid(). Again, I hope I haven't overlooked something.

Peter and others: Could you please consider including this fix? Of course only
if you do not have any other objections or problems with it.

Many thanks!

-- 
Frantisek Hrbata

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V2 2/2] x86: add phys addr validity check for /dev/mem mmap
@ 2014-08-18 11:26         ` Frantisek Hrbata
  0 siblings, 0 replies; 30+ messages in thread
From: Frantisek Hrbata @ 2014-08-18 11:26 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-kernel, linux-mm, tglx, mingo, hpa, x86, oleg, kamaleshb,
	hechjie, akpm, dvlasenk, prarit, lwoodman, hannsj_uhl

On Fri, Aug 15, 2014 at 11:10:25AM -0700, Dave Hansen wrote:
> On 08/15/2014 04:44 AM, Frantisek Hrbata wrote:
> > +int valid_phys_addr_range(phys_addr_t addr, size_t count)
> > +{
> > +	return addr + count <= __pa(high_memory);
> > +}
> > +
> > +int valid_mmap_phys_addr_range(unsigned long pfn, size_t count)
> > +{
> > +	return arch_pfn_possible(pfn + (count >> PAGE_SHIFT));
> > +}
> 
> It definitely fixes the issue as you described it.

Hi Dave,

many thanks for your time and help with this!

> 
> It's a bit unfortunate that the highmem check isn't tied in to the
> _existing_ /dev/mem limitations in some way, but it's not a deal breaker
> for me.

Agreed, I will do some more testing with the "patch" I proposed earlier in our
discussion. Meaning the one moving the high_memory check out of the
valid_phys_addr_range() to the xlate_dev_mem_ptr() for x86. IMHO this should
work fine and it should remove the high_memory limitation. But I for sure can be
missing something. If the testing goes well I will post the patch.

> 
> The only other thing is to make sure this doesn't add some limitation to
> 64-bit where we can't map things above the end of memory (end of memory
> == high_memory on 64-bit).  As long as you've done this, I can't see a
> downside.

Yes, from what I have tested, this patch should not introduce any new
limitation, except fixing the PTE problem. Also please note
that this kind of check is already done in ioremap by calling the
phys_addr_valid(). Again, I hope I haven't overlooked something.

Peter and others: Could you please consider including this fix? Of course only
if you do not have any other objections or problems with it.

Many thanks!

-- 
Frantisek Hrbata

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 0/2] x86: allow read/write /dev/mem to access non-system RAM above high_memory
  2014-08-15 11:44   ` Frantisek Hrbata
@ 2014-08-20 15:25     ` Frantisek Hrbata
  -1 siblings, 0 replies; 30+ messages in thread
From: Frantisek Hrbata @ 2014-08-20 15:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, tglx, mingo, hpa, x86, oleg, kamaleshb, hechjie, akpm,
	dave.hansen, dvlasenk, prarit, lwoodman, hannsj_uhl

Hi,

this patch set depends on the following patches posted earlier.

[PATCH V2 1/2] x86: add arch_pfn_possible helper
   lkml: Message-Id: <1408103043-31015-2-git-send-email-fhrbata@redhat.com>
[PATCH V2 2/2] x86: add phys addr validity check for /dev/mem mmap
   lkml: Message-Id: <1408103043-31015-3-git-send-email-fhrbata@redhat.com>

This is an attempt to remove the high_memory limit for the read/write access to
/dev/mem. IMHO there is no reason for this limit on x86. It is presented in 
the generic valid_phys_addr_range, which is used only by (read|write)_mem. IIUIC
it's main purpose is for the generic xlate_dev_mem_ptr, which is using only the
direct kernel mapping __va translation. Since the valid_phys_addr_range is
called as the first check in (read|write)_mem, it basically does not allow to
access anything above high_memory on x86.

The first patch adds high_memory check to x86's (xlate|unxlate)_dev_mem_ptr, so
the direct kernel mapping can be safely used for system RAM bellow high_memory.
This is IMHO the only valid reason to use high_memory check in (read|write)_mem.

The second patch removes the high_memory check from valid_phys_addr_range,
allowing read/write to access non-system RAM above high_memory. So far this was
possible only by using mmap.

I hope I haven't overlooked something.

Many thanks

Frantisek Hrbata (2):
  x86: add high_memory check to (xlate|unxlate)_dev_mem_ptr
  x86: remove high_memory check from valid_phys_addr_range

 arch/x86/mm/ioremap.c | 9 ++++++---
 arch/x86/mm/mmap.c    | 2 +-
 2 files changed, 7 insertions(+), 4 deletions(-)

-- 
1.9.3


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 0/2] x86: allow read/write /dev/mem to access non-system RAM above high_memory
@ 2014-08-20 15:25     ` Frantisek Hrbata
  0 siblings, 0 replies; 30+ messages in thread
From: Frantisek Hrbata @ 2014-08-20 15:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, tglx, mingo, hpa, x86, oleg, kamaleshb, hechjie, akpm,
	dave.hansen, dvlasenk, prarit, lwoodman, hannsj_uhl

Hi,

this patch set depends on the following patches posted earlier.

[PATCH V2 1/2] x86: add arch_pfn_possible helper
   lkml: Message-Id: <1408103043-31015-2-git-send-email-fhrbata@redhat.com>
[PATCH V2 2/2] x86: add phys addr validity check for /dev/mem mmap
   lkml: Message-Id: <1408103043-31015-3-git-send-email-fhrbata@redhat.com>

This is an attempt to remove the high_memory limit for the read/write access to
/dev/mem. IMHO there is no reason for this limit on x86. It is presented in 
the generic valid_phys_addr_range, which is used only by (read|write)_mem. IIUIC
it's main purpose is for the generic xlate_dev_mem_ptr, which is using only the
direct kernel mapping __va translation. Since the valid_phys_addr_range is
called as the first check in (read|write)_mem, it basically does not allow to
access anything above high_memory on x86.

The first patch adds high_memory check to x86's (xlate|unxlate)_dev_mem_ptr, so
the direct kernel mapping can be safely used for system RAM bellow high_memory.
This is IMHO the only valid reason to use high_memory check in (read|write)_mem.

The second patch removes the high_memory check from valid_phys_addr_range,
allowing read/write to access non-system RAM above high_memory. So far this was
possible only by using mmap.

I hope I haven't overlooked something.

Many thanks

Frantisek Hrbata (2):
  x86: add high_memory check to (xlate|unxlate)_dev_mem_ptr
  x86: remove high_memory check from valid_phys_addr_range

 arch/x86/mm/ioremap.c | 9 ++++++---
 arch/x86/mm/mmap.c    | 2 +-
 2 files changed, 7 insertions(+), 4 deletions(-)

-- 
1.9.3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 1/2] x86: add high_memory check to (xlate|unxlate)_dev_mem_ptr
  2014-08-20 15:25     ` Frantisek Hrbata
@ 2014-08-20 15:25       ` Frantisek Hrbata
  -1 siblings, 0 replies; 30+ messages in thread
From: Frantisek Hrbata @ 2014-08-20 15:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, tglx, mingo, hpa, x86, oleg, kamaleshb, hechjie, akpm,
	dave.hansen, dvlasenk, prarit, lwoodman, hannsj_uhl

So far (xlate|unxlate)_dev_mem_ptr for read/write /dev/mem relies on a generic
high_memory check in valid_phys_addr_range(), which does not allow to access any
memory above high_memory whatsoever. By adding the high_memory check to
(xlate|unxlate)_dev_mem_ptr, it still will be possible to use __va safely for
kernel mapped memory and it will also allow read/write to access non-system RAM
above high_memory once the high_memory check is removed from
valid_phys_addr_range.

Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com>
---
 arch/x86/mm/ioremap.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index baff1da..1ae7323 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -320,8 +320,11 @@ void *xlate_dev_mem_ptr(unsigned long phys)
 	void *addr;
 	unsigned long start = phys & PAGE_MASK;
 
-	/* If page is RAM, we can use __va. Otherwise ioremap and unmap. */
-	if (page_is_ram(start >> PAGE_SHIFT))
+	/*
+	 * If page is RAM and is mapped by kernel, we can use __va.
+	 * Otherwise ioremap and unmap.
+	 */
+	if (page_is_ram(start >> PAGE_SHIFT) && phys <= __pa(high_memory))
 		return __va(phys);
 
 	addr = (void __force *)ioremap_cache(start, PAGE_SIZE);
@@ -333,7 +336,7 @@ void *xlate_dev_mem_ptr(unsigned long phys)
 
 void unxlate_dev_mem_ptr(unsigned long phys, void *addr)
 {
-	if (page_is_ram(phys >> PAGE_SHIFT))
+	if (page_is_ram(phys >> PAGE_SHIFT) && phys <= __pa(high_memory))
 		return;
 
 	iounmap((void __iomem *)((unsigned long)addr & PAGE_MASK));
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 1/2] x86: add high_memory check to (xlate|unxlate)_dev_mem_ptr
@ 2014-08-20 15:25       ` Frantisek Hrbata
  0 siblings, 0 replies; 30+ messages in thread
From: Frantisek Hrbata @ 2014-08-20 15:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, tglx, mingo, hpa, x86, oleg, kamaleshb, hechjie, akpm,
	dave.hansen, dvlasenk, prarit, lwoodman, hannsj_uhl

So far (xlate|unxlate)_dev_mem_ptr for read/write /dev/mem relies on a generic
high_memory check in valid_phys_addr_range(), which does not allow to access any
memory above high_memory whatsoever. By adding the high_memory check to
(xlate|unxlate)_dev_mem_ptr, it still will be possible to use __va safely for
kernel mapped memory and it will also allow read/write to access non-system RAM
above high_memory once the high_memory check is removed from
valid_phys_addr_range.

Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com>
---
 arch/x86/mm/ioremap.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index baff1da..1ae7323 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -320,8 +320,11 @@ void *xlate_dev_mem_ptr(unsigned long phys)
 	void *addr;
 	unsigned long start = phys & PAGE_MASK;
 
-	/* If page is RAM, we can use __va. Otherwise ioremap and unmap. */
-	if (page_is_ram(start >> PAGE_SHIFT))
+	/*
+	 * If page is RAM and is mapped by kernel, we can use __va.
+	 * Otherwise ioremap and unmap.
+	 */
+	if (page_is_ram(start >> PAGE_SHIFT) && phys <= __pa(high_memory))
 		return __va(phys);
 
 	addr = (void __force *)ioremap_cache(start, PAGE_SIZE);
@@ -333,7 +336,7 @@ void *xlate_dev_mem_ptr(unsigned long phys)
 
 void unxlate_dev_mem_ptr(unsigned long phys, void *addr)
 {
-	if (page_is_ram(phys >> PAGE_SHIFT))
+	if (page_is_ram(phys >> PAGE_SHIFT) && phys <= __pa(high_memory))
 		return;
 
 	iounmap((void __iomem *)((unsigned long)addr & PAGE_MASK));
-- 
1.9.3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 2/2] x86: remove high_memory check from valid_phys_addr_range
  2014-08-20 15:25     ` Frantisek Hrbata
@ 2014-08-20 15:25       ` Frantisek Hrbata
  -1 siblings, 0 replies; 30+ messages in thread
From: Frantisek Hrbata @ 2014-08-20 15:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, tglx, mingo, hpa, x86, oleg, kamaleshb, hechjie, akpm,
	dave.hansen, dvlasenk, prarit, lwoodman, hannsj_uhl

There is no need to block read/write access to /dev/mem for phys. addr. above
high_memory for non-system RAM. The only limitation should be
boot_cpu_data.x86_phys_bits(max phys. addr. size).

Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com>
---
 arch/x86/mm/mmap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
index 110473e..16f222d 100644
--- a/arch/x86/mm/mmap.c
+++ b/arch/x86/mm/mmap.c
@@ -127,7 +127,7 @@ void arch_pick_mmap_layout(struct mm_struct *mm)
 
 int valid_phys_addr_range(phys_addr_t addr, size_t count)
 {
-	return addr + count <= __pa(high_memory);
+	return arch_pfn_possible((addr + count) >> PAGE_SHIFT);
 }
 
 int valid_mmap_phys_addr_range(unsigned long pfn, size_t count)
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 2/2] x86: remove high_memory check from valid_phys_addr_range
@ 2014-08-20 15:25       ` Frantisek Hrbata
  0 siblings, 0 replies; 30+ messages in thread
From: Frantisek Hrbata @ 2014-08-20 15:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, tglx, mingo, hpa, x86, oleg, kamaleshb, hechjie, akpm,
	dave.hansen, dvlasenk, prarit, lwoodman, hannsj_uhl

There is no need to block read/write access to /dev/mem for phys. addr. above
high_memory for non-system RAM. The only limitation should be
boot_cpu_data.x86_phys_bits(max phys. addr. size).

Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com>
---
 arch/x86/mm/mmap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
index 110473e..16f222d 100644
--- a/arch/x86/mm/mmap.c
+++ b/arch/x86/mm/mmap.c
@@ -127,7 +127,7 @@ void arch_pick_mmap_layout(struct mm_struct *mm)
 
 int valid_phys_addr_range(phys_addr_t addr, size_t count)
 {
-	return addr + count <= __pa(high_memory);
+	return arch_pfn_possible((addr + count) >> PAGE_SHIFT);
 }
 
 int valid_mmap_phys_addr_range(unsigned long pfn, size_t count)
-- 
1.9.3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2014-08-20 15:26 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-08-14 14:18 [PATCH 0/1] Prevent possible PTE corruption with /dev/mem mmap Frantisek Hrbata
2014-08-14 14:18 ` Frantisek Hrbata
2014-08-14 14:18 ` [PATCH 1/1] x86: add phys addr validity check for " Frantisek Hrbata
2014-08-14 14:18   ` Frantisek Hrbata
2014-08-14 16:36   ` Dave Hansen
2014-08-14 16:36     ` Dave Hansen
2014-08-14 17:20     ` H. Peter Anvin
2014-08-14 17:20       ` H. Peter Anvin
2014-08-14 17:53       ` Frantisek Hrbata
2014-08-14 17:53         ` Frantisek Hrbata
2014-08-14 17:40     ` Frantisek Hrbata
2014-08-14 17:40       ` Frantisek Hrbata
2014-08-15 10:17   ` Frantisek Hrbata
2014-08-15 10:17     ` Frantisek Hrbata
2014-08-15 11:44 ` [PATCH V2 0/2] Prevent possible PTE corruption with " Frantisek Hrbata
2014-08-15 11:44   ` Frantisek Hrbata
2014-08-15 11:44   ` [PATCH V2 1/2] x86: add arch_pfn_possible helper Frantisek Hrbata
2014-08-15 11:44     ` Frantisek Hrbata
2014-08-15 11:44   ` [PATCH V2 2/2] x86: add phys addr validity check for /dev/mem mmap Frantisek Hrbata
2014-08-15 11:44     ` Frantisek Hrbata
2014-08-15 18:10     ` Dave Hansen
2014-08-15 18:10       ` Dave Hansen
2014-08-18 11:26       ` Frantisek Hrbata
2014-08-18 11:26         ` Frantisek Hrbata
2014-08-20 15:25   ` [PATCH 0/2] x86: allow read/write /dev/mem to access non-system RAM above high_memory Frantisek Hrbata
2014-08-20 15:25     ` Frantisek Hrbata
2014-08-20 15:25     ` [PATCH 1/2] x86: add high_memory check to (xlate|unxlate)_dev_mem_ptr Frantisek Hrbata
2014-08-20 15:25       ` Frantisek Hrbata
2014-08-20 15:25     ` [PATCH 2/2] x86: remove high_memory check from valid_phys_addr_range Frantisek Hrbata
2014-08-20 15:25       ` Frantisek Hrbata

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.