* [PATCH 0/1] Prevent possible PTE corruption with /dev/mem mmap @ 2014-08-14 14:18 ` Frantisek Hrbata 0 siblings, 0 replies; 30+ messages in thread From: Frantisek Hrbata @ 2014-08-14 14:18 UTC (permalink / raw) To: linux-kernel Cc: linux-mm, tglx, mingo, hpa, x86, oleg, kamaleshb, hechjie, akpm, dave.hansen, dvlasenk, prarit, lwoodman, hannsj_uhl Hi all, after some time this issue popped up again. Please note that the patch was send to lkml two times. https://lkml.org/lkml/2013/4/2/297 lkml: <1364905733-23937-1-git-send-email-fhrbata@redhat.com> https://lkml.org/lkml/2013/10/2/359 lkml: <20131002160514.GA25471@localhost.localdomain> It did not get much attention, except H. Peter Anvin's complain that having two checks for mmap and read/write for /dev/mem access is ridiculous. I for sure do not object to this, but AFAICT it's not that simple to unify them and it's not "directly" related to the PTE corruption. Please note that there are other archs(ia64, arm) using these check. But I for sure can be missing something. What the patch does is using the existing interface to implement x86 specific check in the least invasive way. Peter: I by no means want to be pushy. Just that after I looked into this a little bit more, I don't see a better and more straightforward way how to fix this. I will be grateful for any suggestions and help. If we want/need to fix this in a different way, I can for sure try, but I will need at least some guidance. So I'm posting this once more with a hope it will get more attention or at least to start the discussion how/if this should be fixed. The patch is the same except I added a check for phys addr overflow before calling phys_addr_valid. Maybe this check should be in do_mmap_pgoff. Many thanks Frantisek Hrbata (1): x86: add phys addr validity check for /dev/mem mmap arch/x86/include/asm/io.h | 4 ++++ arch/x86/mm/mmap.c | 18 ++++++++++++++++++ 2 files changed, 22 insertions(+) -- 1.9.3 ^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH 0/1] Prevent possible PTE corruption with /dev/mem mmap @ 2014-08-14 14:18 ` Frantisek Hrbata 0 siblings, 0 replies; 30+ messages in thread From: Frantisek Hrbata @ 2014-08-14 14:18 UTC (permalink / raw) To: linux-kernel Cc: linux-mm, tglx, mingo, hpa, x86, oleg, kamaleshb, hechjie, akpm, dave.hansen, dvlasenk, prarit, lwoodman, hannsj_uhl Hi all, after some time this issue popped up again. Please note that the patch was send to lkml two times. https://lkml.org/lkml/2013/4/2/297 lkml: <1364905733-23937-1-git-send-email-fhrbata@redhat.com> https://lkml.org/lkml/2013/10/2/359 lkml: <20131002160514.GA25471@localhost.localdomain> It did not get much attention, except H. Peter Anvin's complain that having two checks for mmap and read/write for /dev/mem access is ridiculous. I for sure do not object to this, but AFAICT it's not that simple to unify them and it's not "directly" related to the PTE corruption. Please note that there are other archs(ia64, arm) using these check. But I for sure can be missing something. What the patch does is using the existing interface to implement x86 specific check in the least invasive way. Peter: I by no means want to be pushy. Just that after I looked into this a little bit more, I don't see a better and more straightforward way how to fix this. I will be grateful for any suggestions and help. If we want/need to fix this in a different way, I can for sure try, but I will need at least some guidance. So I'm posting this once more with a hope it will get more attention or at least to start the discussion how/if this should be fixed. The patch is the same except I added a check for phys addr overflow before calling phys_addr_valid. Maybe this check should be in do_mmap_pgoff. Many thanks Frantisek Hrbata (1): x86: add phys addr validity check for /dev/mem mmap arch/x86/include/asm/io.h | 4 ++++ arch/x86/mm/mmap.c | 18 ++++++++++++++++++ 2 files changed, 22 insertions(+) -- 1.9.3 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH 1/1] x86: add phys addr validity check for /dev/mem mmap 2014-08-14 14:18 ` Frantisek Hrbata @ 2014-08-14 14:18 ` Frantisek Hrbata -1 siblings, 0 replies; 30+ messages in thread From: Frantisek Hrbata @ 2014-08-14 14:18 UTC (permalink / raw) To: linux-kernel Cc: linux-mm, tglx, mingo, hpa, x86, oleg, kamaleshb, hechjie, akpm, dave.hansen, dvlasenk, prarit, lwoodman, hannsj_uhl Prevent possible PTE corruption while calling mmap on /dev/mem with large offset. oops info, please note the PTE value 8008000000000225. ---------------------------------8<-------------------------------------- [85739.124496] rep: Corrupted page table at address 7f63852f8000 [85739.130242] PGD ba2eb067 PUD b99c1067 PMD a2fa5067 PTE 8008000000000225 [85739.136941] Bad pagetable: 000d [#1] SMP [85739.141002] Modules linked in: cfg80211 rfkill x86_pkg_temp_thermal coretemp kvm_intel kvm bnx2 crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel microcode iTCO_wdt ipmi_si i2c_i801 iTCO_vendor_support ipmi_msghandler dcdbas shpchp lpc_ich mfd_core nfsd auth_rpcgss nfs_acl lockd sunrpc mgag200 i2c_algo_bit drm_kms_helper ttm drm i2c_core [85739.172620] CPU: 3 PID: 21900 Comm: rep Not tainted 3.15.8-200.fc20.x86_64 #1 [85739.179768] Hardware name: Dell Inc. PowerEdge R210 II/09T7VV, BIOS 2.0.4 02/29/2012 [85739.187512] task: ffff8800b9b3b160 ti: ffff8800ba270000 task.ti: ffff8800ba270000 [85739.194988] RIP: 0033:[<0000000000400773>] [<0000000000400773>] 0x400773 [85739.201799] RSP: 002b:00007fffe4ca3c80 EFLAGS: 00010213 [85739.207119] RAX: 00007f63852f8000 RBX: 0000000000000000 RCX: 00007f6384e0b8ca [85739.214249] RDX: 0000000000000001 RSI: 0000000000001000 RDI: 0000000000000000 [85739.221407] RBP: 00007fffe4ca3cc0 R08: 0000000000000003 R09: 0008000000000000 [85739.228545] R10: 0000000000000001 R11: 0000000000000206 R12: 00000000004005b0 [85739.235676] R13: 00007fffe4ca3da0 R14: 0000000000000000 R15: 0000000000000000 [85739.242835] FS: 00007f63852ea740(0000) GS:ffff88013fcc0000(0000) knlGS:0000000000000000 [85739.250925] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [85739.256669] CR2: 00007f63852f8000 CR3: 00000000b9ba0000 CR4: 00000000001407e0 ---------------------------------8<-------------------------------------- According to [1] Chapter 4 Paging, some higher bits in 64bit PTE(X86_64 || X86_32_PAE) are reserved and have to be set to zero. For example, for IA-32e and 4KB page [1] 4.5 IA-32e Paging: Table 4-19, bits 51-M(MAXPHYADDR) are reserved. So for a CPU with e.g. 48bit phys addr width, bits 51-48 have to be zero. If one of the reserved bits is set, [1] 4.7 Page-Fault Exceptions, the #PF is generated with RSVD error code. <quote> RSVD flag (bit 3). This flag is 1 if there is no valid translation for the linear address because a reserved bit was set in one of the paging-structure entries used to translate that address. (Because reserved bits are not checked in a paging-structure entry whose P flag is 0, bit 3 of the error code can be set only if bit 0 is also set.) </quote> In mmap_mem() the first check is valid_mmap_phys_addr_range(), but it always returns 1 for x86. So it's possible to use any pgoff we want and to set the PTE's reserved bits in remap_pfn_range(). Meaning there is a possibility to use mmap on /dev/mem and cause system panic. It's probably not that serious, because access to /dev/mem is limited and the system has to have the panic_on_oops set, but still I think we should check this and return error. The path for this problem is: mmap_mem() => remap_pfn_range() => page present => touch page => tlb miss => walk through paging structures => reserved bit set => #pf with rsvd flag This patch adds check for x86. With this fix mmap returns -EINVAL if the requested phys addr is larger then the supported phys addr width. [1] Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3A x86_64 reproducer ---------------------------------8<-------------------------------------- #include <stdio.h> #include <unistd.h> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include <err.h> #include <stdlib.h> #include <sys/mman.h> #define die(fmt, ...) err(1, fmt, ##__VA_ARGS__) #define OFFSET 0x8000000000000LL int main(int argc, char *argv[]) { int fd; long ps; long pgoff; char *map; char c; ps = sysconf(_SC_PAGE_SIZE); if (ps == -1) die("cannot get page size"); fd = open("/dev/mem", O_RDONLY); if (fd == -1) die("cannot open /dev/mem"); pgoff = (OFFSET + (ps - 1)) & ~(ps - 1); map = mmap(NULL, ps, PROT_READ, MAP_SHARED, fd, pgoff); if (map == MAP_FAILED) die("cannot mmap"); c = map[0]; if (munmap(map, ps) == -1) die("cannot munmap"); if (close(fd) == -1) die("cannot close"); return 0; } ---------------------------------8<-------------------------------------- x86_32_PAE reproducer ---------------------------------8<-------------------------------------- #define _GNU_SOURCE #define _LARGEFILE64_SOURCE #include <unistd.h> #include <sys/syscall.h> #include <stdio.h> #include <unistd.h> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include <err.h> #include <stdlib.h> #include <sys/mman.h> #define die(fmt, ...) err(1, fmt, ##__VA_ARGS__) /* 37th bit in PTE */ #define OFFSET 0x2000000 int main(int argc, char *argv[]) { int fd; long ps; char *map; char c; ps = sysconf(_SC_PAGE_SIZE); if (ps == -1) die("cannot get page size"); fd = open("/dev/mem", O_RDONLY|O_LARGEFILE); if (fd == -1) die("cannot open /dev/mem"); map = (char *)syscall(SYS_mmap2, NULL, ps, PROT_READ, MAP_SHARED, fd, OFFSET); if (map == MAP_FAILED) die("cannot mmap"); c = map[0]; if (munmap(map, ps) == -1) die("cannot munmap"); if (close(fd) == -1) die("cannot close"); return 0; } ---------------------------------8<-------------------------------------- Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com> --- arch/x86/include/asm/io.h | 4 ++++ arch/x86/mm/mmap.c | 18 ++++++++++++++++++ 2 files changed, 22 insertions(+) diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h index b8237d8..55c59d5 100644 --- a/arch/x86/include/asm/io.h +++ b/arch/x86/include/asm/io.h @@ -243,6 +243,10 @@ static inline void flush_write_buffers(void) #endif } +#define ARCH_HAS_VALID_PHYS_ADDR_RANGE +extern int valid_phys_addr_range(phys_addr_t addr, size_t count); +extern int valid_mmap_phys_addr_range(unsigned long pfn, size_t count); + #endif /* __KERNEL__ */ extern void native_io_delay(void); diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c index 25e7e13..b5be2ad 100644 --- a/arch/x86/mm/mmap.c +++ b/arch/x86/mm/mmap.c @@ -31,6 +31,8 @@ #include <linux/sched.h> #include <asm/elf.h> +#include "physaddr.h" + struct __read_mostly va_alignment va_align = { .flags = -1, }; @@ -122,3 +124,19 @@ void arch_pick_mmap_layout(struct mm_struct *mm) mm->get_unmapped_area = arch_get_unmapped_area_topdown; } } + +int valid_phys_addr_range(phys_addr_t addr, size_t count) +{ + return addr + count <= __pa(high_memory); +} + +int valid_mmap_phys_addr_range(unsigned long pfn, size_t count) +{ + /* pgoff + count overflow is checked in do_mmap_pgoff */ + pfn += count >> PAGE_SHIFT; + + if (pfn >> BITS_PER_LONG - PAGE_SHIFT) + return -EOVERFLOW; + + return phys_addr_valid(pfn << PAGE_SHIFT); +} -- 1.9.3 ^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH 1/1] x86: add phys addr validity check for /dev/mem mmap @ 2014-08-14 14:18 ` Frantisek Hrbata 0 siblings, 0 replies; 30+ messages in thread From: Frantisek Hrbata @ 2014-08-14 14:18 UTC (permalink / raw) To: linux-kernel Cc: linux-mm, tglx, mingo, hpa, x86, oleg, kamaleshb, hechjie, akpm, dave.hansen, dvlasenk, prarit, lwoodman, hannsj_uhl Prevent possible PTE corruption while calling mmap on /dev/mem with large offset. oops info, please note the PTE value 8008000000000225. ---------------------------------8<-------------------------------------- [85739.124496] rep: Corrupted page table at address 7f63852f8000 [85739.130242] PGD ba2eb067 PUD b99c1067 PMD a2fa5067 PTE 8008000000000225 [85739.136941] Bad pagetable: 000d [#1] SMP [85739.141002] Modules linked in: cfg80211 rfkill x86_pkg_temp_thermal coretemp kvm_intel kvm bnx2 crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel microcode iTCO_wdt ipmi_si i2c_i801 iTCO_vendor_support ipmi_msghandler dcdbas shpchp lpc_ich mfd_core nfsd auth_rpcgss nfs_acl lockd sunrpc mgag200 i2c_algo_bit drm_kms_helper ttm drm i2c_core [85739.172620] CPU: 3 PID: 21900 Comm: rep Not tainted 3.15.8-200.fc20.x86_64 #1 [85739.179768] Hardware name: Dell Inc. PowerEdge R210 II/09T7VV, BIOS 2.0.4 02/29/2012 [85739.187512] task: ffff8800b9b3b160 ti: ffff8800ba270000 task.ti: ffff8800ba270000 [85739.194988] RIP: 0033:[<0000000000400773>] [<0000000000400773>] 0x400773 [85739.201799] RSP: 002b:00007fffe4ca3c80 EFLAGS: 00010213 [85739.207119] RAX: 00007f63852f8000 RBX: 0000000000000000 RCX: 00007f6384e0b8ca [85739.214249] RDX: 0000000000000001 RSI: 0000000000001000 RDI: 0000000000000000 [85739.221407] RBP: 00007fffe4ca3cc0 R08: 0000000000000003 R09: 0008000000000000 [85739.228545] R10: 0000000000000001 R11: 0000000000000206 R12: 00000000004005b0 [85739.235676] R13: 00007fffe4ca3da0 R14: 0000000000000000 R15: 0000000000000000 [85739.242835] FS: 00007f63852ea740(0000) GS:ffff88013fcc0000(0000) knlGS:0000000000000000 [85739.250925] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [85739.256669] CR2: 00007f63852f8000 CR3: 00000000b9ba0000 CR4: 00000000001407e0 ---------------------------------8<-------------------------------------- According to [1] Chapter 4 Paging, some higher bits in 64bit PTE(X86_64 || X86_32_PAE) are reserved and have to be set to zero. For example, for IA-32e and 4KB page [1] 4.5 IA-32e Paging: Table 4-19, bits 51-M(MAXPHYADDR) are reserved. So for a CPU with e.g. 48bit phys addr width, bits 51-48 have to be zero. If one of the reserved bits is set, [1] 4.7 Page-Fault Exceptions, the #PF is generated with RSVD error code. <quote> RSVD flag (bit 3). This flag is 1 if there is no valid translation for the linear address because a reserved bit was set in one of the paging-structure entries used to translate that address. (Because reserved bits are not checked in a paging-structure entry whose P flag is 0, bit 3 of the error code can be set only if bit 0 is also set.) </quote> In mmap_mem() the first check is valid_mmap_phys_addr_range(), but it always returns 1 for x86. So it's possible to use any pgoff we want and to set the PTE's reserved bits in remap_pfn_range(). Meaning there is a possibility to use mmap on /dev/mem and cause system panic. It's probably not that serious, because access to /dev/mem is limited and the system has to have the panic_on_oops set, but still I think we should check this and return error. The path for this problem is: mmap_mem() => remap_pfn_range() => page present => touch page => tlb miss => walk through paging structures => reserved bit set => #pf with rsvd flag This patch adds check for x86. With this fix mmap returns -EINVAL if the requested phys addr is larger then the supported phys addr width. [1] Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3A x86_64 reproducer ---------------------------------8<-------------------------------------- #include <stdio.h> #include <unistd.h> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include <err.h> #include <stdlib.h> #include <sys/mman.h> #define die(fmt, ...) err(1, fmt, ##__VA_ARGS__) #define OFFSET 0x8000000000000LL int main(int argc, char *argv[]) { int fd; long ps; long pgoff; char *map; char c; ps = sysconf(_SC_PAGE_SIZE); if (ps == -1) die("cannot get page size"); fd = open("/dev/mem", O_RDONLY); if (fd == -1) die("cannot open /dev/mem"); pgoff = (OFFSET + (ps - 1)) & ~(ps - 1); map = mmap(NULL, ps, PROT_READ, MAP_SHARED, fd, pgoff); if (map == MAP_FAILED) die("cannot mmap"); c = map[0]; if (munmap(map, ps) == -1) die("cannot munmap"); if (close(fd) == -1) die("cannot close"); return 0; } ---------------------------------8<-------------------------------------- x86_32_PAE reproducer ---------------------------------8<-------------------------------------- #define _GNU_SOURCE #define _LARGEFILE64_SOURCE #include <unistd.h> #include <sys/syscall.h> #include <stdio.h> #include <unistd.h> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include <err.h> #include <stdlib.h> #include <sys/mman.h> #define die(fmt, ...) err(1, fmt, ##__VA_ARGS__) /* 37th bit in PTE */ #define OFFSET 0x2000000 int main(int argc, char *argv[]) { int fd; long ps; char *map; char c; ps = sysconf(_SC_PAGE_SIZE); if (ps == -1) die("cannot get page size"); fd = open("/dev/mem", O_RDONLY|O_LARGEFILE); if (fd == -1) die("cannot open /dev/mem"); map = (char *)syscall(SYS_mmap2, NULL, ps, PROT_READ, MAP_SHARED, fd, OFFSET); if (map == MAP_FAILED) die("cannot mmap"); c = map[0]; if (munmap(map, ps) == -1) die("cannot munmap"); if (close(fd) == -1) die("cannot close"); return 0; } ---------------------------------8<-------------------------------------- Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com> --- arch/x86/include/asm/io.h | 4 ++++ arch/x86/mm/mmap.c | 18 ++++++++++++++++++ 2 files changed, 22 insertions(+) diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h index b8237d8..55c59d5 100644 --- a/arch/x86/include/asm/io.h +++ b/arch/x86/include/asm/io.h @@ -243,6 +243,10 @@ static inline void flush_write_buffers(void) #endif } +#define ARCH_HAS_VALID_PHYS_ADDR_RANGE +extern int valid_phys_addr_range(phys_addr_t addr, size_t count); +extern int valid_mmap_phys_addr_range(unsigned long pfn, size_t count); + #endif /* __KERNEL__ */ extern void native_io_delay(void); diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c index 25e7e13..b5be2ad 100644 --- a/arch/x86/mm/mmap.c +++ b/arch/x86/mm/mmap.c @@ -31,6 +31,8 @@ #include <linux/sched.h> #include <asm/elf.h> +#include "physaddr.h" + struct __read_mostly va_alignment va_align = { .flags = -1, }; @@ -122,3 +124,19 @@ void arch_pick_mmap_layout(struct mm_struct *mm) mm->get_unmapped_area = arch_get_unmapped_area_topdown; } } + +int valid_phys_addr_range(phys_addr_t addr, size_t count) +{ + return addr + count <= __pa(high_memory); +} + +int valid_mmap_phys_addr_range(unsigned long pfn, size_t count) +{ + /* pgoff + count overflow is checked in do_mmap_pgoff */ + pfn += count >> PAGE_SHIFT; + + if (pfn >> BITS_PER_LONG - PAGE_SHIFT) + return -EOVERFLOW; + + return phys_addr_valid(pfn << PAGE_SHIFT); +} -- 1.9.3 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [PATCH 1/1] x86: add phys addr validity check for /dev/mem mmap 2014-08-14 14:18 ` Frantisek Hrbata @ 2014-08-14 16:36 ` Dave Hansen -1 siblings, 0 replies; 30+ messages in thread From: Dave Hansen @ 2014-08-14 16:36 UTC (permalink / raw) To: Frantisek Hrbata, linux-kernel Cc: linux-mm, tglx, mingo, hpa, x86, oleg, kamaleshb, hechjie, akpm, dvlasenk, prarit, lwoodman, hannsj_uhl Thanks for dredging this back up! On 08/14/2014 07:18 AM, Frantisek Hrbata wrote: > +int valid_phys_addr_range(phys_addr_t addr, size_t count) > +{ > + return addr + count <= __pa(high_memory); > +} Is this correct on 32-bit? It would limit /dev/mem to memory below 896MB. > +int valid_mmap_phys_addr_range(unsigned long pfn, size_t count) > +{ Nit: please add units to things like "count". len_bytes would be nice for this kind of thing, especially since it's passed *with* a pfn it would be easy to think it is a count in pages. > + /* pgoff + count overflow is checked in do_mmap_pgoff */ > + pfn += count >> PAGE_SHIFT; > + > + if (pfn >> BITS_PER_LONG - PAGE_SHIFT) > + return -EOVERFLOW; Is this -EOVERFLOW correct? It is called like this: > static int mmap_mem(struct file *file, struct vm_area_struct *vma) > { > if (!valid_mmap_phys_addr_range(vma->vm_pgoff, size)) > return -EINVAL; So I think we need to return true/false:0/1. -EOVERFLOW would be true, and that if() would pass. > + return phys_addr_valid(pfn << PAGE_SHIFT); > +} Maybe I'm dumb, but it took me a minute to figure out what you were trying to do with the: "(pfn >> BITS_PER_LONG - PAGE_SHIFT)". In any case, I think it is wrong on 32-bit. On 32-bit, BITS_PER_LONG=32, and PAGE_SIZE=12, and a paddr=0x100000000 or pfn=0x100000 (4GB) is perfectly valid with PAE enabled. But, this code pfn>>(32-12) would result in 0x1 and return -EOVERFLOW. I think something like this would be easier to read and actually work on 32-bit: static inline int arch_pfn_possible(unsigned long pfn) { unsigned long max_arch_pfn = 1UL << (boot_cpu_data.x86_phys_bits - PAGE_SHIFT); return pfn < max_arch_pfn; } ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 1/1] x86: add phys addr validity check for /dev/mem mmap @ 2014-08-14 16:36 ` Dave Hansen 0 siblings, 0 replies; 30+ messages in thread From: Dave Hansen @ 2014-08-14 16:36 UTC (permalink / raw) To: Frantisek Hrbata, linux-kernel Cc: linux-mm, tglx, mingo, hpa, x86, oleg, kamaleshb, hechjie, akpm, dvlasenk, prarit, lwoodman, hannsj_uhl Thanks for dredging this back up! On 08/14/2014 07:18 AM, Frantisek Hrbata wrote: > +int valid_phys_addr_range(phys_addr_t addr, size_t count) > +{ > + return addr + count <= __pa(high_memory); > +} Is this correct on 32-bit? It would limit /dev/mem to memory below 896MB. > +int valid_mmap_phys_addr_range(unsigned long pfn, size_t count) > +{ Nit: please add units to things like "count". len_bytes would be nice for this kind of thing, especially since it's passed *with* a pfn it would be easy to think it is a count in pages. > + /* pgoff + count overflow is checked in do_mmap_pgoff */ > + pfn += count >> PAGE_SHIFT; > + > + if (pfn >> BITS_PER_LONG - PAGE_SHIFT) > + return -EOVERFLOW; Is this -EOVERFLOW correct? It is called like this: > static int mmap_mem(struct file *file, struct vm_area_struct *vma) > { > if (!valid_mmap_phys_addr_range(vma->vm_pgoff, size)) > return -EINVAL; So I think we need to return true/false:0/1. -EOVERFLOW would be true, and that if() would pass. > + return phys_addr_valid(pfn << PAGE_SHIFT); > +} Maybe I'm dumb, but it took me a minute to figure out what you were trying to do with the: "(pfn >> BITS_PER_LONG - PAGE_SHIFT)". In any case, I think it is wrong on 32-bit. On 32-bit, BITS_PER_LONG=32, and PAGE_SIZE=12, and a paddr=0x100000000 or pfn=0x100000 (4GB) is perfectly valid with PAE enabled. But, this code pfn>>(32-12) would result in 0x1 and return -EOVERFLOW. I think something like this would be easier to read and actually work on 32-bit: static inline int arch_pfn_possible(unsigned long pfn) { unsigned long max_arch_pfn = 1UL << (boot_cpu_data.x86_phys_bits - PAGE_SHIFT); return pfn < max_arch_pfn; } -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 1/1] x86: add phys addr validity check for /dev/mem mmap 2014-08-14 16:36 ` Dave Hansen @ 2014-08-14 17:20 ` H. Peter Anvin -1 siblings, 0 replies; 30+ messages in thread From: H. Peter Anvin @ 2014-08-14 17:20 UTC (permalink / raw) To: Dave Hansen, Frantisek Hrbata, linux-kernel Cc: linux-mm, tglx, mingo, x86, oleg, kamaleshb, hechjie, akpm, dvlasenk, prarit, lwoodman, hannsj_uhl On 08/14/2014 09:36 AM, Dave Hansen wrote: > Thanks for dredging this back up! > > On 08/14/2014 07:18 AM, Frantisek Hrbata wrote: >> +int valid_phys_addr_range(phys_addr_t addr, size_t count) >> +{ >> + return addr + count <= __pa(high_memory); >> +} > > Is this correct on 32-bit? It would limit /dev/mem to memory below 896MB. > Last I checked, /dev/mem was already broken for highmem... which is actually a problem. I would prefer to see it fixed. -hpa ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 1/1] x86: add phys addr validity check for /dev/mem mmap @ 2014-08-14 17:20 ` H. Peter Anvin 0 siblings, 0 replies; 30+ messages in thread From: H. Peter Anvin @ 2014-08-14 17:20 UTC (permalink / raw) To: Dave Hansen, Frantisek Hrbata, linux-kernel Cc: linux-mm, tglx, mingo, x86, oleg, kamaleshb, hechjie, akpm, dvlasenk, prarit, lwoodman, hannsj_uhl On 08/14/2014 09:36 AM, Dave Hansen wrote: > Thanks for dredging this back up! > > On 08/14/2014 07:18 AM, Frantisek Hrbata wrote: >> +int valid_phys_addr_range(phys_addr_t addr, size_t count) >> +{ >> + return addr + count <= __pa(high_memory); >> +} > > Is this correct on 32-bit? It would limit /dev/mem to memory below 896MB. > Last I checked, /dev/mem was already broken for highmem... which is actually a problem. I would prefer to see it fixed. -hpa -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 1/1] x86: add phys addr validity check for /dev/mem mmap 2014-08-14 17:20 ` H. Peter Anvin @ 2014-08-14 17:53 ` Frantisek Hrbata -1 siblings, 0 replies; 30+ messages in thread From: Frantisek Hrbata @ 2014-08-14 17:53 UTC (permalink / raw) To: H. Peter Anvin Cc: Dave Hansen, linux-kernel, linux-mm, tglx, mingo, x86, oleg, kamaleshb, hechjie, akpm, dvlasenk, prarit, lwoodman, hannsj_uhl On Thu, Aug 14, 2014 at 10:20:53AM -0700, H. Peter Anvin wrote: > On 08/14/2014 09:36 AM, Dave Hansen wrote: > > Thanks for dredging this back up! > > > > On 08/14/2014 07:18 AM, Frantisek Hrbata wrote: > >> +int valid_phys_addr_range(phys_addr_t addr, size_t count) > >> +{ > >> + return addr + count <= __pa(high_memory); > >> +} > > > > Is this correct on 32-bit? It would limit /dev/mem to memory below 896MB. > > > > Last I checked, /dev/mem was already broken for highmem... which is > actually a problem. I would prefer to see it fixed. > > -hpa > Hi Peter, thank you for jumping in again. I'm not going to pretent I fully understand this code, as proven by Dave :), but wouldn't it help just to move the high_memory check directly to the xlate_dev_mem_ptr. Meaning no high_memory check in valid_phys_addr_range for x86. I sent more info in my reply to Dave's mail. Again many thanks. -- Frantisek Hrbata ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 1/1] x86: add phys addr validity check for /dev/mem mmap @ 2014-08-14 17:53 ` Frantisek Hrbata 0 siblings, 0 replies; 30+ messages in thread From: Frantisek Hrbata @ 2014-08-14 17:53 UTC (permalink / raw) To: H. Peter Anvin Cc: Dave Hansen, linux-kernel, linux-mm, tglx, mingo, x86, oleg, kamaleshb, hechjie, akpm, dvlasenk, prarit, lwoodman, hannsj_uhl On Thu, Aug 14, 2014 at 10:20:53AM -0700, H. Peter Anvin wrote: > On 08/14/2014 09:36 AM, Dave Hansen wrote: > > Thanks for dredging this back up! > > > > On 08/14/2014 07:18 AM, Frantisek Hrbata wrote: > >> +int valid_phys_addr_range(phys_addr_t addr, size_t count) > >> +{ > >> + return addr + count <= __pa(high_memory); > >> +} > > > > Is this correct on 32-bit? It would limit /dev/mem to memory below 896MB. > > > > Last I checked, /dev/mem was already broken for highmem... which is > actually a problem. I would prefer to see it fixed. > > -hpa > Hi Peter, thank you for jumping in again. I'm not going to pretent I fully understand this code, as proven by Dave :), but wouldn't it help just to move the high_memory check directly to the xlate_dev_mem_ptr. Meaning no high_memory check in valid_phys_addr_range for x86. I sent more info in my reply to Dave's mail. Again many thanks. -- Frantisek Hrbata -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 1/1] x86: add phys addr validity check for /dev/mem mmap 2014-08-14 16:36 ` Dave Hansen @ 2014-08-14 17:40 ` Frantisek Hrbata -1 siblings, 0 replies; 30+ messages in thread From: Frantisek Hrbata @ 2014-08-14 17:40 UTC (permalink / raw) To: Dave Hansen Cc: linux-kernel, linux-mm, tglx, mingo, hpa, x86, oleg, kamaleshb, hechjie, akpm, dvlasenk, prarit, lwoodman, hannsj_uhl On Thu, Aug 14, 2014 at 09:36:03AM -0700, Dave Hansen wrote: > Thanks for dredging this back up! > > On 08/14/2014 07:18 AM, Frantisek Hrbata wrote: > > +int valid_phys_addr_range(phys_addr_t addr, size_t count) > > +{ > > + return addr + count <= __pa(high_memory); > > +} > > Is this correct on 32-bit? It would limit /dev/mem to memory below 896MB. Unfortunatelly this is how it works right now. Please note that at this moment x86 is using the default checks from drivers/char/mem.c. The valid_phys_addr_range is used just for read/write. Meaning yes, you cannot access /dev/mem above high_memory via read/write, which is 896MB on x86_32. I simply copied this generic check. There is no change compared to the current behaviour. BTW I think this can be simply fixed by moving the high_memory check directly to the xlate_dev_mem_ptr function. Something like the following. Please note this is not even compile tested. diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c index baff1da..7ebc241 100644 --- a/arch/x86/mm/ioremap.c +++ b/arch/x86/mm/ioremap.c @@ -321,7 +321,7 @@ void *xlate_dev_mem_ptr(unsigned long phys) unsigned long start = phys & PAGE_MASK; /* If page is RAM, we can use __va. Otherwise ioremap and unmap. */ - if (page_is_ram(start >> PAGE_SHIFT)) + if (page_is_ram(start >> PAGE_SHIFT) && phys <= __pa(high_memory)) return __va(phys); addr = (void __force *)ioremap_cache(start, PAGE_SIZE); @@ -333,7 +333,7 @@ void *xlate_dev_mem_ptr(unsigned long phys) void unxlate_dev_mem_ptr(unsigned long phys, void *addr) { - if (page_is_ram(phys >> PAGE_SHIFT)) + if (page_is_ram(phys >> PAGE_SHIFT) && phys <= __pa(high_memory)) return; iounmap((void __iomem *)((unsigned long)addr & PAGE_MASK)); IIUIC the whole high_memory check is here only because of the kernel identity mapping and as a generic check because of the generic xlate_dev_mem_ptr. include/asm-generic/io.h: #define xlate_dev_mem_ptr(p) __va(p) > > > +int valid_mmap_phys_addr_range(unsigned long pfn, size_t count) > > +{ > > Nit: please add units to things like "count". len_bytes would be nice > for this kind of thing, especially since it's passed *with* a pfn it > would be easy to think it is a count in pages. Sure I have no problem with this. But please note that I just took the already used/presented interface from drivers/char/mem.c. > > > + /* pgoff + count overflow is checked in do_mmap_pgoff */ > > + pfn += count >> PAGE_SHIFT; > > + > > + if (pfn >> BITS_PER_LONG - PAGE_SHIFT) > > + return -EOVERFLOW; > > Is this -EOVERFLOW correct? It is called like this: > > > static int mmap_mem(struct file *file, struct vm_area_struct *vma) > > { > > if (!valid_mmap_phys_addr_range(vma->vm_pgoff, size)) > > return -EINVAL; > > So I think we need to return true/false:0/1. -EOVERFLOW would be true, > and that if() would pass. Facepalm, sure this is completely wrong. We of course need to return zero. I thought it would be more descriptive to use -EOVERFLOW, even though we get -EINVAL in the end. I will fix this. Many thanks for pointing this out. > > > + return phys_addr_valid(pfn << PAGE_SHIFT); > > +} > > Maybe I'm dumb, but it took me a minute to figure out what you were > trying to do with the: "(pfn >> BITS_PER_LONG - PAGE_SHIFT)". In any > case, I think it is wrong on 32-bit. > > On 32-bit, BITS_PER_LONG=32, and PAGE_SIZE=12, and a paddr=0x100000000 > or pfn=0x100000 (4GB) is perfectly valid with PAE enabled. But, this > code pfn>>(32-12) would result in 0x1 and return -EOVERFLOW. Right, I did not realized this. > > I think something like this would be easier to read and actually work on > 32-bit: > > static inline int arch_pfn_possible(unsigned long pfn) > { > unsigned long max_arch_pfn = 1UL << (boot_cpu_data.x86_phys_bits - > PAGE_SHIFT); > return pfn < max_arch_pfn; > } Actually I wanted to use exactly this instead of calling phys_addr_valid, because we can avoid the whole overflow check, but it seemed natural to use what is already available. But you are right for sure. This needs to be fixed also. Dave, many thanks for your feedback, it's really appreciated! -- Frantisek Hrbata ^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [PATCH 1/1] x86: add phys addr validity check for /dev/mem mmap @ 2014-08-14 17:40 ` Frantisek Hrbata 0 siblings, 0 replies; 30+ messages in thread From: Frantisek Hrbata @ 2014-08-14 17:40 UTC (permalink / raw) To: Dave Hansen Cc: linux-kernel, linux-mm, tglx, mingo, hpa, x86, oleg, kamaleshb, hechjie, akpm, dvlasenk, prarit, lwoodman, hannsj_uhl On Thu, Aug 14, 2014 at 09:36:03AM -0700, Dave Hansen wrote: > Thanks for dredging this back up! > > On 08/14/2014 07:18 AM, Frantisek Hrbata wrote: > > +int valid_phys_addr_range(phys_addr_t addr, size_t count) > > +{ > > + return addr + count <= __pa(high_memory); > > +} > > Is this correct on 32-bit? It would limit /dev/mem to memory below 896MB. Unfortunatelly this is how it works right now. Please note that at this moment x86 is using the default checks from drivers/char/mem.c. The valid_phys_addr_range is used just for read/write. Meaning yes, you cannot access /dev/mem above high_memory via read/write, which is 896MB on x86_32. I simply copied this generic check. There is no change compared to the current behaviour. BTW I think this can be simply fixed by moving the high_memory check directly to the xlate_dev_mem_ptr function. Something like the following. Please note this is not even compile tested. diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c index baff1da..7ebc241 100644 --- a/arch/x86/mm/ioremap.c +++ b/arch/x86/mm/ioremap.c @@ -321,7 +321,7 @@ void *xlate_dev_mem_ptr(unsigned long phys) unsigned long start = phys & PAGE_MASK; /* If page is RAM, we can use __va. Otherwise ioremap and unmap. */ - if (page_is_ram(start >> PAGE_SHIFT)) + if (page_is_ram(start >> PAGE_SHIFT) && phys <= __pa(high_memory)) return __va(phys); addr = (void __force *)ioremap_cache(start, PAGE_SIZE); @@ -333,7 +333,7 @@ void *xlate_dev_mem_ptr(unsigned long phys) void unxlate_dev_mem_ptr(unsigned long phys, void *addr) { - if (page_is_ram(phys >> PAGE_SHIFT)) + if (page_is_ram(phys >> PAGE_SHIFT) && phys <= __pa(high_memory)) return; iounmap((void __iomem *)((unsigned long)addr & PAGE_MASK)); IIUIC the whole high_memory check is here only because of the kernel identity mapping and as a generic check because of the generic xlate_dev_mem_ptr. include/asm-generic/io.h: #define xlate_dev_mem_ptr(p) __va(p) > > > +int valid_mmap_phys_addr_range(unsigned long pfn, size_t count) > > +{ > > Nit: please add units to things like "count". len_bytes would be nice > for this kind of thing, especially since it's passed *with* a pfn it > would be easy to think it is a count in pages. Sure I have no problem with this. But please note that I just took the already used/presented interface from drivers/char/mem.c. > > > + /* pgoff + count overflow is checked in do_mmap_pgoff */ > > + pfn += count >> PAGE_SHIFT; > > + > > + if (pfn >> BITS_PER_LONG - PAGE_SHIFT) > > + return -EOVERFLOW; > > Is this -EOVERFLOW correct? It is called like this: > > > static int mmap_mem(struct file *file, struct vm_area_struct *vma) > > { > > if (!valid_mmap_phys_addr_range(vma->vm_pgoff, size)) > > return -EINVAL; > > So I think we need to return true/false:0/1. -EOVERFLOW would be true, > and that if() would pass. Facepalm, sure this is completely wrong. We of course need to return zero. I thought it would be more descriptive to use -EOVERFLOW, even though we get -EINVAL in the end. I will fix this. Many thanks for pointing this out. > > > + return phys_addr_valid(pfn << PAGE_SHIFT); > > +} > > Maybe I'm dumb, but it took me a minute to figure out what you were > trying to do with the: "(pfn >> BITS_PER_LONG - PAGE_SHIFT)". In any > case, I think it is wrong on 32-bit. > > On 32-bit, BITS_PER_LONG=32, and PAGE_SIZE=12, and a paddr=0x100000000 > or pfn=0x100000 (4GB) is perfectly valid with PAE enabled. But, this > code pfn>>(32-12) would result in 0x1 and return -EOVERFLOW. Right, I did not realized this. > > I think something like this would be easier to read and actually work on > 32-bit: > > static inline int arch_pfn_possible(unsigned long pfn) > { > unsigned long max_arch_pfn = 1UL << (boot_cpu_data.x86_phys_bits - > PAGE_SHIFT); > return pfn < max_arch_pfn; > } Actually I wanted to use exactly this instead of calling phys_addr_valid, because we can avoid the whole overflow check, but it seemed natural to use what is already available. But you are right for sure. This needs to be fixed also. Dave, many thanks for your feedback, it's really appreciated! -- Frantisek Hrbata -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [PATCH 1/1] x86: add phys addr validity check for /dev/mem mmap 2014-08-14 14:18 ` Frantisek Hrbata @ 2014-08-15 10:17 ` Frantisek Hrbata -1 siblings, 0 replies; 30+ messages in thread From: Frantisek Hrbata @ 2014-08-15 10:17 UTC (permalink / raw) To: linux-kernel Cc: linux-mm, tglx, mingo, hpa, x86, oleg, kamaleshb, hechjie, akpm, dave.hansen, dvlasenk, prarit, lwoodman, hannsj_uhl self-nack As pointed by Dave Hansen, the check is just wrong. I will post V2. Many thanks Dave! -- Frantisek Hrbata ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 1/1] x86: add phys addr validity check for /dev/mem mmap @ 2014-08-15 10:17 ` Frantisek Hrbata 0 siblings, 0 replies; 30+ messages in thread From: Frantisek Hrbata @ 2014-08-15 10:17 UTC (permalink / raw) To: linux-kernel Cc: linux-mm, tglx, mingo, hpa, x86, oleg, kamaleshb, hechjie, akpm, dave.hansen, dvlasenk, prarit, lwoodman, hannsj_uhl self-nack As pointed by Dave Hansen, the check is just wrong. I will post V2. Many thanks Dave! -- Frantisek Hrbata -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH V2 0/2] Prevent possible PTE corruption with /dev/mem mmap 2014-08-14 14:18 ` Frantisek Hrbata @ 2014-08-15 11:44 ` Frantisek Hrbata -1 siblings, 0 replies; 30+ messages in thread From: Frantisek Hrbata @ 2014-08-15 11:44 UTC (permalink / raw) To: linux-kernel Cc: linux-mm, tglx, mingo, hpa, x86, oleg, kamaleshb, hechjie, akpm, dave.hansen, dvlasenk, prarit, lwoodman, hannsj_uhl Thanks to Dave Hansen for pointing out problems related to the pfn check in the first version. I tried to fix it by adding a new arch_pfn_possible helper to arch/x86/mm/physaddr.h. Please note that I'm not quite sure about the name and the location(physaddr.h). Maybe we can keep the check directly in the valid_mmap_phys_addr_range. I will leave this to a discussion and fix it if required. Question: Do we need the CONFIG_PHYS_ADDR_T_64BIT ifdef? The boot_cpu_data.x86_phys_bits is set for all x86. So at this point it seems to me more like an "optimization" for x86_32 or something kept from historic reasons. I'm just curious and I of course may be missing something. Many thanks Frantisek Hrbata (2): x86: add arch_pfn_possible helper x86: add phys addr validity check for /dev/mem mmap arch/x86/include/asm/io.h | 4 ++++ arch/x86/mm/mmap.c | 12 ++++++++++++ arch/x86/mm/physaddr.h | 9 +++++++-- 3 files changed, 23 insertions(+), 2 deletions(-) -- 1.9.3 ^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH V2 0/2] Prevent possible PTE corruption with /dev/mem mmap @ 2014-08-15 11:44 ` Frantisek Hrbata 0 siblings, 0 replies; 30+ messages in thread From: Frantisek Hrbata @ 2014-08-15 11:44 UTC (permalink / raw) To: linux-kernel Cc: linux-mm, tglx, mingo, hpa, x86, oleg, kamaleshb, hechjie, akpm, dave.hansen, dvlasenk, prarit, lwoodman, hannsj_uhl Thanks to Dave Hansen for pointing out problems related to the pfn check in the first version. I tried to fix it by adding a new arch_pfn_possible helper to arch/x86/mm/physaddr.h. Please note that I'm not quite sure about the name and the location(physaddr.h). Maybe we can keep the check directly in the valid_mmap_phys_addr_range. I will leave this to a discussion and fix it if required. Question: Do we need the CONFIG_PHYS_ADDR_T_64BIT ifdef? The boot_cpu_data.x86_phys_bits is set for all x86. So at this point it seems to me more like an "optimization" for x86_32 or something kept from historic reasons. I'm just curious and I of course may be missing something. Many thanks Frantisek Hrbata (2): x86: add arch_pfn_possible helper x86: add phys addr validity check for /dev/mem mmap arch/x86/include/asm/io.h | 4 ++++ arch/x86/mm/mmap.c | 12 ++++++++++++ arch/x86/mm/physaddr.h | 9 +++++++-- 3 files changed, 23 insertions(+), 2 deletions(-) -- 1.9.3 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH V2 1/2] x86: add arch_pfn_possible helper 2014-08-15 11:44 ` Frantisek Hrbata @ 2014-08-15 11:44 ` Frantisek Hrbata -1 siblings, 0 replies; 30+ messages in thread From: Frantisek Hrbata @ 2014-08-15 11:44 UTC (permalink / raw) To: linux-kernel Cc: linux-mm, tglx, mingo, hpa, x86, oleg, kamaleshb, hechjie, akpm, dave.hansen, dvlasenk, prarit, lwoodman, hannsj_uhl Add helper to check maximum possible pfn on x86. Also make the current phys_addr_valid helper using it internally. Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com> --- arch/x86/mm/physaddr.h | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/arch/x86/mm/physaddr.h b/arch/x86/mm/physaddr.h index a3cd5a0..9df8e3a 100644 --- a/arch/x86/mm/physaddr.h +++ b/arch/x86/mm/physaddr.h @@ -1,10 +1,15 @@ #include <asm/processor.h> -static inline int phys_addr_valid(resource_size_t addr) +static inline int arch_pfn_possible(unsigned long pfn) { #ifdef CONFIG_PHYS_ADDR_T_64BIT - return !(addr >> boot_cpu_data.x86_phys_bits); + return !(pfn >> (boot_cpu_data.x86_phys_bits - PAGE_SHIFT)); #else return 1; #endif } + +static inline int phys_addr_valid(resource_size_t addr) +{ + return arch_pfn_possible(addr >> PAGE_SHIFT); +} -- 1.9.3 ^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH V2 1/2] x86: add arch_pfn_possible helper @ 2014-08-15 11:44 ` Frantisek Hrbata 0 siblings, 0 replies; 30+ messages in thread From: Frantisek Hrbata @ 2014-08-15 11:44 UTC (permalink / raw) To: linux-kernel Cc: linux-mm, tglx, mingo, hpa, x86, oleg, kamaleshb, hechjie, akpm, dave.hansen, dvlasenk, prarit, lwoodman, hannsj_uhl Add helper to check maximum possible pfn on x86. Also make the current phys_addr_valid helper using it internally. Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com> --- arch/x86/mm/physaddr.h | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/arch/x86/mm/physaddr.h b/arch/x86/mm/physaddr.h index a3cd5a0..9df8e3a 100644 --- a/arch/x86/mm/physaddr.h +++ b/arch/x86/mm/physaddr.h @@ -1,10 +1,15 @@ #include <asm/processor.h> -static inline int phys_addr_valid(resource_size_t addr) +static inline int arch_pfn_possible(unsigned long pfn) { #ifdef CONFIG_PHYS_ADDR_T_64BIT - return !(addr >> boot_cpu_data.x86_phys_bits); + return !(pfn >> (boot_cpu_data.x86_phys_bits - PAGE_SHIFT)); #else return 1; #endif } + +static inline int phys_addr_valid(resource_size_t addr) +{ + return arch_pfn_possible(addr >> PAGE_SHIFT); +} -- 1.9.3 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH V2 2/2] x86: add phys addr validity check for /dev/mem mmap 2014-08-15 11:44 ` Frantisek Hrbata @ 2014-08-15 11:44 ` Frantisek Hrbata -1 siblings, 0 replies; 30+ messages in thread From: Frantisek Hrbata @ 2014-08-15 11:44 UTC (permalink / raw) To: linux-kernel Cc: linux-mm, tglx, mingo, hpa, x86, oleg, kamaleshb, hechjie, akpm, dave.hansen, dvlasenk, prarit, lwoodman, hannsj_uhl Prevent possible PTE corruption while calling mmap on /dev/mem with large offset. oops info, please note the PTE value 8008000000000225. ---------------------------------8<-------------------------------------- [85739.124496] rep: Corrupted page table at address 7f63852f8000 [85739.130242] PGD ba2eb067 PUD b99c1067 PMD a2fa5067 PTE 8008000000000225 [85739.136941] Bad pagetable: 000d [#1] SMP [85739.141002] Modules linked in: cfg80211 rfkill x86_pkg_temp_thermal coretemp kvm_intel kvm bnx2 crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel microcode iTCO_wdt ipmi_si i2c_i801 iTCO_vendor_support ipmi_msghandler dcdbas shpchp lpc_ich mfd_core nfsd auth_rpcgss nfs_acl lockd sunrpc mgag200 i2c_algo_bit drm_kms_helper ttm drm i2c_core [85739.172620] CPU: 3 PID: 21900 Comm: rep Not tainted 3.15.8-200.fc20.x86_64 #1 [85739.179768] Hardware name: Dell Inc. PowerEdge R210 II/09T7VV, BIOS 2.0.4 02/29/2012 [85739.187512] task: ffff8800b9b3b160 ti: ffff8800ba270000 task.ti: ffff8800ba270000 [85739.194988] RIP: 0033:[<0000000000400773>] [<0000000000400773>] 0x400773 [85739.201799] RSP: 002b:00007fffe4ca3c80 EFLAGS: 00010213 [85739.207119] RAX: 00007f63852f8000 RBX: 0000000000000000 RCX: 00007f6384e0b8ca [85739.214249] RDX: 0000000000000001 RSI: 0000000000001000 RDI: 0000000000000000 [85739.221407] RBP: 00007fffe4ca3cc0 R08: 0000000000000003 R09: 0008000000000000 [85739.228545] R10: 0000000000000001 R11: 0000000000000206 R12: 00000000004005b0 [85739.235676] R13: 00007fffe4ca3da0 R14: 0000000000000000 R15: 0000000000000000 [85739.242835] FS: 00007f63852ea740(0000) GS:ffff88013fcc0000(0000) knlGS:0000000000000000 [85739.250925] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [85739.256669] CR2: 00007f63852f8000 CR3: 00000000b9ba0000 CR4: 00000000001407e0 ---------------------------------8<-------------------------------------- According to [1] Chapter 4 Paging, some higher bits in 64bit PTE(X86_64 || X86_32_PAE) are reserved and have to be set to zero. For example, for IA-32e and 4KB page [1] 4.5 IA-32e Paging: Table 4-19, bits 51-M(MAXPHYADDR) are reserved. So for a CPU with e.g. 48bit phys addr width, bits 51-48 have to be zero. If one of the reserved bits is set, [1] 4.7 Page-Fault Exceptions, the #PF is generated with RSVD error code. <quote> RSVD flag (bit 3). This flag is 1 if there is no valid translation for the linear address because a reserved bit was set in one of the paging-structure entries used to translate that address. (Because reserved bits are not checked in a paging-structure entry whose P flag is 0, bit 3 of the error code can be set only if bit 0 is also set.) </quote> In mmap_mem() the first check is valid_mmap_phys_addr_range(), but it always returns 1 for x86. So it's possible to use any pgoff we want and to set the PTE's reserved bits in remap_pfn_range(). Meaning there is a possibility to use mmap on /dev/mem and cause system panic. It's probably not that serious, because access to /dev/mem is limited and the system has to have the panic_on_oops set, but still I think we should check this and return error. The path for this problem is: mmap_mem() => remap_pfn_range() => page present => touch page => tlb miss => walk through paging structures => reserved bit set => #pf with rsvd flag This patch adds check for x86. With this fix mmap returns -EINVAL if the requested phys addr is larger then the supported phys addr width. [1] Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3A x86_64 reproducer ---------------------------------8<-------------------------------------- #include <stdio.h> #include <unistd.h> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include <err.h> #include <stdlib.h> #include <sys/mman.h> #define die(fmt, ...) err(1, fmt, ##__VA_ARGS__) #define OFFSET 0x8000000000000LL int main(int argc, char *argv[]) { int fd; long ps; long pgoff; char *map; char c; ps = sysconf(_SC_PAGE_SIZE); if (ps == -1) die("cannot get page size"); fd = open("/dev/mem", O_RDONLY); if (fd == -1) die("cannot open /dev/mem"); pgoff = (OFFSET + (ps - 1)) & ~(ps - 1); map = mmap(NULL, ps, PROT_READ, MAP_SHARED, fd, pgoff); if (map == MAP_FAILED) die("cannot mmap"); c = map[0]; if (munmap(map, ps) == -1) die("cannot munmap"); if (close(fd) == -1) die("cannot close"); return 0; } ---------------------------------8<-------------------------------------- x86_32_PAE reproducer ---------------------------------8<-------------------------------------- #define _GNU_SOURCE #define _LARGEFILE64_SOURCE #include <unistd.h> #include <sys/syscall.h> #include <stdio.h> #include <unistd.h> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include <err.h> #include <stdlib.h> #include <sys/mman.h> #define die(fmt, ...) err(1, fmt, ##__VA_ARGS__) /* 37th bit in PTE */ #define OFFSET 0x2000000 int main(int argc, char *argv[]) { int fd; long ps; char *map; char c; ps = sysconf(_SC_PAGE_SIZE); if (ps == -1) die("cannot get page size"); fd = open("/dev/mem", O_RDONLY|O_LARGEFILE); if (fd == -1) die("cannot open /dev/mem"); map = (char *)syscall(SYS_mmap2, NULL, ps, PROT_READ, MAP_SHARED, fd, OFFSET); if (map == MAP_FAILED) die("cannot mmap"); c = map[0]; if (munmap(map, ps) == -1) die("cannot munmap"); if (close(fd) == -1) die("cannot close"); return 0; } ---------------------------------8<-------------------------------------- V2: fix pfn check in valid_mmap_phys_addr_range, thanks to Dave Hansen Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com> --- arch/x86/include/asm/io.h | 4 ++++ arch/x86/mm/mmap.c | 12 ++++++++++++ 2 files changed, 16 insertions(+) diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h index b8237d8..55c59d5 100644 --- a/arch/x86/include/asm/io.h +++ b/arch/x86/include/asm/io.h @@ -243,6 +243,10 @@ static inline void flush_write_buffers(void) #endif } +#define ARCH_HAS_VALID_PHYS_ADDR_RANGE +extern int valid_phys_addr_range(phys_addr_t addr, size_t count); +extern int valid_mmap_phys_addr_range(unsigned long pfn, size_t count); + #endif /* __KERNEL__ */ extern void native_io_delay(void); diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c index 25e7e13..110473e 100644 --- a/arch/x86/mm/mmap.c +++ b/arch/x86/mm/mmap.c @@ -31,6 +31,8 @@ #include <linux/sched.h> #include <asm/elf.h> +#include "physaddr.h" + struct __read_mostly va_alignment va_align = { .flags = -1, }; @@ -122,3 +124,13 @@ void arch_pick_mmap_layout(struct mm_struct *mm) mm->get_unmapped_area = arch_get_unmapped_area_topdown; } } + +int valid_phys_addr_range(phys_addr_t addr, size_t count) +{ + return addr + count <= __pa(high_memory); +} + +int valid_mmap_phys_addr_range(unsigned long pfn, size_t count) +{ + return arch_pfn_possible(pfn + (count >> PAGE_SHIFT)); +} -- 1.9.3 ^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH V2 2/2] x86: add phys addr validity check for /dev/mem mmap @ 2014-08-15 11:44 ` Frantisek Hrbata 0 siblings, 0 replies; 30+ messages in thread From: Frantisek Hrbata @ 2014-08-15 11:44 UTC (permalink / raw) To: linux-kernel Cc: linux-mm, tglx, mingo, hpa, x86, oleg, kamaleshb, hechjie, akpm, dave.hansen, dvlasenk, prarit, lwoodman, hannsj_uhl Prevent possible PTE corruption while calling mmap on /dev/mem with large offset. oops info, please note the PTE value 8008000000000225. ---------------------------------8<-------------------------------------- [85739.124496] rep: Corrupted page table at address 7f63852f8000 [85739.130242] PGD ba2eb067 PUD b99c1067 PMD a2fa5067 PTE 8008000000000225 [85739.136941] Bad pagetable: 000d [#1] SMP [85739.141002] Modules linked in: cfg80211 rfkill x86_pkg_temp_thermal coretemp kvm_intel kvm bnx2 crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel microcode iTCO_wdt ipmi_si i2c_i801 iTCO_vendor_support ipmi_msghandler dcdbas shpchp lpc_ich mfd_core nfsd auth_rpcgss nfs_acl lockd sunrpc mgag200 i2c_algo_bit drm_kms_helper ttm drm i2c_core [85739.172620] CPU: 3 PID: 21900 Comm: rep Not tainted 3.15.8-200.fc20.x86_64 #1 [85739.179768] Hardware name: Dell Inc. PowerEdge R210 II/09T7VV, BIOS 2.0.4 02/29/2012 [85739.187512] task: ffff8800b9b3b160 ti: ffff8800ba270000 task.ti: ffff8800ba270000 [85739.194988] RIP: 0033:[<0000000000400773>] [<0000000000400773>] 0x400773 [85739.201799] RSP: 002b:00007fffe4ca3c80 EFLAGS: 00010213 [85739.207119] RAX: 00007f63852f8000 RBX: 0000000000000000 RCX: 00007f6384e0b8ca [85739.214249] RDX: 0000000000000001 RSI: 0000000000001000 RDI: 0000000000000000 [85739.221407] RBP: 00007fffe4ca3cc0 R08: 0000000000000003 R09: 0008000000000000 [85739.228545] R10: 0000000000000001 R11: 0000000000000206 R12: 00000000004005b0 [85739.235676] R13: 00007fffe4ca3da0 R14: 0000000000000000 R15: 0000000000000000 [85739.242835] FS: 00007f63852ea740(0000) GS:ffff88013fcc0000(0000) knlGS:0000000000000000 [85739.250925] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [85739.256669] CR2: 00007f63852f8000 CR3: 00000000b9ba0000 CR4: 00000000001407e0 ---------------------------------8<-------------------------------------- According to [1] Chapter 4 Paging, some higher bits in 64bit PTE(X86_64 || X86_32_PAE) are reserved and have to be set to zero. For example, for IA-32e and 4KB page [1] 4.5 IA-32e Paging: Table 4-19, bits 51-M(MAXPHYADDR) are reserved. So for a CPU with e.g. 48bit phys addr width, bits 51-48 have to be zero. If one of the reserved bits is set, [1] 4.7 Page-Fault Exceptions, the #PF is generated with RSVD error code. <quote> RSVD flag (bit 3). This flag is 1 if there is no valid translation for the linear address because a reserved bit was set in one of the paging-structure entries used to translate that address. (Because reserved bits are not checked in a paging-structure entry whose P flag is 0, bit 3 of the error code can be set only if bit 0 is also set.) </quote> In mmap_mem() the first check is valid_mmap_phys_addr_range(), but it always returns 1 for x86. So it's possible to use any pgoff we want and to set the PTE's reserved bits in remap_pfn_range(). Meaning there is a possibility to use mmap on /dev/mem and cause system panic. It's probably not that serious, because access to /dev/mem is limited and the system has to have the panic_on_oops set, but still I think we should check this and return error. The path for this problem is: mmap_mem() => remap_pfn_range() => page present => touch page => tlb miss => walk through paging structures => reserved bit set => #pf with rsvd flag This patch adds check for x86. With this fix mmap returns -EINVAL if the requested phys addr is larger then the supported phys addr width. [1] Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3A x86_64 reproducer ---------------------------------8<-------------------------------------- #include <stdio.h> #include <unistd.h> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include <err.h> #include <stdlib.h> #include <sys/mman.h> #define die(fmt, ...) err(1, fmt, ##__VA_ARGS__) #define OFFSET 0x8000000000000LL int main(int argc, char *argv[]) { int fd; long ps; long pgoff; char *map; char c; ps = sysconf(_SC_PAGE_SIZE); if (ps == -1) die("cannot get page size"); fd = open("/dev/mem", O_RDONLY); if (fd == -1) die("cannot open /dev/mem"); pgoff = (OFFSET + (ps - 1)) & ~(ps - 1); map = mmap(NULL, ps, PROT_READ, MAP_SHARED, fd, pgoff); if (map == MAP_FAILED) die("cannot mmap"); c = map[0]; if (munmap(map, ps) == -1) die("cannot munmap"); if (close(fd) == -1) die("cannot close"); return 0; } ---------------------------------8<-------------------------------------- x86_32_PAE reproducer ---------------------------------8<-------------------------------------- #define _GNU_SOURCE #define _LARGEFILE64_SOURCE #include <unistd.h> #include <sys/syscall.h> #include <stdio.h> #include <unistd.h> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include <err.h> #include <stdlib.h> #include <sys/mman.h> #define die(fmt, ...) err(1, fmt, ##__VA_ARGS__) /* 37th bit in PTE */ #define OFFSET 0x2000000 int main(int argc, char *argv[]) { int fd; long ps; char *map; char c; ps = sysconf(_SC_PAGE_SIZE); if (ps == -1) die("cannot get page size"); fd = open("/dev/mem", O_RDONLY|O_LARGEFILE); if (fd == -1) die("cannot open /dev/mem"); map = (char *)syscall(SYS_mmap2, NULL, ps, PROT_READ, MAP_SHARED, fd, OFFSET); if (map == MAP_FAILED) die("cannot mmap"); c = map[0]; if (munmap(map, ps) == -1) die("cannot munmap"); if (close(fd) == -1) die("cannot close"); return 0; } ---------------------------------8<-------------------------------------- V2: fix pfn check in valid_mmap_phys_addr_range, thanks to Dave Hansen Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com> --- arch/x86/include/asm/io.h | 4 ++++ arch/x86/mm/mmap.c | 12 ++++++++++++ 2 files changed, 16 insertions(+) diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h index b8237d8..55c59d5 100644 --- a/arch/x86/include/asm/io.h +++ b/arch/x86/include/asm/io.h @@ -243,6 +243,10 @@ static inline void flush_write_buffers(void) #endif } +#define ARCH_HAS_VALID_PHYS_ADDR_RANGE +extern int valid_phys_addr_range(phys_addr_t addr, size_t count); +extern int valid_mmap_phys_addr_range(unsigned long pfn, size_t count); + #endif /* __KERNEL__ */ extern void native_io_delay(void); diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c index 25e7e13..110473e 100644 --- a/arch/x86/mm/mmap.c +++ b/arch/x86/mm/mmap.c @@ -31,6 +31,8 @@ #include <linux/sched.h> #include <asm/elf.h> +#include "physaddr.h" + struct __read_mostly va_alignment va_align = { .flags = -1, }; @@ -122,3 +124,13 @@ void arch_pick_mmap_layout(struct mm_struct *mm) mm->get_unmapped_area = arch_get_unmapped_area_topdown; } } + +int valid_phys_addr_range(phys_addr_t addr, size_t count) +{ + return addr + count <= __pa(high_memory); +} + +int valid_mmap_phys_addr_range(unsigned long pfn, size_t count) +{ + return arch_pfn_possible(pfn + (count >> PAGE_SHIFT)); +} -- 1.9.3 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [PATCH V2 2/2] x86: add phys addr validity check for /dev/mem mmap 2014-08-15 11:44 ` Frantisek Hrbata @ 2014-08-15 18:10 ` Dave Hansen -1 siblings, 0 replies; 30+ messages in thread From: Dave Hansen @ 2014-08-15 18:10 UTC (permalink / raw) To: Frantisek Hrbata, linux-kernel Cc: linux-mm, tglx, mingo, hpa, x86, oleg, kamaleshb, hechjie, akpm, dvlasenk, prarit, lwoodman, hannsj_uhl On 08/15/2014 04:44 AM, Frantisek Hrbata wrote: > +int valid_phys_addr_range(phys_addr_t addr, size_t count) > +{ > + return addr + count <= __pa(high_memory); > +} > + > +int valid_mmap_phys_addr_range(unsigned long pfn, size_t count) > +{ > + return arch_pfn_possible(pfn + (count >> PAGE_SHIFT)); > +} It definitely fixes the issue as you described it. It's a bit unfortunate that the highmem check isn't tied in to the _existing_ /dev/mem limitations in some way, but it's not a deal breaker for me. The only other thing is to make sure this doesn't add some limitation to 64-bit where we can't map things above the end of memory (end of memory == high_memory on 64-bit). As long as you've done this, I can't see a downside. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH V2 2/2] x86: add phys addr validity check for /dev/mem mmap @ 2014-08-15 18:10 ` Dave Hansen 0 siblings, 0 replies; 30+ messages in thread From: Dave Hansen @ 2014-08-15 18:10 UTC (permalink / raw) To: Frantisek Hrbata, linux-kernel Cc: linux-mm, tglx, mingo, hpa, x86, oleg, kamaleshb, hechjie, akpm, dvlasenk, prarit, lwoodman, hannsj_uhl On 08/15/2014 04:44 AM, Frantisek Hrbata wrote: > +int valid_phys_addr_range(phys_addr_t addr, size_t count) > +{ > + return addr + count <= __pa(high_memory); > +} > + > +int valid_mmap_phys_addr_range(unsigned long pfn, size_t count) > +{ > + return arch_pfn_possible(pfn + (count >> PAGE_SHIFT)); > +} It definitely fixes the issue as you described it. It's a bit unfortunate that the highmem check isn't tied in to the _existing_ /dev/mem limitations in some way, but it's not a deal breaker for me. The only other thing is to make sure this doesn't add some limitation to 64-bit where we can't map things above the end of memory (end of memory == high_memory on 64-bit). As long as you've done this, I can't see a downside. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH V2 2/2] x86: add phys addr validity check for /dev/mem mmap 2014-08-15 18:10 ` Dave Hansen @ 2014-08-18 11:26 ` Frantisek Hrbata -1 siblings, 0 replies; 30+ messages in thread From: Frantisek Hrbata @ 2014-08-18 11:26 UTC (permalink / raw) To: Dave Hansen Cc: linux-kernel, linux-mm, tglx, mingo, hpa, x86, oleg, kamaleshb, hechjie, akpm, dvlasenk, prarit, lwoodman, hannsj_uhl On Fri, Aug 15, 2014 at 11:10:25AM -0700, Dave Hansen wrote: > On 08/15/2014 04:44 AM, Frantisek Hrbata wrote: > > +int valid_phys_addr_range(phys_addr_t addr, size_t count) > > +{ > > + return addr + count <= __pa(high_memory); > > +} > > + > > +int valid_mmap_phys_addr_range(unsigned long pfn, size_t count) > > +{ > > + return arch_pfn_possible(pfn + (count >> PAGE_SHIFT)); > > +} > > It definitely fixes the issue as you described it. Hi Dave, many thanks for your time and help with this! > > It's a bit unfortunate that the highmem check isn't tied in to the > _existing_ /dev/mem limitations in some way, but it's not a deal breaker > for me. Agreed, I will do some more testing with the "patch" I proposed earlier in our discussion. Meaning the one moving the high_memory check out of the valid_phys_addr_range() to the xlate_dev_mem_ptr() for x86. IMHO this should work fine and it should remove the high_memory limitation. But I for sure can be missing something. If the testing goes well I will post the patch. > > The only other thing is to make sure this doesn't add some limitation to > 64-bit where we can't map things above the end of memory (end of memory > == high_memory on 64-bit). As long as you've done this, I can't see a > downside. Yes, from what I have tested, this patch should not introduce any new limitation, except fixing the PTE problem. Also please note that this kind of check is already done in ioremap by calling the phys_addr_valid(). Again, I hope I haven't overlooked something. Peter and others: Could you please consider including this fix? Of course only if you do not have any other objections or problems with it. Many thanks! -- Frantisek Hrbata ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH V2 2/2] x86: add phys addr validity check for /dev/mem mmap @ 2014-08-18 11:26 ` Frantisek Hrbata 0 siblings, 0 replies; 30+ messages in thread From: Frantisek Hrbata @ 2014-08-18 11:26 UTC (permalink / raw) To: Dave Hansen Cc: linux-kernel, linux-mm, tglx, mingo, hpa, x86, oleg, kamaleshb, hechjie, akpm, dvlasenk, prarit, lwoodman, hannsj_uhl On Fri, Aug 15, 2014 at 11:10:25AM -0700, Dave Hansen wrote: > On 08/15/2014 04:44 AM, Frantisek Hrbata wrote: > > +int valid_phys_addr_range(phys_addr_t addr, size_t count) > > +{ > > + return addr + count <= __pa(high_memory); > > +} > > + > > +int valid_mmap_phys_addr_range(unsigned long pfn, size_t count) > > +{ > > + return arch_pfn_possible(pfn + (count >> PAGE_SHIFT)); > > +} > > It definitely fixes the issue as you described it. Hi Dave, many thanks for your time and help with this! > > It's a bit unfortunate that the highmem check isn't tied in to the > _existing_ /dev/mem limitations in some way, but it's not a deal breaker > for me. Agreed, I will do some more testing with the "patch" I proposed earlier in our discussion. Meaning the one moving the high_memory check out of the valid_phys_addr_range() to the xlate_dev_mem_ptr() for x86. IMHO this should work fine and it should remove the high_memory limitation. But I for sure can be missing something. If the testing goes well I will post the patch. > > The only other thing is to make sure this doesn't add some limitation to > 64-bit where we can't map things above the end of memory (end of memory > == high_memory on 64-bit). As long as you've done this, I can't see a > downside. Yes, from what I have tested, this patch should not introduce any new limitation, except fixing the PTE problem. Also please note that this kind of check is already done in ioremap by calling the phys_addr_valid(). Again, I hope I haven't overlooked something. Peter and others: Could you please consider including this fix? Of course only if you do not have any other objections or problems with it. Many thanks! -- Frantisek Hrbata -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH 0/2] x86: allow read/write /dev/mem to access non-system RAM above high_memory 2014-08-15 11:44 ` Frantisek Hrbata @ 2014-08-20 15:25 ` Frantisek Hrbata -1 siblings, 0 replies; 30+ messages in thread From: Frantisek Hrbata @ 2014-08-20 15:25 UTC (permalink / raw) To: linux-kernel Cc: linux-mm, tglx, mingo, hpa, x86, oleg, kamaleshb, hechjie, akpm, dave.hansen, dvlasenk, prarit, lwoodman, hannsj_uhl Hi, this patch set depends on the following patches posted earlier. [PATCH V2 1/2] x86: add arch_pfn_possible helper lkml: Message-Id: <1408103043-31015-2-git-send-email-fhrbata@redhat.com> [PATCH V2 2/2] x86: add phys addr validity check for /dev/mem mmap lkml: Message-Id: <1408103043-31015-3-git-send-email-fhrbata@redhat.com> This is an attempt to remove the high_memory limit for the read/write access to /dev/mem. IMHO there is no reason for this limit on x86. It is presented in the generic valid_phys_addr_range, which is used only by (read|write)_mem. IIUIC it's main purpose is for the generic xlate_dev_mem_ptr, which is using only the direct kernel mapping __va translation. Since the valid_phys_addr_range is called as the first check in (read|write)_mem, it basically does not allow to access anything above high_memory on x86. The first patch adds high_memory check to x86's (xlate|unxlate)_dev_mem_ptr, so the direct kernel mapping can be safely used for system RAM bellow high_memory. This is IMHO the only valid reason to use high_memory check in (read|write)_mem. The second patch removes the high_memory check from valid_phys_addr_range, allowing read/write to access non-system RAM above high_memory. So far this was possible only by using mmap. I hope I haven't overlooked something. Many thanks Frantisek Hrbata (2): x86: add high_memory check to (xlate|unxlate)_dev_mem_ptr x86: remove high_memory check from valid_phys_addr_range arch/x86/mm/ioremap.c | 9 ++++++--- arch/x86/mm/mmap.c | 2 +- 2 files changed, 7 insertions(+), 4 deletions(-) -- 1.9.3 ^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH 0/2] x86: allow read/write /dev/mem to access non-system RAM above high_memory @ 2014-08-20 15:25 ` Frantisek Hrbata 0 siblings, 0 replies; 30+ messages in thread From: Frantisek Hrbata @ 2014-08-20 15:25 UTC (permalink / raw) To: linux-kernel Cc: linux-mm, tglx, mingo, hpa, x86, oleg, kamaleshb, hechjie, akpm, dave.hansen, dvlasenk, prarit, lwoodman, hannsj_uhl Hi, this patch set depends on the following patches posted earlier. [PATCH V2 1/2] x86: add arch_pfn_possible helper lkml: Message-Id: <1408103043-31015-2-git-send-email-fhrbata@redhat.com> [PATCH V2 2/2] x86: add phys addr validity check for /dev/mem mmap lkml: Message-Id: <1408103043-31015-3-git-send-email-fhrbata@redhat.com> This is an attempt to remove the high_memory limit for the read/write access to /dev/mem. IMHO there is no reason for this limit on x86. It is presented in the generic valid_phys_addr_range, which is used only by (read|write)_mem. IIUIC it's main purpose is for the generic xlate_dev_mem_ptr, which is using only the direct kernel mapping __va translation. Since the valid_phys_addr_range is called as the first check in (read|write)_mem, it basically does not allow to access anything above high_memory on x86. The first patch adds high_memory check to x86's (xlate|unxlate)_dev_mem_ptr, so the direct kernel mapping can be safely used for system RAM bellow high_memory. This is IMHO the only valid reason to use high_memory check in (read|write)_mem. The second patch removes the high_memory check from valid_phys_addr_range, allowing read/write to access non-system RAM above high_memory. So far this was possible only by using mmap. I hope I haven't overlooked something. Many thanks Frantisek Hrbata (2): x86: add high_memory check to (xlate|unxlate)_dev_mem_ptr x86: remove high_memory check from valid_phys_addr_range arch/x86/mm/ioremap.c | 9 ++++++--- arch/x86/mm/mmap.c | 2 +- 2 files changed, 7 insertions(+), 4 deletions(-) -- 1.9.3 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH 1/2] x86: add high_memory check to (xlate|unxlate)_dev_mem_ptr 2014-08-20 15:25 ` Frantisek Hrbata @ 2014-08-20 15:25 ` Frantisek Hrbata -1 siblings, 0 replies; 30+ messages in thread From: Frantisek Hrbata @ 2014-08-20 15:25 UTC (permalink / raw) To: linux-kernel Cc: linux-mm, tglx, mingo, hpa, x86, oleg, kamaleshb, hechjie, akpm, dave.hansen, dvlasenk, prarit, lwoodman, hannsj_uhl So far (xlate|unxlate)_dev_mem_ptr for read/write /dev/mem relies on a generic high_memory check in valid_phys_addr_range(), which does not allow to access any memory above high_memory whatsoever. By adding the high_memory check to (xlate|unxlate)_dev_mem_ptr, it still will be possible to use __va safely for kernel mapped memory and it will also allow read/write to access non-system RAM above high_memory once the high_memory check is removed from valid_phys_addr_range. Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com> --- arch/x86/mm/ioremap.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c index baff1da..1ae7323 100644 --- a/arch/x86/mm/ioremap.c +++ b/arch/x86/mm/ioremap.c @@ -320,8 +320,11 @@ void *xlate_dev_mem_ptr(unsigned long phys) void *addr; unsigned long start = phys & PAGE_MASK; - /* If page is RAM, we can use __va. Otherwise ioremap and unmap. */ - if (page_is_ram(start >> PAGE_SHIFT)) + /* + * If page is RAM and is mapped by kernel, we can use __va. + * Otherwise ioremap and unmap. + */ + if (page_is_ram(start >> PAGE_SHIFT) && phys <= __pa(high_memory)) return __va(phys); addr = (void __force *)ioremap_cache(start, PAGE_SIZE); @@ -333,7 +336,7 @@ void *xlate_dev_mem_ptr(unsigned long phys) void unxlate_dev_mem_ptr(unsigned long phys, void *addr) { - if (page_is_ram(phys >> PAGE_SHIFT)) + if (page_is_ram(phys >> PAGE_SHIFT) && phys <= __pa(high_memory)) return; iounmap((void __iomem *)((unsigned long)addr & PAGE_MASK)); -- 1.9.3 ^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH 1/2] x86: add high_memory check to (xlate|unxlate)_dev_mem_ptr @ 2014-08-20 15:25 ` Frantisek Hrbata 0 siblings, 0 replies; 30+ messages in thread From: Frantisek Hrbata @ 2014-08-20 15:25 UTC (permalink / raw) To: linux-kernel Cc: linux-mm, tglx, mingo, hpa, x86, oleg, kamaleshb, hechjie, akpm, dave.hansen, dvlasenk, prarit, lwoodman, hannsj_uhl So far (xlate|unxlate)_dev_mem_ptr for read/write /dev/mem relies on a generic high_memory check in valid_phys_addr_range(), which does not allow to access any memory above high_memory whatsoever. By adding the high_memory check to (xlate|unxlate)_dev_mem_ptr, it still will be possible to use __va safely for kernel mapped memory and it will also allow read/write to access non-system RAM above high_memory once the high_memory check is removed from valid_phys_addr_range. Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com> --- arch/x86/mm/ioremap.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c index baff1da..1ae7323 100644 --- a/arch/x86/mm/ioremap.c +++ b/arch/x86/mm/ioremap.c @@ -320,8 +320,11 @@ void *xlate_dev_mem_ptr(unsigned long phys) void *addr; unsigned long start = phys & PAGE_MASK; - /* If page is RAM, we can use __va. Otherwise ioremap and unmap. */ - if (page_is_ram(start >> PAGE_SHIFT)) + /* + * If page is RAM and is mapped by kernel, we can use __va. + * Otherwise ioremap and unmap. + */ + if (page_is_ram(start >> PAGE_SHIFT) && phys <= __pa(high_memory)) return __va(phys); addr = (void __force *)ioremap_cache(start, PAGE_SIZE); @@ -333,7 +336,7 @@ void *xlate_dev_mem_ptr(unsigned long phys) void unxlate_dev_mem_ptr(unsigned long phys, void *addr) { - if (page_is_ram(phys >> PAGE_SHIFT)) + if (page_is_ram(phys >> PAGE_SHIFT) && phys <= __pa(high_memory)) return; iounmap((void __iomem *)((unsigned long)addr & PAGE_MASK)); -- 1.9.3 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH 2/2] x86: remove high_memory check from valid_phys_addr_range 2014-08-20 15:25 ` Frantisek Hrbata @ 2014-08-20 15:25 ` Frantisek Hrbata -1 siblings, 0 replies; 30+ messages in thread From: Frantisek Hrbata @ 2014-08-20 15:25 UTC (permalink / raw) To: linux-kernel Cc: linux-mm, tglx, mingo, hpa, x86, oleg, kamaleshb, hechjie, akpm, dave.hansen, dvlasenk, prarit, lwoodman, hannsj_uhl There is no need to block read/write access to /dev/mem for phys. addr. above high_memory for non-system RAM. The only limitation should be boot_cpu_data.x86_phys_bits(max phys. addr. size). Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com> --- arch/x86/mm/mmap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c index 110473e..16f222d 100644 --- a/arch/x86/mm/mmap.c +++ b/arch/x86/mm/mmap.c @@ -127,7 +127,7 @@ void arch_pick_mmap_layout(struct mm_struct *mm) int valid_phys_addr_range(phys_addr_t addr, size_t count) { - return addr + count <= __pa(high_memory); + return arch_pfn_possible((addr + count) >> PAGE_SHIFT); } int valid_mmap_phys_addr_range(unsigned long pfn, size_t count) -- 1.9.3 ^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH 2/2] x86: remove high_memory check from valid_phys_addr_range @ 2014-08-20 15:25 ` Frantisek Hrbata 0 siblings, 0 replies; 30+ messages in thread From: Frantisek Hrbata @ 2014-08-20 15:25 UTC (permalink / raw) To: linux-kernel Cc: linux-mm, tglx, mingo, hpa, x86, oleg, kamaleshb, hechjie, akpm, dave.hansen, dvlasenk, prarit, lwoodman, hannsj_uhl There is no need to block read/write access to /dev/mem for phys. addr. above high_memory for non-system RAM. The only limitation should be boot_cpu_data.x86_phys_bits(max phys. addr. size). Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com> --- arch/x86/mm/mmap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c index 110473e..16f222d 100644 --- a/arch/x86/mm/mmap.c +++ b/arch/x86/mm/mmap.c @@ -127,7 +127,7 @@ void arch_pick_mmap_layout(struct mm_struct *mm) int valid_phys_addr_range(phys_addr_t addr, size_t count) { - return addr + count <= __pa(high_memory); + return arch_pfn_possible((addr + count) >> PAGE_SHIFT); } int valid_mmap_phys_addr_range(unsigned long pfn, size_t count) -- 1.9.3 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 30+ messages in thread
end of thread, other threads:[~2014-08-20 15:26 UTC | newest] Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2014-08-14 14:18 [PATCH 0/1] Prevent possible PTE corruption with /dev/mem mmap Frantisek Hrbata 2014-08-14 14:18 ` Frantisek Hrbata 2014-08-14 14:18 ` [PATCH 1/1] x86: add phys addr validity check for " Frantisek Hrbata 2014-08-14 14:18 ` Frantisek Hrbata 2014-08-14 16:36 ` Dave Hansen 2014-08-14 16:36 ` Dave Hansen 2014-08-14 17:20 ` H. Peter Anvin 2014-08-14 17:20 ` H. Peter Anvin 2014-08-14 17:53 ` Frantisek Hrbata 2014-08-14 17:53 ` Frantisek Hrbata 2014-08-14 17:40 ` Frantisek Hrbata 2014-08-14 17:40 ` Frantisek Hrbata 2014-08-15 10:17 ` Frantisek Hrbata 2014-08-15 10:17 ` Frantisek Hrbata 2014-08-15 11:44 ` [PATCH V2 0/2] Prevent possible PTE corruption with " Frantisek Hrbata 2014-08-15 11:44 ` Frantisek Hrbata 2014-08-15 11:44 ` [PATCH V2 1/2] x86: add arch_pfn_possible helper Frantisek Hrbata 2014-08-15 11:44 ` Frantisek Hrbata 2014-08-15 11:44 ` [PATCH V2 2/2] x86: add phys addr validity check for /dev/mem mmap Frantisek Hrbata 2014-08-15 11:44 ` Frantisek Hrbata 2014-08-15 18:10 ` Dave Hansen 2014-08-15 18:10 ` Dave Hansen 2014-08-18 11:26 ` Frantisek Hrbata 2014-08-18 11:26 ` Frantisek Hrbata 2014-08-20 15:25 ` [PATCH 0/2] x86: allow read/write /dev/mem to access non-system RAM above high_memory Frantisek Hrbata 2014-08-20 15:25 ` Frantisek Hrbata 2014-08-20 15:25 ` [PATCH 1/2] x86: add high_memory check to (xlate|unxlate)_dev_mem_ptr Frantisek Hrbata 2014-08-20 15:25 ` Frantisek Hrbata 2014-08-20 15:25 ` [PATCH 2/2] x86: remove high_memory check from valid_phys_addr_range Frantisek Hrbata 2014-08-20 15:25 ` Frantisek Hrbata
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.