From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pa0-f51.google.com (mail-pa0-f51.google.com [209.85.220.51]) by kanga.kvack.org (Postfix) with ESMTP id E14A16B0036 for ; Mon, 19 May 2014 18:58:41 -0400 (EDT) Received: by mail-pa0-f51.google.com with SMTP id kq14so6445525pab.38 for ; Mon, 19 May 2014 15:58:41 -0700 (PDT) Received: from mail-pa0-f53.google.com (mail-pa0-f53.google.com [209.85.220.53]) by mx.google.com with ESMTPS id ek4si10564211pbc.511.2014.05.19.15.58.40 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Mon, 19 May 2014 15:58:40 -0700 (PDT) Received: by mail-pa0-f53.google.com with SMTP id kp14so6414133pab.12 for ; Mon, 19 May 2014 15:58:40 -0700 (PDT) From: Andy Lutomirski Subject: [PATCH 0/4] x86,mm: vdso fixes for an OOPS and /proc/PID/maps Date: Mon, 19 May 2014 15:58:30 -0700 Message-Id: Sender: owner-linux-mm@kvack.org List-ID: To: x86@kernel.org, Andrew Morton , Sasha Levin , "linux-mm@kvack.org" , Dave Jones Cc: LKML , Cyrill Gorcunov , Pavel Emelyanov , "H. Peter Anvin" , Andy Lutomirski [This applies to tip/x86/vdso. Patch 1/4 is a resend.] This fixes an OOPS on systems without an HPET and incomplete information in /proc/PID/maps. The latter is done by adding a new vm_ops callback to replace arch_vma_name, which is inflexible and awkward to use correctly. With this series applied, calling mremap on the vdso results in sensible output in /proc/PID/maps and the vvar area shows up correctly. I don't want to guarantee that mremap on the vdso will do anything sensible right now, but that's unchanged from before. In fact, I suspect that mremapping the vdso on 32-bit tasks is rather broken right now due to sigreturn. 
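[Editorial sketch, not part of the original posting.] The difference between naming a mapping by its address and naming it through a per-vma callback can be modelled in userspace. Everything below (`toy_vma`, `toy_vm_ops`, `name_by_address`, `name_by_ops`) is a hypothetical, simplified stand-in for the kernel structures, assuming only what the cover letter states: the old name lookup depends on where the vdso was originally mapped, while the new callback travels with the vma.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for kernel structures. */
struct toy_vma;

struct toy_vm_ops {
	/* Optional callback: returns a special name, or NULL. */
	const char *(*name)(const struct toy_vma *vma);
};

struct toy_vma {
	unsigned long start;
	const struct toy_vm_ops *ops;
	const char *priv_name;	/* plays the role of vm_private_data */
};

/* Old style: the name depends on the address recorded at map time. */
static const char *name_by_address(const struct toy_vma *vma,
				   unsigned long recorded_vdso)
{
	return vma->start == recorded_vdso ? "[vdso]" : NULL;
}

/* Callback used by named special mappings in this model. */
static const char *toy_special_name(const struct toy_vma *vma)
{
	return vma->priv_name;
}

static const struct toy_vm_ops toy_special_ops = {
	.name = toy_special_name,
};

/* New style: ask the vma itself; the answer does not depend on start. */
static const char *name_by_ops(const struct toy_vma *vma)
{
	if (vma->ops && vma->ops->name)
		return vma->ops->name(vma);
	return NULL;
}
```

In this model, changing `start` (a stand-in for mremap) makes `name_by_address` return NULL while `name_by_ops` keeps returning the same name, which mirrors the before/after maps output quoted in the cover letter.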
In current kernels, mremapping the vdso blows away the name: badc0de0000-badc0de2000 r-xp 00000000 00:00 0 Now it doesn't: badc0de0000-badc0de1000 r-xp 00000000 00:00 0 [vdso] As a followup, it might pay to replace install_special_mapping with a new install_vdso_mapping function that hardcodes the "[vdso]" name, to separately fix all the other arch_vma_name users (maybe just ARM?) and then kill arch_vma_name completely. NB: This touches core mm code. I'd appreciate some review by the mm folks. Andy Lutomirski (4): x86,vdso: Fix an OOPS accessing the hpet mapping w/o an hpet mm,fs: Add vm_ops->name as an alternative to arch_vma_name x86,mm: Improve _install_special_mapping and fix x86 vdso naming x86,mm: Replace arch_vma_name with vm_ops->name for vsyscalls arch/x86/include/asm/vdso.h | 6 ++- arch/x86/mm/init_64.c | 20 +++++----- arch/x86/vdso/vdso2c.h | 5 ++- arch/x86/vdso/vdso32-setup.c | 7 ---- arch/x86/vdso/vma.c | 26 ++++++++----- fs/binfmt_elf.c | 8 ++++ fs/proc/task_mmu.c | 6 +++ include/linux/mm.h | 10 ++++- include/linux/mm_types.h | 6 +++ mm/mmap.c | 89 +++++++++++++++++++++++++++++--------------- 10 files changed, 124 insertions(+), 59 deletions(-) -- 1.9.0 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pa0-f43.google.com (mail-pa0-f43.google.com [209.85.220.43]) by kanga.kvack.org (Postfix) with ESMTP id 4747D6B0037 for ; Mon, 19 May 2014 18:58:43 -0400 (EDT) Received: by mail-pa0-f43.google.com with SMTP id hz1so6435716pad.30 for ; Mon, 19 May 2014 15:58:42 -0700 (PDT) Received: from mail-pa0-f41.google.com (mail-pa0-f41.google.com [209.85.220.41]) by mx.google.com with ESMTPS id gl4si4839583pbb.46.2014.05.19.15.58.42 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Mon, 19 May 2014 15:58:42 -0700 (PDT) Received: by mail-pa0-f41.google.com with SMTP id lj1so6406065pab.14 for ; Mon, 19 May 2014 15:58:42 -0700 (PDT) From: Andy Lutomirski Subject: [PATCH 1/4] x86,vdso: Fix an OOPS accessing the hpet mapping w/o an hpet Date: Mon, 19 May 2014 15:58:31 -0700 Message-Id: In-Reply-To: References: In-Reply-To: References: Sender: owner-linux-mm@kvack.org List-ID: To: x86@kernel.org, Andrew Morton , Sasha Levin , "linux-mm@kvack.org" , Dave Jones Cc: LKML , Cyrill Gorcunov , Pavel Emelyanov , "H. Peter Anvin" , Andy Lutomirski , Stefani Seibold The oops can be triggered in qemu using -no-hpet (but not nohpet) by reading a couple of pages past the end of the vdso text. This should send SIGBUS instead of OOPSing. The bug was introduced by: commit 7a59ed415f5b57469e22e41fc4188d5399e0b194 Author: Stefani Seibold Date: Mon Mar 17 23:22:09 2014 +0100 x86, vdso: Add 32 bit VDSO time support for 32 bit kernel which is new in 3.15. This will be fixed separately in 3.15, but that patch will not apply to tip/x86/vdso. This is the equivalent fix for tip/x86/vdso and, presumably, 3.16. 
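[Editorial sketch, not part of the original posting.] The diff that follows replaces a NULL page-array pointer with a one-element, NULL-terminated array. A userspace model of the page walk shows why that matters. `struct page` here is a hypothetical stand-in and `lookup_special_page` is an illustrative name; the loop is modelled on the one visible in special_mapping_fault later in this thread.

```c
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's struct page. */
struct page {
	int id;
};

/*
 * Walk a NULL-terminated page array. Returns the page backing page
 * offset pgoff, or NULL when the offset lies past the last supplied
 * page (the case in which the kernel handler returns VM_FAULT_SIGBUS).
 * The array pointer itself must not be NULL, because the walk
 * dereferences it even when pgoff is 0.
 */
static struct page *lookup_special_page(struct page **pages,
					unsigned long pgoff)
{
	for (; pgoff && *pages; ++pages)
		pgoff--;
	return *pages;
}
```

With an array such as `{ NULL }`, every offset resolves to "no page" without touching invalid memory, whereas passing a NULL array pointer would make the very first `*pages` a NULL dereference.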
Cc: Stefani Seibold Reported-by: Sasha Levin Signed-off-by: Andy Lutomirski --- arch/x86/vdso/vma.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/x86/vdso/vma.c b/arch/x86/vdso/vma.c index e915eae..8ad0081 100644 --- a/arch/x86/vdso/vma.c +++ b/arch/x86/vdso/vma.c @@ -90,6 +90,7 @@ static int map_vdso(const struct vdso_image *image, bool calculate_addr) struct vm_area_struct *vma; unsigned long addr; int ret = 0; + static struct page *no_pages[] = {NULL}; if (calculate_addr) { addr = vdso_addr(current->mm->start_stack, @@ -125,7 +126,7 @@ static int map_vdso(const struct vdso_image *image, bool calculate_addr) addr + image->size, image->sym_end_mapping - image->size, VM_READ, - NULL); + no_pages); if (IS_ERR(vma)) { ret = PTR_ERR(vma); -- 1.9.0 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pa0-f52.google.com (mail-pa0-f52.google.com [209.85.220.52]) by kanga.kvack.org (Postfix) with ESMTP id E67286B0039 for ; Mon, 19 May 2014 18:58:44 -0400 (EDT) Received: by mail-pa0-f52.google.com with SMTP id fa1so6461940pad.11 for ; Mon, 19 May 2014 15:58:44 -0700 (PDT) Received: from mail-pa0-f49.google.com (mail-pa0-f49.google.com [209.85.220.49]) by mx.google.com with ESMTPS id hi6si21407969pac.69.2014.05.19.15.58.43 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Mon, 19 May 2014 15:58:44 -0700 (PDT) Received: by mail-pa0-f49.google.com with SMTP id lj1so6400352pab.36 for ; Mon, 19 May 2014 15:58:43 -0700 (PDT) From: Andy Lutomirski Subject: [PATCH 2/4] mm,fs: Add vm_ops->name as an alternative to arch_vma_name Date: Mon, 19 May 2014 15:58:32 -0700 Message-Id: <2eee21791bb36a0a408c5c2bdb382a9e6a41ca4a.1400538962.git.luto@amacapital.net> In-Reply-To: References: In-Reply-To: References: Sender: owner-linux-mm@kvack.org List-ID: 
To: x86@kernel.org, Andrew Morton , Sasha Levin , "linux-mm@kvack.org" , Dave Jones Cc: LKML , Cyrill Gorcunov , Pavel Emelyanov , "H. Peter Anvin" , Andy Lutomirski arch_vma_name sucks. It's a silly hack, and it's annoying to implement correctly. In fact, AFAICS, even the straightforward x86 implementation is incorrect (I suspect that it breaks if the vdso mapping is split or gets remapped). This adds a new vm_ops->name operation that can replace it. The followup patches will remove all uses of arch_vma_name on x86, fixing a couple of annoyances in the process. Signed-off-by: Andy Lutomirski --- fs/binfmt_elf.c | 8 ++++++++ fs/proc/task_mmu.c | 6 ++++++ include/linux/mm.h | 6 ++++++ 3 files changed, 20 insertions(+) diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c index aa3cb62..df9ea41 100644 --- a/fs/binfmt_elf.c +++ b/fs/binfmt_elf.c @@ -1108,6 +1108,14 @@ static bool always_dump_vma(struct vm_area_struct *vma) /* Any vsyscall mappings? */ if (vma == get_gate_vma(vma->vm_mm)) return true; + + /* + * Assume that all vmas with a .name op should always be dumped. + * If this changes, a new vm_ops field can easily be added. + */ + if (vma->vm_ops && vma->vm_ops->name && vma->vm_ops->name(vma)) + return true; + /* * arch_vma_name() returns non-NULL for special architecture mappings, * such as vDSO sections. 
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index 442177b..9b2f5d6 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -300,6 +300,12 @@ show_map_vma(struct seq_file *m, struct vm_area_struct *vma, int is_pid) goto done; } + if (vma->vm_ops && vma->vm_ops->name) { + name = vma->vm_ops->name(vma); + if (name) + goto done; + } + name = arch_vma_name(vma); if (!name) { pid_t tid; diff --git a/include/linux/mm.h b/include/linux/mm.h index bf9811e..63f8d4e 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -239,6 +239,12 @@ struct vm_operations_struct { */ int (*access)(struct vm_area_struct *vma, unsigned long addr, void *buf, int len, int write); + + /* Called by the /proc/PID/maps code to ask the vma whether it + * has a special name. Returning non-NULL will also cause this + * vma to be dumped unconditionally. */ + const char *(*name)(struct vm_area_struct *vma); + #ifdef CONFIG_NUMA /* * set_policy() op must add a reference to any non-NULL @new mempolicy -- 1.9.0 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pa0-f50.google.com (mail-pa0-f50.google.com [209.85.220.50]) by kanga.kvack.org (Postfix) with ESMTP id EFF906B003A for ; Mon, 19 May 2014 18:58:46 -0400 (EDT) Received: by mail-pa0-f50.google.com with SMTP id fb1so6428513pad.37 for ; Mon, 19 May 2014 15:58:46 -0700 (PDT) Received: from mail-pa0-f51.google.com (mail-pa0-f51.google.com [209.85.220.51]) by mx.google.com with ESMTPS id ab2si21396927pad.96.2014.05.19.15.58.45 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Mon, 19 May 2014 15:58:46 -0700 (PDT) Received: by mail-pa0-f51.google.com with SMTP id kq14so6445678pab.38 for ; Mon, 19 May 2014 15:58:45 -0700 (PDT) From: Andy Lutomirski Subject: [PATCH 3/4] x86,mm: Improve _install_special_mapping and fix x86 vdso naming Date: Mon, 19 May 2014 15:58:33 -0700 Message-Id: <276b39b6b645fb11e345457b503f17b83c2c6fd0.1400538962.git.luto@amacapital.net> In-Reply-To: References: In-Reply-To: References: Sender: owner-linux-mm@kvack.org List-ID: To: x86@kernel.org, Andrew Morton , Sasha Levin , "linux-mm@kvack.org" , Dave Jones Cc: LKML , Cyrill Gorcunov , Pavel Emelyanov , "H. Peter Anvin" , Andy Lutomirski , Cyrill Gorcunov Using arch_vma_name to give special mappings a name is awkward. x86 currently implements it by comparing the start address of the vma to the expected address of the vdso. This requires tracking the start address of special mappings and is probably buggy if a special vma is split or moved. Improve _install_special_mapping to just name the vma directly. Use it to give the x86 vvar area a name, which should make CRIU's life easier. As a side effect, the vvar area will show up in core dumps. This could be considered weird and is fixable. Thoughts? 
Cc: Cyrill Gorcunov Cc: Pavel Emelyanov Signed-off-by: Andy Lutomirski --- arch/x86/include/asm/vdso.h | 6 ++- arch/x86/mm/init_64.c | 3 -- arch/x86/vdso/vdso2c.h | 5 ++- arch/x86/vdso/vdso32-setup.c | 7 ---- arch/x86/vdso/vma.c | 25 ++++++++----- include/linux/mm.h | 4 +- include/linux/mm_types.h | 6 +++ mm/mmap.c | 89 +++++++++++++++++++++++++++++--------------- 8 files changed, 94 insertions(+), 51 deletions(-) diff --git a/arch/x86/include/asm/vdso.h b/arch/x86/include/asm/vdso.h index d0a2c90..30be253 100644 --- a/arch/x86/include/asm/vdso.h +++ b/arch/x86/include/asm/vdso.h @@ -7,10 +7,14 @@ #ifndef __ASSEMBLER__ +#include + struct vdso_image { void *data; unsigned long size; /* Always a multiple of PAGE_SIZE */ - struct page **pages; /* Big enough for data/size page pointers */ + + /* text_mapping.pages is big enough for data/size page pointers */ + struct vm_special_mapping text_mapping; unsigned long alt, alt_len; diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c index 6f88184..9deb59b 100644 --- a/arch/x86/mm/init_64.c +++ b/arch/x86/mm/init_64.c @@ -1223,9 +1223,6 @@ int in_gate_area_no_mm(unsigned long addr) const char *arch_vma_name(struct vm_area_struct *vma) { - if (vma->vm_mm && vma->vm_start == - (long __force)vma->vm_mm->context.vdso) - return "[vdso]"; if (vma == &gate_vma) return "[vsyscall]"; return NULL; diff --git a/arch/x86/vdso/vdso2c.h b/arch/x86/vdso/vdso2c.h index ed2e894..3dcc61e 100644 --- a/arch/x86/vdso/vdso2c.h +++ b/arch/x86/vdso/vdso2c.h @@ -136,7 +136,10 @@ static int GOFUNC(void *addr, size_t len, FILE *outfile, const char *name) fprintf(outfile, "const struct vdso_image %s = {\n", name); fprintf(outfile, "\t.data = raw_data,\n"); fprintf(outfile, "\t.size = %lu,\n", data_size); - fprintf(outfile, "\t.pages = pages,\n"); + fprintf(outfile, "\t.text_mapping = {\n"); + fprintf(outfile, "\t\t.name = \"[vdso]\",\n"); + fprintf(outfile, "\t\t.pages = pages,\n"); + fprintf(outfile, "\t},\n"); if (alt_sec) { fprintf(outfile, 
"\t.alt = %lu,\n", (unsigned long)alt_sec->sh_offset); diff --git a/arch/x86/vdso/vdso32-setup.c b/arch/x86/vdso/vdso32-setup.c index c3ed708..e4f7781 100644 --- a/arch/x86/vdso/vdso32-setup.c +++ b/arch/x86/vdso/vdso32-setup.c @@ -119,13 +119,6 @@ __initcall(ia32_binfmt_init); #else /* CONFIG_X86_32 */ -const char *arch_vma_name(struct vm_area_struct *vma) -{ - if (vma->vm_mm && vma->vm_start == (long)vma->vm_mm->context.vdso) - return "[vdso]"; - return NULL; -} - struct vm_area_struct *get_gate_vma(struct mm_struct *mm) { return NULL; diff --git a/arch/x86/vdso/vma.c b/arch/x86/vdso/vma.c index 8ad0081..e1513c4 100644 --- a/arch/x86/vdso/vma.c +++ b/arch/x86/vdso/vma.c @@ -30,7 +30,8 @@ void __init init_vdso_image(const struct vdso_image *image) BUG_ON(image->size % PAGE_SIZE != 0); for (i = 0; i < npages; i++) - image->pages[i] = virt_to_page(image->data + i*PAGE_SIZE); + image->text_mapping.pages[i] = + virt_to_page(image->data + i*PAGE_SIZE); apply_alternatives((struct alt_instr *)(image->data + image->alt), (struct alt_instr *)(image->data + image->alt + @@ -91,6 +92,10 @@ static int map_vdso(const struct vdso_image *image, bool calculate_addr) unsigned long addr; int ret = 0; static struct page *no_pages[] = {NULL}; + static struct vm_special_mapping vvar_mapping = { + .name = "[vvar]", + .pages = no_pages, + }; if (calculate_addr) { addr = vdso_addr(current->mm->start_stack, @@ -112,21 +117,23 @@ static int map_vdso(const struct vdso_image *image, bool calculate_addr) /* * MAYWRITE to allow gdb to COW and set breakpoints */ - ret = install_special_mapping(mm, - addr, - image->size, - VM_READ|VM_EXEC| - VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC, - image->pages); + vma = _install_special_mapping(mm, + addr, + image->size, + VM_READ|VM_EXEC| + VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC, + &image->text_mapping); - if (ret) + if (IS_ERR(vma)) { + ret = PTR_ERR(vma); goto up_fail; + } vma = _install_special_mapping(mm, addr + image->size, image->sym_end_mapping - image->size, 
VM_READ, - no_pages); + &vvar_mapping); if (IS_ERR(vma)) { ret = PTR_ERR(vma); diff --git a/include/linux/mm.h b/include/linux/mm.h index 63f8d4e..05aab09 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1782,7 +1782,9 @@ extern struct file *get_mm_exe_file(struct mm_struct *mm); extern int may_expand_vm(struct mm_struct *mm, unsigned long npages); extern struct vm_area_struct *_install_special_mapping(struct mm_struct *mm, unsigned long addr, unsigned long len, - unsigned long flags, struct page **pages); + unsigned long flags, + const struct vm_special_mapping *spec); +/* This is an obsolete alternative to _install_special_mapping. */ extern int install_special_mapping(struct mm_struct *mm, unsigned long addr, unsigned long len, unsigned long flags, struct page **pages); diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 8967e20..22c6f4e 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -510,4 +510,10 @@ static inline void clear_tlb_flush_pending(struct mm_struct *mm) } #endif +struct vm_special_mapping +{ + const char *name; + struct page **pages; +}; + #endif /* _LINUX_MM_TYPES_H */ diff --git a/mm/mmap.c b/mm/mmap.c index b1202cf..52bbc95 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -2872,6 +2872,31 @@ int may_expand_vm(struct mm_struct *mm, unsigned long npages) return 1; } +static int special_mapping_fault(struct vm_area_struct *vma, + struct vm_fault *vmf); + +/* + * Having a close hook prevents vma merging regardless of flags. 
+ */ +static void special_mapping_close(struct vm_area_struct *vma) +{ +} + +static const char *special_mapping_name(struct vm_area_struct *vma) +{ + return ((struct vm_special_mapping *)vma->vm_private_data)->name; +} + +static const struct vm_operations_struct special_mapping_vmops = { + .close = special_mapping_close, + .fault = special_mapping_fault, + .name = special_mapping_name, +}; + +static const struct vm_operations_struct legacy_special_mapping_vmops = { + .close = special_mapping_close, + .fault = special_mapping_fault, +}; static int special_mapping_fault(struct vm_area_struct *vma, struct vm_fault *vmf) @@ -2887,7 +2912,13 @@ static int special_mapping_fault(struct vm_area_struct *vma, */ pgoff = vmf->pgoff - vma->vm_pgoff; - for (pages = vma->vm_private_data; pgoff && *pages; ++pages) + if (vma->vm_ops == &legacy_special_mapping_vmops) + pages = vma->vm_private_data; + else + pages = ((struct vm_special_mapping *)vma->vm_private_data)-> + pages; + + for (; pgoff && *pages; ++pages) pgoff--; if (*pages) { @@ -2900,30 +2931,11 @@ static int special_mapping_fault(struct vm_area_struct *vma, return VM_FAULT_SIGBUS; } -/* - * Having a close hook prevents vma merging regardless of flags. - */ -static void special_mapping_close(struct vm_area_struct *vma) -{ -} - -static const struct vm_operations_struct special_mapping_vmops = { - .close = special_mapping_close, - .fault = special_mapping_fault, -}; - -/* - * Called with mm->mmap_sem held for writing. - * Insert a new vma covering the given region, with the given flags. - * Its pages are supplied by the given array of struct page *. - * The array can be shorter than len >> PAGE_SHIFT if it's null-terminated. - * The region past the last page supplied will always produce SIGBUS. - * The array pointer and the pages it points to are assumed to stay alive - * for as long as this mapping might exist. 
- */ -struct vm_area_struct *_install_special_mapping(struct mm_struct *mm, - unsigned long addr, unsigned long len, - unsigned long vm_flags, struct page **pages) +static struct vm_area_struct *__install_special_mapping( + struct mm_struct *mm, + unsigned long addr, unsigned long len, + unsigned long vm_flags, const struct vm_operations_struct *ops, + void *priv) { int ret; struct vm_area_struct *vma; @@ -2940,8 +2952,8 @@ struct vm_area_struct *_install_special_mapping(struct mm_struct *mm, vma->vm_flags = vm_flags | mm->def_flags | VM_DONTEXPAND | VM_SOFTDIRTY; vma->vm_page_prot = vm_get_page_prot(vma->vm_flags); - vma->vm_ops = &special_mapping_vmops; - vma->vm_private_data = pages; + vma->vm_ops = ops; + vma->vm_private_data = priv; ret = insert_vm_struct(mm, vma); if (ret) @@ -2958,12 +2970,31 @@ out: return ERR_PTR(ret); } +/* + * Called with mm->mmap_sem held for writing. + * Insert a new vma covering the given region, with the given flags. + * Its pages are supplied by the given array of struct page *. + * The array can be shorter than len >> PAGE_SHIFT if it's null-terminated. + * The region past the last page supplied will always produce SIGBUS. + * The array pointer and the pages it points to are assumed to stay alive + * for as long as this mapping might exist. 
+ */ +struct vm_area_struct *_install_special_mapping( + struct mm_struct *mm, + unsigned long addr, unsigned long len, + unsigned long vm_flags, const struct vm_special_mapping *spec) +{ + return __install_special_mapping(mm, addr, len, vm_flags, + &special_mapping_vmops, (void *)spec); +} + int install_special_mapping(struct mm_struct *mm, unsigned long addr, unsigned long len, unsigned long vm_flags, struct page **pages) { - struct vm_area_struct *vma = _install_special_mapping(mm, - addr, len, vm_flags, pages); + struct vm_area_struct *vma = __install_special_mapping( + mm, addr, len, vm_flags, &legacy_special_mapping_vmops, + (void *)pages); if (IS_ERR(vma)) return PTR_ERR(vma); -- 1.9.0 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pa0-f50.google.com (mail-pa0-f50.google.com [209.85.220.50]) by kanga.kvack.org (Postfix) with ESMTP id CBAFE6B003B for ; Mon, 19 May 2014 18:58:48 -0400 (EDT) Received: by mail-pa0-f50.google.com with SMTP id fb1so6393692pad.23 for ; Mon, 19 May 2014 15:58:48 -0700 (PDT) Received: from mail-pa0-f52.google.com (mail-pa0-f52.google.com [209.85.220.52]) by mx.google.com with ESMTPS id dh1si10555948pbc.499.2014.05.19.15.58.47 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Mon, 19 May 2014 15:58:48 -0700 (PDT) Received: by mail-pa0-f52.google.com with SMTP id fa1so6411500pad.25 for ; Mon, 19 May 2014 15:58:47 -0700 (PDT) From: Andy Lutomirski Subject: [PATCH 4/4] x86,mm: Replace arch_vma_name with vm_ops->name for vsyscalls Date: Mon, 19 May 2014 15:58:34 -0700 Message-Id: In-Reply-To: References: In-Reply-To: References: Sender: owner-linux-mm@kvack.org List-ID: To: x86@kernel.org, Andrew Morton , Sasha Levin , "linux-mm@kvack.org" , Dave Jones Cc: LKML , Cyrill Gorcunov , Pavel Emelyanov , "H. 
Peter Anvin" , Andy Lutomirski arch_vma_name is now completely gone from x86. Good riddance. Signed-off-by: Andy Lutomirski --- arch/x86/mm/init_64.c | 17 +++++++++-------- 1 file changed, 9 insertions(+), 8 deletions(-) diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c index 9deb59b..bdcde58 100644 --- a/arch/x86/mm/init_64.c +++ b/arch/x86/mm/init_64.c @@ -1185,11 +1185,19 @@ int kern_addr_valid(unsigned long addr) * covers the 64bit vsyscall page now. 32bit has a real VMA now and does * not need special handling anymore: */ +static const char *gate_vma_name(struct vm_area_struct *vma) +{ + return "[vsyscall]"; +} +static struct vm_operations_struct gate_vma_ops = { + .name = gate_vma_name, +}; static struct vm_area_struct gate_vma = { .vm_start = VSYSCALL_ADDR, .vm_end = VSYSCALL_ADDR + PAGE_SIZE, .vm_page_prot = PAGE_READONLY_EXEC, - .vm_flags = VM_READ | VM_EXEC + .vm_flags = VM_READ | VM_EXEC, + .vm_ops = &gate_vma_ops, }; struct vm_area_struct *get_gate_vma(struct mm_struct *mm) @@ -1221,13 +1229,6 @@ int in_gate_area_no_mm(unsigned long addr) return (addr & PAGE_MASK) == VSYSCALL_ADDR; } -const char *arch_vma_name(struct vm_area_struct *vma) -{ - if (vma == &gate_vma) - return "[vsyscall]"; - return NULL; -} - #ifdef CONFIG_X86_UV unsigned long memory_block_size_bytes(void) { -- 1.9.0 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-la0-f41.google.com (mail-la0-f41.google.com [209.85.215.41]) by kanga.kvack.org (Postfix) with ESMTP id 19EA16B0036 for ; Tue, 20 May 2014 13:21:38 -0400 (EDT) Received: by mail-la0-f41.google.com with SMTP id e16so668864lan.14 for ; Tue, 20 May 2014 10:21:38 -0700 (PDT) Received: from mail-lb0-x235.google.com (mail-lb0-x235.google.com [2a00:1450:4010:c04::235]) by mx.google.com with ESMTPS id og9si9703479lbb.87.2014.05.20.10.21.37 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Tue, 20 May 2014 10:21:37 -0700 (PDT) Received: by mail-lb0-f181.google.com with SMTP id q8so650333lbi.40 for ; Tue, 20 May 2014 10:21:37 -0700 (PDT) Date: Tue, 20 May 2014 21:21:34 +0400 From: Cyrill Gorcunov Subject: Re: [PATCH 3/4] x86,mm: Improve _install_special_mapping and fix x86 vdso naming Message-ID: <20140520172134.GJ2185@moon> References: <276b39b6b645fb11e345457b503f17b83c2c6fd0.1400538962.git.luto@amacapital.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <276b39b6b645fb11e345457b503f17b83c2c6fd0.1400538962.git.luto@amacapital.net> Sender: owner-linux-mm@kvack.org List-ID: To: Andy Lutomirski Cc: x86@kernel.org, Andrew Morton , Sasha Levin , "linux-mm@kvack.org" , Dave Jones , LKML , Pavel Emelyanov , "H. Peter Anvin" On Mon, May 19, 2014 at 03:58:33PM -0700, Andy Lutomirski wrote: > Using arch_vma_name to give special mappings a name is awkward. x86 > currently implements it by comparing the start address of the vma to > the expected address of the vdso. This requires tracking the start > address of special mappings and is probably buggy if a special vma > is split or moved. > > Improve _install_special_mapping to just name the vma directly. Use > it to give the x86 vvar area a name, which should make CRIU's life > easier. > > As a side effect, the vvar area will show up in core dumps. 
This > could be considered weird and is fixable. Thoughts? > > Cc: Cyrill Gorcunov > Cc: Pavel Emelyanov > Signed-off-by: Andy Lutomirski Hi Andy, thanks a lot for this! I must confess I don't yet know how we would deal with compat tasks, but this is a 'must have' mark which allows us to detect the vvar area!
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-vc0-f178.google.com (mail-vc0-f178.google.com [209.85.220.178]) by kanga.kvack.org (Postfix) with ESMTP id 11EE46B0037 for ; Tue, 20 May 2014 13:25:11 -0400 (EDT) Received: by mail-vc0-f178.google.com with SMTP id hq16so1017846vcb.23 for ; Tue, 20 May 2014 10:25:10 -0700 (PDT) Received: from mail-ve0-f174.google.com (mail-ve0-f174.google.com [209.85.128.174]) by mx.google.com with ESMTPS id up2si5181065vec.104.2014.05.20.10.25.10 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Tue, 20 May 2014 10:25:10 -0700 (PDT) Received: by mail-ve0-f174.google.com with SMTP id jw12so1026001veb.33 for ; Tue, 20 May 2014 10:25:10 -0700 (PDT) MIME-Version: 1.0 In-Reply-To: <20140520172134.GJ2185@moon> References: <276b39b6b645fb11e345457b503f17b83c2c6fd0.1400538962.git.luto@amacapital.net> <20140520172134.GJ2185@moon> From: Andy Lutomirski Date: Tue, 20 May 2014 10:24:49 -0700 Message-ID: Subject: Re: [PATCH 3/4] x86,mm: Improve _install_special_mapping and fix x86 vdso naming Content-Type: text/plain; charset=UTF-8 Sender: owner-linux-mm@kvack.org List-ID: To: Cyrill Gorcunov Cc: X86 ML , Andrew Morton , Sasha Levin , "linux-mm@kvack.org" , Dave Jones , LKML , Pavel Emelyanov , "H. Peter Anvin" On Tue, May 20, 2014 at 10:21 AM, Cyrill Gorcunov wrote: > On Mon, May 19, 2014 at 03:58:33PM -0700, Andy Lutomirski wrote: >> Using arch_vma_name to give special mappings a name is awkward. 
x86 >> currently implements it by comparing the start address of the vma to >> the expected address of the vdso. This requires tracking the start >> address of special mappings and is probably buggy if a special vma >> is split or moved. >> >> Improve _install_special_mapping to just name the vma directly. Use >> it to give the x86 vvar area a name, which should make CRIU's life >> easier. >> >> As a side effect, the vvar area will show up in core dumps. This >> could be considered weird and is fixable. Thoughts? >> >> Cc: Cyrill Gorcunov >> Cc: Pavel Emelyanov >> Signed-off-by: Andy Lutomirski > > Hi Andy, thanks a lot for this! I must confess I don't yet know how > we would deal with compat tasks, but this is a 'must have' mark which > allows us to detect the vvar area! Out of curiosity, how does CRIU currently handle checkpointing a restored task? In current kernels, the "[vdso]" name in maps goes away after mremapping the vdso. I suspect that you'll need kernel changes for compat tasks, since I think that mremapping the vdso on any reasonably modern hardware in a 32-bit task will cause sigreturn to blow up. This could be fixed by making mremap magical, although adding a new prctl or arch_prctl to reliably move the vdso might be a better bet. --Andy -- Andy Lutomirski AMA Capital Management, LLC -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-la0-f43.google.com (mail-la0-f43.google.com [209.85.215.43]) by kanga.kvack.org (Postfix) with ESMTP id E83E56B0036 for ; Tue, 20 May 2014 13:48:08 -0400 (EDT) Received: by mail-la0-f43.google.com with SMTP id mc6so695806lab.16 for ; Tue, 20 May 2014 10:48:08 -0700 (PDT) Received: from mail-lb0-x231.google.com (mail-lb0-x231.google.com [2a00:1450:4010:c04::231]) by mx.google.com with ESMTPS id o7si17800070lao.96.2014.05.20.10.48.06 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Tue, 20 May 2014 10:48:07 -0700 (PDT) Received: by mail-lb0-f177.google.com with SMTP id s7so676178lbd.36 for ; Tue, 20 May 2014 10:48:06 -0700 (PDT) Date: Tue, 20 May 2014 21:47:59 +0400 From: Cyrill Gorcunov Subject: Re: [PATCH 3/4] x86,mm: Improve _install_special_mapping and fix x86 vdso naming Message-ID: <20140520174759.GK2185@moon> References: <276b39b6b645fb11e345457b503f17b83c2c6fd0.1400538962.git.luto@amacapital.net> <20140520172134.GJ2185@moon> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: owner-linux-mm@kvack.org List-ID: To: Andy Lutomirski Cc: X86 ML , Andrew Morton , Sasha Levin , "linux-mm@kvack.org" , Dave Jones , LKML , Pavel Emelyanov , "H. Peter Anvin" On Tue, May 20, 2014 at 10:24:49AM -0700, Andy Lutomirski wrote: > On Tue, May 20, 2014 at 10:21 AM, Cyrill Gorcunov wrote: > > On Mon, May 19, 2014 at 03:58:33PM -0700, Andy Lutomirski wrote: > >> Using arch_vma_name to give special mappings a name is awkward. x86 > >> currently implements it by comparing the start address of the vma to > >> the expected address of the vdso. This requires tracking the start > >> address of special mappings and is probably buggy if a special vma > >> is split or moved. > >> > >> Improve _install_special_mapping to just name the vma directly. 
Use > >> it to give the x86 vvar area a name, which should make CRIU's life > >> easier. > >> > >> As a side effect, the vvar area will show up in core dumps. This > >> could be considered weird and is fixable. Thoughts? > >> > >> Cc: Cyrill Gorcunov > >> Cc: Pavel Emelyanov > >> Signed-off-by: Andy Lutomirski > > > > Hi Andy, thanks a lot for this! I must confess I don't yet know how > > we would deal with compat tasks, but this is a 'must have' mark which > > allows us to detect the vvar area! > > Out of curiosity, how does CRIU currently handle checkpointing a > restored task? In current kernels, the "[vdso]" name in maps goes > away after mremapping the vdso. We use not only the [vdso] mark to detect the vdso area but also the page frame number of the live vdso. If the mark is not present in the procfs output, we examine the executable areas and check whether pfn == vdso_pfn. It's a slow path, because there might be a bunch of executable areas and touching every one of them is not fast, but we simply have no choice. The situation gets worse when a task was dumped on one kernel and then restored on another kernel where the vdso content differs from the one saved in the image. In such a case, as I mentioned, we need the named vdso proxy, which redirects calls to the vdso of the system where the task is being restored. And when such a "restored" task gets checkpointed a second time, we don't dump the new live vdso but save only the old vdso proxy on disk (detecting it is a different story; in short, we inject a unique mark into the ELF header). > > I suspect that you'll need kernel changes for compat tasks, since I > think that mremapping the vdso on any reasonably modern hardware in a > 32-bit task will cause sigreturn to blow up. This could be fixed by > making mremap magical, although adding a new prctl or arch_prctl to > reliably move the vdso might be a better bet. 
Well, as far as I understand compat code uses abs addressing for vvar data and if vvar data position doesn't change we're safe, but same time because vvar addresses are not abi I fear one day we indeed hit the problems and the only solution would be to use kernel's help. But again, Andy, I didn't think much about implementing compat mode in criu yet so i might be missing some details. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ve0-f169.google.com (mail-ve0-f169.google.com [209.85.128.169]) by kanga.kvack.org (Postfix) with ESMTP id 16B426B0036 for ; Tue, 20 May 2014 13:53:12 -0400 (EDT) Received: by mail-ve0-f169.google.com with SMTP id jx11so1074035veb.28 for ; Tue, 20 May 2014 10:53:11 -0700 (PDT) Received: from mail-vc0-f182.google.com (mail-vc0-f182.google.com [209.85.220.182]) by mx.google.com with ESMTPS id cx1si5193535vdb.146.2014.05.20.10.53.11 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Tue, 20 May 2014 10:53:11 -0700 (PDT) Received: by mail-vc0-f182.google.com with SMTP id la4so1089537vcb.27 for ; Tue, 20 May 2014 10:53:11 -0700 (PDT) MIME-Version: 1.0 In-Reply-To: <20140520174759.GK2185@moon> References: <276b39b6b645fb11e345457b503f17b83c2c6fd0.1400538962.git.luto@amacapital.net> <20140520172134.GJ2185@moon> <20140520174759.GK2185@moon> From: Andy Lutomirski Date: Tue, 20 May 2014 10:52:51 -0700 Message-ID: Subject: Re: [PATCH 3/4] x86,mm: Improve _install_special_mapping and fix x86 vdso naming Content-Type: text/plain; charset=UTF-8 Sender: owner-linux-mm@kvack.org List-ID: To: Cyrill Gorcunov Cc: X86 ML , Andrew Morton , Sasha Levin , "linux-mm@kvack.org" , Dave Jones , LKML , Pavel Emelyanov , "H. 
Peter Anvin" On Tue, May 20, 2014 at 10:47 AM, Cyrill Gorcunov wrote: > On Tue, May 20, 2014 at 10:24:49AM -0700, Andy Lutomirski wrote: >> On Tue, May 20, 2014 at 10:21 AM, Cyrill Gorcunov wrote: >> > On Mon, May 19, 2014 at 03:58:33PM -0700, Andy Lutomirski wrote: >> >> Using arch_vma_name to give special mappings a name is awkward. x86 >> >> currently implements it by comparing the start address of the vma to >> >> the expected address of the vdso. This requires tracking the start >> >> address of special mappings and is probably buggy if a special vma >> >> is split or moved. >> >> >> >> Improve _install_special_mapping to just name the vma directly. Use >> >> it to give the x86 vvar area a name, which should make CRIU's life >> >> easier. >> >> >> >> As a side effect, the vvar area will show up in core dumps. This >> >> could be considered weird and is fixable. Thoughts? >> >> >> >> Cc: Cyrill Gorcunov >> >> Cc: Pavel Emelyanov >> >> Signed-off-by: Andy Lutomirski >> > >> > Hi Andy, thanks a lot for this! I must confess I don't yet know how >> > would we deal with compat tasks but this is 'must have' mark which >> > allow us to detect vvar area! >> >> Out of curiosity, how does CRIU currently handle checkpointing a >> restored task? In current kernels, the "[vdso]" name in maps goes >> away after mremapping the vdso. > > We use not only [vdso] mark to detect vdso area but also page frame > number of the living vdso. If mark is not present in procfs output > we examinate executable areas and check if pfn == vdso_pfn, it's > a slow path because there migh be a bunch of executable areas and > touching every of it is not that fast thing, but we simply have no > choise. This patch should fix this issue, at least. 
If there's still a way to get a native vdso that doesn't say "[vdso]", please let me know/ > > The situation get worse when task was dumped on one kernel and > then restored on another kernel where vdso content is different > from one save in image -- is such case as I mentioned we need > that named vdso proxy which redirect calls to vdso of the system > where task is restoring. And when such "restored" task get checkpointed > second time we don't dump new living vdso but save only old vdso > proxy on disk (detecting it is a different story, in short we > inject a unique mark into elf header). Yuck. But I don't know whether the kernel can help much here. > >> >> I suspect that you'll need kernel changes for compat tasks, since I >> think that mremapping the vdso on any reasonably modern hardware in a >> 32-bit task will cause sigreturn to blow up. This could be fixed by >> making mremap magical, although adding a new prctl or arch_prctl to >> reliably move the vdso might be a better bet. > > Well, as far as I understand compat code uses abs addressing for > vvar data and if vvar data position doesn't change we're safe, > but same time because vvar addresses are not abi I fear one day > we indeed hit the problems and the only solution would be > to use kernel's help. But again, Andy, I didn't think much > about implementing compat mode in criu yet so i might be > missing some details. Prior to 3.15, the compat code didn't have vvar data at all. In 3.15 and up, the vvar data is accessed using PC-relative addressing, even in compat mode (using the usual call; mov trick to read EIP). --Andy -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
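The slow path Cyrill describes above (comparing the page frame number of each executable area with the known vdso PFN) can be sketched roughly as follows. This is only an illustration under assumptions, not CRIU's code: the thread does not say how the PFN is obtained, so reading it from /proc/PID/pagemap is an assumption here, and the helper names decode_pagemap_entry, read_pfn and is_vdso are hypothetical.

```python
import mmap
import struct

PAGE_SIZE = mmap.PAGESIZE
PFN_MASK = (1 << 55) - 1   # bits 0-54 hold the PFN when the page is present
PRESENT_BIT = 1 << 63      # bit 63 is set when the page is present in RAM


def decode_pagemap_entry(entry):
    """Return the PFN stored in a 64-bit pagemap entry, or None if not present."""
    if not entry & PRESENT_BIT:
        return None
    return entry & PFN_MASK


def read_pfn(pid, vaddr):
    """Read the pagemap entry that covers vaddr in process pid (assumed source)."""
    offset = (vaddr // PAGE_SIZE) * 8          # one 8-byte entry per virtual page
    with open(f"/proc/{pid}/pagemap", "rb") as f:
        f.seek(offset)
        data = f.read(8)
    if len(data) != 8:
        return None
    (entry,) = struct.unpack("=Q", data)       # entries are native-endian u64
    return decode_pagemap_entry(entry)


def is_vdso(pid, area_start, vdso_pfn):
    """Hypothetical check: does the first page of an area share the vdso's PFN?"""
    pfn = read_pfn(pid, area_start)
    # A PFN of 0 is treated as "unknown": newer kernels zero the field for
    # readers without CAP_SYS_ADMIN, so it cannot identify anything.
    return bool(pfn) and pfn == vdso_pfn
```

Each candidate area costs an extra seek and read (and the page has to be touched first so that it is present), which is why a reliable "[vdso]" name in /proc/PID/maps is the cheaper check.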
From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 20 May 2014 22:01:04 +0400
From: Cyrill Gorcunov
Subject: Re: [PATCH 3/4] x86,mm: Improve _install_special_mapping and fix x86 vdso naming
Message-ID: <20140520180104.GL2185@moon>
References: <276b39b6b645fb11e345457b503f17b83c2c6fd0.1400538962.git.luto@amacapital.net> <20140520172134.GJ2185@moon> <20140520174759.GK2185@moon>
To: Andy Lutomirski
Cc: X86 ML, Andrew Morton, Sasha Levin, "linux-mm@kvack.org", Dave Jones, LKML, Pavel Emelyanov, "H. Peter Anvin"

On Tue, May 20, 2014 at 10:52:51AM -0700, Andy Lutomirski wrote:
> >
> > We use not only the [vdso] mark to detect the vdso area but also the
> > page frame number of the live vdso. If the mark is not present in the
> > procfs output we examine executable areas and check if pfn == vdso_pfn.
> > It's a slow path because there might be a bunch of executable areas
> > and touching each of them is not that fast, but we simply have no
> > choice.
>
> This patch should fix this issue, at least. If there's still a way to
> get a native vdso that doesn't say "[vdso]", please let me know.

Yes, having a native procfs way to detect the vdso is much preferred!

> > The situation gets worse when a task was dumped on one kernel and
> > then restored on another kernel where the vdso content is different
> > from the one saved in the image -- in such a case, as I mentioned, we
> > need that named vdso proxy which redirects calls to the vdso of the
> > system where the task is being restored. And when such a "restored"
> > task gets checkpointed a second time we don't dump the new live vdso
> > but save only the old vdso proxy on disk (detecting it is a different
> > story; in short, we inject a unique mark into the ELF header).
>
> Yuck. But I don't know whether the kernel can help much here.

Some prctl which would tell the kernel to put the vdso at a specified
address. We can live without it for now so not a big deal (yet ;)

> >> I suspect that you'll need kernel changes for compat tasks, since I
> >> think that mremapping the vdso on any reasonably modern hardware in a
> >> 32-bit task will cause sigreturn to blow up. This could be fixed by
> >> making mremap magical, although adding a new prctl or arch_prctl to
> >> reliably move the vdso might be a better bet.
> >
> > Well, as far as I understand, compat code uses absolute addressing for
> > vvar data, and if the vvar data position doesn't change we're safe.
> > But at the same time, because vvar addresses are not ABI, I fear one
> > day we will indeed hit problems and the only solution would be
> > to use the kernel's help. But again, Andy, I didn't think much
> > about implementing compat mode in CRIU yet so I might be
> > missing some details.
>
> Prior to 3.15, the compat code didn't have vvar data at all. In 3.15
> and up, the vvar data is accessed using PC-relative addressing, even
> in compat mode (using the usual call; mov trick to read EIP).

I see. I'll ping you for help once I start implementing compat mode ;)

From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <537B9C6D.7010705@zytor.com>
Date: Tue, 20 May 2014 11:18:21 -0700
From: "H. Peter Anvin"
Subject: Re: [PATCH 3/4] x86,mm: Improve _install_special_mapping and fix x86 vdso naming
References: <276b39b6b645fb11e345457b503f17b83c2c6fd0.1400538962.git.luto@amacapital.net> <20140520172134.GJ2185@moon> <20140520174759.GK2185@moon> <20140520180104.GL2185@moon>
In-Reply-To: <20140520180104.GL2185@moon>
To: Cyrill Gorcunov, Andy Lutomirski
Cc: X86 ML, Andrew Morton, Sasha Levin, "linux-mm@kvack.org", Dave Jones, LKML, Pavel Emelyanov

On 05/20/2014 11:01 AM, Cyrill Gorcunov wrote:
>>
>> This patch should fix this issue, at least. If there's still a way to
>> get a native vdso that doesn't say "[vdso]", please let me know.
>
> Yes, having a native procfs way to detect the vdso is much preferred!
>

Is there any path by which we can end up with [vdso] without a leading
slash in /proc/self/maps? Otherwise, why is that not "native"?

>>> The situation gets worse when a task was dumped on one kernel and
>>> then restored on another kernel where the vdso content is different
>>> from the one saved in the image -- in such a case, as I mentioned, we
>>> need that named vdso proxy which redirects calls to the vdso of the
>>> system where the task is being restored. And when such a "restored"
>>> task gets checkpointed a second time we don't dump the new live vdso
>>> but save only the old vdso proxy on disk (detecting it is a different
>>> story; in short, we inject a unique mark into the ELF header).
>>
>> Yuck. But I don't know whether the kernel can help much here.
>
> Some prctl which would tell the kernel to put the vdso at a specified
> address. We can live without it for now so not a big deal (yet ;)

mremap() will do this for you.

-hpa

From mboxrd@z Thu Jan 1 00:00:00 1970
In-Reply-To: <537B9C6D.7010705@zytor.com>
References: <276b39b6b645fb11e345457b503f17b83c2c6fd0.1400538962.git.luto@amacapital.net> <20140520172134.GJ2185@moon> <20140520174759.GK2185@moon> <20140520180104.GL2185@moon> <537B9C6D.7010705@zytor.com>
From: Andy Lutomirski
Date: Tue, 20 May 2014 11:24:56 -0700
Subject: Re: [PATCH 3/4] x86,mm: Improve _install_special_mapping and fix x86 vdso naming
To: "H. Peter Anvin"
Cc: Cyrill Gorcunov, X86 ML, Andrew Morton, Sasha Levin, "linux-mm@kvack.org", Dave Jones, LKML, Pavel Emelyanov

On Tue, May 20, 2014 at 11:18 AM, H. Peter Anvin wrote:
> On 05/20/2014 11:01 AM, Cyrill Gorcunov wrote:
>>>
>>> This patch should fix this issue, at least. If there's still a way to
>>> get a native vdso that doesn't say "[vdso]", please let me know.
>>
>> Yes, having a native procfs way to detect the vdso is much preferred!
>>
>
> Is there any path by which we can end up with [vdso] without a leading
> slash in /proc/self/maps? Otherwise, why is that not "native"?

Dunno. But before this patch the reverse was possible: we could end up
with a vdso that doesn't say [vdso].

>
>>>> The situation gets worse when a task was dumped on one kernel and
>>>> then restored on another kernel where the vdso content is different
>>>> from the one saved in the image -- in such a case, as I mentioned, we
>>>> need that named vdso proxy which redirects calls to the vdso of the
>>>> system where the task is being restored. And when such a "restored"
>>>> task gets checkpointed a second time we don't dump the new live vdso
>>>> but save only the old vdso proxy on disk (detecting it is a different
>>>> story; in short, we inject a unique mark into the ELF header).
>>>
>>> Yuck. But I don't know whether the kernel can help much here.
>>
>> Some prctl which would tell the kernel to put the vdso at a specified
>> address. We can live without it for now so not a big deal (yet ;)
>
> mremap() will do this for you.

Except that it's buggy: it doesn't change mm->context.vdso. For
64-bit tasks, the only consumer outside exec was arch_vma_name, and
this patch removes even that. For 32-bit tasks, though, it's needed
for signal delivery.

--Andy

From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <537B9EA6.3030103@zytor.com>
Date: Tue, 20 May 2014 11:27:50 -0700
From: "H. Peter Anvin"
Subject: Re: [PATCH 3/4] x86,mm: Improve _install_special_mapping and fix x86 vdso naming
References: <276b39b6b645fb11e345457b503f17b83c2c6fd0.1400538962.git.luto@amacapital.net> <20140520172134.GJ2185@moon> <20140520174759.GK2185@moon> <20140520180104.GL2185@moon> <537B9C6D.7010705@zytor.com>
To: Andy Lutomirski
Cc: Cyrill Gorcunov, X86 ML, Andrew Morton, Sasha Levin, "linux-mm@kvack.org", Dave Jones, LKML, Pavel Emelyanov

On 05/20/2014 11:24 AM, Andy Lutomirski wrote:
> On Tue, May 20, 2014 at 11:18 AM, H. Peter Anvin wrote:
>> On 05/20/2014 11:01 AM, Cyrill Gorcunov wrote:
>>>>
>>>> This patch should fix this issue, at least. If there's still a way to
>>>> get a native vdso that doesn't say "[vdso]", please let me know.
>>>
>>> Yes, having a native procfs way to detect the vdso is much preferred!
>>>
>>
>> Is there any path by which we can end up with [vdso] without a leading
>> slash in /proc/self/maps? Otherwise, why is that not "native"?
>
> Dunno. But before this patch the reverse was possible: we could end up
> with a vdso that doesn't say [vdso].
>

That's a bug, which is being fixed. We can't go back in time and create
new interfaces on old kernels.

>>
>>>>> The situation gets worse when a task was dumped on one kernel and
>>>>> then restored on another kernel where the vdso content is different
>>>>> from the one saved in the image -- in such a case, as I mentioned, we
>>>>> need that named vdso proxy which redirects calls to the vdso of the
>>>>> system where the task is being restored. And when such a "restored"
>>>>> task gets checkpointed a second time we don't dump the new live vdso
>>>>> but save only the old vdso proxy on disk (detecting it is a different
>>>>> story; in short, we inject a unique mark into the ELF header).
>>>>
>>>> Yuck. But I don't know whether the kernel can help much here.
>>>
>>> Some prctl which would tell the kernel to put the vdso at a specified
>>> address. We can live without it for now so not a big deal (yet ;)
>>
>> mremap() will do this for you.
>
> Except that it's buggy: it doesn't change mm->context.vdso. For
> 64-bit tasks, the only consumer outside exec was arch_vma_name, and
> this patch removes even that. For 32-bit tasks, though, it's needed
> for signal delivery.
>

Again, a bug, let's fix it rather than saying we need a new interface.

-hpa

From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <537BA0FF.3000504@zytor.com>
Date: Tue, 20 May 2014 11:37:51 -0700
From: "H. Peter Anvin"
Subject: Re: [PATCH 3/4] x86,mm: Improve _install_special_mapping and fix x86 vdso naming
References: <276b39b6b645fb11e345457b503f17b83c2c6fd0.1400538962.git.luto@amacapital.net>
In-Reply-To: <276b39b6b645fb11e345457b503f17b83c2c6fd0.1400538962.git.luto@amacapital.net>
To: Andy Lutomirski, x86@kernel.org, Andrew Morton, Sasha Levin, "linux-mm@kvack.org", Dave Jones
Cc: LKML, Cyrill Gorcunov, Pavel Emelyanov, Cyrill Gorcunov

On 05/19/2014 03:58 PM, Andy Lutomirski wrote:
>
> As a side effect, the vvar area will show up in core dumps. This
> could be considered weird and is fixable. Thoughts?
>

On this issue... I don't know if this is likely to break anything. My
suggestion is that we accept it as-is but be prepared to deal with it if
it breaks something.

-hpa

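For readers following the "[vdso]" versus leading-slash point in the messages above, here is a small sketch of how a tool might tell bracketed special mappings apart from file-backed ones when reading /proc/PID/maps. It is an illustration under assumptions, not code from the patch series or from CRIU; parse_maps_line and special_mappings are hypothetical names, and the sample line in the comment is taken from the cover letter.

```python
import re

# One line of /proc/PID/maps: range, perms, offset, dev, inode, optional name.
MAPS_LINE = re.compile(
    r"^([0-9a-f]+)-([0-9a-f]+)\s+(\S{4})\s+([0-9a-f]+)\s+(\S+)\s+(\d+)\s*(.*)$"
)


def parse_maps_line(line):
    """Return (start, end, perms, name) for one maps line, or None if malformed."""
    m = MAPS_LINE.match(line.rstrip("\n"))
    if m is None:
        return None
    start, end, perms, _offset, _dev, _inode, name = m.groups()
    return int(start, 16), int(end, 16), perms, name


def special_mappings(lines):
    """Yield entries whose name is bracketed (e.g. "[vdso]") rather than a path.

    File-backed mappings carry an absolute path starting with "/", so a name
    starting with "[" can only come from the kernel.
    """
    for line in lines:
        entry = parse_maps_line(line)
        if entry is not None and entry[3].startswith("["):
            yield entry


# Example input, as shown in the cover letter after the fix:
#   badc0de0000-badc0de1000 r-xp 00000000 00:00 0 [vdso]
```

Passing the open file object for /proc/self/maps to special_mappings() lists whatever bracketed names the running kernel reports. On a kernel without this series, a vdso that has been moved with mremap would show up with an empty name and be skipped by this check.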
From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 20 May 2014 22:39:07 +0400
From: Cyrill Gorcunov
Subject: Re: [PATCH 3/4] x86,mm: Improve _install_special_mapping and fix x86 vdso naming
Message-ID: <20140520183907.GM2185@moon>
References: <276b39b6b645fb11e345457b503f17b83c2c6fd0.1400538962.git.luto@amacapital.net> <20140520172134.GJ2185@moon> <20140520174759.GK2185@moon> <20140520180104.GL2185@moon> <537B9C6D.7010705@zytor.com>
To: Andy Lutomirski
Cc: "H. Peter Anvin", X86 ML, Andrew Morton, Sasha Levin, "linux-mm@kvack.org", Dave Jones, LKML, Pavel Emelyanov

On Tue, May 20, 2014 at 11:24:56AM -0700, Andy Lutomirski wrote:
> On Tue, May 20, 2014 at 11:18 AM, H. Peter Anvin wrote:
> > On 05/20/2014 11:01 AM, Cyrill Gorcunov wrote:
> >>>
> >>> This patch should fix this issue, at least. If there's still a way to
> >>> get a native vdso that doesn't say "[vdso]", please let me know.
> >>
> >> Yes, having a native procfs way to detect the vdso is much preferred!
> >>
> >
> > Is there any path by which we can end up with [vdso] without a leading
> > slash in /proc/self/maps? Otherwise, why is that not "native"?
>
> Dunno. But before this patch the reverse was possible: we could end up
> with a vdso that doesn't say [vdso].

I fear I don't understand the phrase "leading slash in /proc/self/maps".
Peter, could you rephrase please?

> >>>> The situation gets worse when a task was dumped on one kernel and
> >>>> then restored on another kernel where the vdso content is different
> >>>> from the one saved in the image -- in such a case, as I mentioned, we
> >>>> need that named vdso proxy which redirects calls to the vdso of the
> >>>> system where the task is being restored. And when such a "restored"
> >>>> task gets checkpointed a second time we don't dump the new live vdso
> >>>> but save only the old vdso proxy on disk (detecting it is a different
> >>>> story; in short, we inject a unique mark into the ELF header).
> >>>
> >>> Yuck. But I don't know whether the kernel can help much here.
> >>
> >> Some prctl which would tell the kernel to put the vdso at a specified
> >> address. We can live without it for now so not a big deal (yet ;)
> >
> > mremap() will do this for you.
>
> Except that it's buggy: it doesn't change mm->context.vdso. For
> 64-bit tasks, the only consumer outside exec was arch_vma_name, and
> this patch removes even that. For 32-bit tasks, though, it's needed
> for signal delivery.

Yes, FWIW we can deal with it currently, but I'm not sure yet about the
compat case simply because I didn't look at it precisely.

From mboxrd@z Thu Jan 1 00:00:00 1970
In-Reply-To: <537B9EA6.3030103@zytor.com>
References: <276b39b6b645fb11e345457b503f17b83c2c6fd0.1400538962.git.luto@amacapital.net> <20140520172134.GJ2185@moon> <20140520174759.GK2185@moon> <20140520180104.GL2185@moon> <537B9C6D.7010705@zytor.com> <537B9EA6.3030103@zytor.com>
From: Andy Lutomirski
Date: Tue, 20 May 2014 11:38:52 -0700
Subject: Re: [PATCH 3/4] x86,mm: Improve _install_special_mapping and fix x86 vdso naming
To: "H. Peter Anvin"
Cc: Cyrill Gorcunov, X86 ML, Andrew Morton, Sasha Levin, "linux-mm@kvack.org", Dave Jones, LKML, Pavel Emelyanov

On Tue, May 20, 2014 at 11:27 AM, H. Peter Anvin wrote:
> On 05/20/2014 11:24 AM, Andy Lutomirski wrote:
>> On Tue, May 20, 2014 at 11:18 AM, H. Peter Anvin wrote:
>>> On 05/20/2014 11:01 AM, Cyrill Gorcunov wrote:
>>>>>
>>>>> This patch should fix this issue, at least. If there's still a way to
>>>>> get a native vdso that doesn't say "[vdso]", please let me know.
>>>>
>>>> Yes, having a native procfs way to detect the vdso is much preferred!
>>>>
>>>
>>> Is there any path by which we can end up with [vdso] without a leading
>>> slash in /proc/self/maps? Otherwise, why is that not "native"?
>>
>> Dunno. But before this patch the reverse was possible: we could end up
>> with a vdso that doesn't say [vdso].
>>
>
> That's a bug, which is being fixed. We can't go back in time and create
> new interfaces on old kernels.
>
>>>
>>>>>> The situation gets worse when a task was dumped on one kernel and
>>>>>> then restored on another kernel where the vdso content is different
>>>>>> from the one saved in the image -- in such a case, as I mentioned, we
>>>>>> need that named vdso proxy which redirects calls to the vdso of the
>>>>>> system where the task is being restored. And when such a "restored"
>>>>>> task gets checkpointed a second time we don't dump the new live vdso
>>>>>> but save only the old vdso proxy on disk (detecting it is a different
>>>>>> story; in short, we inject a unique mark into the ELF header).
>>>>>
>>>>> Yuck. But I don't know whether the kernel can help much here.
>>>>
>>>> Some prctl which would tell the kernel to put the vdso at a specified
>>>> address. We can live without it for now so not a big deal (yet ;)
>>>
>>> mremap() will do this for you.
>>
>> Except that it's buggy: it doesn't change mm->context.vdso. For
>> 64-bit tasks, the only consumer outside exec was arch_vma_name, and
>> this patch removes even that. For 32-bit tasks, though, it's needed
>> for signal delivery.
>>
>
> Again, a bug, let's fix it rather than saying we need a new interface.

What happens if someone remaps just part of the vdso?

Presumably we'd just track the position of the first page of the vdso,
but this might be hard to implement: I don't think there's any
callback from the core mm code for this.

--Andy

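The naming problem at the centre of this exchange can be illustrated with a toy model. It is deliberately not kernel code: the Vma and Mm classes, the context_vdso field and all function names below are hypothetical stand-ins for the vma, mm->context.vdso, and the two naming strategies discussed in the thread (comparing the start address, as arch_vma_name is described as doing on x86, versus a name carried by the mapping itself).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Vma:
    start: int
    end: int
    name: Optional[str] = None   # stand-in for a name attached to the mapping


@dataclass
class Mm:
    context_vdso: int            # stand-in for mm->context.vdso


def name_by_address(mm, vma):
    """Address-comparison naming: works only while the recorded address is current."""
    return "[vdso]" if vma.start == mm.context_vdso else ""


def name_by_vma(vma):
    """Per-mapping naming: the name travels with the mapping."""
    return vma.name or ""


def move_vma(vma, new_start):
    """Model of moving a mapping; note that nothing updates Mm.context_vdso."""
    size = vma.end - vma.start
    return Vma(new_start, new_start + size, vma.name)


mm = Mm(context_vdso=0x7FFF0000)
vdso = Vma(0x7FFF0000, 0x7FFF2000, "[vdso]")
moved = move_vma(vdso, 0xBADC0DE0000)

print(repr(name_by_address(mm, vdso)))    # name found at the original address
print(repr(name_by_address(mm, moved)))   # name lost after the move
print(repr(name_by_vma(moved)))           # name kept when stored per mapping
```

The stale context_vdso value in this model is also the reason given above for why moving the vdso is a problem for 32-bit tasks, where that address is still consulted for signal delivery. The model does not attempt to cover the partial-remap case raised in the last message.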