From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752110AbcDFQpm (ORCPT ); Wed, 6 Apr 2016 12:45:42 -0400
Received: from mail-db3on0144.outbound.protection.outlook.com ([157.55.234.144]:32402
	"EHLO emea01-db3-obe.outbound.protection.outlook.com"
	rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP
	id S1751569AbcDFQpi (ORCPT ); Wed, 6 Apr 2016 12:45:38 -0400
Authentication-Results: vger.kernel.org; dkim=none (message not signed)
	header.d=none;vger.kernel.org; dmarc=none action=none
	header.from=virtuozzo.com;
From: Dmitry Safonov
To:
CC: , , , , , , , , , , , , <0x7f454c46@gmail.com>, Dmitry Safonov
Subject: [PATCH 1/2] x86/arch_prctl: add ARCH_SET_{COMPAT,NATIVE} to change compatible mode
Date: Wed, 6 Apr 2016 19:29:29 +0300
Message-ID: <1459960170-4454-2-git-send-email-dsafonov@virtuozzo.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1459960170-4454-1-git-send-email-dsafonov@virtuozzo.com>
References: <1459960170-4454-1-git-send-email-dsafonov@virtuozzo.com>
MIME-Version: 1.0
Content-Type: text/plain
X-Originating-IP: [195.214.232.10]
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Currently, any process that runs natively on x86_64 may execute 32-bit
code by setting its CS selector appropriately: either to a selector from
the LDT or by reusing Linux's USER32_CS. The reverse also holds: a compat
task can run 64-bit code by choosing USER_CS. So any process may switch
between 32-bit and 64-bit code execution.

Linux will choose the right syscall numbers in the entry code for those
processes, but it will still consider them native or compat according to
the personality that the ELF loader set on launch. This affects, e.g.,
the ptrace syscall on those tasks: PTRACE_GETREGSET returns a 64-bit or
32-bit regset according to the process's mode (that's how strace detects
a task's personality since version 4.8).

This patch adds arch_prctl calls for x86 that make it possible to tell
the kernel which mode the application is currently running in. This is
mainly needed for CRIU, to restore both compat and native applications
from a 64-bit restorer. For that reason I wrapped all the code in
CONFIG_CHECKPOINT_RESTORE.

This patch also solves a problem with running 64-bit code in a 32-bit ELF
(and the reverse): only the 32-bit ELF vdso is available for fast syscalls.
When switching between native <-> compat mode with arch_prctl, the kernel
remaps the vdso binary blob needed for the target mode.

Cc: Cyrill Gorcunov
Cc: Pavel Emelyanov
Cc: Konstantin Khorenko
CC: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Dmitry Safonov
---
 arch/x86/entry/vdso/vma.c         | 76 ++++++++++++++++++++++++++--------
 arch/x86/include/asm/vdso.h       |  5 +++
 arch/x86/include/uapi/asm/prctl.h |  6 +++
 arch/x86/kernel/process_64.c      | 87 +++++++++++++++++++++++++++++++++++++++
 4 files changed, 157 insertions(+), 17 deletions(-)

diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index 10f704584922..9a1561da0bad 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -156,22 +156,21 @@ static int vvar_fault(const struct vm_special_mapping *sm,
 	return VM_FAULT_SIGBUS;
 }
 
-static int map_vdso(const struct vdso_image *image, bool calculate_addr)
+static int do_map_vdso(const struct vdso_image *image, bool calculate_addr,
+		unsigned long addr)
 {
 	struct mm_struct *mm = current->mm;
 	struct vm_area_struct *vma;
-	unsigned long addr, text_start;
+	unsigned long text_start;
 	int ret = 0;
 	static const struct vm_special_mapping vvar_mapping = {
 		.name = "[vvar]",
 		.fault = vvar_fault,
 	};
 
-	if (calculate_addr) {
+	if (calculate_addr && !addr) {
 		addr = vdso_addr(current->mm->start_stack,
 				 image->size - image->sym_vvar_start);
-	} else {
-		addr = 0;
 	}
 
 	down_write(&mm->mmap_sem);
@@ -209,11 +208,11 @@ static int map_vdso(const struct vdso_image *image, bool calculate_addr)
 				       VM_PFNMAP,
 				       &vvar_mapping);
 
-	if (IS_ERR(vma)) {
+	if (IS_ERR(vma))
 		ret = PTR_ERR(vma);
-		goto up_fail;
-	}
 
+	if (ret)
+		do_munmap(mm, addr, image->size - image->sym_vvar_start);
 up_fail:
 	if (ret)
 		current->mm->context.vdso = NULL;
@@ -223,24 +222,28 @@ up_fail:
 }
 
 #if defined(CONFIG_X86_32) || defined(CONFIG_IA32_EMULATION)
-static int load_vdso32(void)
+static int load_vdso32(unsigned long addr)
 {
 	if (vdso32_enabled != 1)  /* Other values all mean "disabled" */
 		return 0;
 
-	return map_vdso(&vdso_image_32, false);
+	return do_map_vdso(&vdso_image_32, false, addr);
 }
 #endif
 
 #ifdef CONFIG_X86_64
-int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
+static int load_vdso64(unsigned long addr)
 {
 	if (!vdso64_enabled)
 		return 0;
 
-	return map_vdso(&vdso_image_64, true);
+	return do_map_vdso(&vdso_image_64, true, addr);
 }
 
+int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
+{
+	return load_vdso64(0);
+}
 #ifdef CONFIG_COMPAT
 int compat_arch_setup_additional_pages(struct linux_binprm *bprm,
 				       int uses_interp)
@@ -250,20 +253,59 @@ int compat_arch_setup_additional_pages(struct linux_binprm *bprm,
 		if (!vdso64_enabled)
 			return 0;
 
-		return map_vdso(&vdso_image_x32, true);
+		return do_map_vdso(&vdso_image_x32, true, 0);
 	}
 #endif
 #ifdef CONFIG_IA32_EMULATION
-	return load_vdso32();
+	return load_vdso32(0);
 #else
 	return 0;
 #endif
 }
-#endif
-#else
+#endif /* CONFIG_COMPAT */
+
+#if defined(CONFIG_IA32_EMULATION) && defined(CONFIG_CHECKPOINT_RESTORE)
+unsigned long unmap_vdso(void)
+{
+	struct vm_area_struct *vma;
+	unsigned long addr = (unsigned long)current->mm->context.vdso;
+
+	if (!addr)
+		return 0;
+
+	/* vvar pages */
+	vma = find_vma(current->mm, addr - 1);
+	if (vma)
+		vm_munmap(vma->vm_start, vma->vm_end - vma->vm_start);
+
+	/* vdso pages */
+	vma = find_vma(current->mm, addr);
+	if (vma)
+		vm_munmap(vma->vm_start, vma->vm_end - vma->vm_start);
+
+	current->mm->context.vdso = NULL;
+
+	return addr;
+}
+/*
+ * Maps needed vdso type: vdso_image_32/vdso_image_64
+ * @compatible - true for compatible, false for native vdso image
+ * @addr - specify addr for vdso mapping (0 for random/searching)
+ * NOTE: be sure to set/clear thread-specific flags before
+ * calling this function.
+ */
+int map_vdso(bool compatible, unsigned long addr)
+{
+	if (compatible)
+		return load_vdso32(addr);
+	else
+		return load_vdso64(addr);
+}
+#endif /* CONFIG_IA32_EMULATION && CONFIG_CHECKPOINT_RESTORE */
+#else /* !CONFIG_X86_64 */
 int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
 {
-	return load_vdso32();
+	return load_vdso32(0);
 }
 #endif
 
diff --git a/arch/x86/include/asm/vdso.h b/arch/x86/include/asm/vdso.h
index 43dc55be524e..3ead7cc48a68 100644
--- a/arch/x86/include/asm/vdso.h
+++ b/arch/x86/include/asm/vdso.h
@@ -39,6 +39,11 @@ extern const struct vdso_image vdso_image_x32;
 extern const struct vdso_image vdso_image_32;
 #endif
 
+#if defined(CONFIG_IA32_EMULATION) && defined(CONFIG_CHECKPOINT_RESTORE)
+extern int map_vdso(bool to_compat, unsigned long addr);
+extern unsigned long unmap_vdso(void);
+#endif
+
 extern void __init init_vdso_image(const struct vdso_image *image);
 
 #endif /* __ASSEMBLER__ */
diff --git a/arch/x86/include/uapi/asm/prctl.h b/arch/x86/include/uapi/asm/prctl.h
index 3ac5032fae09..455844f06485 100644
--- a/arch/x86/include/uapi/asm/prctl.h
+++ b/arch/x86/include/uapi/asm/prctl.h
@@ -6,4 +6,10 @@
 #define ARCH_GET_FS 0x1003
 #define ARCH_GET_GS 0x1004
 
+#if defined(CONFIG_IA32_EMULATION) && defined(CONFIG_CHECKPOINT_RESTORE)
+#define ARCH_SET_COMPAT 0x2001
+#define ARCH_SET_NATIVE 0x2002
+#define ARCH_GET_PERSONALITY 0x2003
+#endif
+
 #endif /* _ASM_X86_PRCTL_H */
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 6cbab31ac23a..e50660d59530 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -49,6 +49,7 @@
 #include
 #include
 #include
+#include
 
 asmlinkage extern void ret_from_fork(void);
 
@@ -505,6 +506,83 @@ void set_personality_ia32(bool x32)
 }
 EXPORT_SYMBOL_GPL(set_personality_ia32);
 
+#if defined(CONFIG_IA32_EMULATION) && defined(CONFIG_CHECKPOINT_RESTORE)
+/*
+ * Check if there are still some vmas (except vdso) for current,
+ * which placed above compatible TASK_SIZE.
+ * Check also code, data, stack, args and env placements.
+ * Returns true if all mappings are compatible.
+ */
+static bool task_mappings_compatible(void)
+{
+	struct mm_struct *mm = current->mm;
+	unsigned long top_addr = IA32_PAGE_OFFSET;
+	struct vm_area_struct *vma = find_vma(mm, top_addr);
+
+	if (mm->end_code > top_addr ||
+	    mm->end_data > top_addr ||
+	    mm->start_stack > top_addr ||
+	    mm->brk > top_addr ||
+	    mm->arg_end > top_addr ||
+	    mm->env_end > top_addr)
+		return false;
+
+	while (vma) {
+		if ((vma->vm_start != (unsigned long)mm->context.vdso) &&
+		    (vma->vm_end != (unsigned long)mm->context.vdso))
+			return false;
+
+		top_addr = vma->vm_end;
+		vma = find_vma(mm, top_addr);
+	}
+
+	return true;
+}
+
+static int do_set_personality(bool compat, unsigned long addr)
+{
+	int ret;
+	unsigned long old_vdso_base;
+	unsigned long old_mmap_base = current->mm->mmap_base;
+
+	if (test_thread_flag(TIF_IA32) == compat) /* nothing to do */
+		return 0;
+
+	if (compat && !task_mappings_compatible())
+		return -EFAULT;
+
+	/*
+	 * We can't just remap vdso to needed location:
+	 * vdso compatible and native images differs
+	 */
+	old_vdso_base = unmap_vdso();
+
+	if (compat)
+		set_personality_ia32(false);
+	else
+		set_personality_64bit();
+
+	/*
+	 * Update mmap_base & get_unmapped_area helper, side effect:
+	 * one may change get_unmapped_area or mmap_base with personality()
+	 * or switching to and fro compatible mode
+	 */
+	arch_pick_mmap_layout(current->mm);
+
+	ret = map_vdso(compat, addr);
+	if (ret) {
+		current->mm->mmap_base = old_mmap_base;
+		if (compat)
+			set_personality_64bit();
+		else
+			set_personality_ia32(false);
+		WARN_ON(map_vdso(!compat, old_vdso_base));
+	}
+
+	return ret;
+}
+#endif
+
 long do_arch_prctl(struct task_struct *task, int code, unsigned long addr)
 {
 	int ret = 0;
@@ -592,6 +670,15 @@ long do_arch_prctl(struct task_struct *task, int code, unsigned long addr)
 		break;
 	}
 
+#if defined(CONFIG_IA32_EMULATION) && defined(CONFIG_CHECKPOINT_RESTORE)
+	case ARCH_SET_COMPAT:
+		return do_set_personality(true, addr);
+	case ARCH_SET_NATIVE:
+		return do_set_personality(false, addr);
+	case ARCH_GET_PERSONALITY:
+		return test_thread_flag(TIF_IA32);
+#endif
+
 	default:
 		ret = -EINVAL;
 		break;
-- 
2.7.4
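
Not part of the patch: the vma walk in task_mappings_compatible() above
can be tried out in userspace with a small sketch like the one below. It is
only an illustration of the rule, not the kernel code. struct mapping,
mappings_fit_compat() and COMPAT_LIMIT are hypothetical names introduced
here, and the constant is an assumed stand-in for IA32_PAGE_OFFSET, which
in the kernel depends on the task's personality.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Assumed compat address limit, for illustration only: the kernel's
 * IA32_PAGE_OFFSET is personality-dependent and is not a plain constant.
 */
#define COMPAT_LIMIT UINT64_C(0xFFFFe000)

/* Hypothetical userspace stand-in for a vma: the range [start, end). */
struct mapping {
	uint64_t start;
	uint64_t end;
};

/*
 * Returns true if no mapping other than the vdso/vvar pair reaches above
 * COMPAT_LIMIT. As in the patch, a mapping that starts at vdso_base is
 * taken to be the vdso text and one that ends there to be the vvar pages.
 */
bool mappings_fit_compat(const struct mapping *maps, size_t n,
			 uint64_t vdso_base)
{
	for (size_t i = 0; i < n; i++) {
		assert(maps[i].start < maps[i].end);

		if (maps[i].end <= COMPAT_LIMIT)
			continue;	/* entirely below the limit */
		if (maps[i].start != vdso_base && maps[i].end != vdso_base)
			return false;	/* foreign mapping above the limit */
	}
	return true;
}
```

With this rule, a task whose only high mappings are the vvar/vdso pair
passes the check (those get unmapped and replaced on the switch), while
any other mapping that extends past the limit makes the switch to compat
mode fail, which is where do_set_personality() returns -EFAULT.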