From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [PATCH 1/2] x86/arch_prctl: add ARCH_SET_{COMPAT,NATIVE} to change compatible mode
From: Dmitry Safonov
To: Andy Lutomirski
CC: Shuah Khan, "H. Peter Anvin", 0x7f454c46@gmail.com, "linux-kernel@vger.kernel.org", Thomas Gleixner, Cyrill Gorcunov, Borislav Petkov, Andrew Morton, X86 ML, Ingo Molnar, Dave Hansen
Date: Thu, 7 Apr 2016 15:11:24 +0300
Message-ID: <57064E6C.2030202@virtuozzo.com>
References: <1459960170-4454-1-git-send-email-dsafonov@virtuozzo.com> <1459960170-4454-2-git-send-email-dsafonov@virtuozzo.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 04/06/2016 09:04 PM, Andy Lutomirski wrote:
> [cc Dave Hansen for MPX]
>
> On Apr 6, 2016 9:30 AM, "Dmitry Safonov" wrote:
>> Now each process that runs natively on x86_64 may execute 32-bit code
>> by properly setting its CS selector: either from the LDT or by reusing
>> Linux's USER32_CS. The reverse is also valid: running 64-bit code in a
>> compat task is possible by choosing USER_CS.
>> So we may switch between 32- and 64-bit code execution in any process.
>> Linux will choose the right syscall numbers in the entry paths for
>> those processes. But it will still consider them native/compat
>> according to the personality that the ELF loader set at launch. This
>> affects, e.g., the ptrace syscall on those tasks: PTRACE_GETREGSET
>> will return a 64/32-bit regset according to the process's mode (that's
>> how strace detects a task's personality since version 4.8).
>>
>> This patch adds arch_prctl calls for x86 that make it possible to tell
>> the Linux kernel in which mode the application is currently running.
>> Mainly, this is needed for CRIU: restoring both compat and native
>> applications from a 64-bit restorer. For that reason I wrapped all
>> the code in CONFIG_CHECKPOINT_RESTORE.
>> This patch also solves a problem with running 64-bit code in a 32-bit
>> ELF (and the reverse): you only have a 32-bit ELF vdso for fast
>> syscalls. When switching between native <-> compat mode by arch_prctl,
>> it will remap the vdso binary blob needed for the target mode.

> General comments first:

Thanks for your comments.

> You forgot about x32.

Will add x32 support for v2.

> I think that you should separate vdso remapping from "personality".
> vdso remapping should be available even on native 32-bit builds, which
> means that either you can't use arch_prctl for it or you'll have to
> wire up arch_prctl as a 32-bit syscall.

I'm not sure I got your point. By vdso remapping, do you mean mremap of
the vdso/vvar pages? I think that should work now. I did the vdso
remapping because the blob for a native x86_64 task differs from the one
for a compat task. So it's just changing blobs; the address value is
there for convenience - I may omit it and just remap the different vdso
blob at the place where the previous vdso was.
I'm not sure why we would need the ability to map a 64-bit vdso blob on
native 32-bit builds?

> For "personality", someone needs to enumerate all of the various things
> that try to track bitness and see how many of them even make sense.
> On brief inspection:
>
> - TIF_IA32: affects signal format and does something to ptrace. I
> suspect that whatever it does to ptrace is nonsensical, and I don't
> know whether we're stuck with it.
>
> - TIF_ADDR32 affects TASK_SIZE and mmap behavior (and the latter
> isn't even done in a sensible way).
>
> - is_64bit_mm affects MPX and uprobes.
>
> On even more brief inspection:
>
> - uprobes using is_64bit_mm is buggy.
>
> - I doubt that having TASK_SIZE vary serves any purpose. Does anyone
> know why TASK_SIZE is different for different tasks? It would save
> code size and speed things up if TASK_SIZE were always TASK_SIZE_MAX.

> - Using TIF_IA32 for signal processing is IMO suboptimal. Instead,
> we should record which syscall installed the signal handler and use
> the corresponding frame format.

Oh, I like it, will do.

> - Using TIF_IA32 of the *target* for ptrace is nonsense. Having
> strace figure out syscall type using that is actively buggy, and I ran
> into that bug a few days ago and cursed at it. strace should inspect
> TS_COMPAT (I don't know how, but that's what should happen). We may
> be stuck with this for ABI reasons.

ptrace may check seg_32bit of the code selector, what do you think?

> - For MPX, could we track which syscall called mpx_enable_management?
> I.e. can we stash in_compat_syscall's return from
> mpx_enable_management and use that instead of trying to determine the
> MPX data structure format by the mm's initial type?
>
> If we make all of these changes, we're left with *only* the ptrace
> thing, and you could certainly add a way for CRIU to switch that
> around.

That sounds really good!
>
>> Cc: Cyrill Gorcunov
>> Cc: Pavel Emelyanov
>> Cc: Konstantin Khorenko
>> CC: Dmitry Safonov <0x7f454c46@gmail.com>
>> Signed-off-by: Dmitry Safonov
>> ---
>>  arch/x86/entry/vdso/vma.c         | 76 ++++++++++++++++++++++++++--------
>>  arch/x86/include/asm/vdso.h       |  5 +++
>>  arch/x86/include/uapi/asm/prctl.h |  6 +++
>>  arch/x86/kernel/process_64.c      | 87 +++++++++++++++++++++++++++++++++++++++
>>  4 files changed, 157 insertions(+), 17 deletions(-)
>>
>> diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
>> index 10f704584922..9a1561da0bad 100644
>> --- a/arch/x86/entry/vdso/vma.c
>> +++ b/arch/x86/entry/vdso/vma.c
>> @@ -156,22 +156,21 @@ static int vvar_fault(const struct vm_special_mapping *sm,
>>  		return VM_FAULT_SIGBUS;
>>  }
>>
>> -static int map_vdso(const struct vdso_image *image, bool calculate_addr)
>> +static int do_map_vdso(const struct vdso_image *image, bool calculate_addr,
>> +		unsigned long addr)
>>  {
>>  	struct mm_struct *mm = current->mm;
>>  	struct vm_area_struct *vma;
>> -	unsigned long addr, text_start;
>> +	unsigned long text_start;
>>  	int ret = 0;
>>  	static const struct vm_special_mapping vvar_mapping = {
>>  		.name = "[vvar]",
>>  		.fault = vvar_fault,
>>  	};
>>
>> -	if (calculate_addr) {
>> +	if (calculate_addr && !addr) {
>>  		addr = vdso_addr(current->mm->start_stack,
>>  				 image->size - image->sym_vvar_start);
>> -	} else {
>> -		addr = 0;
>>  	}
>>

> This is overcomplicated. Just pass in the address and use it as supplied.

Will do.
>
>>  	down_write(&mm->mmap_sem);
>> @@ -209,11 +208,11 @@ static int map_vdso(const struct vdso_image *image, bool calculate_addr)
>>  			VM_PFNMAP,
>>  			&vvar_mapping);
>>
>> -	if (IS_ERR(vma)) {
>> +	if (IS_ERR(vma))
>>  		ret = PTR_ERR(vma);
>> -		goto up_fail;
>> -	}
>>
>> +	if (ret)
>> +		do_munmap(mm, addr, image->size - image->sym_vvar_start);
>>  up_fail:
>>  	if (ret)
>>  		current->mm->context.vdso = NULL;
>> @@ -223,24 +222,28 @@ up_fail:
>>  }
>>
>>  #if defined(CONFIG_X86_32) || defined(CONFIG_IA32_EMULATION)
>> -static int load_vdso32(void)
>> +static int load_vdso32(unsigned long addr)
>>  {
>>  	if (vdso32_enabled != 1)  /* Other values all mean "disabled" */
>>  		return 0;
>>
>> -	return map_vdso(&vdso_image_32, false);
>> +	return do_map_vdso(&vdso_image_32, false, addr);
>>  }
>>  #endif

> I'd just make it one function do_map_vdso(type, addr).

Sure, will do.

>
>>  #ifdef CONFIG_X86_64
>> -int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
>> +static int load_vdso64(unsigned long addr)
>>  {
>>  	if (!vdso64_enabled)
>>  		return 0;
>>
>> -	return map_vdso(&vdso_image_64, true);
>> +	return do_map_vdso(&vdso_image_64, true, addr);
>>  }
>>
>> +int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
>> +{
>> +	return load_vdso64(0);
>> +}
>>  #ifdef CONFIG_COMPAT
>>  int compat_arch_setup_additional_pages(struct linux_binprm *bprm,
>>  				       int uses_interp)
>> @@ -250,20 +253,59 @@ int compat_arch_setup_additional_pages(struct linux_binprm *bprm,
>>  	if (!vdso64_enabled)
>>  		return 0;
>>
>> -	return map_vdso(&vdso_image_x32, true);
>> +	return do_map_vdso(&vdso_image_x32, true, 0);
>>  }
>>  #endif
>>  #ifdef CONFIG_IA32_EMULATION
>> -	return load_vdso32();
>> +	return load_vdso32(0);

> No special 0 please.

Yes, I will use vdso_addr(..) here in place.
>
>>  #else
>>  	return 0;
>>  #endif
>>  }
>> -#endif
>> -#else
>> +#endif /* CONFIG_COMPAT */
>> +
>> +#if defined(CONFIG_IA32_EMULATION) && defined(CONFIG_CHECKPOINT_RESTORE)
>> +unsigned long unmap_vdso(void)
>> +{
>> +	struct vm_area_struct *vma;
>> +	unsigned long addr = (unsigned long)current->mm->context.vdso;
>> +
>> +	if (!addr)
>> +		return 0;
>> +
>> +	/* vvar pages */
>> +	vma = find_vma(current->mm, addr - 1);
>> +	if (vma)
>> +		vm_munmap(vma->vm_start, vma->vm_end - vma->vm_start);
>> +
>> +	/* vdso pages */
>> +	vma = find_vma(current->mm, addr);
>> +	if (vma)
>> +		vm_munmap(vma->vm_start, vma->vm_end - vma->vm_start);
>> +
>> +	current->mm->context.vdso = NULL;
>> +
>> +	return addr;
>> +}
>> +/*
>> + * Maps needed vdso type: vdso_image_32/vdso_image_64
>> + * @compatible - true for compatible, false for native vdso image
>> + * @addr - specify addr for vdso mapping (0 for random/searching)
>> + * NOTE: be sure to set/clear thread-specific flags before
>> + * calling this function.
>> + */
>> +int map_vdso(bool compatible, unsigned long addr)
>> +{
>> +	if (compatible)
>> +		return load_vdso32(addr);
>> +	else
>> +		return load_vdso64(addr);
>> +}

> This makes sense. But it can't be bool -- you forgot x32.

Sure, will rework for x32 support.
>
>> +#endif /* CONFIG_IA32_EMULATION && CONFIG_CHECKPOINT_RESTORE */
>> +#else /* !CONFIG_X86_64 */
>>  int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
>>  {
>> -	return load_vdso32();
>> +	return load_vdso32(0);
>>  }
>>  #endif
>>
>> diff --git a/arch/x86/include/asm/vdso.h b/arch/x86/include/asm/vdso.h
>> index 43dc55be524e..3ead7cc48a68 100644
>> --- a/arch/x86/include/asm/vdso.h
>> +++ b/arch/x86/include/asm/vdso.h
>> @@ -39,6 +39,11 @@ extern const struct vdso_image vdso_image_x32;
>>  extern const struct vdso_image vdso_image_32;
>>  #endif
>>
>> +#if defined(CONFIG_IA32_EMULATION) && defined(CONFIG_CHECKPOINT_RESTORE)
>> +extern int map_vdso(bool to_compat, unsigned long addr);
>> +extern unsigned long unmap_vdso(void);
>> +#endif
>> +
>>  extern void __init init_vdso_image(const struct vdso_image *image);
>>
>>  #endif /* __ASSEMBLER__ */
>> diff --git a/arch/x86/include/uapi/asm/prctl.h b/arch/x86/include/uapi/asm/prctl.h
>> index 3ac5032fae09..455844f06485 100644
>> --- a/arch/x86/include/uapi/asm/prctl.h
>> +++ b/arch/x86/include/uapi/asm/prctl.h
>> @@ -6,4 +6,10 @@
>>  #define ARCH_GET_FS 0x1003
>>  #define ARCH_GET_GS 0x1004
>>
>> +#if defined(CONFIG_IA32_EMULATION) && defined(CONFIG_CHECKPOINT_RESTORE)
>> +#define ARCH_SET_COMPAT 0x2001
>> +#define ARCH_SET_NATIVE 0x2002
>> +#define ARCH_GET_PERSONALITY 0x2003
>> +#endif
>> +
>>  #endif /* _ASM_X86_PRCTL_H */
>> diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
>> index 6cbab31ac23a..e50660d59530 100644
>> --- a/arch/x86/kernel/process_64.c
>> +++ b/arch/x86/kernel/process_64.c
>> @@ -49,6 +49,7 @@
>>  #include
>>  #include
>>  #include
>> +#include
>>
>>  asmlinkage extern void ret_from_fork(void);
>>
>> @@ -505,6 +506,83 @@ void set_personality_ia32(bool x32)
>>  }
>>  EXPORT_SYMBOL_GPL(set_personality_ia32);
>>
>> +#if defined(CONFIG_IA32_EMULATION) && defined(CONFIG_CHECKPOINT_RESTORE)
>> +/*
>> + * Check if there are still some vmas (except vdso) for current,
>> + * which placed above compatible TASK_SIZE.
>> + * Check also code, data, stack, args and env placements.
>> + * Returns true if all mappings are compatible.
>> + */
>> +static bool task_mappings_compatible(void)
>> +{
>> +	struct mm_struct *mm = current->mm;
>> +	unsigned long top_addr = IA32_PAGE_OFFSET;
>> +	struct vm_area_struct *vma = find_vma(mm, top_addr);
>> +
>> +	if (mm->end_code > top_addr ||
>> +	    mm->end_data > top_addr ||
>> +	    mm->start_stack > top_addr ||
>> +	    mm->brk > top_addr ||
>> +	    mm->arg_end > top_addr ||
>> +	    mm->env_end > top_addr)
>> +		return false;
>> +
>> +	while (vma) {
>> +		if ((vma->vm_start != (unsigned long)mm->context.vdso) &&
>> +		    (vma->vm_end != (unsigned long)mm->context.vdso))
>> +			return false;
>> +
>> +		top_addr = vma->vm_end;
>> +		vma = find_vma(mm, top_addr);
>> +	}
>> +
>> +	return true;
>> +}

> What goes wrong if there are leftover high mappings?

Nothing should. I didn't expect that someone would "hide" mappings above
the 32-bit address space - that's why I added the check. I'll drop this
part.

>
>> +
>> +static int do_set_personality(bool compat, unsigned long addr)
>> +{
>> +	int ret;
>> +	unsigned long old_vdso_base;
>> +	unsigned long old_mmap_base = current->mm->mmap_base;
>> +
>> +	if (test_thread_flag(TIF_IA32) == compat) /* nothing to do */
>> +		return 0;

> Please don't. Instead, remove TIF_IA32 entirely.

Thanks, I will remove TIF_IA32.

> Also, please separate out ARCH_REMAP_VDSO from any personality change API.

Ok.
>
>> +
>> +	if (compat && !task_mappings_compatible())
>> +		return -EFAULT;
>> +
>> +	/*
>> +	 * We can't just remap vdso to needed location:
>> +	 * vdso compatible and native images differs
>> +	 */
>> +	old_vdso_base = unmap_vdso();
>> +
>> +	if (compat)
>> +		set_personality_ia32(false);
>> +	else
>> +		set_personality_64bit();
>> +
>> +	/*
>> +	 * Update mmap_base & get_unmapped_area helper, side effect:
>> +	 * one may change get_unmapped_area or mmap_base with personality()
>> +	 * or switching to and fro compatible mode
>> +	 */
>> +	arch_pick_mmap_layout(current->mm);
>> +
>> +	ret = map_vdso(compat, addr);
>> +	if (ret) {
>> +		current->mm->mmap_base = old_mmap_base;
>> +		if (compat)
>> +			set_personality_64bit();
>> +		else
>> +			set_personality_ia32(false);
>> +		WARN_ON(map_vdso(!compat, old_vdso_base));
>> +	}
>> +
>> +	return ret;
>> +}
>> +#endif
>> +
>>  long do_arch_prctl(struct task_struct *task, int code, unsigned long addr)
>>  {
>>  	int ret = 0;
>> @@ -592,6 +670,15 @@ long do_arch_prctl(struct task_struct *task, int code, unsigned long addr)
>>  		break;
>>  	}
>>
>> +#if defined(CONFIG_IA32_EMULATION) && defined(CONFIG_CHECKPOINT_RESTORE)
>> +	case ARCH_SET_COMPAT:
>> +		return do_set_personality(true, addr);
>> +	case ARCH_SET_NATIVE:
>> +		return do_set_personality(false, addr);
>> +	case ARCH_GET_PERSONALITY:
>> +		return test_thread_flag(TIF_IA32);
>> +#endif
>> +
>>  	default:
>>  		ret = -EINVAL;
>>  		break;
>> --
>> 2.7.4
>>

--
Regards,
Dmitry Safonov