From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [PATCH 1/2] x86/arch_prctl: add ARCH_SET_{COMPAT,NATIVE} to change compatible mode
To: Andy Lutomirski
References: <1459960170-4454-1-git-send-email-dsafonov@virtuozzo.com>
 <1459960170-4454-2-git-send-email-dsafonov@virtuozzo.com>
 <57064E6C.2030202@virtuozzo.com>
CC: Thomas Gleixner, Dmitry Safonov <0x7f454c46@gmail.com>, Dave Hansen,
 Ingo Molnar, Shuah Khan, Borislav Petkov, X86 ML, Andrew Morton,
 Cyrill Gorcunov, "linux-kernel@vger.kernel.org", "H. Peter Anvin"
From: Dmitry Safonov
Message-ID: <57067A4F.9090101@virtuozzo.com>
Date: Thu, 7 Apr 2016 18:18:39 +0300
X-Mailing-List: linux-kernel@vger.kernel.org

On 04/07/2016 05:39 PM, Andy Lutomirski wrote:
> On Apr 7, 2016 5:12 AM, "Dmitry Safonov" wrote:
>> On 04/06/2016 09:04 PM, Andy Lutomirski wrote:
>>> [cc
>>> Dave Hansen for MPX]
>>>
>>> On Apr 6, 2016 9:30 AM, "Dmitry Safonov" wrote:
>>>> Now each process that runs natively on x86_64 may execute 32-bit code
>>>> by properly setting its CS selector: either from the LDT or by reusing
>>>> Linux's USER32_CS. The reverse also holds: running 64-bit code in a
>>>> compatible task is possible by choosing USER_CS.
>>>> So we may switch between 32 and 64 bit code execution in any process.
>>>> Linux will choose the right syscall numbers in entries for those
>>>> processes. But it will still consider them native/compat by the
>>>> personality that the ELF loader set on launch. This affects, e.g., the
>>>> ptrace syscall on those tasks: PTRACE_GETREGSET will return a 64/32-bit
>>>> regset according to the process's mode (that's how strace detects a
>>>> task's personality from version 4.8).
>>>>
>>>> This patch adds arch_prctl calls for x86 that make it possible to tell
>>>> the Linux kernel in which mode the application is currently running.
>>>> Mainly, this is needed for CRIU: restoring both compatible & native
>>>> applications from a 64-bit restorer. For that reason I wrapped all
>>>> the code in CONFIG_CHECKPOINT_RESTORE.
>>>> This patch also solves a problem with running 64-bit code in a 32-bit
>>>> ELF (and the reverse): you have only the 32-bit ELF vdso for fast
>>>> syscalls. When switching between native <-> compat mode by arch_prctl,
>>>> it will remap the vdso binary blob needed for the target mode.
>>> General comments first:
>> Thanks for your comments.
>>> You forgot about x32.
>> Will add x32 support for v2.
>>
>>> I think that you should separate vdso remapping from "personality".
>>> vdso remapping should be available even on native 32-bit builds, which
>>> means that either you can't use arch_prctl for it or you'll have to
>>> wire up arch_prctl as a 32-bit syscall.
>> I can't say I got your point. By vdso remapping, do you mean
>> mremap for the vdso/vvar pages? I think it should work now.
> For 32-bit, the vdso *must* exist in memory at the address that the
> kernel thinks it's at. Even if you had a pure 32-bit restore stub,
> you would still need vdso remap, because there's a chance the vdso
> could land at an unusable address, say one page off from where you
> want it. You couldn't map a wrapper because there wouldn't be any
> space for it without moving the real vdso out of the way.
>
> Remember, you *cannot* mremap() the 32-bit vdso because you will
> crash. It works by luck for 64-bit, but it's plausible that we'd want
> to change that some day. (I have awful patches that speed a bunch of
> things up at the cost of a vdso trampoline for 64-bit code and a bunch
> of other hacks. Those patches will never go in for real, but
> something else might want the ability to use 64-bit vdso trampolines.)

Thanks for the elaboration, now I see.
Signals and fast syscalls expect mm->context.vdso to be correct.

>
>> I did the remapping for the vdso because the blob for a native x86_64
>> task differs from the one for a compatible task. So it's just changing
>> blobs; the address value is there for convenience - I may omit it and
>> just remap a different vdso blob at the same place where the previous
>> vdso was.
>> I'm not sure why we need the possibility to map a 64-bit vdso blob
>> on native 32-bit builds.
> That would fail, but I think the API should exist. But a native
> 32-bit program should be able to remap the 32-bit vdso.
>
> IOW, I think you should be able to do, roughly:
>
> map_new_vdso(VDSO_32BIT, addr);
>
> on any kernel.
>
> Am I making sense?

Yes. I will rework it into an API along those lines.

>
>>> For "personality", someone needs to enumerate all of the various things
>>> that try to track bitness and see how many of them even make sense.
>>> On brief inspection:
>>>
>>> - TIF_IA32: affects signal format and does something to ptrace. I
>>> suspect that whatever it does to ptrace is nonsensical, and I don't
>>> know whether we're stuck with it.
>>>
>>> - TIF_ADDR32 affects TASK_SIZE and mmap behavior (and the latter
>>> isn't even done in a sensible way).
>>>
>>> - is_64bit_mm affects MPX and uprobes.
>>>
>>> On even more brief inspection:
>>>
>>> - uprobes using is_64bit_mm is buggy.
>>>
>>> - I doubt that having TASK_SIZE vary serves any purpose. Does anyone
>>> know why TASK_SIZE is different for different tasks? It would save
>>> code size and speed things up if TASK_SIZE were always TASK_SIZE_MAX.
>>> - Using TIF_IA32 for signal processing is IMO suboptimal. Instead,
>>> we should record which syscall installed the signal handler and use
>>> the corresponding frame format.
>> Oh, I like it, will do.
>>
>>> - Using TIF_IA32 of the *target* for ptrace is nonsense. Having
>>> strace figure out syscall type using that is actively buggy, and I ran
>>> into that bug a few days ago and cursed at it. strace should inspect
>>> TS_COMPAT (I don't know how, but that's what should happen). We may
>>> be stuck with this for ABI reasons.
>> ptrace may check seg_32bit for the code selector, what do you think?
> Not sure. I have never fully wrapped my head around ptrace.

Hm, after some thinking, I guess it's better to check TS_COMPAT:
it's set on compat syscall entry, so there is no need to check
seg_32bit anyway.

Huge thanks, will work on v2 according to your comments.
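
To make the closing point about TS_COMPAT concrete, here is a toy model
in C. It is not kernel code: every name in it (toy_task, toy_entry,
toy_syscall_enter and so on) is hypothetical and only loosely mirrors
the flags discussed in the thread. It assumes, as stated above, that
TS_COMPAT is set on compat syscall entry, and additionally assumes it
is cleared again on exit. The case modelled is a native 64-bit task
that issues int $0x80 from 64-bit code: the personality and the code
segment both say "64-bit", yet the syscall itself is a compat one.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; all names are hypothetical, not kernel identifiers. */
enum toy_entry { ENTRY_SYSCALL64, ENTRY_INT80, ENTRY_SYSENTER32 };

struct toy_task {
    bool tif_ia32;    /* personality picked by the ELF loader at exec */
    bool cs_is_32bit; /* bitness of the current code segment (seg_32bit) */
    bool ts_compat;   /* per-syscall status, set on compat entry */
};

/* Entry: only the path taken into the kernel decides compat-ness. */
void toy_syscall_enter(struct toy_task *t, enum toy_entry path)
{
    t->ts_compat = (path != ENTRY_SYSCALL64);
}

/* Exit: assumed here to clear the per-syscall status again. */
void toy_syscall_exit(struct toy_task *t)
{
    t->ts_compat = false;
}

/* Three ways a tracer could try to classify the syscall in flight. */
bool guess_by_personality(const struct toy_task *t) { return t->tif_ia32; }
bool guess_by_code_segment(const struct toy_task *t) { return t->cs_is_32bit; }
bool syscall_is_compat(const struct toy_task *t) { return t->ts_compat; }

/* Walk through the int $0x80-from-64-bit-code scenario. */
void toy_demo(void)
{
    struct toy_task t = { false, false, false }; /* native task, 64-bit CS */

    toy_syscall_enter(&t, ENTRY_INT80);
    assert(!guess_by_personality(&t));  /* says "native": wrong here */
    assert(!guess_by_code_segment(&t)); /* says "native": wrong here */
    assert(syscall_is_compat(&t));      /* says "compat": matches entry */
    toy_syscall_exit(&t);
    assert(!syscall_is_compat(&t));
}
```

Calling toy_demo() runs the scenario. In this model only the
per-syscall flag follows the entry path, which is why checking
seg_32bit would add nothing on top of it. x32 is not modelled.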