From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [PATCH 8/8] x86/mm: Allow to have userspace mappings above 47-bits
From: Dmitry Safonov
To: "Kirill A. Shutemov"
Cc: Linus Torvalds, Andrew Morton, Thomas Gleixner, Ingo Molnar,
 "H. Peter Anvin", Andi Kleen, Dave Hansen, Andy Lutomirski
Date: Thu, 6 Apr 2017 21:43:41 +0300
Message-ID: <3cb79f4b-76f5-6e31-6973-e9281b2e4553@virtuozzo.com>
In-Reply-To: <20170406140106.78087-9-kirill.shutemov@linux.intel.com>
References: <20170406140106.78087-1-kirill.shutemov@linux.intel.com>
 <20170406140106.78087-9-kirill.shutemov@linux.intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="windows-1252"; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Kirill,

On 04/06/2017 05:01 PM, Kirill A. Shutemov wrote:
> On x86, 5-level paging enables 56-bit userspace virtual address space.
> Not all user space is ready to handle wide addresses.
> It's known that
> at least some JIT compilers use higher bits in pointers to encode their
> information. It collides with valid pointers with 5-level paging and
> leads to crashes.
>
> To mitigate this, we are not going to allocate virtual address space
> above 47-bit by default.
>
> But userspace can ask for allocation from full address space by
> specifying hint address (with or without MAP_FIXED) above 47-bits.
>
> If hint address set above 47-bit, but MAP_FIXED is not specified, we try
> to look for unmapped area by specified address. If it's already
> occupied, we look for unmapped area in *full* address space, rather than
> from 47-bit window.

Do you wish that, after the first over-47-bit mapping, the following
mmap() calls also return over-47-bit addresses if there is free space?
If so, you could simplify all this code by changing only mm->mmap_base
on the first over-47-bit mmap() call. That would do the trick simply.

>
> This approach helps to easily make application's memory allocator aware
> about large address space without manually tracking allocated virtual
> address space.
>
> One important case we need to handle here is interaction with MPX.
> MPX (without the MAWA extension) cannot handle addresses above 47-bit,
> so we need to make sure that MPX cannot be enabled if we already have a
> VMA above the boundary, and forbid creating such VMAs once MPX is
> enabled.
>
> Signed-off-by: Kirill A. Shutemov
> Cc: Dmitry Safonov
> ---
>  arch/x86/include/asm/elf.h       |  2 +-
>  arch/x86/include/asm/mpx.h       |  9 +++++++++
>  arch/x86/include/asm/processor.h |  9 ++++++---
>  arch/x86/kernel/sys_x86_64.c     | 28 +++++++++++++++++++++++++++-
>  arch/x86/mm/hugetlbpage.c        | 27 ++++++++++++++++++++++++---
>  arch/x86/mm/mmap.c               |  2 +-
>  arch/x86/mm/mpx.c                | 33 ++++++++++++++++++++++++++++++++-
>  7 files changed, 100 insertions(+), 10 deletions(-)
>
> diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h
> index d4d3ed456cb7..67260dbe1688 100644
> --- a/arch/x86/include/asm/elf.h
> +++ b/arch/x86/include/asm/elf.h
> @@ -250,7 +250,7 @@ extern int force_personality32;
>     the loader.  We need to make sure that it is out of the way of the program
>     that it will "exec", and that there is sufficient room for the brk. */
>
> -#define ELF_ET_DYN_BASE		(TASK_SIZE / 3 * 2)
> +#define ELF_ET_DYN_BASE		(DEFAULT_MAP_WINDOW / 3 * 2)

This will kill 32-bit userspace:
As DEFAULT_MAP_WINDOW is defined as what previously was TASK_SIZE_MAX,
not TASK_SIZE, for ia32/x32 ELF_ET_DYN_BASE will be over 4Gb.

Here is the test:

[root@localhost test]# cat hello-world.c
#include
int main(int argc, char **argv)
{
	printf("Maybe this world is another planet's hell.\n");
	return 0;
}
[root@localhost test]# gcc -m32 hello-world.c -o hello-world
[root@localhost test]# ./hello-world
[   35.306726] hello-world[1948]: segfault at ffa5288c ip 00000000f77b5a82 sp 00000000ffa52890 error 6 in ld-2.23.so[f77b5000+23000]
Segmentation fault (core dumped)

So, the dynamic base should differ between 32-bit and 64-bit, as it did
with TASK_SIZE.

>
>  /* This yields a mask that user programs can use to figure out what
>     instruction set this CPU supports.  This could be done in user space,
> diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h
> index a0d662be4c5b..7d7404756bb4 100644
> --- a/arch/x86/include/asm/mpx.h
> +++ b/arch/x86/include/asm/mpx.h
> @@ -73,6 +73,9 @@ static inline void mpx_mm_init(struct mm_struct *mm)
>  }
>  void mpx_notify_unmap(struct mm_struct *mm, struct vm_area_struct *vma,
>  		      unsigned long start, unsigned long end);
> +
> +unsigned long mpx_unmapped_area_check(unsigned long addr, unsigned long len,
> +		unsigned long flags);
>  #else
>  static inline siginfo_t *mpx_generate_siginfo(struct pt_regs *regs)
>  {
> @@ -94,6 +97,12 @@ static inline void mpx_notify_unmap(struct mm_struct *mm,
>  				    unsigned long start, unsigned long end)
>  {
>  }
> +
> +static inline unsigned long mpx_unmapped_area_check(unsigned long addr,
> +		unsigned long len, unsigned long flags)
> +{
> +	return addr;
> +}
>  #endif /* CONFIG_X86_INTEL_MPX */
>
>  #endif /* _ASM_X86_MPX_H */
> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> index 3cada998a402..9f437aea7f57 100644
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -795,6 +795,7 @@ static inline void spin_lock_prefetch(const void *x)
>  #define IA32_PAGE_OFFSET	PAGE_OFFSET
>  #define TASK_SIZE		PAGE_OFFSET
>  #define TASK_SIZE_MAX		TASK_SIZE
> +#define DEFAULT_MAP_WINDOW	TASK_SIZE
>  #define STACK_TOP		TASK_SIZE
>  #define STACK_TOP_MAX		STACK_TOP
>
> @@ -834,7 +835,9 @@ static inline void spin_lock_prefetch(const void *x)
>   * particular problem by preventing anything from being mapped
>   * at the maximum canonical address.
>   */
> -#define TASK_SIZE_MAX	((1UL << 47) - PAGE_SIZE)
> +#define TASK_SIZE_MAX	((1UL << __VIRTUAL_MASK_SHIFT) - PAGE_SIZE)
> +
> +#define DEFAULT_MAP_WINDOW	((1UL << 47) - PAGE_SIZE)
>
>  /* This decides where the kernel will search for a free chunk of vm
>   * space during mmap's.
> @@ -847,7 +850,7 @@ static inline void spin_lock_prefetch(const void *x)
>  #define TASK_SIZE_OF(child)	((test_tsk_thread_flag(child, TIF_ADDR32)) ? \
>  					IA32_PAGE_OFFSET : TASK_SIZE_MAX)
>
> -#define STACK_TOP		TASK_SIZE
> +#define STACK_TOP		DEFAULT_MAP_WINDOW
>  #define STACK_TOP_MAX		TASK_SIZE_MAX
>
>  #define INIT_THREAD  { \
> @@ -870,7 +873,7 @@ extern void start_thread(struct pt_regs *regs, unsigned long new_ip,
>   * space during mmap's.
>   */
>  #define __TASK_UNMAPPED_BASE(task_size)	(PAGE_ALIGN(task_size / 3))
> -#define TASK_UNMAPPED_BASE	__TASK_UNMAPPED_BASE(TASK_SIZE)
> +#define TASK_UNMAPPED_BASE	__TASK_UNMAPPED_BASE(DEFAULT_MAP_WINDOW)

ditto

>
>  #define KSTK_EIP(task)		(task_pt_regs(task)->ip)
>
> diff --git a/arch/x86/kernel/sys_x86_64.c b/arch/x86/kernel/sys_x86_64.c
> index 207b8f2582c7..593a31e93812 100644
> --- a/arch/x86/kernel/sys_x86_64.c
> +++ b/arch/x86/kernel/sys_x86_64.c
> @@ -21,6 +21,7 @@
>  #include
>  #include
>  #include
> +#include
>
>  /*
>   * Align a virtual address to avoid aliasing in the I$ on AMD F15h.
> @@ -132,6 +133,10 @@ arch_get_unmapped_area(struct file *filp, unsigned long addr,
>  	struct vm_unmapped_area_info info;
>  	unsigned long begin, end;
>
> +	addr = mpx_unmapped_area_check(addr, len, flags);
> +	if (IS_ERR_VALUE(addr))
> +		return addr;
> +
>  	if (flags & MAP_FIXED)
>  		return addr;
>
> @@ -151,7 +156,16 @@ arch_get_unmapped_area(struct file *filp, unsigned long addr,
>  	info.flags = 0;
>  	info.length = len;
>  	info.low_limit = begin;
> -	info.high_limit = end;
> +
> +	/*
> +	 * If hint address is above DEFAULT_MAP_WINDOW, look for unmapped area
> +	 * in the full address space.
> +	 */
> +	if (addr > DEFAULT_MAP_WINDOW)
> +		info.high_limit = min(end, TASK_SIZE);
> +	else
> +		info.high_limit = min(end, DEFAULT_MAP_WINDOW);
> +
>  	info.align_mask = 0;
>  	info.align_offset = pgoff << PAGE_SHIFT;
>  	if (filp) {
> @@ -171,6 +185,10 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
>  	unsigned long addr = addr0;
>  	struct vm_unmapped_area_info info;
>
> +	addr = mpx_unmapped_area_check(addr, len, flags);
> +	if (IS_ERR_VALUE(addr))
> +		return addr;
> +
>  	/* requested length too big for entire address space */
>  	if (len > TASK_SIZE)
>  		return -ENOMEM;
> @@ -195,6 +213,14 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
>  	info.length = len;
>  	info.low_limit = PAGE_SIZE;
>  	info.high_limit = get_mmap_base(0);
> +
> +	/*
> +	 * If hint address is above DEFAULT_MAP_WINDOW, look for unmapped area
> +	 * in the full address space.
> +	 */
> +	if (addr > DEFAULT_MAP_WINDOW && !in_compat_syscall())
> +		info.high_limit += TASK_SIZE - DEFAULT_MAP_WINDOW;

Hmm, TASK_SIZE now depends on TIF_ADDR32, which is set during exec().
That means that for an ia32/x32 ELF, which has TASK_SIZE < 4Gb because
TIF_ADDR32 is set but which can still make 64-bit syscalls, the
subtraction will come out negative..

> +
>  	info.align_mask = 0;
>  	info.align_offset = pgoff << PAGE_SHIFT;
>  	if (filp) {
> diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
> index 302f43fd9c28..9a0b89252c52 100644
> --- a/arch/x86/mm/hugetlbpage.c
> +++ b/arch/x86/mm/hugetlbpage.c
> @@ -18,6 +18,7 @@
>  #include
>  #include
>  #include
> +#include
>
>  #if 0	/* This is just for testing */
>  struct page *
> @@ -87,23 +88,38 @@ static unsigned long hugetlb_get_unmapped_area_bottomup(struct file *file,
>  	info.low_limit = get_mmap_base(1);
>  	info.high_limit = in_compat_syscall() ?
>  		tasksize_32bit() : tasksize_64bit();
> +
> +	/*
> +	 * If hint address is above DEFAULT_MAP_WINDOW, look for unmapped area
> +	 * in the full address space.
> +	 */
> +	if (addr > DEFAULT_MAP_WINDOW)
> +		info.high_limit = TASK_SIZE;
> +
>  	info.align_mask = PAGE_MASK & ~huge_page_mask(h);
>  	info.align_offset = 0;
>  	return vm_unmapped_area(&info);
>  }
>
>  static unsigned long hugetlb_get_unmapped_area_topdown(struct file *file,
> -		unsigned long addr0, unsigned long len,
> +		unsigned long addr, unsigned long len,
>  		unsigned long pgoff, unsigned long flags)
>  {
>  	struct hstate *h = hstate_file(file);
>  	struct vm_unmapped_area_info info;
> -	unsigned long addr;
>
>  	info.flags = VM_UNMAPPED_AREA_TOPDOWN;
>  	info.length = len;
>  	info.low_limit = PAGE_SIZE;
>  	info.high_limit = get_mmap_base(0);
> +
> +	/*
> +	 * If hint address is above DEFAULT_MAP_WINDOW, look for unmapped area
> +	 * in the full address space.
> +	 */
> +	if (addr > DEFAULT_MAP_WINDOW && !in_compat_syscall())
> +		info.high_limit += TASK_SIZE - DEFAULT_MAP_WINDOW;

ditto

> +
>  	info.align_mask = PAGE_MASK & ~huge_page_mask(h);
>  	info.align_offset = 0;
>  	addr = vm_unmapped_area(&info);
> @@ -118,7 +134,7 @@ static unsigned long hugetlb_get_unmapped_area_topdown(struct file *file,
>  		VM_BUG_ON(addr != -ENOMEM);
>  		info.flags = 0;
>  		info.low_limit = TASK_UNMAPPED_BASE;
> -		info.high_limit = TASK_SIZE;
> +		info.high_limit = DEFAULT_MAP_WINDOW;

ditto about 32-bits

>  		addr = vm_unmapped_area(&info);
>  	}
>
> @@ -135,6 +151,11 @@ hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
>
>  	if (len & ~huge_page_mask(h))
>  		return -EINVAL;
> +
> +	addr = mpx_unmapped_area_check(addr, len, flags);
> +	if (IS_ERR_VALUE(addr))
> +		return addr;
> +
>  	if (len > TASK_SIZE)
>  		return -ENOMEM;
>
> diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
> index 19ad095b41df..d63232a31945 100644
> --- a/arch/x86/mm/mmap.c
> +++ b/arch/x86/mm/mmap.c
> @@ -44,7 +44,7 @@ unsigned long tasksize_32bit(void)
>
>  unsigned long tasksize_64bit(void)
>  {
> -	return TASK_SIZE_MAX;
> +	return DEFAULT_MAP_WINDOW;
>  }
>
>  static unsigned long stack_maxrandom_size(unsigned long task_size)
> diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
> index cd44ae727df7..a26a1b373fd0 100644
> --- a/arch/x86/mm/mpx.c
> +++ b/arch/x86/mm/mpx.c
> @@ -355,10 +355,19 @@ int mpx_enable_management(void)
>  	 */
>  	bd_base = mpx_get_bounds_dir();
>  	down_write(&mm->mmap_sem);
> +
> +	/* MPX doesn't support addresses above 47-bits yet. */
> +	if (find_vma(mm, DEFAULT_MAP_WINDOW)) {
> +		pr_warn_once("%s (%d): MPX cannot handle addresses "
> +				"above 47-bits. Disabling.",
> +				current->comm, current->pid);
> +		ret = -ENXIO;
> +		goto out;
> +	}
>  	mm->context.bd_addr = bd_base;
>  	if (mm->context.bd_addr == MPX_INVALID_BOUNDS_DIR)
>  		ret = -ENXIO;
> -
> +out:
>  	up_write(&mm->mmap_sem);
>  	return ret;
>  }
> @@ -1038,3 +1047,25 @@ void mpx_notify_unmap(struct mm_struct *mm, struct vm_area_struct *vma,
>  	if (ret)
>  		force_sig(SIGSEGV, current);
>  }
> +
> +/* MPX cannot handle addresses above 47-bits yet. */
> +unsigned long mpx_unmapped_area_check(unsigned long addr, unsigned long len,
> +		unsigned long flags)
> +{
> +	if (!kernel_managing_mpx_tables(current->mm))
> +		return addr;
> +	if (addr + len <= DEFAULT_MAP_WINDOW)
> +		return addr;
> +	if (flags & MAP_FIXED)
> +		return -ENOMEM;
> +
> +	/*
> +	 * Requested len is larger than the whole area we're allowed to map in.
> +	 * Resetting the hinting address wouldn't do much good -- fail early.
> +	 */
> +	if (len > DEFAULT_MAP_WINDOW)
> +		return -ENOMEM;
> +
> +	/* Look for unmap area within DEFAULT_MAP_WINDOW */
> +	return 0;
> +}

--
Dmitry
It so, you could simplify all this code by changing only mm->mmap_base on the first over-47-bit mmap() call. This will do simple trick. > > This approach helps to easily make application's memory allocator aware > about large address space without manually tracking allocated virtual > address space. > > One important case we need to handle here is interaction with MPX. > MPX (without MAWA( extension cannot handle addresses above 47-bit, so we > need to make sure that MPX cannot be enabled we already have VMA above > the boundary and forbid creating such VMAs once MPX is enabled. > > Signed-off-by: Kirill A. Shutemov > Cc: Dmitry Safonov > --- > arch/x86/include/asm/elf.h | 2 +- > arch/x86/include/asm/mpx.h | 9 +++++++++ > arch/x86/include/asm/processor.h | 9 ++++++--- > arch/x86/kernel/sys_x86_64.c | 28 +++++++++++++++++++++++++++- > arch/x86/mm/hugetlbpage.c | 27 ++++++++++++++++++++++++--- > arch/x86/mm/mmap.c | 2 +- > arch/x86/mm/mpx.c | 33 ++++++++++++++++++++++++++++++++- > 7 files changed, 100 insertions(+), 10 deletions(-) > > diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h > index d4d3ed456cb7..67260dbe1688 100644 > --- a/arch/x86/include/asm/elf.h > +++ b/arch/x86/include/asm/elf.h > @@ -250,7 +250,7 @@ extern int force_personality32; > the loader. We need to make sure that it is out of the way of the program > that it will "exec", and that there is sufficient room for the brk. */ > > -#define ELF_ET_DYN_BASE (TASK_SIZE / 3 * 2) > +#define ELF_ET_DYN_BASE (DEFAULT_MAP_WINDOW / 3 * 2) This will kill 32-bit userspace: As DEFAULT_MAP_WINDOW is defined as what previously was TASK_SIZE_MAX, not TASK_SIZE, for ia32/x32 ELF_ET_DYN_BASE will be over 4Gb. 
Here is the test: [root@localhost test]# cat hello-world.c #include int main(int argc, char **argv) { printf("Maybe this world is another planet's hell.\n"); return 0; } [root@localhost test]# gcc -m32 hello-world.c -o hello-world [root@localhost test]# ./hello-world [ 35.306726] hello-world[1948]: segfault at ffa5288c ip 00000000f77b5a82 sp 00000000ffa52890 error 6 in ld-2.23.so[f77b5000+23000] Segmentation fault (core dumped) So, dynamic base should differ between 32/64-bits as it was with TASK_SIZE. > > /* This yields a mask that user programs can use to figure out what > instruction set this CPU supports. This could be done in user space, > diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h > index a0d662be4c5b..7d7404756bb4 100644 > --- a/arch/x86/include/asm/mpx.h > +++ b/arch/x86/include/asm/mpx.h > @@ -73,6 +73,9 @@ static inline void mpx_mm_init(struct mm_struct *mm) > } > void mpx_notify_unmap(struct mm_struct *mm, struct vm_area_struct *vma, > unsigned long start, unsigned long end); > + > +unsigned long mpx_unmapped_area_check(unsigned long addr, unsigned long len, > + unsigned long flags); > #else > static inline siginfo_t *mpx_generate_siginfo(struct pt_regs *regs) > { > @@ -94,6 +97,12 @@ static inline void mpx_notify_unmap(struct mm_struct *mm, > unsigned long start, unsigned long end) > { > } > + > +static inline unsigned long mpx_unmapped_area_check(unsigned long addr, > + unsigned long len, unsigned long flags) > +{ > + return addr; > +} > #endif /* CONFIG_X86_INTEL_MPX */ > > #endif /* _ASM_X86_MPX_H */ > diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h > index 3cada998a402..9f437aea7f57 100644 > --- a/arch/x86/include/asm/processor.h > +++ b/arch/x86/include/asm/processor.h > @@ -795,6 +795,7 @@ static inline void spin_lock_prefetch(const void *x) > #define IA32_PAGE_OFFSET PAGE_OFFSET > #define TASK_SIZE PAGE_OFFSET > #define TASK_SIZE_MAX TASK_SIZE > +#define DEFAULT_MAP_WINDOW TASK_SIZE > 
#define STACK_TOP TASK_SIZE > #define STACK_TOP_MAX STACK_TOP > > @@ -834,7 +835,9 @@ static inline void spin_lock_prefetch(const void *x) > * particular problem by preventing anything from being mapped > * at the maximum canonical address. > */ > -#define TASK_SIZE_MAX ((1UL << 47) - PAGE_SIZE) > +#define TASK_SIZE_MAX ((1UL << __VIRTUAL_MASK_SHIFT) - PAGE_SIZE) > + > +#define DEFAULT_MAP_WINDOW ((1UL << 47) - PAGE_SIZE) > > /* This decides where the kernel will search for a free chunk of vm > * space during mmap's. > @@ -847,7 +850,7 @@ static inline void spin_lock_prefetch(const void *x) > #define TASK_SIZE_OF(child) ((test_tsk_thread_flag(child, TIF_ADDR32)) ? \ > IA32_PAGE_OFFSET : TASK_SIZE_MAX) > > -#define STACK_TOP TASK_SIZE > +#define STACK_TOP DEFAULT_MAP_WINDOW > #define STACK_TOP_MAX TASK_SIZE_MAX > > #define INIT_THREAD { \ > @@ -870,7 +873,7 @@ extern void start_thread(struct pt_regs *regs, unsigned long new_ip, > * space during mmap's. > */ > #define __TASK_UNMAPPED_BASE(task_size) (PAGE_ALIGN(task_size / 3)) > -#define TASK_UNMAPPED_BASE __TASK_UNMAPPED_BASE(TASK_SIZE) > +#define TASK_UNMAPPED_BASE __TASK_UNMAPPED_BASE(DEFAULT_MAP_WINDOW) ditto > > #define KSTK_EIP(task) (task_pt_regs(task)->ip) > > diff --git a/arch/x86/kernel/sys_x86_64.c b/arch/x86/kernel/sys_x86_64.c > index 207b8f2582c7..593a31e93812 100644 > --- a/arch/x86/kernel/sys_x86_64.c > +++ b/arch/x86/kernel/sys_x86_64.c > @@ -21,6 +21,7 @@ > #include > #include > #include > +#include > > /* > * Align a virtual address to avoid aliasing in the I$ on AMD F15h. 
> @@ -132,6 +133,10 @@ arch_get_unmapped_area(struct file *filp, unsigned long addr, > struct vm_unmapped_area_info info; > unsigned long begin, end; > > + addr = mpx_unmapped_area_check(addr, len, flags); > + if (IS_ERR_VALUE(addr)) > + return addr; > + > if (flags & MAP_FIXED) > return addr; > > @@ -151,7 +156,16 @@ arch_get_unmapped_area(struct file *filp, unsigned long addr, > info.flags = 0; > info.length = len; > info.low_limit = begin; > - info.high_limit = end; > + > + /* > + * If hint address is above DEFAULT_MAP_WINDOW, look for unmapped area > + * in the full address space. > + */ > + if (addr > DEFAULT_MAP_WINDOW) > + info.high_limit = min(end, TASK_SIZE); > + else > + info.high_limit = min(end, DEFAULT_MAP_WINDOW); > + > info.align_mask = 0; > info.align_offset = pgoff << PAGE_SHIFT; > if (filp) { > @@ -171,6 +185,10 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0, > unsigned long addr = addr0; > struct vm_unmapped_area_info info; > > + addr = mpx_unmapped_area_check(addr, len, flags); > + if (IS_ERR_VALUE(addr)) > + return addr; > + > /* requested length too big for entire address space */ > if (len > TASK_SIZE) > return -ENOMEM; > @@ -195,6 +213,14 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0, > info.length = len; > info.low_limit = PAGE_SIZE; > info.high_limit = get_mmap_base(0); > + > + /* > + * If hint address is above DEFAULT_MAP_WINDOW, look for unmapped area > + * in the full address space. > + */ > + if (addr > DEFAULT_MAP_WINDOW && !in_compat_syscall()) > + info.high_limit += TASK_SIZE - DEFAULT_MAP_WINDOW; Hmm, TASK_SIZE depends now on TIF_ADDR32, which is set during exec(). That means for ia32/x32 ELF which has TASK_SIZE < 4Gb as TIF_ADDR32 is set, which can do 64-bit syscalls - the subtraction will be a negative.. 
> + > info.align_mask = 0; > info.align_offset = pgoff << PAGE_SHIFT; > if (filp) { > diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c > index 302f43fd9c28..9a0b89252c52 100644 > --- a/arch/x86/mm/hugetlbpage.c > +++ b/arch/x86/mm/hugetlbpage.c > @@ -18,6 +18,7 @@ > #include > #include > #include > +#include > > #if 0 /* This is just for testing */ > struct page * > @@ -87,23 +88,38 @@ static unsigned long hugetlb_get_unmapped_area_bottomup(struct file *file, > info.low_limit = get_mmap_base(1); > info.high_limit = in_compat_syscall() ? > tasksize_32bit() : tasksize_64bit(); > + > + /* > + * If hint address is above DEFAULT_MAP_WINDOW, look for unmapped area > + * in the full address space. > + */ > + if (addr > DEFAULT_MAP_WINDOW) > + info.high_limit = TASK_SIZE; > + > info.align_mask = PAGE_MASK & ~huge_page_mask(h); > info.align_offset = 0; > return vm_unmapped_area(&info); > } > > static unsigned long hugetlb_get_unmapped_area_topdown(struct file *file, > - unsigned long addr0, unsigned long len, > + unsigned long addr, unsigned long len, > unsigned long pgoff, unsigned long flags) > { > struct hstate *h = hstate_file(file); > struct vm_unmapped_area_info info; > - unsigned long addr; > > info.flags = VM_UNMAPPED_AREA_TOPDOWN; > info.length = len; > info.low_limit = PAGE_SIZE; > info.high_limit = get_mmap_base(0); > + > + /* > + * If hint address is above DEFAULT_MAP_WINDOW, look for unmapped area > + * in the full address space. 
> + */ > + if (addr > DEFAULT_MAP_WINDOW && !in_compat_syscall()) > + info.high_limit += TASK_SIZE - DEFAULT_MAP_WINDOW; ditto > + > info.align_mask = PAGE_MASK & ~huge_page_mask(h); > info.align_offset = 0; > addr = vm_unmapped_area(&info); > @@ -118,7 +134,7 @@ static unsigned long hugetlb_get_unmapped_area_topdown(struct file *file, > VM_BUG_ON(addr != -ENOMEM); > info.flags = 0; > info.low_limit = TASK_UNMAPPED_BASE; > - info.high_limit = TASK_SIZE; > + info.high_limit = DEFAULT_MAP_WINDOW; ditto about 32-bits > addr = vm_unmapped_area(&info); > } > > @@ -135,6 +151,11 @@ hugetlb_get_unmapped_area(struct file *file, unsigned long addr, > > if (len & ~huge_page_mask(h)) > return -EINVAL; > + > + addr = mpx_unmapped_area_check(addr, len, flags); > + if (IS_ERR_VALUE(addr)) > + return addr; > + > if (len > TASK_SIZE) > return -ENOMEM; > > diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c > index 19ad095b41df..d63232a31945 100644 > --- a/arch/x86/mm/mmap.c > +++ b/arch/x86/mm/mmap.c > @@ -44,7 +44,7 @@ unsigned long tasksize_32bit(void) > > unsigned long tasksize_64bit(void) > { > - return TASK_SIZE_MAX; > + return DEFAULT_MAP_WINDOW; > } > > static unsigned long stack_maxrandom_size(unsigned long task_size) > diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c > index cd44ae727df7..a26a1b373fd0 100644 > --- a/arch/x86/mm/mpx.c > +++ b/arch/x86/mm/mpx.c > @@ -355,10 +355,19 @@ int mpx_enable_management(void) > */ > bd_base = mpx_get_bounds_dir(); > down_write(&mm->mmap_sem); > + > + /* MPX doesn't support addresses above 47-bits yet. */ > + if (find_vma(mm, DEFAULT_MAP_WINDOW)) { > + pr_warn_once("%s (%d): MPX cannot handle addresses " > + "above 47-bits. 
Disabling.", > + current->comm, current->pid); > + ret = -ENXIO; > + goto out; > + } > mm->context.bd_addr = bd_base; > if (mm->context.bd_addr == MPX_INVALID_BOUNDS_DIR) > ret = -ENXIO; > - > +out: > up_write(&mm->mmap_sem); > return ret; > } > @@ -1038,3 +1047,25 @@ void mpx_notify_unmap(struct mm_struct *mm, struct vm_area_struct *vma, > if (ret) > force_sig(SIGSEGV, current); > } > + > +/* MPX cannot handle addresses above 47-bits yet. */ > +unsigned long mpx_unmapped_area_check(unsigned long addr, unsigned long len, > + unsigned long flags) > +{ > + if (!kernel_managing_mpx_tables(current->mm)) > + return addr; > + if (addr + len <= DEFAULT_MAP_WINDOW) > + return addr; > + if (flags & MAP_FIXED) > + return -ENOMEM; > + > + /* > + * Requested len is larger than whole area we're allowed to map in. > + * Resetting hinting address wouldn't do much good -- fail early. > + */ > + if (len > DEFAULT_MAP_WINDOW) > + return -ENOMEM; > + > + /* Look for unmap area within DEFAULT_MAP_WINDOW */ > + return 0; > +} > -- Dmitry -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org
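[Editor's note] The decision table of the proposed mpx_unmapped_area_check() above can be sketched as plain userspace C. This is an illustrative sketch, not the kernel code: it assumes MPX is enabled for the mm (the kernel_managing_mpx_tables() gate is folded out), hardcodes a 4 KiB PAGE_SIZE, and uses a stand-in MAP_FIXED_HINT flag value rather than the real MAP_FIXED definition.

```c
#include <errno.h>

/* Sketch of the patch's mpx_unmapped_area_check() decision table.
 * Assumptions: MPX is active, PAGE_SIZE is 4096, and MAP_FIXED_HINT
 * stands in for the kernel's MAP_FIXED flag. */
#define DEFAULT_MAP_WINDOW	((1UL << 47) - 4096)
#define MAP_FIXED_HINT		0x10

unsigned long mpx_area_check(unsigned long addr, unsigned long len,
			     unsigned long flags)
{
	/* Request fits below the 47-bit window: hint is usable as-is. */
	if (addr + len <= DEFAULT_MAP_WINDOW)
		return addr;
	/* Caller insists on a fixed address above the window: refuse. */
	if (flags & MAP_FIXED_HINT)
		return (unsigned long)-ENOMEM;
	/* Larger than the whole window: resetting the hint cannot help. */
	if (len > DEFAULT_MAP_WINDOW)
		return (unsigned long)-ENOMEM;
	/* Otherwise drop the hint and search within the window. */
	return 0;
}
```

Returning 0 is the "reset the hint" case: the caller then searches for free space within DEFAULT_MAP_WINDOW, which is why the arch_get_unmapped_area paths above run this check before honoring any high hint.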