From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [PATCH 8/8] x86/mm: Allow to have userspace mappings above 47-bits
From: Dmitry Safonov
To: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds, Andrew Morton, x86@kernel.org, Thomas Gleixner,
 Ingo Molnar, "H. Peter Anvin", Andi Kleen, Dave Hansen, Andy Lutomirski,
 linux-arch@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <20170406140106.78087-1-kirill.shutemov@linux.intel.com>
 <20170406140106.78087-9-kirill.shutemov@linux.intel.com>
 <3cb79f4b-76f5-6e31-6973-e9281b2e4553@virtuozzo.com>
Date: Thu, 6 Apr 2017 22:15:47 +0300
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0
MIME-Version: 1.0
In-Reply-To: <3cb79f4b-76f5-6e31-6973-e9281b2e4553@virtuozzo.com>
Content-Type: text/plain; charset="windows-1252"; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On 04/06/2017 09:43 PM, Dmitry Safonov wrote:
> Hi Kirill,
>
> On 04/06/2017 05:01 PM, Kirill A. Shutemov wrote:
>> On x86, 5-level paging enables a 56-bit userspace virtual address space.
>> Not all user space is ready to handle wide addresses. It's known that
>> at least some JIT compilers use the higher bits in pointers to encode
>> their own information. That collides with valid pointers under 5-level
>> paging and leads to crashes.
>>
>> To mitigate this, we are not going to allocate virtual address space
>> above 47-bit by default.
>>
>> But userspace can ask for an allocation from the full address space by
>> specifying a hint address (with or without MAP_FIXED) above 47-bit.
>>
>> If the hint address is above 47-bit but MAP_FIXED is not specified, we
>> first try to find an unmapped area at the specified address. If it is
>> already occupied, we look for an unmapped area in the *full* address
>> space, rather than only in the 47-bit window.
>
> Do you want the following mmap() calls to also return over-47-bit
> addresses (if there is free space) after the first over-47-bit mapping?
> If so, you could simplify all this code by changing only mm->mmap_base
> on the first over-47-bit mmap() call.
> That would do the trick.
>
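For reference, this is roughly how the opt-in looks from userspace under
the semantics described above (my own sketch, not code from the patch):

#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
	/* A hint above the 47-bit boundary opts this one allocation into
	 * the full address space; without a hint, mmap() stays below it. */
	void *hint = (void *)(1UL << 47);
	void *p = mmap(hint, 2UL << 20, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	printf("mapped at %p\n", p);
	return 0;
}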
>>
>> This approach makes it easy for an application's memory allocator to
>> become aware of the large address space without manually tracking
>> allocated virtual address space.
>>
>> One important case we need to handle here is interaction with MPX.
>> MPX (without the MAWA extension) cannot handle addresses above 47-bit,
>> so we need to make sure that MPX cannot be enabled if we already have a
>> VMA above the boundary, and forbid creating such VMAs once MPX is
>> enabled.
>>
>> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>> Cc: Dmitry Safonov
>> ---
>>  arch/x86/include/asm/elf.h       |  2 +-
>>  arch/x86/include/asm/mpx.h       |  9 +++++++++
>>  arch/x86/include/asm/processor.h |  9 ++++++---
>>  arch/x86/kernel/sys_x86_64.c     | 28 +++++++++++++++++++++++++++-
>>  arch/x86/mm/hugetlbpage.c        | 27 ++++++++++++++++++++++++---
>>  arch/x86/mm/mmap.c               |  2 +-
>>  arch/x86/mm/mpx.c                | 33 ++++++++++++++++++++++++++++++++-
>>  7 files changed, 100 insertions(+), 10 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h
>> index d4d3ed456cb7..67260dbe1688 100644
>> --- a/arch/x86/include/asm/elf.h
>> +++ b/arch/x86/include/asm/elf.h
>> @@ -250,7 +250,7 @@ extern int force_personality32;
>>     the loader.  We need to make sure that it is out of the way of the
>>     program that it will "exec", and that there is sufficient room
>>     for the brk.  */
>>
>> -#define ELF_ET_DYN_BASE		(TASK_SIZE / 3 * 2)
>> +#define ELF_ET_DYN_BASE		(DEFAULT_MAP_WINDOW / 3 * 2)
>
> This will kill 32-bit userspace:
> As DEFAULT_MAP_WINDOW is defined as what previously was TASK_SIZE_MAX,
> not TASK_SIZE, for ia32/x32 ELF_ET_DYN_BASE will be over 4Gb (the
> arithmetic is spelled out below).
>
> Here is the test:
> [root@localhost test]# cat hello-world.c
> #include <stdio.h>
>
> int main(int argc, char **argv)
> {
>         printf("Maybe this world is another planet's hell.\n");
>         return 0;
> }
> [root@localhost test]# gcc -m32 hello-world.c -o hello-world
> [root@localhost test]# ./hello-world
> [   35.306726] hello-world[1948]: segfault at ffa5288c ip
> 00000000f77b5a82 sp 00000000ffa52890 error 6 in ld-2.23.so[f77b5000+23000]
> Segmentation fault (core dumped)
>
> So, the dynamic base should differ between 32-bit and 64-bit, as it did
> with TASK_SIZE.

I just tried to define it like this:

-#define DEFAULT_MAP_WINDOW	((1UL << 47) - PAGE_SIZE)
+#define DEFAULT_MAP_WINDOW	(test_thread_flag(TIF_ADDR32) ?	\
+				IA32_PAGE_OFFSET : ((1UL << 47) - PAGE_SIZE))

And it seems to work better.
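For the record, the arithmetic behind the breakage (my numbers, taking
the 64-bit DEFAULT_MAP_WINDOW):

    ELF_ET_DYN_BASE = DEFAULT_MAP_WINDOW / 3 * 2
                    = ((1UL << 47) - 4096) / 3 * 2
                    = 0x555555554aaa	/* ~85 TiB */

A TIF_ADDR32 task can only map below IA32_PAGE_OFFSET (~3 GiB), so the
interpreter base is unreachable and ld.so faults exactly as in the test
above.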
>
>
>>
>>  /* This yields a mask that user programs can use to figure out what
>>     instruction set this CPU supports.  This could be done in user space,
>> diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h
>> index a0d662be4c5b..7d7404756bb4 100644
>> --- a/arch/x86/include/asm/mpx.h
>> +++ b/arch/x86/include/asm/mpx.h
>> @@ -73,6 +73,9 @@ static inline void mpx_mm_init(struct mm_struct *mm)
>>  }
>>  void mpx_notify_unmap(struct mm_struct *mm, struct vm_area_struct *vma,
>>  		      unsigned long start, unsigned long end);
>> +
>> +unsigned long mpx_unmapped_area_check(unsigned long addr, unsigned long len,
>> +		unsigned long flags);
>>  #else
>>  static inline siginfo_t *mpx_generate_siginfo(struct pt_regs *regs)
>>  {
>> @@ -94,6 +97,12 @@ static inline void mpx_notify_unmap(struct mm_struct *mm,
>>  				     unsigned long start, unsigned long end)
>>  {
>>  }
>> +
>> +static inline unsigned long mpx_unmapped_area_check(unsigned long addr,
>> +		unsigned long len, unsigned long flags)
>> +{
>> +	return addr;
>> +}
>>  #endif /* CONFIG_X86_INTEL_MPX */
>>
>>  #endif /* _ASM_X86_MPX_H */
>> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
>> index 3cada998a402..9f437aea7f57 100644
>> --- a/arch/x86/include/asm/processor.h
>> +++ b/arch/x86/include/asm/processor.h
>> @@ -795,6 +795,7 @@ static inline void spin_lock_prefetch(const void *x)
>>  #define IA32_PAGE_OFFSET	PAGE_OFFSET
>>  #define TASK_SIZE		PAGE_OFFSET
>>  #define TASK_SIZE_MAX		TASK_SIZE
>> +#define DEFAULT_MAP_WINDOW	TASK_SIZE
>>  #define STACK_TOP		TASK_SIZE
>>  #define STACK_TOP_MAX		STACK_TOP
>>
>> @@ -834,7 +835,9 @@ static inline void spin_lock_prefetch(const void *x)
>>   * particular problem by preventing anything from being mapped
>>   * at the maximum canonical address.
>>   */
>> -#define TASK_SIZE_MAX	((1UL << 47) - PAGE_SIZE)
>> +#define TASK_SIZE_MAX	((1UL << __VIRTUAL_MASK_SHIFT) - PAGE_SIZE)
>> +
>> +#define DEFAULT_MAP_WINDOW	((1UL << 47) - PAGE_SIZE)
>>
>>  /* This decides where the kernel will search for a free chunk of vm
>>   * space during mmap's.
>> @@ -847,7 +850,7 @@ static inline void spin_lock_prefetch(const void *x)
>>  #define TASK_SIZE_OF(child)	((test_tsk_thread_flag(child, TIF_ADDR32)) ? \
>>  					IA32_PAGE_OFFSET : TASK_SIZE_MAX)
>>
>> -#define STACK_TOP		TASK_SIZE
>> +#define STACK_TOP		DEFAULT_MAP_WINDOW
>>  #define STACK_TOP_MAX		TASK_SIZE_MAX
>>
>>  #define INIT_THREAD  { \
>> @@ -870,7 +873,7 @@ extern void start_thread(struct pt_regs *regs, unsigned long new_ip,
>>   * space during mmap's.
>>   */
>>  #define __TASK_UNMAPPED_BASE(task_size)	(PAGE_ALIGN(task_size / 3))
>> -#define TASK_UNMAPPED_BASE		__TASK_UNMAPPED_BASE(TASK_SIZE)
>> +#define TASK_UNMAPPED_BASE		__TASK_UNMAPPED_BASE(DEFAULT_MAP_WINDOW)
>
> ditto
>
>>
>>  #define KSTK_EIP(task)		(task_pt_regs(task)->ip)
>>
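To spell out the resulting 64-bit layout (my arithmetic, assuming
__VIRTUAL_MASK_SHIFT == 56 with 5-level paging):

    TASK_SIZE_MAX      = (1UL << 56) - PAGE_SIZE	/* ~64 PiB  */
    DEFAULT_MAP_WINDOW = (1UL << 47) - PAGE_SIZE	/* ~128 TiB */

So STACK_TOP and TASK_UNMAPPED_BASE stay inside the legacy window, while
STACK_TOP_MAX and explicit high hints can reach the full space.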
>> diff --git a/arch/x86/kernel/sys_x86_64.c b/arch/x86/kernel/sys_x86_64.c
>> index 207b8f2582c7..593a31e93812 100644
>> --- a/arch/x86/kernel/sys_x86_64.c
>> +++ b/arch/x86/kernel/sys_x86_64.c
>> @@ -21,6 +21,7 @@
>>  #include
>>  #include
>>  #include
>> +#include <asm/mpx.h>
>>
>>  /*
>>   * Align a virtual address to avoid aliasing in the I$ on AMD F15h.
>> @@ -132,6 +133,10 @@ arch_get_unmapped_area(struct file *filp, unsigned long addr,
>>  	struct vm_unmapped_area_info info;
>>  	unsigned long begin, end;
>>
>> +	addr = mpx_unmapped_area_check(addr, len, flags);
>> +	if (IS_ERR_VALUE(addr))
>> +		return addr;
>> +
>>  	if (flags & MAP_FIXED)
>>  		return addr;
>>
>> @@ -151,7 +156,16 @@ arch_get_unmapped_area(struct file *filp, unsigned long addr,
>>  	info.flags = 0;
>>  	info.length = len;
>>  	info.low_limit = begin;
>> -	info.high_limit = end;
>> +
>> +	/*
>> +	 * If hint address is above DEFAULT_MAP_WINDOW, look for unmapped area
>> +	 * in the full address space.
>> +	 */
>> +	if (addr > DEFAULT_MAP_WINDOW)
>> +		info.high_limit = min(end, TASK_SIZE);
>> +	else
>> +		info.high_limit = min(end, DEFAULT_MAP_WINDOW);
>> +
>>  	info.align_mask = 0;
>>  	info.align_offset = pgoff << PAGE_SHIFT;
>>  	if (filp) {
>> @@ -171,6 +185,10 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
>>  	unsigned long addr = addr0;
>>  	struct vm_unmapped_area_info info;
>>
>> +	addr = mpx_unmapped_area_check(addr, len, flags);
>> +	if (IS_ERR_VALUE(addr))
>> +		return addr;
>> +
>>  	/* requested length too big for entire address space */
>>  	if (len > TASK_SIZE)
>>  		return -ENOMEM;
>> @@ -195,6 +213,14 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
>>  	info.length = len;
>>  	info.low_limit = PAGE_SIZE;
>>  	info.high_limit = get_mmap_base(0);
>> +
>> +	/*
>> +	 * If hint address is above DEFAULT_MAP_WINDOW, look for unmapped area
>> +	 * in the full address space.
>> +	 */
>> +	if (addr > DEFAULT_MAP_WINDOW && !in_compat_syscall())
>> +		info.high_limit += TASK_SIZE - DEFAULT_MAP_WINDOW;
>
> Hmm, TASK_SIZE now depends on TIF_ADDR32, which is set during exec().
> That means an ia32/x32 ELF, which has TASK_SIZE < 4Gb because TIF_ADDR32
> is set, can still issue 64-bit syscalls -- and there the subtraction
> goes negative..
>
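To make the problem concrete (my numbers, for an ia32 task that enters
through the 64-bit syscall path, so in_compat_syscall() is false):

    TASK_SIZE          = IA32_PAGE_OFFSET		/* ~3 GiB   */
    DEFAULT_MAP_WINDOW = (1UL << 47) - PAGE_SIZE	/* ~128 TiB */

    info.high_limit += TASK_SIZE - DEFAULT_MAP_WINDOW;

The subtraction is done on unsigned long, so it does not really go
negative -- it wraps to a huge positive offset and pushes high_limit far
beyond anything the task may map.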
>> +
>>  	info.align_mask = 0;
>>  	info.align_offset = pgoff << PAGE_SHIFT;
>>  	if (filp) {
>> diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
>> index 302f43fd9c28..9a0b89252c52 100644
>> --- a/arch/x86/mm/hugetlbpage.c
>> +++ b/arch/x86/mm/hugetlbpage.c
>> @@ -18,6 +18,7 @@
>>  #include
>>  #include
>>  #include
>> +#include <asm/mpx.h>
>>
>>  #if 0	/* This is just for testing */
>>  struct page *
>> @@ -87,23 +88,38 @@ static unsigned long hugetlb_get_unmapped_area_bottomup(struct file *file,
>>  	info.low_limit = get_mmap_base(1);
>>  	info.high_limit = in_compat_syscall() ?
>>  		tasksize_32bit() : tasksize_64bit();
>> +
>> +	/*
>> +	 * If hint address is above DEFAULT_MAP_WINDOW, look for unmapped area
>> +	 * in the full address space.
>> +	 */
>> +	if (addr > DEFAULT_MAP_WINDOW)
>> +		info.high_limit = TASK_SIZE;
>> +
>>  	info.align_mask = PAGE_MASK & ~huge_page_mask(h);
>>  	info.align_offset = 0;
>>  	return vm_unmapped_area(&info);
>>  }
>>
>>  static unsigned long hugetlb_get_unmapped_area_topdown(struct file *file,
>> -		unsigned long addr0, unsigned long len,
>> +		unsigned long addr, unsigned long len,
>>  		unsigned long pgoff, unsigned long flags)
>>  {
>>  	struct hstate *h = hstate_file(file);
>>  	struct vm_unmapped_area_info info;
>> -	unsigned long addr;
>>
>>  	info.flags = VM_UNMAPPED_AREA_TOPDOWN;
>>  	info.length = len;
>>  	info.low_limit = PAGE_SIZE;
>>  	info.high_limit = get_mmap_base(0);
>> +
>> +	/*
>> +	 * If hint address is above DEFAULT_MAP_WINDOW, look for unmapped area
>> +	 * in the full address space.
>> +	 */
>> +	if (addr > DEFAULT_MAP_WINDOW && !in_compat_syscall())
>> +		info.high_limit += TASK_SIZE - DEFAULT_MAP_WINDOW;
>
> ditto
>
>> +
>>  	info.align_mask = PAGE_MASK & ~huge_page_mask(h);
>>  	info.align_offset = 0;
>>  	addr = vm_unmapped_area(&info);
>> @@ -118,7 +134,7 @@ static unsigned long hugetlb_get_unmapped_area_topdown(struct file *file,
>>  		VM_BUG_ON(addr != -ENOMEM);
>>  		info.flags = 0;
>>  		info.low_limit = TASK_UNMAPPED_BASE;
>> -		info.high_limit = TASK_SIZE;
>> +		info.high_limit = DEFAULT_MAP_WINDOW;
>
> ditto about 32-bits
>
>>  		addr = vm_unmapped_area(&info);
>>  	}
>>
>> @@ -135,6 +151,11 @@ hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
>>
>>  	if (len & ~huge_page_mask(h))
>>  		return -EINVAL;
>> +
>> +	addr = mpx_unmapped_area_check(addr, len, flags);
>> +	if (IS_ERR_VALUE(addr))
>> +		return addr;
>> +
>>  	if (len > TASK_SIZE)
>>  		return -ENOMEM;
>>
>> diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
>> index 19ad095b41df..d63232a31945 100644
>> --- a/arch/x86/mm/mmap.c
>> +++ b/arch/x86/mm/mmap.c
>> @@ -44,7 +44,7 @@ unsigned long tasksize_32bit(void)
>>
>>  unsigned long tasksize_64bit(void)
>>  {
>> -	return TASK_SIZE_MAX;
>> +	return DEFAULT_MAP_WINDOW;
>>  }
>>
>>  static unsigned long stack_maxrandom_size(unsigned long task_size)
>> diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
>> index cd44ae727df7..a26a1b373fd0 100644
>> --- a/arch/x86/mm/mpx.c
>> +++ b/arch/x86/mm/mpx.c
>> @@ -355,10 +355,19 @@ int mpx_enable_management(void)
>>  	 */
>>  	bd_base = mpx_get_bounds_dir();
>>  	down_write(&mm->mmap_sem);
>> +
>> +	/* MPX doesn't support addresses above 47-bits yet. */
>> +	if (find_vma(mm, DEFAULT_MAP_WINDOW)) {
>> +		pr_warn_once("%s (%d): MPX cannot handle addresses "
>> +				"above 47-bits. Disabling.",
>> +				current->comm, current->pid);
>> +		ret = -ENXIO;
>> +		goto out;
>> +	}
>>  	mm->context.bd_addr = bd_base;
>>  	if (mm->context.bd_addr == MPX_INVALID_BOUNDS_DIR)
>>  		ret = -ENXIO;
>> -
>> +out:
>>  	up_write(&mm->mmap_sem);
>>  	return ret;
>>  }
>> @@ -1038,3 +1047,25 @@ void mpx_notify_unmap(struct mm_struct *mm, struct vm_area_struct *vma,
>>  	if (ret)
>>  		force_sig(SIGSEGV, current);
>>  }
>> +
>> +/* MPX cannot handle addresses above 47-bits yet. */
>> +unsigned long mpx_unmapped_area_check(unsigned long addr, unsigned long len,
>> +		unsigned long flags)
>> +{
>> +	if (!kernel_managing_mpx_tables(current->mm))
>> +		return addr;
>> +	if (addr + len <= DEFAULT_MAP_WINDOW)
>> +		return addr;
>> +	if (flags & MAP_FIXED)
>> +		return -ENOMEM;
>> +
>> +	/*
>> +	 * Requested len is larger than the whole area we're allowed to map in.
>> +	 * Resetting the hint address wouldn't do much good -- fail early.
>> +	 */
>> +	if (len > DEFAULT_MAP_WINDOW)
>> +		return -ENOMEM;
>> +
>> +	/* Look for an unmapped area within DEFAULT_MAP_WINDOW */
>> +	return 0;
>> +}
>>
>

--
Dmitry