From mboxrd@z Thu Jan 1 00:00:00 1970
From: Arnd Bergmann
To: Andy Lutomirski
Cc: "Kirill A. Shutemov", Linus Torvalds, Andrew Morton, X86 ML,
    Thomas Gleixner, Ingo Molnar, "H. Peter Anvin", Andi Kleen,
    Dave Hansen, linux-arch, "linux-mm@kvack.org",
    "linux-kernel@vger.kernel.org", Linux API,
    "linux-arm-kernel@lists.infradead.org", Catalin Marinas, Will Deacon
Subject: Re: [RFC, PATCHv2 29/29] mm, x86: introduce RLIMIT_VADDR
Date: Tue, 03 Jan 2017 14:18:01 +0100
Message-ID: <3492795.xaneWtGxgW@wuerfel>
References: <20161227015413.187403-1-kirill.shutemov@linux.intel.com>
    <2736959.3MfCab47fD@wuerfel>
X-Mailing-List: linux-kernel@vger.kernel.org

On Monday, January 2, 2017 10:08:28 PM CET Andy Lutomirski wrote:
> > This seems to nicely address the same problem on arm64, which has
> > run into the same issue due to the various page table formats
> > that can currently be chosen at compile time.
>
> On further reflection, I think this has very little to do with paging
> formats except insofar as paging formats make us notice the problem.
> The issue is that user code wants to be able to assume an upper limit
> on an address, and it gets an upper limit right now that depends on
> architecture due to paging formats.  But someone really might want to
> write a *portable* 64-bit program that allocates memory with the high
> 16 bits clear.  So let's add such a mechanism directly.
>
> As a thought experiment, what if x86_64 simply never allocated "high"
> (above 2^47-1) addresses unless a new mmap-with-explicit-limit syscall
> were used?  Old glibc would continue working.  Old VMs would work.
> New programs that want to use ginormous mappings would have to use the
> new syscall.  This would be totally stateless and would have no issues
> with CRIU.

I can see this working well for the 47-bit addressing default, but
what about applications that actually rely on 39-bit addressing
(I'd have to double-check, but I think this was the limit that
people were most interested in for arm64)?

39 bits seems a little small to make that the default for everyone
who doesn't pass the extra flag. Having to pass another flag to
limit the addresses introduces other problems (e.g. mmap from
library call that doesn't pass that flag).
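
For reference, the kind of user-space assumption under discussion could
look roughly like the sketch below: code that stores a tag in the high
16 bits of a 64-bit address and therefore needs every address it gets
to fit below a known limit. All helper names here (addr_fits, tag_addr,
untag_addr, addr_tag) are made up for illustration and do not come from
the patch series or from any existing library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if all address bits at or above `bits` are clear. */
static bool addr_fits(uint64_t addr, unsigned bits)
{
	assert(bits > 0 && bits < 64);
	return (addr >> bits) == 0;
}

/*
 * Hypothetical helper: store a 16-bit tag in bits 48..63.
 * Only valid when the address itself fits in 48 bits, which is
 * exactly the guarantee user code wants from the kernel.
 */
static uint64_t tag_addr(uint64_t addr, uint16_t tag)
{
	assert(addr_fits(addr, 48));
	return addr | ((uint64_t)tag << 48);
}

/* Strip the tag again and recover the original address. */
static uint64_t untag_addr(uint64_t tagged)
{
	return tagged & ((UINT64_C(1) << 48) - 1);
}

/* Extract the 16-bit tag. */
static uint16_t addr_tag(uint64_t tagged)
{
	return (uint16_t)(tagged >> 48);
}
```

With a 47-bit default, addr_fits(addr, 47) holds for every address an
old program sees. A program that relies on 39 bits would need
addr_fits(addr, 39) to hold for every mapping in the process, including
those created by libraries that know nothing about the limit.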

> If necessary, we could also have a prctl that changes a
> "personality-like" limit that is in effect when the old mmap was used.
> I say "personality-like" because it would reset under exactly the same
> conditions that personality resets itself.

For "personality-like", it would still have to interact with
the existing PER_LINUX32 and PER_LINUX32_3GB flags that do the
exact same thing, so actually using personality might be better.

We still have a few bits in the personality arguments, and
we could combine them with the existing ADDR_LIMIT_3GB
and ADDR_LIMIT_32BIT flags that are mutually exclusive by
definition, such as

	ADDR_LIMIT_32BIT =	0x0800000, /* existing */
	ADDR_LIMIT_3GB   =	0x8000000, /* existing */
	ADDR_LIMIT_39BIT =	0x0010000, /* next free bit */
	ADDR_LIMIT_42BIT =	0x8010000,
	ADDR_LIMIT_47BIT =	0x0810000,
	ADDR_LIMIT_48BIT =	0x8810000,

This would probably take only one or two personality bits for the
limits that are interesting in practice.

	Arnd
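
To make the encoding above concrete, here is a minimal user-space
sketch of how such combined flags might be decoded. It is not kernel
code. The four new ADDR_LIMIT_* values are the ones proposed in this
mail and do not exist in any kernel header; ADDR_LIMIT_MASK and the two
helper functions are assumptions added purely for illustration.

```c
#include <assert.h>

/* Values as listed in the mail; only the first two exist today. */
enum {
	ADDR_LIMIT_32BIT = 0x0800000,	/* existing */
	ADDR_LIMIT_3GB   = 0x8000000,	/* existing */
	ADDR_LIMIT_39BIT = 0x0010000,	/* proposed: next free bit */
	ADDR_LIMIT_42BIT = 0x8010000,	/* proposed: new bit | 3GB bit */
	ADDR_LIMIT_47BIT = 0x0810000,	/* proposed: new bit | 32BIT bit */
	ADDR_LIMIT_48BIT = 0x8810000,	/* proposed: all three bits */
	ADDR_LIMIT_MASK  = 0x8810000,	/* hypothetical: all limit bits */
};

/* The proposal reuses the existing bits, so check that at build time. */
_Static_assert(ADDR_LIMIT_42BIT == (ADDR_LIMIT_39BIT | ADDR_LIMIT_3GB),
	       "42BIT combines the new bit with the 3GB bit");
_Static_assert(ADDR_LIMIT_47BIT == (ADDR_LIMIT_39BIT | ADDR_LIMIT_32BIT),
	       "47BIT combines the new bit with the 32BIT bit");

/*
 * Hypothetical decoder: number of address bits for the proposed
 * encodings, or 0 if the personality carries none of them (no limit
 * bits set, or only a legacy 32BIT/3GB flag).
 */
static int proposed_addr_bits(unsigned long persona)
{
	switch (persona & ADDR_LIMIT_MASK) {
	case ADDR_LIMIT_39BIT: return 39;
	case ADDR_LIMIT_42BIT: return 42;
	case ADDR_LIMIT_47BIT: return 47;
	case ADDR_LIMIT_48BIT: return 48;
	default:               return 0;
	}
}

/* Exclusive upper bound for a limit of `bits` address bits. */
static unsigned long long addr_limit_end(int bits)
{
	assert(bits > 0 && bits < 64);
	return 1ULL << bits;
}
```

Because the decoder masks out everything except the three limit bits,
unrelated personality flags do not affect the result, and a legacy
PER_LINUX32_3GB value (0x0008 | ADDR_LIMIT_3GB) decodes to "no
new-style limit" rather than being mistaken for one of the new
encodings.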