Date: Mon, 2 Jan 2017 12:09:52 +0300
From: "Kirill A. Shutemov"
To: Andy Lutomirski
Cc: "Kirill A. Shutemov", Linus Torvalds, Andrew Morton, X86 ML,
    Thomas Gleixner, Ingo Molnar, Arnd Bergmann, "H. Peter Anvin",
    Andi Kleen, Dave Hansen, linux-arch, "linux-mm@kvack.org",
    "linux-kernel@vger.kernel.org", Linux API
Subject: Re: [RFC, PATCHv2 29/29] mm, x86: introduce RLIMIT_VADDR
Message-ID: <20170102090952.GB30735@node.shutemov.name>
References: <20161227015413.187403-1-kirill.shutemov@linux.intel.com>
    <20161227015413.187403-30-kirill.shutemov@linux.intel.com>
    <20161227022405.GA8780@node.shutemov.name>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Dec 26, 2016 at 07:22:03PM -0800, Andy Lutomirski wrote:
> On Mon, Dec 26, 2016 at 6:24 PM, Kirill A. Shutemov wrote:
> > On Mon, Dec 26, 2016 at 06:06:01PM -0800, Andy Lutomirski wrote:
> >> On Mon, Dec 26, 2016 at 5:54 PM, Kirill A. Shutemov wrote:
> >> > This patch introduces a new rlimit resource to manage the maximum
> >> > virtual address available for userspace to map.
> >> >
> >> > On x86, 5-level paging enables a 56-bit userspace virtual address space.
> >> > Not all userspace is ready to handle wide addresses. It's known that
> >> > at least some JIT compilers use high bits in pointers to encode their
> >> > information.
> >> > That encoding collides with valid pointers under 5-level paging and
> >> > leads to crashes.
> >> >
> >> > The patch aims to address this compatibility issue.
> >> >
> >> > MM would use min(RLIMIT_VADDR, TASK_SIZE) as the upper limit of the
> >> > virtual address available for userspace to map.
> >> >
> >> > The default hard limit will be RLIM_INFINITY, which basically means that
> >> > TASK_SIZE limits the available address space.
> >> >
> >> > The soft limit will also be RLIM_INFINITY everywhere except on machines
> >> > with 5-level paging enabled. In that case, the soft limit would be
> >> > (1UL << 47) - PAGE_SIZE. This is the current x86-64 TASK_SIZE_MAX with
> >> > 4-level paging, which is known to be safe.
> >> >
> >> > The new rlimit resource would follow the usual semantics with regard to
> >> > inheritance: preserved on fork(2) and exec(2). This has the potential to
> >> > break applications if the limits are set too wide or too narrow, but
> >> > this is not uncommon for other resources (consider RLIMIT_DATA or
> >> > RLIMIT_AS).
> >> >
> >> > As with other resources, you can set the limit lower than current usage.
> >> > It would affect only future virtual address space allocations.
> >> >
> >> > Use-cases for the new rlimit:
> >> >
> >> > - Bumping the soft limit to RLIM_INFINITY allows the current process and
> >> >   all its children to use addresses above 47 bits.
> >> >
> >> > - Bumping the soft limit to RLIM_INFINITY after fork(2), but before
> >> >   exec(2), allows the child to use addresses above 47 bits.
> >> >
> >> > - Lowering the hard limit to 47 bits would prevent the current process
> >> >   and all its children from using addresses above 47 bits, unless a
> >> >   process has CAP_SYS_RESOURCE.
> >> >
> >> > - It can also be handy to lower the hard or soft limit to an arbitrary
> >> >   address. User-mode emulation in QEMU may lower the limit to 32 bits
> >> >   to emulate a 32-bit machine on a 64-bit host.
> >>
> >> I tend to think that this should be a personality or an ELF flag, not
> >> an rlimit.
> >
> > My plan was to implement an ELF flag on top.
> > Basically, the ELF flag would mean
> > that we bump the soft limit to the hard limit on exec.
> >
> >> That way setuid works right.
> >
> > Um... I'm probably missing background here.
> >
>
> If a setuid program depends on the lower limit, then a malicious
> program shouldn't be able to cause it to run with the higher limit.
> The personality code should already get this case right because
> personalities are reset when setuid happens.

It would be nice to have more fine-grained control than a binary personality
flag gives. It would cover more use-cases.

Well, we could reset the limit on exec of a setuid binary too.
That's not ideal, but...

-- 
 Kirill A. Shutemov
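
For illustration only, the min(RLIMIT_VADDR, TASK_SIZE) rule and the
proposed default soft limit from the quoted patch description can be
sketched in C. This is not code from the patch. RLIMIT_VADDR exists only
in this RFC, so the sketch takes the limit value as a plain argument. The
sketch_* names, the 4 KiB page size, the 5-level TASK_SIZE value derived
from the "56-bit userspace" statement, and a 64-bit unsigned long are all
assumptions made for the example.

```c
#include <sys/resource.h>   /* rlim_t, RLIM_INFINITY */

/* Assumption: 4 KiB pages, as on x86-64. */
#define SKETCH_PAGE_SIZE 4096UL

/* x86-64 TASK_SIZE_MAX with 4-level paging, as quoted in the thread. */
#define SKETCH_TASK_SIZE_4LEVEL ((1UL << 47) - SKETCH_PAGE_SIZE)

/* Assumption: analogous value for 56-bit userspace with 5-level paging. */
#define SKETCH_TASK_SIZE_5LEVEL ((1UL << 56) - SKETCH_PAGE_SIZE)

/*
 * Hypothetical helper: highest address userspace may map, computed as
 * min(rlimit, task_size). RLIM_INFINITY means only task_size applies.
 */
static unsigned long sketch_vaddr_limit(rlim_t rlim_cur,
                                        unsigned long task_size)
{
    if (rlim_cur == RLIM_INFINITY || rlim_cur > task_size)
        return task_size;
    return (unsigned long)rlim_cur;
}
```

Under these assumptions, a 5-level machine with the default soft limit
gives sketch_vaddr_limit(SKETCH_TASK_SIZE_4LEVEL, SKETCH_TASK_SIZE_5LEVEL)
== 0x7ffffffff000, while raising the soft limit to RLIM_INFINITY returns
SKETCH_TASK_SIZE_5LEVEL.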