From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-3.8 required=3.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B0F84C43331 for ; Thu, 2 Apr 2020 04:09:20 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 538CB20747 for ; Thu, 2 Apr 2020 04:09:20 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=kernel.org header.i=@kernel.org header.b="XftRYf/S" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 538CB20747 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=linux-foundation.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id EECFF8E006A; Thu, 2 Apr 2020 00:09:19 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id E4E918E000D; Thu, 2 Apr 2020 00:09:19 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id CEFE08E006A; Thu, 2 Apr 2020 00:09:19 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0060.hostedemail.com [216.40.44.60]) by kanga.kvack.org (Postfix) with ESMTP id B18D68E000D for ; Thu, 2 Apr 2020 00:09:19 -0400 (EDT) Received: from smtpin22.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 70BBD181AC9CC for ; Thu, 2 Apr 2020 04:09:19 +0000 (UTC) X-FDA: 76661585238.22.mine39_515b21ce53761 X-HE-Tag: mine39_515b21ce53761 X-Filterd-Recvd-Size: 10994 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by imf50.hostedemail.com (Postfix) with ESMTP for ; Thu, 2 Apr 2020 04:09:18 +0000 (UTC) Received: from localhost.localdomain (c-73-231-172-41.hsd1.ca.comcast.net [73.231.172.41]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 872FC206E9; Thu, 2 Apr 2020 04:09:17 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1585800558; bh=kCRusNNvwq7qek/5UHa8uEkwrH+Tsg4lY3byjpmcODs=; h=Date:From:To:Subject:In-Reply-To:From; b=XftRYf/SjYWn3k4/vS2lDZ62wRVBaJAyJ0MBAaOgJ1Szd0rUcaW1aQzbzuoPo1mgA jotzNaJub+eQEpjh57JbO2txuEFVREaH/9QF/AQpr5mT64g2wcr8NvuWMv06msIWZt HM+QKQHyFh4qCEEy66pxkJ5ugDVytLuSA+ysh034= Date: Wed, 01 Apr 2020 21:09:17 -0700 From: Andrew Morton To: aarcange@redhat.com, akpm@linux-foundation.org, arnd@arndb.de, bgeffon@google.com, fweimer@redhat.com, joel@joelfernandes.org, jsbarnes@google.com, kirill.shutemov@linux.intel.com, linux-mm@kvack.org, lokeshgidra@google.com, luto@amacapital.net, minchan@kernel.org, mm-commits@vger.kernel.org, mst@redhat.com, natechancellor@gmail.com, sonnyrao@google.com, torvalds@linux-foundation.org, vbabka@suse.cz, will@kernel.org, yuzhao@google.com Subject: [patch 106/155] mm/mremap: add MREMAP_DONTUNMAP to mremap() Message-ID: <20200402040917.hq6rsr5i3%akpm@linux-foundation.org> In-Reply-To: <20200401210155.09e3b9742e1c6e732f5a7250@linux-foundation.org> User-Agent: s-nail v14.8.16 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Brian Geffon Subject: mm/mremap: add MREMAP_DONTUNMAP to mremap() When remapping an anonymous, private mapping, if MREMAP_DONTUNMAP is set, the source mapping will not be removed. The remap operation will be performed as it would have been normally by moving over the page tables to the new mapping. The old vma will have any locked flags cleared, have no pagetables, and any userfaultfds that were watching that range will continue watching it. For a mapping that is shared or not anonymous, MREMAP_DONTUNMAP will cause the mremap() call to fail. Because MREMAP_DONTUNMAP always results in moving a VMA you MUST use the MREMAP_MAYMOVE flag, it's not possible to resize a VMA while also moving with MREMAP_DONTUNMAP so old_len must always be equal to the new_len otherwise it will return -EINVAL. We hope to use this in Chrome OS where with userfaultfd we could write an anonymous mapping to disk without having to STOP the process or worry about VMA permission changes. This feature also has a use case in Android, Lokesh Gidra has said that "As part of using userfaultfd for GC, We'll have to move the physical pages of the java heap to a separate location. For this purpose mremap will be used. Without the MREMAP_DONTUNMAP flag, when I mremap the java heap, its virtual mapping will be removed as well. Therefore, we'll require performing mmap immediately after. This is not only time consuming but also opens a time window where a native thread may call mmap and reserve the java heap's address range for its own usage. This flag solves the problem." [bgeffon@google.com: v6] Link: http://lkml.kernel.org/r/20200218173221.237674-1-bgeffon@google.com [bgeffon@google.com: v7] Link: http://lkml.kernel.org/r/20200221174248.244748-1-bgeffon@google.com Link: http://lkml.kernel.org/r/20200207201856.46070-1-bgeffon@google.com Signed-off-by: Brian Geffon Tested-by: Lokesh Gidra Reviewed-by: Minchan Kim Acked-by: Kirill A. Shutemov Acked-by: Vlastimil Babka Cc: "Michael S . Tsirkin" Cc: Arnd Bergmann Cc: Andy Lutomirski Cc: Will Deacon Cc: Andrea Arcangeli Cc: Sonny Rao Cc: Minchan Kim Cc: Joel Fernandes Cc: Yu Zhao Cc: Jesse Barnes Cc: Nathan Chancellor Cc: Florian Weimer Signed-off-by: Andrew Morton --- include/uapi/linux/mman.h | 5 +- mm/mremap.c | 90 +++++++++++++++++++++++++++--------- 2 files changed, 72 insertions(+), 23 deletions(-) --- a/include/uapi/linux/mman.h~mm-add-mremap_dontunmap-to-mremap +++ a/include/uapi/linux/mman.h @@ -5,8 +5,9 @@ #include #include -#define MREMAP_MAYMOVE 1 -#define MREMAP_FIXED 2 +#define MREMAP_MAYMOVE 1 +#define MREMAP_FIXED 2 +#define MREMAP_DONTUNMAP 4 #define OVERCOMMIT_GUESS 0 #define OVERCOMMIT_ALWAYS 1 --- a/mm/mremap.c~mm-add-mremap_dontunmap-to-mremap +++ a/mm/mremap.c @@ -318,8 +318,8 @@ unsigned long move_page_tables(struct vm static unsigned long move_vma(struct vm_area_struct *vma, unsigned long old_addr, unsigned long old_len, unsigned long new_len, unsigned long new_addr, - bool *locked, struct vm_userfaultfd_ctx *uf, - struct list_head *uf_unmap) + bool *locked, unsigned long flags, + struct vm_userfaultfd_ctx *uf, struct list_head *uf_unmap) { struct mm_struct *mm = vma->vm_mm; struct vm_area_struct *new_vma; @@ -408,11 +408,32 @@ static unsigned long move_vma(struct vm_ if (unlikely(vma->vm_flags & VM_PFNMAP)) untrack_pfn_moved(vma); + if (unlikely(!err && (flags & MREMAP_DONTUNMAP))) { + if (vm_flags & VM_ACCOUNT) { + /* Always put back VM_ACCOUNT since we won't unmap */ + vma->vm_flags |= VM_ACCOUNT; + + vm_acct_memory(vma_pages(new_vma)); + } + + /* We always clear VM_LOCKED[ONFAULT] on the old vma */ + vma->vm_flags &= VM_LOCKED_CLEAR_MASK; + + /* Because we won't unmap we don't need to touch locked_vm */ + goto out; + } + if (do_munmap(mm, old_addr, old_len, uf_unmap) < 0) { /* OOM: unable to split vma, just get accounts right */ vm_unacct_memory(excess >> PAGE_SHIFT); excess = 0; } + + if (vm_flags & VM_LOCKED) { + mm->locked_vm += new_len >> PAGE_SHIFT; + *locked = true; + } +out: mm->hiwater_vm = hiwater_vm; /* Restore VM_ACCOUNT if one or two pieces of vma left */ @@ -422,16 +443,12 @@ static unsigned long move_vma(struct vm_ vma->vm_next->vm_flags |= VM_ACCOUNT; } - if (vm_flags & VM_LOCKED) { - mm->locked_vm += new_len >> PAGE_SHIFT; - *locked = true; - } - return new_addr; } static struct vm_area_struct *vma_to_resize(unsigned long addr, - unsigned long old_len, unsigned long new_len, unsigned long *p) + unsigned long old_len, unsigned long new_len, unsigned long flags, + unsigned long *p) { struct mm_struct *mm = current->mm; struct vm_area_struct *vma = find_vma(mm, addr); @@ -453,6 +470,10 @@ static struct vm_area_struct *vma_to_res return ERR_PTR(-EINVAL); } + if (flags & MREMAP_DONTUNMAP && (!vma_is_anonymous(vma) || + vma->vm_flags & VM_SHARED)) + return ERR_PTR(-EINVAL); + if (is_vm_hugetlb_page(vma)) return ERR_PTR(-EINVAL); @@ -497,7 +518,7 @@ static struct vm_area_struct *vma_to_res static unsigned long mremap_to(unsigned long addr, unsigned long old_len, unsigned long new_addr, unsigned long new_len, bool *locked, - struct vm_userfaultfd_ctx *uf, + unsigned long flags, struct vm_userfaultfd_ctx *uf, struct list_head *uf_unmap_early, struct list_head *uf_unmap) { @@ -505,7 +526,7 @@ static unsigned long mremap_to(unsigned struct vm_area_struct *vma; unsigned long ret = -EINVAL; unsigned long charged = 0; - unsigned long map_flags; + unsigned long map_flags = 0; if (offset_in_page(new_addr)) goto out; @@ -534,9 +555,11 @@ static unsigned long mremap_to(unsigned if ((mm->map_count + 2) >= sysctl_max_map_count - 3) return -ENOMEM; - ret = do_munmap(mm, new_addr, new_len, uf_unmap_early); - if (ret) - goto out; + if (flags & MREMAP_FIXED) { + ret = do_munmap(mm, new_addr, new_len, uf_unmap_early); + if (ret) + goto out; + } if (old_len >= new_len) { ret = do_munmap(mm, addr+new_len, old_len - new_len, uf_unmap); @@ -545,13 +568,22 @@ static unsigned long mremap_to(unsigned old_len = new_len; } - vma = vma_to_resize(addr, old_len, new_len, &charged); + vma = vma_to_resize(addr, old_len, new_len, flags, &charged); if (IS_ERR(vma)) { ret = PTR_ERR(vma); goto out; } - map_flags = MAP_FIXED; + /* MREMAP_DONTUNMAP expands by old_len since old_len == new_len */ + if (flags & MREMAP_DONTUNMAP && + !may_expand_vm(mm, vma->vm_flags, old_len >> PAGE_SHIFT)) { + ret = -ENOMEM; + goto out; + } + + if (flags & MREMAP_FIXED) + map_flags |= MAP_FIXED; + if (vma->vm_flags & VM_MAYSHARE) map_flags |= MAP_SHARED; @@ -561,10 +593,16 @@ static unsigned long mremap_to(unsigned if (IS_ERR_VALUE(ret)) goto out1; - ret = move_vma(vma, addr, old_len, new_len, new_addr, locked, uf, + /* We got a new mapping */ + if (!(flags & MREMAP_FIXED)) + new_addr = ret; + + ret = move_vma(vma, addr, old_len, new_len, new_addr, locked, flags, uf, uf_unmap); + if (!(offset_in_page(ret))) goto out; + out1: vm_unacct_memory(charged); @@ -618,12 +656,21 @@ SYSCALL_DEFINE5(mremap, unsigned long, a */ addr = untagged_addr(addr); - if (flags & ~(MREMAP_FIXED | MREMAP_MAYMOVE)) + if (flags & ~(MREMAP_FIXED | MREMAP_MAYMOVE | MREMAP_DONTUNMAP)) return ret; if (flags & MREMAP_FIXED && !(flags & MREMAP_MAYMOVE)) return ret; + /* + * MREMAP_DONTUNMAP is always a move and it does not allow resizing + * in the process. + */ + if (flags & MREMAP_DONTUNMAP && + (!(flags & MREMAP_MAYMOVE) || old_len != new_len)) + return ret; + + if (offset_in_page(addr)) return ret; @@ -641,9 +688,10 @@ SYSCALL_DEFINE5(mremap, unsigned long, a if (down_write_killable(¤t->mm->mmap_sem)) return -EINTR; - if (flags & MREMAP_FIXED) { + if (flags & (MREMAP_FIXED | MREMAP_DONTUNMAP)) { ret = mremap_to(addr, old_len, new_addr, new_len, - &locked, &uf, &uf_unmap_early, &uf_unmap); + &locked, flags, &uf, &uf_unmap_early, + &uf_unmap); goto out; } @@ -671,7 +719,7 @@ SYSCALL_DEFINE5(mremap, unsigned long, a /* * Ok, we need to grow.. */ - vma = vma_to_resize(addr, old_len, new_len, &charged); + vma = vma_to_resize(addr, old_len, new_len, flags, &charged); if (IS_ERR(vma)) { ret = PTR_ERR(vma); goto out; @@ -721,7 +769,7 @@ SYSCALL_DEFINE5(mremap, unsigned long, a } ret = move_vma(vma, addr, old_len, new_len, new_addr, - &locked, &uf, &uf_unmap); + &locked, flags, &uf, &uf_unmap); } out: if (offset_in_page(ret)) { _