From: Sumit Semwal
To: Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Alexey Dobriyan, Jonathan Corbet
Cc: Mauro Carvalho Chehab, Kees Cook, Michal Hocko, Colin Cross,
	Alexey Gladkov, Matthew Wilcox, Jason Gunthorpe,
	"Kirill A . Shutemov", Michel Lespinasse, Michal Koutný, Song Liu,
	Huang Ying, Vlastimil Babka, Yang Shi, chenqiwu, Mathieu Desnoyers,
	John Hubbard, Thomas Hellstrom, Mike Christie, Bart Van Assche,
	Amit Pundir, Thomas Gleixner, Christian Brauner, Daniel Jordan,
	Adrian Reber, Nicolas Viennot, Al Viro, Thomas Cedeno,
	linux-fsdevel@vger.kernel.org, Pekka Enberg, Dave Hansen,
	Peter Zijlstra, Ingo Molnar, Oleg Nesterov, "Eric W. Biederman",
	Jan Glauber, John Stultz, Rob Landley, Cyrill Gorcunov,
	"Serge E. Hallyn", David Rientjes, Hugh Dickins, Rik van Riel,
	Mel Gorman, Tang Chen, Robin Holt, Shaohua Li, Sasha Levin,
	Johannes Weiner, Minchan Kim, Sumit Semwal
Subject: [PATCH v5 2/2] mm: add a field to store names for private anonymous memory
Date: Wed, 19 Aug 2020 19:46:50 +0530
Message-Id: <20200819141650.7462-3-sumit.semwal@linaro.org>
X-Mailer: git-send-email 2.28.0
In-Reply-To: <20200819141650.7462-1-sumit.semwal@linaro.org>
References: <20200819141650.7462-1-sumit.semwal@linaro.org>
MIME-Version: 1.0

From: Colin Cross

In many userspace applications, and especially in VM based applications
like Android uses heavily, there are multiple different allocators in use.
At a minimum there is libc malloc and the stack, and in many cases there
are libc malloc, the stack, direct syscalls to mmap anonymous memory, and
multiple VM heaps (one for small objects, one for big objects, etc.).
Each of these layers usually has its own tools to inspect its usage;
malloc by compiling a debug version, the VM through heap inspection tools,
and for direct syscalls there is usually no way to track them.

On Android we heavily use a set of tools that use an extended version of
the logic covered in Documentation/vm/pagemap.txt to walk all pages mapped
in userspace and slice their usage by process, shared (COW) vs. unique
mappings, backing, etc. This can account for real physical memory usage
even in cases like fork without exec (which Android uses heavily to share
as many private COW pages as possible between processes), Kernel SamePage
Merging, and clean zero pages.
It produces a measurement of the pages that only exist in that process
(USS, for unique), and a measurement of the physical memory usage of that
process with the cost of shared pages being evenly split between
processes that share them (PSS).

If all anonymous memory is indistinguishable then figuring out the real
physical memory usage (PSS) of each heap requires either a pagemap walking
tool that can understand the heap debugging of every layer, or for every
layer's heap debugging tools to implement the pagemap walking logic, in
which case it is hard to get a consistent view of memory across the whole
system.

Tracking the information in userspace leads to all sorts of problems.
It either needs to be stored inside the process, which means every
process has to have an API to export its current heap information upon
request, or it has to be stored externally in a filesystem that somebody
needs to clean up on crashes. It needs to be readable while the process
is still running, so it has to have some sort of synchronization with
every layer of userspace. Efficiently tracking the ranges requires
reimplementing something like the kernel vma trees, and linking to it
from every layer of userspace. It requires more memory, more syscalls,
more runtime cost, and more complexity to separately track regions that
the kernel is already tracking.

This patch adds a field to /proc/pid/maps and /proc/pid/smaps to show a
userspace-provided name for anonymous vmas. The names of named anonymous
vmas are shown in /proc/pid/maps and /proc/pid/smaps as [anon:<name>].

Userspace can set the name for a region of memory by calling
prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, start, len, (unsigned long)name);
Setting the name to NULL clears it.

The name is stored in a user pointer in the shared union in vm_area_struct
that points to a null terminated string inside the user process. vmas
that point to the same address and are otherwise mergeable will be merged,
but vmas that point to equivalent strings at different addresses will
not be merged.

The idea to store a userspace pointer to reduce the complexity within mm
(at the expense of the complexity of reading /proc/pid/mem) came from Dave
Hansen. This results in no runtime overhead in the mm subsystem other
than comparing the anon_name pointers when considering vma merging. The
pointer is stored in a union with fields that are only used on file-backed
mappings, so it does not increase memory usage.

(Upstream changed to remove the union, so this patch adds it back as well)

Signed-off-by: Colin Cross
Cc: Pekka Enberg
Cc: Dave Hansen
Cc: Peter Zijlstra
Cc: Ingo Molnar
Cc: Oleg Nesterov
Cc: "Eric W. Biederman"
Cc: Jan Glauber
Cc: John Stultz
Cc: Rob Landley
Cc: Cyrill Gorcunov
Cc: Kees Cook
Cc: "Serge E. Hallyn"
Cc: David Rientjes
Cc: Al Viro
Cc: Hugh Dickins
Cc: Rik van Riel
Cc: Mel Gorman
Cc: Michel Lespinasse
Cc: Tang Chen
Cc: Robin Holt
Cc: Shaohua Li
Cc: Sasha Levin
Cc: Johannes Weiner
Cc: Minchan Kim
Signed-off-by: Andrew Morton
Signed-off-by: Sumit Semwal
---
 Documentation/filesystems/proc.rst |  2 ++
 fs/proc/task_mmu.c                 | 24 ++++++++++++-
 include/linux/mm.h                 |  5 ++-
 include/linux/mm_types.h           | 23 +++++++++++--
 include/uapi/linux/prctl.h         |  3 ++
 kernel/sys.c                       | 32 ++++++++++++++++++
 mm/interval_tree.c                 | 34 +++++++++----------
 mm/madvise.c                       | 54 +++++++++++++++++++++++++++---
 mm/mempolicy.c                     |  3 +-
 mm/mlock.c                         |  2 +-
 mm/mmap.c                          | 38 ++++++++++++---------
 mm/mprotect.c                      |  2 +-
 12 files changed, 177 insertions(+), 45 deletions(-)

diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
index 533c79e8d2cd..41a9cea73b8b 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -429,6 +429,8 @@ is not associated with a file:
 [stack]                    the stack of the main process
 [vdso]                     the "virtual dynamic shared object",
                            the kernel system call handler
+[anon:<name>]              an anonymous mapping that has been
+                           named by userspace
 =======                    ====================================
 
 or if empty, the mapping is anonymous.
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 5066b0251ed8..136fd3c3ad7b 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -97,6 +97,21 @@ unsigned long task_statm(struct mm_struct *mm,
 	return mm->total_vm;
 }
 
+static void seq_print_vma_name(struct seq_file *m, struct vm_area_struct *vma)
+{
+	struct mm_struct *mm = vma->vm_mm;
+	char anon_name[NAME_MAX + 1];
+	int n;
+
+	n = access_remote_vm(mm, (unsigned long)vma_anon_name(vma),
+			     anon_name, NAME_MAX, 0);
+	if (n > 0) {
+		seq_puts(m, "[anon:");
+		seq_write(m, anon_name, strnlen(anon_name, n));
+		seq_putc(m, ']');
+	}
+}
+
 #ifdef CONFIG_NUMA
 /*
  * Save get_task_policy() for show_numa_map().
@@ -319,8 +334,15 @@ show_map_vma(struct seq_file *m, struct vm_area_struct *vma)
 			goto done;
 		}
 
-		if (is_stack(vma))
+		if (is_stack(vma)) {
 			name = "[stack]";
+			goto done;
+		}
+
+		if (vma_anon_name(vma)) {
+			seq_pad(m, ' ');
+			seq_print_vma_name(m, vma);
+		}
 	}
 
 done:
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 1983e08f5906..c64171529254 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2484,7 +2484,7 @@ static inline int vma_adjust(struct vm_area_struct *vma, unsigned long start,
 extern struct vm_area_struct *vma_merge(struct mm_struct *,
 	struct vm_area_struct *prev, unsigned long addr, unsigned long end,
 	unsigned long vm_flags, struct anon_vma *, struct file *, pgoff_t,
-	struct mempolicy *, struct vm_userfaultfd_ctx);
+	struct mempolicy *, struct vm_userfaultfd_ctx, const char __user *);
 extern struct anon_vma *find_mergeable_anon_vma(struct vm_area_struct *);
 extern int __split_vma(struct mm_struct *, struct vm_area_struct *,
 	unsigned long addr, int new_below);
@@ -3123,5 +3123,8 @@ unsigned long wp_shared_mapping_range(struct address_space *mapping,
 
 extern int sysctl_nr_trim_pages;
 
+int madvise_set_anon_name(unsigned long start, unsigned long len_in,
+			  unsigned long name_addr);
+
 #endif /* __KERNEL__ */
 #endif /* _LINUX_MM_H */
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 496c3ff97cce..ac8d687ebfb5 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -336,10 +336,18 @@ struct vm_area_struct {
 	/*
 	 * For areas with an address space and backing store,
 	 * linkage into the address_space->i_mmap interval tree.
+	 *
+	 * For private anonymous mappings, a pointer to a null terminated string
+	 * in the user process containing the name given to the vma, or NULL
+	 * if unnamed.
 	 */
-	struct {
-		struct rb_node rb;
-		unsigned long rb_subtree_last;
+
+	union {
+		struct {
+			struct rb_node rb;
+			unsigned long rb_subtree_last;
+		} interval;
+		const char __user *anon_name;
 	} shared;
 
 	/*
@@ -772,4 +780,13 @@ typedef struct {
 	unsigned long val;
 } swp_entry_t;
 
+/* Return the name for an anonymous mapping or NULL for a file-backed mapping */
+static inline const char __user *vma_anon_name(struct vm_area_struct *vma)
+{
+	if (vma->vm_file)
+		return NULL;
+
+	return vma->shared.anon_name;
+}
+
 #endif /* _LINUX_MM_TYPES_H */
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index 07b4f8131e36..10773270f67b 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -238,4 +238,7 @@ struct prctl_mm_map {
 #define PR_SET_IO_FLUSHER		57
 #define PR_GET_IO_FLUSHER		58
 
+#define PR_SET_VMA		0x53564d41
+# define PR_SET_VMA_ANON_NAME		0
+
 #endif /* _LINUX_PRCTL_H */
diff --git a/kernel/sys.c b/kernel/sys.c
index ca11af9d815d..da90837b5ccd 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2280,6 +2280,35 @@ int __weak arch_prctl_spec_ctrl_set(struct task_struct *t, unsigned long which,
 
 #define PR_IO_FLUSHER (PF_MEMALLOC_NOIO | PF_LOCAL_THROTTLE)
 
+#ifdef CONFIG_MMU
+static int prctl_set_vma(unsigned long opt, unsigned long addr,
+			 unsigned long len, unsigned long arg)
+{
+	struct mm_struct *mm = current->mm;
+	int error;
+
+	mmap_write_lock(mm);
+
+	switch (opt) {
+	case PR_SET_VMA_ANON_NAME:
+		error = madvise_set_anon_name(addr, len, arg);
+		break;
+	default:
+		error = -EINVAL;
+	}
+
+	mmap_write_unlock(mm);
+
+	return error;
+}
+#else /* CONFIG_MMU */
+static int prctl_set_vma(unsigned long opt, unsigned long start,
+			 unsigned long len_in, unsigned long arg)
+{
+	return -EINVAL;
+}
+#endif
+
 SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 		unsigned long, arg4, unsigned long, arg5)
 {
@@ -2530,6 +2559,9 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 
 		error = (current->flags & PR_IO_FLUSHER) == PR_IO_FLUSHER;
 		break;
+	case PR_SET_VMA:
+		error = prctl_set_vma(arg2, arg3, arg4, arg5);
+		break;
 	default:
 		error = -EINVAL;
 		break;
diff --git a/mm/interval_tree.c b/mm/interval_tree.c
index 11c75fb07584..d684ce0762cd 100644
--- a/mm/interval_tree.c
+++ b/mm/interval_tree.c
@@ -20,8 +20,8 @@ static inline unsigned long vma_last_pgoff(struct vm_area_struct *v)
 	return v->vm_pgoff + vma_pages(v) - 1;
 }
 
-INTERVAL_TREE_DEFINE(struct vm_area_struct, shared.rb,
-		     unsigned long, shared.rb_subtree_last,
+INTERVAL_TREE_DEFINE(struct vm_area_struct, shared.interval.rb,
+		     unsigned long, shared.interval.rb_subtree_last,
 		     vma_start_pgoff, vma_last_pgoff,, vma_interval_tree)
 
 /* Insert node immediately after prev in the interval tree */
@@ -35,26 +35,26 @@ void vma_interval_tree_insert_after(struct vm_area_struct *node,
 
 	VM_BUG_ON_VMA(vma_start_pgoff(node) != vma_start_pgoff(prev), node);
 
-	if (!prev->shared.rb.rb_right) {
+	if (!prev->shared.interval.rb.rb_right) {
 		parent = prev;
-		link = &prev->shared.rb.rb_right;
+		link = &prev->shared.interval.rb.rb_right;
 	} else {
-		parent = rb_entry(prev->shared.rb.rb_right,
-				  struct vm_area_struct, shared.rb);
-		if (parent->shared.rb_subtree_last < last)
-			parent->shared.rb_subtree_last = last;
-		while (parent->shared.rb.rb_left) {
-			parent = rb_entry(parent->shared.rb.rb_left,
-				struct vm_area_struct, shared.rb);
-			if (parent->shared.rb_subtree_last < last)
-				parent->shared.rb_subtree_last = last;
+		parent = rb_entry(prev->shared.interval.rb.rb_right,
+				  struct vm_area_struct, shared.interval.rb);
+		if (parent->shared.interval.rb_subtree_last < last)
+			parent->shared.interval.rb_subtree_last = last;
+		while (parent->shared.interval.rb.rb_left) {
+			parent = rb_entry(parent->shared.interval.rb.rb_left,
+				struct vm_area_struct, shared.interval.rb);
+			if (parent->shared.interval.rb_subtree_last < last)
+				parent->shared.interval.rb_subtree_last = last;
 		}
-		link = &parent->shared.rb.rb_left;
+		link = &parent->shared.interval.rb.rb_left;
 	}
 
-	node->shared.rb_subtree_last = last;
-	rb_link_node(&node->shared.rb, &parent->shared.rb, link);
-	rb_insert_augmented(&node->shared.rb, &root->rb_root,
+	node->shared.interval.rb_subtree_last = last;
+	rb_link_node(&node->shared.interval.rb, &parent->shared.interval.rb, link);
+	rb_insert_augmented(&node->shared.interval.rb, &root->rb_root,
 			    &vma_interval_tree_augment);
 }
 
diff --git a/mm/madvise.c b/mm/madvise.c
index 84482c21b029..7da8493fa6d3 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -65,13 +65,14 @@ static int madvise_need_mmap_write(int behavior)
  */
 static int madvise_update_vma(struct vm_area_struct *vma,
 		     struct vm_area_struct **prev, unsigned long start,
-		     unsigned long end, unsigned long new_flags)
+		     unsigned long end, unsigned long new_flags,
+		     const char __user *new_anon_name)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	int error;
 	pgoff_t pgoff;
 
-	if (new_flags == vma->vm_flags) {
+	if (new_flags == vma->vm_flags && new_anon_name == vma_anon_name(vma)) {
 		*prev = vma;
 		return 0;
 	}
@@ -79,7 +80,7 @@ static int madvise_update_vma(struct vm_area_struct *vma,
 	pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
 	*prev = vma_merge(mm, *prev, start, end, new_flags, vma->anon_vma,
 			  vma->vm_file, pgoff, vma_policy(vma),
-			  vma->vm_userfaultfd_ctx);
+			  vma->vm_userfaultfd_ctx, new_anon_name);
 	if (*prev) {
 		vma = *prev;
 		goto success;
@@ -112,10 +113,30 @@ static int madvise_update_vma(struct vm_area_struct *vma,
 	 * vm_flags is protected by the mmap_lock held in write mode.
 	 */
 	vma->vm_flags = new_flags;
+	if (!vma->vm_file)
+		vma->shared.anon_name = new_anon_name;
 
 	return 0;
 }
 
+static int madvise_vma_anon_name(struct vm_area_struct *vma,
+				 struct vm_area_struct **prev,
+				 unsigned long start, unsigned long end,
+				 unsigned long name_addr)
+{
+	int error;
+
+	/* Only anonymous mappings can be named */
+	if (vma->vm_file)
+		return -EINVAL;
+
+	error = madvise_update_vma(vma, prev, start, end, vma->vm_flags,
+				   (const char __user *)name_addr);
+	if (error == -ENOMEM)
+		error = -EAGAIN;
+	return error;
+}
+
 #ifdef CONFIG_SWAP
 static int swapin_walk_pmd_entry(pmd_t *pmd, unsigned long start,
 	unsigned long end, struct mm_walk *walk)
@@ -877,7 +898,8 @@ static int madvise_vma_behavior(struct vm_area_struct *vma,
 		break;
 	}
 
-	error = madvise_update_vma(vma, prev, start, end, new_flags);
+	error = madvise_update_vma(vma, prev, start, end, new_flags,
+				   vma_anon_name(vma));
 
 out:
 	if (error == -ENOMEM)
@@ -1059,6 +1081,30 @@ int madvise_walk_vmas(unsigned long start, unsigned long end,
 	return unmapped_error;
 }
 
+int madvise_set_anon_name(unsigned long start, unsigned long len_in,
+			  unsigned long name_addr)
+{
+	unsigned long end;
+	unsigned long len;
+
+	if (start & ~PAGE_MASK)
+		return -EINVAL;
+	len = (len_in + ~PAGE_MASK) & PAGE_MASK;
+
+	/* Check to see whether len was rounded up from small -ve to zero */
+	if (len_in && !len)
+		return -EINVAL;
+
+	end = start + len;
+	if (end < start)
+		return -EINVAL;
+
+	if (end == start)
+		return 0;
+
+	return madvise_walk_vmas(start, end, name_addr, madvise_vma_anon_name);
+}
+
 /*
  * The madvise(2) system call.
  *
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index eddbe4e56c73..94338d9bfe57 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -829,7 +829,8 @@ static int mbind_range(struct mm_struct *mm, unsigned long start,
 			((vmstart - vma->vm_start) >> PAGE_SHIFT);
 		prev = vma_merge(mm, prev, vmstart, vmend, vma->vm_flags,
 				 vma->anon_vma, vma->vm_file, pgoff,
-				 new_pol, vma->vm_userfaultfd_ctx);
+				 new_pol, vma->vm_userfaultfd_ctx,
+				 vma_anon_name(vma));
 		if (prev) {
 			vma = prev;
 			next = vma->vm_next;
diff --git a/mm/mlock.c b/mm/mlock.c
index 93ca2bf30b4f..8e0046c4642f 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -534,7 +534,7 @@ static int mlock_fixup(struct vm_area_struct *vma, struct vm_area_struct **prev,
 	pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
 	*prev = vma_merge(mm, *prev, start, end, newflags, vma->anon_vma,
 			  vma->vm_file, pgoff, vma_policy(vma),
-			  vma->vm_userfaultfd_ctx);
+			  vma->vm_userfaultfd_ctx, vma_anon_name(vma));
 	if (*prev) {
 		vma = *prev;
 		goto success;
diff --git a/mm/mmap.c b/mm/mmap.c
index 40248d84ad5f..8f3cd352a48f 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -987,7 +987,8 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
  */
 static inline int is_mergeable_vma(struct vm_area_struct *vma,
 				struct file *file, unsigned long vm_flags,
-				struct vm_userfaultfd_ctx vm_userfaultfd_ctx)
+				struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
+				const char __user *anon_name)
 {
 	/*
 	 * VM_SOFTDIRTY should not prevent from VMA merging, if we
@@ -1005,6 +1006,8 @@ static inline int is_mergeable_vma(struct vm_area_struct *vma,
 		return 0;
 	if (!is_mergeable_vm_userfaultfd_ctx(vma, vm_userfaultfd_ctx))
 		return 0;
+	if (vma_anon_name(vma) != anon_name)
+		return 0;
 	return 1;
 }
 
@@ -1037,9 +1040,10 @@ static int
 can_vma_merge_before(struct vm_area_struct *vma, unsigned long vm_flags,
 		     struct anon_vma *anon_vma, struct file *file,
 		     pgoff_t vm_pgoff,
-		     struct vm_userfaultfd_ctx vm_userfaultfd_ctx)
+		     struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
+		     const char __user *anon_name)
 {
-	if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx) &&
+	if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx, anon_name) &&
 	    is_mergeable_anon_vma(anon_vma, vma->anon_vma, vma)) {
 		if (vma->vm_pgoff == vm_pgoff)
 			return 1;
@@ -1058,9 +1062,10 @@ static int
 can_vma_merge_after(struct vm_area_struct *vma, unsigned long vm_flags,
 		    struct anon_vma *anon_vma, struct file *file,
 		    pgoff_t vm_pgoff,
-		    struct vm_userfaultfd_ctx vm_userfaultfd_ctx)
+		    struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
+		    const char __user *anon_name)
 {
-	if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx) &&
+	if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx, anon_name) &&
 	    is_mergeable_anon_vma(anon_vma, vma->anon_vma, vma)) {
 		pgoff_t vm_pglen;
 		vm_pglen = vma_pages(vma);
@@ -1071,9 +1076,9 @@ can_vma_merge_after(struct vm_area_struct *vma, unsigned long vm_flags,
 }
 
 /*
- * Given a mapping request (addr,end,vm_flags,file,pgoff), figure out
- * whether that can be merged with its predecessor or its successor.
- * Or both (it neatly fills a hole).
+ * Given a mapping request (addr,end,vm_flags,file,pgoff,anon_name),
+ * figure out whether that can be merged with its predecessor or its
+ * successor. Or both (it neatly fills a hole).
  *
  * In most cases - when called for mmap, brk or mremap - [addr,end) is
  * certain not to be mapped by the time vma_merge is called; but when
@@ -1118,7 +1123,8 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
 			unsigned long end, unsigned long vm_flags,
 			struct anon_vma *anon_vma, struct file *file,
 			pgoff_t pgoff, struct mempolicy *policy,
-			struct vm_userfaultfd_ctx vm_userfaultfd_ctx)
+			struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
+			const char __user *anon_name)
 {
 	pgoff_t pglen = (end - addr) >> PAGE_SHIFT;
 	struct vm_area_struct *area, *next;
@@ -1151,7 +1157,7 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
 			mpol_equal(vma_policy(prev), policy) &&
 			can_vma_merge_after(prev, vm_flags,
 					    anon_vma, file, pgoff,
-					    vm_userfaultfd_ctx)) {
+					    vm_userfaultfd_ctx, anon_name)) {
 		/*
 		 * OK, it can. Can we now merge in the successor as well?
 		 */
@@ -1160,7 +1166,7 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
 				can_vma_merge_before(next, vm_flags,
 						     anon_vma, file,
 						     pgoff+pglen,
-						     vm_userfaultfd_ctx) &&
+						     vm_userfaultfd_ctx, anon_name) &&
 				is_mergeable_anon_vma(prev->anon_vma,
 						      next->anon_vma, NULL)) {
 							/* cases 1, 6 */
@@ -1183,7 +1189,7 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
 			mpol_equal(policy, vma_policy(next)) &&
 			can_vma_merge_before(next, vm_flags,
 					     anon_vma, file, pgoff+pglen,
-					     vm_userfaultfd_ctx)) {
+					     vm_userfaultfd_ctx, anon_name)) {
 		if (prev && addr < prev->vm_end)	/* case 4 */
 			err = __vma_adjust(prev, prev->vm_start,
 					 addr, prev->vm_pgoff, NULL, next);
@@ -1731,7 +1737,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
 	 * Can we just expand an old mapping?
 	 */
 	vma = vma_merge(mm, prev, addr, addr + len, vm_flags,
-			NULL, file, pgoff, NULL, NULL_VM_UFFD_CTX);
+			NULL, file, pgoff, NULL, NULL_VM_UFFD_CTX, NULL);
 	if (vma)
 		goto out;
 
@@ -1779,7 +1785,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
 		 */
 		if (unlikely(vm_flags != vma->vm_flags && prev)) {
 			merge = vma_merge(mm, prev, vma->vm_start, vma->vm_end, vma->vm_flags,
-				NULL, vma->vm_file, vma->vm_pgoff, NULL, NULL_VM_UFFD_CTX);
+				NULL, vma->vm_file, vma->vm_pgoff, NULL, NULL_VM_UFFD_CTX, NULL);
 			if (merge) {
 				fput(file);
 				vm_area_free(vma);
@@ -3063,7 +3069,7 @@ static int do_brk_flags(unsigned long addr, unsigned long len, unsigned long fla
 
 	/* Can we just expand an old private anonymous mapping? */
 	vma = vma_merge(mm, prev, addr, addr + len, flags,
-			NULL, NULL, pgoff, NULL, NULL_VM_UFFD_CTX);
+			NULL, NULL, pgoff, NULL, NULL_VM_UFFD_CTX, NULL);
 	if (vma)
 		goto out;
 
@@ -3262,7 +3268,7 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap,
 		return NULL;	/* should never get here */
 	new_vma = vma_merge(mm, prev, addr, addr + len, vma->vm_flags,
 			    vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma),
-			    vma->vm_userfaultfd_ctx);
+			    vma->vm_userfaultfd_ctx, vma_anon_name(vma));
 	if (new_vma) {
 		/*
 		 * Source vma may have been merged into new_vma
diff --git a/mm/mprotect.c b/mm/mprotect.c
index ce8b8a5eacbb..d90c349a3fd9 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -454,7 +454,7 @@ mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev,
 	pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
 	*pprev = vma_merge(mm, *pprev, start, end, newflags,
 			   vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma),
-			   vma->vm_userfaultfd_ctx);
+			   vma->vm_userfaultfd_ctx, vma_anon_name(vma));
 	if (*pprev) {
 		vma = *pprev;
 		VM_WARN_ON((vma->vm_flags ^ newflags) & ~VM_SOFTDIRTY);
-- 
2.28.0