Date: Wed, 19 Aug 2020 16:37:01 +0200
From: Michal Hocko
To: Sumit Semwal
Cc: Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 Alexey Dobriyan, Jonathan Corbet, Mauro Carvalho Chehab, Kees Cook,
 Colin Cross, Alexey Gladkov, Matthew Wilcox, Jason Gunthorpe,
 "Kirill A . Shutemov", Michel Lespinasse, Michal Koutný, Song Liu,
 Huang Ying, Vlastimil Babka, Yang Shi, chenqiwu, Mathieu Desnoyers,
 John Hubbard, Thomas Hellstrom, Mike Christie, Bart Van Assche,
 Amit Pundir, Thomas Gleixner, Christian Brauner, Daniel Jordan,
 Adrian Reber, Nicolas Viennot, Al Viro, Thomas Cedeno,
 linux-fsdevel@vger.kernel.org, Pekka Enberg, Dave Hansen, Peter Zijlstra,
 Ingo Molnar, Oleg Nesterov, "Eric W. Biederman", Jan Glauber, John Stultz,
 Rob Landley, Cyrill Gorcunov, "Serge E. Hallyn", David Rientjes,
 Hugh Dickins, Rik van Riel, Mel Gorman, Tang Chen, Robin Holt, Shaohua Li,
 Sasha Levin, Johannes Weiner, Minchan Kim, linux-api@vger.kernel.org
Subject: Re: [PATCH v5 2/2] mm: add a field to store names for private anonymous memory
Message-ID: <20200819143701.GT5422@dhcp22.suse.cz>
References: <20200819141650.7462-1-sumit.semwal@linaro.org>
 <20200819141650.7462-3-sumit.semwal@linaro.org>
In-Reply-To: <20200819141650.7462-3-sumit.semwal@linaro.org>

[Cc linux-api]

On Wed 19-08-20 19:46:50, Sumit Semwal wrote:
> From: Colin Cross
>
> In many userspace applications, and especially in VM based applications
> like Android uses heavily, there are multiple different allocators in use.
> At a minimum there is libc malloc and the stack, and in many cases there
> are libc malloc, the stack, direct syscalls to mmap anonymous memory, and
> multiple VM heaps (one for small objects, one for big objects, etc.).
> Each of these layers usually has its own tools to inspect its usage;
> malloc by compiling a debug version, the VM through heap inspection tools,
> and for direct syscalls there is usually no way to track them.
>
> On Android we heavily use a set of tools that use an extended version of
> the logic covered in Documentation/vm/pagemap.txt to walk all pages mapped
> in userspace and slice their usage by process, shared (COW) vs. unique
> mappings, backing, etc. This can account for real physical memory usage
> even in cases like fork without exec (which Android uses heavily to share
> as many private COW pages as possible between processes), Kernel SamePage
> Merging, and clean zero pages.
> It produces a measurement of the pages
> that only exist in that process (USS, for unique), and a measurement of
> the physical memory usage of that process with the cost of shared pages
> being evenly split between processes that share them (PSS).
>
> If all anonymous memory is indistinguishable then figuring out the real
> physical memory usage (PSS) of each heap requires either a pagemap walking
> tool that can understand the heap debugging of every layer, or for every
> layer's heap debugging tools to implement the pagemap walking logic, in
> which case it is hard to get a consistent view of memory across the whole
> system.
>
> Tracking the information in userspace leads to all sorts of problems.
> It either needs to be stored inside the process, which means every
> process has to have an API to export its current heap information upon
> request, or it has to be stored externally in a filesystem that
> somebody needs to clean up on crashes. It needs to be readable while
> the process is still running, so it has to have some sort of
> synchronization with every layer of userspace. Efficiently tracking
> the ranges requires reimplementing something like the kernel vma
> trees, and linking to it from every layer of userspace. It requires
> more memory, more syscalls, more runtime cost, and more complexity to
> separately track regions that the kernel is already tracking.
>
> This patch adds a field to /proc/pid/maps and /proc/pid/smaps to show a
> userspace-provided name for anonymous vmas. The names of named anonymous
> vmas are shown in /proc/pid/maps and /proc/pid/smaps as [anon:<name>].
>
> Userspace can set the name for a region of memory by calling
> prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, start, len, (unsigned long)name);
> Setting the name to NULL clears it.
>
> The name is stored in a user pointer in the shared union in vm_area_struct
> that points to a null terminated string inside the user process.
> vmas that point to the same address and are otherwise mergeable will be
> merged, but vmas that point to equivalent strings at different addresses
> will not be merged.
>
> The idea to store a userspace pointer to reduce the complexity within mm
> (at the expense of the complexity of reading /proc/pid/mem) came from Dave
> Hansen. This results in no runtime overhead in the mm subsystem other
> than comparing the anon_name pointers when considering vma merging. The
> pointer is stored in a union with fields that are only used on file-backed
> mappings, so it does not increase memory usage.
> (Upstream changed to remove the union, so this patch adds it back as well)
>
> Signed-off-by: Colin Cross
> Cc: Pekka Enberg
> Cc: Dave Hansen
> Cc: Peter Zijlstra
> Cc: Ingo Molnar
> Cc: Oleg Nesterov
> Cc: "Eric W. Biederman"
> Cc: Jan Glauber
> Cc: John Stultz
> Cc: Rob Landley
> Cc: Cyrill Gorcunov
> Cc: Kees Cook
> Cc: "Serge E. Hallyn"
> Cc: David Rientjes
> Cc: Al Viro
> Cc: Hugh Dickins
> Cc: Rik van Riel
> Cc: Mel Gorman
> Cc: Michel Lespinasse
> Cc: Tang Chen
> Cc: Robin Holt
> Cc: Shaohua Li
> Cc: Sasha Levin
> Cc: Johannes Weiner
> Cc: Minchan Kim
> Signed-off-by: Andrew Morton
> Signed-off-by: Sumit Semwal
> ---
>  Documentation/filesystems/proc.rst |  2 ++
>  fs/proc/task_mmu.c                 | 24 ++++++++++++-
>  include/linux/mm.h                 |  5 ++-
>  include/linux/mm_types.h           | 23 +++++++++++--
>  include/uapi/linux/prctl.h         |  3 ++
>  kernel/sys.c                       | 32 ++++++++++++++++++
>  mm/interval_tree.c                 | 34 +++++++++----------
>  mm/madvise.c                       | 54 +++++++++++++++++++++++++++---
>  mm/mempolicy.c                     |  3 +-
>  mm/mlock.c                         |  2 +-
>  mm/mmap.c                          | 38 ++++++++++++---------
>  mm/mprotect.c                      |  2 +-
>  12 files changed, 177 insertions(+), 45 deletions(-)
>
> diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
> index 533c79e8d2cd..41a9cea73b8b 100644
> --- a/Documentation/filesystems/proc.rst
> +++ b/Documentation/filesystems/proc.rst
> @@ -429,6 +429,8 @@ is not associated with a file:
>   [stack]                    the stack of the main process
>   [vdso]                     the "virtual dynamic shared object",
>                              the kernel system call handler
> + [anon:<name>]              an anonymous mapping that has been
> +                            named by userspace
>   =======                    ====================================
>
>  or if empty, the mapping is anonymous.
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index 5066b0251ed8..136fd3c3ad7b 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -97,6 +97,21 @@ unsigned long task_statm(struct mm_struct *mm,
>  	return mm->total_vm;
>  }
>
> +static void seq_print_vma_name(struct seq_file *m, struct vm_area_struct *vma)
> +{
> +	struct mm_struct *mm = vma->vm_mm;
> +	char anon_name[NAME_MAX + 1];
> +	int n;
> +
> +	n = access_remote_vm(mm, (unsigned long)vma_anon_name(vma),
> +			     anon_name, NAME_MAX, 0);
> +	if (n > 0) {
> +		seq_puts(m, "[anon:");
> +		seq_write(m, anon_name, strnlen(anon_name, n));
> +		seq_putc(m, ']');
> +	}
> +}
> +
>  #ifdef CONFIG_NUMA
>  /*
>   * Save get_task_policy() for show_numa_map().
> @@ -319,8 +334,15 @@ show_map_vma(struct seq_file *m, struct vm_area_struct *vma)
>  			goto done;
>  		}
>
> -		if (is_stack(vma))
> +		if (is_stack(vma)) {
>  			name = "[stack]";
> +			goto done;
> +		}
> +
> +		if (vma_anon_name(vma)) {
> +			seq_pad(m, ' ');
> +			seq_print_vma_name(m, vma);
> +		}
>  	}
>
>  done:
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 1983e08f5906..c64171529254 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2484,7 +2484,7 @@ static inline int vma_adjust(struct vm_area_struct *vma, unsigned long start,
>  extern struct vm_area_struct *vma_merge(struct mm_struct *,
>  	struct vm_area_struct *prev, unsigned long addr, unsigned long end,
>  	unsigned long vm_flags, struct anon_vma *, struct file *, pgoff_t,
> -	struct mempolicy *, struct vm_userfaultfd_ctx);
> +	struct mempolicy *, struct vm_userfaultfd_ctx, const char __user *);
>  extern struct anon_vma *find_mergeable_anon_vma(struct vm_area_struct *);
>  extern int __split_vma(struct mm_struct *, struct vm_area_struct *,
>  	unsigned long addr, int new_below);
> @@ -3123,5 +3123,8 @@ unsigned long wp_shared_mapping_range(struct address_space *mapping,
>
>  extern int sysctl_nr_trim_pages;
>
> +int madvise_set_anon_name(unsigned long start, unsigned long len_in,
> +			  unsigned long name_addr);
> +
>  #endif /* __KERNEL__ */
>  #endif /* _LINUX_MM_H */
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 496c3ff97cce..ac8d687ebfb5 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -336,10 +336,18 @@ struct vm_area_struct {
>  	/*
>  	 * For areas with an address space and backing store,
>  	 * linkage into the address_space->i_mmap interval tree.
> +	 *
> +	 * For private anonymous mappings, a pointer to a null terminated string
> +	 * in the user process containing the name given to the vma, or NULL
> +	 * if unnamed.
>  	 */
> -	struct {
> -		struct rb_node rb;
> -		unsigned long rb_subtree_last;
> +
> +	union {
> +		struct {
> +			struct rb_node rb;
> +			unsigned long rb_subtree_last;
> +		} interval;
> +		const char __user *anon_name;
>  	} shared;
>
>  	/*
> @@ -772,4 +780,13 @@ typedef struct {
>  	unsigned long val;
>  } swp_entry_t;
>
> +/* Return the name for an anonymous mapping or NULL for a file-backed mapping */
> +static inline const char __user *vma_anon_name(struct vm_area_struct *vma)
> +{
> +	if (vma->vm_file)
> +		return NULL;
> +
> +	return vma->shared.anon_name;
> +}
> +
>  #endif /* _LINUX_MM_TYPES_H */
> diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
> index 07b4f8131e36..10773270f67b 100644
> --- a/include/uapi/linux/prctl.h
> +++ b/include/uapi/linux/prctl.h
> @@ -238,4 +238,7 @@ struct prctl_mm_map {
>  #define PR_SET_IO_FLUSHER		57
>  #define PR_GET_IO_FLUSHER		58
>
> +#define PR_SET_VMA		0x53564d41
> +# define PR_SET_VMA_ANON_NAME		0
> +
>  #endif /* _LINUX_PRCTL_H */
> diff --git a/kernel/sys.c b/kernel/sys.c
> index ca11af9d815d..da90837b5ccd 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -2280,6 +2280,35 @@ int __weak arch_prctl_spec_ctrl_set(struct task_struct *t, unsigned long which,
>
>  #define PR_IO_FLUSHER (PF_MEMALLOC_NOIO | PF_LOCAL_THROTTLE)
>
> +#ifdef CONFIG_MMU
> +static int prctl_set_vma(unsigned long opt, unsigned long addr,
> +			 unsigned long len, unsigned long arg)
> +{
> +	struct mm_struct *mm = current->mm;
> +	int error;
> +
> +	mmap_write_lock(mm);
> +
> +	switch (opt) {
> +	case PR_SET_VMA_ANON_NAME:
> +		error = madvise_set_anon_name(addr, len, arg);
> +		break;
> +	default:
> +		error = -EINVAL;
> +	}
> +
> +	mmap_write_unlock(mm);
> +
> +	return error;
> +}
> +#else /* CONFIG_MMU */
> +static int prctl_set_vma(unsigned long opt, unsigned long start,
> +			 unsigned long len_in, unsigned long arg)
> +{
> +	return -EINVAL;
> +}
> +#endif
> +
>  SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
>  		unsigned long, arg4, unsigned long, arg5)
>  {
> @@ -2530,6 +2559,9 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
>
>  		error = (current->flags & PR_IO_FLUSHER) == PR_IO_FLUSHER;
>  		break;
> +	case PR_SET_VMA:
> +		error = prctl_set_vma(arg2, arg3, arg4, arg5);
> +		break;
>  	default:
>  		error = -EINVAL;
>  		break;
> diff --git a/mm/interval_tree.c b/mm/interval_tree.c
> index 11c75fb07584..d684ce0762cd 100644
> --- a/mm/interval_tree.c
> +++ b/mm/interval_tree.c
> @@ -20,8 +20,8 @@ static inline unsigned long vma_last_pgoff(struct vm_area_struct *v)
>  	return v->vm_pgoff + vma_pages(v) - 1;
>  }
>
> -INTERVAL_TREE_DEFINE(struct vm_area_struct, shared.rb,
> -		     unsigned long, shared.rb_subtree_last,
> +INTERVAL_TREE_DEFINE(struct vm_area_struct, shared.interval.rb,
> +		     unsigned long, shared.interval.rb_subtree_last,
>  		     vma_start_pgoff, vma_last_pgoff,, vma_interval_tree)
>
>  /* Insert node immediately after prev in the interval tree */
> @@ -35,26 +35,26 @@ void vma_interval_tree_insert_after(struct vm_area_struct *node,
>
>  	VM_BUG_ON_VMA(vma_start_pgoff(node) != vma_start_pgoff(prev), node);
>
> -	if (!prev->shared.rb.rb_right) {
> +	if (!prev->shared.interval.rb.rb_right) {
>  		parent = prev;
> -		link = &prev->shared.rb.rb_right;
> +		link = &prev->shared.interval.rb.rb_right;
>  	} else {
> -		parent = rb_entry(prev->shared.rb.rb_right,
> -				  struct vm_area_struct, shared.rb);
> -		if (parent->shared.rb_subtree_last < last)
> -			parent->shared.rb_subtree_last = last;
> -		while (parent->shared.rb.rb_left) {
> -			parent = rb_entry(parent->shared.rb.rb_left,
> -				struct vm_area_struct, shared.rb);
> -			if (parent->shared.rb_subtree_last < last)
> -				parent->shared.rb_subtree_last = last;
> +		parent = rb_entry(prev->shared.interval.rb.rb_right,
> +				  struct vm_area_struct, shared.interval.rb);
> +		if (parent->shared.interval.rb_subtree_last < last)
> +			parent->shared.interval.rb_subtree_last = last;
> +		while (parent->shared.interval.rb.rb_left) {
> +			parent = rb_entry(parent->shared.interval.rb.rb_left,
> +				struct vm_area_struct, shared.interval.rb);
> +			if (parent->shared.interval.rb_subtree_last < last)
> +				parent->shared.interval.rb_subtree_last = last;
>  		}
> -		link = &parent->shared.rb.rb_left;
> +		link = &parent->shared.interval.rb.rb_left;
>  	}
>
> -	node->shared.rb_subtree_last = last;
> -	rb_link_node(&node->shared.rb, &parent->shared.rb, link);
> -	rb_insert_augmented(&node->shared.rb, &root->rb_root,
> +	node->shared.interval.rb_subtree_last = last;
> +	rb_link_node(&node->shared.interval.rb, &parent->shared.interval.rb, link);
> +	rb_insert_augmented(&node->shared.interval.rb, &root->rb_root,
>  			    &vma_interval_tree_augment);
>  }
>
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 84482c21b029..7da8493fa6d3 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -65,13 +65,14 @@ static int madvise_need_mmap_write(int behavior)
>   */
>  static int madvise_update_vma(struct vm_area_struct *vma,
>  		struct vm_area_struct **prev, unsigned long start,
> -		unsigned long end, unsigned long new_flags)
> +		unsigned long end, unsigned long new_flags,
> +		const char __user *new_anon_name)
>  {
>  	struct mm_struct *mm = vma->vm_mm;
>  	int error;
>  	pgoff_t pgoff;
>
> -	if (new_flags == vma->vm_flags) {
> +	if (new_flags == vma->vm_flags && new_anon_name == vma_anon_name(vma)) {
>  		*prev = vma;
>  		return 0;
>  	}
> @@ -79,7 +80,7 @@ static int madvise_update_vma(struct vm_area_struct *vma,
>  	pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
>  	*prev = vma_merge(mm, *prev, start, end, new_flags, vma->anon_vma,
>  			  vma->vm_file, pgoff, vma_policy(vma),
> -			  vma->vm_userfaultfd_ctx);
> +			  vma->vm_userfaultfd_ctx, new_anon_name);
>  	if (*prev) {
>  		vma = *prev;
>  		goto success;
> @@ -112,10 +113,30 @@ static int madvise_update_vma(struct vm_area_struct *vma,
>  	 * vm_flags is protected by the mmap_lock held in write mode.
>  	 */
>  	vma->vm_flags = new_flags;
> +	if (!vma->vm_file)
> +		vma->shared.anon_name = new_anon_name;
>
>  	return 0;
>  }
>
> +static int madvise_vma_anon_name(struct vm_area_struct *vma,
> +				 struct vm_area_struct **prev,
> +				 unsigned long start, unsigned long end,
> +				 unsigned long name_addr)
> +{
> +	int error;
> +
> +	/* Only anonymous mappings can be named */
> +	if (vma->vm_file)
> +		return -EINVAL;
> +
> +	error = madvise_update_vma(vma, prev, start, end, vma->vm_flags,
> +				   (const char __user *)name_addr);
> +	if (error == -ENOMEM)
> +		error = -EAGAIN;
> +	return error;
> +}
> +
>  #ifdef CONFIG_SWAP
>  static int swapin_walk_pmd_entry(pmd_t *pmd, unsigned long start,
>  	unsigned long end, struct mm_walk *walk)
> @@ -877,7 +898,8 @@ static int madvise_vma_behavior(struct vm_area_struct *vma,
>  		break;
>  	}
>
> -	error = madvise_update_vma(vma, prev, start, end, new_flags);
> +	error = madvise_update_vma(vma, prev, start, end, new_flags,
> +				   vma_anon_name(vma));
>
>  out:
>  	if (error == -ENOMEM)
> @@ -1059,6 +1081,30 @@ int madvise_walk_vmas(unsigned long start, unsigned long end,
>  	return unmapped_error;
>  }
>
> +int madvise_set_anon_name(unsigned long start, unsigned long len_in,
> +			  unsigned long name_addr)
> +{
> +	unsigned long end;
> +	unsigned long len;
> +
> +	if (start & ~PAGE_MASK)
> +		return -EINVAL;
> +	len = (len_in + ~PAGE_MASK) & PAGE_MASK;
> +
> +	/* Check to see whether len was rounded up from small -ve to zero */
> +	if (len_in && !len)
> +		return -EINVAL;
> +
> +	end = start + len;
> +	if (end < start)
> +		return -EINVAL;
> +
> +	if (end == start)
> +		return 0;
> +
> +	return madvise_walk_vmas(start, end, name_addr, madvise_vma_anon_name);
> +}
> +
>  /*
>   * The madvise(2) system call.
>   *
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index eddbe4e56c73..94338d9bfe57 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -829,7 +829,8 @@ static int mbind_range(struct mm_struct *mm, unsigned long start,
>  			((vmstart - vma->vm_start) >> PAGE_SHIFT);
>  		prev = vma_merge(mm, prev, vmstart, vmend, vma->vm_flags,
>  				 vma->anon_vma, vma->vm_file, pgoff,
> -				 new_pol, vma->vm_userfaultfd_ctx);
> +				 new_pol, vma->vm_userfaultfd_ctx,
> +				 vma_anon_name(vma));
>  		if (prev) {
>  			vma = prev;
>  			next = vma->vm_next;
> diff --git a/mm/mlock.c b/mm/mlock.c
> index 93ca2bf30b4f..8e0046c4642f 100644
> --- a/mm/mlock.c
> +++ b/mm/mlock.c
> @@ -534,7 +534,7 @@ static int mlock_fixup(struct vm_area_struct *vma, struct vm_area_struct **prev,
>  	pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
>  	*prev = vma_merge(mm, *prev, start, end, newflags, vma->anon_vma,
>  			  vma->vm_file, pgoff, vma_policy(vma),
> -			  vma->vm_userfaultfd_ctx);
> +			  vma->vm_userfaultfd_ctx, vma_anon_name(vma));
>  	if (*prev) {
>  		vma = *prev;
>  		goto success;
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 40248d84ad5f..8f3cd352a48f 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -987,7 +987,8 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
>   */
>  static inline int is_mergeable_vma(struct vm_area_struct *vma,
>  				struct file *file, unsigned long vm_flags,
> -				struct vm_userfaultfd_ctx vm_userfaultfd_ctx)
> +				struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
> +				const char __user *anon_name)
>  {
>  	/*
>  	 * VM_SOFTDIRTY should not prevent from VMA merging, if we
> @@ -1005,6 +1006,8 @@ static inline int is_mergeable_vma(struct vm_area_struct *vma,
>  		return 0;
>  	if (!is_mergeable_vm_userfaultfd_ctx(vma, vm_userfaultfd_ctx))
>  		return 0;
> +	if (vma_anon_name(vma) != anon_name)
> +		return 0;
>  	return 1;
>  }
>
> @@ -1037,9 +1040,10 @@ static int
>  can_vma_merge_before(struct vm_area_struct *vma, unsigned long vm_flags,
>  		     struct anon_vma *anon_vma, struct file *file,
>  		     pgoff_t vm_pgoff,
> -		     struct vm_userfaultfd_ctx vm_userfaultfd_ctx)
> +		     struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
> +		     const char __user *anon_name)
>  {
> -	if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx) &&
> +	if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx, anon_name) &&
>  	    is_mergeable_anon_vma(anon_vma, vma->anon_vma, vma)) {
>  		if (vma->vm_pgoff == vm_pgoff)
>  			return 1;
> @@ -1058,9 +1062,10 @@ static int
>  can_vma_merge_after(struct vm_area_struct *vma, unsigned long vm_flags,
>  		    struct anon_vma *anon_vma, struct file *file,
>  		    pgoff_t vm_pgoff,
> -		    struct vm_userfaultfd_ctx vm_userfaultfd_ctx)
> +		    struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
> +		    const char __user *anon_name)
>  {
> -	if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx) &&
> +	if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx, anon_name) &&
>  	    is_mergeable_anon_vma(anon_vma, vma->anon_vma, vma)) {
>  		pgoff_t vm_pglen;
>  		vm_pglen = vma_pages(vma);
> @@ -1071,9 +1076,9 @@ can_vma_merge_after(struct vm_area_struct *vma, unsigned long vm_flags,
>  }
>
>  /*
> - * Given a mapping request (addr,end,vm_flags,file,pgoff), figure out
> - * whether that can be merged with its predecessor or its successor.
> - * Or both (it neatly fills a hole).
> + * Given a mapping request (addr,end,vm_flags,file,pgoff,anon_name),
> + * figure out whether that can be merged with its predecessor or its
> + * successor. Or both (it neatly fills a hole).
>   *
>   * In most cases - when called for mmap, brk or mremap - [addr,end) is
>   * certain not to be mapped by the time vma_merge is called; but when
> @@ -1118,7 +1123,8 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
>  			unsigned long end, unsigned long vm_flags,
>  			struct anon_vma *anon_vma, struct file *file,
>  			pgoff_t pgoff, struct mempolicy *policy,
> -			struct vm_userfaultfd_ctx vm_userfaultfd_ctx)
> +			struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
> +			const char __user *anon_name)
>  {
>  	pgoff_t pglen = (end - addr) >> PAGE_SHIFT;
>  	struct vm_area_struct *area, *next;
> @@ -1151,7 +1157,7 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
>  			mpol_equal(vma_policy(prev), policy) &&
>  			can_vma_merge_after(prev, vm_flags,
>  					    anon_vma, file, pgoff,
> -					    vm_userfaultfd_ctx)) {
> +					    vm_userfaultfd_ctx, anon_name)) {
>  		/*
>  		 * OK, it can. Can we now merge in the successor as well?
>  		 */
> @@ -1160,7 +1166,7 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
>  				can_vma_merge_before(next, vm_flags,
>  						     anon_vma, file,
>  						     pgoff+pglen,
> -						     vm_userfaultfd_ctx) &&
> +						     vm_userfaultfd_ctx, anon_name) &&
>  				is_mergeable_anon_vma(prev->anon_vma,
>  						      next->anon_vma, NULL)) {
>  							/* cases 1, 6 */
> @@ -1183,7 +1189,7 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
>  			mpol_equal(policy, vma_policy(next)) &&
>  			can_vma_merge_before(next, vm_flags,
>  					     anon_vma, file, pgoff+pglen,
> -					     vm_userfaultfd_ctx)) {
> +					     vm_userfaultfd_ctx, anon_name)) {
>  		if (prev && addr < prev->vm_end)	/* case 4 */
>  			err = __vma_adjust(prev, prev->vm_start,
>  					 addr, prev->vm_pgoff, NULL, next);
> @@ -1731,7 +1737,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
>  	 * Can we just expand an old mapping?
>  	 */
>  	vma = vma_merge(mm, prev, addr, addr + len, vm_flags,
> -			NULL, file, pgoff, NULL, NULL_VM_UFFD_CTX);
> +			NULL, file, pgoff, NULL, NULL_VM_UFFD_CTX, NULL);
>  	if (vma)
>  		goto out;
>
> @@ -1779,7 +1785,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
>  		 */
>  		if (unlikely(vm_flags != vma->vm_flags && prev)) {
>  			merge = vma_merge(mm, prev, vma->vm_start, vma->vm_end, vma->vm_flags,
> -				NULL, vma->vm_file, vma->vm_pgoff, NULL, NULL_VM_UFFD_CTX);
> +				NULL, vma->vm_file, vma->vm_pgoff, NULL, NULL_VM_UFFD_CTX, NULL);
>  			if (merge) {
>  				fput(file);
>  				vm_area_free(vma);
> @@ -3063,7 +3069,7 @@ static int do_brk_flags(unsigned long addr, unsigned long len, unsigned long fla
>
>  	/* Can we just expand an old private anonymous mapping? */
>  	vma = vma_merge(mm, prev, addr, addr + len, flags,
> -			NULL, NULL, pgoff, NULL, NULL_VM_UFFD_CTX);
> +			NULL, NULL, pgoff, NULL, NULL_VM_UFFD_CTX, NULL);
>  	if (vma)
>  		goto out;
>
> @@ -3262,7 +3268,7 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap,
>  		return NULL;	/* should never get here */
>  	new_vma = vma_merge(mm, prev, addr, addr + len, vma->vm_flags,
>  			    vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma),
> -			    vma->vm_userfaultfd_ctx);
> +			    vma->vm_userfaultfd_ctx, vma_anon_name(vma));
>  	if (new_vma) {
>  		/*
>  		 * Source vma may have been merged into new_vma
> diff --git a/mm/mprotect.c b/mm/mprotect.c
> index ce8b8a5eacbb..d90c349a3fd9 100644
> --- a/mm/mprotect.c
> +++ b/mm/mprotect.c
> @@ -454,7 +454,7 @@ mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev,
>  	pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
>  	*pprev = vma_merge(mm, *pprev, start, end, newflags,
>  			   vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma),
> -			   vma->vm_userfaultfd_ctx);
> +			   vma->vm_userfaultfd_ctx, vma_anon_name(vma));
>  	if (*pprev) {
>  		vma = *pprev;
>  		VM_WARN_ON((vma->vm_flags ^ newflags) & ~VM_SOFTDIRTY);
> --
> 2.28.0
>

-- 
Michal Hocko
SUSE Labs
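
For readers following along on linux-api, the userspace side of the interface described in the quoted changelog can be sketched roughly as below. This is only an illustration under stated assumptions, not part of the patch: the two constants are copied from the include/uapi/linux/prctl.h hunk, while round_up_len(), check_anon_name_range() and name_anon_region() are hypothetical helper names. The range check mirrors, in userspace, the argument validation that madvise_set_anon_name() performs in the quoted diff. Since this version of the patch stores the user pointer rather than a copy of the string, the name passed in would have to stay valid at the same address for as long as the region is named.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/prctl.h>

/* Values copied from the prctl.h hunk of the quoted patch. */
#ifndef PR_SET_VMA
#define PR_SET_VMA		0x53564d41
#endif
#ifndef PR_SET_VMA_ANON_NAME
#define PR_SET_VMA_ANON_NAME	0
#endif

/*
 * Hypothetical helper: round len_in up to a multiple of page_size, which
 * must be a power of two. Like the kernel expression
 * (len_in + ~PAGE_MASK) & PAGE_MASK, it wraps to 0 on overflow.
 */
static inline unsigned long round_up_len(unsigned long len_in,
					 unsigned long page_size)
{
	assert(page_size && (page_size & (page_size - 1)) == 0);
	return (len_in + (page_size - 1)) & ~(page_size - 1);
}

/*
 * Hypothetical helper: userspace mirror of the checks at the top of
 * madvise_set_anon_name(). Returns 0 if the range would pass them,
 * -EINVAL otherwise.
 */
static inline int check_anon_name_range(unsigned long start,
					unsigned long len_in,
					unsigned long page_size)
{
	unsigned long len, end;

	if (start & (page_size - 1))	/* start must be page aligned */
		return -EINVAL;
	len = round_up_len(len_in, page_size);
	if (len_in && !len)		/* rounding wrapped around to zero */
		return -EINVAL;
	end = start + len;
	if (end < start)		/* range wraps the address space */
		return -EINVAL;
	return 0;
}

/*
 * Hypothetical wrapper around the prctl() call from the changelog.
 * Passing name == NULL clears the name. Returns 0 on success, or -1 with
 * errno set (EINVAL on kernels that do not know PR_SET_VMA).
 */
static inline int name_anon_region(void *addr, size_t len, const char *name)
{
	return prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME,
		     (unsigned long)addr, (unsigned long)len,
		     (unsigned long)name);
}
```

With a kernel carrying this patch, calling name_anon_region() on a page-aligned private anonymous mapping with a string that has static storage duration should make that range show up as [anon:<name>] in /proc/pid/maps and /proc/pid/smaps, per the changelog.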