linux-mm.kvack.org archive mirror
* [PATCH v5 0/2] Anonymous VMA naming patches
@ 2020-08-19 14:16 Sumit Semwal
  2020-08-19 14:16 ` [PATCH v5 1/2] mm: rearrange madvise code to allow for reuse Sumit Semwal
  2020-08-19 14:16 ` [PATCH v5 2/2] mm: add a field to store names for private anonymous memory Sumit Semwal
  0 siblings, 2 replies; 26+ messages in thread
From: Sumit Semwal @ 2020-08-19 14:16 UTC (permalink / raw)
  To: Andrew Morton, linux-mm, linux-kernel, Alexey Dobriyan, Jonathan Corbet
  Cc: Mauro Carvalho Chehab, Kees Cook, Michal Hocko, Colin Cross,
	Alexey Gladkov, Matthew Wilcox, Jason Gunthorpe,
	Kirill A . Shutemov, Michel Lespinasse, Michal Koutný,
	Song Liu, Huang Ying, Vlastimil Babka, Yang Shi, chenqiwu,
	Mathieu Desnoyers, John Hubbard, Thomas Hellstrom, Mike Christie,
	Bart Van Assche, Amit Pundir, Thomas Gleixner, Christian Brauner,
	Daniel Jordan, Adrian Reber, Nicolas Viennot, Al Viro,
	Thomas Cedeno, linux-fsdevel, Sumit Semwal

The last version (v4) of these patches was sent by Colin Cross a long time
ago [1][2]. They were not merged at the time, and appear to have fallen off
the radar since.

In our efforts to run Android on mainline kernels, we realised that for some
time now this patchset has been needed for Android to boot, so I am
re-posting it to get it discussed and hopefully merged.

I have rebased the patches onto v5.9-rc1 and made the minor updates that
required.

[1]: https://lore.kernel.org/linux-mm/1383170047-21074-1-git-send-email-ccross@android.com/
[2]: https://lore.kernel.org/linux-mm/1383170047-21074-2-git-send-email-ccross@android.com/

Best,
Sumit.

Colin Cross (2):
  mm: rearrange madvise code to allow for reuse
  mm: add a field to store names for private anonymous memory

 Documentation/filesystems/proc.rst |   2 +
 fs/proc/task_mmu.c                 |  24 +-
 include/linux/mm.h                 |   5 +-
 include/linux/mm_types.h           |  23 +-
 include/uapi/linux/prctl.h         |   3 +
 kernel/sys.c                       |  32 +++
 mm/interval_tree.c                 |  34 +--
 mm/madvise.c                       | 356 +++++++++++++++++------------
 mm/mempolicy.c                     |   3 +-
 mm/mlock.c                         |   2 +-
 mm/mmap.c                          |  38 +--
 mm/mprotect.c                      |   2 +-
 12 files changed, 340 insertions(+), 184 deletions(-)

-- 
2.28.0




* [PATCH v5 1/2] mm: rearrange madvise code to allow for reuse
  2020-08-19 14:16 [PATCH v5 0/2] Anonymous VMA naming patches Sumit Semwal
@ 2020-08-19 14:16 ` Sumit Semwal
  2020-08-19 14:16 ` [PATCH v5 2/2] mm: add a field to store names for private anonymous memory Sumit Semwal
  1 sibling, 0 replies; 26+ messages in thread
From: Sumit Semwal @ 2020-08-19 14:16 UTC (permalink / raw)
  To: Andrew Morton, linux-mm, linux-kernel, Alexey Dobriyan, Jonathan Corbet
  Cc: Mauro Carvalho Chehab, Kees Cook, Michal Hocko, Colin Cross,
	Alexey Gladkov, Matthew Wilcox, Jason Gunthorpe,
	Kirill A . Shutemov, Michel Lespinasse, Michal Koutný,
	Song Liu, Huang Ying, Vlastimil Babka, Yang Shi, chenqiwu,
	Mathieu Desnoyers, John Hubbard, Thomas Hellstrom, Mike Christie,
	Bart Van Assche, Amit Pundir, Thomas Gleixner, Christian Brauner,
	Daniel Jordan, Adrian Reber, Nicolas Viennot, Al Viro,
	Thomas Cedeno, linux-fsdevel, Pekka Enberg, Dave Hansen,
	Peter Zijlstra, Ingo Molnar, Oleg Nesterov, Eric W. Biederman,
	Jan Glauber, John Stultz, Rob Landley, Cyrill Gorcunov,
	Serge E. Hallyn, David Rientjes, Hugh Dickins, Rik van Riel,
	Mel Gorman, Tang Chen, Robin Holt, Shaohua Li, Sasha Levin,
	Johannes Weiner, Minchan Kim, Sumit Semwal

From: Colin Cross <ccross@google.com>

Refactor the madvise syscall to allow for parts of it to be reused by a
prctl syscall that affects vmas.

Move the code that walks vmas in a virtual address range into a function
that takes a function pointer as a parameter.  The only caller for now is
sys_madvise, which uses it to call madvise_vma_behavior on each vma, but
the next patch will add an additional caller.

Move handling all vma behaviors inside madvise_behavior, and rename it to
madvise_vma_behavior.

Move the code that updates the flags on a vma, including splitting or
merging the vma as necessary, into a new function called
madvise_update_vma.  The next patch will add support for updating a new
anon_name field as well.
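
As a rough userspace illustration of the walker-plus-callback shape
described above (not the kernel code itself), the sketch below walks a
sorted array of ranges and hands each overlap to a visit function.  The
names struct region, walk_regions and count_bytes are hypothetical and
exist only for this example; the -ENOMEM-on-holes behaviour is modelled on
the description in this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a vma: a half-open range [start, end). */
struct region {
	unsigned long start;
	unsigned long end;
};

typedef int (*visit_fn)(const struct region *r, unsigned long start,
			unsigned long end, unsigned long arg);

/*
 * Walk the regions overlapping [start, end).  The array must be sorted by
 * address and non-overlapping.  visit() receives the overlap between each
 * region and the requested range.  Holes are skipped but remembered, so
 * the result is -ENOMEM if any part of the range was not covered.
 */
static int walk_regions(const struct region *regions, size_t count,
			unsigned long start, unsigned long end,
			unsigned long arg, visit_fn visit)
{
	int unmapped_error = 0;
	size_t i;

	assert(visit != NULL);

	for (i = 0; i < count && start < end; i++) {
		const struct region *r = &regions[i];
		unsigned long tmp;
		int error;

		if (r->end <= start)
			continue;	/* region lies entirely below the range */
		if (r->start >= end)
			break;		/* nothing further overlaps the range */
		if (start < r->start) {
			unmapped_error = -ENOMEM;	/* hole before region */
			start = r->start;
		}
		tmp = r->end < end ? r->end : end;
		error = visit(r, start, tmp, arg);
		if (error)
			return error;	/* first visitor error stops the walk */
		start = tmp;
	}
	if (start < end)
		unmapped_error = -ENOMEM;	/* hole at the tail of the range */
	return unmapped_error;
}

/* Example visitor: add the size of each overlap to the counter in arg. */
static int count_bytes(const struct region *r, unsigned long start,
		       unsigned long end, unsigned long arg)
{
	(void)r;
	*(unsigned long *)arg += end - start;
	return 0;
}
```

Passing the per-call state through a single unsigned long argument is
what lets one walker serve both a behaviour value and, later, a name
pointer.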

Signed-off-by: Colin Cross <ccross@google.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Jan Glauber <jan.glauber@gmail.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Rob Landley <rob@landley.net>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: "Serge E. Hallyn" <serge.hallyn@ubuntu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michel Lespinasse <walken@google.com>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Robin Holt <holt@sgi.com>
Cc: Shaohua Li <shli@fusionio.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
  [sumits: rebased over v5.9-rc1]
---
 mm/madvise.c | 312 +++++++++++++++++++++++++++------------------------
 1 file changed, 168 insertions(+), 144 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index dd1d43cf026d..84482c21b029 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -60,76 +60,20 @@ static int madvise_need_mmap_write(int behavior)
 }
 
 /*
- * We can potentially split a vm area into separate
- * areas, each area with its own behavior.
+ * Update the vm_flags on a region of a vma, splitting it or merging it as
+ * necessary.  Must be called with mmap_lock held for writing.
  */
-static long madvise_behavior(struct vm_area_struct *vma,
-		     struct vm_area_struct **prev,
-		     unsigned long start, unsigned long end, int behavior)
+static int madvise_update_vma(struct vm_area_struct *vma,
+			      struct vm_area_struct **prev, unsigned long start,
+			      unsigned long end, unsigned long new_flags)
 {
 	struct mm_struct *mm = vma->vm_mm;
-	int error = 0;
+	int error;
 	pgoff_t pgoff;
-	unsigned long new_flags = vma->vm_flags;
-
-	switch (behavior) {
-	case MADV_NORMAL:
-		new_flags = new_flags & ~VM_RAND_READ & ~VM_SEQ_READ;
-		break;
-	case MADV_SEQUENTIAL:
-		new_flags = (new_flags & ~VM_RAND_READ) | VM_SEQ_READ;
-		break;
-	case MADV_RANDOM:
-		new_flags = (new_flags & ~VM_SEQ_READ) | VM_RAND_READ;
-		break;
-	case MADV_DONTFORK:
-		new_flags |= VM_DONTCOPY;
-		break;
-	case MADV_DOFORK:
-		if (vma->vm_flags & VM_IO) {
-			error = -EINVAL;
-			goto out;
-		}
-		new_flags &= ~VM_DONTCOPY;
-		break;
-	case MADV_WIPEONFORK:
-		/* MADV_WIPEONFORK is only supported on anonymous memory. */
-		if (vma->vm_file || vma->vm_flags & VM_SHARED) {
-			error = -EINVAL;
-			goto out;
-		}
-		new_flags |= VM_WIPEONFORK;
-		break;
-	case MADV_KEEPONFORK:
-		new_flags &= ~VM_WIPEONFORK;
-		break;
-	case MADV_DONTDUMP:
-		new_flags |= VM_DONTDUMP;
-		break;
-	case MADV_DODUMP:
-		if (!is_vm_hugetlb_page(vma) && new_flags & VM_SPECIAL) {
-			error = -EINVAL;
-			goto out;
-		}
-		new_flags &= ~VM_DONTDUMP;
-		break;
-	case MADV_MERGEABLE:
-	case MADV_UNMERGEABLE:
-		error = ksm_madvise(vma, start, end, behavior, &new_flags);
-		if (error)
-			goto out_convert_errno;
-		break;
-	case MADV_HUGEPAGE:
-	case MADV_NOHUGEPAGE:
-		error = hugepage_madvise(vma, &new_flags, behavior);
-		if (error)
-			goto out_convert_errno;
-		break;
-	}
 
 	if (new_flags == vma->vm_flags) {
 		*prev = vma;
-		goto out;
+		return 0;
 	}
 
 	pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
@@ -146,21 +90,21 @@ static long madvise_behavior(struct vm_area_struct *vma,
 	if (start != vma->vm_start) {
 		if (unlikely(mm->map_count >= sysctl_max_map_count)) {
 			error = -ENOMEM;
-			goto out;
+			return error;
 		}
 		error = __split_vma(mm, vma, start, 1);
 		if (error)
-			goto out_convert_errno;
+			return error;
 	}
 
 	if (end != vma->vm_end) {
 		if (unlikely(mm->map_count >= sysctl_max_map_count)) {
 			error = -ENOMEM;
-			goto out;
+			return error;
 		}
 		error = __split_vma(mm, vma, end, 0);
 		if (error)
-			goto out_convert_errno;
+			return error;
 	}
 
 success:
@@ -169,15 +113,7 @@ static long madvise_behavior(struct vm_area_struct *vma,
 	 */
 	vma->vm_flags = new_flags;
 
-out_convert_errno:
-	/*
-	 * madvise() returns EAGAIN if kernel resources, such as
-	 * slab, are temporarily unavailable.
-	 */
-	if (error == -ENOMEM)
-		error = -EAGAIN;
-out:
-	return error;
+	return 0;
 }
 
 #ifdef CONFIG_SWAP
@@ -862,6 +798,93 @@ static long madvise_remove(struct vm_area_struct *vma,
 	return error;
 }
 
+/*
+ * Apply an madvise behavior to a region of a vma.  madvise_update_vma
+ * will handle splitting a vm area into separate areas, each area with its own
+ * behavior.
+ */
+static int madvise_vma_behavior(struct vm_area_struct *vma,
+				struct vm_area_struct **prev,
+				unsigned long start, unsigned long end,
+				unsigned long behavior)
+{
+	int error = 0;
+	unsigned long new_flags = vma->vm_flags;
+
+	switch (behavior) {
+	case MADV_REMOVE:
+		return madvise_remove(vma, prev, start, end);
+	case MADV_WILLNEED:
+		return madvise_willneed(vma, prev, start, end);
+	case MADV_COLD:
+		return madvise_cold(vma, prev, start, end);
+	case MADV_PAGEOUT:
+		return madvise_pageout(vma, prev, start, end);
+	case MADV_FREE:
+	case MADV_DONTNEED:
+		return madvise_dontneed_free(vma, prev, start, end, behavior);
+	case MADV_NORMAL:
+		new_flags = new_flags & ~VM_RAND_READ & ~VM_SEQ_READ;
+		break;
+	case MADV_SEQUENTIAL:
+		new_flags = (new_flags & ~VM_RAND_READ) | VM_SEQ_READ;
+		break;
+	case MADV_RANDOM:
+		new_flags = (new_flags & ~VM_SEQ_READ) | VM_RAND_READ;
+		break;
+	case MADV_DONTFORK:
+		new_flags |= VM_DONTCOPY;
+		break;
+	case MADV_DOFORK:
+		if (vma->vm_flags & VM_IO) {
+			error = -EINVAL;
+			goto out;
+		}
+		new_flags &= ~VM_DONTCOPY;
+		break;
+	case MADV_WIPEONFORK:
+		/* MADV_WIPEONFORK is only supported on anonymous memory. */
+		if (vma->vm_file || vma->vm_flags & VM_SHARED) {
+			error = -EINVAL;
+			goto out;
+		}
+		new_flags |= VM_WIPEONFORK;
+		break;
+	case MADV_KEEPONFORK:
+		new_flags &= ~VM_WIPEONFORK;
+		break;
+	case MADV_DONTDUMP:
+		new_flags |= VM_DONTDUMP;
+		break;
+	case MADV_DODUMP:
+		if (!is_vm_hugetlb_page(vma) && new_flags & VM_SPECIAL) {
+			error = -EINVAL;
+			goto out;
+		}
+		new_flags &= ~VM_DONTDUMP;
+		break;
+	case MADV_MERGEABLE:
+	case MADV_UNMERGEABLE:
+		error = ksm_madvise(vma, start, end, behavior, &new_flags);
+		if (error)
+			goto out;
+		break;
+	case MADV_HUGEPAGE:
+	case MADV_NOHUGEPAGE:
+		error = hugepage_madvise(vma, &new_flags, behavior);
+		if (error)
+			goto out;
+		break;
+	}
+
+	error = madvise_update_vma(vma, prev, start, end, new_flags);
+
+out:
+	if (error == -ENOMEM)
+		error = -EAGAIN;
+	return error;
+}
+
 #ifdef CONFIG_MEMORY_FAILURE
 /*
  * Error injection support for memory error handling.
@@ -931,27 +954,6 @@ static int madvise_inject_error(int behavior,
 }
 #endif
 
-static long
-madvise_vma(struct vm_area_struct *vma, struct vm_area_struct **prev,
-		unsigned long start, unsigned long end, int behavior)
-{
-	switch (behavior) {
-	case MADV_REMOVE:
-		return madvise_remove(vma, prev, start, end);
-	case MADV_WILLNEED:
-		return madvise_willneed(vma, prev, start, end);
-	case MADV_COLD:
-		return madvise_cold(vma, prev, start, end);
-	case MADV_PAGEOUT:
-		return madvise_pageout(vma, prev, start, end);
-	case MADV_FREE:
-	case MADV_DONTNEED:
-		return madvise_dontneed_free(vma, prev, start, end, behavior);
-	default:
-		return madvise_behavior(vma, prev, start, end, behavior);
-	}
-}
-
 static bool
 madvise_behavior_valid(int behavior)
 {
@@ -990,6 +992,73 @@ madvise_behavior_valid(int behavior)
 	}
 }
 
+/*
+ * Walk the vmas in range [start,end), and call the visit function on each one.
+ * The visit function will get start and end parameters that cover the overlap
+ * between the current vma and the original range.  Any unmapped regions in the
+ * original range will result in this function returning -ENOMEM while still
+ * calling the visit function on all of the existing vmas in the range.
+ * Must be called with the mmap_lock held for reading or writing.
+ */
+static
+int madvise_walk_vmas(unsigned long start, unsigned long end,
+		      unsigned long arg,
+		      int (*visit)(struct vm_area_struct *vma,
+				   struct vm_area_struct **prev, unsigned long start,
+				   unsigned long end, unsigned long arg))
+{
+	struct vm_area_struct *vma;
+	struct vm_area_struct *prev;
+	unsigned long tmp;
+	int unmapped_error = 0;
+
+	/*
+	 * If the interval [start,end) covers some unmapped address
+	 * ranges, just ignore them, but return -ENOMEM at the end.
+	 * - different from the way of handling in mlock etc.
+	 */
+	vma = find_vma_prev(current->mm, start, &prev);
+	if (vma && start > vma->vm_start)
+		prev = vma;
+
+	for (;;) {
+		int error;
+
+		/* Still start < end. */
+		if (!vma)
+			return -ENOMEM;
+
+		/* Here start < (end|vma->vm_end). */
+		if (start < vma->vm_start) {
+			unmapped_error = -ENOMEM;
+			start = vma->vm_start;
+			if (start >= end)
+				break;
+		}
+
+		/* Here vma->vm_start <= start < (end|vma->vm_end) */
+		tmp = vma->vm_end;
+		if (end < tmp)
+			tmp = end;
+
+		/* Here vma->vm_start <= start < tmp <= (end|vma->vm_end). */
+		error = visit(vma, &prev, start, tmp, arg);
+		if (error)
+			return error;
+		start = tmp;
+		if (prev && start < prev->vm_end)
+			start = prev->vm_end;
+		if (start >= end)
+			break;
+		if (prev)
+			vma = prev->vm_next;
+		else	/* madvise_remove dropped mmap_lock */
+			vma = find_vma(current->mm, start);
+	}
+
+	return unmapped_error;
+}
+
 /*
  * The madvise(2) system call.
  *
@@ -1053,9 +1122,7 @@ madvise_behavior_valid(int behavior)
  */
 int do_madvise(unsigned long start, size_t len_in, int behavior)
 {
-	unsigned long end, tmp;
-	struct vm_area_struct *vma, *prev;
-	int unmapped_error = 0;
+	unsigned long end;
 	int error = -EINVAL;
 	int write;
 	size_t len;
@@ -1112,51 +1179,8 @@ int do_madvise(unsigned long start, size_t len_in, int behavior)
 		mmap_read_lock(current->mm);
 	}
 
-	/*
-	 * If the interval [start,end) covers some unmapped address
-	 * ranges, just ignore them, but return -ENOMEM at the end.
-	 * - different from the way of handling in mlock etc.
-	 */
-	vma = find_vma_prev(current->mm, start, &prev);
-	if (vma && start > vma->vm_start)
-		prev = vma;
-
 	blk_start_plug(&plug);
-	for (;;) {
-		/* Still start < end. */
-		error = -ENOMEM;
-		if (!vma)
-			goto out;
-
-		/* Here start < (end|vma->vm_end). */
-		if (start < vma->vm_start) {
-			unmapped_error = -ENOMEM;
-			start = vma->vm_start;
-			if (start >= end)
-				goto out;
-		}
-
-		/* Here vma->vm_start <= start < (end|vma->vm_end) */
-		tmp = vma->vm_end;
-		if (end < tmp)
-			tmp = end;
-
-		/* Here vma->vm_start <= start < tmp <= (end|vma->vm_end). */
-		error = madvise_vma(vma, &prev, start, tmp, behavior);
-		if (error)
-			goto out;
-		start = tmp;
-		if (prev && start < prev->vm_end)
-			start = prev->vm_end;
-		error = unmapped_error;
-		if (start >= end)
-			goto out;
-		if (prev)
-			vma = prev->vm_next;
-		else	/* madvise_remove dropped mmap_lock */
-			vma = find_vma(current->mm, start);
-	}
-out:
+	error = madvise_walk_vmas(start, end, behavior, madvise_vma_behavior);
 	blk_finish_plug(&plug);
 	if (write)
 		mmap_write_unlock(current->mm);
-- 
2.28.0




* [PATCH v5 2/2] mm: add a field to store names for private anonymous memory
  2020-08-19 14:16 [PATCH v5 0/2] Anonymous VMA naming patches Sumit Semwal
  2020-08-19 14:16 ` [PATCH v5 1/2] mm: rearrange madvise code to allow for reuse Sumit Semwal
@ 2020-08-19 14:16 ` Sumit Semwal
  2020-08-19 14:37   ` Michal Hocko
                     ` (6 more replies)
  1 sibling, 7 replies; 26+ messages in thread
From: Sumit Semwal @ 2020-08-19 14:16 UTC (permalink / raw)
  To: Andrew Morton, linux-mm, linux-kernel, Alexey Dobriyan, Jonathan Corbet
  Cc: Mauro Carvalho Chehab, Kees Cook, Michal Hocko, Colin Cross,
	Alexey Gladkov, Matthew Wilcox, Jason Gunthorpe,
	Kirill A . Shutemov, Michel Lespinasse, Michal Koutný,
	Song Liu, Huang Ying, Vlastimil Babka, Yang Shi, chenqiwu,
	Mathieu Desnoyers, John Hubbard, Thomas Hellstrom, Mike Christie,
	Bart Van Assche, Amit Pundir, Thomas Gleixner, Christian Brauner,
	Daniel Jordan, Adrian Reber, Nicolas Viennot, Al Viro,
	Thomas Cedeno, linux-fsdevel, Pekka Enberg, Dave Hansen,
	Peter Zijlstra, Ingo Molnar, Oleg Nesterov, Eric W. Biederman,
	Jan Glauber, John Stultz, Rob Landley, Cyrill Gorcunov,
	Serge E. Hallyn, David Rientjes, Hugh Dickins, Rik van Riel,
	Mel Gorman, Tang Chen, Robin Holt, Shaohua Li, Sasha Levin,
	Johannes Weiner, Minchan Kim, Sumit Semwal

From: Colin Cross <ccross@google.com>

Many userspace applications, and especially VM-based applications of the
kind Android uses heavily, have multiple different allocators in use.  At
a minimum there are libc malloc and the stack; in many cases there are
also direct mmap syscalls for anonymous memory and multiple VM heaps (one
for small objects, one for big objects, etc.).  Each of these layers
usually has its own tools to inspect its usage: malloc by compiling a
debug version, the VM through heap inspection tools.  For direct syscalls
there is usually no way to track them.

On Android we heavily use a set of tools that use an extended version of
the logic covered in Documentation/vm/pagemap.txt to walk all pages mapped
in userspace and slice their usage by process, shared (COW) vs.  unique
mappings, backing, etc.  This can account for real physical memory usage
even in cases like fork without exec (which Android uses heavily to share
as many private COW pages as possible between processes), Kernel SamePage
Merging, and clean zero pages.  It produces a measurement of the pages
that only exist in that process (USS, for unique), and a measurement of
the physical memory usage of that process with the cost of shared pages
being evenly split between processes that share them (PSS).
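
To make the two measurements concrete, here is a small worked sketch.  It
assumes the tool has already obtained, for each page mapped by the
process, the number of processes mapping it; the names struct mem_usage
and account_pages are hypothetical.  The integer division truncates, so a
real tool would probably want to carry extra precision.

```c
#include <assert.h>
#include <stddef.h>

struct mem_usage {
	unsigned long uss;	/* bytes in pages mapped only by this process */
	unsigned long pss;	/* bytes, with shared pages split evenly */
};

/*
 * mapcounts[i] is the number of processes mapping page i of this process.
 * A page with mapcount 1 counts fully towards both USS and PSS; a page
 * shared by N processes contributes page_size / N to PSS only.
 */
static struct mem_usage account_pages(const unsigned int *mapcounts,
				      size_t npages, unsigned long page_size)
{
	struct mem_usage u = { 0, 0 };
	size_t i;

	assert(page_size > 0);

	for (i = 0; i < npages; i++) {
		if (mapcounts[i] == 0)
			continue;	/* not present, nothing to account */
		if (mapcounts[i] == 1)
			u.uss += page_size;
		u.pss += page_size / mapcounts[i];
	}
	return u;
}
```

For four 4096-byte pages with map counts 1, 1, 2 and 4, this gives a USS
of 8192 bytes and a PSS of 4096 + 4096 + 2048 + 1024 = 11264 bytes.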

If all anonymous memory is indistinguishable then figuring out the real
physical memory usage (PSS) of each heap requires either a pagemap walking
tool that can understand the heap debugging of every layer, or for every
layer's heap debugging tools to implement the pagemap walking logic, in
which case it is hard to get a consistent view of memory across the whole
system.

Tracking the information in userspace leads to all sorts of problems.
It either needs to be stored inside the process, which means every
process has to have an API to export its current heap information upon
request, or it has to be stored externally in a filesystem that
somebody needs to clean up on crashes.  It needs to be readable while
the process is still running, so it has to have some sort of
synchronization with every layer of userspace.  Efficiently tracking
the ranges requires reimplementing something like the kernel vma
trees, and linking to it from every layer of userspace.  It requires
more memory, more syscalls, more runtime cost, and more complexity to
separately track regions that the kernel is already tracking.

This patch adds a field to /proc/pid/maps and /proc/pid/smaps to show a
userspace-provided name for anonymous vmas.  The names of named anonymous
vmas are shown in /proc/pid/maps and /proc/pid/smaps as [anon:<name>].
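
A tool reading /proc/pid/maps could pick the name out of a line roughly as
follows.  This is only a sketch: parse_anon_name is a hypothetical helper,
and it assumes the name is the last field on the line in the
[anon:<name>] form given above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy <name> from an "[anon:<name>]" field in a maps line into out.
 * Returns 0 on success, or -1 if the line has no such field or the name
 * does not fit in outlen bytes including the terminating NUL.
 */
static int parse_anon_name(const char *line, char *out, size_t outlen)
{
	static const char prefix[] = "[anon:";
	const char *p;
	const char *end;
	size_t len;

	assert(line != NULL && out != NULL);

	p = strstr(line, prefix);
	if (!p)
		return -1;
	p += sizeof(prefix) - 1;

	/* Use the last ']' so that a name containing ']' is kept whole. */
	end = strrchr(p, ']');
	if (!end)
		return -1;

	len = (size_t)(end - p);
	if (len >= outlen)
		return -1;

	memcpy(out, p, len);
	out[len] = '\0';
	return 0;
}
```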

Userspace can set the name for a region of memory by calling
prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, start, len, (unsigned long)name);
Setting the name to NULL clears it.
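
A minimal userspace sketch of the call, assuming the PR_SET_VMA values
added to prctl.h by this patch.  The helper alloc_named and the string
heap_name are hypothetical.  Because this version stores the user pointer
rather than a copy, the sketch keeps the name in static storage so that it
outlives the mapping.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/prctl.h>

/* Values from this patch, for headers that do not define them yet. */
#ifndef PR_SET_VMA
#define PR_SET_VMA		0x53564d41
#endif
#ifndef PR_SET_VMA_ANON_NAME
#define PR_SET_VMA_ANON_NAME	0
#endif

/* Static storage: the kernel would read this string later, on demand. */
static const char heap_name[] = "example-heap";

/*
 * Map len bytes of private anonymous memory and try to name the region.
 * Naming is best effort: a kernel without PR_SET_VMA rejects the prctl
 * and the mapping is simply left unnamed.
 */
static void *alloc_named(size_t len, const char *name)
{
	void *p;

	assert(len > 0);

	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return NULL;

	(void)prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME,
		    (unsigned long)p, len, (unsigned long)name);
	return p;
}
```

A caller would use alloc_named(4096, heap_name) and later release the
region with munmap() as usual.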

The name is stored in a user pointer in the shared union in vm_area_struct
that points to a null terminated string inside the user process.  vmas
that point to the same address and are otherwise mergeable will be merged,
but vmas that point to equivalent strings at different addresses will not
be merged.
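
The consequence of comparing pointers rather than string contents can be
modelled as below.  struct named_range and names_allow_merge are
hypothetical and ignore every other merge condition; the point is only
that two equal strings at different addresses do not compare as the same
name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an anonymous range carrying a name pointer. */
struct named_range {
	unsigned long start;
	unsigned long end;
	const char *anon_name;	/* compared by address, never dereferenced */
};

/* Adjacent ranges are merge candidates only if the pointers are equal. */
static bool names_allow_merge(const struct named_range *a,
			      const struct named_range *b)
{
	assert(a != NULL && b != NULL);
	return a->end == b->start && a->anon_name == b->anon_name;
}
```

Under this model, an allocator that wants its regions merged should pass
the same string address for every region it names.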

The idea to store a userspace pointer to reduce the complexity within mm
(at the expense of the complexity of reading /proc/pid/mem) came from Dave
Hansen.  This results in no runtime overhead in the mm subsystem other
than comparing the anon_name pointers when considering vma merging.  The
pointer is stored in a union with fields that are only used on file-backed
mappings, so it does not increase memory usage.
(Upstream changed to remove the union, so this patch adds it back as well)

Signed-off-by: Colin Cross <ccross@google.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Jan Glauber <jan.glauber@gmail.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Rob Landley <rob@landley.net>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: "Serge E. Hallyn" <serge.hallyn@ubuntu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michel Lespinasse <walken@google.com>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Robin Holt <holt@sgi.com>
Cc: Shaohua Li <shli@fusionio.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
---
 Documentation/filesystems/proc.rst |  2 ++
 fs/proc/task_mmu.c                 | 24 ++++++++++++-
 include/linux/mm.h                 |  5 ++-
 include/linux/mm_types.h           | 23 +++++++++++--
 include/uapi/linux/prctl.h         |  3 ++
 kernel/sys.c                       | 32 ++++++++++++++++++
 mm/interval_tree.c                 | 34 +++++++++----------
 mm/madvise.c                       | 54 +++++++++++++++++++++++++++---
 mm/mempolicy.c                     |  3 +-
 mm/mlock.c                         |  2 +-
 mm/mmap.c                          | 38 ++++++++++++---------
 mm/mprotect.c                      |  2 +-
 12 files changed, 177 insertions(+), 45 deletions(-)

diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
index 533c79e8d2cd..41a9cea73b8b 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -429,6 +429,8 @@ is not associated with a file:
  [stack]                    the stack of the main process
  [vdso]                     the "virtual dynamic shared object",
                             the kernel system call handler
+[anon:<name>]               an anonymous mapping that has been
+                            named by userspace
  =======                    ====================================
 
  or if empty, the mapping is anonymous.
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 5066b0251ed8..136fd3c3ad7b 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -97,6 +97,21 @@ unsigned long task_statm(struct mm_struct *mm,
 	return mm->total_vm;
 }
 
+static void seq_print_vma_name(struct seq_file *m, struct vm_area_struct *vma)
+{
+	struct mm_struct *mm = vma->vm_mm;
+	char anon_name[NAME_MAX + 1];
+	int n;
+
+	n = access_remote_vm(mm, (unsigned long)vma_anon_name(vma),
+			     anon_name, NAME_MAX, 0);
+	if (n > 0) {
+		seq_puts(m, "[anon:");
+		seq_write(m, anon_name, strnlen(anon_name, n));
+		seq_putc(m, ']');
+	}
+}
+
 #ifdef CONFIG_NUMA
 /*
  * Save get_task_policy() for show_numa_map().
@@ -319,8 +334,15 @@ show_map_vma(struct seq_file *m, struct vm_area_struct *vma)
 			goto done;
 		}
 
-		if (is_stack(vma))
+		if (is_stack(vma)) {
 			name = "[stack]";
+			goto done;
+		}
+
+		if (vma_anon_name(vma)) {
+			seq_pad(m, ' ');
+			seq_print_vma_name(m, vma);
+		}
 	}
 
 done:
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 1983e08f5906..c64171529254 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2484,7 +2484,7 @@ static inline int vma_adjust(struct vm_area_struct *vma, unsigned long start,
 extern struct vm_area_struct *vma_merge(struct mm_struct *,
 	struct vm_area_struct *prev, unsigned long addr, unsigned long end,
 	unsigned long vm_flags, struct anon_vma *, struct file *, pgoff_t,
-	struct mempolicy *, struct vm_userfaultfd_ctx);
+	struct mempolicy *, struct vm_userfaultfd_ctx, const char __user *);
 extern struct anon_vma *find_mergeable_anon_vma(struct vm_area_struct *);
 extern int __split_vma(struct mm_struct *, struct vm_area_struct *,
 	unsigned long addr, int new_below);
@@ -3123,5 +3123,8 @@ unsigned long wp_shared_mapping_range(struct address_space *mapping,
 
 extern int sysctl_nr_trim_pages;
 
+int madvise_set_anon_name(unsigned long start, unsigned long len_in,
+			  unsigned long name_addr);
+
 #endif /* __KERNEL__ */
 #endif /* _LINUX_MM_H */
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 496c3ff97cce..ac8d687ebfb5 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -336,10 +336,18 @@ struct vm_area_struct {
 	/*
 	 * For areas with an address space and backing store,
 	 * linkage into the address_space->i_mmap interval tree.
+	 *
+	 * For private anonymous mappings, a pointer to a null terminated string
+	 * in the user process containing the name given to the vma, or NULL
+	 * if unnamed.
 	 */
-	struct {
-		struct rb_node rb;
-		unsigned long rb_subtree_last;
+
+	union {
+		struct {
+			struct rb_node rb;
+			unsigned long rb_subtree_last;
+		} interval;
+		const char __user *anon_name;
 	} shared;
 
 	/*
@@ -772,4 +780,13 @@ typedef struct {
 	unsigned long val;
 } swp_entry_t;
 
+/* Return the name for an anonymous mapping or NULL for a file-backed mapping */
+static inline const char __user *vma_anon_name(struct vm_area_struct *vma)
+{
+	if (vma->vm_file)
+		return NULL;
+
+	return vma->shared.anon_name;
+}
+
 #endif /* _LINUX_MM_TYPES_H */
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index 07b4f8131e36..10773270f67b 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -238,4 +238,7 @@ struct prctl_mm_map {
 #define PR_SET_IO_FLUSHER		57
 #define PR_GET_IO_FLUSHER		58
 
+#define PR_SET_VMA		0x53564d41
+# define PR_SET_VMA_ANON_NAME		0
+
 #endif /* _LINUX_PRCTL_H */
diff --git a/kernel/sys.c b/kernel/sys.c
index ca11af9d815d..da90837b5ccd 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2280,6 +2280,35 @@ int __weak arch_prctl_spec_ctrl_set(struct task_struct *t, unsigned long which,
 
 #define PR_IO_FLUSHER (PF_MEMALLOC_NOIO | PF_LOCAL_THROTTLE)
 
+#ifdef CONFIG_MMU
+static int prctl_set_vma(unsigned long opt, unsigned long addr,
+			 unsigned long len, unsigned long arg)
+{
+	struct mm_struct *mm = current->mm;
+	int error;
+
+	mmap_write_lock(mm);
+
+	switch (opt) {
+	case PR_SET_VMA_ANON_NAME:
+		error = madvise_set_anon_name(addr, len, arg);
+		break;
+	default:
+		error = -EINVAL;
+	}
+
+	mmap_write_unlock(mm);
+
+	return error;
+}
+#else /* CONFIG_MMU */
+static int prctl_set_vma(unsigned long opt, unsigned long start,
+			 unsigned long len_in, unsigned long arg)
+{
+	return -EINVAL;
+}
+#endif
+
 SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 		unsigned long, arg4, unsigned long, arg5)
 {
@@ -2530,6 +2559,9 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 
 		error = (current->flags & PR_IO_FLUSHER) == PR_IO_FLUSHER;
 		break;
+	case PR_SET_VMA:
+		error = prctl_set_vma(arg2, arg3, arg4, arg5);
+		break;
 	default:
 		error = -EINVAL;
 		break;
diff --git a/mm/interval_tree.c b/mm/interval_tree.c
index 11c75fb07584..d684ce0762cd 100644
--- a/mm/interval_tree.c
+++ b/mm/interval_tree.c
@@ -20,8 +20,8 @@ static inline unsigned long vma_last_pgoff(struct vm_area_struct *v)
 	return v->vm_pgoff + vma_pages(v) - 1;
 }
 
-INTERVAL_TREE_DEFINE(struct vm_area_struct, shared.rb,
-		     unsigned long, shared.rb_subtree_last,
+INTERVAL_TREE_DEFINE(struct vm_area_struct, shared.interval.rb,
+		     unsigned long, shared.interval.rb_subtree_last,
 		     vma_start_pgoff, vma_last_pgoff,, vma_interval_tree)
 
 /* Insert node immediately after prev in the interval tree */
@@ -35,26 +35,26 @@ void vma_interval_tree_insert_after(struct vm_area_struct *node,
 
 	VM_BUG_ON_VMA(vma_start_pgoff(node) != vma_start_pgoff(prev), node);
 
-	if (!prev->shared.rb.rb_right) {
+	if (!prev->shared.interval.rb.rb_right) {
 		parent = prev;
-		link = &prev->shared.rb.rb_right;
+		link = &prev->shared.interval.rb.rb_right;
 	} else {
-		parent = rb_entry(prev->shared.rb.rb_right,
-				  struct vm_area_struct, shared.rb);
-		if (parent->shared.rb_subtree_last < last)
-			parent->shared.rb_subtree_last = last;
-		while (parent->shared.rb.rb_left) {
-			parent = rb_entry(parent->shared.rb.rb_left,
-				struct vm_area_struct, shared.rb);
-			if (parent->shared.rb_subtree_last < last)
-				parent->shared.rb_subtree_last = last;
+		parent = rb_entry(prev->shared.interval.rb.rb_right,
+				  struct vm_area_struct, shared.interval.rb);
+		if (parent->shared.interval.rb_subtree_last < last)
+			parent->shared.interval.rb_subtree_last = last;
+		while (parent->shared.interval.rb.rb_left) {
+			parent = rb_entry(parent->shared.interval.rb.rb_left,
+					  struct vm_area_struct, shared.interval.rb);
+			if (parent->shared.interval.rb_subtree_last < last)
+				parent->shared.interval.rb_subtree_last = last;
 		}
-		link = &parent->shared.rb.rb_left;
+		link = &parent->shared.interval.rb.rb_left;
 	}
 
-	node->shared.rb_subtree_last = last;
-	rb_link_node(&node->shared.rb, &parent->shared.rb, link);
-	rb_insert_augmented(&node->shared.rb, &root->rb_root,
+	node->shared.interval.rb_subtree_last = last;
+	rb_link_node(&node->shared.interval.rb, &parent->shared.interval.rb, link);
+	rb_insert_augmented(&node->shared.interval.rb, &root->rb_root,
 			    &vma_interval_tree_augment);
 }
 
diff --git a/mm/madvise.c b/mm/madvise.c
index 84482c21b029..7da8493fa6d3 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -65,13 +65,14 @@ static int madvise_need_mmap_write(int behavior)
  */
 static int madvise_update_vma(struct vm_area_struct *vma,
 			      struct vm_area_struct **prev, unsigned long start,
-			      unsigned long end, unsigned long new_flags)
+			      unsigned long end, unsigned long new_flags,
+			      const char __user *new_anon_name)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	int error;
 	pgoff_t pgoff;
 
-	if (new_flags == vma->vm_flags) {
+	if (new_flags == vma->vm_flags && new_anon_name == vma_anon_name(vma)) {
 		*prev = vma;
 		return 0;
 	}
@@ -79,7 +80,7 @@ static int madvise_update_vma(struct vm_area_struct *vma,
 	pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
 	*prev = vma_merge(mm, *prev, start, end, new_flags, vma->anon_vma,
 			  vma->vm_file, pgoff, vma_policy(vma),
-			  vma->vm_userfaultfd_ctx);
+			  vma->vm_userfaultfd_ctx, new_anon_name);
 	if (*prev) {
 		vma = *prev;
 		goto success;
@@ -112,10 +113,30 @@ static int madvise_update_vma(struct vm_area_struct *vma,
 	 * vm_flags is protected by the mmap_lock held in write mode.
 	 */
 	vma->vm_flags = new_flags;
+	if (!vma->vm_file)
+		vma->shared.anon_name = new_anon_name;
 
 	return 0;
 }
 
+static int madvise_vma_anon_name(struct vm_area_struct *vma,
+				 struct vm_area_struct **prev,
+				 unsigned long start, unsigned long end,
+				 unsigned long name_addr)
+{
+	int error;
+
+	/* Only anonymous mappings can be named */
+	if (vma->vm_file)
+		return -EINVAL;
+
+	error = madvise_update_vma(vma, prev, start, end, vma->vm_flags,
+				   (const char __user *)name_addr);
+	if (error == -ENOMEM)
+		error = -EAGAIN;
+	return error;
+}
+
 #ifdef CONFIG_SWAP
 static int swapin_walk_pmd_entry(pmd_t *pmd, unsigned long start,
 	unsigned long end, struct mm_walk *walk)
@@ -877,7 +898,8 @@ static int madvise_vma_behavior(struct vm_area_struct *vma,
 		break;
 	}
 
-	error = madvise_update_vma(vma, prev, start, end, new_flags);
+	error = madvise_update_vma(vma, prev, start, end, new_flags,
+				   vma_anon_name(vma));
 
 out:
 	if (error == -ENOMEM)
@@ -1059,6 +1081,30 @@ int madvise_walk_vmas(unsigned long start, unsigned long end,
 	return unmapped_error;
 }
 
+int madvise_set_anon_name(unsigned long start, unsigned long len_in,
+			  unsigned long name_addr)
+{
+	unsigned long end;
+	unsigned long len;
+
+	if (start & ~PAGE_MASK)
+		return -EINVAL;
+	len = (len_in + ~PAGE_MASK) & PAGE_MASK;
+
+	/* Check to see whether len was rounded up from small -ve to zero */
+	if (len_in && !len)
+		return -EINVAL;
+
+	end = start + len;
+	if (end < start)
+		return -EINVAL;
+
+	if (end == start)
+		return 0;
+
+	return madvise_walk_vmas(start, end, name_addr, madvise_vma_anon_name);
+}
+
 /*
  * The madvise(2) system call.
  *
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index eddbe4e56c73..94338d9bfe57 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -829,7 +829,8 @@ static int mbind_range(struct mm_struct *mm, unsigned long start,
 			((vmstart - vma->vm_start) >> PAGE_SHIFT);
 		prev = vma_merge(mm, prev, vmstart, vmend, vma->vm_flags,
 				 vma->anon_vma, vma->vm_file, pgoff,
-				 new_pol, vma->vm_userfaultfd_ctx);
+				 new_pol, vma->vm_userfaultfd_ctx,
+				 vma_anon_name(vma));
 		if (prev) {
 			vma = prev;
 			next = vma->vm_next;
diff --git a/mm/mlock.c b/mm/mlock.c
index 93ca2bf30b4f..8e0046c4642f 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -534,7 +534,7 @@ static int mlock_fixup(struct vm_area_struct *vma, struct vm_area_struct **prev,
 	pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
 	*prev = vma_merge(mm, *prev, start, end, newflags, vma->anon_vma,
 			  vma->vm_file, pgoff, vma_policy(vma),
-			  vma->vm_userfaultfd_ctx);
+			  vma->vm_userfaultfd_ctx, vma_anon_name(vma));
 	if (*prev) {
 		vma = *prev;
 		goto success;
diff --git a/mm/mmap.c b/mm/mmap.c
index 40248d84ad5f..8f3cd352a48f 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -987,7 +987,8 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
  */
 static inline int is_mergeable_vma(struct vm_area_struct *vma,
 				struct file *file, unsigned long vm_flags,
-				struct vm_userfaultfd_ctx vm_userfaultfd_ctx)
+				struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
+				const char __user *anon_name)
 {
 	/*
 	 * VM_SOFTDIRTY should not prevent from VMA merging, if we
@@ -1005,6 +1006,8 @@ static inline int is_mergeable_vma(struct vm_area_struct *vma,
 		return 0;
 	if (!is_mergeable_vm_userfaultfd_ctx(vma, vm_userfaultfd_ctx))
 		return 0;
+	if (vma_anon_name(vma) != anon_name)
+		return 0;
 	return 1;
 }
 
@@ -1037,9 +1040,10 @@ static int
 can_vma_merge_before(struct vm_area_struct *vma, unsigned long vm_flags,
 		     struct anon_vma *anon_vma, struct file *file,
 		     pgoff_t vm_pgoff,
-		     struct vm_userfaultfd_ctx vm_userfaultfd_ctx)
+		     struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
+		     const char __user *anon_name)
 {
-	if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx) &&
+	if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx, anon_name) &&
 	    is_mergeable_anon_vma(anon_vma, vma->anon_vma, vma)) {
 		if (vma->vm_pgoff == vm_pgoff)
 			return 1;
@@ -1058,9 +1062,10 @@ static int
 can_vma_merge_after(struct vm_area_struct *vma, unsigned long vm_flags,
 		    struct anon_vma *anon_vma, struct file *file,
 		    pgoff_t vm_pgoff,
-		    struct vm_userfaultfd_ctx vm_userfaultfd_ctx)
+		    struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
+		     const char __user *anon_name)
 {
-	if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx) &&
+	if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx, anon_name) &&
 	    is_mergeable_anon_vma(anon_vma, vma->anon_vma, vma)) {
 		pgoff_t vm_pglen;
 		vm_pglen = vma_pages(vma);
@@ -1071,9 +1076,9 @@ can_vma_merge_after(struct vm_area_struct *vma, unsigned long vm_flags,
 }
 
 /*
- * Given a mapping request (addr,end,vm_flags,file,pgoff), figure out
- * whether that can be merged with its predecessor or its successor.
- * Or both (it neatly fills a hole).
+ * Given a mapping request (addr,end,vm_flags,file,pgoff,anon_name),
+ * figure out whether that can be merged with its predecessor or its
+ * successor.  Or both (it neatly fills a hole).
  *
  * In most cases - when called for mmap, brk or mremap - [addr,end) is
  * certain not to be mapped by the time vma_merge is called; but when
@@ -1118,7 +1123,8 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
 			unsigned long end, unsigned long vm_flags,
 			struct anon_vma *anon_vma, struct file *file,
 			pgoff_t pgoff, struct mempolicy *policy,
-			struct vm_userfaultfd_ctx vm_userfaultfd_ctx)
+			struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
+			const char __user *anon_name)
 {
 	pgoff_t pglen = (end - addr) >> PAGE_SHIFT;
 	struct vm_area_struct *area, *next;
@@ -1151,7 +1157,7 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
 			mpol_equal(vma_policy(prev), policy) &&
 			can_vma_merge_after(prev, vm_flags,
 					    anon_vma, file, pgoff,
-					    vm_userfaultfd_ctx)) {
+					    vm_userfaultfd_ctx, anon_name)) {
 		/*
 		 * OK, it can.  Can we now merge in the successor as well?
 		 */
@@ -1160,7 +1166,7 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
 				can_vma_merge_before(next, vm_flags,
 						     anon_vma, file,
 						     pgoff+pglen,
-						     vm_userfaultfd_ctx) &&
+						     vm_userfaultfd_ctx, anon_name) &&
 				is_mergeable_anon_vma(prev->anon_vma,
 						      next->anon_vma, NULL)) {
 							/* cases 1, 6 */
@@ -1183,7 +1189,7 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
 			mpol_equal(policy, vma_policy(next)) &&
 			can_vma_merge_before(next, vm_flags,
 					     anon_vma, file, pgoff+pglen,
-					     vm_userfaultfd_ctx)) {
+					     vm_userfaultfd_ctx, anon_name)) {
 		if (prev && addr < prev->vm_end)	/* case 4 */
 			err = __vma_adjust(prev, prev->vm_start,
 					 addr, prev->vm_pgoff, NULL, next);
@@ -1731,7 +1737,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
 	 * Can we just expand an old mapping?
 	 */
 	vma = vma_merge(mm, prev, addr, addr + len, vm_flags,
-			NULL, file, pgoff, NULL, NULL_VM_UFFD_CTX);
+			NULL, file, pgoff, NULL, NULL_VM_UFFD_CTX, NULL);
 	if (vma)
 		goto out;
 
@@ -1779,7 +1785,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
 		 */
 		if (unlikely(vm_flags != vma->vm_flags && prev)) {
 			merge = vma_merge(mm, prev, vma->vm_start, vma->vm_end, vma->vm_flags,
-				NULL, vma->vm_file, vma->vm_pgoff, NULL, NULL_VM_UFFD_CTX);
+				NULL, vma->vm_file, vma->vm_pgoff, NULL, NULL_VM_UFFD_CTX, NULL);
 			if (merge) {
 				fput(file);
 				vm_area_free(vma);
@@ -3063,7 +3069,7 @@ static int do_brk_flags(unsigned long addr, unsigned long len, unsigned long fla
 
 	/* Can we just expand an old private anonymous mapping? */
 	vma = vma_merge(mm, prev, addr, addr + len, flags,
-			NULL, NULL, pgoff, NULL, NULL_VM_UFFD_CTX);
+			NULL, NULL, pgoff, NULL, NULL_VM_UFFD_CTX, NULL);
 	if (vma)
 		goto out;
 
@@ -3262,7 +3268,7 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap,
 		return NULL;	/* should never get here */
 	new_vma = vma_merge(mm, prev, addr, addr + len, vma->vm_flags,
 			    vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma),
-			    vma->vm_userfaultfd_ctx);
+			    vma->vm_userfaultfd_ctx, vma_anon_name(vma));
 	if (new_vma) {
 		/*
 		 * Source vma may have been merged into new_vma
diff --git a/mm/mprotect.c b/mm/mprotect.c
index ce8b8a5eacbb..d90c349a3fd9 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -454,7 +454,7 @@ mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev,
 	pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
 	*pprev = vma_merge(mm, *pprev, start, end, newflags,
 			   vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma),
-			   vma->vm_userfaultfd_ctx);
+			   vma->vm_userfaultfd_ctx, vma_anon_name(vma));
 	if (*pprev) {
 		vma = *pprev;
 		VM_WARN_ON((vma->vm_flags ^ newflags) & ~VM_SOFTDIRTY);
-- 
2.28.0
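
The range checks that madvise_set_anon_name() performs in the mm/madvise.c
hunk above can be tried out in userspace. The following is only a sketch
under stated assumptions: check_name_range() and its out-parameter are
hypothetical names, and the page size comes from sysconf() rather than the
kernel's PAGE_MASK.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Userspace sketch of the range checks in madvise_set_anon_name().
 * check_name_range() is a hypothetical helper, not a kernel function.
 * Returns 0 and stores the exclusive end address, or -EINVAL.
 */
int check_name_range(unsigned long start, unsigned long len_in,
		     unsigned long *end_out)
{
	unsigned long page_size = (unsigned long)sysconf(_SC_PAGESIZE);
	unsigned long page_mask = ~(page_size - 1);
	unsigned long len, end;

	assert(end_out != NULL);

	/* The start address must be page aligned. */
	if (start & ~page_mask)
		return -EINVAL;

	/* Round the length up to a whole number of pages. */
	len = (len_in + ~page_mask) & page_mask;

	/* A huge len_in wraps to zero when rounded up: reject it. */
	if (len_in && !len)
		return -EINVAL;

	/* Reject ranges that wrap past the top of the address space. */
	end = start + len;
	if (end < start)
		return -EINVAL;

	*end_out = end;
	return 0;
}
```

A zero-length request passes these checks with end == start; the kernel
function returns 0 in that case without walking any vmas.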




* Re: [PATCH v5 2/2] mm: add a field to store names for private anonymous memory
  2020-08-19 14:16 ` [PATCH v5 2/2] mm: add a field to store names for private anonymous memory Sumit Semwal
@ 2020-08-19 14:37   ` Michal Hocko
  2020-08-19 15:02   ` Matthew Wilcox
                     ` (5 subsequent siblings)
  6 siblings, 0 replies; 26+ messages in thread
From: Michal Hocko @ 2020-08-19 14:37 UTC (permalink / raw)
  To: Sumit Semwal
  Cc: Andrew Morton, linux-mm, linux-kernel, Alexey Dobriyan,
	Jonathan Corbet, Mauro Carvalho Chehab, Kees Cook, Colin Cross,
	Alexey Gladkov, Matthew Wilcox, Jason Gunthorpe,
	Kirill A . Shutemov, Michel Lespinasse, Michal Koutný,
	Song Liu, Huang Ying, Vlastimil Babka, Yang Shi, chenqiwu,
	Mathieu Desnoyers, John Hubbard, Thomas Hellstrom, Mike Christie,
	Bart Van Assche, Amit Pundir, Thomas Gleixner, Christian Brauner,
	Daniel Jordan, Adrian Reber, Nicolas Viennot, Al Viro,
	Thomas Cedeno, linux-fsdevel, Pekka Enberg, Dave Hansen,
	Peter Zijlstra, Ingo Molnar, Oleg Nesterov, Eric W. Biederman,
	Jan Glauber, John Stultz, Rob Landley, Cyrill Gorcunov,
	Serge E. Hallyn, David Rientjes, Hugh Dickins, Rik van Riel,
	Mel Gorman, Tang Chen, Robin Holt, Shaohua Li, Sasha Levin,
	Johannes Weiner, Minchan Kim, linux-api

[Cc linux-api]

On Wed 19-08-20 19:46:50, Sumit Semwal wrote:
> From: Colin Cross <ccross@google.com>
> 
> In many userspace applications, and especially in the VM-based applications
> that Android uses heavily, there are multiple different allocators in use.
> At a minimum there is libc malloc and the stack, and in many cases there
> are libc malloc, the stack, direct syscalls to mmap anonymous memory, and
> multiple VM heaps (one for small objects, one for big objects, etc.).
> Each of these layers usually has its own tools to inspect its usage:
> malloc by compiling a debug version, the VM through heap inspection tools,
> while for direct syscalls there is usually no way to track them.
> 
> On Android we heavily use a set of tools that use an extended version of
> the logic covered in Documentation/vm/pagemap.txt to walk all pages mapped
> in userspace and slice their usage by process, shared (COW) vs.  unique
> mappings, backing, etc.  This can account for real physical memory usage
> even in cases like fork without exec (which Android uses heavily to share
> as many private COW pages as possible between processes), Kernel SamePage
> Merging, and clean zero pages.  It produces a measurement of the pages
> that only exist in that process (USS, for unique), and a measurement of
> the physical memory usage of that process with the cost of shared pages
> being evenly split between processes that share them (PSS).
> 
> If all anonymous memory is indistinguishable then figuring out the real
> physical memory usage (PSS) of each heap requires either a pagemap walking
> tool that can understand the heap debugging of every layer, or for every
> layer's heap debugging tools to implement the pagemap walking logic, in
> which case it is hard to get a consistent view of memory across the whole
> system.
> 
> Tracking the information in userspace leads to all sorts of problems.
> It either needs to be stored inside the process, which means every
> process has to have an API to export its current heap information upon
> request, or it has to be stored externally in a filesystem that
> somebody needs to clean up on crashes.  It needs to be readable while
> the process is still running, so it has to have some sort of
> synchronization with every layer of userspace.  Efficiently tracking
> the ranges requires reimplementing something like the kernel vma
> trees, and linking to it from every layer of userspace.  It requires
> more memory, more syscalls, more runtime cost, and more complexity to
> separately track regions that the kernel is already tracking.
> 
> This patch adds a field to /proc/pid/maps and /proc/pid/smaps to show a
> userspace-provided name for anonymous vmas.  The names of named anonymous
> vmas are shown in /proc/pid/maps and /proc/pid/smaps as [anon:<name>].
> 
> Userspace can set the name for a region of memory by calling
> prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, start, len, (unsigned long)name);
> Setting the name to NULL clears it.
> 
> The name is stored in a user pointer in the shared union in vm_area_struct
> that points to a null terminated string inside the user process.  vmas
> that point to the same address and are otherwise mergeable will be merged,
> but vmas that point to equivalent strings at different addresses will not
> be merged.
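
The merge rule described above compares name pointers, not string contents.
A small sketch of that distinction, where names_allow_merge() and
merge_rule_demo() are hypothetical stand-ins for the anon_name comparison
that the patch adds to is_mergeable_vma():

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for the anon_name check in is_mergeable_vma():
 * two names are compatible for merging only if they are the same address.
 * Two NULL names (both unnamed) compare equal as well.
 */
bool names_allow_merge(const char *a, const char *b)
{
	return a == b;
}

/* Equal strings at different addresses do not merge; the same address does. */
void merge_rule_demo(void)
{
	static const char a[] = "heap";
	static const char b[] = "heap";

	assert(strcmp(a, b) == 0);		/* same contents ... */
	assert(!names_allow_merge(a, b));	/* ... but distinct pointers */
	assert(names_allow_merge(a, a));	/* identical pointer merges */
}
```

In practice this suggests that an allocator wanting its adjacent regions
merged should pass one shared string pointer for all of them rather than
building the name afresh for each call.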
> 
> The idea to store a userspace pointer to reduce the complexity within mm
> (at the expense of the complexity of reading /proc/pid/mem) came from Dave
> Hansen.  This results in no runtime overhead in the mm subsystem other
> than comparing the anon_name pointers when considering vma merging.  The
> pointer is stored in a union with fields that are only used on file-backed
> mappings, so it does not increase memory usage.
> (Upstream changed to remove the union, so this patch adds it back as well)
> 
> Signed-off-by: Colin Cross <ccross@google.com>
> Cc: Pekka Enberg <penberg@kernel.org>
> Cc: Dave Hansen <dave.hansen@intel.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Cc: "Eric W. Biederman" <ebiederm@xmission.com>
> Cc: Jan Glauber <jan.glauber@gmail.com>
> Cc: John Stultz <john.stultz@linaro.org>
> Cc: Rob Landley <rob@landley.net>
> Cc: Cyrill Gorcunov <gorcunov@openvz.org>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: "Serge E. Hallyn" <serge.hallyn@ubuntu.com>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Al Viro <viro@zeniv.linux.org.uk>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Michel Lespinasse <walken@google.com>
> Cc: Tang Chen <tangchen@cn.fujitsu.com>
> Cc: Robin Holt <holt@sgi.com>
> Cc: Shaohua Li <shli@fusionio.com>
> Cc: Sasha Levin <sasha.levin@oracle.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Minchan Kim <minchan@kernel.org>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
> ---
>  Documentation/filesystems/proc.rst |  2 ++
>  fs/proc/task_mmu.c                 | 24 ++++++++++++-
>  include/linux/mm.h                 |  5 ++-
>  include/linux/mm_types.h           | 23 +++++++++++--
>  include/uapi/linux/prctl.h         |  3 ++
>  kernel/sys.c                       | 32 ++++++++++++++++++
>  mm/interval_tree.c                 | 34 +++++++++----------
>  mm/madvise.c                       | 54 +++++++++++++++++++++++++++---
>  mm/mempolicy.c                     |  3 +-
>  mm/mlock.c                         |  2 +-
>  mm/mmap.c                          | 38 ++++++++++++---------
>  mm/mprotect.c                      |  2 +-
>  12 files changed, 177 insertions(+), 45 deletions(-)
> 
> diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
> index 533c79e8d2cd..41a9cea73b8b 100644
> --- a/Documentation/filesystems/proc.rst
> +++ b/Documentation/filesystems/proc.rst
> @@ -429,6 +429,8 @@ is not associated with a file:
>   [stack]                    the stack of the main process
>   [vdso]                     the "virtual dynamic shared object",
>                              the kernel system call handler
> +[anon:<name>]               an anonymous mapping that has been
> +                            named by userspace
>   =======                    ====================================
>  
>   or if empty, the mapping is anonymous.
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index 5066b0251ed8..136fd3c3ad7b 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -97,6 +97,21 @@ unsigned long task_statm(struct mm_struct *mm,
>  	return mm->total_vm;
>  }
>  
> +static void seq_print_vma_name(struct seq_file *m, struct vm_area_struct *vma)
> +{
> +	struct mm_struct *mm = vma->vm_mm;
> +	char anon_name[NAME_MAX + 1];
> +	int n;
> +
> +	n = access_remote_vm(mm, (unsigned long)vma_anon_name(vma),
> +			     anon_name, NAME_MAX, 0);
> +	if (n > 0) {
> +		seq_puts(m, "[anon:");
> +		seq_write(m, anon_name, strnlen(anon_name, n));
> +		seq_putc(m, ']');
> +	}
> +}
> +
>  #ifdef CONFIG_NUMA
>  /*
>   * Save get_task_policy() for show_numa_map().
> @@ -319,8 +334,15 @@ show_map_vma(struct seq_file *m, struct vm_area_struct *vma)
>  			goto done;
>  		}
>  
> -		if (is_stack(vma))
> +		if (is_stack(vma)) {
>  			name = "[stack]";
> +			goto done;
> +		}
> +
> +		if (vma_anon_name(vma)) {
> +			seq_pad(m, ' ');
> +			seq_print_vma_name(m, vma);
> +		}
>  	}
>  
>  done:
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 1983e08f5906..c64171529254 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2484,7 +2484,7 @@ static inline int vma_adjust(struct vm_area_struct *vma, unsigned long start,
>  extern struct vm_area_struct *vma_merge(struct mm_struct *,
>  	struct vm_area_struct *prev, unsigned long addr, unsigned long end,
>  	unsigned long vm_flags, struct anon_vma *, struct file *, pgoff_t,
> -	struct mempolicy *, struct vm_userfaultfd_ctx);
> +	struct mempolicy *, struct vm_userfaultfd_ctx, const char __user *);
>  extern struct anon_vma *find_mergeable_anon_vma(struct vm_area_struct *);
>  extern int __split_vma(struct mm_struct *, struct vm_area_struct *,
>  	unsigned long addr, int new_below);
> @@ -3123,5 +3123,8 @@ unsigned long wp_shared_mapping_range(struct address_space *mapping,
>  
>  extern int sysctl_nr_trim_pages;
>  
> +int madvise_set_anon_name(unsigned long start, unsigned long len_in,
> +			  unsigned long name_addr);
> +
>  #endif /* __KERNEL__ */
>  #endif /* _LINUX_MM_H */
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 496c3ff97cce..ac8d687ebfb5 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -336,10 +336,18 @@ struct vm_area_struct {
>  	/*
>  	 * For areas with an address space and backing store,
>  	 * linkage into the address_space->i_mmap interval tree.
> +	 *
> +	 * For private anonymous mappings, a pointer to a null terminated string
> +	 * in the user process containing the name given to the vma, or NULL
> +	 * if unnamed.
>  	 */
> -	struct {
> -		struct rb_node rb;
> -		unsigned long rb_subtree_last;
> +
> +	union {
> +		struct {
> +			struct rb_node rb;
> +			unsigned long rb_subtree_last;
> +		} interval;
> +		const char __user *anon_name;
>  	} shared;
>  
>  	/*
> @@ -772,4 +780,13 @@ typedef struct {
>  	unsigned long val;
>  } swp_entry_t;
>  
> +/* Return the name for an anonymous mapping or NULL for a file-backed mapping */
> +static inline const char __user *vma_anon_name(struct vm_area_struct *vma)
> +{
> +	if (vma->vm_file)
> +		return NULL;
> +
> +	return vma->shared.anon_name;
> +}
> +
>  #endif /* _LINUX_MM_TYPES_H */
> diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
> index 07b4f8131e36..10773270f67b 100644
> --- a/include/uapi/linux/prctl.h
> +++ b/include/uapi/linux/prctl.h
> @@ -238,4 +238,7 @@ struct prctl_mm_map {
>  #define PR_SET_IO_FLUSHER		57
>  #define PR_GET_IO_FLUSHER		58
>  
> +#define PR_SET_VMA		0x53564d41
> +# define PR_SET_VMA_ANON_NAME		0
> +
>  #endif /* _LINUX_PRCTL_H */
> diff --git a/kernel/sys.c b/kernel/sys.c
> index ca11af9d815d..da90837b5ccd 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -2280,6 +2280,35 @@ int __weak arch_prctl_spec_ctrl_set(struct task_struct *t, unsigned long which,
>  
>  #define PR_IO_FLUSHER (PF_MEMALLOC_NOIO | PF_LOCAL_THROTTLE)
>  
> +#ifdef CONFIG_MMU
> +static int prctl_set_vma(unsigned long opt, unsigned long addr,
> +			 unsigned long len, unsigned long arg)
> +{
> +	struct mm_struct *mm = current->mm;
> +	int error;
> +
> +	mmap_write_lock(mm);
> +
> +	switch (opt) {
> +	case PR_SET_VMA_ANON_NAME:
> +		error = madvise_set_anon_name(addr, len, arg);
> +		break;
> +	default:
> +		error = -EINVAL;
> +	}
> +
> +	mmap_write_unlock(mm);
> +
> +	return error;
> +}
> +#else /* CONFIG_MMU */
> +static int prctl_set_vma(unsigned long opt, unsigned long start,
> +			 unsigned long len_in, unsigned long arg)
> +{
> +	return -EINVAL;
> +}
> +#endif
> +
>  SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
>  		unsigned long, arg4, unsigned long, arg5)
>  {
> @@ -2530,6 +2559,9 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
>  
>  		error = (current->flags & PR_IO_FLUSHER) == PR_IO_FLUSHER;
>  		break;
> +	case PR_SET_VMA:
> +		error = prctl_set_vma(arg2, arg3, arg4, arg5);
> +		break;
>  	default:
>  		error = -EINVAL;
>  		break;
> diff --git a/mm/interval_tree.c b/mm/interval_tree.c
> index 11c75fb07584..d684ce0762cd 100644
> --- a/mm/interval_tree.c
> +++ b/mm/interval_tree.c
> @@ -20,8 +20,8 @@ static inline unsigned long vma_last_pgoff(struct vm_area_struct *v)
>  	return v->vm_pgoff + vma_pages(v) - 1;
>  }
>  
> -INTERVAL_TREE_DEFINE(struct vm_area_struct, shared.rb,
> -		     unsigned long, shared.rb_subtree_last,
> +INTERVAL_TREE_DEFINE(struct vm_area_struct, shared.interval.rb,
> +		     unsigned long, shared.interval.rb_subtree_last,
>  		     vma_start_pgoff, vma_last_pgoff,, vma_interval_tree)
>  
>  /* Insert node immediately after prev in the interval tree */
> @@ -35,26 +35,26 @@ void vma_interval_tree_insert_after(struct vm_area_struct *node,
>  
>  	VM_BUG_ON_VMA(vma_start_pgoff(node) != vma_start_pgoff(prev), node);
>  
> -	if (!prev->shared.rb.rb_right) {
> +	if (!prev->shared.interval.rb.rb_right) {
>  		parent = prev;
> -		link = &prev->shared.rb.rb_right;
> +		link = &prev->shared.interval.rb.rb_right;
>  	} else {
> -		parent = rb_entry(prev->shared.rb.rb_right,
> -				  struct vm_area_struct, shared.rb);
> -		if (parent->shared.rb_subtree_last < last)
> -			parent->shared.rb_subtree_last = last;
> -		while (parent->shared.rb.rb_left) {
> -			parent = rb_entry(parent->shared.rb.rb_left,
> -				struct vm_area_struct, shared.rb);
> -			if (parent->shared.rb_subtree_last < last)
> -				parent->shared.rb_subtree_last = last;
> +		parent = rb_entry(prev->shared.interval.rb.rb_right,
> +				  struct vm_area_struct, shared.interval.rb);
> +		if (parent->shared.interval.rb_subtree_last < last)
> +			parent->shared.interval.rb_subtree_last = last;
> +		while (parent->shared.interval.rb.rb_left) {
> +			parent = rb_entry(parent->shared.interval.rb.rb_left,
> +					  struct vm_area_struct, shared.interval.rb);
> +			if (parent->shared.interval.rb_subtree_last < last)
> +				parent->shared.interval.rb_subtree_last = last;
>  		}
> -		link = &parent->shared.rb.rb_left;
> +		link = &parent->shared.interval.rb.rb_left;
>  	}
>  
> -	node->shared.rb_subtree_last = last;
> -	rb_link_node(&node->shared.rb, &parent->shared.rb, link);
> -	rb_insert_augmented(&node->shared.rb, &root->rb_root,
> +	node->shared.interval.rb_subtree_last = last;
> +	rb_link_node(&node->shared.interval.rb, &parent->shared.interval.rb, link);
> +	rb_insert_augmented(&node->shared.interval.rb, &root->rb_root,
>  			    &vma_interval_tree_augment);
>  }
>  
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 84482c21b029..7da8493fa6d3 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -65,13 +65,14 @@ static int madvise_need_mmap_write(int behavior)
>   */
>  static int madvise_update_vma(struct vm_area_struct *vma,
>  			      struct vm_area_struct **prev, unsigned long start,
> -			      unsigned long end, unsigned long new_flags)
> +			      unsigned long end, unsigned long new_flags,
> +			      const char __user *new_anon_name)
>  {
>  	struct mm_struct *mm = vma->vm_mm;
>  	int error;
>  	pgoff_t pgoff;
>  
> -	if (new_flags == vma->vm_flags) {
> +	if (new_flags == vma->vm_flags && new_anon_name == vma_anon_name(vma)) {
>  		*prev = vma;
>  		return 0;
>  	}
> @@ -79,7 +80,7 @@ static int madvise_update_vma(struct vm_area_struct *vma,
>  	pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
>  	*prev = vma_merge(mm, *prev, start, end, new_flags, vma->anon_vma,
>  			  vma->vm_file, pgoff, vma_policy(vma),
> -			  vma->vm_userfaultfd_ctx);
> +			  vma->vm_userfaultfd_ctx, new_anon_name);
>  	if (*prev) {
>  		vma = *prev;
>  		goto success;
> @@ -112,10 +113,30 @@ static int madvise_update_vma(struct vm_area_struct *vma,
>  	 * vm_flags is protected by the mmap_lock held in write mode.
>  	 */
>  	vma->vm_flags = new_flags;
> +	if (!vma->vm_file)
> +		vma->shared.anon_name = new_anon_name;
>  
>  	return 0;
>  }
>  
> +static int madvise_vma_anon_name(struct vm_area_struct *vma,
> +				 struct vm_area_struct **prev,
> +				 unsigned long start, unsigned long end,
> +				 unsigned long name_addr)
> +{
> +	int error;
> +
> +	/* Only anonymous mappings can be named */
> +	if (vma->vm_file)
> +		return -EINVAL;
> +
> +	error = madvise_update_vma(vma, prev, start, end, vma->vm_flags,
> +				   (const char __user *)name_addr);
> +	if (error == -ENOMEM)
> +		error = -EAGAIN;
> +	return error;
> +}
> +
>  #ifdef CONFIG_SWAP
>  static int swapin_walk_pmd_entry(pmd_t *pmd, unsigned long start,
>  	unsigned long end, struct mm_walk *walk)
> @@ -877,7 +898,8 @@ static int madvise_vma_behavior(struct vm_area_struct *vma,
>  		break;
>  	}
>  
> -	error = madvise_update_vma(vma, prev, start, end, new_flags);
> +	error = madvise_update_vma(vma, prev, start, end, new_flags,
> +				   vma_anon_name(vma));
>  
>  out:
>  	if (error == -ENOMEM)
> @@ -1059,6 +1081,30 @@ int madvise_walk_vmas(unsigned long start, unsigned long end,
>  	return unmapped_error;
>  }
>  
> +int madvise_set_anon_name(unsigned long start, unsigned long len_in,
> +			  unsigned long name_addr)
> +{
> +	unsigned long end;
> +	unsigned long len;
> +
> +	if (start & ~PAGE_MASK)
> +		return -EINVAL;
> +	len = (len_in + ~PAGE_MASK) & PAGE_MASK;
> +
> +	/* Check to see whether len was rounded up from small -ve to zero */
> +	if (len_in && !len)
> +		return -EINVAL;
> +
> +	end = start + len;
> +	if (end < start)
> +		return -EINVAL;
> +
> +	if (end == start)
> +		return 0;
> +
> +	return madvise_walk_vmas(start, end, name_addr, madvise_vma_anon_name);
> +}
> +
>  /*
>   * The madvise(2) system call.
>   *
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index eddbe4e56c73..94338d9bfe57 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -829,7 +829,8 @@ static int mbind_range(struct mm_struct *mm, unsigned long start,
>  			((vmstart - vma->vm_start) >> PAGE_SHIFT);
>  		prev = vma_merge(mm, prev, vmstart, vmend, vma->vm_flags,
>  				 vma->anon_vma, vma->vm_file, pgoff,
> -				 new_pol, vma->vm_userfaultfd_ctx);
> +				 new_pol, vma->vm_userfaultfd_ctx,
> +				 vma_anon_name(vma));
>  		if (prev) {
>  			vma = prev;
>  			next = vma->vm_next;
> diff --git a/mm/mlock.c b/mm/mlock.c
> index 93ca2bf30b4f..8e0046c4642f 100644
> --- a/mm/mlock.c
> +++ b/mm/mlock.c
> @@ -534,7 +534,7 @@ static int mlock_fixup(struct vm_area_struct *vma, struct vm_area_struct **prev,
>  	pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
>  	*prev = vma_merge(mm, *prev, start, end, newflags, vma->anon_vma,
>  			  vma->vm_file, pgoff, vma_policy(vma),
> -			  vma->vm_userfaultfd_ctx);
> +			  vma->vm_userfaultfd_ctx, vma_anon_name(vma));
>  	if (*prev) {
>  		vma = *prev;
>  		goto success;
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 40248d84ad5f..8f3cd352a48f 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -987,7 +987,8 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
>   */
>  static inline int is_mergeable_vma(struct vm_area_struct *vma,
>  				struct file *file, unsigned long vm_flags,
> -				struct vm_userfaultfd_ctx vm_userfaultfd_ctx)
> +				struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
> +				const char __user *anon_name)
>  {
>  	/*
>  	 * VM_SOFTDIRTY should not prevent from VMA merging, if we
> @@ -1005,6 +1006,8 @@ static inline int is_mergeable_vma(struct vm_area_struct *vma,
>  		return 0;
>  	if (!is_mergeable_vm_userfaultfd_ctx(vma, vm_userfaultfd_ctx))
>  		return 0;
> +	if (vma_anon_name(vma) != anon_name)
> +		return 0;
>  	return 1;
>  }
>  
> @@ -1037,9 +1040,10 @@ static int
>  can_vma_merge_before(struct vm_area_struct *vma, unsigned long vm_flags,
>  		     struct anon_vma *anon_vma, struct file *file,
>  		     pgoff_t vm_pgoff,
> -		     struct vm_userfaultfd_ctx vm_userfaultfd_ctx)
> +		     struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
> +		     const char __user *anon_name)
>  {
> -	if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx) &&
> +	if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx, anon_name) &&
>  	    is_mergeable_anon_vma(anon_vma, vma->anon_vma, vma)) {
>  		if (vma->vm_pgoff == vm_pgoff)
>  			return 1;
> @@ -1058,9 +1062,10 @@ static int
>  can_vma_merge_after(struct vm_area_struct *vma, unsigned long vm_flags,
>  		    struct anon_vma *anon_vma, struct file *file,
>  		    pgoff_t vm_pgoff,
> -		    struct vm_userfaultfd_ctx vm_userfaultfd_ctx)
> +		    struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
> +		     const char __user *anon_name)
>  {
> -	if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx) &&
> +	if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx, anon_name) &&
>  	    is_mergeable_anon_vma(anon_vma, vma->anon_vma, vma)) {
>  		pgoff_t vm_pglen;
>  		vm_pglen = vma_pages(vma);
> @@ -1071,9 +1076,9 @@ can_vma_merge_after(struct vm_area_struct *vma, unsigned long vm_flags,
>  }
>  
>  /*
> - * Given a mapping request (addr,end,vm_flags,file,pgoff), figure out
> - * whether that can be merged with its predecessor or its successor.
> - * Or both (it neatly fills a hole).
> + * Given a mapping request (addr,end,vm_flags,file,pgoff,anon_name),
> + * figure out whether that can be merged with its predecessor or its
> + * successor.  Or both (it neatly fills a hole).
>   *
>   * In most cases - when called for mmap, brk or mremap - [addr,end) is
>   * certain not to be mapped by the time vma_merge is called; but when
> @@ -1118,7 +1123,8 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
>  			unsigned long end, unsigned long vm_flags,
>  			struct anon_vma *anon_vma, struct file *file,
>  			pgoff_t pgoff, struct mempolicy *policy,
> -			struct vm_userfaultfd_ctx vm_userfaultfd_ctx)
> +			struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
> +			const char __user *anon_name)
>  {
>  	pgoff_t pglen = (end - addr) >> PAGE_SHIFT;
>  	struct vm_area_struct *area, *next;
> @@ -1151,7 +1157,7 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
>  			mpol_equal(vma_policy(prev), policy) &&
>  			can_vma_merge_after(prev, vm_flags,
>  					    anon_vma, file, pgoff,
> -					    vm_userfaultfd_ctx)) {
> +					    vm_userfaultfd_ctx, anon_name)) {
>  		/*
>  		 * OK, it can.  Can we now merge in the successor as well?
>  		 */
> @@ -1160,7 +1166,7 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
>  				can_vma_merge_before(next, vm_flags,
>  						     anon_vma, file,
>  						     pgoff+pglen,
> -						     vm_userfaultfd_ctx) &&
> +						     vm_userfaultfd_ctx, anon_name) &&
>  				is_mergeable_anon_vma(prev->anon_vma,
>  						      next->anon_vma, NULL)) {
>  							/* cases 1, 6 */
> @@ -1183,7 +1189,7 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
>  			mpol_equal(policy, vma_policy(next)) &&
>  			can_vma_merge_before(next, vm_flags,
>  					     anon_vma, file, pgoff+pglen,
> -					     vm_userfaultfd_ctx)) {
> +					     vm_userfaultfd_ctx, anon_name)) {
>  		if (prev && addr < prev->vm_end)	/* case 4 */
>  			err = __vma_adjust(prev, prev->vm_start,
>  					 addr, prev->vm_pgoff, NULL, next);
> @@ -1731,7 +1737,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
>  	 * Can we just expand an old mapping?
>  	 */
>  	vma = vma_merge(mm, prev, addr, addr + len, vm_flags,
> -			NULL, file, pgoff, NULL, NULL_VM_UFFD_CTX);
> +			NULL, file, pgoff, NULL, NULL_VM_UFFD_CTX, NULL);
>  	if (vma)
>  		goto out;
>  
> @@ -1779,7 +1785,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
>  		 */
>  		if (unlikely(vm_flags != vma->vm_flags && prev)) {
>  			merge = vma_merge(mm, prev, vma->vm_start, vma->vm_end, vma->vm_flags,
> -				NULL, vma->vm_file, vma->vm_pgoff, NULL, NULL_VM_UFFD_CTX);
> +				NULL, vma->vm_file, vma->vm_pgoff, NULL, NULL_VM_UFFD_CTX, NULL);
>  			if (merge) {
>  				fput(file);
>  				vm_area_free(vma);
> @@ -3063,7 +3069,7 @@ static int do_brk_flags(unsigned long addr, unsigned long len, unsigned long fla
>  
>  	/* Can we just expand an old private anonymous mapping? */
>  	vma = vma_merge(mm, prev, addr, addr + len, flags,
> -			NULL, NULL, pgoff, NULL, NULL_VM_UFFD_CTX);
> +			NULL, NULL, pgoff, NULL, NULL_VM_UFFD_CTX, NULL);
>  	if (vma)
>  		goto out;
>  
> @@ -3262,7 +3268,7 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap,
>  		return NULL;	/* should never get here */
>  	new_vma = vma_merge(mm, prev, addr, addr + len, vma->vm_flags,
>  			    vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma),
> -			    vma->vm_userfaultfd_ctx);
> +			    vma->vm_userfaultfd_ctx, vma_anon_name(vma));
>  	if (new_vma) {
>  		/*
>  		 * Source vma may have been merged into new_vma
> diff --git a/mm/mprotect.c b/mm/mprotect.c
> index ce8b8a5eacbb..d90c349a3fd9 100644
> --- a/mm/mprotect.c
> +++ b/mm/mprotect.c
> @@ -454,7 +454,7 @@ mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev,
>  	pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
>  	*pprev = vma_merge(mm, *pprev, start, end, newflags,
>  			   vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma),
> -			   vma->vm_userfaultfd_ctx);
> +			   vma->vm_userfaultfd_ctx, vma_anon_name(vma));
>  	if (*pprev) {
>  		vma = *pprev;
>  		VM_WARN_ON((vma->vm_flags ^ newflags) & ~VM_SOFTDIRTY);
> -- 
> 2.28.0
> 

-- 
Michal Hocko
SUSE Labs



* Re: [PATCH v5 2/2] mm: add a field to store names for private anonymous memory
  2020-08-19 14:16 ` [PATCH v5 2/2] mm: add a field to store names for private anonymous memory Sumit Semwal
  2020-08-19 14:37   ` Michal Hocko
@ 2020-08-19 15:02   ` Matthew Wilcox
  2020-08-19 17:48     ` Sumit Semwal
  2020-08-20 15:46     ` Oleg Nesterov
  2020-08-19 17:14   ` kernel test robot
                     ` (4 subsequent siblings)
  6 siblings, 2 replies; 26+ messages in thread
From: Matthew Wilcox @ 2020-08-19 15:02 UTC (permalink / raw)
  To: Sumit Semwal
  Cc: Andrew Morton, linux-mm, linux-kernel, Alexey Dobriyan,
	Jonathan Corbet, Mauro Carvalho Chehab, Kees Cook, Michal Hocko,
	Colin Cross, Alexey Gladkov, Jason Gunthorpe,
	Kirill A . Shutemov, Michel Lespinasse, Michal Koutný,
	Song Liu, Huang Ying, Vlastimil Babka, Yang Shi, chenqiwu,
	Mathieu Desnoyers, John Hubbard, Thomas Hellstrom, Mike Christie,
	Bart Van Assche, Amit Pundir, Thomas Gleixner, Christian Brauner,
	Daniel Jordan, Adrian Reber, Nicolas Viennot, Al Viro,
	Thomas Cedeno, linux-fsdevel, Pekka Enberg, Dave Hansen,
	Peter Zijlstra, Ingo Molnar, Oleg Nesterov, Eric W. Biederman,
	Jan Glauber, John Stultz, Rob Landley, Cyrill Gorcunov,
	Serge E. Hallyn, David Rientjes, Hugh Dickins, Rik van Riel,
	Mel Gorman, Tang Chen, Robin Holt, Shaohua Li, Sasha Levin,
	Johannes Weiner, Minchan Kim

On Wed, Aug 19, 2020 at 07:46:50PM +0530, Sumit Semwal wrote:
> +++ b/include/linux/mm_types.h
> @@ -336,10 +336,18 @@ struct vm_area_struct {
>  	/*
>  	 * For areas with an address space and backing store,
>  	 * linkage into the address_space->i_mmap interval tree.
> +	 *
> +	 * For private anonymous mappings, a pointer to a null terminated string
> +	 * in the user process containing the name given to the vma, or NULL
> +	 * if unnamed.
>  	 */
> -	struct {
> -		struct rb_node rb;
> -		unsigned long rb_subtree_last;
> +
> +	union {
> +		struct {
> +			struct rb_node rb;
> +			unsigned long rb_subtree_last;
> +		} interval;
> +		const char __user *anon_name;
>  	} shared;

You can significantly reduce the size of this patch by doing this instead:

	union {
		struct {
			struct rb_node rb;
			unsigned long rb_subtree_last;
		} shared;
		const char __user *anon_name;
	};
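
To illustrate why the anonymous union keeps the patch small, here is a minimal
userspace sketch. It assumes a cut-down stand-in for vm_area_struct; the names
vma_model, rb_node_model and both accessors are hypothetical and not part of
the patch. Existing code that spells vma->shared.rb keeps compiling unchanged,
while the new field is reachable directly as vma->anon_name.

```c
#include <stddef.h>

/* Stand-in for the kernel's struct rb_node; only its presence matters here. */
struct rb_node_model {
	unsigned long parent_color;
	struct rb_node_model *right;
	struct rb_node_model *left;
};

/* Hypothetical cut-down model of vm_area_struct using the suggested layout. */
struct vma_model {
	unsigned long vm_start;
	unsigned long vm_end;
	union {
		struct {
			struct rb_node_model rb;
			unsigned long rb_subtree_last;
		} shared;
		const char *anon_name;	/* would carry __user in the kernel */
	};
};

/* Existing-style access: still written as vma->shared.<member>. */
static unsigned long model_subtree_last(const struct vma_model *vma)
{
	return vma->shared.rb_subtree_last;
}

/* New-style access: the name is a direct member of the vma. */
static const char *model_anon_name(const struct vma_model *vma)
{
	return vma->anon_name;
}
```

Both members still overlay the same storage, so, as in the original layout,
only one of them is meaningful for any given vma.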




* Re: [PATCH v5 2/2] mm: add a field to store names for private anonymous memory
  2020-08-19 14:16 ` [PATCH v5 2/2] mm: add a field to store names for private anonymous memory Sumit Semwal
  2020-08-19 14:37   ` Michal Hocko
  2020-08-19 15:02   ` Matthew Wilcox
@ 2020-08-19 17:14   ` kernel test robot
  2020-08-19 17:42   ` kernel test robot
                     ` (3 subsequent siblings)
  6 siblings, 0 replies; 26+ messages in thread
From: kernel test robot @ 2020-08-19 17:14 UTC (permalink / raw)
  To: Sumit Semwal, Andrew Morton, linux-kernel, Alexey Dobriyan,
	Jonathan Corbet
  Cc: kbuild-all, Linux Memory Management List, Mauro Carvalho Chehab,
	linux-media, Kees Cook, Michal Hocko, Colin Cross

[-- Attachment #1: Type: text/plain, Size: 1500 bytes --]

Hi Sumit,

I love your patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v5.9-rc1]
[cannot apply to hnaz-linux-mm/master linux/master next-20200819]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Sumit-Semwal/Anonymous-VMA-naming-patches/20200819-222011
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 18445bf405cb331117bc98427b1ba6f12418ad17
config: nds32-allnoconfig (attached as .config)
compiler: nds32le-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=nds32 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   nds32le-linux-ld: kernel/sys.o: in function `__se_sys_prctl':
>> sys.c:(.text+0x16a6): undefined reference to `madvise_set_anon_name'
>> nds32le-linux-ld: sys.c:(.text+0x16aa): undefined reference to `madvise_set_anon_name'

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 5268 bytes --]


* Re: [PATCH v5 2/2] mm: add a field to store names for private anonymous memory
  2020-08-19 14:16 ` [PATCH v5 2/2] mm: add a field to store names for private anonymous memory Sumit Semwal
                     ` (2 preceding siblings ...)
  2020-08-19 17:14   ` kernel test robot
@ 2020-08-19 17:42   ` kernel test robot
  2020-08-20  7:58   ` Michal Hocko
                     ` (2 subsequent siblings)
  6 siblings, 0 replies; 26+ messages in thread
From: kernel test robot @ 2020-08-19 17:42 UTC (permalink / raw)
  To: Sumit Semwal, Andrew Morton, linux-kernel, Alexey Dobriyan,
	Jonathan Corbet
  Cc: kbuild-all, Linux Memory Management List, Mauro Carvalho Chehab,
	linux-media, Kees Cook, Michal Hocko, Colin Cross

[-- Attachment #1: Type: text/plain, Size: 1495 bytes --]

Hi Sumit,

I love your patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v5.9-rc1]
[cannot apply to hnaz-linux-mm/master linux/master next-20200819]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Sumit-Semwal/Anonymous-VMA-naming-patches/20200819-222011
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 18445bf405cb331117bc98427b1ba6f12418ad17
config: s390-randconfig-r025-20200818 (attached as .config)
compiler: s390-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=s390 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   s390-linux-ld: kernel/sys.o: in function `prctl_set_vma':
   kernel/sys.c:2294: undefined reference to `madvise_set_anon_name'
>> s390-linux-ld: kernel/sys.c:2294: undefined reference to `madvise_set_anon_name'

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 16281 bytes --]


* Re: [PATCH v5 2/2] mm: add a field to store names for private anonymous memory
  2020-08-19 15:02   ` Matthew Wilcox
@ 2020-08-19 17:48     ` Sumit Semwal
  2020-08-20 15:46     ` Oleg Nesterov
  1 sibling, 0 replies; 26+ messages in thread
From: Sumit Semwal @ 2020-08-19 17:48 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Andrew Morton, linux-mm, LKML, Alexey Dobriyan, Jonathan Corbet,
	Mauro Carvalho Chehab, Kees Cook, Michal Hocko, Colin Cross,
	Alexey Gladkov, Jason Gunthorpe, Kirill A . Shutemov,
	Michel Lespinasse, Michal Koutný,
	Song Liu, Huang Ying, Vlastimil Babka, Yang Shi, chenqiwu,
	Mathieu Desnoyers, John Hubbard, Thomas Hellstrom, Mike Christie,
	Bart Van Assche, Amit Pundir, Thomas Gleixner, Christian Brauner,
	Daniel Jordan, Adrian Reber, Nicolas Viennot, Al Viro,
	Thomas Cedeno, linux-fsdevel, Pekka Enberg, Dave Hansen,
	Peter Zijlstra, Ingo Molnar, Oleg Nesterov, Eric W. Biederman,
	Jan Glauber, John Stultz, Rob Landley, Cyrill Gorcunov,
	Serge E. Hallyn, David Rientjes, Hugh Dickins, Rik van Riel,
	Mel Gorman, Tang Chen, Robin Holt, Shaohua Li, Sasha Levin,
	Johannes Weiner, Minchan Kim

Hi Matt,

Thanks for the review!
On Wed, 19 Aug 2020 at 20:33, Matthew Wilcox <willy@infradead.org> wrote:
>
> On Wed, Aug 19, 2020 at 07:46:50PM +0530, Sumit Semwal wrote:
> > +++ b/include/linux/mm_types.h
> > @@ -336,10 +336,18 @@ struct vm_area_struct {
> >       /*
> >        * For areas with an address space and backing store,
> >        * linkage into the address_space->i_mmap interval tree.
> > +      *
> > +      * For private anonymous mappings, a pointer to a null terminated string
> > +      * in the user process containing the name given to the vma, or NULL
> > +      * if unnamed.
> >        */
> > -     struct {
> > -             struct rb_node rb;
> > -             unsigned long rb_subtree_last;
> > +
> > +     union {
> > +             struct {
> > +                     struct rb_node rb;
> > +                     unsigned long rb_subtree_last;
> > +             } interval;
> > +             const char __user *anon_name;
> >       } shared;
>
> You can significantly reduce the size of this patch by doing this instead:
>
>         union {
>                 struct {
>                         struct rb_node rb;
>                         unsigned long rb_subtree_last;
>                 } shared;
>                 const char __user *anon_name;
>         };
>
Thanks, will update in the next version.

Best,
Sumit.



* Re: [PATCH v5 2/2] mm: add a field to store names for private anonymous memory
  2020-08-19 14:16 ` [PATCH v5 2/2] mm: add a field to store names for private anonymous memory Sumit Semwal
                     ` (3 preceding siblings ...)
  2020-08-19 17:42   ` kernel test robot
@ 2020-08-20  7:58   ` Michal Hocko
  2020-08-20 23:28     ` Colin Cross
  2020-08-20 16:00   ` Oleg Nesterov
  2020-08-20 21:40   ` Cyrill Gorcunov
  6 siblings, 1 reply; 26+ messages in thread
From: Michal Hocko @ 2020-08-20  7:58 UTC (permalink / raw)
  To: Sumit Semwal
  Cc: Andrew Morton, linux-mm, linux-kernel, Alexey Dobriyan,
	Jonathan Corbet, Mauro Carvalho Chehab, Kees Cook, Colin Cross,
	Alexey Gladkov, Matthew Wilcox, Jason Gunthorpe,
	Kirill A . Shutemov, Michel Lespinasse, Michal Koutný,
	Song Liu, Huang Ying, Vlastimil Babka, Yang Shi, chenqiwu,
	Mathieu Desnoyers, John Hubbard, Thomas Hellstrom, Mike Christie,
	Bart Van Assche, Amit Pundir, Thomas Gleixner, Christian Brauner,
	Daniel Jordan, Adrian Reber, Nicolas Viennot, Al Viro,
	Thomas Cedeno, linux-fsdevel, Pekka Enberg, Dave Hansen,
	Peter Zijlstra, Ingo Molnar, Oleg Nesterov, Eric W. Biederman,
	Jan Glauber, John Stultz, Rob Landley, Cyrill Gorcunov,
	Serge E. Hallyn, David Rientjes, Hugh Dickins, Rik van Riel,
	Mel Gorman, Tang Chen, Robin Holt, Shaohua Li, Sasha Levin,
	Johannes Weiner, Minchan Kim

On Wed 19-08-20 19:46:50, Sumit Semwal wrote:
[...]
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index 5066b0251ed8..136fd3c3ad7b 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -97,6 +97,21 @@ unsigned long task_statm(struct mm_struct *mm,
>  	return mm->total_vm;
>  }
>  
> +static void seq_print_vma_name(struct seq_file *m, struct vm_area_struct *vma)
> +{
> +	struct mm_struct *mm = vma->vm_mm;
> +	char anon_name[NAME_MAX + 1];
> +	int n;
> +
> +	n = access_remote_vm(mm, (unsigned long)vma_anon_name(vma),
> +			     anon_name, NAME_MAX, 0);
> +	if (n > 0) {
> +		seq_puts(m, "[anon:");
> +		seq_write(m, anon_name, strnlen(anon_name, n));
> +		seq_putc(m, ']');
> +	}
> +}
> +
>  #ifdef CONFIG_NUMA
>  /*
>   * Save get_task_policy() for show_numa_map().
> @@ -319,8 +334,15 @@ show_map_vma(struct seq_file *m, struct vm_area_struct *vma)
>  			goto done;
>  		}
>  
> -		if (is_stack(vma))
> +		if (is_stack(vma)) {
>  			name = "[stack]";
> +			goto done;
> +		}
> +
> +		if (vma_anon_name(vma)) {
> +			seq_pad(m, ' ');
> +			seq_print_vma_name(m, vma);
> +		}
>  	}

How can this be safe? access_remote_vm requires mmap_sem (non-exclusive).
The same is the case for show_map_vma. So what would happen if a task
sets its own name? IIRC semaphore code doesn't allow read lock nesting
because any exclusive lock request in the meantime would block further
readers. Or is this allowed?
-- 
Michal Hocko
SUSE Labs



* Re: [PATCH v5 2/2] mm: add a field to store names for private anonymous memory
  2020-08-19 15:02   ` Matthew Wilcox
  2020-08-19 17:48     ` Sumit Semwal
@ 2020-08-20 15:46     ` Oleg Nesterov
  1 sibling, 0 replies; 26+ messages in thread
From: Oleg Nesterov @ 2020-08-20 15:46 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Sumit Semwal, Andrew Morton, linux-mm, linux-kernel,
	Alexey Dobriyan, Jonathan Corbet, Mauro Carvalho Chehab,
	Kees Cook, Michal Hocko, Colin Cross, Alexey Gladkov,
	Jason Gunthorpe, Kirill A . Shutemov, Michel Lespinasse,
	Michal Koutný,
	Song Liu, Huang Ying, Vlastimil Babka, Yang Shi, chenqiwu,
	Mathieu Desnoyers, John Hubbard, Thomas Hellstrom, Mike Christie,
	Bart Van Assche, Amit Pundir, Thomas Gleixner, Christian Brauner,
	Daniel Jordan, Adrian Reber, Nicolas Viennot, Al Viro,
	Thomas Cedeno, linux-fsdevel, Pekka Enberg, Dave Hansen,
	Peter Zijlstra, Ingo Molnar, Eric W. Biederman, Jan Glauber,
	John Stultz, Rob Landley, Cyrill Gorcunov, Serge E. Hallyn,
	David Rientjes, Hugh Dickins, Rik van Riel, Mel Gorman,
	Tang Chen, Robin Holt, Shaohua Li, Sasha Levin, Johannes Weiner,
	Minchan Kim

On 08/19, Matthew Wilcox wrote:
>
> You can significantly reduce the size of this patch by doing this instead:
>
> 	union {
> 		struct {
> 			struct rb_node rb;
> 			unsigned long rb_subtree_last;
> 		} shared;
> 		const char __user *anon_name;
> 	};

Agreed,

And to me "unsigned long anon_name" looks better; vma_anon_name() should
return "unsigned long" too. The only thing which reads this string is
seq_print_vma_name(), and it has to typecast anon_name anyway.

But I won't insist, this is cosmetic and subjective.

Oleg.




* Re: [PATCH v5 2/2] mm: add a field to store names for private anonymous memory
  2020-08-19 14:16 ` [PATCH v5 2/2] mm: add a field to store names for private anonymous memory Sumit Semwal
                     ` (4 preceding siblings ...)
  2020-08-20  7:58   ` Michal Hocko
@ 2020-08-20 16:00   ` Oleg Nesterov
  2020-08-20 21:15     ` Kees Cook
  2020-08-21  3:14     ` Sumit Semwal
  2020-08-20 21:40   ` Cyrill Gorcunov
  6 siblings, 2 replies; 26+ messages in thread
From: Oleg Nesterov @ 2020-08-20 16:00 UTC (permalink / raw)
  To: Sumit Semwal
  Cc: Andrew Morton, linux-mm, linux-kernel, Alexey Dobriyan,
	Jonathan Corbet, Mauro Carvalho Chehab, Kees Cook, Michal Hocko,
	Colin Cross, Alexey Gladkov, Matthew Wilcox, Jason Gunthorpe,
	Kirill A . Shutemov, Michel Lespinasse, Michal Koutný,
	Song Liu, Huang Ying, Vlastimil Babka, Yang Shi, chenqiwu,
	Mathieu Desnoyers, John Hubbard, Thomas Hellstrom, Mike Christie,
	Bart Van Assche, Amit Pundir, Thomas Gleixner, Christian Brauner,
	Daniel Jordan, Adrian Reber, Nicolas Viennot, Al Viro,
	Thomas Cedeno, linux-fsdevel, Pekka Enberg, Dave Hansen,
	Peter Zijlstra, Ingo Molnar, Eric W. Biederman, Jan Glauber,
	John Stultz, Rob Landley, Cyrill Gorcunov, Serge E. Hallyn,
	David Rientjes, Hugh Dickins, Rik van Riel, Mel Gorman,
	Tang Chen, Robin Holt, Shaohua Li, Sasha Levin, Johannes Weiner,
	Minchan Kim

On 08/19, Sumit Semwal wrote:
>
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2484,7 +2484,7 @@ static inline int vma_adjust(struct vm_area_struct *vma, unsigned long start,
>  extern struct vm_area_struct *vma_merge(struct mm_struct *,
>  	struct vm_area_struct *prev, unsigned long addr, unsigned long end,
>  	unsigned long vm_flags, struct anon_vma *, struct file *, pgoff_t,
> -	struct mempolicy *, struct vm_userfaultfd_ctx);
> +	struct mempolicy *, struct vm_userfaultfd_ctx, const char __user *);

It seems that you forgot to update the callers in fs/userfaultfd.c ?

Oleg.




* Re: [PATCH v5 2/2] mm: add a field to store names for private anonymous memory
  2020-08-20 16:00   ` Oleg Nesterov
@ 2020-08-20 21:15     ` Kees Cook
  2020-08-21  3:15       ` Sumit Semwal
  2020-08-21  3:14     ` Sumit Semwal
  1 sibling, 1 reply; 26+ messages in thread
From: Kees Cook @ 2020-08-20 21:15 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Sumit Semwal, Andrew Morton, linux-mm, linux-kernel,
	Alexey Dobriyan, Jonathan Corbet, Mauro Carvalho Chehab,
	Michal Hocko, Colin Cross, Alexey Gladkov, Matthew Wilcox,
	Jason Gunthorpe, Kirill A . Shutemov, Michel Lespinasse,
	Michal Koutný,
	Song Liu, Huang Ying, Vlastimil Babka, Yang Shi, chenqiwu,
	Mathieu Desnoyers, John Hubbard, Thomas Hellstrom, Mike Christie,
	Bart Van Assche, Amit Pundir, Thomas Gleixner, Christian Brauner,
	Daniel Jordan, Adrian Reber, Nicolas Viennot, Al Viro,
	Thomas Cedeno, linux-fsdevel, Pekka Enberg, Dave Hansen,
	Peter Zijlstra, Ingo Molnar, Eric W. Biederman, Jan Glauber,
	John Stultz, Rob Landley, Cyrill Gorcunov, Serge E. Hallyn,
	David Rientjes, Hugh Dickins, Rik van Riel, Mel Gorman,
	Tang Chen, Robin Holt, Shaohua Li, Sasha Levin, Johannes Weiner,
	Minchan Kim

On Thu, Aug 20, 2020 at 06:00:14PM +0200, Oleg Nesterov wrote:
> On 08/19, Sumit Semwal wrote:
> >
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -2484,7 +2484,7 @@ static inline int vma_adjust(struct vm_area_struct *vma, unsigned long start,
> >  extern struct vm_area_struct *vma_merge(struct mm_struct *,
> >  	struct vm_area_struct *prev, unsigned long addr, unsigned long end,
> >  	unsigned long vm_flags, struct anon_vma *, struct file *, pgoff_t,
> > -	struct mempolicy *, struct vm_userfaultfd_ctx);
> > +	struct mempolicy *, struct vm_userfaultfd_ctx, const char __user *);
> 
> It seems that you forgot to update the callers in fs/userfaultfd.c ?

(I recommend including "make allmodconfig && make" in your test workflow.)

-- 
Kees Cook



* Re: [PATCH v5 2/2] mm: add a field to store names for private anonymous memory
  2020-08-19 14:16 ` [PATCH v5 2/2] mm: add a field to store names for private anonymous memory Sumit Semwal
                     ` (5 preceding siblings ...)
  2020-08-20 16:00   ` Oleg Nesterov
@ 2020-08-20 21:40   ` Cyrill Gorcunov
  2020-08-20 21:45     ` Colin Cross
  2020-08-20 21:45     ` Dave Hansen
  6 siblings, 2 replies; 26+ messages in thread
From: Cyrill Gorcunov @ 2020-08-20 21:40 UTC (permalink / raw)
  To: Sumit Semwal
  Cc: Andrew Morton, linux-mm, linux-kernel, Alexey Dobriyan,
	Jonathan Corbet, Mauro Carvalho Chehab, Kees Cook, Michal Hocko,
	Colin Cross, Alexey Gladkov, Matthew Wilcox, Jason Gunthorpe,
	Kirill A . Shutemov, Michel Lespinasse, Michal Koutný,
	Song Liu, Huang Ying, Vlastimil Babka, Yang Shi, chenqiwu,
	Mathieu Desnoyers, John Hubbard, Thomas Hellstrom, Mike Christie,
	Bart Van Assche, Amit Pundir, Thomas Gleixner, Christian Brauner,
	Daniel Jordan, Adrian Reber, Nicolas Viennot, Al Viro,
	Thomas Cedeno, linux-fsdevel, Pekka Enberg, Dave Hansen,
	Peter Zijlstra, Ingo Molnar, Oleg Nesterov, Eric W. Biederman,
	Jan Glauber, John Stultz, Rob Landley, Serge E. Hallyn,
	David Rientjes, Hugh Dickins, Rik van Riel, Mel Gorman,
	Tang Chen, Robin Holt, Shaohua Li, Sasha Levin, Johannes Weiner,
	Minchan Kim

On Wed, Aug 19, 2020 at 07:46:50PM +0530, Sumit Semwal wrote:
...
> Userspace can set the name for a region of memory by calling
> prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, start, len, (unsigned long)name);
> Setting the name to NULL clears it.
> 
> The name is stored in a user pointer in the shared union in vm_area_struct
> that points to a null terminated string inside the user process.  vmas
> that point to the same address and are otherwise mergeable will be merged,
> but vmas that point to equivalent strings at different addresses will not
> be merged.
...

Guys, could you please enlighten me, I don't understand -- we pass some
random user-space pointer and save it in vm_area_struct, then in procfs
we treat it as a "string" and print it out? What prevents me from putting
some crap there and then unmapping this pointer, so that the kernel takes
a page fault in the procfs output (in the best scenario)?
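
For reference, the prctl() call from the quoted description could be issued
from userspace roughly as follows. This is only a sketch: the numeric values
of PR_SET_VMA and PR_SET_VMA_ANON_NAME are the ones used by Android kernels
and are assumed to match this series, and map_named_region() and region_name
are hypothetical names. Since this version of the patch stores the pointer
rather than a copy, the string is kept in static storage so it stays mapped.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/prctl.h>

/* Assumed values (as used by Android kernels); not taken from this thread. */
#ifndef PR_SET_VMA
#define PR_SET_VMA		0x53564d41
#endif
#ifndef PR_SET_VMA_ANON_NAME
#define PR_SET_VMA_ANON_NAME	0
#endif

/* Must remain mapped for as long as the name should be readable. */
static const char region_name[] = "example-heap";

/*
 * Map len bytes of private anonymous memory and try to name the region.
 * Returns the mapping, or NULL if mmap() failed. *named is set to 1 if
 * the kernel accepted the name and to 0 otherwise (for example on a
 * kernel that does not carry this feature).
 */
static void *map_named_region(size_t len, int *named)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return NULL;

	*named = prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME,
		       (unsigned long)p, len,
		       (unsigned long)region_name) == 0;
	return p;
}
```

On a kernel with the series applied, the region would then be expected to show
up in /proc/<pid>/maps with an "[anon:example-heap]" suffix.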



* Re: [PATCH v5 2/2] mm: add a field to store names for private anonymous memory
  2020-08-20 21:40   ` Cyrill Gorcunov
@ 2020-08-20 21:45     ` Colin Cross
  2020-08-20 22:15       ` Cyrill Gorcunov
  2020-08-20 21:45     ` Dave Hansen
  1 sibling, 1 reply; 26+ messages in thread
From: Colin Cross @ 2020-08-20 21:45 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Sumit Semwal, Andrew Morton, Linux-MM, lkml, Alexey Dobriyan,
	Jonathan Corbet, Mauro Carvalho Chehab, Kees Cook, Michal Hocko,
	Alexey Gladkov, Matthew Wilcox, Jason Gunthorpe,
	Kirill A . Shutemov, Michel Lespinasse, Michal Koutný,
	Song Liu, Huang Ying, Vlastimil Babka, Yang Shi, chenqiwu,
	Mathieu Desnoyers, John Hubbard, Thomas Hellstrom, Mike Christie,
	Bart Van Assche, Amit Pundir, Thomas Gleixner, Christian Brauner,
	Daniel Jordan, Adrian Reber, Nicolas Viennot, Al Viro,
	Thomas Cedeno, linux-fsdevel, Pekka Enberg, Dave Hansen,
	Peter Zijlstra, Ingo Molnar, Oleg Nesterov, Eric W. Biederman,
	Jan Glauber, John Stultz, Rob Landley, Serge E. Hallyn,
	David Rientjes, Hugh Dickins, Rik van Riel, Mel Gorman,
	Tang Chen, Robin Holt, Shaohua Li, Sasha Levin, Johannes Weiner,
	Minchan Kim

On Thu, Aug 20, 2020 at 2:40 PM Cyrill Gorcunov <gorcunov@gmail.com> wrote:
>
> On Wed, Aug 19, 2020 at 07:46:50PM +0530, Sumit Semwal wrote:
> ...
> > Userspace can set the name for a region of memory by calling
> > prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, start, len, (unsigned long)name);
> > Setting the name to NULL clears it.
> >
> > The name is stored in a user pointer in the shared union in vm_area_struct
> > that points to a null terminated string inside the user process.  vmas
> > that point to the same address and are otherwise mergeable will be merged,
> > but vmas that point to equivalent strings at different addresses will not
> > be merged.
> ...
>
> Guys, could you please enlighten me, I don't understand -- we pass some
> random user-space pointer and save it in vm_area_struct, then in procfs
> we treat it as a "string" and print it out? What prevents me from putting
> some crap there and then unmapping this pointer, so that the kernel takes
> a page fault in the procfs output (in the best scenario)?

This is the same pattern used for /proc/pid/cmdline.
access_remote_vm handles addresses in unmapped pages; it will return
0 if no bytes were readable.



* Re: [PATCH v5 2/2] mm: add a field to store names for private anonymous memory
  2020-08-20 21:40   ` Cyrill Gorcunov
  2020-08-20 21:45     ` Colin Cross
@ 2020-08-20 21:45     ` Dave Hansen
  2020-08-20 21:59       ` Cyrill Gorcunov
  1 sibling, 1 reply; 26+ messages in thread
From: Dave Hansen @ 2020-08-20 21:45 UTC (permalink / raw)
  To: Cyrill Gorcunov, Sumit Semwal
  Cc: Andrew Morton, linux-mm, linux-kernel, Alexey Dobriyan,
	Jonathan Corbet, Mauro Carvalho Chehab, Kees Cook, Michal Hocko,
	Colin Cross, Alexey Gladkov, Matthew Wilcox, Jason Gunthorpe,
	Kirill A . Shutemov, Michel Lespinasse, Michal Koutný,
	Song Liu, Huang Ying, Vlastimil Babka, Yang Shi, chenqiwu,
	Mathieu Desnoyers, John Hubbard, Thomas Hellstrom, Mike Christie,
	Bart Van Assche, Amit Pundir, Thomas Gleixner, Christian Brauner,
	Daniel Jordan, Adrian Reber, Nicolas Viennot, Al Viro,
	Thomas Cedeno, linux-fsdevel, Pekka Enberg, Peter Zijlstra,
	Ingo Molnar, Oleg Nesterov, Eric W. Biederman, Jan Glauber,
	John Stultz, Rob Landley, Serge E. Hallyn, David Rientjes,
	Hugh Dickins, Rik van Riel, Mel Gorman, Tang Chen, Robin Holt,
	Shaohua Li, Sasha Levin, Johannes Weiner, Minchan Kim

On 8/20/20 2:40 PM, Cyrill Gorcunov wrote:
> On Wed, Aug 19, 2020 at 07:46:50PM +0530, Sumit Semwal wrote:
> ...
>> Userspace can set the name for a region of memory by calling
>> prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, start, len, (unsigned long)name);
>> Setting the name to NULL clears it.
>>
>> The name is stored in a user pointer in the shared union in vm_area_struct
>> that points to a null terminated string inside the user process.  vmas
>> that point to the same address and are otherwise mergeable will be merged,
>> but vmas that point to equivalent strings at different addresses will not
>> be merged.
> ...
> 
> Guys, could you please enlighten me, I don't understand -- we pass some
> random user-space pointer and save it in vm_area_struct, then in procfs
> we treat it as a "string" and print it out? What prevents me from putting
> some crap there and then unmapping this pointer, so that the kernel takes
> a page fault in the procfs output (in the best scenario)?

Remember, this is virtually identical to what we do for
/proc/$pid/cmdline in get_mm_cmdline().  The kernel goes following a
user-provided pointer into the user address space looking for a string.

If userspace points it to garbage, access_remote_vm() will fail safely.
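
A small userspace model of the reader side may help here. It is a sketch
under stated assumptions, not the patch's code: model_format_anon_name() is a
hypothetical name, and its argument n stands in for the return value of
access_remote_vm(), that is, the number of bytes that could actually be
copied (0 when the pointer is unmapped). It mirrors how seq_print_vma_name()
prints nothing for n <= 0 and bounds strnlen() by n otherwise.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>

/*
 * Format "[anon:<name>]" into out. buf holds n bytes copied from the
 * target process and may lack a terminating NUL. Returns the number of
 * characters stored in out (excluding the NUL), or 0 with an empty
 * string when nothing was readable.
 */
static size_t model_format_anon_name(const char *buf, int n,
				     char *out, size_t outlen)
{
	size_t len;
	int w;

	if (outlen == 0)
		return 0;
	out[0] = '\0';
	if (n <= 0)
		return 0;

	/* Never look past the bytes that were really copied. */
	len = strnlen(buf, (size_t)n);
	w = snprintf(out, outlen, "[anon:%.*s]", (int)len, buf);
	if (w < 0) {
		out[0] = '\0';
		return 0;
	}
	/* snprintf() reports the untruncated length; clamp to what fits. */
	return (size_t)w < outlen ? (size_t)w : outlen - 1;
}
```

The worst a bogus pointer can do in this model is produce garbage bytes or no
name at all, which matches the "fail safely" behaviour described above.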



* Re: [PATCH v5 2/2] mm: add a field to store names for private anonymous memory
  2020-08-20 21:45     ` Dave Hansen
@ 2020-08-20 21:59       ` Cyrill Gorcunov
  0 siblings, 0 replies; 26+ messages in thread
From: Cyrill Gorcunov @ 2020-08-20 21:59 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Sumit Semwal, Andrew Morton, linux-mm, linux-kernel,
	Alexey Dobriyan, Jonathan Corbet, Mauro Carvalho Chehab,
	Kees Cook, Michal Hocko, Colin Cross, Alexey Gladkov,
	Matthew Wilcox, Jason Gunthorpe, Kirill A . Shutemov,
	Michel Lespinasse, Michal Koutný,
	Song Liu, Huang Ying, Vlastimil Babka, Yang Shi, chenqiwu,
	Mathieu Desnoyers, John Hubbard, Thomas Hellstrom, Mike Christie,
	Bart Van Assche, Amit Pundir, Thomas Gleixner, Christian Brauner,
	Daniel Jordan, Adrian Reber, Nicolas Viennot, Al Viro,
	Thomas Cedeno, linux-fsdevel, Pekka Enberg, Peter Zijlstra,
	Ingo Molnar, Oleg Nesterov, Eric W. Biederman, Jan Glauber,
	John Stultz, Rob Landley, Serge E. Hallyn, David Rientjes,
	Hugh Dickins, Rik van Riel, Mel Gorman, Tang Chen, Robin Holt,
	Shaohua Li, Sasha Levin, Johannes Weiner, Minchan Kim

On Thu, Aug 20, 2020 at 02:45:38PM -0700, Dave Hansen wrote:
> > 
> > Guys, could you please enlighten me, I don't understand -- we pass some
> > random user-space pointer and save it in vm_area_struct, then in procfs
> > we treat it as a "string" and print it out? What prevents me from putting
> > some crap there and then unmapping this pointer, so that the kernel takes
> > a page fault in the procfs output (in the best scenario)?
> 
> Remember, this is virtually identical to what we do for
> /proc/$pid/cmdline in get_mm_cmdline().  The kernel goes following a
> user-provided pointer into the user address space looking for a string.
> 
> If userspace points it to garbage, access_remote_vm() will fail safely.

Yeah, managed to miss it, thanks!



* Re: [PATCH v5 2/2] mm: add a field to store names for private anonymous memory
  2020-08-20 21:45     ` Colin Cross
@ 2020-08-20 22:15       ` Cyrill Gorcunov
  2020-08-20 23:51         ` Colin Cross
  0 siblings, 1 reply; 26+ messages in thread
From: Cyrill Gorcunov @ 2020-08-20 22:15 UTC (permalink / raw)
  To: Colin Cross
  Cc: Sumit Semwal, Andrew Morton, Linux-MM, lkml, Alexey Dobriyan,
	Jonathan Corbet, Mauro Carvalho Chehab, Kees Cook, Michal Hocko,
	Alexey Gladkov, Matthew Wilcox, Jason Gunthorpe,
	Kirill A . Shutemov, Michel Lespinasse, Michal Koutný,
	Song Liu, Huang Ying, Vlastimil Babka, Yang Shi, chenqiwu,
	Mathieu Desnoyers, John Hubbard, Thomas Hellstrom, Mike Christie,
	Bart Van Assche, Amit Pundir, Thomas Gleixner, Christian Brauner,
	Daniel Jordan, Adrian Reber, Nicolas Viennot, Al Viro,
	Thomas Cedeno, linux-fsdevel, Pekka Enberg, Dave Hansen,
	Peter Zijlstra, Ingo Molnar, Oleg Nesterov, Eric W. Biederman,
	Jan Glauber, John Stultz, Rob Landley, Serge E. Hallyn,
	David Rientjes, Hugh Dickins, Rik van Riel, Mel Gorman,
	Tang Chen, Robin Holt, Shaohua Li, Sasha Levin, Johannes Weiner,
	Minchan Kim

On Thu, Aug 20, 2020 at 02:45:27PM -0700, Colin Cross wrote:
> >
> > Guys, could you please enlighten me, I don't understand -- we pass some
> > random user-space pointer and save it in vm_area_struct, then in procfs
> > we treat it as a "string" and print it out? What prevents me from putting
> > some crap there and then unmapping this pointer, so that the kernel takes
> > a page fault in the procfs output (in the best scenario)?
> 
> This is the same pattern used for /proc/pid/cmdline.
> access_remote_vm handles addresses in unmapped pages; it will return
> 0 if no bytes were readable.

Yes, I was in this part of the code too long ago and managed to forget. You
know, I'm wondering: do we really need human-readable names here? Maybe it
would be more convenient to keep, say, a u64 number instead? This would
eliminate the need to access the VM at all. From the user-space point of view
there should be no difference in how to recognize such VMAs (by name or by
some ID). Or is there some need specifically for a string?
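
As a side note on the merge rule in the quoted description: the name already
behaves like an opaque ID as far as merging is concerned, since only the
pointer value is compared. A minimal sketch, where model_names_mergeable() is
a hypothetical stand-in for the name part of the can_vma_merge_*() checks:

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Two vmas are name-compatible only if they carry the same pointer
 * (both NULL counts as "both unnamed"). The string contents are never
 * read, so equal strings at different addresses do not merge.
 */
static bool model_names_mergeable(const char *a, const char *b)
{
	return a == b;
}
```

Under this rule the string itself matters only when procfs output is
generated.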



* Re: [PATCH v5 2/2] mm: add a field to store names for private anonymous memory
  2020-08-20  7:58   ` Michal Hocko
@ 2020-08-20 23:28     ` Colin Cross
  2020-08-21  3:21       ` Sumit Semwal
  0 siblings, 1 reply; 26+ messages in thread
From: Colin Cross @ 2020-08-20 23:28 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Sumit Semwal, Andrew Morton, Linux-MM, lkml, Alexey Dobriyan,
	Jonathan Corbet, Mauro Carvalho Chehab, Kees Cook,
	Alexey Gladkov, Matthew Wilcox, Jason Gunthorpe,
	Kirill A . Shutemov, Michel Lespinasse, Michal Koutný,
	Song Liu, Huang Ying, Vlastimil Babka, Yang Shi, chenqiwu,
	Mathieu Desnoyers, John Hubbard, Thomas Hellstrom, Mike Christie,
	Bart Van Assche, Amit Pundir, Thomas Gleixner, Christian Brauner,
	Daniel Jordan, Adrian Reber, Nicolas Viennot, Al Viro,
	Thomas Cedeno, linux-fsdevel, Pekka Enberg, Dave Hansen,
	Peter Zijlstra, Ingo Molnar, Oleg Nesterov, Eric W. Biederman,
	Jan Glauber, John Stultz, Rob Landley, Cyrill Gorcunov,
	Serge E. Hallyn, David Rientjes, Hugh Dickins, Rik van Riel,
	Mel Gorman, Tang Chen, Robin Holt, Shaohua Li, Sasha Levin,
	Johannes Weiner, Minchan Kim

On Thu, Aug 20, 2020 at 12:58 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Wed 19-08-20 19:46:50, Sumit Semwal wrote:
> [...]
> > diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> > index 5066b0251ed8..136fd3c3ad7b 100644
> > --- a/fs/proc/task_mmu.c
> > +++ b/fs/proc/task_mmu.c
> > @@ -97,6 +97,21 @@ unsigned long task_statm(struct mm_struct *mm,
> >       return mm->total_vm;
> >  }
> >
> > +static void seq_print_vma_name(struct seq_file *m, struct vm_area_struct *vma)
> > +{
> > +     struct mm_struct *mm = vma->vm_mm;
> > +     char anon_name[NAME_MAX + 1];
> > +     int n;
> > +
> > +     n = access_remote_vm(mm, (unsigned long)vma_anon_name(vma),
> > +                          anon_name, NAME_MAX, 0);
> > +     if (n > 0) {
> > +             seq_puts(m, "[anon:");
> > +             seq_write(m, anon_name, strnlen(anon_name, n));
> > +             seq_putc(m, ']');
> > +     }
> > +}
> > +
> >  #ifdef CONFIG_NUMA
> >  /*
> >   * Save get_task_policy() for show_numa_map().
> > @@ -319,8 +334,15 @@ show_map_vma(struct seq_file *m, struct vm_area_struct *vma)
> >                       goto done;
> >               }
> >
> > -             if (is_stack(vma))
> > +             if (is_stack(vma)) {
> >                       name = "[stack]";
> > +                     goto done;
> > +             }
> > +
> > +             if (vma_anon_name(vma)) {
> > +                     seq_pad(m, ' ');
> > +                     seq_print_vma_name(m, vma);
> > +             }
> >       }
>
> How can this be safe? access_remote_vm requires mmap_sem (non-exclusive).
> The same is the case for show_map_vma. So what would happen if a task
> sets its own name? IIRC semaphore code doesn't allow read lock nesting
> because any exclusive lock request in the mean time would block further
> readers. Or is this allowed?

Good catch.  The version of this patch that has been in use in the
Android kernel since 2015 [1] doesn't have this issue because it
doesn't use access_remote_vm; it calls get_user_pages_remote directly.
This patch would need to call a version of access_remote_vm that assumes
the mmap_sem is already held.

[1] https://android.googlesource.com/kernel/common/+/60500a42286de35f00d2a195f2021bcc029f11a1%5E%21/#F1
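
The "version that assumes the lock is already held" idea can be sketched
in plain userspace C. This is only an illustration under stated
assumptions: the toy lock below merely counts read holders and stands in
for mmap_sem, and every name in it (toy_lock, toy_mm, copy_name_locked,
copy_name) is hypothetical, not a kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for mmap_sem: it only counts current read holders. */
struct toy_lock {
	int readers;
};

static void toy_read_lock(struct toy_lock *l)
{
	l->readers++;
}

static void toy_read_unlock(struct toy_lock *l)
{
	assert(l->readers > 0);
	l->readers--;
}

struct toy_mm {
	struct toy_lock lock;
	const char *name;	/* stands in for the memory holding the name */
};

/*
 * Caller must already hold mm->lock for reading. This helper never
 * takes the lock itself, so it cannot nest a second read lock.
 * Returns the number of bytes copied, not counting the trailing NUL.
 */
static size_t copy_name_locked(struct toy_mm *mm, char *buf, size_t len)
{
	size_t n;

	assert(mm->lock.readers > 0);
	if (len == 0 || mm->name == NULL)
		return 0;
	n = strlen(mm->name);
	if (n > len - 1)
		n = len - 1;	/* truncate to fit the buffer */
	memcpy(buf, mm->name, n);
	buf[n] = '\0';
	return n;
}

/* Wrapper for callers that do not hold the lock yet. */
static size_t copy_name(struct toy_mm *mm, char *buf, size_t len)
{
	size_t n;

	toy_read_lock(&mm->lock);
	n = copy_name_locked(mm, buf, len);
	toy_read_unlock(&mm->lock);
	return n;
}
```

In this sketch a path like show_map_vma, which already holds the lock,
would call copy_name_locked(), while any other caller would go through
copy_name().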



* Re: [PATCH v5 2/2] mm: add a field to store names for private anonymous memory
  2020-08-20 22:15       ` Cyrill Gorcunov
@ 2020-08-20 23:51         ` Colin Cross
  2020-08-21  7:05           ` Cyrill Gorcunov
  0 siblings, 1 reply; 26+ messages in thread
From: Colin Cross @ 2020-08-20 23:51 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Sumit Semwal, Andrew Morton, Linux-MM, lkml, Alexey Dobriyan,
	Jonathan Corbet, Mauro Carvalho Chehab, Kees Cook, Michal Hocko,
	Alexey Gladkov, Matthew Wilcox, Jason Gunthorpe,
	Kirill A . Shutemov, Michel Lespinasse, Michal Koutný,
	Song Liu, Huang Ying, Vlastimil Babka, Yang Shi, chenqiwu,
	Mathieu Desnoyers, John Hubbard, Thomas Hellstrom, Mike Christie,
	Bart Van Assche, Amit Pundir, Thomas Gleixner, Christian Brauner,
	Daniel Jordan, Adrian Reber, Nicolas Viennot, Al Viro,
	Thomas Cedeno, linux-fsdevel, Pekka Enberg, Dave Hansen,
	Peter Zijlstra, Ingo Molnar, Oleg Nesterov, Eric W. Biederman,
	Jan Glauber, John Stultz, Rob Landley, Serge E. Hallyn,
	David Rientjes, Hugh Dickins, Rik van Riel, Mel Gorman,
	Tang Chen, Robin Holt, Shaohua Li, Sasha Levin, Johannes Weiner,
	Minchan Kim

On Thu, Aug 20, 2020 at 3:15 PM Cyrill Gorcunov <gorcunov@gmail.com> wrote:
>
> On Thu, Aug 20, 2020 at 02:45:27PM -0700, Colin Cross wrote:
> > >
> > > Guys, could you please enlighten me, I don't understand -- we pass some
> > > random user-space pointer and save it in vm_area_struct then in procfs
> > > we treat it as "string" and print out? What prevents me to put some crap
> > > here then unmap this pointer the kernel will cause page fault in procfs
> > > output (in best scenario)?
> >
> > This is the same pattern used for /proc/pid/cmdline.
> > access_remote_vm handles addresses in unmapped pages, it will return
> > 0 if no bytes were readable.
>
> Yes, been in this part of code too long ago, managed to forget. You know
> I'm wondering do we really need a human readable names here? Maybe it would
> be more convenient to keep say u64 number instead? This would eliminate
> need to access VM at all. From user space point of view there should be
> no difference how to recognize such VMAs (by name or by some ID). Or there
> some need for string solely?

Numbers instead of strings would require some central registry to
decode them, which would make it much harder to use.  You can see some
examples of how Android uses it at https://pastebin.com/BQZ1vZnJ for
the cat process and https://pastebin.com/YNUTvZyz for an ART process.
We label individual stacks with their TIDs (useful since the stack TID
annotation was reverted in 65376df582174ffcec9e6471bf5b0dd79ba05e4a),
the various heaps created by different allocators, and much more.  The
data is consumed manually for debugging, as well as by various memory
stats collection and analysis tools.
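
As a rough illustration of how an analysis tool might consume these
labels, here is a minimal sketch that pulls NAME out of a trailing
"[anon:NAME]" field of one /proc/<pid>/maps line. The function name
maps_anon_name and the label used in the usage note are made up for the
example; the only thing taken from the patch is the "[anon:...]" output
format.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy NAME from a "[anon:NAME]" field of one maps line into out.
 * Returns the length of NAME, or -1 if the line carries no such field
 * or NAME does not fit in out (including the trailing NUL).
 */
static int maps_anon_name(const char *line, char *out, size_t outlen)
{
	static const char tag[] = "[anon:";
	const char *start, *end;
	size_t n;

	assert(line != NULL && out != NULL);

	start = strstr(line, tag);
	if (start == NULL)
		return -1;
	start += sizeof(tag) - 1;	/* skip past "[anon:" */

	/* Use the last ']' so a ']' inside the name does not cut it short. */
	end = strrchr(start, ']');
	if (end == NULL)
		return -1;

	n = (size_t)(end - start);
	if (n + 1 > outlen)
		return -1;
	memcpy(out, start, n);
	out[n] = '\0';
	return (int)n;
}
```

For a line ending in "[anon:example_heap]" this copies "example_heap"
into out; for a "[stack]" line it returns -1. A real tool would also have
to cope with file paths that happen to contain "[anon:", which this
sketch does not attempt.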



* Re: [PATCH v5 2/2] mm: add a field to store names for private anonymous memory
  2020-08-20 16:00   ` Oleg Nesterov
  2020-08-20 21:15     ` Kees Cook
@ 2020-08-21  3:14     ` Sumit Semwal
  1 sibling, 0 replies; 26+ messages in thread
From: Sumit Semwal @ 2020-08-21  3:14 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, linux-mm, LKML, Alexey Dobriyan, Jonathan Corbet,
	Mauro Carvalho Chehab, Kees Cook, Michal Hocko, Colin Cross,
	Alexey Gladkov, Matthew Wilcox, Jason Gunthorpe,
	Kirill A . Shutemov, Michel Lespinasse, Michal Koutný,
	Song Liu, Huang Ying, Vlastimil Babka, Yang Shi, chenqiwu,
	Mathieu Desnoyers, John Hubbard, Thomas Hellstrom, Mike Christie,
	Bart Van Assche, Amit Pundir, Thomas Gleixner, Christian Brauner,
	Daniel Jordan, Adrian Reber, Nicolas Viennot, Al Viro,
	Thomas Cedeno, linux-fsdevel, Pekka Enberg, Dave Hansen,
	Peter Zijlstra, Ingo Molnar, Eric W. Biederman, Jan Glauber,
	John Stultz, Rob Landley, Cyrill Gorcunov, Serge E. Hallyn,
	David Rientjes, Hugh Dickins, Rik van Riel, Mel Gorman,
	Tang Chen, Robin Holt, Shaohua Li, Sasha Levin, Johannes Weiner,
	Minchan Kim

Hello Oleg,

Thanks for the review.

On Thu, 20 Aug 2020 at 21:30, Oleg Nesterov <oleg@redhat.com> wrote:
>
> On 08/19, Sumit Semwal wrote:
> >
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -2484,7 +2484,7 @@ static inline int vma_adjust(struct vm_area_struct *vma, unsigned long start,
> >  extern struct vm_area_struct *vma_merge(struct mm_struct *,
> >       struct vm_area_struct *prev, unsigned long addr, unsigned long end,
> >       unsigned long vm_flags, struct anon_vma *, struct file *, pgoff_t,
> > -     struct mempolicy *, struct vm_userfaultfd_ctx);
> > +     struct mempolicy *, struct vm_userfaultfd_ctx, const char __user *);
>
> It seems that you forgot to update the callers in fs/userfaultfd.c ?
Yes, I did :( - and it didn't get caught with the config I was testing
it with. Apologies about that, I will update in the upcoming version.
>
> Oleg.
>

Best,
Sumit.



* Re: [PATCH v5 2/2] mm: add a field to store names for private anonymous memory
  2020-08-20 21:15     ` Kees Cook
@ 2020-08-21  3:15       ` Sumit Semwal
  0 siblings, 0 replies; 26+ messages in thread
From: Sumit Semwal @ 2020-08-21  3:15 UTC (permalink / raw)
  To: Kees Cook
  Cc: Oleg Nesterov, Andrew Morton, linux-mm, LKML, Alexey Dobriyan,
	Jonathan Corbet, Mauro Carvalho Chehab, Michal Hocko,
	Colin Cross, Alexey Gladkov, Matthew Wilcox, Jason Gunthorpe,
	Kirill A . Shutemov, Michel Lespinasse, Michal Koutný,
	Song Liu, Huang Ying, Vlastimil Babka, Yang Shi, chenqiwu,
	Mathieu Desnoyers, John Hubbard, Mike Christie, Bart Van Assche,
	Amit Pundir, Thomas Gleixner, Christian Brauner, Daniel Jordan,
	Adrian Reber, Nicolas Viennot, Al Viro, Thomas Cedeno,
	linux-fsdevel, Pekka Enberg, Dave Hansen, Peter Zijlstra,
	Ingo Molnar, Eric W. Biederman, Jan Glauber, John Stultz,
	Rob Landley, Cyrill Gorcunov, Serge E. Hallyn, David Rientjes,
	Hugh Dickins, Mel Gorman, Robin Holt, Shaohua Li,
	Johannes Weiner, Minchan Kim

Hello Kees,

On Fri, 21 Aug 2020 at 02:45, Kees Cook <keescook@chromium.org> wrote:
>
> On Thu, Aug 20, 2020 at 06:00:14PM +0200, Oleg Nesterov wrote:
> > On 08/19, Sumit Semwal wrote:
> > >
> > > --- a/include/linux/mm.h
> > > +++ b/include/linux/mm.h
> > > @@ -2484,7 +2484,7 @@ static inline int vma_adjust(struct vm_area_struct *vma, unsigned long start,
> > >  extern struct vm_area_struct *vma_merge(struct mm_struct *,
> > >     struct vm_area_struct *prev, unsigned long addr, unsigned long end,
> > >     unsigned long vm_flags, struct anon_vma *, struct file *, pgoff_t,
> > > -   struct mempolicy *, struct vm_userfaultfd_ctx);
> > > +   struct mempolicy *, struct vm_userfaultfd_ctx, const char __user *);
> >
> > It seems that you forgot to update the callers in fs/userfaultfd.c ?
>
> (I recommend including "make allmodconfig && make" in your test workflow.)

Yes, indeed :| - that was stupid of me!
>
> --
> Kees Cook

Best,
Sumit.



* Re: [PATCH v5 2/2] mm: add a field to store names for private anonymous memory
  2020-08-20 23:28     ` Colin Cross
@ 2020-08-21  3:21       ` Sumit Semwal
  2020-08-21  7:24         ` Michal Hocko
  0 siblings, 1 reply; 26+ messages in thread
From: Sumit Semwal @ 2020-08-21  3:21 UTC (permalink / raw)
  To: Colin Cross
  Cc: Michal Hocko, Andrew Morton, Linux-MM, lkml, Alexey Dobriyan,
	Jonathan Corbet, Mauro Carvalho Chehab, Kees Cook,
	Alexey Gladkov, Matthew Wilcox, Jason Gunthorpe,
	Kirill A . Shutemov, Michel Lespinasse, Michal Koutný,
	Song Liu, Huang Ying, Vlastimil Babka, Yang Shi, chenqiwu,
	Mathieu Desnoyers, John Hubbard, Mike Christie, Bart Van Assche,
	Amit Pundir, Thomas Gleixner, Christian Brauner, Daniel Jordan,
	Adrian Reber, Nicolas Viennot, Al Viro, Thomas Cedeno,
	linux-fsdevel, Pekka Enberg, Dave Hansen, Peter Zijlstra,
	Ingo Molnar, Oleg Nesterov, Eric W. Biederman, Jan Glauber,
	John Stultz, Rob Landley, Cyrill Gorcunov, Serge E. Hallyn,
	David Rientjes, Hugh Dickins, Mel Gorman, Robin Holt, Shaohua Li,
	Johannes Weiner, Minchan Kim

Hi Colin,

On Fri, 21 Aug 2020 at 04:58, Colin Cross <ccross@android.com> wrote:
>
> On Thu, Aug 20, 2020 at 12:58 AM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Wed 19-08-20 19:46:50, Sumit Semwal wrote:
> > [...]
> > > diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> > > index 5066b0251ed8..136fd3c3ad7b 100644
> > > --- a/fs/proc/task_mmu.c
> > > +++ b/fs/proc/task_mmu.c
> > > @@ -97,6 +97,21 @@ unsigned long task_statm(struct mm_struct *mm,
> > >       return mm->total_vm;
> > >  }
> > >
> > > +static void seq_print_vma_name(struct seq_file *m, struct vm_area_struct *vma)
> > > +{
> > > +     struct mm_struct *mm = vma->vm_mm;
> > > +     char anon_name[NAME_MAX + 1];
> > > +     int n;
> > > +
> > > +     n = access_remote_vm(mm, (unsigned long)vma_anon_name(vma),
> > > +                          anon_name, NAME_MAX, 0);
> > > +     if (n > 0) {
> > > +             seq_puts(m, "[anon:");
> > > +             seq_write(m, anon_name, strnlen(anon_name, n));
> > > +             seq_putc(m, ']');
> > > +     }
> > > +}
> > > +
> > >  #ifdef CONFIG_NUMA
> > >  /*
> > >   * Save get_task_policy() for show_numa_map().
> > > @@ -319,8 +334,15 @@ show_map_vma(struct seq_file *m, struct vm_area_struct *vma)
> > >                       goto done;
> > >               }
> > >
> > > -             if (is_stack(vma))
> > > +             if (is_stack(vma)) {
> > >                       name = "[stack]";
> > > +                     goto done;
> > > +             }
> > > +
> > > +             if (vma_anon_name(vma)) {
> > > +                     seq_pad(m, ' ');
> > > +                     seq_print_vma_name(m, vma);
> > > +             }
> > >       }
> >
> > How can this be safe? access_remote_vm requires mmap_sem (non-exclusive).
> > The same is the case for show_map_vma. So what would happen if a task
> > sets its own name? IIRC semaphore code doesn't allow read lock nesting
> > because any exclusive lock request in the mean time would block further
> > readers. Or is this allowed?
>
> Good catch.  The version of this patch that has been in use in the
> Android kernel since 2015 [1] doesn't have this issue because it
> doesn't use access_remote_vm, it calls get_user_pages_remote directly.
> This would need to call a version of access_remote_vm that assumes the
> mmap_sem is already held.

Indeed. So does it sound OK to add an access_remote_vma_mmap_lockheld() version?

>
> [1] https://android.googlesource.com/kernel/common/+/60500a42286de35f00d2a195f2021bcc029f11a1%5E%21/#F1

Best,
Sumit.



* Re: [PATCH v5 2/2] mm: add a field to store names for private anonymous memory
  2020-08-20 23:51         ` Colin Cross
@ 2020-08-21  7:05           ` Cyrill Gorcunov
  0 siblings, 0 replies; 26+ messages in thread
From: Cyrill Gorcunov @ 2020-08-21  7:05 UTC (permalink / raw)
  To: Colin Cross
  Cc: Sumit Semwal, Andrew Morton, Linux-MM, lkml, Alexey Dobriyan,
	Jonathan Corbet, Mauro Carvalho Chehab, Kees Cook, Michal Hocko,
	Alexey Gladkov, Matthew Wilcox, Jason Gunthorpe,
	Kirill A . Shutemov, Michel Lespinasse, Michal Koutný,
	Song Liu, Huang Ying, Vlastimil Babka, Yang Shi, chenqiwu,
	Mathieu Desnoyers, John Hubbard, Thomas Hellstrom, Mike Christie,
	Bart Van Assche, Amit Pundir, Thomas Gleixner, Christian Brauner,
	Daniel Jordan, Adrian Reber, Nicolas Viennot, Al Viro,
	Thomas Cedeno, linux-fsdevel, Pekka Enberg, Dave Hansen,
	Peter Zijlstra, Ingo Molnar, Oleg Nesterov, Eric W. Biederman,
	Jan Glauber, John Stultz, Rob Landley, Serge E. Hallyn,
	David Rientjes, Hugh Dickins, Rik van Riel, Mel Gorman,
	Tang Chen, Robin Holt, Shaohua Li, Sasha Levin, Johannes Weiner,
	Minchan Kim

On Thu, Aug 20, 2020 at 04:51:57PM -0700, Colin Cross wrote:
> >
> > Yes, been in this part of code too long ago, managed to forget. You know
> > I'm wondering do we really need a human readable names here? Maybe it would
> > be more convenient to keep say u64 number instead? This would eliminate
> > need to access VM at all. From user space point of view there should be
> > no difference how to recognize such VMAs (by name or by some ID). Or there
> > some need for string solely?
> 
> Numbers instead of strings would require some central registry to
> decode them, which would make it much harder to use.  You can see some

This is no different from numeric constants. With strings you simply
need a central registry of strings instead.

enum anon_codes {
	atexit_handlers		= 1,
	linker_alloc		= 2,
	...
};

ed432000-ed433000 r--p 00000000 00:00 0                                  [anon:1]
ed442000-ed4a6000 r--p 00000000 00:00 0                                  [anon:2]
	...

The thing is that the string constants have no meaning outside of the
user-space application which uses them. Moreover, a user (who would read
them via procfs) must make no assumptions about what the names mean at all.
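
To make the trade-off concrete, here is a small sketch of the decode
table that every reader of numeric IDs would need. The enum reuses the
two example codes above; the table and the anon_code_decode() helper are
hypothetical and only show the shape of such a registry.

```c
#include <assert.h>
#include <stddef.h>

enum anon_codes {
	atexit_handlers	= 1,
	linker_alloc	= 2,
};

/* The "central registry": indexed by code, slot 0 is left NULL. */
static const char *const anon_code_names[] = {
	[atexit_handlers]	= "atexit_handlers",
	[linker_alloc]		= "linker_alloc",
};

/* Fails the build if a code is added without extending the table. */
static_assert(sizeof(anon_code_names) / sizeof(anon_code_names[0]) ==
	      linker_alloc + 1, "table must cover every code");

/* Returns the label for id, or NULL if this registry does not know it. */
static const char *anon_code_decode(unsigned long long id)
{
	size_t n = sizeof(anon_code_names) / sizeof(anon_code_names[0]);

	if (id >= n)
		return NULL;
	return anon_code_names[id];
}
```

A tool reading "[anon:2]" would call anon_code_decode(2) to print
"linker_alloc"; an ID written by a program with a newer or different
table decodes to NULL, which is the registry problem in miniature.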

> examples of how Android uses it at https://pastebin.com/BQZ1vZnJ for
> the cat process and https://pastebin.com/YNUTvZyz for an ART process.
> We label individual stacks with their TIDs (useful since the stack TID
> annotation was reverted in 65376df582174ffcec9e6471bf5b0dd79ba05e4a),
> the various heaps created by different allocators, and much more.  The
> data is consumed manually for debugging, as well as by various memory
> stats collection and analysis tools.

Aha, thanks! I see the labeling. So basically the benefits of string
constants are the following:

 - they are human readable
 - they can be up to 255 bytes long (for now)

While the downside is:

 - much longer procfs output

I don't have a strong opinion here. Still, thanks a lot for the examples.



* Re: [PATCH v5 2/2] mm: add a field to store names for private anonymous memory
  2020-08-21  3:21       ` Sumit Semwal
@ 2020-08-21  7:24         ` Michal Hocko
  2020-08-21  7:53           ` Michal Hocko
  0 siblings, 1 reply; 26+ messages in thread
From: Michal Hocko @ 2020-08-21  7:24 UTC (permalink / raw)
  To: Sumit Semwal
  Cc: Colin Cross, Andrew Morton, Linux-MM, lkml, Alexey Dobriyan,
	Jonathan Corbet, Mauro Carvalho Chehab, Kees Cook,
	Alexey Gladkov, Matthew Wilcox, Jason Gunthorpe,
	Kirill A . Shutemov, Michel Lespinasse, Michal Koutný,
	Song Liu, Huang Ying, Vlastimil Babka, Yang Shi, chenqiwu,
	Mathieu Desnoyers, John Hubbard, Mike Christie, Bart Van Assche,
	Amit Pundir, Thomas Gleixner, Christian Brauner, Daniel Jordan,
	Adrian Reber, Nicolas Viennot, Al Viro, Thomas Cedeno,
	linux-fsdevel, Pekka Enberg, Dave Hansen, Peter Zijlstra,
	Ingo Molnar, Oleg Nesterov, Eric W. Biederman, Jan Glauber,
	John Stultz, Rob Landley, Cyrill Gorcunov, Serge E. Hallyn,
	David Rientjes, Hugh Dickins, Mel Gorman, Robin Holt, Shaohua Li,
	Johannes Weiner, Minchan Kim

On Fri 21-08-20 08:51:57, Sumit Semwal wrote:
> Hi Colin,
> 
> On Fri, 21 Aug 2020 at 04:58, Colin Cross <ccross@android.com> wrote:
> >
> > On Thu, Aug 20, 2020 at 12:58 AM Michal Hocko <mhocko@suse.com> wrote:
> > >
> > > On Wed 19-08-20 19:46:50, Sumit Semwal wrote:
> > > [...]
> > > > diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> > > > index 5066b0251ed8..136fd3c3ad7b 100644
> > > > --- a/fs/proc/task_mmu.c
> > > > +++ b/fs/proc/task_mmu.c
> > > > @@ -97,6 +97,21 @@ unsigned long task_statm(struct mm_struct *mm,
> > > >       return mm->total_vm;
> > > >  }
> > > >
> > > > +static void seq_print_vma_name(struct seq_file *m, struct vm_area_struct *vma)
> > > > +{
> > > > +     struct mm_struct *mm = vma->vm_mm;
> > > > +     char anon_name[NAME_MAX + 1];
> > > > +     int n;
> > > > +
> > > > +     n = access_remote_vm(mm, (unsigned long)vma_anon_name(vma),
> > > > +                          anon_name, NAME_MAX, 0);
> > > > +     if (n > 0) {
> > > > +             seq_puts(m, "[anon:");
> > > > +             seq_write(m, anon_name, strnlen(anon_name, n));
> > > > +             seq_putc(m, ']');
> > > > +     }
> > > > +}
> > > > +
> > > >  #ifdef CONFIG_NUMA
> > > >  /*
> > > >   * Save get_task_policy() for show_numa_map().
> > > > @@ -319,8 +334,15 @@ show_map_vma(struct seq_file *m, struct vm_area_struct *vma)
> > > >                       goto done;
> > > >               }
> > > >
> > > > -             if (is_stack(vma))
> > > > +             if (is_stack(vma)) {
> > > >                       name = "[stack]";
> > > > +                     goto done;
> > > > +             }
> > > > +
> > > > +             if (vma_anon_name(vma)) {
> > > > +                     seq_pad(m, ' ');
> > > > +                     seq_print_vma_name(m, vma);
> > > > +             }
> > > >       }
> > >
> > > How can this be safe? access_remote_vm requires mmap_sem (non-exclusive).
> > > The same is the case for show_map_vma. So what would happen if a task
> > > sets its own name? IIRC semaphore code doesn't allow read lock nesting
> > > because any exclusive lock request in the mean time would block further
> > > readers. Or is this allowed?
> >
> > Good catch.  The version of this patch that has been in use in the
> > Android kernel since 2015 [1] doesn't have this issue because it
> > doesn't use access_remote_vm, it calls get_user_pages_remote directly.
> > This would need to call a version of access_remote_vm that assumes the
> > mmap_sem is already held.
> 
> Indeed. so does it sound ok to add an access_remote_vma_mmap_lockheld() version?

You will still need to take the lock if the pointer belongs to a remote
address space. But how are you going to find out?
-- 
Michal Hocko
SUSE Labs



* Re: [PATCH v5 2/2] mm: add a field to store names for private anonymous memory
  2020-08-21  7:24         ` Michal Hocko
@ 2020-08-21  7:53           ` Michal Hocko
  2020-08-21  8:02             ` Sumit Semwal
  0 siblings, 1 reply; 26+ messages in thread
From: Michal Hocko @ 2020-08-21  7:53 UTC (permalink / raw)
  To: Sumit Semwal
  Cc: Colin Cross, Andrew Morton, Linux-MM, lkml, Alexey Dobriyan,
	Jonathan Corbet, Mauro Carvalho Chehab, Kees Cook,
	Alexey Gladkov, Matthew Wilcox, Jason Gunthorpe,
	Kirill A . Shutemov, Michel Lespinasse, Michal Koutný,
	Song Liu, Huang Ying, Vlastimil Babka, Yang Shi, chenqiwu,
	Mathieu Desnoyers, John Hubbard, Mike Christie, Bart Van Assche,
	Amit Pundir, Thomas Gleixner, Christian Brauner, Daniel Jordan,
	Adrian Reber, Nicolas Viennot, Al Viro, Thomas Cedeno,
	linux-fsdevel, Pekka Enberg, Dave Hansen, Peter Zijlstra,
	Ingo Molnar, Oleg Nesterov, Eric W. Biederman, Jan Glauber,
	John Stultz, Rob Landley, Cyrill Gorcunov, Serge E. Hallyn,
	David Rientjes, Hugh Dickins, Mel Gorman, Robin Holt, Shaohua Li,
	Johannes Weiner, Minchan Kim

On Fri 21-08-20 09:24:02, Michal Hocko wrote:
> On Fri 21-08-20 08:51:57, Sumit Semwal wrote:
[...]
> > Indeed. so does it sound ok to add an access_remote_vma_mmap_lockheld() version?
> 
> You will still need to take the lock if the pointer belongs to a remote
> address space. But how are you going to find out?

Scratch that. I didn't realize prctl is always called with the current
context. So there will never be a pointer from a remote process. Going
with a variant which doesn't take mmap_sem would be safe.

Sorry about the confusion.
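
For reference, a minimal userspace sketch of a task naming one of its own
mappings, which is why the name pointer always comes from the caller's
address space. It assumes the prctl interface from this series,
prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, addr, len, name); the fallback
constant values are the ones found in the Android and mainline uapi
headers and are assumed to match this patch. name_anon_region() is a
hypothetical helper.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/prctl.h>

/* Assumed values, only used when the installed headers lack them. */
#ifndef PR_SET_VMA
#define PR_SET_VMA		0x53564d41
#endif
#ifndef PR_SET_VMA_ANON_NAME
#define PR_SET_VMA_ANON_NAME	0
#endif

/*
 * Ask the kernel to label [addr, addr + len) in the calling process.
 * With the patch discussed here the kernel keeps the user pointer, so
 * name should stay mapped for as long as the label is in use.
 * Returns 0 on success, or -1 with errno set (for example on kernels
 * built without this feature).
 */
static int name_anon_region(void *addr, size_t len, const char *name)
{
	assert(addr != NULL && name != NULL);
	return prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME,
		     (unsigned long)addr, (unsigned long)len,
		     (unsigned long)name);
}
```

A caller would mmap() a private anonymous region, pass it to
name_anon_region() together with a string in static storage, and treat a
failure as non-fatal, since the label is only a debugging aid.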
-- 
Michal Hocko
SUSE Labs



* Re: [PATCH v5 2/2] mm: add a field to store names for private anonymous memory
  2020-08-21  7:53           ` Michal Hocko
@ 2020-08-21  8:02             ` Sumit Semwal
  0 siblings, 0 replies; 26+ messages in thread
From: Sumit Semwal @ 2020-08-21  8:02 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Colin Cross, Andrew Morton, Linux-MM, lkml, Alexey Dobriyan,
	Jonathan Corbet, Mauro Carvalho Chehab, Kees Cook,
	Alexey Gladkov, Matthew Wilcox, Jason Gunthorpe,
	Kirill A . Shutemov, Michel Lespinasse, Michal Koutný,
	Song Liu, Huang Ying, Vlastimil Babka, Yang Shi, chenqiwu,
	Mathieu Desnoyers, John Hubbard, Mike Christie, Bart Van Assche,
	Amit Pundir, Thomas Gleixner, Christian Brauner, Daniel Jordan,
	Adrian Reber, Nicolas Viennot, Al Viro, Thomas Cedeno,
	linux-fsdevel, Pekka Enberg, Dave Hansen, Peter Zijlstra,
	Ingo Molnar, Oleg Nesterov, Eric W. Biederman, Jan Glauber,
	John Stultz, Rob Landley, Cyrill Gorcunov, Serge E. Hallyn,
	David Rientjes, Hugh Dickins, Mel Gorman, Robin Holt, Shaohua Li,
	Johannes Weiner, Minchan Kim

Hi Michal,

On Fri, 21 Aug 2020 at 13:23, Michal Hocko <mhocko@suse.com> wrote:
>
> On Fri 21-08-20 09:24:02, Michal Hocko wrote:
> > On Fri 21-08-20 08:51:57, Sumit Semwal wrote:
> [...]
> > > Indeed. so does it sound ok to add an access_remote_vma_mmap_lockheld() version?
> >
> > You will still need to take the lock if the pointer belongs to a remote
> > address space. But how are you going to find out?
>
> Scratch that. I didn't realize prctl is always called with the current
> context. So there will never be a pointer from a remote process. Going
> with a variant which doesn't take mmap_sem would be safe.

Thanks much for the review and confirmation. I will prepare the
updated patchset and send it out with the review comments
incorporated.
>
> Sorry about the confusion.

Not at all!

> --
> Michal Hocko
> SUSE Labs

Best,
Sumit.




Thread overview: 26+ messages
2020-08-19 14:16 [PATCH v5 0/2] Anonymous VMA naming patches Sumit Semwal
2020-08-19 14:16 ` [PATCH v5 1/2] mm: rearrange madvise code to allow for reuse Sumit Semwal
2020-08-19 14:16 ` [PATCH v5 2/2] mm: add a field to store names for private anonymous memory Sumit Semwal
2020-08-19 14:37   ` Michal Hocko
2020-08-19 15:02   ` Matthew Wilcox
2020-08-19 17:48     ` Sumit Semwal
2020-08-20 15:46     ` Oleg Nesterov
2020-08-19 17:14   ` kernel test robot
2020-08-19 17:42   ` kernel test robot
2020-08-20  7:58   ` Michal Hocko
2020-08-20 23:28     ` Colin Cross
2020-08-21  3:21       ` Sumit Semwal
2020-08-21  7:24         ` Michal Hocko
2020-08-21  7:53           ` Michal Hocko
2020-08-21  8:02             ` Sumit Semwal
2020-08-20 16:00   ` Oleg Nesterov
2020-08-20 21:15     ` Kees Cook
2020-08-21  3:15       ` Sumit Semwal
2020-08-21  3:14     ` Sumit Semwal
2020-08-20 21:40   ` Cyrill Gorcunov
2020-08-20 21:45     ` Colin Cross
2020-08-20 22:15       ` Cyrill Gorcunov
2020-08-20 23:51         ` Colin Cross
2020-08-21  7:05           ` Cyrill Gorcunov
2020-08-20 21:45     ` Dave Hansen
2020-08-20 21:59       ` Cyrill Gorcunov
