From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: Sumit Semwal <sumit.semwal@linaro.org>
Cc: "Andrew Morton" <akpm@linux-foundation.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	"Alexey Dobriyan" <adobriyan@gmail.com>,
	"Jonathan Corbet" <corbet@lwn.net>,
	"Mauro Carvalho Chehab" <mchehab+huawei@kernel.org>,
	"Kees Cook" <keescook@chromium.org>,
	"Michal Hocko" <mhocko@suse.com>,
	"Colin Cross" <ccross@google.com>,
	"Alexey Gladkov" <gladkov.alexey@gmail.com>,
	"Matthew Wilcox" <willy@infradead.org>,
	"Jason Gunthorpe" <jgg@ziepe.ca>,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	"Michel Lespinasse" <walken@google.com>,
	"Michal Koutný" <mkoutny@suse.com>,
	"Song Liu" <songliubraving@fb.com>,
	"Huang Ying" <ying.huang@intel.com>,
	"Vlastimil Babka" <vbabka@suse.cz>,
	"Yang Shi" <yang.shi@linux.alibaba.com>,
	chenqiwu <chenqiwu@xiaomi.com>,
	"Mathieu Desnoyers" <mathieu.desnoyers@efficios.com>,
	"John Hubbard" <jhubbard@nvidia.com>,
	"Mike Christie" <mchristi@redhat.com>,
	"Bart Van Assche" <bvanassche@acm.org>,
	"Amit Pundir" <amit.pundir@linaro.org>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Christian Brauner" <christian.brauner@ubuntu.com>,
	"Daniel Jordan" <daniel.m.jordan@oracle.com>,
	"Adrian Reber" <areber@redhat.com>,
	"Nicolas Viennot" <Nicolas.Viennot@twosigma.com>,
	"Al Viro" <viro@zeniv.linux.org.uk>,
	linux-fsdevel@vger.kernel.org,
	"John Stultz" <john.stultz@linaro.org>,
	"Pekka Enberg" <penberg@kernel.org>,
	"Dave Hansen" <dave.hansen@intel.com>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Ingo Molnar" <mingo@kernel.org>,
	"Oleg Nesterov" <oleg@redhat.com>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	"Jan Glauber" <jan.glauber@gmail.com>,
	"Rob Landley" <rob@landley.net>,
	"Cyrill Gorcunov" <gorcunov@openvz.org>,
	"Serge E. Hallyn" <serge.hallyn@ubuntu.com>,
	"David Rientjes" <rientjes@google.com>,
	"Hugh Dickins" <hughd@google.com>,
	"Rik van Riel" <riel@redhat.com>, "Mel Gorman" <mgorman@suse.de>,
	"Tang Chen" <tangchen@cn.fujitsu.com>,
	"Robin Holt" <holt@sgi.com>, "Shaohua Li" <shli@fusionio.com>,
	"Sasha Levin" <sasha.levin@oracle.com>,
	"Johannes Weiner" <hannes@cmpxchg.org>,
	"Minchan Kim" <minchan@kernel.org>
Subject: Re: [PATCH v7 3/3] mm: add a field to store names for private anonymous memory
Date: Thu, 3 Sep 2020 16:25:37 +0300
Message-ID: <20200903132537.mp5e6o6ptgbkghxe@box>
In-Reply-To: <20200901161459.11772-4-sumit.semwal@linaro.org>

On Tue, Sep 01, 2020 at 09:44:59PM +0530, Sumit Semwal wrote:
> From: Colin Cross <ccross@google.com>
> 
> In many userspace applications, and especially in VM based applications
> like Android uses heavily, there are multiple different allocators in use.
>  At a minimum there is libc malloc and the stack, and in many cases there
> are libc malloc, the stack, direct syscalls to mmap anonymous memory, and
> multiple VM heaps (one for small objects, one for big objects, etc.).
> Each of these layers usually has its own tools to inspect its usage;
> malloc by compiling a debug version, the VM through heap inspection tools,
> and for direct syscalls there is usually no way to track them.
> 
> On Android we heavily use a set of tools that use an extended version of
> the logic covered in Documentation/vm/pagemap.txt to walk all pages mapped
> in userspace and slice their usage by process, shared (COW) vs.  unique
> mappings, backing, etc.  This can account for real physical memory usage
> even in cases like fork without exec (which Android uses heavily to share
> as many private COW pages as possible between processes), Kernel SamePage
> Merging, and clean zero pages.  It produces a measurement of the pages
> that only exist in that process (USS, for unique), and a measurement of
> the physical memory usage of that process with the cost of shared pages
> being evenly split between processes that share them (PSS).
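
The USS/PSS accounting described in the quoted paragraph can be sketched
as below. This is only an illustration of the arithmetic, not the Android
tooling: struct page_use, struct mem_usage and account_pages() are
hypothetical names, and the per-page map count is assumed to have been
collected already by a pagemap walk.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record: how many processes map one physical page. */
struct page_use {
	unsigned int mapcount;	/* always >= 1 for a mapped page */
};

struct mem_usage {
	double uss_pages;	/* pages mapped by this process only */
	double pss_pages;	/* each page weighted by 1/mapcount */
};

/* Sum USS and PSS (in pages) over the pages mapped by one process. */
static struct mem_usage account_pages(const struct page_use *pages, size_t n)
{
	struct mem_usage u = { 0.0, 0.0 };

	for (size_t i = 0; i < n; i++) {
		assert(pages[i].mapcount > 0);
		if (pages[i].mapcount == 1)
			u.uss_pages += 1.0;
		/* The cost of a shared page is split evenly among its sharers. */
		u.pss_pages += 1.0 / pages[i].mapcount;
	}
	return u;
}
```

For two private pages, one page shared by two processes and one shared by
four, this gives a USS of 2 pages and a PSS of 2.75 pages.
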
> 
> If all anonymous memory is indistinguishable then figuring out the real
> physical memory usage (PSS) of each heap requires either a pagemap walking
> tool that can understand the heap debugging of every layer, or for every
> layer's heap debugging tools to implement the pagemap walking logic, in
> which case it is hard to get a consistent view of memory across the whole
> system.
> 
> Tracking the information in userspace leads to all sorts of problems.
> It either needs to be stored inside the process, which means every
> process has to have an API to export its current heap information upon
> request, or it has to be stored externally in a filesystem that
> somebody needs to clean up on crashes.  It needs to be readable while
> the process is still running, so it has to have some sort of
> synchronization with every layer of userspace.  Efficiently tracking
> the ranges requires reimplementing something like the kernel vma
> trees, and linking to it from every layer of userspace.  It requires
> more memory, more syscalls, more runtime cost, and more complexity to
> separately track regions that the kernel is already tracking.
> 
> This patch adds a field to /proc/pid/maps and /proc/pid/smaps to show a
> userspace-provided name for anonymous vmas.  The names of named anonymous
> vmas are shown in /proc/pid/maps and /proc/pid/smaps as [anon:<name>].

Hm. I guess that there might be tools that expect the field to be empty
for anonymous memory, no?
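
To make the concern concrete, here is a rough sketch of how a tool might
classify the name field of a /proc/pid/maps line. The enum and
classify_maps_line() are hypothetical; the "[anon:" prefix is taken from
the patch description above. A parser that treats any non-empty name as
a file path, or any "[...]" name as one of the known special regions,
would need to learn about the new case.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum vma_kind { VMA_BAD, VMA_UNNAMED, VMA_SPECIAL, VMA_ANON_NAMED, VMA_FILE };

/* Classify one maps line by the text that follows the inode column. */
static enum vma_kind classify_maps_line(const char *line)
{
	unsigned long start, end, offset, inode;
	unsigned int major, minor;
	char perms[5];
	int consumed = 0;

	assert(line != NULL);
	/* %n records how many characters the fixed columns took up. */
	if (sscanf(line, "%lx-%lx %4s %lx %x:%x %lu%n", &start, &end, perms,
		   &offset, &major, &minor, &inode, &consumed) != 7)
		return VMA_BAD;

	const char *name = line + consumed;
	name += strspn(name, " \t");
	if (*name == '\0' || *name == '\n')
		return VMA_UNNAMED;		/* what tools see today */
	if (strncmp(name, "[anon:", 6) == 0)
		return VMA_ANON_NAMED;		/* new with this patch */
	if (*name == '[')
		return VMA_SPECIAL;		/* [heap], [stack], [vdso], ... */
	return VMA_FILE;
}
```
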
 
> Userspace can set the name for a region of memory by calling
> prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, start, len, (unsigned long)name);
> Setting the name to NULL clears it.
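
For reference, a minimal userspace wrapper around the call quoted above
might look like this. name_anon_region() is a hypothetical helper. The
fallback values for PR_SET_VMA and PR_SET_VMA_ANON_NAME are the ones in
the later mainline <linux/prctl.h> and are assumed here to match this
series. Kernels without the feature are expected to fail the call, so
the wrapper returns a negative errno instead of aborting.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/prctl.h>

/* Assumption: values as in the later mainline uapi header. */
#ifndef PR_SET_VMA
#define PR_SET_VMA 0x53564d41
#endif
#ifndef PR_SET_VMA_ANON_NAME
#define PR_SET_VMA_ANON_NAME 0
#endif

/*
 * Name (or, with name == NULL, un-name) the anonymous region
 * [start, start + len). Returns 0 on success or a negative errno.
 */
static int name_anon_region(void *start, size_t len, const char *name)
{
	if (prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, (unsigned long)start,
		  (unsigned long)len, (unsigned long)name) == -1)
		return -errno;
	return 0;
}
```

With this version of the patch the kernel keeps the user pointer rather
than a copy, so the string passed as name has to stay valid at the same
address for as long as the mapping carries it; a string literal or other
static storage is the simplest choice.
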
> 
> The name is stored in a user pointer in the shared union in vm_area_struct
> that points to a null terminated string inside the user process.  vmas
> that point to the same address and are otherwise mergeable will be merged,
> but vmas that point to equivalent strings at different addresses will not
> be merged.
> 
> The idea to store a userspace pointer to reduce the complexity within mm
> (at the expense of the complexity of reading /proc/pid/mem) came from Dave
> Hansen.  This results in no runtime overhead in the mm subsystem other
> than comparing the anon_name pointers when considering vma merging.  The
> pointer is stored in a union with fields that are only used on file-backed
> mappings, so it does not increase memory usage.
> (Upstream changed to remove the union, so this patch adds it back as well)

IIUC, this gives userspace direct control over the content of
/proc/$PID/maps and /proc/$PID/smaps. There's no verification of the
given string whatsoever. I'm sure security experts would find clever
uses for the feature :P
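
As an illustration of the kind of check that is missing, something along
these lines would at least keep a name from forging extra lines or fake
"[...]" entries in the output. This is a hypothetical sketch, not part of
the patch: anon_name_is_safe(), the 80-character limit and the rejected
character set are all assumptions chosen for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define ANON_NAME_MAX_LEN 80	/* hypothetical limit */

/*
 * Accept only short, non-empty, printable names. Newlines and other
 * control characters could forge whole lines; brackets and backslashes
 * could confuse parsers of the "[anon:<name>]" form.
 */
static bool anon_name_is_safe(const char *name)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; name[i] != '\0'; i++) {
		unsigned char c = (unsigned char)name[i];

		if (i >= ANON_NAME_MAX_LEN)
			return false;
		if (!isprint(c) || strchr("[]\\", c) != NULL)
			return false;
	}
	return i > 0;
}
```

Because this patch stores a user pointer and reads the string only when
/proc is read, a check like this would have to run at read time, or the
string would have to be copied into the kernel when it is set.
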

-- 
 Kirill A. Shutemov


