From: Suren Baghdasaryan <surenb@google.com>
To: akpm@linux-foundation.org
Cc: ccross@google.com, sumit.semwal@linaro.org, mhocko@suse.com,
	 dave.hansen@intel.com, keescook@chromium.org,
	willy@infradead.org,  kirill.shutemov@linux.intel.com,
	vbabka@suse.cz, hannes@cmpxchg.org,  corbet@lwn.net,
	viro@zeniv.linux.org.uk, rdunlap@infradead.org,
	 kaleshsingh@google.com, peterx@redhat.com, rppt@kernel.org,
	 peterz@infradead.org, catalin.marinas@arm.com,
	vincenzo.frascino@arm.com,  chinwen.chang@mediatek.com,
	axelrasmussen@google.com, aarcange@redhat.com,  jannh@google.com,
	apopple@nvidia.com, jhubbard@nvidia.com, yuzhao@google.com,
	 will@kernel.org, fenghua.yu@intel.com,
	thunder.leizhen@huawei.com,  hughd@google.com,
	feng.tang@intel.com, jgg@ziepe.ca, guro@fb.com,
	 tglx@linutronix.de, krisman@collabora.com,
	chris.hyser@oracle.com,  pcc@google.com, ebiederm@xmission.com,
	axboe@kernel.dk, legion@kernel.org,  eb@emlix.com,
	songmuchun@bytedance.com, viresh.kumar@linaro.org,
	 thomascedeno@google.com, sashal@kernel.org, cxfcosmos@gmail.com,
	 linux@rasmusvillemoes.dk, linux-kernel@vger.kernel.org,
	 linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-mm@kvack.org,  kernel-team@android.com, surenb@google.com
Subject: [PATCH v8 0/3] Anonymous VMA naming patches
Date: Fri, 27 Aug 2021 12:18:55 -0700
Message-ID: <20210827191858.2037087-1-surenb@google.com>

There were a number of previous attempts to upstream support for anonymous
VMA naming. The original submission by Colin Cross [1] implemented a
dictionary of refcounted names so that identical name strings could be
reused. Dave Hansen suggested [2] using userspace pointers instead, and the
patch was rewritten that way. The most recent version, v7, was posted by
Sumit Semwal [3], and a very similar patch has been used in Android to name
anonymous VMAs for a number of years. Kees Cook raised concerns about this
patch [4], noting the lack of string sanitization and the use of userspace
pointers from the kernel. The conclusion [5] was to strndup_user() the
strings from userspace, perform appropriate checks and store a copy as a
vm_area_struct member. It was also suggested that the performance impact of
the additional strdup() calls during fork() be measured by allocating a
large number (64K) of VMAs with the longest names and timing fork().

This patchset implements the suggested approach in the first 2 patches and
the 3rd patch implements simple refcounting to avoid strdup'ing the names
during fork() and minimize the regression.
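
To illustrate the difference between the two approaches, here is a minimal
userspace sketch. It is not the kernel code from this series: the struct and
function names (anon_name, anon_name_alloc, anon_name_get, anon_name_put,
anon_name_dup) are hypothetical, and C11 atomics stand in for the kernel's
kref. The point is only that copying a name costs an allocation plus a copy
proportional to its length, while sharing it costs one atomic increment.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace stand-in for a refcounted VMA name. */
struct anon_name {
	atomic_int refs;	/* number of holders sharing this name */
	char name[];		/* NUL-terminated copy of the name */
};

/* Allocate a name object holding a private copy of 'name', refcount 1. */
static struct anon_name *anon_name_alloc(const char *name)
{
	size_t len = strlen(name) + 1;
	struct anon_name *n = malloc(sizeof(*n) + len);

	if (!n)
		return NULL;
	atomic_init(&n->refs, 1);
	memcpy(n->name, name, len);
	return n;
}

/* Refcounting path: share the existing string, no allocation. */
static struct anon_name *anon_name_get(struct anon_name *n)
{
	int old = atomic_fetch_add(&n->refs, 1);

	assert(old > 0);	/* the caller must already hold a reference */
	(void)old;
	return n;
}

/* Drop one reference; returns 1 if this was the last one and 'n' was freed. */
static int anon_name_put(struct anon_name *n)
{
	if (atomic_fetch_sub(&n->refs, 1) == 1) {
		free(n);
		return 1;
	}
	return 0;
}

/* strdup path: one allocation and one string copy per duplicated VMA. */
static char *anon_name_dup(const struct anon_name *n)
{
	return strdup(n->name);
}
```

In this sketch a fork()-like duplication would call anon_name_get() once per
named VMA instead of anon_name_dup(), and each copy would later be released
with anon_name_put().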

The proposed test was conducted on an ARM64 Android device with the CPU
frequency locked at 2.4GHz, the performance governor selected and the
Android system stopped (adb shell stop) to minimize noise. The test
includes 3 different scenarios. In each scenario a process with 64K named
anonymous VMAs forks children 1000 times, timing each fork() and reporting
the average time. The scenarios differ in the VMA content:

1. VMAs are not populated with any data (not a realistic scenario, but it
helps emphasize the regression).
2. Each VMA contains 1 page populated with random data.
3. Each VMA contains 10 pages populated with random data.
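
A test along these lines could be sketched as follows. This is not the
program used for the measurements below. The helper names make_vmas() and
time_forks() are hypothetical, the pages are filled with a fixed pattern
rather than random data, and the PR_SET_VMA / PR_SET_VMA_ANON_NAME values
are assumed to match the ones this series adds to include/uapi/linux/prctl.h.
On a kernel without the feature the prctl() call fails.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/prctl.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Assumed values, defined here in case the installed headers lack them. */
#ifndef PR_SET_VMA
#define PR_SET_VMA		0x53564d41
#define PR_SET_VMA_ANON_NAME	0
#endif

/*
 * Create 'nr' separate anonymous VMAs of 'pages' pages each. One extra page
 * after every VMA is unmapped so that neighbouring VMAs cannot merge.
 */
static int make_vmas(size_t nr, size_t pages, int populate, int named)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	size_t stride = (pages + 1) * page;
	char *base = mmap(NULL, nr * stride, PROT_READ | PROT_WRITE,
			  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	char name[64];
	size_t i;

	if (base == MAP_FAILED)
		return -1;
	for (i = 0; i < nr; i++) {
		char *vma = base + i * stride;

		if (munmap(vma + pages * page, page))
			return -1;
		if (populate)
			memset(vma, 0xa5, pages * page);	/* fault pages in */
		if (named) {
			snprintf(name, sizeof(name), "vma-%zu", i);
			if (prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME,
				  (unsigned long)vma, pages * page,
				  (unsigned long)name))
				return -1;
		}
	}
	return 0;
}

/* Fork 'iters' children that exit at once; return the mean fork() time in ms. */
static double time_forks(int iters)
{
	struct timespec t0, t1;
	double total_ms = 0.0;
	int i;

	assert(iters > 0);
	for (i = 0; i < iters; i++) {
		pid_t pid;

		clock_gettime(CLOCK_MONOTONIC, &t0);
		pid = fork();
		clock_gettime(CLOCK_MONOTONIC, &t1);
		if (pid < 0)
			return -1.0;
		if (pid == 0)
			_exit(0);
		waitpid(pid, NULL, 0);
		total_ms += (t1.tv_sec - t0.tv_sec) * 1e3 +
			    (t1.tv_nsec - t0.tv_nsec) / 1e6;
	}
	return total_ms / iters;
}
```

Scenario 2, for example, would correspond to make_vmas(65536, 1, 1, 1)
followed by time_forks(1000). Creating that many VMAs is likely to require
raising vm.max_map_count first, since the usual default is slightly below
64K.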

With the first 2 patches implementing the strdup approach, the average
fork() times are:

                              unnamed VMAs      named VMAs      REGRESSION
Unpopulated VMAs              16.73ms           23.34ms         39.51%
VMAs with 1 page of data      51.98ms           59.94ms         15.31%
VMAs with 10 pages of data    66.86ms           76.31ms         14.13%

From the perf results, the regression can be attributed to strlen() and
strdup() calls. The regression shrinks as the amount of populated data
grows, mostly because anon_vma_fork() and copy_page_range() consume more
of the time spent in fork().

With the refcounting implemented in the last patch of this series, the
results are:

                              unnamed VMAs      named VMAs      REGRESSION
Unpopulated VMAs              16.36ms           18.35ms         12.16%
VMAs with 1 page of data      48.16ms           51.30ms         6.52%
VMAs with 10 pages of data    64.23ms           67.69ms         5.39%

From the perf results, the regression can be attributed to
refcount_inc_checked() (called from kref_get()).
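
The REGRESSION column in both tables is consistent with the relative
slowdown (named - unnamed) / unnamed. A small sketch that recomputes it
from the reported averages; the helper names regression_pct() and
print_row() are hypothetical:

```c
#include <assert.h>
#include <stdio.h>

/* Relative slowdown of the named case over the unnamed case, in percent. */
static double regression_pct(double unnamed_ms, double named_ms)
{
	assert(unnamed_ms > 0.0);
	return (named_ms - unnamed_ms) / unnamed_ms * 100.0;
}

/* Print one row: label, both averages and the computed regression. */
static void print_row(const char *label, double unnamed_ms, double named_ms)
{
	printf("%-28s %8.2fms %8.2fms %6.2f%%\n", label, unnamed_ms, named_ms,
	       regression_pct(unnamed_ms, named_ms));
}
```

For instance, print_row("Unpopulated VMAs", 16.73, 23.34) reproduces the
first row of the strdup table, and the same call with 16.36 and 18.35
reproduces the first row of the refcounting table.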

While there is obviously a measurable regression, 64K named anonymous VMAs
is truly a worst-case scenario. In real usage, the only current user of
this feature, namely Android, rarely has processes whose VMA count reaches
4000 (that's the highest I've measured). The regression when forking a
process with that number of VMAs is at the noise level.

1. https://lore.kernel.org/linux-mm/1372901537-31033-1-git-send-email-ccross@android.com/
2. https://lore.kernel.org/linux-mm/51DDFA02.9040707@intel.com/
3. https://lore.kernel.org/linux-mm/20200901161459.11772-1-sumit.semwal@linaro.org/
4. https://lore.kernel.org/linux-mm/202009031031.D32EF57ED@keescook/
5. https://lore.kernel.org/linux-mm/5d0358ab-8c47-2f5f-8e43-23b89d6a8e95@intel.com/

Colin Cross (2):
  mm: rearrange madvise code to allow for reuse
  mm: add a field to store names for private anonymous memory

Suren Baghdasaryan (1):
  mm: add anonymous vma name refcounting

 Documentation/filesystems/proc.rst |   2 +
 fs/proc/task_mmu.c                 |  14 +-
 fs/userfaultfd.c                   |   7 +-
 include/linux/mm.h                 |  13 +-
 include/linux/mm_types.h           |  55 +++-
 include/uapi/linux/prctl.h         |   3 +
 kernel/fork.c                      |   2 +
 kernel/sys.c                       |  48 ++++
 mm/madvise.c                       | 447 +++++++++++++++++++----------
 mm/mempolicy.c                     |   3 +-
 mm/mlock.c                         |   2 +-
 mm/mmap.c                          |  38 +--
 mm/mprotect.c                      |   2 +-
 13 files changed, 462 insertions(+), 174 deletions(-)

-- 
2.33.0.259.gc128427fd7-goog


