linux-mm.kvack.org archive mirror
From: Suren Baghdasaryan <surenb@google.com>
To: akpm@linux-foundation.org
Cc: viro@zeniv.linux.org.uk, brauner@kernel.org, jack@suse.cz,
	 dchinner@redhat.com, casey@schaufler-ca.com,
	ben.wolsieffer@hefring.com,  paulmck@kernel.org,
	david@redhat.com, avagin@google.com,  usama.anjum@collabora.com,
	peterx@redhat.com, hughd@google.com,  ryan.roberts@arm.com,
	wangkefeng.wang@huawei.com, Liam.Howlett@Oracle.com,
	 yuzhao@google.com, axelrasmussen@google.com, lstoakes@gmail.com,
	 talumbau@google.com, willy@infradead.org, vbabka@suse.cz,
	 mgorman@techsingularity.net, jhubbard@nvidia.com,
	vishal.moola@gmail.com,  mathieu.desnoyers@efficios.com,
	dhowells@redhat.com, jgg@ziepe.ca,  sidhartha.kumar@oracle.com,
	andriy.shevchenko@linux.intel.com,  yangxingui@huawei.com,
	keescook@chromium.org, linux-kernel@vger.kernel.org,
	 linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	kernel-team@android.com,  surenb@google.com
Subject: [RFC 0/3] reading proc/pid/maps under RCU
Date: Mon, 15 Jan 2024 10:38:33 -0800
Message-ID: <20240115183837.205694-1-surenb@google.com>

The issue this patchset is trying to address is mmap_lock contention when
a low priority task (monitoring, data collecting, etc.) blocks a higher
priority task from making updates to the address space. The contention is
due to the mmap_lock being held for read when reading proc/pid/maps.
With the introduction of maple_tree, VMA tree traversals are RCU-safe, and
per-vma locks make VMA access RCU-safe. This provides an opportunity for
lock-less reading of proc/pid/maps. We still need to overcome a couple of
obstacles:
1. Make all VMA pointer fields used for proc/pid/maps content generation
RCU-safe;
2. Ensure that proc/pid/maps data tearing, which is currently possible at
page boundaries only, does not get worse.
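
To illustrate obstacle 1: one common way to let a lock-less reader safely
use a refcounted object reached through a pointer field is to take a
reference only if the count has not already dropped to zero. Below is a
minimal userspace sketch of that idea using C11 atomics. The names
(struct name_obj, name_get_unless_zero(), name_put(), name_copy()) are
hypothetical and are not taken from the patches, which may solve this
differently; deferred freeing after a grace period is deliberately left
out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a refcounted name object hanging off a VMA. */
struct name_obj {
	atomic_int refs;	/* 0 means the object is being torn down */
	const char *text;
};

/* Take a reference only if the count is still non-zero. */
static bool name_get_unless_zero(struct name_obj *n)
{
	int old = atomic_load(&n->refs);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current count. */
		if (atomic_compare_exchange_weak(&n->refs, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true if the caller dropped the last one. */
static bool name_put(struct name_obj *n)
{
	return atomic_fetch_sub(&n->refs, 1) == 1;
}

/*
 * Lock-less reader: copy the name while holding a temporary reference.
 * Returns false if the object was already on its way out.
 */
static bool name_copy(struct name_obj *n, char *buf, size_t len)
{
	assert(len > 0);
	if (!n || !name_get_unless_zero(n))
		return false;
	strncpy(buf, n->text, len - 1);
	buf[len - 1] = '\0';
	/* Real code would free on the last put, after a grace period. */
	(void)name_put(n);
	return true;
}
```

The point of the sketch is only that a reader which loses the race with
teardown gets a clean failure instead of touching a dying object.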

The patchset deals with these issues, but there is a downside which I would
like to get input on:
This change introduces unfairness towards the reader of proc/pid/maps,
which can be blocked by an overly active/malicious address space modifier.
A few ways I thought we could address this issue are:
1. Fall back to taking mmap_lock after several lock-less retries (or some
time limit).
2. Employ lock-less reading only if the reader has low priority,
indicating that blocking it is not critical.
3. Introduce a separate procfs file which publishes the same data in a
lock-less manner.

I imagine a combination of these approaches can also be employed.
I would like to get feedback on this from the Linux community.
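
As a rough illustration of option 1, here is a minimal userspace sketch
of "retry lock-less a bounded number of times, then fall back to the
lock". It is an analogy only, not the code in this series: a spinlock
built from atomic_flag stands in for mmap_lock, a sequence counter stands
in for whatever the reader uses to detect a concurrent modification, and
all names (struct addr_space, as_update(), as_try_read(), as_read(), the
max_retries parameter) are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct addr_space {
	atomic_flag lock;		/* stand-in for mmap_lock */
	atomic_uint seq;		/* odd while an update is in progress */
	atomic_ulong start, end;	/* stand-in for one VMA's range */
};

static void as_init(struct addr_space *as)
{
	atomic_flag_clear(&as->lock);
	atomic_init(&as->seq, 0);
	atomic_init(&as->start, 0);
	atomic_init(&as->end, 0);
}

static void as_lock(struct addr_space *as)
{
	while (atomic_flag_test_and_set(&as->lock))
		;	/* spin until the lock is free */
}

static void as_unlock(struct addr_space *as)
{
	atomic_flag_clear(&as->lock);
}

/* Writer: hold the lock and bump seq before and after the update. */
static void as_update(struct addr_space *as, unsigned long s, unsigned long e)
{
	as_lock(as);
	atomic_fetch_add(&as->seq, 1);	/* seq becomes odd */
	atomic_store(&as->start, s);
	atomic_store(&as->end, e);
	atomic_fetch_add(&as->seq, 1);	/* seq becomes even again */
	as_unlock(as);
}

/* One lock-less attempt; returns false if a writer interfered. */
static bool as_try_read(struct addr_space *as,
			unsigned long *s, unsigned long *e)
{
	unsigned int before = atomic_load(&as->seq);

	if (before & 1)
		return false;
	*s = atomic_load(&as->start);
	*e = atomic_load(&as->end);
	return atomic_load(&as->seq) == before;
}

/*
 * Reader: try lock-less up to max_retries times, then take the lock.
 * Returns true if the lock-less path succeeded, false on fallback.
 */
static bool as_read(struct addr_space *as, unsigned long *s,
		    unsigned long *e, int max_retries)
{
	for (int i = 0; i < max_retries; i++)
		if (as_try_read(as, s, e))
			return true;

	as_lock(as);
	/* No update can be in flight while we hold the lock. */
	assert((atomic_load(&as->seq) & 1) == 0);
	*s = atomic_load(&as->start);
	*e = atomic_load(&as->end);
	as_unlock(as);
	return false;
}
```

The retry budget bounds how long an active modifier can starve the
reader; after that the reader contends on the lock exactly as it does
today.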

Note: the mmap_read_lock/mmap_read_unlock sequence inside validate_map()
can be replaced with the more efficient rwsem_wait() proposed by Matthew
in [1].

[1] https://lore.kernel.org/all/ZZ1+ZicgN8dZ3zj3@casper.infradead.org/

Suren Baghdasaryan (3):
  mm: make vm_area_struct anon_name field RCU-safe
  seq_file: add validate() operation to seq_operations
  mm/maps: read proc/pid/maps under RCU

 fs/proc/internal.h        |   3 +
 fs/proc/task_mmu.c        | 130 ++++++++++++++++++++++++++++++++++----
 fs/seq_file.c             |  24 ++++++-
 include/linux/mm_inline.h |  10 ++-
 include/linux/mm_types.h  |   3 +-
 include/linux/seq_file.h  |   1 +
 mm/madvise.c              |  30 +++++++--
 7 files changed, 181 insertions(+), 20 deletions(-)

-- 
2.43.0.381.gb435a96ce8-goog




Thread overview: 9+ messages
2024-01-15 18:38 Suren Baghdasaryan [this message]
2024-01-15 18:38 ` [RFC 1/3] mm: make vm_area_struct anon_name field RCU-safe Suren Baghdasaryan
2024-01-15 18:38 ` [RFC 2/3] seq_file: add validate() operation to seq_operations Suren Baghdasaryan
2024-01-15 18:38 ` [RFC 3/3] mm/maps: read proc/pid/maps under RCU Suren Baghdasaryan
2024-01-16 14:42 ` [RFC 0/3] reading " Vlastimil Babka
2024-01-16 14:46   ` Vlastimil Babka
2024-01-16 17:57     ` Suren Baghdasaryan
2024-01-18 17:58       ` Suren Baghdasaryan
2024-01-22  7:23         ` Suren Baghdasaryan
