From: Suren Baghdasaryan <surenb@google.com>
To: akpm@linux-foundation.org
Cc: viro@zeniv.linux.org.uk, brauner@kernel.org, jack@suse.cz,
dchinner@redhat.com, casey@schaufler-ca.com,
ben.wolsieffer@hefring.com, paulmck@kernel.org,
david@redhat.com, avagin@google.com, usama.anjum@collabora.com,
peterx@redhat.com, hughd@google.com, ryan.roberts@arm.com,
wangkefeng.wang@huawei.com, Liam.Howlett@Oracle.com,
yuzhao@google.com, axelrasmussen@google.com, lstoakes@gmail.com,
talumbau@google.com, willy@infradead.org, vbabka@suse.cz,
mgorman@techsingularity.net, jhubbard@nvidia.com,
vishal.moola@gmail.com, mathieu.desnoyers@efficios.com,
dhowells@redhat.com, jgg@ziepe.ca, sidhartha.kumar@oracle.com,
andriy.shevchenko@linux.intel.com, yangxingui@huawei.com,
keescook@chromium.org, linux-kernel@vger.kernel.org,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
kernel-team@android.com, surenb@google.com
Subject: [RFC 0/3] reading proc/pid/maps under RCU
Date: Mon, 15 Jan 2024 10:38:33 -0800 [thread overview]
Message-ID: <20240115183837.205694-1-surenb@google.com> (raw)
The issue this patchset is trying to address is mmap_lock contention when
a low priority task (monitoring, data collecting, etc.) blocks a higher
priority task from making updated to the address space. The contention is
due to the mmap_lock being held for read when reading proc/pid/maps.
With maple_tree introduction, VMA tree traversals are RCU-safe and per-vma
locks make VMA access RCU-safe. this provides an opportunity for lock-less
reading of proc/pid/maps. We still need to overcome a couple obstacles:
1. Make all VMA pointer fields used for proc/pid/maps content generation
RCU-safe;
2. Ensure that proc/pid/maps data tearing, which is currently possible at
page boundaries only, does not get worse.
The patchset deals with these issues but there is a downside which I would
like to get input on:
This change introduces unfairness towards the reader of proc/pid/maps,
which can be blocked by an overly active/malicious address space modifyer.
A couple of ways I though we can address this issue are:
1. After several lock-less retries (or some time limit) to fall back to
taking mmap_lock.
2. Employ lock-less reading only if the reader has low priority,
indicating that blocking it is not critical.
3. Introducing a separate procfs file which publishes the same data in
lock-less manner.
I imagine a combination of these approaches can also be employed.
I would like to get feedback on this from the Linux community.
Note: mmap_read_lock/mmap_read_unlock sequence inside validate_map()
can be replaced with more efficiend rwsem_wait() proposed by Matthew
in [1].
[1] https://lore.kernel.org/all/ZZ1+ZicgN8dZ3zj3@casper.infradead.org/
Suren Baghdasaryan (3):
mm: make vm_area_struct anon_name field RCU-safe
seq_file: add validate() operation to seq_operations
mm/maps: read proc/pid/maps under RCU
fs/proc/internal.h | 3 +
fs/proc/task_mmu.c | 130 ++++++++++++++++++++++++++++++++++----
fs/seq_file.c | 24 ++++++-
include/linux/mm_inline.h | 10 ++-
include/linux/mm_types.h | 3 +-
include/linux/seq_file.h | 1 +
mm/madvise.c | 30 +++++++--
7 files changed, 181 insertions(+), 20 deletions(-)
--
2.43.0.381.gb435a96ce8-goog
next reply other threads:[~2024-01-15 18:38 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-01-15 18:38 Suren Baghdasaryan [this message]
2024-01-15 18:38 ` [RFC 1/3] mm: make vm_area_struct anon_name field RCU-safe Suren Baghdasaryan
2024-01-15 18:38 ` [RFC 2/3] seq_file: add validate() operation to seq_operations Suren Baghdasaryan
2024-01-15 18:38 ` [RFC 3/3] mm/maps: read proc/pid/maps under RCU Suren Baghdasaryan
2024-01-16 14:42 ` [RFC 0/3] reading " Vlastimil Babka
2024-01-16 14:46 ` Vlastimil Babka
2024-01-16 17:57 ` Suren Baghdasaryan
2024-01-18 17:58 ` Suren Baghdasaryan
2024-01-22 7:23 ` Suren Baghdasaryan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240115183837.205694-1-surenb@google.com \
--to=surenb@google.com \
--cc=Liam.Howlett@Oracle.com \
--cc=akpm@linux-foundation.org \
--cc=andriy.shevchenko@linux.intel.com \
--cc=avagin@google.com \
--cc=axelrasmussen@google.com \
--cc=ben.wolsieffer@hefring.com \
--cc=brauner@kernel.org \
--cc=casey@schaufler-ca.com \
--cc=david@redhat.com \
--cc=dchinner@redhat.com \
--cc=dhowells@redhat.com \
--cc=hughd@google.com \
--cc=jack@suse.cz \
--cc=jgg@ziepe.ca \
--cc=jhubbard@nvidia.com \
--cc=keescook@chromium.org \
--cc=kernel-team@android.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=lstoakes@gmail.com \
--cc=mathieu.desnoyers@efficios.com \
--cc=mgorman@techsingularity.net \
--cc=paulmck@kernel.org \
--cc=peterx@redhat.com \
--cc=ryan.roberts@arm.com \
--cc=sidhartha.kumar@oracle.com \
--cc=talumbau@google.com \
--cc=usama.anjum@collabora.com \
--cc=vbabka@suse.cz \
--cc=viro@zeniv.linux.org.uk \
--cc=vishal.moola@gmail.com \
--cc=wangkefeng.wang@huawei.com \
--cc=willy@infradead.org \
--cc=yangxingui@huawei.com \
--cc=yuzhao@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).