linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Vlastimil Babka <vbabka@suse.cz>
To: Suren Baghdasaryan <surenb@google.com>, akpm@linux-foundation.org
Cc: viro@zeniv.linux.org.uk, brauner@kernel.org, jack@suse.cz,
	dchinner@redhat.com, casey@schaufler-ca.com,
	ben.wolsieffer@hefring.com, paulmck@kernel.org, david@redhat.com,
	avagin@google.com, usama.anjum@collabora.com, peterx@redhat.com,
	hughd@google.com, ryan.roberts@arm.com,
	wangkefeng.wang@huawei.com, Liam.Howlett@Oracle.com,
	yuzhao@google.com, axelrasmussen@google.com, lstoakes@gmail.com,
	talumbau@google.com, willy@infradead.org,
	mgorman@techsingularity.net, jhubbard@nvidia.com,
	vishal.moola@gmail.com, mathieu.desnoyers@efficios.com,
	dhowells@redhat.com, jgg@ziepe.ca, sidhartha.kumar@oracle.com,
	andriy.shevchenko@linux.intel.com, yangxingui@huawei.com,
	keescook@chromium.org, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	kernel-team@android.com
Subject: Re: [RFC 0/3] reading proc/pid/maps under RCU
Date: Tue, 16 Jan 2024 15:46:13 +0100	[thread overview]
Message-ID: <74005ee1-b6d8-4ab5-ba97-92bec302cc4b@suse.cz> (raw)
In-Reply-To: <1bc8a5df-b413-4869-8931-98f5b9e82fe5@suse.cz>

On 1/16/24 15:42, Vlastimil Babka wrote:
> On 1/15/24 19:38, Suren Baghdasaryan wrote:
> 
> Hi,
> 
>> The issue this patchset is trying to address is mmap_lock contention when
>> a low priority task (monitoring, data collecting, etc.) blocks a higher
>> priority task from making updated to the address space. The contention is
>> due to the mmap_lock being held for read when reading proc/pid/maps.
>> With maple_tree introduction, VMA tree traversals are RCU-safe and per-vma
>> locks make VMA access RCU-safe. this provides an opportunity for lock-less
>> reading of proc/pid/maps. We still need to overcome a couple obstacles:
>> 1. Make all VMA pointer fields used for proc/pid/maps content generation
>> RCU-safe;
>> 2. Ensure that proc/pid/maps data tearing, which is currently possible at
>> page boundaries only, does not get worse.
> 
> Hm I thought we were to only choose this more complicated in case additional
> tearing becomes a problem, and at first assume that if software can deal
> with page boundary tearing, it can deal with sub-page tearing too?
> 
>> The patchset deals with these issues but there is a downside which I would
>> like to get input on:
>> This change introduces unfairness towards the reader of proc/pid/maps,
>> which can be blocked by an overly active/malicious address space modifyer.
> 
> So this is a consequence of the validate() operation, right? We could avoid
> this if we allowed sub-page tearing.
> 
>> A couple of ways I though we can address this issue are:
>> 1. After several lock-less retries (or some time limit) to fall back to
>> taking mmap_lock.
>> 2. Employ lock-less reading only if the reader has low priority,
>> indicating that blocking it is not critical.
>> 3. Introducing a separate procfs file which publishes the same data in
>> lock-less manner.

Oh and if this option 3 becomes necessary, then such new file shouldn't
validate() either, and whoever wants to avoid the reader contention and
converts their monitoring to the new file will have to account for this
possible extra tearing from the start. So I would suggest trying to change
the existing file with no validate() first, and if existing userspace gets
broken, employ option 3. This would mean no validate() in either case?

>> I imagine a combination of these approaches can also be employed.
>> I would like to get feedback on this from the Linux community.
>> 
>> Note: mmap_read_lock/mmap_read_unlock sequence inside validate_map()
>> can be replaced with more efficiend rwsem_wait() proposed by Matthew
>> in [1].
>> 
>> [1] https://lore.kernel.org/all/ZZ1+ZicgN8dZ3zj3@casper.infradead.org/
>> 
>> Suren Baghdasaryan (3):
>>   mm: make vm_area_struct anon_name field RCU-safe
>>   seq_file: add validate() operation to seq_operations
>>   mm/maps: read proc/pid/maps under RCU
>> 
>>  fs/proc/internal.h        |   3 +
>>  fs/proc/task_mmu.c        | 130 ++++++++++++++++++++++++++++++++++----
>>  fs/seq_file.c             |  24 ++++++-
>>  include/linux/mm_inline.h |  10 ++-
>>  include/linux/mm_types.h  |   3 +-
>>  include/linux/seq_file.h  |   1 +
>>  mm/madvise.c              |  30 +++++++--
>>  7 files changed, 181 insertions(+), 20 deletions(-)
>> 
> 



  reply	other threads:[~2024-01-16 14:46 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-15 18:38 [RFC 0/3] reading proc/pid/maps under RCU Suren Baghdasaryan
2024-01-15 18:38 ` [RFC 1/3] mm: make vm_area_struct anon_name field RCU-safe Suren Baghdasaryan
2024-01-15 18:38 ` [RFC 2/3] seq_file: add validate() operation to seq_operations Suren Baghdasaryan
2024-01-15 18:38 ` [RFC 3/3] mm/maps: read proc/pid/maps under RCU Suren Baghdasaryan
2024-01-16 14:42 ` [RFC 0/3] reading " Vlastimil Babka
2024-01-16 14:46   ` Vlastimil Babka [this message]
2024-01-16 17:57     ` Suren Baghdasaryan
2024-01-18 17:58       ` Suren Baghdasaryan
2024-01-22  7:23         ` Suren Baghdasaryan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=74005ee1-b6d8-4ab5-ba97-92bec302cc4b@suse.cz \
    --to=vbabka@suse.cz \
    --cc=Liam.Howlett@Oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=andriy.shevchenko@linux.intel.com \
    --cc=avagin@google.com \
    --cc=axelrasmussen@google.com \
    --cc=ben.wolsieffer@hefring.com \
    --cc=brauner@kernel.org \
    --cc=casey@schaufler-ca.com \
    --cc=david@redhat.com \
    --cc=dchinner@redhat.com \
    --cc=dhowells@redhat.com \
    --cc=hughd@google.com \
    --cc=jack@suse.cz \
    --cc=jgg@ziepe.ca \
    --cc=jhubbard@nvidia.com \
    --cc=keescook@chromium.org \
    --cc=kernel-team@android.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lstoakes@gmail.com \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=mgorman@techsingularity.net \
    --cc=paulmck@kernel.org \
    --cc=peterx@redhat.com \
    --cc=ryan.roberts@arm.com \
    --cc=sidhartha.kumar@oracle.com \
    --cc=surenb@google.com \
    --cc=talumbau@google.com \
    --cc=usama.anjum@collabora.com \
    --cc=viro@zeniv.linux.org.uk \
    --cc=vishal.moola@gmail.com \
    --cc=wangkefeng.wang@huawei.com \
    --cc=willy@infradead.org \
    --cc=yangxingui@huawei.com \
    --cc=yuzhao@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).