linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Waiman Long <longman@redhat.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Davidlohr Bueso <dave@stgolabs.net>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>, Will Deacon <will.deacon@arm.com>
Subject: Re: [PATCH] hugetlbfs: Take read_lock on i_mmap for PMD sharing
Date: Thu, 7 Nov 2019 16:27:18 -0500	[thread overview]
Message-ID: <ba7a97a5-0ca2-2ff6-0b0c-08ed19bfc1fe@redhat.com> (raw)
In-Reply-To: <20191107195441.GF11823@bombadil.infradead.org>

On 11/7/19 2:54 PM, Matthew Wilcox wrote:
> On Thu, Nov 07, 2019 at 02:06:28PM -0500, Waiman Long wrote:
>> A customer with large SMP systems (up to 16 sockets) with application
>> that uses large amount of static hugepages (~500-1500GB) are experiencing
>> random multisecond delays. These delays was caused by the long time it
>> took to scan the VMA interval tree with mmap_sem held.
>>
>> The sharing of huge PMD does not require changes to the i_mmap at all.
>> As a result, we can just take the read lock and let other threads
>> searching for the right VMA to share in parallel. Once the right
>> VMA is found, either the PMD lock (2M huge page for x86-64) or the
>> mm->page_table_lock will be acquired to perform the actual PMD sharing.
>>
>> Lock contention, if present, will happen in the spinlock. That is much
>> better than contention in the rwsem where the time needed to scan the
>> the interval tree is indeterminate.
> I don't think this description really explains the contention argument
> well.  There are _more_ PMD locks than there are i_mmap_sem locks, so
> processes accessing different parts of the same file can work in parallel.

I am sorry for not being clear enough. PMD lock contention here means 2
or more tasks that happens to touch the same PMD. Because of the use of
PMD lock, modification of the same PMD cannot happen in parallel. If
they touch different PMDs, they can do that in parallel. Previously,
they are contending the same rwsem write lock and hence have to be done
serially.

> Are there other current users of the write lock that could use a read lock?
> At first blush, it would seem that unmap_ref_private() also only needs
> a read lock on the i_mmap tree.  I don't think hugetlb_change_protection()
> needs the write lock either.  Nor retract_page_tables().

It is possible that other locking sites can be converted to use read
lock, but it is outside the scope of this patch.

Cheers,
Longman


  reply	other threads:[~2019-11-07 21:27 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-11-07 19:06 [PATCH] hugetlbfs: Take read_lock on i_mmap for PMD sharing Waiman Long
2019-11-07 19:13 ` Waiman Long
2019-11-07 19:15 ` Waiman Long
2019-11-07 19:31 ` Mike Kravetz
2019-11-07 19:42 ` Matthew Wilcox
2019-11-07 21:06   ` Waiman Long
2019-11-07 19:54 ` Matthew Wilcox
2019-11-07 21:27   ` Waiman Long [this message]
2019-11-07 21:49   ` Mike Kravetz
2019-11-07 21:56     ` Mike Kravetz
2019-11-08  2:04     ` Davidlohr Bueso
2019-11-08  3:22       ` Andrew Morton
2019-11-08 19:10       ` Mike Kravetz
2019-11-09  1:47         ` Mike Kravetz
2019-11-12 17:27           ` Waiman Long
2019-11-12 23:11             ` Mike Kravetz
2019-11-13  2:55               ` Waiman Long

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ba7a97a5-0ca2-2ff6-0b0c-08ed19bfc1fe@redhat.com \
    --to=longman@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=dave@stgolabs.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mike.kravetz@oracle.com \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=will.deacon@arm.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).