From: Waiman Long <longman@redhat.com>
To: Mike Kravetz <mike.kravetz@oracle.com>,
Matthew Wilcox <willy@infradead.org>,
Andrew Morton <akpm@linux-foundation.org>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@redhat.com>, Will Deacon <will.deacon@arm.com>
Subject: Re: [PATCH] hugetlbfs: Take read_lock on i_mmap for PMD sharing
Date: Tue, 12 Nov 2019 12:27:40 -0500
Message-ID: <fd29a337-c067-ebf6-4be2-3b6e2f703ac4@redhat.com>
In-Reply-To: <5059733e-95aa-2c9e-6f5d-4f45f6a130b3@oracle.com>

On 11/8/19 8:47 PM, Mike Kravetz wrote:
> On 11/8/19 11:10 AM, Mike Kravetz wrote:
>> On 11/7/19 6:04 PM, Davidlohr Bueso wrote:
>>> On Thu, 07 Nov 2019, Mike Kravetz wrote:
>>>
>>>> Note that huge_pmd_share now increments the page count with the semaphore
>>>> held just in read mode. It is OK to do increments in parallel without
>>>> synchronization. However, we don't want anyone else changing the count
>>>> while that check in huge_pmd_unshare is happening. Hence, the need for
>>>> taking the semaphore in write mode.
>>> This would be a nice addition to the changelog methinks.
>> Last night I remembered there is one place where we currently take
>> i_mmap_rwsem in read mode and potentially call huge_pmd_unshare. That
>> is in try_to_unmap_one. Yes, there is a potential race here today.
> Actually there is no race there today. Callers to huge_pmd_unshare
> hold the page table lock. So, this synchronizes those unshare calls
> from page migration and page poisoning.
>
>> But that race is somewhat contained as you need two threads doing some
>> combination of page migration and page poisoning to race. This change
>> now allows migration or poisoning to race with page fault. I would
>> really prefer if we do not open up the race window in this manner.
> But, we do open a race window by changing huge_pmd_share to take the
> i_mmap_rwsem in read mode as in the original patch.
>
> Here is the additional code needed to take the semaphore in write mode
> for the huge_pmd_unshare calls via try_to_unmap_one. We would need to
> combine this with Longman's patch. Please take a look and provide feedback.
> Some of the changes are subtle, especially the exception for MAP_PRIVATE
> mappings, but I tried to add sufficient comments.
>
> From 21735818a520705c8573b8d543b8f91aa187bd5d Mon Sep 17 00:00:00 2001
> From: Mike Kravetz <mike.kravetz@oracle.com>
> Date: Fri, 8 Nov 2019 17:25:37 -0800
> Subject: [PATCH] Changes needed for taking i_mmap_rwsem in write mode before
> call to huge_pmd_unshare in try_to_unmap_one.
>
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
> ---
>  mm/hugetlb.c        |  9 ++++++++-
>  mm/memory-failure.c | 28 +++++++++++++++++++++++++++-
>  mm/migrate.c        | 27 +++++++++++++++++++++++++--
>  3 files changed, 60 insertions(+), 4 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index f78891f92765..73d9136549a5 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -4883,7 +4883,14 @@ pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
>  * indicated by page_count > 1, unmap is achieved by clearing pud and
>  * decrementing the ref count. If count == 1, the pte page is not shared.
>  *
> - * called with page table lock held.
> + * Must be called while holding page table lock.
> + * In general, the caller should also hold the i_mmap_rwsem in write mode.
> + * This is to prevent races with page faults calling huge_pmd_share which
> + * will not be holding the page table lock, but will be holding i_mmap_rwsem
> + * in read mode. It is possible to call without holding i_mmap_rwsem in
> + * write mode if the caller KNOWS the page table is associated with a private
> + * mapping. This is because private mappings can not share PMDs and can
> + * not race with huge_pmd_share calls during page faults.
So the page table lock here is the huge_pte_lock(), right? In
huge_pmd_share(), the pte lock has to be taken before one can share it.
So would you mind explaining where exactly the race is?
Thanks,
Longman
>  *
>  * returns: 1 successfully unmapped a shared pte page
>  *          0 the underlying pte page is not shared, or it is the last user
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 3151c87dff73..8f52b22cf71b 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1030,7 +1030,33 @@ static bool hwpoison_user_mappings(struct page *p, unsigned long pfn,
>  	if (kill)
>  		collect_procs(hpage, &tokill, flags & MF_ACTION_REQUIRED);
>
> -	unmap_success = try_to_unmap(hpage, ttu);
> +	if (!PageHuge(hpage)) {
> +		unmap_success = try_to_unmap(hpage, ttu);
> +	} else {
> +		mapping = page_mapping(hpage);
> +		if (mapping) {
> +			/*
> +			 * For hugetlb pages, try_to_unmap could potentially
> +			 * call huge_pmd_unshare. Because of this, take
> +			 * semaphore in write mode here and set TTU_RMAP_LOCKED
> +			 * to indicate we have taken the lock at this higher
> +			 * level.
> +			 */
> +			i_mmap_lock_write(mapping);
> +			unmap_success = try_to_unmap(hpage,
> +				ttu|TTU_RMAP_LOCKED);
> +			i_mmap_unlock_write(mapping);
> +		} else {
> +			/*
> +			 * !mapping implies a MAP_PRIVATE huge page mapping.
> +			 * Since PMDs will never be shared in a private
> +			 * mapping, it is safe to let huge_pmd_unshare be
> +			 * called with the semaphore in read mode.
> +			 */
> +			unmap_success = try_to_unmap(hpage, ttu);
> +		}
> +	}
> +
>  	if (!unmap_success)
>  		pr_err("Memory failure: %#lx: failed to unmap page (mapcount=%d)\n",
>  		       pfn, page_mapcount(hpage));
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 4fe45d1428c8..9cae5a4f1e48 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1333,8 +1333,31 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
>  		goto put_anon;
>
>  	if (page_mapped(hpage)) {
> -		try_to_unmap(hpage,
> -			TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS);
> +		struct address_space *mapping = page_mapping(hpage);
> +
> +		if (mapping) {
> +			/*
> +			 * try_to_unmap could potentially call huge_pmd_unshare.
> +			 * Because of this, take semaphore in write mode here
> +			 * and set TTU_RMAP_LOCKED to indicate we have taken
> +			 * the lock at this higher level.
> +			 */
> +			i_mmap_lock_write(mapping);
> +			try_to_unmap(hpage,
> +				TTU_MIGRATION|TTU_IGNORE_MLOCK|
> +				TTU_IGNORE_ACCESS|TTU_RMAP_LOCKED);
> +			i_mmap_unlock_write(mapping);
> +		} else {
> +			/*
> +			 * !mapping implies a MAP_PRIVATE huge page mapping.
> +			 * Since PMDs will never be shared in a private
> +			 * mapping, it is safe to let huge_pmd_unshare be
> +			 * called with the semaphore in read mode.
> +			 */
> +			try_to_unmap(hpage,
> +				TTU_MIGRATION|TTU_IGNORE_MLOCK|
> +				TTU_IGNORE_ACCESS);
> +		}
>  		page_was_mapped = 1;
>  	}
>