All of lore.kernel.org
 help / color / mirror / Atom feed
From: Yu Xu <xuyu@linux.alibaba.com>
To: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	hughd@google.com, akpm@linux-foundation.org,
	gavin.dg@linux.alibaba.com
Subject: Re: [PATCH v2] mm, thp: use head page in __migration_entry_wait
Date: Tue, 8 Jun 2021 21:22:28 +0800	[thread overview]
Message-ID: <57e151a8-03b2-3458-0178-21edb4ce97d2@linux.alibaba.com> (raw)
In-Reply-To: <20210608120026.ugfh72ydjeba44bo@box.shutemov.name>

On 6/8/21 8:00 PM, Kirill A. Shutemov wrote:
> On Tue, Jun 08, 2021 at 05:22:39PM +0800, Xu Yu wrote:
>> We notice that hung task happens in a conner but practical scenario when
>> CONFIG_PREEMPT_NONE is enabled, as follows.
>>
>> Process 0                       Process 1                     Process 2..Inf
>> split_huge_page_to_list
>>      unmap_page
>>          split_huge_pmd_address
>>                                  __migration_entry_wait(head)
>>                                                                __migration_entry_wait(tail)
>>      remap_page (roll back)
>>          remove_migration_ptes
>>              rmap_walk_anon
>>                  cond_resched
>>
>> Where __migration_entry_wait(tail) is occurred in kernel space, e.g.,
>> copy_to_user in fstat, which will immediately fault again without
>> rescheduling, and thus occupy the cpu fully.
>>
>> When there are too many processes performing __migration_entry_wait on
>> tail page, remap_page will never be done after cond_resched.
>>
>> This makes __migration_entry_wait operate on the compound head page,
>> thus waits for remap_page to complete, whether the THP is split
>> successfully or roll back.
>>
>> Note that put_and_wait_on_page_locked helps to drop the page reference
>> acquired with get_page_unless_zero, as soon as the page is on the wait
>> queue, before actually waiting. So splitting the THP is only prevented
>> for a brief interval.
>>
>> Fixes: ba98828088ad ("thp: add option to setup migration entries during PMD split")
>> Suggested-by: Hugh Dickins <hughd@google.com>
>> Signed-off-by: Gang Deng <gavin.dg@linux.alibaba.com>
>> Signed-off-by: Xu Yu <xuyu@linux.alibaba.com>
> 
> Looks good to me:
> 
> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> 
> But there's one quirk: if split succeed we effectively wait on wrong
> page to be unlocked. And it may take indefinite time if split_huge_page()
> was called on the head page.

Inspired by you, I look into the codes, and have a new question (nothing
to do with this patch).

If we split_huge_page_to_list on *tail* page (in fact, I haven't seen
that used yet),

mm/huge_memory.c:2666 checks "VM_BUG_ON_PAGE(!PageLocked(head), head);"
in split_huge_page_to_list(), while

mm/huge_memory.c:2497 does "unlock_page(subpage)", where subpage can
be head in this scenario, in __split_huge_page().

My confusion is
1) how the pin on the @subpage is got outside split_huge_page_to_list()?
    can we ever get tail page?

2) head page is locked outside split_huge_page_to_list(), but unlocked
    in __split_huge_page()?


> 
> Maybe we should consider waking up head waiter on head page, even if it is
> still locked after split?
> 
> Something like this (untested):
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 63ed6b25deaa..f79a38e21e53 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2535,6 +2535,9 @@ static void __split_huge_page(struct page *page, struct list_head *list,
>   		 */
>   		put_page(subpage);
>   	}
> +
> +	if (page == head)
> +		wake_up_page_bit(page, PG_locked);
>   }
> 
>   int total_mapcount(struct page *page)
> 

-- 
Thanks,
Yu

  parent reply	other threads:[~2021-06-08 13:22 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-08  9:22 [PATCH v2] mm, thp: use head page in __migration_entry_wait Xu Yu
2021-06-08 12:00 ` Kirill A. Shutemov
2021-06-08 12:32   ` Matthew Wilcox
2021-06-08 12:58     ` Kirill A. Shutemov
2021-06-08 13:35       ` Matthew Wilcox
2021-06-09  9:59         ` Kirill A. Shutemov
2021-06-08 13:22   ` Yu Xu [this message]
2021-06-09 10:08     ` Kirill A. Shutemov
2021-06-08 20:00 ` Hugh Dickins
2021-06-08 20:00   ` Hugh Dickins

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=57e151a8-03b2-3458-0178-21edb4ce97d2@linux.alibaba.com \
    --to=xuyu@linux.alibaba.com \
    --cc=akpm@linux-foundation.org \
    --cc=gavin.dg@linux.alibaba.com \
    --cc=hughd@google.com \
    --cc=kirill@shutemov.name \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.