From: Michal Hocko <mhocko@kernel.org>
To: Baoquan He <bhe@redhat.com>
Cc: David Hildenbrand <david@redhat.com>,
	linux-mm@kvack.org, pifang@redhat.com,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	aarcange@redhat.com, Mel Gorman <mgorman@suse.de>,
	Vlastimil Babka <vbabka@suse.cz>, Hugh Dickins <hughd@google.com>
Subject: Re: Memory hotplug softlock issue
Date: Mon, 19 Nov 2018 13:51:21 +0100
Message-ID: <20181119125121.GK22247@dhcp22.suse.cz>
In-Reply-To: <20181119124033.GJ22247@dhcp22.suse.cz>

On Mon 19-11-18 13:40:33, Michal Hocko wrote:
> On Mon 19-11-18 18:52:02, Baoquan He wrote:
> [...]
> 
> There are a few stacks directly in the offline path, but those should be
> OK.
> The real culprit seems to be the swap-in code:
> 
> > [  +1.734416] CPU: 255 PID: 5558 Comm: stress Tainted: G             L    4.20.0-rc2+ #7
> > [  +0.007927] Hardware name:  9008/IT91SMUB, BIOS BLXSV512 03/22/2018
> > [  +0.006297] Call Trace:
> > [  +0.002537]  dump_stack+0x46/0x60
> > [  +0.003386]  __migration_entry_wait.cold.65+0x5/0x14
> > [  +0.005043]  do_swap_page+0x84e/0x960
> > [  +0.003727]  ? arch_tlb_finish_mmu+0x29/0xc0
> > [  +0.006412]  __handle_mm_fault+0x933/0x1330
> > [  +0.004265]  handle_mm_fault+0xc4/0x250
> > [  +0.003915]  __do_page_fault+0x2b7/0x510
> > [  +0.003990]  do_page_fault+0x2c/0x110
> > [  +0.003729]  ? page_fault+0x8/0x30
> > [  +0.003462]  page_fault+0x1e/0x30
> 
> Many of the traces go through this path. We are
> 	/*
> 	 * Once page cache replacement of page migration started, page_count
> 	 * *must* be zero. And, we don't want to call wait_on_page_locked()
> 	 * against a page without get_page().
> 	 * So, we use get_page_unless_zero(), here. Even failed, page fault
> 	 * will occur again.
> 	 */
> 	if (!get_page_unless_zero(page))
> 		goto out;
> 	pte_unmap_unlock(ptep, ptl);
> 	wait_on_page_locked(page);
> 
> taking a reference to the page under migration. I have to think
> about this much more, but I suspect this is just asking for trouble.
> 
> Cc migration experts. For your background information: we are seeing
> memory offline fail to converge because a few heavily used pages
> fail to migrate away, e.g. http://lkml.kernel.org/r/20181116012433.GU2653@MiWiFi-R3L-srv
> A debugging patch to dump the stack for these pages http://lkml.kernel.org/r/20181116091409.GD14706@dhcp22.suse.cz
> shows that the references are taken from the swap-in code (above). How are
> we supposed to converge when the swapin code waits for the migration to
> finish with the reference count elevated?

Just to clarify. This is not only about swapin obviously. Any caller of
__migration_entry_wait is affected the same way AFAICS.
-- 
Michal Hocko
SUSE Labs


Thread overview: 57+ messages
2018-11-14  7:09 Memory hotplug softlock issue Baoquan He
2018-11-14  7:16 ` Baoquan He
2018-11-14  7:16   ` Baoquan He
2018-11-14  8:18 ` David Hildenbrand
2018-11-14  9:00   ` Baoquan He
2018-11-14  9:25     ` David Hildenbrand
2018-11-14  9:41       ` Michal Hocko
2018-11-14  9:48         ` David Hildenbrand
2018-11-14 10:04           ` Michal Hocko
2018-11-14  9:01   ` Michal Hocko
2018-11-14  9:22     ` David Hildenbrand
2018-11-14  9:37       ` Michal Hocko
2018-11-14  9:39         ` David Hildenbrand
2018-11-14 14:52     ` Baoquan He
2018-11-14 15:00       ` Michal Hocko
2018-11-15  5:10         ` Baoquan He
2018-11-15  7:30           ` Michal Hocko
2018-11-15  7:53             ` Baoquan He
2018-11-15  8:30               ` Michal Hocko
2018-11-15  9:42                 ` David Hildenbrand
2018-11-15  9:52                   ` Baoquan He
2018-11-15  9:53                     ` David Hildenbrand
2018-11-15 13:12                 ` Baoquan He
2018-11-15 13:19                   ` Michal Hocko
2018-11-15 13:23                     ` Baoquan He
2018-11-15 14:25                       ` Michal Hocko
2018-11-15 13:38                     ` Baoquan He
2018-11-15 14:32                       ` Michal Hocko
2018-11-15 14:34                         ` Baoquan He
2018-11-16  1:24                         ` Baoquan He
2018-11-16  9:14                           ` Michal Hocko
2018-11-17  4:22                             ` Baoquan He
2018-11-19 10:52                             ` Baoquan He
2018-11-19 12:40                               ` Michal Hocko
2018-11-19 12:51                                 ` Michal Hocko [this message]
2018-11-19 14:10                                   ` Michal Hocko
2018-11-19 16:36                                     ` Vlastimil Babka
2018-11-19 16:46                                       ` Michal Hocko
2018-11-19 16:46                                         ` Vlastimil Babka
2018-11-19 16:48                                           ` Vlastimil Babka
2018-11-19 17:01                                             ` Michal Hocko
2018-11-19 17:33                                     ` Michal Hocko
2018-11-19 20:34                                       ` Hugh Dickins
2018-11-19 20:59                                         ` Michal Hocko
2018-11-20  1:56                                           ` Baoquan He
2018-11-20  5:44                                             ` Hugh Dickins
2018-11-20 13:38                                               ` Vlastimil Babka
2018-11-20 13:58                                                 ` Baoquan He
2018-11-20 13:58                                                   ` Baoquan He
2018-11-20 14:05                                                   ` Michal Hocko
2018-11-20 14:12                                                     ` Baoquan He
2018-11-21  1:21                                                   ` Hugh Dickins
2018-11-21  1:08                                                 ` Hugh Dickins
2018-11-21  3:20                                                   ` Hugh Dickins
2018-11-21 17:31                                               ` Michal Hocko
2018-11-22  1:53                                                 ` Hugh Dickins
2018-11-14 10:00 ` Michal Hocko
