From: Michal Hocko <mhocko@kernel.org>
To: Baoquan He <bhe@redhat.com>
Cc: David Hildenbrand <david@redhat.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, aarcange@redhat.com
Subject: Re: Memory hotplug softlock issue
Date: Thu, 15 Nov 2018 09:30:55 +0100
Message-ID: <20181115083055.GD23831@dhcp22.suse.cz>
In-Reply-To: <20181115075349.GL2653@MiWiFi-R3L-srv>

On Thu 15-11-18 15:53:56, Baoquan He wrote:
> On 11/15/18 at 08:30am, Michal Hocko wrote:
> > On Thu 15-11-18 13:10:34, Baoquan He wrote:
> > > On 11/14/18 at 04:00pm, Michal Hocko wrote:
> > > > On Wed 14-11-18 22:52:50, Baoquan He wrote:
> > > > > On 11/14/18 at 10:01am, Michal Hocko wrote:
> > > > > > I have seen an issue when the migration cannot make a forward progress
> > > > > > because of a glibc page with a reference count bumping up and down. Most
> > > > > > probable explanation is the faultaround code. I am working on this and
> > > > > > will post a patch soon. In any case the migration should converge and if
> > > > > > it doesn't do then there is a bug lurking somewhere.
> > > > > > 
> > > > > > Failing on ENOMEM is a questionable thing. I haven't seen that happening
> > > > > > wildly but if it is a case then I wouldn't be opposed.
> > > > > 
> > > > > Applied your debugging patches, it helps a lot to printing message.
> > > > > 
> > > > > Below is the dmesg log about the migrating failure. It can't pass
> > > > > migrate_pages() and loop forever.
> > > > > 
> > > > > [  +0.083841] migrating pfn 10fff7d0 failed 
> > > > > [  +0.000005] page:ffffea043ffdf400 count:208 mapcount:201 mapping:ffff888dff4bdda8 index:0x2
> > > > > [  +0.012689] xfs_address_space_operations [xfs] 
> > > > > [  +0.000030] name:"stress" 
> > > > > [  +0.004556] flags: 0x5fffffc0000004(uptodate)
> > > > > [  +0.007339] raw: 005fffffc0000004 ffffc900000e3d80 ffffc900000e3d80 ffff888dff4bdda8
> > > > > [  +0.009488] raw: 0000000000000002 0000000000000000 000000cb000000c8 ffff888e7353d000
> > > > > [  +0.007726] page->mem_cgroup:ffff888e7353d000
> > > > > [  +0.084538] migrating pfn 10fff7d0 failed 
> > > > > [  +0.000006] page:ffffea043ffdf400 count:210 mapcount:201 mapping:ffff888dff4bdda8 index:0x2
> > > > > [  +0.012798] xfs_address_space_operations [xfs] 
> > > > > [  +0.000034] name:"stress" 
> > > > > [  +0.004524] flags: 0x5fffffc0000004(uptodate)
> > > > > [  +0.007068] raw: 005fffffc0000004 ffffc900000e3d80 ffffc900000e3d80 ffff888dff4bdda8
> > > > > [  +0.009359] raw: 0000000000000002 0000000000000000 000000cb000000c8 ffff888e7353d000
> > > > > [  +0.007728] page->mem_cgroup:ffff888e7353d000
> > > > 
> > > > I wouldn't be surprised if this was a similar/same issue I've been
> > > > chasing recently. Could you try to disable faultaround to see if that
> > > > helps. It seems that it helped in my particular case but I am still
> > > > waiting for the final good-to-go to post the patch as I do not own the
> > > > workload which triggered that issue.
> > > 
> > > Tried, still stuck in last block sometime. Usually after several times
> > > of hotplug/unplug. If stop stress program, the last block will be
> > > offlined immediately.
> > 
> > Is the pattern still the same? I mean failing over few pages with
> > reference count jumping up and down between attempts?
> 
> ->count jumping up and down, mapcount stays the same value.
> 
> > 
> > > [root@ ~]# cat /sys/kernel/debug/fault_around_bytes 
> > > 4096
> > 
> > Can you make it 0?
> 
> I executed 'echo 0 > fault_around_bytes', value less than one page size
> will round up to one page.

OK, I missed that. So there must be a different source of the page
count volatility. Is it always the same file?
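
To make the volatility easier to see across attempts, the page dumps
quoted above can be reduced to count minus mapcount per attempt. This is
only a rough sketch: the helper names are made up for illustration, the
regular expression assumes the "page:... count:... mapcount:..." line
format shown in the quoted dmesg output, and the difference is just "the
references not explained by mappings", not a statement about who holds
them.

```python
import re

# Matches the dump line format seen in the quoted dmesg output (assumed stable).
PAGE_RE = re.compile(
    r"page:(?P<page>[0-9a-f]+) count:(?P<count>\d+) mapcount:(?P<mapcount>\d+)"
)

# Two dump lines copied from the log quoted in this thread.
SAMPLE = """\
[  +0.000005] page:ffffea043ffdf400 count:208 mapcount:201 mapping:ffff888dff4bdda8 index:0x2
[  +0.000006] page:ffffea043ffdf400 count:210 mapcount:201 mapping:ffff888dff4bdda8 index:0x2
"""


def parse_page_dumps(log_text):
    """Return one (page, count, mapcount) tuple per dump line found."""
    return [
        (m.group("page"), int(m.group("count")), int(m.group("mapcount")))
        for m in PAGE_RE.finditer(log_text)
    ]


def extra_references(dumps):
    """Return count - mapcount for each dump, in log order."""
    return [count - mapcount for _page, count, mapcount in dumps]
```

For the two attempts quoted above, extra_references(parse_page_dumps(SAMPLE))
gives 7 and then 9 for the same page, with mapcount unchanged at 201.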

I think we can rule out memory reclaim because that depends on the page
lock. Is the stress test hitting memory compaction? In other words, are
the counters reported by
	grep compact /proc/vmstat
changing heavily during the offline test? I am asking because I do not
see the compaction pfn walkers skipping over MIGRATE_ISOLATE pageblocks.
But I could easily be missing something.
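
If comparing two grep outputs by eye is tedious, something along these
lines could take two snapshots and print only the compaction counters
that moved. It is a sketch under assumptions: the function names and the
10 second default interval are arbitrary, and it only relies on
/proc/vmstat being "name value" pairs, one per line.

```python
import time


def parse_vmstat(text, prefix="compact"):
    """Return {name: value} for vmstat lines whose name starts with prefix."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].startswith(prefix):
            counters[parts[0]] = int(parts[1])
    return counters


def counter_deltas(before, after):
    """Return only the counters that changed between two snapshots."""
    deltas = {}
    for name, value in after.items():
        diff = value - before.get(name, 0)
        if diff != 0:
            deltas[name] = diff
    return deltas


def watch(interval_s=10, path="/proc/vmstat"):
    """Snapshot the counters, wait interval_s seconds, and return the deltas."""
    with open(path) as f:
        before = parse_vmstat(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = parse_vmstat(f.read())
    return counter_deltas(before, after)
```

Calling watch() once while the offline attempt is looping and once on an
idle system should show whether compaction activity correlates with the
stuck block.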

It would also be good to find out whether this is fs specific. E.g. does
it make any difference if you use a different filesystem for your stress
testing?
-- 
Michal Hocko
SUSE Labs
