From: Minchan Kim <minchan@kernel.org>
To: Qian Cai <cai@lca.pw>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@suse.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: "mm: account nr_isolated_xxx in [isolate|putback]_lru_page" breaks OOM with swap
Date: Thu, 1 Aug 2019 15:51:08 +0900
Message-ID: <20190801065108.GA179251@google.com>
In-Reply-To: <1564597080.11067.40.camel@lca.pw>

On Wed, Jul 31, 2019 at 02:18:00PM -0400, Qian Cai wrote:
> On Wed, 2019-07-31 at 12:09 -0400, Qian Cai wrote:
> > On Wed, 2019-07-31 at 14:34 +0900, Minchan Kim wrote:
> > > On Tue, Jul 30, 2019 at 12:25:28PM -0400, Qian Cai wrote:
> > > > OOM workloads with swapping is unable to recover with linux-next since
> > > > next-20190729 due to the commit "mm: account nr_isolated_xxx in
> > > > [isolate|putback]_lru_page" breaks OOM with swap" [1]
> > > > 
> > > > [1] https://lore.kernel.org/linux-mm/20190726023435.214162-4-minchan@kernel.org/T/#mdcd03bcb4746f2f23e6f508c205943726aee8355
> > > > 
> > > > For example, LTP oom01 test case is stuck for hours, while it finishes in a
> > > > few minutes here after reverted the above commit. Sometimes, it prints those
> > > > message while hanging.
> > > > 
> > > > [  509.983393][  T711] INFO: task oom01:5331 blocked for more than 122 seconds.
> > > > [  509.983431][  T711]       Not tainted 5.3.0-rc2-next-20190730 #7
> > > > [  509.983447][  T711] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > > [  509.983477][  T711] oom01           D24656  5331   5157 0x00040000
> > > > [  509.983513][  T711] Call Trace:
> > > > [  509.983538][  T711] [c00020037d00f880] [0000000000000008] 0x8 (unreliable)
> > > > [  509.983583][  T711] [c00020037d00fa60] [c000000000023724] __switch_to+0x3a4/0x520
> > > > [  509.983615][  T711] [c00020037d00fad0] [c0000000008d17bc] __schedule+0x2fc/0x950
> > > > [  509.983647][  T711] [c00020037d00fba0] [c0000000008d1e68] schedule+0x58/0x150
> > > > [  509.983684][  T711] [c00020037d00fbd0] [c0000000008d7614] rwsem_down_read_slowpath+0x4b4/0x630
> > > > [  509.983727][  T711] [c00020037d00fc90] [c0000000008d7dfc] down_read+0x12c/0x240
> > > > [  509.983758][  T711] [c00020037d00fd20] [c00000000005fb28] __do_page_fault+0x6f8/0xee0
> > > > [  509.983801][  T711] [c00020037d00fe20] [c00000000000a364] handle_page_fault+0x18/0x38
> > > 
> > > Thanks for the testing! No surprise the patch make some bugs because
> > > it's rather tricky.
> > > 
> > > Could you test this patch?
> > 
> > It does help the situation a bit, but the recover speed is still way slower
> > than just reverting the commit "mm: account nr_isolated_xxx in
> > [isolate|putback]_lru_page". For example, on this powerpc system, it used to
> > take 4-min to finish oom01 while now still take 13-min.
> > 
> > The oom02 (testing NUMA mempolicy) takes even longer and I gave up after
> > 26-min with several hang tasks below.
> 
> Also, oom02 is stuck on an x86 machine.

Yes, my patch above had a bug: it tested the page type after the page was
freed. On review I also found other bugs, but I don't think they are
related to your problem either. Okay then, let's revert the patch.

Andrew, could you revert the patch below?
"mm: account nr_isolated_xxx in [isolate|putback]_lru_page"

It's just a cleanup patch and is no longer related to the new madvise hint
system call. Thus, it shouldn't be a blocker.

Anyway, I want to fix the problem when I have time available.
Qian, what are your kernel config and system configuration on x86?
Is it possible to reproduce this in qemu?
It would be really helpful if you could tell me the reproduction steps on x86.

Thanks.


Thread overview: 6+ messages
2019-07-30 16:25 "mm: account nr_isolated_xxx in [isolate|putback]_lru_page" breaks OOM with swap Qian Cai
2019-07-31  5:34 ` Minchan Kim
2019-07-31 16:09   ` Qian Cai
2019-07-31 18:18     ` Qian Cai
2019-08-01  6:51       ` Minchan Kim [this message]
2019-08-01 11:46         ` Qian Cai
