From: Mel Gorman <mgorman@suse.de>
To: Michal Hocko <mhocko@kernel.org>
Cc: linux-mm@kvack.org, Johannes Weiner <hannes@cmpxchg.org>,
Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: [RFC PATCH 1/2] mm, vmscan: account the number of isolated pages per zone
Date: Thu, 19 Jan 2017 13:11:43 +0000
Message-ID: <20170119131143.2ze5l5fwheoqdpne@suse.de>
In-Reply-To: <20170119112336.GN30786@dhcp22.suse.cz>
On Thu, Jan 19, 2017 at 12:23:36PM +0100, Michal Hocko wrote:
> On Thu 19-01-17 10:07:55, Mel Gorman wrote:
> [...]
> > mm, vmscan: Wait on a waitqueue when too many pages are isolated
> >
> > When too many pages are isolated, direct reclaim waits on congestion to clear
> > for up to a tenth of a second. There is no reason to believe that too many
> > pages are isolated due to dirty pages, reclaim efficiency or congestion.
> > It may simply be because an extremely large number of processes have entered
> > direct reclaim at the same time. However, it is possible for the situation
> > to persist forever and never reach OOM.
> >
> > This patch queues processes on a waitqueue when too many pages are isolated.
> > When parallel reclaimers finish shrink_page_list, they wake the waiters
> > to recheck whether too many pages are isolated.
> >
> > The wait on the queue has a timeout as not all sites that isolate pages
> > will do the wakeup. Depending on every isolation of LRU pages to be perfect
> > forever is potentially fragile. The specific wakeups occur for page reclaim
> > and compaction. If too many pages are isolated due to memory failure,
> > hotplug or directly calling migration from a syscall then the waiting
> > processes may wait the full timeout.
> >
> > Note that the timeout allows the use of waitqueue_active() on the basis
> > that a race will cause the full timeout to be reached due to a missed
> > wakeup. This is relatively harmless and still a massive improvement over
> > unconditionally calling congestion_wait.
> >
> > Direct reclaimers that cannot isolate pages within the timeout will consider
> > returning to the caller. This is somewhat clunky as it won't return immediately
> > and may go through the other priorities and slab shrinking. Eventually,
> > it'll go through a few iterations of should_reclaim_retry and reach the
> > MAX_RECLAIM_RETRIES limit and consider going OOM.
>
> I cannot really say I would like this. It's just much more complex than
> necessary.
I guess it's a difference in opinion. Mixing per-zone and per-node
information for me is complex. I liked the waitqueue because it was an
example of waiting on a specific event instead of relying completely on
time.
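
To illustrate the distinction between waiting on an event and waiting on
time, here is a rough user-space analogy in Python. It is not the patch and
not kernel code: the class and method names (IsolationThrottle, isolate,
putback) are hypothetical, and a condition variable stands in for the
waitqueue. A waiter sleeps until another thread signals that the isolated
count dropped, and the timeout is only a fallback in case no wakeup arrives.

```python
import threading


class IsolationThrottle:
    """User-space analogy only; all names here are hypothetical."""

    def __init__(self, limit):
        self.limit = limit      # maximum "isolated pages" allowed at once
        self.isolated = 0       # current count, protected by self.cond
        self.cond = threading.Condition()

    def isolate(self, nr, timeout):
        """Account nr pages once there is room.

        Sleeps until a putback() wakes us and the recheck passes, or
        until timeout seconds elapse. Returns True if the pages were
        accounted and False if the wait timed out.
        """
        with self.cond:
            has_room = self.cond.wait_for(
                lambda: self.isolated + nr <= self.limit,
                timeout=timeout,
            )
            if has_room:
                self.isolated += nr
            return has_room

    def putback(self, nr):
        """Drop nr pages from the count and wake waiters to recheck."""
        with self.cond:
            self.isolated -= nr
            self.cond.notify_all()
```

With a limit of 4, a first isolate(4, ...) succeeds at once, a second
isolate(1, timeout=0.05) gives up after the timeout if nobody calls
putback(), and returns early as soon as another thread does. The caller
decides what a timeout means, which is the part the sketch leaves open.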
> I definitely agree that congestion_wait while waiting for
> too_many_isolated is a crude hack. This patch doesn't really resolve
> my biggest worry, though, that we go OOM with too many pages isolated
> as your patch doesn't alter zone_reclaimable_pages to reflect those
> numbers.
>
Indeed, but such cases are also caught by the no_progress_loop logic to
avoid a premature OOM.
> Anyway, I think both of us are probably overcomplicating things a bit.
> Your waitqueue approach is definitely better semantically than the
> congestion_wait because we are waiting for a different event than the
> API is intended for. On the other hand a mere
> schedule_timeout_interruptible might work equally well in real life.
> On the other side I might really overemphasise the role of NR_ISOLATED*
> counts. It might really turn out that we can safely ignore them and it
> won't be the end of the world. So what do you think about the following
> as a starting point. If we ever see oom reports with a high number of
> NR_ISOLATED* which are part of the oom report then we know we have to do
> something about that. Those changes would at least be driven by a real
> usecase rather than theoretical scenarios.
>
> So what do you think about the following? Tetsuo, would you be willing
> to run this patch through your torture testing please?
I'm fine with treating this as a starting point.
Thanks.
--
Mel Gorman
SUSE Labs