All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mel Gorman <mgorman@suse.de>
To: Michal Hocko <mhocko@kernel.org>
Cc: linux-mm@kvack.org, Johannes Weiner <hannes@cmpxchg.org>,
	Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [RFC PATCH 1/2] mm, vmscan: account the number of isolated pages per zone
Date: Thu, 19 Jan 2017 13:11:43 +0000	[thread overview]
Message-ID: <20170119131143.2ze5l5fwheoqdpne@suse.de> (raw)
In-Reply-To: <20170119112336.GN30786@dhcp22.suse.cz>

On Thu, Jan 19, 2017 at 12:23:36PM +0100, Michal Hocko wrote:
> On Thu 19-01-17 10:07:55, Mel Gorman wrote:
> [...]
> > mm, vmscan: Wait on a waitqueue when too many pages are isolated
> > 
> > When too many pages are isolated, direct reclaim waits on congestion to clear
> > for up to a tenth of a second. There is no reason to believe that too many
> > pages are isolated due to dirty pages, reclaim efficiency or congestion.
> > It may simply be because an extremely large number of processes have entered
> > direct reclaim at the same time. However, it is possible for the situation
> > to persist forever and never reach OOM.
> > 
> > This patch queues processes a waitqueue when too many pages are isolated.
> > When parallel reclaimers finish shrink_page_list, they wake the waiters
> > to recheck whether too many pages are isolated.
> > 
> > The wait on the queue has a timeout as not all sites that isolate pages
> > will do the wakeup. Depending on every isolation of LRU pages to be perfect
> > forever is potentially fragile. The specific wakeups occur for page reclaim
> > and compaction. If too many pages are isolated due to memory failure,
> > hotplug or directly calling migration from a syscall then the waiting
> > processes may wait the full timeout.
> > 
> > Note that the timeout allows the use of waitqueue_active() on the basis
> > that a race will cause the full timeout to be reached due to a missed
> > wakeup. This is relatively harmless and still a massive improvement over
> > unconditionally calling congestion_wait.
> > 
> > Direct reclaimers that cannot isolate pages within the timeout will consider
> > return to the caller. This is somewhat clunky as it won't return immediately
> > and make go through the other priorities and slab shrinking. Eventually,
> > it'll go through a few iterations of should_reclaim_retry and reach the
> > MAX_RECLAIM_RETRIES limit and consider going OOM.
> 
> I cannot really say I would like this. It's just much more complex than
> necessary.

I guess it's a difference in opinion. Miximg per-zone and per-node
information for me is complex. I liked the workqueue because it was an
example of waiting on a specific event instead of relying completely on
time.

> I definitely agree that congestion_wait while waiting for
> too_many_isolated is a crude hack. This patch doesn't really resolve
> my biggest worry, though, that we go OOM with too many pages isolated
> as your patch doesn't alter zone_reclaimable_pages to reflect those
> numbers.
> 

Indeed, but such cases are also caught by the no_progress_loop logic to
avoid a premature OOM.

> Anyway, I think both of us are probably overcomplicating things a bit.
> Your waitqueue approach is definitely better semantically than the
> congestion_wait because we are waiting for a different event than the
> API is intended for. On the other hand a mere
> schedule_timeout_interruptible might work equally well in the real life.
> On the other side I might really over emphasise the role of NR_ISOLATED*
> counts. It might really turn out that we can safely ignore them and it
> won't be the end of the world. So what do you think about the following
> as a starting point. If we ever see oom reports with high number of
> NR_ISOLATED* which are part of the oom report then we know we have to do
> something about that. Those changes would at least be driven by a real
> usecase rather than theoretical scenarios.
> 
> So what do you think about the following? Tetsuo, would you be willing
> to run this patch through your torture testing please?

I'm fine with treating this as a starting point.

Thanks.

-- 
Mel Gorman
SUSE Labs

WARNING: multiple messages have this Message-ID (diff)
From: Mel Gorman <mgorman@suse.de>
To: Michal Hocko <mhocko@kernel.org>
Cc: linux-mm@kvack.org, Johannes Weiner <hannes@cmpxchg.org>,
	Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [RFC PATCH 1/2] mm, vmscan: account the number of isolated pages per zone
Date: Thu, 19 Jan 2017 13:11:43 +0000	[thread overview]
Message-ID: <20170119131143.2ze5l5fwheoqdpne@suse.de> (raw)
In-Reply-To: <20170119112336.GN30786@dhcp22.suse.cz>

On Thu, Jan 19, 2017 at 12:23:36PM +0100, Michal Hocko wrote:
> On Thu 19-01-17 10:07:55, Mel Gorman wrote:
> [...]
> > mm, vmscan: Wait on a waitqueue when too many pages are isolated
> > 
> > When too many pages are isolated, direct reclaim waits on congestion to clear
> > for up to a tenth of a second. There is no reason to believe that too many
> > pages are isolated due to dirty pages, reclaim efficiency or congestion.
> > It may simply be because an extremely large number of processes have entered
> > direct reclaim at the same time. However, it is possible for the situation
> > to persist forever and never reach OOM.
> > 
> > This patch queues processes a waitqueue when too many pages are isolated.
> > When parallel reclaimers finish shrink_page_list, they wake the waiters
> > to recheck whether too many pages are isolated.
> > 
> > The wait on the queue has a timeout as not all sites that isolate pages
> > will do the wakeup. Depending on every isolation of LRU pages to be perfect
> > forever is potentially fragile. The specific wakeups occur for page reclaim
> > and compaction. If too many pages are isolated due to memory failure,
> > hotplug or directly calling migration from a syscall then the waiting
> > processes may wait the full timeout.
> > 
> > Note that the timeout allows the use of waitqueue_active() on the basis
> > that a race will cause the full timeout to be reached due to a missed
> > wakeup. This is relatively harmless and still a massive improvement over
> > unconditionally calling congestion_wait.
> > 
> > Direct reclaimers that cannot isolate pages within the timeout will consider
> > return to the caller. This is somewhat clunky as it won't return immediately
> > and make go through the other priorities and slab shrinking. Eventually,
> > it'll go through a few iterations of should_reclaim_retry and reach the
> > MAX_RECLAIM_RETRIES limit and consider going OOM.
> 
> I cannot really say I would like this. It's just much more complex than
> necessary.

I guess it's a difference in opinion. Miximg per-zone and per-node
information for me is complex. I liked the workqueue because it was an
example of waiting on a specific event instead of relying completely on
time.

> I definitely agree that congestion_wait while waiting for
> too_many_isolated is a crude hack. This patch doesn't really resolve
> my biggest worry, though, that we go OOM with too many pages isolated
> as your patch doesn't alter zone_reclaimable_pages to reflect those
> numbers.
> 

Indeed, but such cases are also caught by the no_progress_loop logic to
avoid a premature OOM.

> Anyway, I think both of us are probably overcomplicating things a bit.
> Your waitqueue approach is definitely better semantically than the
> congestion_wait because we are waiting for a different event than the
> API is intended for. On the other hand a mere
> schedule_timeout_interruptible might work equally well in the real life.
> On the other side I might really over emphasise the role of NR_ISOLATED*
> counts. It might really turn out that we can safely ignore them and it
> won't be the end of the world. So what do you think about the following
> as a starting point. If we ever see oom reports with high number of
> NR_ISOLATED* which are part of the oom report then we know we have to do
> something about that. Those changes would at least be driven by a real
> usecase rather than theoretical scenarios.
> 
> So what do you think about the following? Tetsuo, would you be willing
> to run this patch through your torture testing please?

I'm fine with treating this as a starting point.

Thanks.

-- 
Mel Gorman
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2017-01-19 13:13 UTC|newest]

Thread overview: 110+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-01-18 13:44 [RFC PATCH 0/2] fix unbounded too_many_isolated Michal Hocko
2017-01-18 13:44 ` Michal Hocko
2017-01-18 13:44 ` [RFC PATCH 1/2] mm, vmscan: account the number of isolated pages per zone Michal Hocko
2017-01-18 13:44   ` Michal Hocko
2017-01-18 14:46   ` Mel Gorman
2017-01-18 14:46     ` Mel Gorman
2017-01-18 15:15     ` Michal Hocko
2017-01-18 15:15       ` Michal Hocko
2017-01-18 15:54       ` Mel Gorman
2017-01-18 15:54         ` Mel Gorman
2017-01-18 16:17         ` Michal Hocko
2017-01-18 16:17           ` Michal Hocko
2017-01-18 17:00           ` Mel Gorman
2017-01-18 17:00             ` Mel Gorman
2017-01-18 17:29             ` Michal Hocko
2017-01-18 17:29               ` Michal Hocko
2017-01-19 10:07               ` Mel Gorman
2017-01-19 10:07                 ` Mel Gorman
2017-01-19 11:23                 ` Michal Hocko
2017-01-19 11:23                   ` Michal Hocko
2017-01-19 13:11                   ` Mel Gorman [this message]
2017-01-19 13:11                     ` Mel Gorman
2017-01-20 13:27                     ` Tetsuo Handa
2017-01-20 13:27                       ` Tetsuo Handa
2017-01-21  7:42                       ` Tetsuo Handa
2017-01-21  7:42                         ` Tetsuo Handa
2017-01-25 10:15                         ` Michal Hocko
2017-01-25 10:15                           ` Michal Hocko
2017-01-25 10:19                           ` Christoph Hellwig
2017-01-25 10:19                             ` Christoph Hellwig
2017-01-25 10:46                             ` Michal Hocko
2017-01-25 10:46                               ` Michal Hocko
2017-01-25 11:09                               ` Tetsuo Handa
2017-01-25 11:09                                 ` Tetsuo Handa
2017-01-25 13:00                                 ` Michal Hocko
2017-01-25 13:00                                   ` Michal Hocko
2017-01-27 14:49                                   ` Michal Hocko
2017-01-27 14:49                                     ` Michal Hocko
2017-01-28 15:27                                     ` Tetsuo Handa
2017-01-28 15:27                                       ` Tetsuo Handa
2017-01-30  8:55                                       ` Michal Hocko
2017-01-30  8:55                                         ` Michal Hocko
2017-02-02 10:14                                         ` Michal Hocko
2017-02-02 10:14                                           ` Michal Hocko
2017-02-03 10:57                                           ` Tetsuo Handa
2017-02-03 10:57                                             ` Tetsuo Handa
2017-02-03 14:41                                             ` Michal Hocko
2017-02-03 14:41                                               ` Michal Hocko
2017-02-03 14:50                                             ` Michal Hocko
2017-02-03 14:50                                               ` Michal Hocko
2017-02-03 17:24                                               ` Brian Foster
2017-02-03 17:24                                                 ` Brian Foster
2017-02-06  6:29                                                 ` Tetsuo Handa
2017-02-06  6:29                                                   ` Tetsuo Handa
2017-02-06 14:35                                                   ` Brian Foster
2017-02-06 14:35                                                     ` Brian Foster
2017-02-06 14:42                                                     ` Michal Hocko
2017-02-06 14:42                                                       ` Michal Hocko
2017-02-06 15:47                                                       ` Brian Foster
2017-02-06 15:47                                                         ` Brian Foster
2017-02-07 10:30                                                     ` Tetsuo Handa
2017-02-07 10:30                                                       ` Tetsuo Handa
2017-02-07 16:54                                                       ` Brian Foster
2017-02-07 16:54                                                         ` Brian Foster
2017-02-03 14:55                                             ` Michal Hocko
2017-02-03 14:55                                               ` Michal Hocko
2017-02-05 10:43                                               ` Tetsuo Handa
2017-02-05 10:43                                                 ` Tetsuo Handa
2017-02-06 10:34                                                 ` Michal Hocko
2017-02-06 10:34                                                   ` Michal Hocko
2017-02-06 10:39                                                 ` Michal Hocko
2017-02-06 10:39                                                   ` Michal Hocko
2017-02-07 21:12                                                   ` Michal Hocko
2017-02-07 21:12                                                     ` Michal Hocko
2017-02-08  9:24                                                     ` Peter Zijlstra
2017-02-08  9:24                                                       ` Peter Zijlstra
2017-02-21  9:40                                             ` Michal Hocko
2017-02-21  9:40                                               ` Michal Hocko
2017-02-21 14:35                                               ` Tetsuo Handa
2017-02-21 14:35                                                 ` Tetsuo Handa
2017-02-21 15:53                                                 ` Michal Hocko
2017-02-21 15:53                                                   ` Michal Hocko
2017-02-22  2:02                                                   ` Tetsuo Handa
2017-02-22  2:02                                                     ` Tetsuo Handa
2017-02-22  7:54                                                     ` Michal Hocko
2017-02-22  7:54                                                       ` Michal Hocko
2017-02-26  6:30                                                       ` Tetsuo Handa
2017-02-26  6:30                                                         ` Tetsuo Handa
2017-01-31 11:58                                   ` Michal Hocko
2017-01-31 11:58                                     ` Michal Hocko
2017-01-31 12:51                                     ` Christoph Hellwig
2017-01-31 12:51                                       ` Christoph Hellwig
2017-01-31 13:21                                       ` Michal Hocko
2017-01-31 13:21                                         ` Michal Hocko
2017-01-25 10:33                           ` [RFC PATCH 1/2] mm, vmscan: account the number of isolated pagesper zone Tetsuo Handa
2017-01-25 10:33                             ` Tetsuo Handa
2017-01-25 12:34                             ` Michal Hocko
2017-01-25 12:34                               ` Michal Hocko
2017-01-25 13:13                               ` [RFC PATCH 1/2] mm, vmscan: account the number of isolated pages per zone Tetsuo Handa
2017-01-25 13:13                                 ` Tetsuo Handa
2017-01-25  9:53                       ` Michal Hocko
2017-01-25  9:53                         ` Michal Hocko
2017-01-20  6:42                 ` Hillf Danton
2017-01-20  6:42                   ` Hillf Danton
2017-01-20  9:25                   ` Mel Gorman
2017-01-20  9:25                     ` Mel Gorman
2017-01-18 13:44 ` [RFC PATCH 2/2] mm, vmscan: do not loop on too_many_isolated for ever Michal Hocko
2017-01-18 13:44   ` Michal Hocko
2017-01-18 14:50   ` Mel Gorman
2017-01-18 14:50     ` Mel Gorman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170119131143.2ze5l5fwheoqdpne@suse.de \
    --to=mgorman@suse.de \
    --cc=hannes@cmpxchg.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=penguin-kernel@I-love.SAKURA.ne.jp \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.