From: Mel Gorman <mgorman@suse.de> To: Michal Hocko <mhocko@kernel.org> Cc: linux-mm@kvack.org, Johannes Weiner <hannes@cmpxchg.org>, Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>, LKML <linux-kernel@vger.kernel.org> Subject: Re: [RFC PATCH 1/2] mm, vmscan: account the number of isolated pages per zone Date: Thu, 19 Jan 2017 13:11:43 +0000 [thread overview] Message-ID: <20170119131143.2ze5l5fwheoqdpne@suse.de> (raw) In-Reply-To: <20170119112336.GN30786@dhcp22.suse.cz> On Thu, Jan 19, 2017 at 12:23:36PM +0100, Michal Hocko wrote: > On Thu 19-01-17 10:07:55, Mel Gorman wrote: > [...] > > mm, vmscan: Wait on a waitqueue when too many pages are isolated > > > > When too many pages are isolated, direct reclaim waits on congestion to clear > > for up to a tenth of a second. There is no reason to believe that too many > > pages are isolated due to dirty pages, reclaim efficiency or congestion. > > It may simply be because an extremely large number of processes have entered > > direct reclaim at the same time. However, it is possible for the situation > > to persist forever and never reach OOM. > > > > This patch queues processes a waitqueue when too many pages are isolated. > > When parallel reclaimers finish shrink_page_list, they wake the waiters > > to recheck whether too many pages are isolated. > > > > The wait on the queue has a timeout as not all sites that isolate pages > > will do the wakeup. Depending on every isolation of LRU pages to be perfect > > forever is potentially fragile. The specific wakeups occur for page reclaim > > and compaction. If too many pages are isolated due to memory failure, > > hotplug or directly calling migration from a syscall then the waiting > > processes may wait the full timeout. > > > > Note that the timeout allows the use of waitqueue_active() on the basis > > that a race will cause the full timeout to be reached due to a missed > > wakeup. This is relatively harmless and still a massive improvement over > > unconditionally calling congestion_wait. > > > > Direct reclaimers that cannot isolate pages within the timeout will consider > > return to the caller. This is somewhat clunky as it won't return immediately > > and make go through the other priorities and slab shrinking. Eventually, > > it'll go through a few iterations of should_reclaim_retry and reach the > > MAX_RECLAIM_RETRIES limit and consider going OOM. > > I cannot really say I would like this. It's just much more complex than > necessary. I guess it's a difference in opinion. Miximg per-zone and per-node information for me is complex. I liked the workqueue because it was an example of waiting on a specific event instead of relying completely on time. > I definitely agree that congestion_wait while waiting for > too_many_isolated is a crude hack. This patch doesn't really resolve > my biggest worry, though, that we go OOM with too many pages isolated > as your patch doesn't alter zone_reclaimable_pages to reflect those > numbers. > Indeed, but such cases are also caught by the no_progress_loop logic to avoid a premature OOM. > Anyway, I think both of us are probably overcomplicating things a bit. > Your waitqueue approach is definitely better semantically than the > congestion_wait because we are waiting for a different event than the > API is intended for. On the other hand a mere > schedule_timeout_interruptible might work equally well in the real life. > On the other side I might really over emphasise the role of NR_ISOLATED* > counts. It might really turn out that we can safely ignore them and it > won't be the end of the world. So what do you think about the following > as a starting point. If we ever see oom reports with high number of > NR_ISOLATED* which are part of the oom report then we know we have to do > something about that. Those changes would at least be driven by a real > usecase rather than theoretical scenarios. > > So what do you think about the following? Tetsuo, would you be willing > to run this patch through your torture testing please? I'm fine with treating this as a starting point. Thanks. -- Mel Gorman SUSE Labs
WARNING: multiple messages have this Message-ID (diff)
From: Mel Gorman <mgorman@suse.de> To: Michal Hocko <mhocko@kernel.org> Cc: linux-mm@kvack.org, Johannes Weiner <hannes@cmpxchg.org>, Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>, LKML <linux-kernel@vger.kernel.org> Subject: Re: [RFC PATCH 1/2] mm, vmscan: account the number of isolated pages per zone Date: Thu, 19 Jan 2017 13:11:43 +0000 [thread overview] Message-ID: <20170119131143.2ze5l5fwheoqdpne@suse.de> (raw) In-Reply-To: <20170119112336.GN30786@dhcp22.suse.cz> On Thu, Jan 19, 2017 at 12:23:36PM +0100, Michal Hocko wrote: > On Thu 19-01-17 10:07:55, Mel Gorman wrote: > [...] > > mm, vmscan: Wait on a waitqueue when too many pages are isolated > > > > When too many pages are isolated, direct reclaim waits on congestion to clear > > for up to a tenth of a second. There is no reason to believe that too many > > pages are isolated due to dirty pages, reclaim efficiency or congestion. > > It may simply be because an extremely large number of processes have entered > > direct reclaim at the same time. However, it is possible for the situation > > to persist forever and never reach OOM. > > > > This patch queues processes a waitqueue when too many pages are isolated. > > When parallel reclaimers finish shrink_page_list, they wake the waiters > > to recheck whether too many pages are isolated. > > > > The wait on the queue has a timeout as not all sites that isolate pages > > will do the wakeup. Depending on every isolation of LRU pages to be perfect > > forever is potentially fragile. The specific wakeups occur for page reclaim > > and compaction. If too many pages are isolated due to memory failure, > > hotplug or directly calling migration from a syscall then the waiting > > processes may wait the full timeout. > > > > Note that the timeout allows the use of waitqueue_active() on the basis > > that a race will cause the full timeout to be reached due to a missed > > wakeup. This is relatively harmless and still a massive improvement over > > unconditionally calling congestion_wait. > > > > Direct reclaimers that cannot isolate pages within the timeout will consider > > return to the caller. This is somewhat clunky as it won't return immediately > > and make go through the other priorities and slab shrinking. Eventually, > > it'll go through a few iterations of should_reclaim_retry and reach the > > MAX_RECLAIM_RETRIES limit and consider going OOM. > > I cannot really say I would like this. It's just much more complex than > necessary. I guess it's a difference in opinion. Miximg per-zone and per-node information for me is complex. I liked the workqueue because it was an example of waiting on a specific event instead of relying completely on time. > I definitely agree that congestion_wait while waiting for > too_many_isolated is a crude hack. This patch doesn't really resolve > my biggest worry, though, that we go OOM with too many pages isolated > as your patch doesn't alter zone_reclaimable_pages to reflect those > numbers. > Indeed, but such cases are also caught by the no_progress_loop logic to avoid a premature OOM. > Anyway, I think both of us are probably overcomplicating things a bit. > Your waitqueue approach is definitely better semantically than the > congestion_wait because we are waiting for a different event than the > API is intended for. On the other hand a mere > schedule_timeout_interruptible might work equally well in the real life. > On the other side I might really over emphasise the role of NR_ISOLATED* > counts. It might really turn out that we can safely ignore them and it > won't be the end of the world. So what do you think about the following > as a starting point. If we ever see oom reports with high number of > NR_ISOLATED* which are part of the oom report then we know we have to do > something about that. Those changes would at least be driven by a real > usecase rather than theoretical scenarios. > > So what do you think about the following? Tetsuo, would you be willing > to run this patch through your torture testing please? I'm fine with treating this as a starting point. Thanks. -- Mel Gorman SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2017-01-19 13:13 UTC|newest] Thread overview: 110+ messages / expand[flat|nested] mbox.gz Atom feed top 2017-01-18 13:44 [RFC PATCH 0/2] fix unbounded too_many_isolated Michal Hocko 2017-01-18 13:44 ` Michal Hocko 2017-01-18 13:44 ` [RFC PATCH 1/2] mm, vmscan: account the number of isolated pages per zone Michal Hocko 2017-01-18 13:44 ` Michal Hocko 2017-01-18 14:46 ` Mel Gorman 2017-01-18 14:46 ` Mel Gorman 2017-01-18 15:15 ` Michal Hocko 2017-01-18 15:15 ` Michal Hocko 2017-01-18 15:54 ` Mel Gorman 2017-01-18 15:54 ` Mel Gorman 2017-01-18 16:17 ` Michal Hocko 2017-01-18 16:17 ` Michal Hocko 2017-01-18 17:00 ` Mel Gorman 2017-01-18 17:00 ` Mel Gorman 2017-01-18 17:29 ` Michal Hocko 2017-01-18 17:29 ` Michal Hocko 2017-01-19 10:07 ` Mel Gorman 2017-01-19 10:07 ` Mel Gorman 2017-01-19 11:23 ` Michal Hocko 2017-01-19 11:23 ` Michal Hocko 2017-01-19 13:11 ` Mel Gorman [this message] 2017-01-19 13:11 ` Mel Gorman 2017-01-20 13:27 ` Tetsuo Handa 2017-01-20 13:27 ` Tetsuo Handa 2017-01-21 7:42 ` Tetsuo Handa 2017-01-21 7:42 ` Tetsuo Handa 2017-01-25 10:15 ` Michal Hocko 2017-01-25 10:15 ` Michal Hocko 2017-01-25 10:19 ` Christoph Hellwig 2017-01-25 10:19 ` Christoph Hellwig 2017-01-25 10:46 ` Michal Hocko 2017-01-25 10:46 ` Michal Hocko 2017-01-25 11:09 ` Tetsuo Handa 2017-01-25 11:09 ` Tetsuo Handa 2017-01-25 13:00 ` Michal Hocko 2017-01-25 13:00 ` Michal Hocko 2017-01-27 14:49 ` Michal Hocko 2017-01-27 14:49 ` Michal Hocko 2017-01-28 15:27 ` Tetsuo Handa 2017-01-28 15:27 ` Tetsuo Handa 2017-01-30 8:55 ` Michal Hocko 2017-01-30 8:55 ` Michal Hocko 2017-02-02 10:14 ` Michal Hocko 2017-02-02 10:14 ` Michal Hocko 2017-02-03 10:57 ` Tetsuo Handa 2017-02-03 10:57 ` Tetsuo Handa 2017-02-03 14:41 ` Michal Hocko 2017-02-03 14:41 ` Michal Hocko 2017-02-03 14:50 ` Michal Hocko 2017-02-03 14:50 ` Michal Hocko 2017-02-03 17:24 ` Brian Foster 2017-02-03 17:24 ` Brian Foster 2017-02-06 6:29 ` Tetsuo Handa 2017-02-06 6:29 ` Tetsuo Handa 2017-02-06 14:35 ` Brian Foster 2017-02-06 14:35 ` Brian Foster 2017-02-06 14:42 ` Michal Hocko 2017-02-06 14:42 ` Michal Hocko 2017-02-06 15:47 ` Brian Foster 2017-02-06 15:47 ` Brian Foster 2017-02-07 10:30 ` Tetsuo Handa 2017-02-07 10:30 ` Tetsuo Handa 2017-02-07 16:54 ` Brian Foster 2017-02-07 16:54 ` Brian Foster 2017-02-03 14:55 ` Michal Hocko 2017-02-03 14:55 ` Michal Hocko 2017-02-05 10:43 ` Tetsuo Handa 2017-02-05 10:43 ` Tetsuo Handa 2017-02-06 10:34 ` Michal Hocko 2017-02-06 10:34 ` Michal Hocko 2017-02-06 10:39 ` Michal Hocko 2017-02-06 10:39 ` Michal Hocko 2017-02-07 21:12 ` Michal Hocko 2017-02-07 21:12 ` Michal Hocko 2017-02-08 9:24 ` Peter Zijlstra 2017-02-08 9:24 ` Peter Zijlstra 2017-02-21 9:40 ` Michal Hocko 2017-02-21 9:40 ` Michal Hocko 2017-02-21 14:35 ` Tetsuo Handa 2017-02-21 14:35 ` Tetsuo Handa 2017-02-21 15:53 ` Michal Hocko 2017-02-21 15:53 ` Michal Hocko 2017-02-22 2:02 ` Tetsuo Handa 2017-02-22 2:02 ` Tetsuo Handa 2017-02-22 7:54 ` Michal Hocko 2017-02-22 7:54 ` Michal Hocko 2017-02-26 6:30 ` Tetsuo Handa 2017-02-26 6:30 ` Tetsuo Handa 2017-01-31 11:58 ` Michal Hocko 2017-01-31 11:58 ` Michal Hocko 2017-01-31 12:51 ` Christoph Hellwig 2017-01-31 12:51 ` Christoph Hellwig 2017-01-31 13:21 ` Michal Hocko 2017-01-31 13:21 ` Michal Hocko 2017-01-25 10:33 ` [RFC PATCH 1/2] mm, vmscan: account the number of isolated pagesper zone Tetsuo Handa 2017-01-25 10:33 ` Tetsuo Handa 2017-01-25 12:34 ` Michal Hocko 2017-01-25 12:34 ` Michal Hocko 2017-01-25 13:13 ` [RFC PATCH 1/2] mm, vmscan: account the number of isolated pages per zone Tetsuo Handa 2017-01-25 13:13 ` Tetsuo Handa 2017-01-25 9:53 ` Michal Hocko 2017-01-25 9:53 ` Michal Hocko 2017-01-20 6:42 ` Hillf Danton 2017-01-20 6:42 ` Hillf Danton 2017-01-20 9:25 ` Mel Gorman 2017-01-20 9:25 ` Mel Gorman 2017-01-18 13:44 ` [RFC PATCH 2/2] mm, vmscan: do not loop on too_many_isolated for ever Michal Hocko 2017-01-18 13:44 ` Michal Hocko 2017-01-18 14:50 ` Mel Gorman 2017-01-18 14:50 ` Mel Gorman
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20170119131143.2ze5l5fwheoqdpne@suse.de \ --to=mgorman@suse.de \ --cc=hannes@cmpxchg.org \ --cc=linux-kernel@vger.kernel.org \ --cc=linux-mm@kvack.org \ --cc=mhocko@kernel.org \ --cc=penguin-kernel@I-love.SAKURA.ne.jp \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.