From: Michal Hocko <mhocko@suse.cz>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Rik van Riel <riel@redhat.com>,
	Satoru Moriya <satoru.moriya@hds.com>,
	Mel Gorman <mgorman@suse.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	Hugh Dickins <hughd@google.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [patch 2/8] mm: vmscan: disregard swappiness shortly before going OOM
Date: Mon, 17 Dec 2012 17:37:35 +0100
Message-ID: <20121217163735.GE25432@dhcp22.suse.cz>
In-Reply-To: <20121215001850.GA21353@cmpxchg.org>

On Fri 14-12-12 19:18:51, Johannes Weiner wrote:
> On Fri, Dec 14, 2012 at 05:13:45PM +0100, Michal Hocko wrote:
> > On Fri 14-12-12 10:43:55, Rik van Riel wrote:
> > > On 12/14/2012 03:37 AM, Michal Hocko wrote:
> > > 
> > > > >I can answer the latter. Because memsw comes with its price and
> > > >swappiness is much cheaper. On the other hand it makes sense that
> > > >swappiness==0 doesn't swap at all. Or do you think we should get back to
> > > >_almost_ doesn't swap at all?
> > > 
> > > swappiness==0 will swap in emergencies, specifically when we have
> > > almost no page cache left, we will still swap things out:
> > > 
> > >         if (global_reclaim(sc)) {
> > >                 free  = zone_page_state(zone, NR_FREE_PAGES);
> > >                 if (unlikely(file + free <= high_wmark_pages(zone))) {
> > >                         /*
> > >                          * If we have very few page cache pages, force-scan
> > >                          * anon pages.
> > >                          */
> > >                         fraction[0] = 1;
> > >                         fraction[1] = 0;
> > >                         denominator = 1;
> > >                         goto out;
> > > 
> > > This makes sense, because people who set swappiness==0 but
> > > do have swap space available would probably prefer some
> > > emergency swapping over an OOM kill.
> > 
> > Yes, but this is the global reclaim path. I was arguing about
> > swappiness==0 & memcg. As this patch doesn't make a big difference for
> > the global case (as both the changelog and you mentioned) then we should
> > focus on whether this is a desirable change for the memcg path. I think it
> > makes sense to keep the "no swapping at all" semantic for memcg as we
> > have it currently.
> 
> I would prefer we could agree on one thing, though.  Having global
> reclaim behave differently from memcg reclaim violates the principle of
> least surprise. 

Hmm, I think that no swapping at all with swappiness==0 makes some sense
for global reclaim as well. Why should we swap if the admin told us not
to?
I don't feel strongly about that, though, because the global swappiness
has been more relaxed in the past and people got used to it. We have
already seen bug reports from users who were surprised by high IO wait
times, and it turned out that they had swappiness set to 0: that used to
prevent swapping most of the time, but fe35004f changed that.

Use cases for memcg are more natural because memcg allows much better
control over OOM, and the requirements for (not) swapping are per group
rather than based on swap availability. IMHO we shouldn't push users into
using memcg swap accounting to accomplish the same thing, because the
accounting has some cost and its primary purpose is not to disable
swapping but rather to keep it on a leash. The two approaches also differ
semantically: swappiness is proportional while the limit is an absolute
number.

> Having the code behave like that implicitly without any mention of
> global_reclaim() and vm_swappiness() is unacceptable.

So what about:
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 7f30961..e6d4f23 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1750,7 +1750,15 @@ out:
 		unsigned long scan;
 
 		scan = get_lru_size(lruvec, lru);
-		if (sc->priority || noswap || !vmscan_swappiness(sc)) {
+		/*
+		 * Memcg targeted reclaim, unlike the global reclaim, honours
+		 * swappiness==0: no swapping is allowed even if that leads
+		 * to the OOM killer. The memcg OOM is a) local to the group
+		 * (or hierarchy) and b) can be handled from userspace, which
+		 * makes it different from the global case.
+		 */
+		if (sc->priority || noswap ||
+				(!global_reclaim(sc) && !vmscan_swappiness(sc))) {
 			scan >>= sc->priority;
 			if (!scan && force_scan)
 				scan = SWAP_CLUSTER_MAX;
-- 
Michal Hocko
SUSE Labs
