All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrew Morton <akpm@linux-foundation.org>
To: Michal Hocko <mhocko@suse.cz>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujtisu.com>,
	Mel Gorman <mgorman@suse.de>, Minchan Kim <minchan@kernel.org>,
	Rik van Riel <riel@redhat.com>, Ying Han <yinghan@google.com>,
	Greg Thelen <gthelen@google.com>, Hugh Dickins <hughd@google.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Fengguang Wu <fengguang.wu@intel.com>
Subject: Re: [PATCH -mm] memcg: prevent from OOM with too many dirty pages
Date: Tue, 19 Jun 2012 15:00:14 -0700	[thread overview]
Message-ID: <20120619150014.1ebc108c.akpm@linux-foundation.org> (raw)
In-Reply-To: <1340117404-30348-1-git-send-email-mhocko@suse.cz>

On Tue, 19 Jun 2012 16:50:04 +0200
Michal Hocko <mhocko@suse.cz> wrote:

> Current implementation of dirty pages throttling is not memcg aware which makes
> it easy to have LRUs full of dirty pages which might lead to memcg OOM if the
> hard limit is small and so the lists are scanned faster than pages written
> back.

This is a bit hard to parse.  I changed it to

: The current implementation of dirty pages throttling is not memcg aware
: which makes it easy to have memcg LRUs full of dirty pages.  Without
: throttling, these LRUs can be scanned faster than the rate of writeback,
: leading to memcg OOM conditions when the hard limit is small.

does that still say what you meant to say?

> The solution is far from being ideal - long term solution is memcg aware
> dirty throttling - but it is meant to be a band aid until we have a real
> fix.

Fair enough I guess.  The fix is small and simple and if it makes the
kernel better, why not?

Would like to see a few more acks though.  Why hasn't everyone been
hitting this?

> We are seeing this happening during nightly backups which are placed into
> containers to prevent from eviction of the real working set.

Well that's a trick which we want to work well.  It's a killer
featurelet for people who wonder what all this memcg crap is for ;)

> The change affects only memcg reclaim and only when we encounter PageReclaim
> pages which is a signal that the reclaim doesn't catch up on with the writers
> so somebody should be throttled. This could be potentially unfair because it
> could be somebody else from the group who gets throttled on behalf of the
> writer but as writers need to allocate as well and they allocate in higher rate
> the probability that only innocent processes would be penalized is not that
> high.

OK.

> ...
>
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -720,9 +720,20 @@ static unsigned long shrink_page_list(struct list_head *page_list,
>  			(PageSwapCache(page) && (sc->gfp_mask & __GFP_IO));
>  
>  		if (PageWriteback(page)) {
> -			nr_writeback++;
> -			unlock_page(page);
> -			goto keep;
> +			/*
> +			 * memcg doesn't have any dirty pages throttling so we
> +			 * could easily OOM just because too many pages are in
> +			 * writeback from reclaim and there is nothing else to
> +			 * reclaim.
> +			 */
> +			if (PageReclaim(page)
> +					&& may_enter_fs && !global_reclaim(sc))
> +				wait_on_page_writeback(page);
> +			else {
> +				nr_writeback++;
> +				unlock_page(page);
> +				goto keep;
> +			}

A couple of things here.

With my gcc and CONFIG_CGROUP_MEM_RES_CTLR=n (for gawd's sake can we
please rename this to CONFIG_MEMCG?), this:

--- a/mm/vmscan.c~memcg-prevent-from-oom-with-too-many-dirty-pages-fix
+++ a/mm/vmscan.c
@@ -726,8 +726,8 @@ static unsigned long shrink_page_list(st
 			 * writeback from reclaim and there is nothing else to
 			 * reclaim.
 			 */
-			if (PageReclaim(page)
-					&& may_enter_fs && !global_reclaim(sc))
+			if (!global_reclaim(sc) && PageReclaim(page) &&
+					may_enter_fs)
 				wait_on_page_writeback(page);
 			else {
 				nr_writeback++;


reduces vmscan.o's .text by 48 bytes(!).  Because the compiler can
avoid generating any code for PageReclaim() and perhaps the
may_enter_fs test.  Because global_reclaim() evaluates to constant
true.  Do you think that's an improvement?

Also, why do we test may_enter_fs here?  I should have been able to
work out your reasoning from either code comments or changelogging but
I cannot (bad).  I don't *think* there's a deadlock issue here?  If the
page is now under writeback, that writeback *will* complete?

Finally, I wonder if there should be some timeout of that wait.  I
don't know why, but I wouldn't be surprised if we hit some glitch which
causes us to add one!



WARNING: multiple messages have this Message-ID (diff)
From: Andrew Morton <akpm@linux-foundation.org>
To: Michal Hocko <mhocko@suse.cz>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujtisu.com>,
	Mel Gorman <mgorman@suse.de>, Minchan Kim <minchan@kernel.org>,
	Rik van Riel <riel@redhat.com>, Ying Han <yinghan@google.com>,
	Greg Thelen <gthelen@google.com>, Hugh Dickins <hughd@google.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Fengguang Wu <fengguang.wu@intel.com>
Subject: Re: [PATCH -mm] memcg: prevent from OOM with too many dirty pages
Date: Tue, 19 Jun 2012 15:00:14 -0700	[thread overview]
Message-ID: <20120619150014.1ebc108c.akpm@linux-foundation.org> (raw)
In-Reply-To: <1340117404-30348-1-git-send-email-mhocko@suse.cz>

On Tue, 19 Jun 2012 16:50:04 +0200
Michal Hocko <mhocko@suse.cz> wrote:

> Current implementation of dirty pages throttling is not memcg aware which makes
> it easy to have LRUs full of dirty pages which might lead to memcg OOM if the
> hard limit is small and so the lists are scanned faster than pages written
> back.

This is a bit hard to parse.  I changed it to

: The current implementation of dirty pages throttling is not memcg aware
: which makes it easy to have memcg LRUs full of dirty pages.  Without
: throttling, these LRUs can be scanned faster than the rate of writeback,
: leading to memcg OOM conditions when the hard limit is small.

does that still say what you meant to say?

> The solution is far from being ideal - long term solution is memcg aware
> dirty throttling - but it is meant to be a band aid until we have a real
> fix.

Fair enough I guess.  The fix is small and simple and if it makes the
kernel better, why not?

Would like to see a few more acks though.  Why hasn't everyone been
hitting this?

> We are seeing this happening during nightly backups which are placed into
> containers to prevent from eviction of the real working set.

Well that's a trick which we want to work well.  It's a killer
featurelet for people who wonder what all this memcg crap is for ;)

> The change affects only memcg reclaim and only when we encounter PageReclaim
> pages which is a signal that the reclaim doesn't catch up on with the writers
> so somebody should be throttled. This could be potentially unfair because it
> could be somebody else from the group who gets throttled on behalf of the
> writer but as writers need to allocate as well and they allocate in higher rate
> the probability that only innocent processes would be penalized is not that
> high.

OK.

> ...
>
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -720,9 +720,20 @@ static unsigned long shrink_page_list(struct list_head *page_list,
>  			(PageSwapCache(page) && (sc->gfp_mask & __GFP_IO));
>  
>  		if (PageWriteback(page)) {
> -			nr_writeback++;
> -			unlock_page(page);
> -			goto keep;
> +			/*
> +			 * memcg doesn't have any dirty pages throttling so we
> +			 * could easily OOM just because too many pages are in
> +			 * writeback from reclaim and there is nothing else to
> +			 * reclaim.
> +			 */
> +			if (PageReclaim(page)
> +					&& may_enter_fs && !global_reclaim(sc))
> +				wait_on_page_writeback(page);
> +			else {
> +				nr_writeback++;
> +				unlock_page(page);
> +				goto keep;
> +			}

A couple of things here.

With my gcc and CONFIG_CGROUP_MEM_RES_CTLR=n (for gawd's sake can we
please rename this to CONFIG_MEMCG?), this:

--- a/mm/vmscan.c~memcg-prevent-from-oom-with-too-many-dirty-pages-fix
+++ a/mm/vmscan.c
@@ -726,8 +726,8 @@ static unsigned long shrink_page_list(st
 			 * writeback from reclaim and there is nothing else to
 			 * reclaim.
 			 */
-			if (PageReclaim(page)
-					&& may_enter_fs && !global_reclaim(sc))
+			if (!global_reclaim(sc) && PageReclaim(page) &&
+					may_enter_fs)
 				wait_on_page_writeback(page);
 			else {
 				nr_writeback++;


reduces vmscan.o's .text by 48 bytes(!).  Because the compiler can
avoid generating any code for PageReclaim() and perhaps the
may_enter_fs test.  Because global_reclaim() evaluates to constant
true.  Do you think that's an improvement?

Also, why do we test may_enter_fs here?  I should have been able to
work out your reasoning from either code comments or changelogging but
I cannot (bad).  I don't *think* there's a deadlock issue here?  If the
page is now under writeback, that writeback *will* complete?

Finally, I wonder if there should be some timeout of that wait.  I
don't know why, but I wouldn't be surprised if we hit some glitch which
causes us to add one!


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2012-06-19 22:00 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-06-19 14:50 [PATCH -mm] memcg: prevent from OOM with too many dirty pages Michal Hocko
2012-06-19 14:50 ` Michal Hocko
2012-06-19 22:00 ` Andrew Morton [this message]
2012-06-19 22:00   ` Andrew Morton
2012-06-20  8:27   ` Michal Hocko
2012-06-20  8:27     ` Michal Hocko
2012-06-20  9:20   ` Mel Gorman
2012-06-20  9:20     ` Mel Gorman
2012-06-20  9:55     ` Fengguang Wu
2012-06-20  9:55       ` Fengguang Wu
2012-06-20  9:59     ` Michal Hocko
2012-06-20  9:59       ` Michal Hocko
2012-06-20 10:11   ` [PATCH v2 " Michal Hocko
2012-06-20 10:11     ` Michal Hocko
2012-07-12  1:57     ` Hugh Dickins
2012-07-12  1:57       ` Hugh Dickins
2012-07-12  2:21       ` Andrew Morton
2012-07-12  2:21         ` Andrew Morton
2012-07-12  3:13         ` Hugh Dickins
2012-07-12  3:13           ` Hugh Dickins
2012-07-12  7:05       ` Michal Hocko
2012-07-12  7:05         ` Michal Hocko
2012-07-12 21:13         ` Andrew Morton
2012-07-12 21:13           ` Andrew Morton
2012-07-12 22:42           ` Hugh Dickins
2012-07-12 22:42             ` Hugh Dickins
2012-07-13  8:21             ` Michal Hocko
2012-07-13  8:21               ` Michal Hocko
2012-07-16  8:30               ` Hugh Dickins
2012-07-16  8:30                 ` Hugh Dickins
2012-07-16  8:35                 ` [PATCH mmotm] memcg: further prevent " Hugh Dickins
2012-07-16  8:35                   ` Hugh Dickins
2012-07-16  9:26                   ` Michal Hocko
2012-07-16  9:26                     ` Michal Hocko
2012-07-17  4:52                     ` Hugh Dickins
2012-07-17  4:52                       ` Hugh Dickins
2012-07-17  6:33                       ` Michal Hocko
2012-07-17  6:33                         ` Michal Hocko
2012-07-16 21:08                   ` Andrew Morton
2012-07-16 21:08                     ` Andrew Morton
2012-07-16  8:10         ` [PATCH v2 -mm] memcg: prevent from " Hugh Dickins
2012-07-16  8:10           ` Hugh Dickins
2012-07-16  8:48           ` Michal Hocko
2012-07-16  8:48             ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120619150014.1ebc108c.akpm@linux-foundation.org \
    --to=akpm@linux-foundation.org \
    --cc=fengguang.wu@intel.com \
    --cc=gthelen@google.com \
    --cc=hannes@cmpxchg.org \
    --cc=hughd@google.com \
    --cc=kamezawa.hiroyu@jp.fujtisu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=mhocko@suse.cz \
    --cc=minchan@kernel.org \
    --cc=riel@redhat.com \
    --cc=yinghan@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.