From: Michal Hocko <mhocko@suse.cz>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
	Huang Ying <ying.huang@intel.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Dave Chinner <david@fromorbit.com>,
	"Theodore Ts'o" <tytso@mit.edu>
Subject: Re: [patch 09/12] mm: page_alloc: private memory reserves for OOM-killing allocations
Date: Tue, 14 Apr 2015 18:49:40 +0200
Message-ID: <20150414164939.GJ17160@dhcp22.suse.cz>
In-Reply-To: <1427264236-17249-10-git-send-email-hannes@cmpxchg.org>

[Sorry for the late reply]

On Wed 25-03-15 02:17:13, Johannes Weiner wrote:
> The OOM killer connects random tasks in the system with unknown
> dependencies between them, and the OOM victim might well get blocked
> behind the task that is trying to allocate.  That means that while
> allocations can issue OOM kills to improve the low memory situation,
> which generally frees more than they are going to take out, they can
> not rely on their *own* OOM kills to make forward progress for them.
> 
> Secondly, we want to avoid a racing allocation swooping in to steal
> the work of the OOM killing allocation, causing spurious allocation
> failures.  The one that put in the work must have priority - if its
> efforts are enough to serve both allocations that's fine, otherwise
> concurrent allocations should be forced to issue their own OOM kills.
> 
> Keep some pages below the min watermark reserved for OOM-killing
> allocations to protect them from blocking victims and concurrent
> allocations not pulling their weight.

Yes, this makes a lot of sense. I am just not sure I am happy about some
details.

[...]
> @@ -3274,6 +3290,7 @@ void show_free_areas(unsigned int filter)
>  		show_node(zone);
>  		printk("%s"
>  			" free:%lukB"
> +			" oom:%lukB"
>  			" min:%lukB"
>  			" low:%lukB"
>  			" high:%lukB"
> @@ -3306,6 +3323,7 @@ void show_free_areas(unsigned int filter)
>  			"\n",
>  			zone->name,
>  			K(zone_page_state(zone, NR_FREE_PAGES)),
> +			K(oom_wmark_pages(zone)),
>  			K(min_wmark_pages(zone)),
>  			K(low_wmark_pages(zone)),
>  			K(high_wmark_pages(zone)),

Do we really need to export the new watermark to userspace?
How would it help a user or admin? OK, maybe show_free_areas could be
helpful for OOM reports, but why export it in /proc/zoneinfo, which is
done further down?

> @@ -5747,17 +5765,18 @@ static void __setup_per_zone_wmarks(void)
>  
>  			min_pages = zone->managed_pages / 1024;
>  			min_pages = clamp(min_pages, SWAP_CLUSTER_MAX, 128UL);
> -			zone->watermark[WMARK_MIN] = min_pages;
> +			zone->watermark[WMARK_OOM] = min_pages;
>  		} else {
>  			/*
>  			 * If it's a lowmem zone, reserve a number of pages
>  			 * proportionate to the zone's size.
>  			 */
> -			zone->watermark[WMARK_MIN] = tmp;
> +			zone->watermark[WMARK_OOM] = tmp;
>  		}
>  
> -		zone->watermark[WMARK_LOW]  = min_wmark_pages(zone) + (tmp >> 2);
> -		zone->watermark[WMARK_HIGH] = min_wmark_pages(zone) + (tmp >> 1);
> +		zone->watermark[WMARK_MIN]  = oom_wmark_pages(zone) + (tmp >> 3);
> +		zone->watermark[WMARK_LOW]  = oom_wmark_pages(zone) + (tmp >> 2);
> +		zone->watermark[WMARK_HIGH] = oom_wmark_pages(zone) + (tmp >> 1);

This will basically elevate the min watermark, right? And that might lead
to subtle performance differences even when the OOM killer is not invoked,
because direct reclaim will start sooner.
Shouldn't we rather give WMARK_OOM half of WMARK_MIN instead?

>  
>  		__mod_zone_page_state(zone, NR_ALLOC_BATCH,
>  			high_wmark_pages(zone) - low_wmark_pages(zone) -
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 1fd0886a389f..a62f16ef524c 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -1188,6 +1188,7 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
>  	seq_printf(m, "Node %d, zone %8s", pgdat->node_id, zone->name);
>  	seq_printf(m,
>  		   "\n  pages free     %lu"
> +		   "\n        oom      %lu"
>  		   "\n        min      %lu"
>  		   "\n        low      %lu"
>  		   "\n        high     %lu"
> @@ -1196,6 +1197,7 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
>  		   "\n        present  %lu"
>  		   "\n        managed  %lu",
>  		   zone_page_state(zone, NR_FREE_PAGES),
> +		   oom_wmark_pages(zone),
>  		   min_wmark_pages(zone),
>  		   low_wmark_pages(zone),
>  		   high_wmark_pages(zone),
> -- 
> 2.3.3
> 

-- 
Michal Hocko
SUSE Labs
