From: David Rientjes <rientjes@google.com>
To: Satoru Moriya <satoru.moriya@hds.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Rik van Riel <riel@redhat.com>,
	Randy Dunlap <rdunlap@xenotime.net>,
	Satoru Moriya <smoriya@redhat.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"lwoodman@redhat.com" <lwoodman@redhat.com>,
	Seiji Aguchi <saguchi@redhat.com>,
	Hugh Dickins <hughd@google.com>,
	"hannes@cmpxchg.org" <hannes@cmpxchg.org>
Subject: RE: [PATCH -v2 -mm] add extra free kbytes tunable
Date: Fri, 14 Oct 2011 15:46:36 -0700 (PDT)
Message-ID: <alpine.DEB.2.00.1110141536520.21305@chino.kir.corp.google.com>
In-Reply-To: <65795E11DBF1E645A09CEC7EAEE94B9CB4F747AC@USINDEVS02.corp.hds.com>

On Fri, 14 Oct 2011, Satoru Moriya wrote:

> > Satoru was specifically talking about the VM using free memory for 
> > pagecache,
> 
> Yes, because we can't stop pagecache from growing, and it
> occupies RAM that some people want to keep free for bursty
> memory requirements. Usually it works fine, but sometimes, as in
> my test case, it doesn't work well.
> 
> > so doing echo 1 > /proc/sys/vm/drop_caches can mitigate 
> > that almost immediately.  
> 
> I know that, and some admins use that kind of tuning. But is it
> the proper way? Should we run a script like the one above periodically?
> I believe that we should use it for debugging only.
> 

Agreed, this was in response to the suggestion for adding a mem_shrink() 
syscall, which would require the same periodic calls or knowledge of the 
application prior to the bursty memory allocations.  I bring up 
drop_caches just to illustrate that it is effectively the same thing for 
the entire address space when pressured by pagecache.  So I don't think 
that syscall would actually help for your scenario.

> > If there were a change to increase the space significantly between the 
> > high and min watermark when min_free_kbytes changes, that would fix the 
> > problem. 
> 
> Right. But min_free_kbytes changes both thresholds, foreground reclaim
> and background reclaim. I'd like to configure them separately, like
> dirty_bytes and dirty_background_bytes, for flexibility.
> 

The point I'm trying to make is that if kswapd can be made aware that it 
was kicked by a rt_task() in the page allocator, the same criteria we use 
for ALLOC_HARDER today, or a rt_task() subsequently enters the page 
allocator slowpath while kswapd is running, then not only can we increase 
the scheduling priority of kswapd but it is also possible to reclaim above 
the high watermark for an extra bonus.  I believe we can find a sane 
middle ground that requires no userspace tunable where a _single_ realtime 
application cannot allocate memory faster than kswapd with very high 
priority and reclaiming above the high watermark, whether that's a factor 
of 1.25 or not.
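
The idea above can be sketched as a change to the target kswapd reclaims 
to.  This is only a minimal illustration, not code from any patch: the 
struct, the function name, and the rt_waiter flag are hypothetical, and the 
1.25 factor is just the example figure mentioned above.

```c
#include <stdbool.h>

/* Hypothetical per-zone watermarks, in pages. */
struct zone_marks {
	unsigned long min;
	unsigned long low;
	unsigned long high;
};

/*
 * Number of free pages kswapd should reach before going back to sleep.
 * rt_waiter is an assumed flag: true if kswapd was woken by a realtime
 * task, or a realtime task entered the allocator slowpath while kswapd
 * was already running.
 */
static unsigned long kswapd_reclaim_target(const struct zone_marks *z,
					   bool rt_waiter)
{
	if (!rt_waiter)
		return z->high;		/* normal case: stop at high */

	/* Illustrative bonus: reclaim to 1.25 * high (high + high/4). */
	return z->high + z->high / 4;
}
```

With a high watermark of 1500 pages, the target stays at 1500 for ordinary 
allocators and rises to 1875 when a realtime allocator is involved, with no 
userspace tunable.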

> > The problem is two-fold: that comes at a penalty for systems 
> > or workloads that don't need to reclaim the additional memory, and it's 
> > not clear how much space should exist between those watermarks.
> 
> The required size depends on the system architecture, such as the kernel,
> applications, storage etc., and so an admin who cares about the whole
> system should configure it based on tests, at his own risk.
> 

Doing that comes at a penalty for other workloads that are running on the 
same system, which is the problem with a global tunable that doesn't 
discriminate on an allocator's priority (the min -> high watermarks for 
reclaim do well except for rt-threads, as evidenced by this thread).
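
Why a single global knob moves every threshold for every allocator can be 
sketched as below.  This is a simplified illustration that assumes the 
derivation used by kernels of this era, where low and high are fixed 
offsets from min (min + min/4 and min + min/2); the per-zone split of 
min_free_kbytes and highmem handling are omitted, and the struct and 
function names are hypothetical.

```c
/* Watermarks in pages, all derived from one "min" value. */
struct wmarks {
	unsigned long min;
	unsigned long low;
	unsigned long high;
};

static struct wmarks wmarks_from_min(unsigned long min_pages)
{
	struct wmarks w;

	w.min  = min_pages;
	w.low  = min_pages + (min_pages >> 2);	/* min * 1.25 */
	w.high = min_pages + (min_pages >> 1);	/* min * 1.5  */
	return w;
}
```

Doubling min_pages doubles the min -> high gap, but it also raises the 
point at which every allocator, realtime or not, falls into direct reclaim.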
