From: Minchan Kim <minchan.kim@gmail.com>
To: Satoru Moriya <satoru.moriya@hds.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Rik van Riel <riel@redhat.com>,
	Randy Dunlap <rdunlap@xenotime.net>,
	Satoru Moriya <smoriya@redhat.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"lwoodman@redhat.com" <lwoodman@redhat.com>,
	Seiji Aguchi <saguchi@redhat.com>,
	"hughd@google.com" <hughd@google.com>,
	"hannes@cmpxchg.org" <hannes@cmpxchg.org>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	David Rientjes <rientjes@google.com>
Subject: Re: [PATCH -v2 -mm] add extra free kbytes tunable
Date: Thu, 13 Oct 2011 16:33:21 +0900
Message-ID: <20111013073321.GA2784@barrios-desktop>
In-Reply-To: <65795E11DBF1E645A09CEC7EAEE94B9CAFB42677@USINDEVS02.corp.hds.com>

On Fri, Sep 02, 2011 at 12:31:14PM -0400, Satoru Moriya wrote:
> On 09/01/2011 05:58 PM, Andrew Morton wrote:
> > On Thu, 1 Sep 2011 15:26:50 -0400
> > Rik van Riel <riel@redhat.com> wrote:
> > 
> >> Add a userspace visible knob
> > 
> > argh.  Fear and hostility at new knobs which need to be maintained for 
> > ever, even if the underlying implementation changes.
> > 
> > Unfortunately, this one makes sense.
> > 
> >> to tell the VM to keep an extra amount of memory free, by increasing 
> >> the gap between each zone's min and low watermarks.
> >>
> >> This is useful for realtime applications that call system calls and 
> >> have a bound on the number of allocations that happen in any short 
> >> time period.  In this application, extra_free_kbytes would be left at 
> >> an amount equal to or larger than the maximum number of 
> >> allocations that happen in any burst.
> > 
> > _is_ it useful?  Proof?
> > 
> > Who is requesting this?  Have they tested it?  Results?
> 
> This is interesting to me.
> 
> Some of our customers have realtime applications and they are concerned 
> about the fact that Linux uses free memory as pagecache. It means that
> when their application allocates memory, the Linux kernel first tries to
> reclaim memory and then allocates it. This may increase memory allocation
> latency.
> 
> In many cases this is not a big issue because Linux has kswapd for
> background reclaim, and it is fast enough to avoid entering the direct
> reclaim path if there is a lot of clean cache. But in some situations,
> e.g. when an application allocates, in a short time, more memory than the
> delta between watermark_low and watermark_min and kswapd
> can't reclaim fast enough due to dirty page reclaim, direct reclaim
> is executed and causes big latency.
> 
> We can avoid the issue above by using preallocation and mlock.
> But that can't cover kmalloc used in system calls. So I'd like to use
> this patch together with mlock to keep memory allocation latency as
> low as possible. It may not be a perfect solution, but it is important
> for customers in the enterprise area to be able to configure the amount
> of free memory at their own risk.

I agree with the need for such a feature, but I don't like exporting such a
primitive interface to users.

As Satoru said, we can reserve free pages for user space through preallocation and mlocking.
The problem is free pages for the kernel itself.
The most desirable thing is to avoid syscalls in the critical realtime section.
But if we can't avoid them, my crazy idea is to use memcg for kernel pages.
Of course, we would have to implement it and it is not simple, but AFAIK the memcg
people have always had it in mind and will do it eventually. :)
Recently, Glauber tried "Basic kernel memory functionality", but I haven't reviewed
it yet. I am not sure we can reuse it, anyway. Kame?
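
To make the user-space half concrete, here is a minimal sketch of
preallocation plus mlock in C. The helper names rt_prealloc() and
rt_release() are hypothetical, made up for illustration only. mlock() can
fail when RLIMIT_MEMLOCK is too low, so the sketch reports that instead of
treating it as fatal.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: allocate `size` bytes, fault every page in now,
 * and try to pin the range in RAM. Returns NULL if allocation fails.
 * *locked is set to 1 if mlock() succeeded and 0 otherwise.
 */
static void *rt_prealloc(size_t size, int *locked)
{
    long page = sysconf(_SC_PAGESIZE);
    void *buf = NULL;

    assert(locked != NULL);
    if (page <= 0 || size == 0)
        return NULL;

    /* Page-aligned so the locked range starts on a page boundary. */
    if (posix_memalign(&buf, (size_t)page, size) != 0)
        return NULL;

    /* Writing to each page makes the kernel back it with real memory
     * before the realtime section starts. */
    memset(buf, 0, size);

    /* Pin the pages so they cannot be swapped out later. */
    *locked = (mlock(buf, size) == 0);
    return buf;
}

/* Hypothetical helper: undo rt_prealloc(). */
static void rt_release(void *buf, size_t size, int locked)
{
    if (locked)
        munlock(buf, size);
    free(buf);
}
```

This only protects the application's own buffers. Pages the kernel
allocates on the task's behalf inside a syscall are not covered, which is
the gap discussed here.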

My simple idea is as follows.

We can assign a basic reserved page pool and/or a page pool of user-determined size
to each task registered at memcg-slab.
The application has to notify memcg of the start of the RT section before it enters
the RT section, so that memcg can fill up the page pool if it is short. In this case,
the application can get stuck, but that's okay as it hasn't entered the RT section yet.
The application has to notify memcg of the end of the RT section, too, so that memcg
can try to refill the reserved page pool in case of shortage.
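
A rough user-space model of that protocol, in C, might look like the
following. Everything in it is an assumption made for illustration: the
struct rt_pool type, the rt_begin()/rt_alloc_page()/rt_end() names, and the
plain counter standing in for the system's free pages. None of it is an
existing memcg interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a per-task reserved page pool. */
struct rt_pool {
    size_t reserved;   /* pages the pool should hold before an RT section */
    size_t available;  /* pages currently in the pool */
    bool in_rt;        /* true between rt_begin() and rt_end() */
};

/* Stand-in for reclaim: takes pages from a free-page counter. In a real
 * system this is the step that may block. Returns the pages obtained. */
static size_t refill(struct rt_pool *p, size_t *system_free)
{
    size_t want = p->reserved - p->available;
    size_t got = want < *system_free ? want : *system_free;

    *system_free -= got;
    p->available += got;
    return got;
}

/* Notify start of the RT section: fill the pool first. Stalling here is
 * acceptable. Returns false if the pool could not be filled, in which
 * case the task has not entered the RT section. */
static bool rt_begin(struct rt_pool *p, size_t *system_free)
{
    assert(!p->in_rt);
    refill(p, system_free);
    if (p->available < p->reserved)
        return false;
    p->in_rt = true;
    return true;
}

/* Allocation inside the RT section: served from the pool only, so it
 * never waits for reclaim. */
static bool rt_alloc_page(struct rt_pool *p)
{
    assert(p->in_rt);
    if (p->available == 0)
        return false;
    p->available--;
    return true;
}

/* Notify end of the RT section: leave it, then top the pool up again. */
static void rt_end(struct rt_pool *p, size_t *system_free)
{
    assert(p->in_rt);
    p->in_rt = false;
    refill(p, system_free);
}
```

The point of the model is that the only places that may wait for pages are
rt_begin() and rt_end(), both outside the RT section.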

The reason we need such notification is that a high-priority kswapd, a new knob, and
other approaches can never meet the application's deadline requirement in some
situations (e.g., there are many dirty pages in the LRU, or anon pages fill up memory
on a system without swap), so the application might end up stuck at some point.
That point must be outside the RT section of the task.

For the implementation, we might need a new watermark setting for each memcg and/or
something like kswapd priority promotion to hurry reclaim.
Anyway, those are just implementation details, and we could enhance them or add more
through various techniques as time goes by.

Personally, I think it could be a valuable feature.

-- 
Kind regards,
Minchan Kim
