All of lore.kernel.org
 help / color / mirror / Atom feed
From: Wu Fengguang <fengguang.wu@intel.com>
To: Greg Thelen <gthelen@google.com>
Cc: Jan Kara <jack@suse.cz>,
	"bsingharora@gmail.com" <bsingharora@gmail.com>,
	Hugh Dickins <hughd@google.com>, Michal Hocko <mhocko@suse.cz>,
	linux-mm@kvack.org, Mel Gorman <mgorman@suse.de>,
	Ying Han <yinghan@google.com>,
	"hannes@cmpxchg.org" <hannes@cmpxchg.org>,
	lsf-pc@lists.linux-foundation.org,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Subject: Re: memcg writeback (was Re: [Lsf-pc] [LSF/MM TOPIC] memcg topics.)
Date: Wed, 8 Feb 2012 17:31:20 +0800	[thread overview]
Message-ID: <20120208093120.GA18993@localhost> (raw)
In-Reply-To: <CAHH2K0b-+T4dspJPKq5TH25aH58TEr+7yvq0-HMkbFi0ghqAfA@mail.gmail.com>

On Tue, Feb 07, 2012 at 11:55:05PM -0800, Greg Thelen wrote:
> On Fri, Feb 3, 2012 at 1:40 AM, Wu Fengguang <fengguang.wu@intel.com> wrote:
> > If moving dirty pages out of the memcg to the 20% global dirty pages
> > pool on page reclaim, the above OOM can be avoided. It does change the
> > meaning of memory.limit_in_bytes in that the memcg tasks can now
> > actually consume more pages (up to the shared global 20% dirty limit).
> 
> This seems like an easy change, but unfortunately the global 20% pool
> has some shortcomings for my needs:
> 
> 1. the global 20% pool is not moderated.  One cgroup can dominate it
>     and deny service to other cgroups.

It is moderated by balance_dirty_pages() -- in terms of dirty ratelimit.
And you have the freedom to control the bandwidth allocation with some
async write I/O controller.

Even though there is no direct control of dirty pages, we can roughly
get it as the side effect of rate control. Given

        ratelimit_cgroup_A = 2 * ratelimit_cgroup_B

There will naturally be more dirty pages for cgroup A to be worked by
the flusher. And the dirty pages will be roughly balanced around

        nr_dirty_cgroup_A = 2 * nr_dirty_cgroup_B

when writeout bandwidths for their dirty pages are equal.

> 2. the global 20% pool is free, unaccounted memory.  Ideally cgroups only
>     use the amount of memory specified in their memory.limit_in_bytes.  The
>     goal is to sell portions of a system.  Global resource like the 20% are an
>     undesirable system-wide tax that's shared by jobs that may not even
>     perform buffered writes.

Right, it is the shortcoming.

> 3. Setting aside 20% extra memory for system wide dirty buffers is a lot of
>     memory.  This becomes a larger issue when the global dirty_ratio is
>     higher than 20%.

Yeah the global pool scheme does mean that you'd better allocate at
most 80% memory to individual memory cgroups, otherwise it's possible
for a tiny memcg doing dd writes to push dirty pages to global LRU and
*squeeze* the size of other memcgs.

However I guess it should be mitigated by the fact that

- we typically already reserve some space for the root memcg

- 20% dirty ratio is mostly an overkill for large memory systems.
  It's often enough to hold 10-30s worth of dirty data for them, which
  is 1-3GB for one 100MB/s disk. This is the reason vm.dirty_bytes is
  introduced: someone wants to do some <1% dirty ratio.

Thanks,
Fengguang

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2012-02-08  9:41 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-02-08  7:55 memcg writeback (was Re: [Lsf-pc] [LSF/MM TOPIC] memcg topics.) Greg Thelen
2012-02-08  9:31 ` Wu Fengguang [this message]
2012-02-08 20:54   ` Ying Han
2012-02-09 13:50     ` Wu Fengguang
2012-02-13 18:40       ` Ying Han
2012-02-10  5:51   ` Greg Thelen
2012-02-10  5:52     ` Greg Thelen
2012-02-10  9:20       ` Wu Fengguang
2012-02-10 11:47     ` Wu Fengguang
2012-02-11 12:44       ` reclaim the LRU lists full of dirty/writeback pages Wu Fengguang
2012-02-11 14:55         ` Rik van Riel
2012-02-12  3:10           ` Wu Fengguang
2012-02-12  6:45             ` Wu Fengguang
2012-02-13 15:43             ` Jan Kara
2012-02-14 10:03               ` Wu Fengguang
2012-02-14 13:29                 ` Jan Kara
2012-02-16  4:00                   ` Wu Fengguang
2012-02-16 12:44                     ` Jan Kara
2012-02-16 13:32                       ` Wu Fengguang
2012-02-16 14:06                         ` Wu Fengguang
2012-02-17 16:41                     ` Wu Fengguang
2012-02-20 14:00                       ` Jan Kara
2012-02-14 10:19         ` Mel Gorman
2012-02-14 13:18           ` Wu Fengguang
2012-02-14 13:35             ` Wu Fengguang
2012-02-14 15:51             ` Mel Gorman
2012-02-16  9:50               ` Wu Fengguang
2012-02-16 17:31                 ` Mel Gorman
2012-02-27 14:24                   ` Fengguang Wu
2012-02-16  0:00             ` KAMEZAWA Hiroyuki
2012-02-16  3:04               ` Wu Fengguang
2012-02-16  3:52                 ` KAMEZAWA Hiroyuki
2012-02-16  4:05                   ` Wu Fengguang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120208093120.GA18993@localhost \
    --to=fengguang.wu@intel.com \
    --cc=bsingharora@gmail.com \
    --cc=gthelen@google.com \
    --cc=hannes@cmpxchg.org \
    --cc=hughd@google.com \
    --cc=jack@suse.cz \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=linux-mm@kvack.org \
    --cc=lsf-pc@lists.linux-foundation.org \
    --cc=mgorman@suse.de \
    --cc=mhocko@suse.cz \
    --cc=yinghan@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.