From: Ying Han <yinghan@google.com>
To: Wu Fengguang <fengguang.wu@intel.com>
Cc: Greg Thelen <gthelen@google.com>, Jan Kara <jack@suse.cz>,
	"bsingharora@gmail.com" <bsingharora@gmail.com>,
	Hugh Dickins <hughd@google.com>, Michal Hocko <mhocko@suse.cz>,
	linux-mm@kvack.org, Mel Gorman <mgorman@suse.de>,
	"hannes@cmpxchg.org" <hannes@cmpxchg.org>,
	lsf-pc@lists.linux-foundation.org,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Subject: Re: memcg writeback (was Re: [Lsf-pc] [LSF/MM TOPIC] memcg topics.)
Date: Wed, 8 Feb 2012 12:54:33 -0800
Message-ID: <CALWz4izTS_E3uHLLfq3c9=LCuEh_yykmfrRAv4G1gUHumzGDzQ@mail.gmail.com>
In-Reply-To: <20120208093120.GA18993@localhost>

On Wed, Feb 8, 2012 at 1:31 AM, Wu Fengguang <fengguang.wu@intel.com> wrote:
> On Tue, Feb 07, 2012 at 11:55:05PM -0800, Greg Thelen wrote:
>> On Fri, Feb 3, 2012 at 1:40 AM, Wu Fengguang <fengguang.wu@intel.com> wrote:
>> > If dirty pages are moved out of the memcg to the 20% global dirty pages
>> > pool on page reclaim, the above OOM can be avoided. It does change the
>> > meaning of memory.limit_in_bytes in that the memcg tasks can now
>> > actually consume more pages (up to the shared global 20% dirty limit).
>>
>> This seems like an easy change, but unfortunately the global 20% pool
>> has some shortcomings for my needs:
>>
>> 1. the global 20% pool is not moderated.  One cgroup can dominate it
>>     and deny service to other cgroups.
>
> It is moderated by balance_dirty_pages() -- in terms of dirty ratelimit.
> And you have the freedom to control the bandwidth allocation with some
> async write I/O controller.
>
> Even though there is no direct control of dirty pages, we can roughly
> get it as the side effect of rate control. Given
>
>        ratelimit_cgroup_A = 2 * ratelimit_cgroup_B
>
> There will naturally be more dirty pages for cgroup A to be worked by
> the flusher. And the dirty pages will be roughly balanced around
>
>        nr_dirty_cgroup_A = 2 * nr_dirty_cgroup_B
>
> when writeout bandwidths for their dirty pages are equal.
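
To make the claimed balance concrete, here is a toy model, not kernel
code. It assumes (my assumptions, not stated above) that the flusher
writes back at a fixed total bandwidth equal to the sum of the two dirty
ratelimits, and that each dirty page is equally likely to be written, so
a cgroup's share of writeout is proportional to its share of dirty
pages. All names and numbers are hypothetical.

```python
def simulate_dirty_balance(rate_a, rate_b, total_dirty, seconds, dt=0.1):
    """Euler-step a two-cgroup dirty page model and return (n_a, n_b).

    rate_a, rate_b: dirtying rates (pages/s) allowed by the ratelimits.
    total_dirty:    initial number of dirty pages, split evenly.
    Assumption: writeout bandwidth equals rate_a + rate_b and is shared
    in proportion to each cgroup's current dirty page count.
    """
    n_a = n_b = total_dirty / 2.0
    writeout = rate_a + rate_b
    for _ in range(int(seconds / dt)):
        total = n_a + n_b
        # Each cgroup gains pages at its ratelimit and loses them at its
        # proportional share of the flusher bandwidth.
        n_a += (rate_a - writeout * n_a / total) * dt
        n_b += (rate_b - writeout * n_b / total) * dt
    return n_a, n_b


# ratelimit_cgroup_A = 2 * ratelimit_cgroup_B
n_a, n_b = simulate_dirty_balance(20.0, 10.0, 1000.0, 1000.0)
print("nr_dirty_A / nr_dirty_B =", n_a / n_b)
```

Under these assumptions the ratio of dirty pages settles near the ratio
of the ratelimits. A flusher that does not treat pages equally would
give a different balance.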
>
>> 2. the global 20% pool is free, unaccounted memory.  Ideally cgroups only
>>     use the amount of memory specified in their memory.limit_in_bytes.  The
>>     goal is to sell portions of a system.  Global resources like the 20% pool are an
>>     undesirable system-wide tax that's shared by jobs that may not even
>>     perform buffered writes.
>
> Right, that is a shortcoming.
>
>> 3. Setting aside 20% extra memory for system wide dirty buffers is a lot of
>>     memory.  This becomes a larger issue when the global dirty_ratio is
>>     higher than 20%.
>
> Yeah the global pool scheme does mean that you'd better allocate at
> most 80% memory to individual memory cgroups, otherwise it's possible
> for a tiny memcg doing dd writes to push dirty pages to global LRU and
> *squeeze* the size of other memcgs.
>
> However I guess it should be mitigated by the fact that
>
> - we typically already reserve some space for the root memcg

Can you give more details on that? AFAIK, we don't treat the root cgroup
differently from the sub-cgroups, except that the root cgroup doesn't
have a limit.

In general, I don't like the idea of a shared pool in root for all the
dirty pages.

Imagine a system that has nothing running under root, where every
application runs within a sub-cgroup. It is easy to track and limit each
cgroup's memory usage, but not the pages that are moved to root. We have
already had difficulty tracking pages that are re-parented to root, and
this will make it even harder.
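
For illustration, the "easy" half looks roughly like the sketch below:
per-cgroup usage and limit can be read straight from the memcg control
files. The helper names are hypothetical, and the directory passed in is
assumed to be wherever the memory controller is mounted (often
/sys/fs/cgroup/memory, but that depends on the system).

```python
import os


def read_int(path):
    """Read a single integer from a cgroup control file."""
    with open(path) as f:
        return int(f.read().strip())


def subcgroup_usage(memcg_root):
    """Return {name: (usage_bytes, limit_bytes)} for each direct child
    cgroup of memcg_root that exposes the memcg control files."""
    result = {}
    for name in sorted(os.listdir(memcg_root)):
        cgroup_dir = os.path.join(memcg_root, name)
        usage_file = os.path.join(cgroup_dir, "memory.usage_in_bytes")
        limit_file = os.path.join(cgroup_dir, "memory.limit_in_bytes")
        if (os.path.isdir(cgroup_dir)
                and os.path.exists(usage_file)
                and os.path.exists(limit_file)):
            result[name] = (read_int(usage_file), read_int(limit_file))
    return result
```

Calling subcgroup_usage() on the memcg mount point gives one
(usage, limit) pair per job. A page that has been moved to root no
longer appears in any of these per-job counters, which is exactly the
tracking problem.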

--Ying

>
> - 20% dirty ratio is mostly overkill for large memory systems.
>  It's often enough to hold 10-30s worth of dirty data for them, which
>  is 1-3GB for one 100MB/s disk. This is the reason vm.dirty_bytes was
>  introduced: some users want a dirty ratio below 1%.
>
> Thanks,
> Fengguang
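
A quick sanity check of the numbers quoted above, as a sketch only: the
function names and the sample RAM sizes are illustrative assumptions,
and decimal units are used (1 MB = 10^6 bytes, 1 GB = 10^9 bytes).

```python
MB = 10**6
GB = 10**9


def dirty_budget_bytes(disk_bw_bytes_per_s, seconds):
    """Dirty data needed to keep one disk busy for `seconds`."""
    return disk_bw_bytes_per_s * seconds


def as_dirty_ratio_percent(dirty_bytes, ram_bytes):
    """Express a byte budget as a percentage of total RAM."""
    return 100.0 * dirty_bytes / ram_bytes


low = dirty_budget_bytes(100 * MB, 10)   # 10s of writeback at 100MB/s
high = dirty_budget_bytes(100 * MB, 30)  # 30s of writeback at 100MB/s

for ram_gb in (16, 64, 256):  # hypothetical machine sizes
    ram = ram_gb * GB
    print("%4d GB RAM: %.2f%% to %.2f%% of memory, versus a 20%% pool of %d GB"
          % (ram_gb,
             as_dirty_ratio_percent(low, ram),
             as_dirty_ratio_percent(high, ram),
             ram * 20 // 100 // GB))
```

On the larger machines the 1-3GB budget comes to a few percent of
memory or less, which is why a byte-based limit such as vm.dirty_bytes
is easier to express there than a whole-percent ratio.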

