From: Greg Thelen <gthelen@google.com>
To: Wu Fengguang <fengguang.wu@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"containers@lists.osdl.org" <containers@lists.osdl.org>,
	Andrea Righi <arighi@develer.com>,
	Balbir Singh <balbir@linux.vnet.ibm.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>,
	Minchan Kim <minchan.kim@gmail.com>,
	Ciju Rajan K <ciju@linux.vnet.ibm.com>,
	David Rientjes <rientjes@google.com>
Subject: Re: [PATCH v4 02/11] memcg: document cgroup dirty memory interfaces
Date: Fri, 29 Oct 2010 14:35:50 -0700
Message-ID: <xr9339rolm15.fsf@ninji.mtv.corp.google.com>
In-Reply-To: <20101029110331.GA29774@localhost> (Wu Fengguang's message of "Fri, 29 Oct 2010 19:03:31 +0800")

Wu Fengguang <fengguang.wu@intel.com> writes:

> Hi Greg,
>
> On Fri, Oct 29, 2010 at 03:09:05PM +0800, Greg Thelen wrote:
>
>> Document cgroup dirty memory interfaces and statistics.
>> 
>> Signed-off-by: Andrea Righi <arighi@develer.com>
>> Signed-off-by: Greg Thelen <gthelen@google.com>
>> ---
>
>> +Limiting dirty memory is like fixing the max amount of dirty (hard to reclaim)
>> +page cache used by a cgroup.  So, in case of multiple cgroup writers, they will
>> +not be able to consume more than their designated share of dirty pages and will
>> +be forced to perform write-out if they cross that limit.
>
> It's more pertinent to say "will be throttled", as "perform write-out"
> is some implementation behavior that will change soon. 

Good point.  I will reword the docs to be less specific about
where the write-out occurs.  The important point is that the writer is
throttled.

>> +- memory.dirty_limit_in_bytes: the amount of dirty memory (expressed in bytes)
>> +  in the cgroup at which a process generating dirty pages will start itself
>> +  writing out dirty data.  Suffix (k, K, m, M, g, or G) can be used to indicate
>> +  that value is kilo, mega or gigabytes.
>
> The suffix feature is handy, thanks! It makes sense to also add this
> for the global interfaces, perhaps in a standalone patch.

I agree that this would also be useful for the global interfaces.  I
will submit an independent patch for the global interfaces.
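
For reference, the suffix handling could look roughly like the userspace
sketch below.  It is loosely modeled on the kernel's memparse() helper;
the name parse_size() is hypothetical and is used only for illustration,
and this is not the code from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical userspace sketch of size-suffix parsing.
 * Accepts an optional k/K, m/M or g/G suffix (powers of 1024).
 * If endp is non-NULL it is set to the first unparsed character.
 */
static uint64_t parse_size(const char *str, char **endp)
{
	char *end;
	uint64_t val;

	assert(str != NULL);
	val = strtoull(str, &end, 0);

	switch (*end) {
	case 'G':
	case 'g':
		val <<= 10;
		/* fall through */
	case 'M':
	case 'm':
		val <<= 10;
		/* fall through */
	case 'K':
	case 'k':
		val <<= 10;
		end++;		/* consume the suffix character */
		break;
	default:
		break;		/* no suffix: plain byte count */
	}

	if (endp)
		*endp = end;
	return val;
}
```

With this sketch, writing "10M" to the limit file would be interpreted as
10 * 1024 * 1024 bytes, and a bare "512" as 512 bytes.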

>> +A cgroup may contain more dirty memory than its dirty limit.  This is possible
>> +because of the principle that the first cgroup to touch a page is charged for
>> +it.  Subsequent page counting events (dirty, writeback, nfs_unstable) are also
>> +counted to the originally charged cgroup.
>> +
>> +Example: If page is allocated by a cgroup A task, then the page is charged to
>> +cgroup A.  If the page is later dirtied by a task in cgroup B, then the cgroup A
>> +dirty count will be incremented.  If cgroup A is over its dirty limit but cgroup
>> +B is not, then dirtying a cgroup A page from a cgroup B task may push cgroup A
>> +over its dirty limit without throttling the dirtying cgroup B task.
>
> It's good to document the above "misbehavior". But why not throttling
> the dirtying cgroup B task? Is it simply not implemented or makes no
> sense to do so at all?

Ideally cgroup B would be throttled.  Note, even with this misbehavior,
the system dirty limit will keep cgroup B from exceeding system-wide
limits.

The challenge is that the current system increments dirty
counters using account_page_dirtied(), which does not immediately check
against dirty limits.  Later, balance_dirty_pages() checks whether any
limits were exceeded, but only after a batch of pages may have been
dirtied.  By then the task may have written many pages charged to many
different memcg, so the set of memcg that would have to be checked for
the mapping may be large.  I do not like the approach of checking them
all.

memcontrol.c can easily detect when a memcg other than the current task's
memcg is charged for a dirty page.  It does not record this today, but
it could.  When such a foreign page dirty event occurs, the associated
memcg could be linked into the dirtying address_space so that
balance_dirty_pages() could check the limits of all foreign memcg.  In
the common case I think the task is dirtying pages that have been
charged to the task's cgroup, so the address_space's foreign_memcg list
would be empty.  But when foreign memcg pages are dirtied,
balance_dirty_pages() would have references to all memcg that
need dirty limit checking.  This approach might work.  Comments?
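
A rough userspace model of the idea follows.  Everything in it is
hypothetical: the *_model types, the fixed-size foreign array (standing
in for a foreign_memcg list) and the function names are invented for
this sketch, and locking and reference counting are ignored entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_FOREIGN 8	/* arbitrary bound for the sketch */

struct memcg_model {
	unsigned long nr_dirty;
	unsigned long dirty_limit;
};

struct mapping_model {
	struct memcg_model *foreign[MAX_FOREIGN];	/* foreign memcg seen */
	size_t nr_foreign;
};

/*
 * Charge a dirtied page to the memcg that owns it.  If the owner is not
 * the dirtying task's memcg, remember it on the mapping (once).
 */
static void model_account_page_dirtied(struct mapping_model *mapping,
				       struct memcg_model *owner,
				       struct memcg_model *dirtier)
{
	size_t i;

	assert(mapping && owner && dirtier);
	owner->nr_dirty++;
	if (owner == dirtier)
		return;		/* common case: nothing to record */
	for (i = 0; i < mapping->nr_foreign; i++)
		if (mapping->foreign[i] == owner)
			return;	/* already linked */
	if (mapping->nr_foreign < MAX_FOREIGN)
		mapping->foreign[mapping->nr_foreign++] = owner;
}

static bool model_over_limit(const struct memcg_model *memcg)
{
	return memcg->nr_dirty > memcg->dirty_limit;
}

/*
 * Limit check: throttle if the dirtier's own memcg or any foreign memcg
 * recorded on the mapping is over its dirty limit.
 */
static bool model_should_throttle(const struct mapping_model *mapping,
				  const struct memcg_model *dirtier)
{
	size_t i;

	if (model_over_limit(dirtier))
		return true;
	for (i = 0; i < mapping->nr_foreign; i++)
		if (model_over_limit(mapping->foreign[i]))
			return true;
	return false;
}
```

In this model a cgroup B task that dirties a page charged to cgroup A
gets A recorded on the mapping, so the later check sees A's limit too.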

> Thanks,
> Fengguang



