From: Greg Thelen <gthelen@google.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	containers@lists.osdl.org, linux-fsdevel@vger.kernel.org,
	Andrea Righi <arighi@develer.com>,
	Balbir Singh <balbir@linux.vnet.ibm.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>,
	Minchan Kim <minchan.kim@gmail.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Ciju Rajan K <ciju@linux.vnet.ibm.com>,
	David Rientjes <rientjes@google.com>,
	Wu Fengguang <fengguang.wu@intel.com>,
	Vivek Goyal <vgoyal@redhat.com>,
	Dave Chinner <david@fromorbit.com>,
	Greg Thelen <gthelen@google.com>
Subject: [PATCH v8 01/12] memcg: document cgroup dirty memory interfaces
Date: Fri,  3 Jun 2011 09:12:07 -0700
Message-ID: <1307117538-14317-2-git-send-email-gthelen@google.com>
In-Reply-To: <1307117538-14317-1-git-send-email-gthelen@google.com>

Document cgroup dirty memory interfaces and statistics.

The implementation of these new interfaces comes in the following patches
of this series.

Signed-off-by: Andrea Righi <arighi@develer.com>
Signed-off-by: Greg Thelen <gthelen@google.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
---
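For illustration only: the ratio/bytes pairing documented in section 5.6 of
the diff below (memory.dirty_ratio vs. memory.dirty_limit_in_bytes, and the
background equivalents) can be modelled in a few lines.  This is a minimal
sketch of the documented semantics, not the kernel implementation.  The names
parse_size and DirtyLimit are hypothetical, the use of binary multiples for
the k/m/g suffixes is an assumption, and the caller supplies whatever "cgroup
memory" figure the ratio is meant to apply to.

```python
# Illustrative model only; not kernel code. All names are hypothetical.

# Assumption: suffixes are binary multiples (1k = 1024 bytes).
SUFFIXES = {"k": 1 << 10, "m": 1 << 20, "g": 1 << 30}


def parse_size(text):
    """Parse '4096', '64k', '8M' or '1G' into a byte count."""
    text = text.strip()
    multiplier = SUFFIXES.get(text[-1:].lower())
    if multiplier is not None:
        return int(text[:-1]) * multiplier
    return int(text)


class DirtyLimit:
    """One ratio/bytes pair, e.g. dirty_ratio and dirty_limit_in_bytes."""

    def __init__(self, ratio=0, limit_in_bytes=0):
        self.ratio = ratio
        self.limit_in_bytes = limit_in_bytes

    def write_ratio(self, percent):
        """Writing the ratio makes the byte limit read back as 0."""
        if not 0 <= percent <= 100:
            raise ValueError("ratio must be a percentage")
        self.ratio = percent
        self.limit_in_bytes = 0

    def write_bytes(self, text):
        """Writing the byte limit makes the ratio read back as 0."""
        self.limit_in_bytes = parse_size(text)
        self.ratio = 0

    def threshold(self, cgroup_memory_bytes):
        """Return the dirty threshold in bytes for the given memory size."""
        if self.limit_in_bytes:
            return self.limit_in_bytes
        return cgroup_memory_bytes * self.ratio // 100


limit = DirtyLimit(ratio=20)
limit.write_bytes("64M")
print(limit.ratio, limit.limit_in_bytes)  # 0 67108864
limit.write_ratio(10)
print(limit.threshold(1 << 30))           # 107374182
```

Only one member of the pair is non-zero at any time, so threshold() never has
to arbitrate between two competing settings.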
 Documentation/cgroups/memory.txt |   70 ++++++++++++++++++++++++++++++++++++++
 1 files changed, 70 insertions(+), 0 deletions(-)
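
For illustration only: the first-touch accounting described near the end of
section 5.6 in the diff below can be sketched as a toy model.  It counts
pages rather than bytes, the Cgroup, Page and dirty_page names are
hypothetical, and it assumes that throttling is decided from the dirtying
task's own cgroup, which is how the documentation text reads.  It is not the
kernel implementation.

```python
# Toy model of first-touch dirty accounting; not kernel code.
# All names are hypothetical and limits are expressed in pages.


class Cgroup:
    def __init__(self, name, dirty_limit_pages):
        self.name = name
        self.dirty_limit_pages = dirty_limit_pages
        self.dirty_pages = 0

    def over_limit(self):
        return self.dirty_pages > self.dirty_limit_pages


class Page:
    def __init__(self, owner):
        self.owner = owner  # cgroup charged when the page was first touched
        self.dirty = False


def dirty_page(page, dirtier):
    """Dirty `page` from a task in cgroup `dirtier`.

    Returns True if the dirtying task would be throttled.
    """
    if not page.dirty:
        page.dirty = True
        page.owner.dirty_pages += 1  # counted to the originally charged cgroup
    # Assumption: throttling looks only at the dirtier's own cgroup.
    return dirtier.over_limit()


a = Cgroup("A", dirty_limit_pages=2)
b = Cgroup("B", dirty_limit_pages=2)
pages = [Page(owner=a) for _ in range(3)]   # all charged to cgroup A
throttled = [dirty_page(p, b) for p in pages]
print(a.dirty_pages, a.over_limit(), b.dirty_pages, throttled)
# 3 True 0 [False, False, False]
```

Cgroup A ends up over its limit while the cgroup B task that dirtied the
pages is never throttled, which is the case the documentation warns about.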

diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt
index 43b9e46..15019a3 100644
--- a/Documentation/cgroups/memory.txt
+++ b/Documentation/cgroups/memory.txt
@@ -395,6 +395,10 @@ soft_direct_steal- # of pages reclaimed in global hierarchical reclaim from
 		direct reclaim
 soft_direct_scan- # of pages scanned in global hierarchical reclaim from
 		direct reclaim
+dirty		- # of bytes that are waiting to get written back to the disk.
+writeback	- # of bytes that are actively being written back to the disk.
+nfs_unstable	- # of bytes sent to the NFS server, but not yet committed to
+		the actual storage.
 inactive_anon	- # of bytes of anonymous memory and swap cache memory on
 		LRU list.
 active_anon	- # of bytes of anonymous and swap cache memory on active
@@ -420,6 +424,9 @@ total_soft_kswapd_steal	- sum of all children's "soft_kswapd_steal"
 total_soft_kswapd_scan	- sum of all children's "soft_kswapd_scan"
 total_soft_direct_steal	- sum of all children's "soft_direct_steal"
 total_soft_direct_scan	- sum of all children's "soft_direct_scan"
+total_dirty		- sum of all children's "dirty"
+total_writeback		- sum of all children's "writeback"
+total_nfs_unstable	- sum of all children's "nfs_unstable"
 total_inactive_anon	- sum of all children's "inactive_anon"
 total_active_anon	- sum of all children's "active_anon"
 total_inactive_file	- sum of all children's "inactive_file"
@@ -476,6 +483,69 @@ value for efficient access. (Of course, when necessary, it's synchronized.)
 If you want to know more exact memory usage, you should use RSS+CACHE(+SWAP)
 value in memory.stat(see 5.2).
 
+5.6 dirty memory
+
+Control the maximum amount of dirty pages a cgroup can have at any given time.
+
+Limiting dirty memory fixes the maximum amount of dirty (hard to reclaim) page
+cache used by a cgroup.  With multiple cgroup writers, no writer can consume
+more than its designated share of dirty pages, and a writer that crosses its
+limit is throttled.  System-wide dirty limits still apply: dirty memory
+consumption is checked against both the system-wide and the per-cgroup dirty
+limits.
+
+The interface is similar to the procfs interface: /proc/sys/vm/dirty_*.  A
+limit can be configured to trigger either throttling of a dirtier or queuing of
+background writeback.  The root cgroup memory.dirty_* control files are
+read-only and match the contents of the /proc/sys/vm/dirty_* files.
+
+Per-cgroup dirty limits can be set using the following files in the cgroupfs:
+
+- memory.dirty_ratio: the amount of dirty memory (expressed as a percentage of
+  cgroup memory) at which a process generating dirty pages will be throttled.
+  The default value is the system-wide dirty ratio, /proc/sys/vm/dirty_ratio.
+
+- memory.dirty_limit_in_bytes: the amount of dirty memory (expressed in bytes)
+  in the cgroup at which a process generating dirty pages will be throttled.
+  A suffix (k, K, m, M, g, or G) can be used to give the value in kilo-, mega-
+  or gigabytes.  The default value is the system-wide dirty limit,
+  /proc/sys/vm/dirty_bytes.
+
+  Note: memory.dirty_limit_in_bytes is the counterpart of memory.dirty_ratio.
+  Only one may be specified at a time.  When one is written it is immediately
+  taken into account to evaluate the dirty memory limits and the other appears
+  as 0 when read.
+
+- memory.dirty_background_ratio: the amount of dirty memory of the cgroup
+  (expressed as a percentage of cgroup memory) at which background writeback
+  kernel threads will start writing out dirty data.  The default value is the
+  system-wide background dirty ratio, /proc/sys/vm/dirty_background_ratio.
+
+- memory.dirty_background_limit_in_bytes: the amount of dirty memory (expressed
+  in bytes) in the cgroup at which background writeback kernel threads will
+  start writing out dirty data.  A suffix (k, K, m, M, g, or G) can be used to
+  give the value in kilo-, mega- or gigabytes.  The default value is the
+  system-wide dirty background limit, /proc/sys/vm/dirty_background_bytes.
+
+  Note: memory.dirty_background_limit_in_bytes is the counterpart of
+  memory.dirty_background_ratio.  Only one may be specified at a time.  When one
+  is written it is immediately taken into account to evaluate the dirty memory
+  limits and the other appears as 0 when read.
+
+A cgroup may contain more dirty memory than its dirty limit.  This is possible
+because of the principle that the first cgroup to touch a page is charged for
+it.  Subsequent page counting events (dirty, writeback, nfs_unstable) are also
+counted to the originally charged cgroup.  Example: If a page is allocated by a
+cgroup A task, then the page is charged to cgroup A.  If the page is later
+dirtied by a task in cgroup B, then the cgroup A dirty count will be
+incremented.  If cgroup A is at its dirty limit but cgroup B is not, then
+dirtying a cgroup A page from a cgroup B task may push cgroup A over its dirty
+limit without throttling the dirtying cgroup B task.
+
+When use_hierarchy=0, each cgroup has independent dirty memory usage and limits.
+When use_hierarchy=1, the dirty limits of parent cgroups are also checked to
+ensure that no dirty limit is exceeded.
+
 6. Hierarchy support
 
 The memory controller supports a deep hierarchy and hierarchical accounting.
-- 
1.7.3.1
