From: Greg Thelen <gthelen@google.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, containers@lists.osdl.org,
	Andrea Righi <arighi@develer.com>,
	Balbir Singh <balbir@linux.vnet.ibm.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>,
	Minchan Kim <minchan.kim@gmail.com>,
	Ciju Rajan K <ciju@linux.vnet.ibm.com>,
	David Rientjes <rientjes@google.com>,
	Wu Fengguang <fengguang.wu@intel.com>,
	Greg Thelen <gthelen@google.com>
Subject: [PATCH v4 00/11] memcg: per cgroup dirty page accounting
Date: Fri, 29 Oct 2010 00:09:03 -0700
Message-ID: <1288336154-23256-1-git-send-email-gthelen@google.com>

Changes since v3:
- Refactored the balance_dirty_pages() dirty-limit checking to use the new
  struct dirty_info, which is used to compare both system and memcg dirty
  limits against usage.
- Disabled memcg dirty limits when memory.use_hierarchy=1. An enhancement is
  needed to check the chain of parents to ensure that no dirty limit is
  exceeded.
- Ported to mmotm-2010-10-22-16-36.

Changes since v2:
- Rather than disabling softirq in lock_page_cgroup(), introduce a separate
  lock to synchronize between memcg page accounting and migration. This only
  affects patch 4 of the series: patch 4 used to disable softirq, now it
  introduces the new lock.

Changes since v1:
- Renamed "nfs"/"total_nfs" to "nfs_unstable"/"total_nfs_unstable" in per
  cgroup memory.stat to match /proc/meminfo.
- Avoid lockdep warnings by using rcu_read_[un]lock() in
  mem_cgroup_has_dirty_limit().
- Fixed a lockdep issue in mem_cgroup_read_stat() which is exposed by these
  patches.
- Removed redundant comments.
- Renamed (for clarity):
  - mem_cgroup_write_page_stat_item -> mem_cgroup_page_stat_item
  - mem_cgroup_read_page_stat_item -> mem_cgroup_nr_pages_item
- Renamed the newly created cgroupfs control files:
  - memory.dirty_bytes -> memory.dirty_limit_in_bytes
  - memory.dirty_background_bytes -> memory.dirty_background_limit_in_bytes
- Removed the unnecessary get_ prefix from get_xxx() functions.
- Allow [kKmMgG] suffixes for the newly created dirty limit cgroupfs files.
- Disable softirq rather than hardirq in lock_page_cgroup().
- Made mem_cgroup_move_account_page_stat() inline.
- Ported patches to mmotm-2010-10-13-17-13.

This patch set provides the ability for each cgroup to have independent dirty
page limits. Limiting dirty memory fixes the maximum amount of dirty (hard to
reclaim) page cache used by a cgroup. So, with multiple cgroup writers, no
cgroup can consume more than its designated share of dirty pages, and each is
forced to perform write-out if it crosses that limit. The patches are based on
a series proposed by Andrea Righi in March 2010.

Overview:
- Add page_cgroup flags to record when pages are dirty, in writeback, or NFS
  unstable.
- Extend mem_cgroup to record the total number of pages in each of the
  interesting dirty states (dirty, writeback, unstable_nfs).
- Add dirty parameters similar to the system-wide /proc/sys/vm/dirty_* limits
  to mem_cgroup. The mem_cgroup dirty parameters are accessible via cgroupfs
  control files.
- Consider both system and per-memcg dirty limits in page writeback when
  deciding to queue background writeback or block for foreground writeback.

Known shortcomings:
- When a cgroup dirty limit is exceeded, bdi writeback is employed to write
  back dirty inodes. Bdi writeback considers inodes from any cgroup, not just
  inodes contributing dirty pages to the cgroup exceeding its limit.
- When memory.use_hierarchy is set, dirty limits are disabled. This is an
  implementation detail.
  An enhanced implementation is needed to check the chain of parents to
  ensure that no dirty limit is exceeded.

Performance data:
- A page fault microbenchmark workload was used to measure performance. It
  can be run in read or write mode:

	f = open(foo.$cpu)
	truncate(f, 4096)
	alarm(60)
	while (1) {
		p = mmap(f, 4096)
		if (write)
			*p = 1
		else
			x = *p
		munmap(p)
	}

- The workload was run at several points in the patch series in different
  modes:
  - s_read is a single threaded reader
  - s_write is a single threaded writer
  - p_read is a 16 thread reader, each thread operating on a different file
  - p_write is a 16 thread writer, each thread operating on a different file
- Measurements were collected on a 16 core non-NUMA system using
  "perf stat --repeat 3". The -a option was used for the parallel (p_*) runs.
- All numbers are page fault rates in M/sec. Higher is better.
- To compare the performance of a kernel without memcg, compare the first and
  last rows; neither has memcg configured. The first row does not include any
  of these memcg patches.
- To compare the performance cost of using memcg dirty limits, compare the
  baseline (2nd row, titled "w/ memcg") with the code and memcg enabled (2nd
  to last row, titled "all patches").
                          root_cgroup                     child_cgroup
              s_read s_write p_read p_write   s_read s_write p_read p_write
mmotm
w/o memcg      0.428   0.390  0.429   0.388
mmotm
w/ memcg       0.411   0.378  0.391   0.362    0.412   0.377  0.385   0.363
all patches    0.384   0.360  0.370   0.348    0.381   0.363  0.368   0.347
all patches
w/o memcg      0.431   0.402  0.427   0.395

Balbir Singh (1):
  memcg: CPU hotplug lockdep warning fix

Greg Thelen (9):
  memcg: add page_cgroup flags for dirty page tracking
  memcg: document cgroup dirty memory interfaces
  memcg: create extensible page stat update routines
  writeback: create dirty_info structure
  memcg: add dirty page accounting infrastructure
  memcg: add kernel calls for memcg dirty page stats
  memcg: add dirty limits to mem_cgroup
  memcg: add cgroupfs interface to memcg dirty limits
  memcg: check memcg dirty limits in page writeback

KAMEZAWA Hiroyuki (1):
  memcg: add lock to synchronize page accounting and migration

 Documentation/cgroups/memory.txt |   73 ++++++
 fs/fs-writeback.c                |    7 +-
 fs/nfs/write.c                   |    4 +
 include/linux/memcontrol.h       |   64 +++++-
 include/linux/page_cgroup.h      |   54 ++++-
 include/linux/writeback.h        |    9 +-
 mm/backing-dev.c                 |   12 +-
 mm/filemap.c                     |    1 +
 mm/memcontrol.c                  |  477 ++++++++++++++++++++++++++++++++++++--
 mm/page-writeback.c              |  135 ++++++++----
 mm/rmap.c                        |    4 +-
 mm/truncate.c                    |    1 +
 mm/vmstat.c                      |    6 +-
 13 files changed, 764 insertions(+), 83 deletions(-)

--
1.7.3.1