* [PATCH v3] memcg: add memory.vmscan_stat
@ 2011-07-22  8:15 ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 54+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-07-22  8:15 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-kernel, nishimura, Michal Hocko, akpm, abrestic

[PATCH] add memory.vmscan_stat

The commit log of commit 0ae5e89 "memcg: count the soft_limit reclaim in..."
says it adds scanning stats to the memory.stat file. But it doesn't, because
we decided we first needed to reach a consensus on such new APIs.

This patch is a trial to add memory.vmscan_stat. It shows
  - the number of scanned pages (total, anon, file)
  - the number of rotated pages (total, anon, file)
  - the number of freed pages (total, anon, file)
  - the elapsed time (including sleep/pause time)

  for both direct and soft reclaim.

The biggest difference from Ying's original version is that this file
can be reset by a write, as

  # echo 0 > ...../memory.vmscan_stat

Here is an example of the output, taken after a "make -j 6" kernel build
under a 300M limit.

[kamezawa@bluextal ~]$ cat /cgroup/memory/A/memory.vmscan_stat
scanned_pages_by_limit 9471864
scanned_anon_pages_by_limit 6640629
scanned_file_pages_by_limit 2831235
rotated_pages_by_limit 4243974
rotated_anon_pages_by_limit 3971968
rotated_file_pages_by_limit 272006
freed_pages_by_limit 2318492
freed_anon_pages_by_limit 962052
freed_file_pages_by_limit 1356440
elapsed_ns_by_limit 351386416101
scanned_pages_by_system 0
scanned_anon_pages_by_system 0
scanned_file_pages_by_system 0
rotated_pages_by_system 0
rotated_anon_pages_by_system 0
rotated_file_pages_by_system 0
freed_pages_by_system 0
freed_anon_pages_by_system 0
freed_file_pages_by_system 0
elapsed_ns_by_system 0
scanned_pages_by_limit_under_hierarchy 9471864
scanned_anon_pages_by_limit_under_hierarchy 6640629
scanned_file_pages_by_limit_under_hierarchy 2831235
rotated_pages_by_limit_under_hierarchy 4243974
rotated_anon_pages_by_limit_under_hierarchy 3971968
rotated_file_pages_by_limit_under_hierarchy 272006
freed_pages_by_limit_under_hierarchy 2318492
freed_anon_pages_by_limit_under_hierarchy 962052
freed_file_pages_by_limit_under_hierarchy 1356440
elapsed_ns_by_limit_under_hierarchy 351386416101
scanned_pages_by_system_under_hierarchy 0
scanned_anon_pages_by_system_under_hierarchy 0
scanned_file_pages_by_system_under_hierarchy 0
rotated_pages_by_system_under_hierarchy 0
rotated_anon_pages_by_system_under_hierarchy 0
rotated_file_pages_by_system_under_hierarchy 0
freed_pages_by_system_under_hierarchy 0
freed_anon_pages_by_system_under_hierarchy 0
freed_file_pages_by_system_under_hierarchy 0
elapsed_ns_by_system_under_hierarchy 0


The *_under_hierarchy entries are for hierarchy management.

This will be useful for further memcg development and needs to be in
place before we do any complicated rework of LRU/softlimit management.

This patch adds a new struct memcg_scanrecord to the scan_control struct.
sc->nr_scanned et al. are not designed for exporting information; for
example, nr_scanned is reset frequently and is incremented by +2 when
scanning mapped pages.

To avoid complexity, I added a new parameter to scan_control which is
used for exporting scanning statistics.
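The accumulation that a scan record undergoes can be sketched outside the
kernel. The following is a user-space model of what
__mem_cgroup_record_scanstat does with one record (the Python dict layout
is my own illustration; in the kernel the [anon, file] pairs live in
memcg_scanrecord and the table in struct scanstat):

```python
# Model: one scan record (anon/file pairs plus elapsed ns) is folded
# into a per-memcg stats table; index 0 is anon, index 1 is file,
# and the plain "_pages" entry is the sum of both.
def record_scanstat(stats, rec):
    """rec holds [anon, file] pairs, mirroring memcg_scanrecord."""
    for key, pair in (("scanned", rec["nr_scanned"]),
                      ("rotated", rec["nr_rotated"]),
                      ("freed", rec["nr_freed"])):
        stats[key + "_pages"] += pair[0] + pair[1]
        stats[key + "_anon_pages"] += pair[0]
        stats[key + "_file_pages"] += pair[1]
    stats["elapsed_ns"] += rec["elapsed"]

stats = {k: 0 for k in (
    "scanned_pages", "scanned_anon_pages", "scanned_file_pages",
    "rotated_pages", "rotated_anon_pages", "rotated_file_pages",
    "freed_pages", "freed_anon_pages", "freed_file_pages", "elapsed_ns")}
rec = {"nr_scanned": [3, 5], "nr_rotated": [1, 2],
       "nr_freed": [1, 4], "elapsed": 1000}
record_scanstat(stats, rec)
print(stats["scanned_pages"])  # 8
```

In the kernel the same accumulation is done twice per record, once into the
victim memcg's own table and once into the hierarchy root's table, each
under the scanstat spinlock.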

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Changelog:
  - fixed the trigger for recording nr_freed in shrink_inactive_list()
Changelog:
  - renamed as vmscan_stat
  - handle file/anon
  - added "rotated"
  - changed names of param in vmscan_stat.
---
 Documentation/cgroups/memory.txt |   85 +++++++++++++++++++
 include/linux/memcontrol.h       |   19 ++++
 include/linux/swap.h             |    6 -
 mm/memcontrol.c                  |  172 +++++++++++++++++++++++++++++++++++++--
 mm/vmscan.c                      |   39 +++++++-
 5 files changed, 303 insertions(+), 18 deletions(-)

Index: mmotm-0710/Documentation/cgroups/memory.txt
===================================================================
--- mmotm-0710.orig/Documentation/cgroups/memory.txt
+++ mmotm-0710/Documentation/cgroups/memory.txt
@@ -380,7 +380,7 @@ will be charged as a new owner of it.
 
 5.2 stat file
 
-memory.stat file includes following statistics
+5.2.1 memory.stat file includes following statistics
 
 # per-memory cgroup local status
 cache		- # of bytes of page cache memory.
@@ -438,6 +438,89 @@ Note:
 	 file_mapped is accounted only when the memory cgroup is owner of page
 	 cache.)
 
+5.2.2 memory.vmscan_stat
+
+memory.vmscan_stat contains statistics on memory scanning, freeing and
+reclaiming. The statistics cover the period since memory cgroup creation
+and can be reset to 0 by writing 0 to the file:
+
+ # echo 0 > ../memory.vmscan_stat
+
+This file contains the following statistics.
+
+[param]_[file_or_anon]_pages_by_[reason]_[under_hierarchy]
+[param]_elapsed_ns_by_[reason]_[under_hierarchy]
+
+For example,
+
+  scanned_file_pages_by_limit indicates the number of file pages
+  scanned by vmscan.
+
+Currently, 3 parameters are supported:
+
+  scanned - the number of pages scanned by vmscan
+  rotated - the number of pages activated at vmscan
+  freed   - the number of pages freed by vmscan
+
+If "rotated" is high relative to scanned/freed, the memcg seems busy.
+
+Currently, 2 reasons are supported:
+
+  limit - the memory cgroup's limit
+  system - global memory pressure + softlimit
+           (global memory pressure not under softlimit is not handled now)
+
+When under_hierarchy is appended at the tail, the number indicates the
+total for the memcg itself and all of its children.
+
+elapsed_ns is the elapsed time in nanoseconds. It may include sleep
+time and does not indicate CPU usage, so please take it as a latency
+measure only.
+
+Here is an example.
+
+# cat /cgroup/memory/A/memory.vmscan_stat
+scanned_pages_by_limit 9471864
+scanned_anon_pages_by_limit 6640629
+scanned_file_pages_by_limit 2831235
+rotated_pages_by_limit 4243974
+rotated_anon_pages_by_limit 3971968
+rotated_file_pages_by_limit 272006
+freed_pages_by_limit 2318492
+freed_anon_pages_by_limit 962052
+freed_file_pages_by_limit 1356440
+elapsed_ns_by_limit 351386416101
+scanned_pages_by_system 0
+scanned_anon_pages_by_system 0
+scanned_file_pages_by_system 0
+rotated_pages_by_system 0
+rotated_anon_pages_by_system 0
+rotated_file_pages_by_system 0
+freed_pages_by_system 0
+freed_anon_pages_by_system 0
+freed_file_pages_by_system 0
+elapsed_ns_by_system 0
+scanned_pages_by_limit_under_hierarchy 9471864
+scanned_anon_pages_by_limit_under_hierarchy 6640629
+scanned_file_pages_by_limit_under_hierarchy 2831235
+rotated_pages_by_limit_under_hierarchy 4243974
+rotated_anon_pages_by_limit_under_hierarchy 3971968
+rotated_file_pages_by_limit_under_hierarchy 272006
+freed_pages_by_limit_under_hierarchy 2318492
+freed_anon_pages_by_limit_under_hierarchy 962052
+freed_file_pages_by_limit_under_hierarchy 1356440
+elapsed_ns_by_limit_under_hierarchy 351386416101
+scanned_pages_by_system_under_hierarchy 0
+scanned_anon_pages_by_system_under_hierarchy 0
+scanned_file_pages_by_system_under_hierarchy 0
+rotated_pages_by_system_under_hierarchy 0
+rotated_anon_pages_by_system_under_hierarchy 0
+rotated_file_pages_by_system_under_hierarchy 0
+freed_pages_by_system_under_hierarchy 0
+freed_anon_pages_by_system_under_hierarchy 0
+freed_file_pages_by_system_under_hierarchy 0
+elapsed_ns_by_system_under_hierarchy 0
+
 5.3 swappiness
 
 Similar to /proc/sys/vm/swappiness, but affecting a hierarchy of groups only.
Index: mmotm-0710/include/linux/memcontrol.h
===================================================================
--- mmotm-0710.orig/include/linux/memcontrol.h
+++ mmotm-0710/include/linux/memcontrol.h
@@ -39,6 +39,16 @@ extern unsigned long mem_cgroup_isolate_
 					struct mem_cgroup *mem_cont,
 					int active, int file);
 
+struct memcg_scanrecord {
+	struct mem_cgroup *mem; /* scanned memory cgroup */
+	struct mem_cgroup *root; /* scan target hierarchy root */
+	int context;		/* scanning context (see memcontrol.c) */
+	unsigned long nr_scanned[2]; /* the number of scanned pages */
+	unsigned long nr_rotated[2]; /* the number of rotated pages */
+	unsigned long nr_freed[2]; /* the number of freed pages */
+	unsigned long elapsed; /* nsec of time elapsed while scanning */
+};
+
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR
 /*
  * All "charge" functions with gfp_mask should use GFP_KERNEL or
@@ -117,6 +127,15 @@ mem_cgroup_get_reclaim_stat_from_page(st
 extern void mem_cgroup_print_oom_info(struct mem_cgroup *memcg,
 					struct task_struct *p);
 
+extern unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem,
+						  gfp_t gfp_mask, bool noswap,
+						  struct memcg_scanrecord *rec);
+extern unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
+						gfp_t gfp_mask, bool noswap,
+						struct zone *zone,
+						struct memcg_scanrecord *rec,
+						unsigned long *nr_scanned);
+
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
 extern int do_swap_account;
 #endif
Index: mmotm-0710/include/linux/swap.h
===================================================================
--- mmotm-0710.orig/include/linux/swap.h
+++ mmotm-0710/include/linux/swap.h
@@ -253,12 +253,6 @@ static inline void lru_cache_add_file(st
 /* linux/mm/vmscan.c */
 extern unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
 					gfp_t gfp_mask, nodemask_t *mask);
-extern unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem,
-						  gfp_t gfp_mask, bool noswap);
-extern unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
-						gfp_t gfp_mask, bool noswap,
-						struct zone *zone,
-						unsigned long *nr_scanned);
 extern int __isolate_lru_page(struct page *page, int mode, int file);
 extern unsigned long shrink_all_memory(unsigned long nr_pages);
 extern int vm_swappiness;
Index: mmotm-0710/mm/memcontrol.c
===================================================================
--- mmotm-0710.orig/mm/memcontrol.c
+++ mmotm-0710/mm/memcontrol.c
@@ -204,6 +204,50 @@ struct mem_cgroup_eventfd_list {
 static void mem_cgroup_threshold(struct mem_cgroup *mem);
 static void mem_cgroup_oom_notify(struct mem_cgroup *mem);
 
+enum {
+	SCAN_BY_LIMIT,
+	SCAN_BY_SYSTEM,
+	NR_SCAN_CONTEXT,
+	SCAN_BY_SHRINK,	/* not recorded now */
+};
+
+enum {
+	SCAN,
+	SCAN_ANON,
+	SCAN_FILE,
+	ROTATE,
+	ROTATE_ANON,
+	ROTATE_FILE,
+	FREED,
+	FREED_ANON,
+	FREED_FILE,
+	ELAPSED,
+	NR_SCANSTATS,
+};
+
+struct scanstat {
+	spinlock_t	lock;
+	unsigned long	stats[NR_SCAN_CONTEXT][NR_SCANSTATS];
+	unsigned long	rootstats[NR_SCAN_CONTEXT][NR_SCANSTATS];
+};
+
+const char *scanstat_string[NR_SCANSTATS] = {
+	"scanned_pages",
+	"scanned_anon_pages",
+	"scanned_file_pages",
+	"rotated_pages",
+	"rotated_anon_pages",
+	"rotated_file_pages",
+	"freed_pages",
+	"freed_anon_pages",
+	"freed_file_pages",
+	"elapsed_ns",
+};
+#define SCANSTAT_WORD_LIMIT	"_by_limit"
+#define SCANSTAT_WORD_SYSTEM	"_by_system"
+#define SCANSTAT_WORD_HIERARCHY	"_under_hierarchy"
+
+
 /*
  * The memory controller data structure. The memory controller controls both
  * page cache and RSS per cgroup. We would eventually like to provide
@@ -266,7 +310,8 @@ struct mem_cgroup {
 
 	/* For oom notifier event fd */
 	struct list_head oom_notify;
-
+	/* For recording LRU-scan statistics */
+	struct scanstat scanstat;
 	/*
 	 * Should we move charges of a task when a task is moved into this
 	 * mem_cgroup ? And what type of charges should we move ?
@@ -1619,6 +1664,44 @@ bool mem_cgroup_reclaimable(struct mem_c
 }
 #endif
 
+static void __mem_cgroup_record_scanstat(unsigned long *stats,
+			   struct memcg_scanrecord *rec)
+{
+
+	stats[SCAN] += rec->nr_scanned[0] + rec->nr_scanned[1];
+	stats[SCAN_ANON] += rec->nr_scanned[0];
+	stats[SCAN_FILE] += rec->nr_scanned[1];
+
+	stats[ROTATE] += rec->nr_rotated[0] + rec->nr_rotated[1];
+	stats[ROTATE_ANON] += rec->nr_rotated[0];
+	stats[ROTATE_FILE] += rec->nr_rotated[1];
+
+	stats[FREED] += rec->nr_freed[0] + rec->nr_freed[1];
+	stats[FREED_ANON] += rec->nr_freed[0];
+	stats[FREED_FILE] += rec->nr_freed[1];
+
+	stats[ELAPSED] += rec->elapsed;
+}
+
+static void mem_cgroup_record_scanstat(struct memcg_scanrecord *rec)
+{
+	struct mem_cgroup *mem;
+	int context = rec->context;
+
+	if (context >= NR_SCAN_CONTEXT)
+		return;
+
+	mem = rec->mem;
+	spin_lock(&mem->scanstat.lock);
+	__mem_cgroup_record_scanstat(mem->scanstat.stats[context], rec);
+	spin_unlock(&mem->scanstat.lock);
+
+	mem = rec->root;
+	spin_lock(&mem->scanstat.lock);
+	__mem_cgroup_record_scanstat(mem->scanstat.rootstats[context], rec);
+	spin_unlock(&mem->scanstat.lock);
+}
+
 /*
  * Scan the hierarchy if needed to reclaim memory. We remember the last child
  * we reclaimed from, so that we don't end up penalizing one child extensively
@@ -1643,8 +1726,9 @@ static int mem_cgroup_hierarchical_recla
 	bool noswap = reclaim_options & MEM_CGROUP_RECLAIM_NOSWAP;
 	bool shrink = reclaim_options & MEM_CGROUP_RECLAIM_SHRINK;
 	bool check_soft = reclaim_options & MEM_CGROUP_RECLAIM_SOFT;
+	struct memcg_scanrecord rec;
 	unsigned long excess;
-	unsigned long nr_scanned;
+	unsigned long scanned;
 
 	excess = res_counter_soft_limit_excess(&root_mem->res) >> PAGE_SHIFT;
 
@@ -1652,6 +1736,15 @@ static int mem_cgroup_hierarchical_recla
 	if (!check_soft && root_mem->memsw_is_minimum)
 		noswap = true;
 
+	if (shrink)
+		rec.context = SCAN_BY_SHRINK;
+	else if (check_soft)
+		rec.context = SCAN_BY_SYSTEM;
+	else
+		rec.context = SCAN_BY_LIMIT;
+
+	rec.root = root_mem;
+
 	while (1) {
 		victim = mem_cgroup_select_victim(root_mem);
 		if (victim == root_mem) {
@@ -1692,14 +1785,23 @@ static int mem_cgroup_hierarchical_recla
 			css_put(&victim->css);
 			continue;
 		}
+		rec.mem = victim;
+		rec.nr_scanned[0] = 0;
+		rec.nr_scanned[1] = 0;
+		rec.nr_rotated[0] = 0;
+		rec.nr_rotated[1] = 0;
+		rec.nr_freed[0] = 0;
+		rec.nr_freed[1] = 0;
+		rec.elapsed = 0;
 		/* we use swappiness of local cgroup */
 		if (check_soft) {
 			ret = mem_cgroup_shrink_node_zone(victim, gfp_mask,
-				noswap, zone, &nr_scanned);
-			*total_scanned += nr_scanned;
+				noswap, zone, &rec, &scanned);
+			*total_scanned += scanned;
 		} else
 			ret = try_to_free_mem_cgroup_pages(victim, gfp_mask,
-						noswap);
+						noswap, &rec);
+		mem_cgroup_record_scanstat(&rec);
 		css_put(&victim->css);
 		/*
 		 * At shrinking usage, we can't check we should stop here or
@@ -3688,14 +3790,18 @@ try_to_free:
 	/* try to free all pages in this cgroup */
 	shrink = 1;
 	while (nr_retries && mem->res.usage > 0) {
+		struct memcg_scanrecord rec;
 		int progress;
 
 		if (signal_pending(current)) {
 			ret = -EINTR;
 			goto out;
 		}
+		rec.context = SCAN_BY_SHRINK;
+		rec.mem = mem;
+		rec.root = mem;
 		progress = try_to_free_mem_cgroup_pages(mem, GFP_KERNEL,
-						false);
+						false, &rec);
 		if (!progress) {
 			nr_retries--;
 			/* maybe some writeback is necessary */
@@ -4539,6 +4645,54 @@ static int mem_control_numa_stat_open(st
 }
 #endif /* CONFIG_NUMA */
 
+static int mem_cgroup_vmscan_stat_read(struct cgroup *cgrp,
+				struct cftype *cft,
+				struct cgroup_map_cb *cb)
+{
+	struct mem_cgroup *mem = mem_cgroup_from_cont(cgrp);
+	char string[64];
+	int i;
+
+	for (i = 0; i < NR_SCANSTATS; i++) {
+		strcpy(string, scanstat_string[i]);
+		strcat(string, SCANSTAT_WORD_LIMIT);
+		cb->fill(cb, string,  mem->scanstat.stats[SCAN_BY_LIMIT][i]);
+	}
+
+	for (i = 0; i < NR_SCANSTATS; i++) {
+		strcpy(string, scanstat_string[i]);
+		strcat(string, SCANSTAT_WORD_SYSTEM);
+		cb->fill(cb, string,  mem->scanstat.stats[SCAN_BY_SYSTEM][i]);
+	}
+
+	for (i = 0; i < NR_SCANSTATS; i++) {
+		strcpy(string, scanstat_string[i]);
+		strcat(string, SCANSTAT_WORD_LIMIT);
+		strcat(string, SCANSTAT_WORD_HIERARCHY);
+		cb->fill(cb, string,  mem->scanstat.rootstats[SCAN_BY_LIMIT][i]);
+	}
+	for (i = 0; i < NR_SCANSTATS; i++) {
+		strcpy(string, scanstat_string[i]);
+		strcat(string, SCANSTAT_WORD_SYSTEM);
+		strcat(string, SCANSTAT_WORD_HIERARCHY);
+		cb->fill(cb, string,  mem->scanstat.rootstats[SCAN_BY_SYSTEM][i]);
+	}
+	return 0;
+}
+
+static int mem_cgroup_reset_vmscan_stat(struct cgroup *cgrp,
+				unsigned int event)
+{
+	struct mem_cgroup *mem = mem_cgroup_from_cont(cgrp);
+
+	spin_lock(&mem->scanstat.lock);
+	memset(&mem->scanstat.stats, 0, sizeof(mem->scanstat.stats));
+	memset(&mem->scanstat.rootstats, 0, sizeof(mem->scanstat.rootstats));
+	spin_unlock(&mem->scanstat.lock);
+	return 0;
+}
+
+
 static struct cftype mem_cgroup_files[] = {
 	{
 		.name = "usage_in_bytes",
@@ -4609,6 +4763,11 @@ static struct cftype mem_cgroup_files[] 
 		.mode = S_IRUGO,
 	},
 #endif
+	{
+		.name = "vmscan_stat",
+		.read_map = mem_cgroup_vmscan_stat_read,
+		.trigger = mem_cgroup_reset_vmscan_stat,
+	},
 };
 
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
@@ -4872,6 +5031,7 @@ mem_cgroup_create(struct cgroup_subsys *
 	atomic_set(&mem->refcnt, 1);
 	mem->move_charge_at_immigrate = 0;
 	mutex_init(&mem->thresholds_lock);
+	spin_lock_init(&mem->scanstat.lock);
 	return &mem->css;
 free_out:
 	__mem_cgroup_free(mem);
Index: mmotm-0710/mm/vmscan.c
===================================================================
--- mmotm-0710.orig/mm/vmscan.c
+++ mmotm-0710/mm/vmscan.c
@@ -105,6 +105,7 @@ struct scan_control {
 
 	/* Which cgroup do we reclaim from */
 	struct mem_cgroup *mem_cgroup;
+	struct memcg_scanrecord *memcg_record;
 
 	/*
 	 * Nodemask of nodes allowed by the caller. If NULL, all nodes
@@ -1307,6 +1308,8 @@ putback_lru_pages(struct zone *zone, str
 			int file = is_file_lru(lru);
 			int numpages = hpage_nr_pages(page);
 			reclaim_stat->recent_rotated[file] += numpages;
+			if (!scanning_global_lru(sc))
+				sc->memcg_record->nr_rotated[file] += numpages;
 		}
 		if (!pagevec_add(&pvec, page)) {
 			spin_unlock_irq(&zone->lru_lock);
@@ -1350,6 +1353,10 @@ static noinline_for_stack void update_is
 
 	reclaim_stat->recent_scanned[0] += *nr_anon;
 	reclaim_stat->recent_scanned[1] += *nr_file;
+	if (!scanning_global_lru(sc)) {
+		sc->memcg_record->nr_scanned[0] += *nr_anon;
+		sc->memcg_record->nr_scanned[1] += *nr_file;
+	}
 }
 
 /*
@@ -1463,6 +1470,9 @@ shrink_inactive_list(unsigned long nr_to
 		nr_reclaimed += shrink_page_list(&page_list, zone, sc);
 	}
 
+	if (!scanning_global_lru(sc))
+		sc->memcg_record->nr_freed[file] += nr_reclaimed;
+
 	local_irq_disable();
 	if (current_is_kswapd())
 		__count_vm_events(KSWAPD_STEAL, nr_reclaimed);
@@ -1562,6 +1572,8 @@ static void shrink_active_list(unsigned 
 	}
 
 	reclaim_stat->recent_scanned[file] += nr_taken;
+	if (!scanning_global_lru(sc))
+		sc->memcg_record->nr_scanned[file] += nr_taken;
 
 	__count_zone_vm_events(PGREFILL, zone, pgscanned);
 	if (file)
@@ -1613,6 +1625,8 @@ static void shrink_active_list(unsigned 
 	 * get_scan_ratio.
 	 */
 	reclaim_stat->recent_rotated[file] += nr_rotated;
+	if (!scanning_global_lru(sc))
+		sc->memcg_record->nr_rotated[file] += nr_rotated;
 
 	move_active_pages_to_lru(zone, &l_active,
 						LRU_ACTIVE + file * LRU_FILE);
@@ -2207,9 +2229,10 @@ unsigned long try_to_free_pages(struct z
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR
 
 unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
-						gfp_t gfp_mask, bool noswap,
-						struct zone *zone,
-						unsigned long *nr_scanned)
+					gfp_t gfp_mask, bool noswap,
+					struct zone *zone,
+					struct memcg_scanrecord *rec,
+					unsigned long *scanned)
 {
 	struct scan_control sc = {
 		.nr_scanned = 0,
@@ -2219,7 +2242,9 @@ unsigned long mem_cgroup_shrink_node_zon
 		.may_swap = !noswap,
 		.order = 0,
 		.mem_cgroup = mem,
+		.memcg_record = rec,
 	};
+	unsigned long start, end;
 
 	sc.gfp_mask = (gfp_mask & GFP_RECLAIM_MASK) |
 			(GFP_HIGHUSER_MOVABLE & ~GFP_RECLAIM_MASK);
@@ -2228,6 +2253,7 @@ unsigned long mem_cgroup_shrink_node_zon
 						      sc.may_writepage,
 						      sc.gfp_mask);
 
+	start = sched_clock();
 	/*
 	 * NOTE: Although we can get the priority field, using it
 	 * here is not a good idea, since it limits the pages we can scan.
@@ -2236,19 +2262,25 @@ unsigned long mem_cgroup_shrink_node_zon
 	 * the priority and make it zero.
 	 */
 	shrink_zone(0, zone, &sc);
+	end = sched_clock();
+
+	if (rec)
+		rec->elapsed += end - start;
+	*scanned = sc.nr_scanned;
 
 	trace_mm_vmscan_memcg_softlimit_reclaim_end(sc.nr_reclaimed);
 
-	*nr_scanned = sc.nr_scanned;
 	return sc.nr_reclaimed;
 }
 
 unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont,
 					   gfp_t gfp_mask,
-					   bool noswap)
+					   bool noswap,
+					   struct memcg_scanrecord *rec)
 {
 	struct zonelist *zonelist;
 	unsigned long nr_reclaimed;
+	unsigned long start, end;
 	int nid;
 	struct scan_control sc = {
 		.may_writepage = !laptop_mode,
@@ -2257,6 +2289,7 @@ unsigned long try_to_free_mem_cgroup_pag
 		.nr_to_reclaim = SWAP_CLUSTER_MAX,
 		.order = 0,
 		.mem_cgroup = mem_cont,
+		.memcg_record = rec,
 		.nodemask = NULL, /* we don't care the placement */
 		.gfp_mask = (gfp_mask & GFP_RECLAIM_MASK) |
 				(GFP_HIGHUSER_MOVABLE & ~GFP_RECLAIM_MASK),
@@ -2265,6 +2298,7 @@ unsigned long try_to_free_mem_cgroup_pag
 		.gfp_mask = sc.gfp_mask,
 	};
 
+	start = sched_clock();
 	/*
 	 * Unlike direct reclaim via alloc_pages(), memcg's reclaim doesn't
 	 * take care of from where we get pages. So the node where we start the
@@ -2279,6 +2313,9 @@ unsigned long try_to_free_mem_cgroup_pag
 					    sc.gfp_mask);
 
 	nr_reclaimed = do_try_to_free_pages(zonelist, &sc, &shrink);
+	end = sched_clock();
+	if (rec)
+		rec->elapsed += end - start;
 
 	trace_mm_vmscan_memcg_reclaim_end(nr_reclaimed);
 


+
+	mem = rec->root;
+	spin_lock(&mem->scanstat.lock);
+	__mem_cgroup_record_scanstat(mem->scanstat.rootstats[context], rec);
+	spin_unlock(&mem->scanstat.lock);
+}
+
 /*
  * Scan the hierarchy if needed to reclaim memory. We remember the last child
  * we reclaimed from, so that we don't end up penalizing one child extensively
@@ -1643,8 +1726,9 @@ static int mem_cgroup_hierarchical_recla
 	bool noswap = reclaim_options & MEM_CGROUP_RECLAIM_NOSWAP;
 	bool shrink = reclaim_options & MEM_CGROUP_RECLAIM_SHRINK;
 	bool check_soft = reclaim_options & MEM_CGROUP_RECLAIM_SOFT;
+	struct memcg_scanrecord rec;
 	unsigned long excess;
-	unsigned long nr_scanned;
+	unsigned long scanned;
 
 	excess = res_counter_soft_limit_excess(&root_mem->res) >> PAGE_SHIFT;
 
@@ -1652,6 +1736,15 @@ static int mem_cgroup_hierarchical_recla
 	if (!check_soft && root_mem->memsw_is_minimum)
 		noswap = true;
 
+	if (shrink)
+		rec.context = SCAN_BY_SHRINK;
+	else if (check_soft)
+		rec.context = SCAN_BY_SYSTEM;
+	else
+		rec.context = SCAN_BY_LIMIT;
+
+	rec.root = root_mem;
+
 	while (1) {
 		victim = mem_cgroup_select_victim(root_mem);
 		if (victim == root_mem) {
@@ -1692,14 +1785,23 @@ static int mem_cgroup_hierarchical_recla
 			css_put(&victim->css);
 			continue;
 		}
+		rec.mem = victim;
+		rec.nr_scanned[0] = 0;
+		rec.nr_scanned[1] = 0;
+		rec.nr_rotated[0] = 0;
+		rec.nr_rotated[1] = 0;
+		rec.nr_freed[0] = 0;
+		rec.nr_freed[1] = 0;
+		rec.elapsed = 0;
 		/* we use swappiness of local cgroup */
 		if (check_soft) {
 			ret = mem_cgroup_shrink_node_zone(victim, gfp_mask,
-				noswap, zone, &nr_scanned);
-			*total_scanned += nr_scanned;
+				noswap, zone, &rec, &scanned);
+			*total_scanned += scanned;
 		} else
 			ret = try_to_free_mem_cgroup_pages(victim, gfp_mask,
-						noswap);
+						noswap, &rec);
+		mem_cgroup_record_scanstat(&rec);
 		css_put(&victim->css);
 		/*
 		 * At shrinking usage, we can't check we should stop here or
@@ -3688,14 +3790,18 @@ try_to_free:
 	/* try to free all pages in this cgroup */
 	shrink = 1;
 	while (nr_retries && mem->res.usage > 0) {
+		struct memcg_scanrecord rec;
 		int progress;
 
 		if (signal_pending(current)) {
 			ret = -EINTR;
 			goto out;
 		}
+		rec.context = SCAN_BY_SHRINK;
+		rec.mem = mem;
+		rec.root = mem;
 		progress = try_to_free_mem_cgroup_pages(mem, GFP_KERNEL,
-						false);
+						false, &rec);
 		if (!progress) {
 			nr_retries--;
 			/* maybe some writeback is necessary */
@@ -4539,6 +4645,54 @@ static int mem_control_numa_stat_open(st
 }
 #endif /* CONFIG_NUMA */
 
+static int mem_cgroup_vmscan_stat_read(struct cgroup *cgrp,
+				struct cftype *cft,
+				struct cgroup_map_cb *cb)
+{
+	struct mem_cgroup *mem = mem_cgroup_from_cont(cgrp);
+	char string[64];
+	int i;
+
+	for (i = 0; i < NR_SCANSTATS; i++) {
+		strcpy(string, scanstat_string[i]);
+		strcat(string, SCANSTAT_WORD_LIMIT);
+		cb->fill(cb, string,  mem->scanstat.stats[SCAN_BY_LIMIT][i]);
+	}
+
+	for (i = 0; i < NR_SCANSTATS; i++) {
+		strcpy(string, scanstat_string[i]);
+		strcat(string, SCANSTAT_WORD_SYSTEM);
+		cb->fill(cb, string,  mem->scanstat.stats[SCAN_BY_SYSTEM][i]);
+	}
+
+	for (i = 0; i < NR_SCANSTATS; i++) {
+		strcpy(string, scanstat_string[i]);
+		strcat(string, SCANSTAT_WORD_LIMIT);
+		strcat(string, SCANSTAT_WORD_HIERARCHY);
+		cb->fill(cb, string,  mem->scanstat.rootstats[SCAN_BY_LIMIT][i]);
+	}
+	for (i = 0; i < NR_SCANSTATS; i++) {
+		strcpy(string, scanstat_string[i]);
+		strcat(string, SCANSTAT_WORD_SYSTEM);
+		strcat(string, SCANSTAT_WORD_HIERARCHY);
+		cb->fill(cb, string,  mem->scanstat.rootstats[SCAN_BY_SYSTEM][i]);
+	}
+	return 0;
+}
+
+static int mem_cgroup_reset_vmscan_stat(struct cgroup *cgrp,
+				unsigned int event)
+{
+	struct mem_cgroup *mem = mem_cgroup_from_cont(cgrp);
+
+	spin_lock(&mem->scanstat.lock);
+	memset(&mem->scanstat.stats, 0, sizeof(mem->scanstat.stats));
+	memset(&mem->scanstat.rootstats, 0, sizeof(mem->scanstat.rootstats));
+	spin_unlock(&mem->scanstat.lock);
+	return 0;
+}
+
+
 static struct cftype mem_cgroup_files[] = {
 	{
 		.name = "usage_in_bytes",
@@ -4609,6 +4763,11 @@ static struct cftype mem_cgroup_files[] 
 		.mode = S_IRUGO,
 	},
 #endif
+	{
+		.name = "vmscan_stat",
+		.read_map = mem_cgroup_vmscan_stat_read,
+		.trigger = mem_cgroup_reset_vmscan_stat,
+	},
 };
 
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
@@ -4872,6 +5031,7 @@ mem_cgroup_create(struct cgroup_subsys *
 	atomic_set(&mem->refcnt, 1);
 	mem->move_charge_at_immigrate = 0;
 	mutex_init(&mem->thresholds_lock);
+	spin_lock_init(&mem->scanstat.lock);
 	return &mem->css;
 free_out:
 	__mem_cgroup_free(mem);
Index: mmotm-0710/mm/vmscan.c
===================================================================
--- mmotm-0710.orig/mm/vmscan.c
+++ mmotm-0710/mm/vmscan.c
@@ -105,6 +105,7 @@ struct scan_control {
 
 	/* Which cgroup do we reclaim from */
 	struct mem_cgroup *mem_cgroup;
+	struct memcg_scanrecord *memcg_record;
 
 	/*
 	 * Nodemask of nodes allowed by the caller. If NULL, all nodes
@@ -1307,6 +1308,8 @@ putback_lru_pages(struct zone *zone, str
 			int file = is_file_lru(lru);
 			int numpages = hpage_nr_pages(page);
 			reclaim_stat->recent_rotated[file] += numpages;
+			if (!scanning_global_lru(sc))
+				sc->memcg_record->nr_rotated[file] += numpages;
 		}
 		if (!pagevec_add(&pvec, page)) {
 			spin_unlock_irq(&zone->lru_lock);
@@ -1350,6 +1353,10 @@ static noinline_for_stack void update_is
 
 	reclaim_stat->recent_scanned[0] += *nr_anon;
 	reclaim_stat->recent_scanned[1] += *nr_file;
+	if (!scanning_global_lru(sc)) {
+		sc->memcg_record->nr_scanned[0] += *nr_anon;
+		sc->memcg_record->nr_scanned[1] += *nr_file;
+	}
 }
 
 /*
@@ -1463,6 +1470,9 @@ shrink_inactive_list(unsigned long nr_to
 		nr_reclaimed += shrink_page_list(&page_list, zone, sc);
 	}
 
+	if (!scanning_global_lru(sc))
+		sc->memcg_record->nr_freed[file] += nr_reclaimed;
+
 	local_irq_disable();
 	if (current_is_kswapd())
 		__count_vm_events(KSWAPD_STEAL, nr_reclaimed);
@@ -1562,6 +1572,8 @@ static void shrink_active_list(unsigned 
 	}
 
 	reclaim_stat->recent_scanned[file] += nr_taken;
+	if (!scanning_global_lru(sc))
+		sc->memcg_record->nr_scanned[file] += nr_taken;
 
 	__count_zone_vm_events(PGREFILL, zone, pgscanned);
 	if (file)
@@ -1613,6 +1625,8 @@ static void shrink_active_list(unsigned 
 	 * get_scan_ratio.
 	 */
 	reclaim_stat->recent_rotated[file] += nr_rotated;
+	if (!scanning_global_lru(sc))
+		sc->memcg_record->nr_rotated[file] += nr_rotated;
 
 	move_active_pages_to_lru(zone, &l_active,
 						LRU_ACTIVE + file * LRU_FILE);
@@ -2207,9 +2229,10 @@ unsigned long try_to_free_pages(struct z
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR
 
 unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
-						gfp_t gfp_mask, bool noswap,
-						struct zone *zone,
-						unsigned long *nr_scanned)
+					gfp_t gfp_mask, bool noswap,
+					struct zone *zone,
+					struct memcg_scanrecord *rec,
+					unsigned long *scanned)
 {
 	struct scan_control sc = {
 		.nr_scanned = 0,
@@ -2219,7 +2242,9 @@ unsigned long mem_cgroup_shrink_node_zon
 		.may_swap = !noswap,
 		.order = 0,
 		.mem_cgroup = mem,
+		.memcg_record = rec,
 	};
+	unsigned long start, end;
 
 	sc.gfp_mask = (gfp_mask & GFP_RECLAIM_MASK) |
 			(GFP_HIGHUSER_MOVABLE & ~GFP_RECLAIM_MASK);
@@ -2228,6 +2253,7 @@ unsigned long mem_cgroup_shrink_node_zon
 						      sc.may_writepage,
 						      sc.gfp_mask);
 
+	start = sched_clock();
 	/*
 	 * NOTE: Although we can get the priority field, using it
 	 * here is not a good idea, since it limits the pages we can scan.
@@ -2236,19 +2262,25 @@ unsigned long mem_cgroup_shrink_node_zon
 	 * the priority and make it zero.
 	 */
 	shrink_zone(0, zone, &sc);
+	end = sched_clock();
+
+	if (rec)
+		rec->elapsed += end - start;
+	*scanned = sc.nr_scanned;
 
 	trace_mm_vmscan_memcg_softlimit_reclaim_end(sc.nr_reclaimed);
 
-	*nr_scanned = sc.nr_scanned;
 	return sc.nr_reclaimed;
 }
 
 unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont,
 					   gfp_t gfp_mask,
-					   bool noswap)
+					   bool noswap,
+					   struct memcg_scanrecord *rec)
 {
 	struct zonelist *zonelist;
 	unsigned long nr_reclaimed;
+	unsigned long start, end;
 	int nid;
 	struct scan_control sc = {
 		.may_writepage = !laptop_mode,
@@ -2257,6 +2289,7 @@ unsigned long try_to_free_mem_cgroup_pag
 		.nr_to_reclaim = SWAP_CLUSTER_MAX,
 		.order = 0,
 		.mem_cgroup = mem_cont,
+		.memcg_record = rec,
 		.nodemask = NULL, /* we don't care the placement */
 		.gfp_mask = (gfp_mask & GFP_RECLAIM_MASK) |
 				(GFP_HIGHUSER_MOVABLE & ~GFP_RECLAIM_MASK),
@@ -2265,6 +2298,7 @@ unsigned long try_to_free_mem_cgroup_pag
 		.gfp_mask = sc.gfp_mask,
 	};
 
+	start = sched_clock();
 	/*
 	 * Unlike direct reclaim via alloc_pages(), memcg's reclaim doesn't
 	 * take care of from where we get pages. So the node where we start the
@@ -2279,6 +2313,9 @@ unsigned long try_to_free_mem_cgroup_pag
 					    sc.gfp_mask);
 
 	nr_reclaimed = do_try_to_free_pages(zonelist, &sc, &shrink);
+	end = sched_clock();
+	if (rec)
+		rec->elapsed += end - start;
 
 	trace_mm_vmscan_memcg_reclaim_end(nr_reclaimed);
 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3] memcg: add memory.vmscan_stat
  2011-07-22  8:15 ` KAMEZAWA Hiroyuki
@ 2011-08-08 12:43   ` Johannes Weiner
  -1 siblings, 0 replies; 54+ messages in thread
From: Johannes Weiner @ 2011-08-08 12:43 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, nishimura, Michal Hocko, akpm, abrestic

On Fri, Jul 22, 2011 at 05:15:40PM +0900, KAMEZAWA Hiroyuki wrote:
> [PATCH] add memory.vmscan_stat
> 
> commit log of commit 0ae5e89 " memcg: count the soft_limit reclaim in..."
> says it adds scanning stats to memory.stat file. But it doesn't because
> we considered we needed to reach a consensus for such new APIs.
> 
> This patch is a trial to add memory.scan_stat. This shows
>   - the number of scanned pages (total, anon, file)
>   - the number of rotated pages (total, anon, file)
>   - the number of freed pages (total, anon, file)
>   - the elapsed time (including sleep/pause time)
> 
>   for both direct and soft reclaim.
> 
> The biggest difference from Ying's original version is that this file
> can be reset by a write, as
> 
>   # echo 0 ...../memory.scan_stat
> 
> Example of output is here. This is a result after make -j 6 kernel
> under 300M limit.
> 
> [kamezawa@bluextal ~]$ cat /cgroup/memory/A/memory.scan_stat
> [kamezawa@bluextal ~]$ cat /cgroup/memory/A/memory.vmscan_stat
> scanned_pages_by_limit 9471864
> scanned_anon_pages_by_limit 6640629
> scanned_file_pages_by_limit 2831235
> rotated_pages_by_limit 4243974
> rotated_anon_pages_by_limit 3971968
> rotated_file_pages_by_limit 272006
> freed_pages_by_limit 2318492
> freed_anon_pages_by_limit 962052
> freed_file_pages_by_limit 1356440
> elapsed_ns_by_limit 351386416101
> scanned_pages_by_system 0
> scanned_anon_pages_by_system 0
> scanned_file_pages_by_system 0
> rotated_pages_by_system 0
> rotated_anon_pages_by_system 0
> rotated_file_pages_by_system 0
> freed_pages_by_system 0
> freed_anon_pages_by_system 0
> freed_file_pages_by_system 0
> elapsed_ns_by_system 0
> scanned_pages_by_limit_under_hierarchy 9471864
> scanned_anon_pages_by_limit_under_hierarchy 6640629
> scanned_file_pages_by_limit_under_hierarchy 2831235
> rotated_pages_by_limit_under_hierarchy 4243974
> rotated_anon_pages_by_limit_under_hierarchy 3971968
> rotated_file_pages_by_limit_under_hierarchy 272006
> freed_pages_by_limit_under_hierarchy 2318492
> freed_anon_pages_by_limit_under_hierarchy 962052
> freed_file_pages_by_limit_under_hierarchy 1356440
> elapsed_ns_by_limit_under_hierarchy 351386416101
> scanned_pages_by_system_under_hierarchy 0
> scanned_anon_pages_by_system_under_hierarchy 0
> scanned_file_pages_by_system_under_hierarchy 0
> rotated_pages_by_system_under_hierarchy 0
> rotated_anon_pages_by_system_under_hierarchy 0
> rotated_file_pages_by_system_under_hierarchy 0
> freed_pages_by_system_under_hierarchy 0
> freed_anon_pages_by_system_under_hierarchy 0
> freed_file_pages_by_system_under_hierarchy 0
> elapsed_ns_by_system_under_hierarchy 0
>
> total_xxxx is for hierarchy management.
> 
> This will be useful for further memcg development and needs to be
> developed before we do some complicated rework on LRU/softlimit
> management.
> 
> This patch adds a new struct memcg_scanrecord to the scan_control struct.
> sc->nr_scanned et al. are not designed for exporting information. For example,
> nr_scanned is reset frequently and incremented by 2 when scanning mapped pages.
> 
> To avoid complexity, I added a new param to scan_control which is for
> exporting the scanning score.
> 
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 
> Changelog:
>   - fixed the trigger for recording nr_freed in shrink_inactive_list()
> Changelog:
>   - renamed as vmscan_stat
>   - handle file/anon
>   - added "rotated"
>   - changed names of param in vmscan_stat.
> ---
>  Documentation/cgroups/memory.txt |   85 +++++++++++++++++++
>  include/linux/memcontrol.h       |   19 ++++
>  include/linux/swap.h             |    6 -
>  mm/memcontrol.c                  |  172 +++++++++++++++++++++++++++++++++++++--
>  mm/vmscan.c                      |   39 +++++++-
>  5 files changed, 303 insertions(+), 18 deletions(-)
> 
> Index: mmotm-0710/Documentation/cgroups/memory.txt
> ===================================================================
> --- mmotm-0710.orig/Documentation/cgroups/memory.txt
> +++ mmotm-0710/Documentation/cgroups/memory.txt
> @@ -380,7 +380,7 @@ will be charged as a new owner of it.
>  
>  5.2 stat file
>  
> -memory.stat file includes following statistics
> +5.2.1 memory.stat file includes following statistics
>  
>  # per-memory cgroup local status
>  cache		- # of bytes of page cache memory.
> @@ -438,6 +438,89 @@ Note:
>  	 file_mapped is accounted only when the memory cgroup is owner of page
>  	 cache.)
>  
> +5.2.2 memory.vmscan_stat
> +
> +memory.vmscan_stat includes statistics on memory scanning, freeing, and
> +reclaiming. The statistics cover memory scanning activity since the
> +memory cgroup's creation and can be reset to 0 by writing 0 as
> +
> + #echo 0 > ../memory.vmscan_stat
> +
> +This file contains following statistics.
> +
> +[param]_[file_or_anon]_pages_by_[reason]_[under_hierarchy]
> +[param]_elapsed_ns_by_[reason]_[under_hierarchy]
> +
> +For example,
> +
> +  scanned_file_pages_by_limit indicates the number of scanned
> +  file pages at vmscan.
> +
> +Now, 3 parameters are supported
> +
> +  scanned - the number of pages scanned by vmscan
> +  rotated - the number of pages activated at vmscan
> +  freed   - the number of pages freed by vmscan
> +
> +If "rotated" is high relative to scanned/freed, the memcg seems busy.
> +
> +Now, 2 reasons are supported
> +
> +  limit - the memory cgroup's limit
> +  system - global memory pressure + softlimit
> +           (global memory pressure not under softlimit is not handled now)
> +
> +When under_hierarchy is appended at the tail, the number indicates the
> +total scan for the memcg itself and its children.

In your implementation, statistics are only accounted to the memcg
triggering the limit and to the memcgs that are actually scanned.

Consider the following setup:

	A
       / \
      B   C
     /
    D

If D tries to charge but hits the limit of A, then B's hierarchy
counters do not reflect the reclaim activity resulting in D.

That's not consistent with how hierarchy counters usually operate, and
neither with how you documented it.

On a non-technical note: as Ying Han and I were the other two people
working on reclaim and statistics, it really irks me that neither of
us were CCd on this.  Especially on such a controversial change.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3] memcg: add memory.vmscan_stat
  2011-08-08 12:43   ` Johannes Weiner
@ 2011-08-08 23:33     ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 54+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-08-08 23:33 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: linux-mm, linux-kernel, nishimura, Michal Hocko, akpm, abrestic

On Mon, 8 Aug 2011 14:43:33 +0200
Johannes Weiner <jweiner@redhat.com> wrote:

> On Fri, Jul 22, 2011 at 05:15:40PM +0900, KAMEZAWA Hiroyuki wrote:
> > [PATCH] add memory.vmscan_stat
> > 
> > commit log of commit 0ae5e89 " memcg: count the soft_limit reclaim in..."
> > says it adds scanning stats to memory.stat file. But it doesn't because
> > we considered we needed to reach a consensus for such new APIs.
> > 
> > This patch is a trial to add memory.scan_stat. This shows
> >   - the number of scanned pages (total, anon, file)
> >   - the number of rotated pages (total, anon, file)
> >   - the number of freed pages (total, anon, file)
> >   - the elapsed time (including sleep/pause time)
> > 
> >   for both direct and soft reclaim.
> > 
> > The biggest difference from Ying's original version is that this file
> > can be reset by a write, as
> > 
> >   # echo 0 ...../memory.scan_stat
> > 
> > Example of output is here. This is a result after make -j 6 kernel
> > under 300M limit.
> > 
> > [kamezawa@bluextal ~]$ cat /cgroup/memory/A/memory.scan_stat
> > [kamezawa@bluextal ~]$ cat /cgroup/memory/A/memory.vmscan_stat
> > scanned_pages_by_limit 9471864
> > scanned_anon_pages_by_limit 6640629
> > scanned_file_pages_by_limit 2831235
> > rotated_pages_by_limit 4243974
> > rotated_anon_pages_by_limit 3971968
> > rotated_file_pages_by_limit 272006
> > freed_pages_by_limit 2318492
> > freed_anon_pages_by_limit 962052
> > freed_file_pages_by_limit 1356440
> > elapsed_ns_by_limit 351386416101
> > scanned_pages_by_system 0
> > scanned_anon_pages_by_system 0
> > scanned_file_pages_by_system 0
> > rotated_pages_by_system 0
> > rotated_anon_pages_by_system 0
> > rotated_file_pages_by_system 0
> > freed_pages_by_system 0
> > freed_anon_pages_by_system 0
> > freed_file_pages_by_system 0
> > elapsed_ns_by_system 0
> > scanned_pages_by_limit_under_hierarchy 9471864
> > scanned_anon_pages_by_limit_under_hierarchy 6640629
> > scanned_file_pages_by_limit_under_hierarchy 2831235
> > rotated_pages_by_limit_under_hierarchy 4243974
> > rotated_anon_pages_by_limit_under_hierarchy 3971968
> > rotated_file_pages_by_limit_under_hierarchy 272006
> > freed_pages_by_limit_under_hierarchy 2318492
> > freed_anon_pages_by_limit_under_hierarchy 962052
> > freed_file_pages_by_limit_under_hierarchy 1356440
> > elapsed_ns_by_limit_under_hierarchy 351386416101
> > scanned_pages_by_system_under_hierarchy 0
> > scanned_anon_pages_by_system_under_hierarchy 0
> > scanned_file_pages_by_system_under_hierarchy 0
> > rotated_pages_by_system_under_hierarchy 0
> > rotated_anon_pages_by_system_under_hierarchy 0
> > rotated_file_pages_by_system_under_hierarchy 0
> > freed_pages_by_system_under_hierarchy 0
> > freed_anon_pages_by_system_under_hierarchy 0
> > freed_file_pages_by_system_under_hierarchy 0
> > elapsed_ns_by_system_under_hierarchy 0
> >
> > total_xxxx is for hierarchy management.
> > 
> > This will be useful for further memcg development and needs to be
> > developed before we do some complicated rework on LRU/softlimit
> > management.
> > 
> > This patch adds a new struct memcg_scanrecord to the scan_control struct.
> > sc->nr_scanned et al. are not designed for exporting information. For example,
> > nr_scanned is reset frequently and incremented by 2 when scanning mapped pages.
> > 
> > To avoid complexity, I added a new param to scan_control which is for
> > exporting the scanning score.
> > 
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > 
> > Changelog:
> >   - fixed the trigger for recording nr_freed in shrink_inactive_list()
> > Changelog:
> >   - renamed as vmscan_stat
> >   - handle file/anon
> >   - added "rotated"
> >   - changed names of param in vmscan_stat.
> > ---
> >  Documentation/cgroups/memory.txt |   85 +++++++++++++++++++
> >  include/linux/memcontrol.h       |   19 ++++
> >  include/linux/swap.h             |    6 -
> >  mm/memcontrol.c                  |  172 +++++++++++++++++++++++++++++++++++++--
> >  mm/vmscan.c                      |   39 +++++++-
> >  5 files changed, 303 insertions(+), 18 deletions(-)
> > 
> > Index: mmotm-0710/Documentation/cgroups/memory.txt
> > ===================================================================
> > --- mmotm-0710.orig/Documentation/cgroups/memory.txt
> > +++ mmotm-0710/Documentation/cgroups/memory.txt
> > @@ -380,7 +380,7 @@ will be charged as a new owner of it.
> >  
> >  5.2 stat file
> >  
> > -memory.stat file includes following statistics
> > +5.2.1 memory.stat file includes following statistics
> >  
> >  # per-memory cgroup local status
> >  cache		- # of bytes of page cache memory.
> > @@ -438,6 +438,89 @@ Note:
> >  	 file_mapped is accounted only when the memory cgroup is owner of page
> >  	 cache.)
> >  
> > +5.2.2 memory.vmscan_stat
> > +
> > +memory.vmscan_stat includes statistics on memory scanning, freeing, and
> > +reclaiming. The statistics cover memory scanning activity since the
> > +memory cgroup's creation and can be reset to 0 by writing 0 as
> > +
> > + #echo 0 > ../memory.vmscan_stat
> > +
> > +This file contains following statistics.
> > +
> > +[param]_[file_or_anon]_pages_by_[reason]_[under_hierarchy]
> > +[param]_elapsed_ns_by_[reason]_[under_hierarchy]
> > +
> > +For example,
> > +
> > +  scanned_file_pages_by_limit indicates the number of scanned
> > +  file pages at vmscan.
> > +
> > +Now, 3 parameters are supported
> > +
> > +  scanned - the number of pages scanned by vmscan
> > +  rotated - the number of pages activated at vmscan
> > +  freed   - the number of pages freed by vmscan
> > +
> > +If "rotated" is high relative to scanned/freed, the memcg seems busy.
> > +
> > +Now, 2 reasons are supported
> > +
> > +  limit - the memory cgroup's limit
> > +  system - global memory pressure + softlimit
> > +           (global memory pressure not under softlimit is not handled now)
> > +
> > +When under_hierarchy is appended at the tail, the number indicates the
> > +total scan for the memcg itself and its children.
> 
> In your implementation, statistics are only accounted to the memcg
> triggering the limit and to the memcgs that are actually scanned.
> 
> Consider the following setup:
> 
> 	A
>        / \
>       B   C
>      /
>     D
> 
> If D tries to charge but hits the limit of A, then B's hierarchy
> counters do not reflect the reclaim activity resulting in D.
> 
Yes, as I expected.

> That's not consistent with how hierarchy counters usually operate, and
> neither with how you documented it.
> 
Hmm.

> On a non-technical note: as Ying Han and I were the other two people
> working on reclaim and statistics, it really irks me that neither of
> us were CCd on this.  Especially on such a controversial change.

I always drop CC if no reply/review comes.

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3] memcg: add memory.vmscan_stat
@ 2011-08-08 23:33     ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 54+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-08-08 23:33 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: linux-mm, linux-kernel, nishimura, Michal Hocko, akpm, abrestic

On Mon, 8 Aug 2011 14:43:33 +0200
Johannes Weiner <jweiner@redhat.com> wrote:

> On Fri, Jul 22, 2011 at 05:15:40PM +0900, KAMEZAWA Hiroyuki wrote:
> > [PATCH] add memory.vmscan_stat
> > 
> > The commit log of commit 0ae5e89 "memcg: count the soft_limit reclaim in..."
> > says it adds scanning stats to the memory.stat file. But it doesn't, because
> > we considered we needed to reach a consensus on such new APIs.
> > 
> > This patch is a trial to add memory.scan_stat. This shows
> >   - the number of scanned pages(total, anon, file)
> >   - the number of rotated pages(total, anon, file)
> >   - the number of freed pages(total, anon, file)
> >   - the elapsed time (including sleep/pause time)
> > 
> >   for both direct and soft reclaim.
> > 
> > The biggest difference from Ying's original version is that this file
> > can be reset by a write, as
> > 
> >   # echo 0 ...../memory.scan_stat
> > 
> > An example of the output is below. This is the result of make -j 6 on a
> > kernel tree under a 300M limit.
> > 
> > [kamezawa@bluextal ~]$ cat /cgroup/memory/A/memory.scan_stat
> > [kamezawa@bluextal ~]$ cat /cgroup/memory/A/memory.vmscan_stat
> > scanned_pages_by_limit 9471864
> > scanned_anon_pages_by_limit 6640629
> > scanned_file_pages_by_limit 2831235
> > rotated_pages_by_limit 4243974
> > rotated_anon_pages_by_limit 3971968
> > rotated_file_pages_by_limit 272006
> > freed_pages_by_limit 2318492
> > freed_anon_pages_by_limit 962052
> > freed_file_pages_by_limit 1356440
> > elapsed_ns_by_limit 351386416101
> > scanned_pages_by_system 0
> > scanned_anon_pages_by_system 0
> > scanned_file_pages_by_system 0
> > rotated_pages_by_system 0
> > rotated_anon_pages_by_system 0
> > rotated_file_pages_by_system 0
> > freed_pages_by_system 0
> > freed_anon_pages_by_system 0
> > freed_file_pages_by_system 0
> > elapsed_ns_by_system 0
> > scanned_pages_by_limit_under_hierarchy 9471864
> > scanned_anon_pages_by_limit_under_hierarchy 6640629
> > scanned_file_pages_by_limit_under_hierarchy 2831235
> > rotated_pages_by_limit_under_hierarchy 4243974
> > rotated_anon_pages_by_limit_under_hierarchy 3971968
> > rotated_file_pages_by_limit_under_hierarchy 272006
> > freed_pages_by_limit_under_hierarchy 2318492
> > freed_anon_pages_by_limit_under_hierarchy 962052
> > freed_file_pages_by_limit_under_hierarchy 1356440
> > elapsed_ns_by_limit_under_hierarchy 351386416101
> > scanned_pages_by_system_under_hierarchy 0
> > scanned_anon_pages_by_system_under_hierarchy 0
> > scanned_file_pages_by_system_under_hierarchy 0
> > rotated_pages_by_system_under_hierarchy 0
> > rotated_anon_pages_by_system_under_hierarchy 0
> > rotated_file_pages_by_system_under_hierarchy 0
> > freed_pages_by_system_under_hierarchy 0
> > freed_anon_pages_by_system_under_hierarchy 0
> > freed_file_pages_by_system_under_hierarchy 0
> > elapsed_ns_by_system_under_hierarchy 0
> >
> > The xxxx_under_hierarchy entries are for hierarchy management.
> > 
> > This will be useful for further memcg development and needs to be
> > in place before we do any complicated rework of LRU/softlimit
> > management.
> > 
> > This patch adds a new struct memcg_scanrecord to the scan_control struct.
> > sc->nr_scanned et al. are not designed for exporting information; for example,
> > nr_scanned is reset frequently and incremented by 2 when scanning mapped pages.
> > 
> > To avoid that complexity, I added a new parameter to scan_control which is
> > for exporting the scanning statistics.
> > 
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > 
> > Changelog:
> >   - fixed the trigger for recording nr_freed in shrink_inactive_list()
> > Changelog:
> >   - renamed as vmscan_stat
> >   - handle file/anon
> >   - added "rotated"
> >   - changed names of param in vmscan_stat.
> > ---
> >  Documentation/cgroups/memory.txt |   85 +++++++++++++++++++
> >  include/linux/memcontrol.h       |   19 ++++
> >  include/linux/swap.h             |    6 -
> >  mm/memcontrol.c                  |  172 +++++++++++++++++++++++++++++++++++++--
> >  mm/vmscan.c                      |   39 +++++++-
> >  5 files changed, 303 insertions(+), 18 deletions(-)
> > 
> > Index: mmotm-0710/Documentation/cgroups/memory.txt
> > ===================================================================
> > --- mmotm-0710.orig/Documentation/cgroups/memory.txt
> > +++ mmotm-0710/Documentation/cgroups/memory.txt
> > @@ -380,7 +380,7 @@ will be charged as a new owner of it.
> >  
> >  5.2 stat file
> >  
> > -memory.stat file includes following statistics
> > +5.2.1 memory.stat file includes following statistics
> >  
> >  # per-memory cgroup local status
> >  cache		- # of bytes of page cache memory.
> > @@ -438,6 +438,89 @@ Note:
> >  	 file_mapped is accounted only when the memory cgroup is owner of page
> >  	 cache.)
> >  
> > +5.2.2 memory.vmscan_stat
> > +
> > +memory.vmscan_stat includes statistics about memory scanning, freeing,
> > +and reclaiming. The statistics cover the period since memory cgroup
> > +creation and can be reset to 0 by writing 0:
> > +
> > + # echo 0 > ../memory.vmscan_stat
> > +
> > +This file contains the following statistics.
> > +
> > +[param]_[file_or_anon]_pages_by_[reason]_[under_hierarchy]
> > +[param]_elapsed_ns_by_[reason]_[under_hierarchy]
> > +
> > +For example,
> > +
> > +  scanned_file_pages_by_limit indicates the number of file
> > +  pages scanned by vmscan.
> > +
> > +Currently, 3 parameters are supported:
> > +
> > +  scanned - the number of pages scanned by vmscan
> > +  rotated - the number of pages activated at vmscan
> > +  freed   - the number of pages freed by vmscan
> > +
> > +If "rotated" is high against scanned/freed, the memcg seems busy.
> > +
> > +Currently, 2 reasons are supported:
> > +
> > +  limit - the memory cgroup's limit
> > +  system - global memory pressure + softlimit
> > +           (global memory pressure not under softlimit is not handled now)
> > +
> > +When under_hierarchy is appended at the tail, the number indicates the
> > +total scanning for the memcg's children and itself.
> 
> In your implementation, statistics are only accounted to the memcg
> triggering the limit and the respectively scanned memcgs.
> 
> Consider the following setup:
> 
> 	A
>        / \
>       B   C
>      /
>     D
> 
> If D tries to charge but hits the limit of A, then B's hierarchy
> counters do not reflect the reclaim activity resulting in D.
> 
yes, as I expected.

> That's not consistent with how hierarchy counters usually operate, and
> neither with how you documented it.
> 
Hmm.

> On a non-technical note: as Ying Han and I were the other two people
> working on reclaim and statistics, it really irks me that neither of
> us were CCd on this.  Especially on such a controversial change.

I always drop CC if no reply/review comes.

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3] memcg: add memory.vmscan_stat
  2011-08-09  8:01       ` Johannes Weiner
@ 2011-08-09  8:01         ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 54+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-08-09  8:01 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: linux-mm, linux-kernel, nishimura, Michal Hocko, akpm, abrestic

On Tue, 9 Aug 2011 10:01:59 +0200
Johannes Weiner <jweiner@redhat.com> wrote:

> On Tue, Aug 09, 2011 at 08:33:45AM +0900, KAMEZAWA Hiroyuki wrote:
> > On Mon, 8 Aug 2011 14:43:33 +0200
> > Johannes Weiner <jweiner@redhat.com> wrote:
> > > On a non-technical note: as Ying Han and I were the other two people
> > > working on reclaim and statistics, it really irks me that neither of
> > > us were CCd on this.  Especially on such a controversial change.
> > 
> > I always drop CC if no reply/review comes.
> 
> There is always the possibility that a single mail in an otherwise
> unrelated patch series is overlooked (especially while on vacation ;).
> Getting CCd on revisions and -mm inclusion is a really nice reminder.
> 
> Unless there is a really good reason not to (is there ever?), could
> you please keep CCs?
> 

Ok, if you want, I'll always CC.
I myself just don't like getting 3 copies of mails about things I
don't have much interest in ;)

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3] memcg: add memory.vmscan_stat
  2011-08-08 23:33     ` KAMEZAWA Hiroyuki
@ 2011-08-09  8:01       ` Johannes Weiner
  -1 siblings, 0 replies; 54+ messages in thread
From: Johannes Weiner @ 2011-08-09  8:01 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, nishimura, Michal Hocko, akpm, abrestic

On Tue, Aug 09, 2011 at 08:33:45AM +0900, KAMEZAWA Hiroyuki wrote:
> On Mon, 8 Aug 2011 14:43:33 +0200
> Johannes Weiner <jweiner@redhat.com> wrote:
> > On a non-technical note: as Ying Han and I were the other two people
> > working on reclaim and statistics, it really irks me that neither of
> > us were CCd on this.  Especially on such a controversial change.
> 
> I always drop CC if no reply/review comes.

There is always the possibility that a single mail in an otherwise
unrelated patch series is overlooked (especially while on vacation ;).
Getting CCd on revisions and -mm inclusion is a really nice reminder.

Unless there is a really good reason not to (is there ever?), could
you please keep CCs?

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3] memcg: add memory.vmscan_stat
  2011-08-09  8:01         ` KAMEZAWA Hiroyuki
@ 2011-08-13  1:04           ` Ying Han
  -1 siblings, 0 replies; 54+ messages in thread
From: Ying Han @ 2011-08-13  1:04 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Johannes Weiner, linux-mm, linux-kernel, nishimura, Michal Hocko,
	akpm, abrestic

On Tue, Aug 9, 2011 at 1:01 AM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
>
> On Tue, 9 Aug 2011 10:01:59 +0200
> Johannes Weiner <jweiner@redhat.com> wrote:
>
> > On Tue, Aug 09, 2011 at 08:33:45AM +0900, KAMEZAWA Hiroyuki wrote:
> > > On Mon, 8 Aug 2011 14:43:33 +0200
> > > Johannes Weiner <jweiner@redhat.com> wrote:
> > > > On a non-technical note: as Ying Han and I were the other two people
> > > > working on reclaim and statistics, it really irks me that neither of
> > > > us were CCd on this.  Especially on such a controversial change.
> > >
> > > I always drop CC if no reply/review comes.
> >
> > There is always the possibility that a single mail in an otherwise
> > unrelated patch series is overlooked (especially while on vacation ;).
> > Getting CCd on revisions and -mm inclusion is a really nice reminder.
> >
> > Unless there is a really good reason not to (is there ever?), could
> > you please keep CCs?
> >
>
> Ok, if you want, I'll always CC.
> I myself just don't like getting 3 copies of mails about things I
> don't have much interest in ;)
>
> Thanks,
> -Kame

Hi Kame, Johannes,

Sorry for getting into this thread late and here are some comments:

There are a few patches we've been working on which could change the
memcg reclaim path quite a bit. If they happen to be merged later,
this patch might need to be adjusted accordingly as well, and if the
ABI needs to be changed, that would be hard.

There is a patch Andrew (abrestic@) has been testing which adds the
same memory.vmscan_stat, but based on some page reclaim patches
(mainly the memcg-aware global reclaim from Johannes), and it does
adjust to the hierarchical reclaim change as Johannes mentioned.

So, may I suggest we hold off on this patch for now? Once the other
page reclaim changes have settled, we can add it back in.

Thanks

--Ying

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [patch] Revert "memcg: add memory.vmscan_stat"
  2011-08-08 23:33     ` KAMEZAWA Hiroyuki
@ 2011-08-29 15:51       ` Johannes Weiner
  -1 siblings, 0 replies; 54+ messages in thread
From: Johannes Weiner @ 2011-08-29 15:51 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Daisuke Nishimura, Balbir Singh, Andrew Brestic, Ying Han,
	Michal Hocko, KAMEZAWA Hiroyuki, linux-mm, linux-kernel

On Tue, Aug 09, 2011 at 08:33:45AM +0900, KAMEZAWA Hiroyuki wrote:
> On Mon, 8 Aug 2011 14:43:33 +0200
> Johannes Weiner <jweiner@redhat.com> wrote:
> 
> > On Fri, Jul 22, 2011 at 05:15:40PM +0900, KAMEZAWA Hiroyuki wrote:
> > > +When under_hierarchy is added in the tail, the number indicates the
> > > +total memcg scan of its children and itself.
> > 
> > In your implementation, statistics are only accounted to the memcg
> > triggering the limit and the respectively scanned memcgs.
> > 
> > Consider the following setup:
> > 
> >         A
> >        / \
> >       B   C
> >      /
> >     D
> > 
> > If D tries to charge but hits the limit of A, then B's hierarchy
> > counters do not reflect the reclaim activity resulting in D.
> > 
> yes, as I expected.

Andrew,

with a flawed design, the author unwilling to fix it, and two NAKs,
can we please revert this before the release?

This only got in silently because KAMEZAWA-san dropped all parties
involved in the discussions around this change from the Cc list of
subsequent submissions.

---
From: Johannes Weiner <jweiner@redhat.com>
Subject: [patch] Revert "memcg: add memory.vmscan_stat"

This reverts commit 82f9d486e59f588c7d100865c36510644abda356.

The implementation of per-memcg reclaim statistics violates how memcg
hierarchies usually behave: hierarchically.

The reclaim statistics are accounted to child memcgs and the parent
hitting the limit, but not to hierarchy levels in between.  Usually,
hierarchical statistics are perfectly recursive, with each level
representing the sum of itself and all its children.

Since this exports statistics to userspace, it may lead to confusion
and problems with changing things after the release, so revert it now;
we can try again later.

Conflicts:

	mm/vmscan.c

Signed-off-by: Johannes Weiner <jweiner@redhat.com>
---
 Documentation/cgroups/memory.txt |   85 +------------------
 include/linux/memcontrol.h       |   19 ----
 include/linux/swap.h             |    6 ++
 mm/memcontrol.c                  |  172 ++------------------------------------
 mm/vmscan.c                      |   39 +--------
 5 files changed, 18 insertions(+), 303 deletions(-)

diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt
index 6f3c598..06eb6d9 100644
--- a/Documentation/cgroups/memory.txt
+++ b/Documentation/cgroups/memory.txt
@@ -380,7 +380,7 @@ will be charged as a new owner of it.
 
 5.2 stat file
 
-5.2.1 memory.stat file includes following statistics
+memory.stat file includes following statistics
 
 # per-memory cgroup local status
 cache		- # of bytes of page cache memory.
@@ -438,89 +438,6 @@ Note:
 	 file_mapped is accounted only when the memory cgroup is owner of page
 	 cache.)
 
-5.2.2 memory.vmscan_stat
-
-memory.vmscan_stat includes statistics information for memory scanning and
-freeing, reclaiming. The statistics shows memory scanning information since
-memory cgroup creation and can be reset to 0 by writing 0 as
-
- #echo 0 > ../memory.vmscan_stat
-
-This file contains following statistics.
-
-[param]_[file_or_anon]_pages_by_[reason]_[under_heararchy]
-[param]_elapsed_ns_by_[reason]_[under_hierarchy]
-
-For example,
-
-  scanned_file_pages_by_limit indicates the number of scanned
-  file pages at vmscan.
-
-Now, 3 parameters are supported
-
-  scanned - the number of pages scanned by vmscan
-  rotated - the number of pages activated at vmscan
-  freed   - the number of pages freed by vmscan
-
-If "rotated" is high against scanned/freed, the memcg seems busy.
-
-Now, 2 reason are supported
-
-  limit - the memory cgroup's limit
-  system - global memory pressure + softlimit
-           (global memory pressure not under softlimit is not handled now)
-
-When under_hierarchy is added in the tail, the number indicates the
-total memcg scan of its children and itself.
-
-elapsed_ns is a elapsed time in nanosecond. This may include sleep time
-and not indicates CPU usage. So, please take this as just showing
-latency.
-
-Here is an example.
-
-# cat /cgroup/memory/A/memory.vmscan_stat
-scanned_pages_by_limit 9471864
-scanned_anon_pages_by_limit 6640629
-scanned_file_pages_by_limit 2831235
-rotated_pages_by_limit 4243974
-rotated_anon_pages_by_limit 3971968
-rotated_file_pages_by_limit 272006
-freed_pages_by_limit 2318492
-freed_anon_pages_by_limit 962052
-freed_file_pages_by_limit 1356440
-elapsed_ns_by_limit 351386416101
-scanned_pages_by_system 0
-scanned_anon_pages_by_system 0
-scanned_file_pages_by_system 0
-rotated_pages_by_system 0
-rotated_anon_pages_by_system 0
-rotated_file_pages_by_system 0
-freed_pages_by_system 0
-freed_anon_pages_by_system 0
-freed_file_pages_by_system 0
-elapsed_ns_by_system 0
-scanned_pages_by_limit_under_hierarchy 9471864
-scanned_anon_pages_by_limit_under_hierarchy 6640629
-scanned_file_pages_by_limit_under_hierarchy 2831235
-rotated_pages_by_limit_under_hierarchy 4243974
-rotated_anon_pages_by_limit_under_hierarchy 3971968
-rotated_file_pages_by_limit_under_hierarchy 272006
-freed_pages_by_limit_under_hierarchy 2318492
-freed_anon_pages_by_limit_under_hierarchy 962052
-freed_file_pages_by_limit_under_hierarchy 1356440
-elapsed_ns_by_limit_under_hierarchy 351386416101
-scanned_pages_by_system_under_hierarchy 0
-scanned_anon_pages_by_system_under_hierarchy 0
-scanned_file_pages_by_system_under_hierarchy 0
-rotated_pages_by_system_under_hierarchy 0
-rotated_anon_pages_by_system_under_hierarchy 0
-rotated_file_pages_by_system_under_hierarchy 0
-freed_pages_by_system_under_hierarchy 0
-freed_anon_pages_by_system_under_hierarchy 0
-freed_file_pages_by_system_under_hierarchy 0
-elapsed_ns_by_system_under_hierarchy 0
-
 5.3 swappiness
 
 Similar to /proc/sys/vm/swappiness, but affecting a hierarchy of groups only.
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 3b535db..343bd76 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -39,16 +39,6 @@ extern unsigned long mem_cgroup_isolate_pages(unsigned long nr_to_scan,
 					struct mem_cgroup *mem_cont,
 					int active, int file);
 
-struct memcg_scanrecord {
-	struct mem_cgroup *mem; /* scanend memory cgroup */
-	struct mem_cgroup *root; /* scan target hierarchy root */
-	int context;		/* scanning context (see memcontrol.c) */
-	unsigned long nr_scanned[2]; /* the number of scanned pages */
-	unsigned long nr_rotated[2]; /* the number of rotated pages */
-	unsigned long nr_freed[2]; /* the number of freed pages */
-	unsigned long elapsed; /* nsec of time elapsed while scanning */
-};
-
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR
 /*
  * All "charge" functions with gfp_mask should use GFP_KERNEL or
@@ -127,15 +117,6 @@ mem_cgroup_get_reclaim_stat_from_page(struct page *page);
 extern void mem_cgroup_print_oom_info(struct mem_cgroup *memcg,
 					struct task_struct *p);
 
-extern unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem,
-						  gfp_t gfp_mask, bool noswap,
-						  struct memcg_scanrecord *rec);
-extern unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
-						gfp_t gfp_mask, bool noswap,
-						struct zone *zone,
-						struct memcg_scanrecord *rec,
-						unsigned long *nr_scanned);
-
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
 extern int do_swap_account;
 #endif
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 14d6249..c71f84b 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -252,6 +252,12 @@ static inline void lru_cache_add_file(struct page *page)
 extern unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
 					gfp_t gfp_mask, nodemask_t *mask);
 extern int __isolate_lru_page(struct page *page, int mode, int file);
+extern unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem,
+						  gfp_t gfp_mask, bool noswap);
+extern unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
+						gfp_t gfp_mask, bool noswap,
+						struct zone *zone,
+						unsigned long *nr_scanned);
 extern unsigned long shrink_all_memory(unsigned long nr_pages);
 extern int vm_swappiness;
 extern int remove_mapping(struct address_space *mapping, struct page *page);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index ebd1e86..3508777 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -204,50 +204,6 @@ struct mem_cgroup_eventfd_list {
 static void mem_cgroup_threshold(struct mem_cgroup *mem);
 static void mem_cgroup_oom_notify(struct mem_cgroup *mem);
 
-enum {
-	SCAN_BY_LIMIT,
-	SCAN_BY_SYSTEM,
-	NR_SCAN_CONTEXT,
-	SCAN_BY_SHRINK,	/* not recorded now */
-};
-
-enum {
-	SCAN,
-	SCAN_ANON,
-	SCAN_FILE,
-	ROTATE,
-	ROTATE_ANON,
-	ROTATE_FILE,
-	FREED,
-	FREED_ANON,
-	FREED_FILE,
-	ELAPSED,
-	NR_SCANSTATS,
-};
-
-struct scanstat {
-	spinlock_t	lock;
-	unsigned long	stats[NR_SCAN_CONTEXT][NR_SCANSTATS];
-	unsigned long	rootstats[NR_SCAN_CONTEXT][NR_SCANSTATS];
-};
-
-const char *scanstat_string[NR_SCANSTATS] = {
-	"scanned_pages",
-	"scanned_anon_pages",
-	"scanned_file_pages",
-	"rotated_pages",
-	"rotated_anon_pages",
-	"rotated_file_pages",
-	"freed_pages",
-	"freed_anon_pages",
-	"freed_file_pages",
-	"elapsed_ns",
-};
-#define SCANSTAT_WORD_LIMIT	"_by_limit"
-#define SCANSTAT_WORD_SYSTEM	"_by_system"
-#define SCANSTAT_WORD_HIERARCHY	"_under_hierarchy"
-
-
 /*
  * The memory controller data structure. The memory controller controls both
  * page cache and RSS per cgroup. We would eventually like to provide
@@ -313,8 +269,7 @@ struct mem_cgroup {
 
 	/* For oom notifier event fd */
 	struct list_head oom_notify;
-	/* For recording LRU-scan statistics */
-	struct scanstat scanstat;
+
 	/*
 	 * Should we move charges of a task when a task is moved into this
 	 * mem_cgroup ? And what type of charges should we move ?
@@ -1678,44 +1633,6 @@ bool mem_cgroup_reclaimable(struct mem_cgroup *mem, bool noswap)
 }
 #endif
 
-static void __mem_cgroup_record_scanstat(unsigned long *stats,
-			   struct memcg_scanrecord *rec)
-{
-
-	stats[SCAN] += rec->nr_scanned[0] + rec->nr_scanned[1];
-	stats[SCAN_ANON] += rec->nr_scanned[0];
-	stats[SCAN_FILE] += rec->nr_scanned[1];
-
-	stats[ROTATE] += rec->nr_rotated[0] + rec->nr_rotated[1];
-	stats[ROTATE_ANON] += rec->nr_rotated[0];
-	stats[ROTATE_FILE] += rec->nr_rotated[1];
-
-	stats[FREED] += rec->nr_freed[0] + rec->nr_freed[1];
-	stats[FREED_ANON] += rec->nr_freed[0];
-	stats[FREED_FILE] += rec->nr_freed[1];
-
-	stats[ELAPSED] += rec->elapsed;
-}
-
-static void mem_cgroup_record_scanstat(struct memcg_scanrecord *rec)
-{
-	struct mem_cgroup *mem;
-	int context = rec->context;
-
-	if (context >= NR_SCAN_CONTEXT)
-		return;
-
-	mem = rec->mem;
-	spin_lock(&mem->scanstat.lock);
-	__mem_cgroup_record_scanstat(mem->scanstat.stats[context], rec);
-	spin_unlock(&mem->scanstat.lock);
-
-	mem = rec->root;
-	spin_lock(&mem->scanstat.lock);
-	__mem_cgroup_record_scanstat(mem->scanstat.rootstats[context], rec);
-	spin_unlock(&mem->scanstat.lock);
-}
-
 /*
  * Scan the hierarchy if needed to reclaim memory. We remember the last child
  * we reclaimed from, so that we don't end up penalizing one child extensively
@@ -1740,9 +1657,8 @@ static int mem_cgroup_hierarchical_reclaim(struct mem_cgroup *root_mem,
 	bool noswap = reclaim_options & MEM_CGROUP_RECLAIM_NOSWAP;
 	bool shrink = reclaim_options & MEM_CGROUP_RECLAIM_SHRINK;
 	bool check_soft = reclaim_options & MEM_CGROUP_RECLAIM_SOFT;
-	struct memcg_scanrecord rec;
 	unsigned long excess;
-	unsigned long scanned;
+	unsigned long nr_scanned;
 
 	excess = res_counter_soft_limit_excess(&root_mem->res) >> PAGE_SHIFT;
 
@@ -1750,15 +1666,6 @@ static int mem_cgroup_hierarchical_reclaim(struct mem_cgroup *root_mem,
 	if (!check_soft && !shrink && root_mem->memsw_is_minimum)
 		noswap = true;
 
-	if (shrink)
-		rec.context = SCAN_BY_SHRINK;
-	else if (check_soft)
-		rec.context = SCAN_BY_SYSTEM;
-	else
-		rec.context = SCAN_BY_LIMIT;
-
-	rec.root = root_mem;
-
 	while (1) {
 		victim = mem_cgroup_select_victim(root_mem);
 		if (victim == root_mem) {
@@ -1799,23 +1706,14 @@ static int mem_cgroup_hierarchical_reclaim(struct mem_cgroup *root_mem,
 			css_put(&victim->css);
 			continue;
 		}
-		rec.mem = victim;
-		rec.nr_scanned[0] = 0;
-		rec.nr_scanned[1] = 0;
-		rec.nr_rotated[0] = 0;
-		rec.nr_rotated[1] = 0;
-		rec.nr_freed[0] = 0;
-		rec.nr_freed[1] = 0;
-		rec.elapsed = 0;
 		/* we use swappiness of local cgroup */
 		if (check_soft) {
 			ret = mem_cgroup_shrink_node_zone(victim, gfp_mask,
-				noswap, zone, &rec, &scanned);
-			*total_scanned += scanned;
+				noswap, zone, &nr_scanned);
+			*total_scanned += nr_scanned;
 		} else
 			ret = try_to_free_mem_cgroup_pages(victim, gfp_mask,
-						noswap, &rec);
-		mem_cgroup_record_scanstat(&rec);
+						noswap);
 		css_put(&victim->css);
 		/*
 		 * At shrinking usage, we can't check we should stop here or
@@ -3854,18 +3752,14 @@ try_to_free:
 	/* try to free all pages in this cgroup */
 	shrink = 1;
 	while (nr_retries && mem->res.usage > 0) {
-		struct memcg_scanrecord rec;
 		int progress;
 
 		if (signal_pending(current)) {
 			ret = -EINTR;
 			goto out;
 		}
-		rec.context = SCAN_BY_SHRINK;
-		rec.mem = mem;
-		rec.root = mem;
 		progress = try_to_free_mem_cgroup_pages(mem, GFP_KERNEL,
-						false, &rec);
+						false);
 		if (!progress) {
 			nr_retries--;
 			/* maybe some writeback is necessary */
@@ -4709,54 +4603,6 @@ static int mem_control_numa_stat_open(struct inode *unused, struct file *file)
 }
 #endif /* CONFIG_NUMA */
 
-static int mem_cgroup_vmscan_stat_read(struct cgroup *cgrp,
-				struct cftype *cft,
-				struct cgroup_map_cb *cb)
-{
-	struct mem_cgroup *mem = mem_cgroup_from_cont(cgrp);
-	char string[64];
-	int i;
-
-	for (i = 0; i < NR_SCANSTATS; i++) {
-		strcpy(string, scanstat_string[i]);
-		strcat(string, SCANSTAT_WORD_LIMIT);
-		cb->fill(cb, string,  mem->scanstat.stats[SCAN_BY_LIMIT][i]);
-	}
-
-	for (i = 0; i < NR_SCANSTATS; i++) {
-		strcpy(string, scanstat_string[i]);
-		strcat(string, SCANSTAT_WORD_SYSTEM);
-		cb->fill(cb, string,  mem->scanstat.stats[SCAN_BY_SYSTEM][i]);
-	}
-
-	for (i = 0; i < NR_SCANSTATS; i++) {
-		strcpy(string, scanstat_string[i]);
-		strcat(string, SCANSTAT_WORD_LIMIT);
-		strcat(string, SCANSTAT_WORD_HIERARCHY);
-		cb->fill(cb, string,  mem->scanstat.rootstats[SCAN_BY_LIMIT][i]);
-	}
-	for (i = 0; i < NR_SCANSTATS; i++) {
-		strcpy(string, scanstat_string[i]);
-		strcat(string, SCANSTAT_WORD_SYSTEM);
-		strcat(string, SCANSTAT_WORD_HIERARCHY);
-		cb->fill(cb, string,  mem->scanstat.rootstats[SCAN_BY_SYSTEM][i]);
-	}
-	return 0;
-}
-
-static int mem_cgroup_reset_vmscan_stat(struct cgroup *cgrp,
-				unsigned int event)
-{
-	struct mem_cgroup *mem = mem_cgroup_from_cont(cgrp);
-
-	spin_lock(&mem->scanstat.lock);
-	memset(&mem->scanstat.stats, 0, sizeof(mem->scanstat.stats));
-	memset(&mem->scanstat.rootstats, 0, sizeof(mem->scanstat.rootstats));
-	spin_unlock(&mem->scanstat.lock);
-	return 0;
-}
-
-
 static struct cftype mem_cgroup_files[] = {
 	{
 		.name = "usage_in_bytes",
@@ -4827,11 +4673,6 @@ static struct cftype mem_cgroup_files[] = {
 		.mode = S_IRUGO,
 	},
 #endif
-	{
-		.name = "vmscan_stat",
-		.read_map = mem_cgroup_vmscan_stat_read,
-		.trigger = mem_cgroup_reset_vmscan_stat,
-	},
 };
 
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
@@ -5095,7 +4936,6 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
 	atomic_set(&mem->refcnt, 1);
 	mem->move_charge_at_immigrate = 0;
 	mutex_init(&mem->thresholds_lock);
-	spin_lock_init(&mem->scanstat.lock);
 	return &mem->css;
 free_out:
 	__mem_cgroup_free(mem);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 04bb6ae..6588746 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -105,7 +105,6 @@ struct scan_control {
 
 	/* Which cgroup do we reclaim from */
 	struct mem_cgroup *mem_cgroup;
-	struct memcg_scanrecord *memcg_record;
 
 	/*
 	 * Nodemask of nodes allowed by the caller. If NULL, all nodes
@@ -1349,8 +1348,6 @@ putback_lru_pages(struct zone *zone, struct scan_control *sc,
 			int file = is_file_lru(lru);
 			int numpages = hpage_nr_pages(page);
 			reclaim_stat->recent_rotated[file] += numpages;
-			if (!scanning_global_lru(sc))
-				sc->memcg_record->nr_rotated[file] += numpages;
 		}
 		if (!pagevec_add(&pvec, page)) {
 			spin_unlock_irq(&zone->lru_lock);
@@ -1394,10 +1391,6 @@ static noinline_for_stack void update_isolated_counts(struct zone *zone,
 
 	reclaim_stat->recent_scanned[0] += *nr_anon;
 	reclaim_stat->recent_scanned[1] += *nr_file;
-	if (!scanning_global_lru(sc)) {
-		sc->memcg_record->nr_scanned[0] += *nr_anon;
-		sc->memcg_record->nr_scanned[1] += *nr_file;
-	}
 }
 
 /*
@@ -1511,9 +1504,6 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
 		nr_reclaimed += shrink_page_list(&page_list, zone, sc);
 	}
 
-	if (!scanning_global_lru(sc))
-		sc->memcg_record->nr_freed[file] += nr_reclaimed;
-
 	local_irq_disable();
 	if (current_is_kswapd())
 		__count_vm_events(KSWAPD_STEAL, nr_reclaimed);
@@ -1613,8 +1603,6 @@ static void shrink_active_list(unsigned long nr_pages, struct zone *zone,
 	}
 
 	reclaim_stat->recent_scanned[file] += nr_taken;
-	if (!scanning_global_lru(sc))
-		sc->memcg_record->nr_scanned[file] += nr_taken;
 
 	__count_zone_vm_events(PGREFILL, zone, pgscanned);
 	if (file)
@@ -1666,8 +1654,6 @@ static void shrink_active_list(unsigned long nr_pages, struct zone *zone,
 	 * get_scan_ratio.
 	 */
 	reclaim_stat->recent_rotated[file] += nr_rotated;
-	if (!scanning_global_lru(sc))
-		sc->memcg_record->nr_rotated[file] += nr_rotated;
 
 	move_active_pages_to_lru(zone, &l_active,
 						LRU_ACTIVE + file * LRU_FILE);
@@ -2253,10 +2239,9 @@ unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR
 
 unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
-					gfp_t gfp_mask, bool noswap,
-					struct zone *zone,
-					struct memcg_scanrecord *rec,
-					unsigned long *scanned)
+						gfp_t gfp_mask, bool noswap,
+						struct zone *zone,
+						unsigned long *nr_scanned)
 {
 	struct scan_control sc = {
 		.nr_scanned = 0,
@@ -2266,9 +2251,7 @@ unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
 		.may_swap = !noswap,
 		.order = 0,
 		.mem_cgroup = mem,
-		.memcg_record = rec,
 	};
-	ktime_t start, end;
 
 	sc.gfp_mask = (gfp_mask & GFP_RECLAIM_MASK) |
 			(GFP_HIGHUSER_MOVABLE & ~GFP_RECLAIM_MASK);
@@ -2277,7 +2260,6 @@ unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
 						      sc.may_writepage,
 						      sc.gfp_mask);
 
-	start = ktime_get();
 	/*
 	 * NOTE: Although we can get the priority field, using it
 	 * here is not a good idea, since it limits the pages we can scan.
@@ -2286,25 +2268,19 @@ unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
 	 * the priority and make it zero.
 	 */
 	shrink_zone(0, zone, &sc);
-	end = ktime_get();
-
-	if (rec)
-		rec->elapsed += ktime_to_ns(ktime_sub(end, start));
-	*scanned = sc.nr_scanned;
 
 	trace_mm_vmscan_memcg_softlimit_reclaim_end(sc.nr_reclaimed);
 
+	*nr_scanned = sc.nr_scanned;
 	return sc.nr_reclaimed;
 }
 
 unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont,
 					   gfp_t gfp_mask,
-					   bool noswap,
-					   struct memcg_scanrecord *rec)
+					   bool noswap)
 {
 	struct zonelist *zonelist;
 	unsigned long nr_reclaimed;
-	ktime_t start, end;
 	int nid;
 	struct scan_control sc = {
 		.may_writepage = !laptop_mode,
@@ -2313,7 +2289,6 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont,
 		.nr_to_reclaim = SWAP_CLUSTER_MAX,
 		.order = 0,
 		.mem_cgroup = mem_cont,
-		.memcg_record = rec,
 		.nodemask = NULL, /* we don't care the placement */
 		.gfp_mask = (gfp_mask & GFP_RECLAIM_MASK) |
 				(GFP_HIGHUSER_MOVABLE & ~GFP_RECLAIM_MASK),
@@ -2322,7 +2297,6 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont,
 		.gfp_mask = sc.gfp_mask,
 	};
 
-	start = ktime_get();
 	/*
 	 * Unlike direct reclaim via alloc_pages(), memcg's reclaim doesn't
 	 * take care of from where we get pages. So the node where we start the
@@ -2337,9 +2311,6 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont,
 					    sc.gfp_mask);
 
 	nr_reclaimed = do_try_to_free_pages(zonelist, &sc, &shrink);
-	end = ktime_get();
-	if (rec)
-		rec->elapsed += ktime_to_ns(ktime_sub(end, start));
 
 	trace_mm_vmscan_memcg_reclaim_end(nr_reclaimed);
 
-- 
1.7.6


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [patch] Revert "memcg: add memory.vmscan_stat"
@ 2011-08-29 15:51       ` Johannes Weiner
  0 siblings, 0 replies; 54+ messages in thread
From: Johannes Weiner @ 2011-08-29 15:51 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Daisuke Nishimura, Balbir Singh, Andrew Brestic, Ying Han,
	Michal Hocko, KAMEZAWA Hiroyuki, linux-mm, linux-kernel

On Tue, Aug 09, 2011 at 08:33:45AM +0900, KAMEZAWA Hiroyuki wrote:
> On Mon, 8 Aug 2011 14:43:33 +0200
> Johannes Weiner <jweiner@redhat.com> wrote:
> 
> > On Fri, Jul 22, 2011 at 05:15:40PM +0900, KAMEZAWA Hiroyuki wrote:
> > > +When under_hierarchy is added in the tail, the number indicates the
> > > +total memcg scan of its children and itself.
> > 
> > In your implementation, statistics are only accounted to the memcg
> > triggering the limit and the respectively scanned memcgs.
> > 
> > Consider the following setup:
> > 
> >         A
> >        / \
> >       B   C
> >      /
> >     D
> > 
> > If D tries to charge but hits the limit of A, then B's hierarchy
> > counters do not reflect the reclaim activity resulting in D.
> > 
> yes, as I expected.

Andrew,

with a flawed design, the author unwilling to fix it, and two NAKs,
can we please revert this before the release?

This only got in silently because KAMEZAWA-san dropped all parties
involved in the discussions around this change from the Cc list of
subsequent submissions.

---
From: Johannes Weiner <jweiner@redhat.com>
Subject: [patch] Revert "memcg: add memory.vmscan_stat"

This reverts commit 82f9d486e59f588c7d100865c36510644abda356.

The implementation of per-memcg reclaim statistics violates how memcg
hierarchies usually behave: hierarchically.

The reclaim statistics are accounted to child memcgs and the parent
hitting the limit, but not to hierarchy levels in between.  Usually,
hierarchical statistics are perfectly recursive, with each level
representing the sum of itself and all its children.

Since this exports statistics to userspace, it may lead to confusion
and to problems with changing the interface after the release, so
revert it now; we can try again later.
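
The recursive accounting described above can be sketched in miniature.
This is an illustrative userspace model, not kernel code; all struct
and function names here are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Miniature model of a memcg tree: each node keeps a local counter and
 * a hierarchical one.  All names are illustrative, not the kernel's. */
struct node {
	struct node *parent;
	unsigned long scanned;           /* events in this group alone */
	unsigned long hierarchy_scanned; /* this group plus all descendants */
};

/* Fully recursive accounting: an event in a group is charged to every
 * ancestor, so each level's hierarchical counter equals the sum of
 * itself and its children.  The reverted patch instead charged only the
 * scanned memcg and the limit-hitting root, skipping levels in between. */
static void record_scan(struct node *n, unsigned long nr)
{
	struct node *iter;

	n->scanned += nr;
	for (iter = n; iter; iter = iter->parent)
		iter->hierarchy_scanned += nr;
}
```

With an A <- B <- D chain, charging D updates the hierarchical counters
of D, B, and A alike, which is the behavior the revert argues userspace
would expect.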

Conflicts:

	mm/vmscan.c

Signed-off-by: Johannes Weiner <jweiner@redhat.com>
---
 Documentation/cgroups/memory.txt |   85 +------------------
 include/linux/memcontrol.h       |   19 ----
 include/linux/swap.h             |    6 ++
 mm/memcontrol.c                  |  172 ++------------------------------------
 mm/vmscan.c                      |   39 +--------
 5 files changed, 18 insertions(+), 303 deletions(-)

diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt
index 6f3c598..06eb6d9 100644
--- a/Documentation/cgroups/memory.txt
+++ b/Documentation/cgroups/memory.txt
@@ -380,7 +380,7 @@ will be charged as a new owner of it.
 
 5.2 stat file
 
-5.2.1 memory.stat file includes following statistics
+memory.stat file includes following statistics
 
 # per-memory cgroup local status
 cache		- # of bytes of page cache memory.
@@ -438,89 +438,6 @@ Note:
 	 file_mapped is accounted only when the memory cgroup is owner of page
 	 cache.)
 
-5.2.2 memory.vmscan_stat
-
-memory.vmscan_stat includes statistics information for memory scanning and
-freeing, reclaiming. The statistics shows memory scanning information since
-memory cgroup creation and can be reset to 0 by writing 0 as
-
- #echo 0 > ../memory.vmscan_stat
-
-This file contains following statistics.
-
-[param]_[file_or_anon]_pages_by_[reason]_[under_heararchy]
-[param]_elapsed_ns_by_[reason]_[under_hierarchy]
-
-For example,
-
-  scanned_file_pages_by_limit indicates the number of scanned
-  file pages at vmscan.
-
-Now, 3 parameters are supported
-
-  scanned - the number of pages scanned by vmscan
-  rotated - the number of pages activated at vmscan
-  freed   - the number of pages freed by vmscan
-
-If "rotated" is high against scanned/freed, the memcg seems busy.
-
-Now, 2 reason are supported
-
-  limit - the memory cgroup's limit
-  system - global memory pressure + softlimit
-           (global memory pressure not under softlimit is not handled now)
-
-When under_hierarchy is added in the tail, the number indicates the
-total memcg scan of its children and itself.
-
-elapsed_ns is a elapsed time in nanosecond. This may include sleep time
-and not indicates CPU usage. So, please take this as just showing
-latency.
-
-Here is an example.
-
-# cat /cgroup/memory/A/memory.vmscan_stat
-scanned_pages_by_limit 9471864
-scanned_anon_pages_by_limit 6640629
-scanned_file_pages_by_limit 2831235
-rotated_pages_by_limit 4243974
-rotated_anon_pages_by_limit 3971968
-rotated_file_pages_by_limit 272006
-freed_pages_by_limit 2318492
-freed_anon_pages_by_limit 962052
-freed_file_pages_by_limit 1356440
-elapsed_ns_by_limit 351386416101
-scanned_pages_by_system 0
-scanned_anon_pages_by_system 0
-scanned_file_pages_by_system 0
-rotated_pages_by_system 0
-rotated_anon_pages_by_system 0
-rotated_file_pages_by_system 0
-freed_pages_by_system 0
-freed_anon_pages_by_system 0
-freed_file_pages_by_system 0
-elapsed_ns_by_system 0
-scanned_pages_by_limit_under_hierarchy 9471864
-scanned_anon_pages_by_limit_under_hierarchy 6640629
-scanned_file_pages_by_limit_under_hierarchy 2831235
-rotated_pages_by_limit_under_hierarchy 4243974
-rotated_anon_pages_by_limit_under_hierarchy 3971968
-rotated_file_pages_by_limit_under_hierarchy 272006
-freed_pages_by_limit_under_hierarchy 2318492
-freed_anon_pages_by_limit_under_hierarchy 962052
-freed_file_pages_by_limit_under_hierarchy 1356440
-elapsed_ns_by_limit_under_hierarchy 351386416101
-scanned_pages_by_system_under_hierarchy 0
-scanned_anon_pages_by_system_under_hierarchy 0
-scanned_file_pages_by_system_under_hierarchy 0
-rotated_pages_by_system_under_hierarchy 0
-rotated_anon_pages_by_system_under_hierarchy 0
-rotated_file_pages_by_system_under_hierarchy 0
-freed_pages_by_system_under_hierarchy 0
-freed_anon_pages_by_system_under_hierarchy 0
-freed_file_pages_by_system_under_hierarchy 0
-elapsed_ns_by_system_under_hierarchy 0
-
 5.3 swappiness
 
 Similar to /proc/sys/vm/swappiness, but affecting a hierarchy of groups only.
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 3b535db..343bd76 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -39,16 +39,6 @@ extern unsigned long mem_cgroup_isolate_pages(unsigned long nr_to_scan,
 					struct mem_cgroup *mem_cont,
 					int active, int file);
 
-struct memcg_scanrecord {
-	struct mem_cgroup *mem; /* scanend memory cgroup */
-	struct mem_cgroup *root; /* scan target hierarchy root */
-	int context;		/* scanning context (see memcontrol.c) */
-	unsigned long nr_scanned[2]; /* the number of scanned pages */
-	unsigned long nr_rotated[2]; /* the number of rotated pages */
-	unsigned long nr_freed[2]; /* the number of freed pages */
-	unsigned long elapsed; /* nsec of time elapsed while scanning */
-};
-
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR
 /*
  * All "charge" functions with gfp_mask should use GFP_KERNEL or
@@ -127,15 +117,6 @@ mem_cgroup_get_reclaim_stat_from_page(struct page *page);
 extern void mem_cgroup_print_oom_info(struct mem_cgroup *memcg,
 					struct task_struct *p);
 
-extern unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem,
-						  gfp_t gfp_mask, bool noswap,
-						  struct memcg_scanrecord *rec);
-extern unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
-						gfp_t gfp_mask, bool noswap,
-						struct zone *zone,
-						struct memcg_scanrecord *rec,
-						unsigned long *nr_scanned);
-
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
 extern int do_swap_account;
 #endif
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 14d6249..c71f84b 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -252,6 +252,12 @@ static inline void lru_cache_add_file(struct page *page)
 extern unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
 					gfp_t gfp_mask, nodemask_t *mask);
 extern int __isolate_lru_page(struct page *page, int mode, int file);
+extern unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem,
+						  gfp_t gfp_mask, bool noswap);
+extern unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
+						gfp_t gfp_mask, bool noswap,
+						struct zone *zone,
+						unsigned long *nr_scanned);
 extern unsigned long shrink_all_memory(unsigned long nr_pages);
 extern int vm_swappiness;
 extern int remove_mapping(struct address_space *mapping, struct page *page);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index ebd1e86..3508777 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -204,50 +204,6 @@ struct mem_cgroup_eventfd_list {
 static void mem_cgroup_threshold(struct mem_cgroup *mem);
 static void mem_cgroup_oom_notify(struct mem_cgroup *mem);
 
-enum {
-	SCAN_BY_LIMIT,
-	SCAN_BY_SYSTEM,
-	NR_SCAN_CONTEXT,
-	SCAN_BY_SHRINK,	/* not recorded now */
-};
-
-enum {
-	SCAN,
-	SCAN_ANON,
-	SCAN_FILE,
-	ROTATE,
-	ROTATE_ANON,
-	ROTATE_FILE,
-	FREED,
-	FREED_ANON,
-	FREED_FILE,
-	ELAPSED,
-	NR_SCANSTATS,
-};
-
-struct scanstat {
-	spinlock_t	lock;
-	unsigned long	stats[NR_SCAN_CONTEXT][NR_SCANSTATS];
-	unsigned long	rootstats[NR_SCAN_CONTEXT][NR_SCANSTATS];
-};
-
-const char *scanstat_string[NR_SCANSTATS] = {
-	"scanned_pages",
-	"scanned_anon_pages",
-	"scanned_file_pages",
-	"rotated_pages",
-	"rotated_anon_pages",
-	"rotated_file_pages",
-	"freed_pages",
-	"freed_anon_pages",
-	"freed_file_pages",
-	"elapsed_ns",
-};
-#define SCANSTAT_WORD_LIMIT	"_by_limit"
-#define SCANSTAT_WORD_SYSTEM	"_by_system"
-#define SCANSTAT_WORD_HIERARCHY	"_under_hierarchy"
-
-
 /*
  * The memory controller data structure. The memory controller controls both
  * page cache and RSS per cgroup. We would eventually like to provide
@@ -313,8 +269,7 @@ struct mem_cgroup {
 
 	/* For oom notifier event fd */
 	struct list_head oom_notify;
-	/* For recording LRU-scan statistics */
-	struct scanstat scanstat;
+
 	/*
 	 * Should we move charges of a task when a task is moved into this
 	 * mem_cgroup ? And what type of charges should we move ?
@@ -1678,44 +1633,6 @@ bool mem_cgroup_reclaimable(struct mem_cgroup *mem, bool noswap)
 }
 #endif
 
-static void __mem_cgroup_record_scanstat(unsigned long *stats,
-			   struct memcg_scanrecord *rec)
-{
-
-	stats[SCAN] += rec->nr_scanned[0] + rec->nr_scanned[1];
-	stats[SCAN_ANON] += rec->nr_scanned[0];
-	stats[SCAN_FILE] += rec->nr_scanned[1];
-
-	stats[ROTATE] += rec->nr_rotated[0] + rec->nr_rotated[1];
-	stats[ROTATE_ANON] += rec->nr_rotated[0];
-	stats[ROTATE_FILE] += rec->nr_rotated[1];
-
-	stats[FREED] += rec->nr_freed[0] + rec->nr_freed[1];
-	stats[FREED_ANON] += rec->nr_freed[0];
-	stats[FREED_FILE] += rec->nr_freed[1];
-
-	stats[ELAPSED] += rec->elapsed;
-}
-
-static void mem_cgroup_record_scanstat(struct memcg_scanrecord *rec)
-{
-	struct mem_cgroup *mem;
-	int context = rec->context;
-
-	if (context >= NR_SCAN_CONTEXT)
-		return;
-
-	mem = rec->mem;
-	spin_lock(&mem->scanstat.lock);
-	__mem_cgroup_record_scanstat(mem->scanstat.stats[context], rec);
-	spin_unlock(&mem->scanstat.lock);
-
-	mem = rec->root;
-	spin_lock(&mem->scanstat.lock);
-	__mem_cgroup_record_scanstat(mem->scanstat.rootstats[context], rec);
-	spin_unlock(&mem->scanstat.lock);
-}
-
 /*
  * Scan the hierarchy if needed to reclaim memory. We remember the last child
  * we reclaimed from, so that we don't end up penalizing one child extensively
@@ -1740,9 +1657,8 @@ static int mem_cgroup_hierarchical_reclaim(struct mem_cgroup *root_mem,
 	bool noswap = reclaim_options & MEM_CGROUP_RECLAIM_NOSWAP;
 	bool shrink = reclaim_options & MEM_CGROUP_RECLAIM_SHRINK;
 	bool check_soft = reclaim_options & MEM_CGROUP_RECLAIM_SOFT;
-	struct memcg_scanrecord rec;
 	unsigned long excess;
-	unsigned long scanned;
+	unsigned long nr_scanned;
 
 	excess = res_counter_soft_limit_excess(&root_mem->res) >> PAGE_SHIFT;
 
@@ -1750,15 +1666,6 @@ static int mem_cgroup_hierarchical_reclaim(struct mem_cgroup *root_mem,
 	if (!check_soft && !shrink && root_mem->memsw_is_minimum)
 		noswap = true;
 
-	if (shrink)
-		rec.context = SCAN_BY_SHRINK;
-	else if (check_soft)
-		rec.context = SCAN_BY_SYSTEM;
-	else
-		rec.context = SCAN_BY_LIMIT;
-
-	rec.root = root_mem;
-
 	while (1) {
 		victim = mem_cgroup_select_victim(root_mem);
 		if (victim == root_mem) {
@@ -1799,23 +1706,14 @@ static int mem_cgroup_hierarchical_reclaim(struct mem_cgroup *root_mem,
 			css_put(&victim->css);
 			continue;
 		}
-		rec.mem = victim;
-		rec.nr_scanned[0] = 0;
-		rec.nr_scanned[1] = 0;
-		rec.nr_rotated[0] = 0;
-		rec.nr_rotated[1] = 0;
-		rec.nr_freed[0] = 0;
-		rec.nr_freed[1] = 0;
-		rec.elapsed = 0;
 		/* we use swappiness of local cgroup */
 		if (check_soft) {
 			ret = mem_cgroup_shrink_node_zone(victim, gfp_mask,
-				noswap, zone, &rec, &scanned);
-			*total_scanned += scanned;
+				noswap, zone, &nr_scanned);
+			*total_scanned += nr_scanned;
 		} else
 			ret = try_to_free_mem_cgroup_pages(victim, gfp_mask,
-						noswap, &rec);
-		mem_cgroup_record_scanstat(&rec);
+						noswap);
 		css_put(&victim->css);
 		/*
 		 * At shrinking usage, we can't check we should stop here or
@@ -3854,18 +3752,14 @@ try_to_free:
 	/* try to free all pages in this cgroup */
 	shrink = 1;
 	while (nr_retries && mem->res.usage > 0) {
-		struct memcg_scanrecord rec;
 		int progress;
 
 		if (signal_pending(current)) {
 			ret = -EINTR;
 			goto out;
 		}
-		rec.context = SCAN_BY_SHRINK;
-		rec.mem = mem;
-		rec.root = mem;
 		progress = try_to_free_mem_cgroup_pages(mem, GFP_KERNEL,
-						false, &rec);
+						false);
 		if (!progress) {
 			nr_retries--;
 			/* maybe some writeback is necessary */
@@ -4709,54 +4603,6 @@ static int mem_control_numa_stat_open(struct inode *unused, struct file *file)
 }
 #endif /* CONFIG_NUMA */
 
-static int mem_cgroup_vmscan_stat_read(struct cgroup *cgrp,
-				struct cftype *cft,
-				struct cgroup_map_cb *cb)
-{
-	struct mem_cgroup *mem = mem_cgroup_from_cont(cgrp);
-	char string[64];
-	int i;
-
-	for (i = 0; i < NR_SCANSTATS; i++) {
-		strcpy(string, scanstat_string[i]);
-		strcat(string, SCANSTAT_WORD_LIMIT);
-		cb->fill(cb, string,  mem->scanstat.stats[SCAN_BY_LIMIT][i]);
-	}
-
-	for (i = 0; i < NR_SCANSTATS; i++) {
-		strcpy(string, scanstat_string[i]);
-		strcat(string, SCANSTAT_WORD_SYSTEM);
-		cb->fill(cb, string,  mem->scanstat.stats[SCAN_BY_SYSTEM][i]);
-	}
-
-	for (i = 0; i < NR_SCANSTATS; i++) {
-		strcpy(string, scanstat_string[i]);
-		strcat(string, SCANSTAT_WORD_LIMIT);
-		strcat(string, SCANSTAT_WORD_HIERARCHY);
-		cb->fill(cb, string,  mem->scanstat.rootstats[SCAN_BY_LIMIT][i]);
-	}
-	for (i = 0; i < NR_SCANSTATS; i++) {
-		strcpy(string, scanstat_string[i]);
-		strcat(string, SCANSTAT_WORD_SYSTEM);
-		strcat(string, SCANSTAT_WORD_HIERARCHY);
-		cb->fill(cb, string,  mem->scanstat.rootstats[SCAN_BY_SYSTEM][i]);
-	}
-	return 0;
-}
-
-static int mem_cgroup_reset_vmscan_stat(struct cgroup *cgrp,
-				unsigned int event)
-{
-	struct mem_cgroup *mem = mem_cgroup_from_cont(cgrp);
-
-	spin_lock(&mem->scanstat.lock);
-	memset(&mem->scanstat.stats, 0, sizeof(mem->scanstat.stats));
-	memset(&mem->scanstat.rootstats, 0, sizeof(mem->scanstat.rootstats));
-	spin_unlock(&mem->scanstat.lock);
-	return 0;
-}
-
-
 static struct cftype mem_cgroup_files[] = {
 	{
 		.name = "usage_in_bytes",
@@ -4827,11 +4673,6 @@ static struct cftype mem_cgroup_files[] = {
 		.mode = S_IRUGO,
 	},
 #endif
-	{
-		.name = "vmscan_stat",
-		.read_map = mem_cgroup_vmscan_stat_read,
-		.trigger = mem_cgroup_reset_vmscan_stat,
-	},
 };
 
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
@@ -5095,7 +4936,6 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
 	atomic_set(&mem->refcnt, 1);
 	mem->move_charge_at_immigrate = 0;
 	mutex_init(&mem->thresholds_lock);
-	spin_lock_init(&mem->scanstat.lock);
 	return &mem->css;
 free_out:
 	__mem_cgroup_free(mem);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 04bb6ae..6588746 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -105,7 +105,6 @@ struct scan_control {
 
 	/* Which cgroup do we reclaim from */
 	struct mem_cgroup *mem_cgroup;
-	struct memcg_scanrecord *memcg_record;
 
 	/*
 	 * Nodemask of nodes allowed by the caller. If NULL, all nodes
@@ -1349,8 +1348,6 @@ putback_lru_pages(struct zone *zone, struct scan_control *sc,
 			int file = is_file_lru(lru);
 			int numpages = hpage_nr_pages(page);
 			reclaim_stat->recent_rotated[file] += numpages;
-			if (!scanning_global_lru(sc))
-				sc->memcg_record->nr_rotated[file] += numpages;
 		}
 		if (!pagevec_add(&pvec, page)) {
 			spin_unlock_irq(&zone->lru_lock);
@@ -1394,10 +1391,6 @@ static noinline_for_stack void update_isolated_counts(struct zone *zone,
 
 	reclaim_stat->recent_scanned[0] += *nr_anon;
 	reclaim_stat->recent_scanned[1] += *nr_file;
-	if (!scanning_global_lru(sc)) {
-		sc->memcg_record->nr_scanned[0] += *nr_anon;
-		sc->memcg_record->nr_scanned[1] += *nr_file;
-	}
 }
 
 /*
@@ -1511,9 +1504,6 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
 		nr_reclaimed += shrink_page_list(&page_list, zone, sc);
 	}
 
-	if (!scanning_global_lru(sc))
-		sc->memcg_record->nr_freed[file] += nr_reclaimed;
-
 	local_irq_disable();
 	if (current_is_kswapd())
 		__count_vm_events(KSWAPD_STEAL, nr_reclaimed);
@@ -1613,8 +1603,6 @@ static void shrink_active_list(unsigned long nr_pages, struct zone *zone,
 	}
 
 	reclaim_stat->recent_scanned[file] += nr_taken;
-	if (!scanning_global_lru(sc))
-		sc->memcg_record->nr_scanned[file] += nr_taken;
 
 	__count_zone_vm_events(PGREFILL, zone, pgscanned);
 	if (file)
@@ -1666,8 +1654,6 @@ static void shrink_active_list(unsigned long nr_pages, struct zone *zone,
 	 * get_scan_ratio.
 	 */
 	reclaim_stat->recent_rotated[file] += nr_rotated;
-	if (!scanning_global_lru(sc))
-		sc->memcg_record->nr_rotated[file] += nr_rotated;
 
 	move_active_pages_to_lru(zone, &l_active,
 						LRU_ACTIVE + file * LRU_FILE);
@@ -2253,10 +2239,9 @@ unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR
 
 unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
-					gfp_t gfp_mask, bool noswap,
-					struct zone *zone,
-					struct memcg_scanrecord *rec,
-					unsigned long *scanned)
+						gfp_t gfp_mask, bool noswap,
+						struct zone *zone,
+						unsigned long *nr_scanned)
 {
 	struct scan_control sc = {
 		.nr_scanned = 0,
@@ -2266,9 +2251,7 @@ unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
 		.may_swap = !noswap,
 		.order = 0,
 		.mem_cgroup = mem,
-		.memcg_record = rec,
 	};
-	ktime_t start, end;
 
 	sc.gfp_mask = (gfp_mask & GFP_RECLAIM_MASK) |
 			(GFP_HIGHUSER_MOVABLE & ~GFP_RECLAIM_MASK);
@@ -2277,7 +2260,6 @@ unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
 						      sc.may_writepage,
 						      sc.gfp_mask);
 
-	start = ktime_get();
 	/*
 	 * NOTE: Although we can get the priority field, using it
 	 * here is not a good idea, since it limits the pages we can scan.
@@ -2286,25 +2268,19 @@ unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
 	 * the priority and make it zero.
 	 */
 	shrink_zone(0, zone, &sc);
-	end = ktime_get();
-
-	if (rec)
-		rec->elapsed += ktime_to_ns(ktime_sub(end, start));
-	*scanned = sc.nr_scanned;
 
 	trace_mm_vmscan_memcg_softlimit_reclaim_end(sc.nr_reclaimed);
 
+	*nr_scanned = sc.nr_scanned;
 	return sc.nr_reclaimed;
 }
 
 unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont,
 					   gfp_t gfp_mask,
-					   bool noswap,
-					   struct memcg_scanrecord *rec)
+					   bool noswap)
 {
 	struct zonelist *zonelist;
 	unsigned long nr_reclaimed;
-	ktime_t start, end;
 	int nid;
 	struct scan_control sc = {
 		.may_writepage = !laptop_mode,
@@ -2313,7 +2289,6 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont,
 		.nr_to_reclaim = SWAP_CLUSTER_MAX,
 		.order = 0,
 		.mem_cgroup = mem_cont,
-		.memcg_record = rec,
 		.nodemask = NULL, /* we don't care the placement */
 		.gfp_mask = (gfp_mask & GFP_RECLAIM_MASK) |
 				(GFP_HIGHUSER_MOVABLE & ~GFP_RECLAIM_MASK),
@@ -2322,7 +2297,6 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont,
 		.gfp_mask = sc.gfp_mask,
 	};
 
-	start = ktime_get();
 	/*
 	 * Unlike direct reclaim via alloc_pages(), memcg's reclaim doesn't
 	 * take care of from where we get pages. So the node where we start the
@@ -2337,9 +2311,6 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont,
 					    sc.gfp_mask);
 
 	nr_reclaimed = do_try_to_free_pages(zonelist, &sc, &shrink);
-	end = ktime_get();
-	if (rec)
-		rec->elapsed += ktime_to_ns(ktime_sub(end, start));
 
 	trace_mm_vmscan_memcg_reclaim_end(nr_reclaimed);
 
-- 
1.7.6

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: [patch] Revert "memcg: add memory.vmscan_stat"
  2011-08-29 15:51       ` Johannes Weiner
@ 2011-08-30  1:12         ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 54+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-08-30  1:12 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Daisuke Nishimura, Balbir Singh, Andrew Brestic,
	Ying Han, Michal Hocko, linux-mm, linux-kernel

On Mon, 29 Aug 2011 17:51:13 +0200
Johannes Weiner <jweiner@redhat.com> wrote:

> On Tue, Aug 09, 2011 at 08:33:45AM +0900, KAMEZAWA Hiroyuki wrote:
> > On Mon, 8 Aug 2011 14:43:33 +0200
> > Johannes Weiner <jweiner@redhat.com> wrote:
> > 
> > > On Fri, Jul 22, 2011 at 05:15:40PM +0900, KAMEZAWA Hiroyuki wrote:
> > > > +When under_hierarchy is added in the tail, the number indicates the
> > > > +total memcg scan of its children and itself.
> > > 
> > > In your implementation, statistics are only accounted to the memcg
> > > triggering the limit and the respectively scanned memcgs.
> > > 
> > > Consider the following setup:
> > > 
> > >         A
> > >        / \
> > >       B   C
> > >      /
> > >     D
> > > 
> > > If D tries to charge but hits the limit of A, then B's hierarchy
> > > counters do not reflect the reclaim activity resulting in D.
> > > 
> > yes, as I expected.
> 
> Andrew,
> 
> with a flawed design, the author unwilling to fix it, and two NAKs,
> can we please revert this before the release?
> 

How about this ?
==
Currently, vmscan_stat's hierarchy counter only counts scan activity
triggered by the owner of the limit, so it is not "hierarchical" in
the way the rest of memcg's statistics are.

For example, assuming the following hierarchy:

	A
       /
      B
     /
    C

When B and C are scanned because of A's limit, vmscan_stat's
hierarchy accounting currently does
   A's hierarchy scan = A'scan + B'scan + C'scan
   B's hierarchy scan = 0
   C's hierarchy scan = 0
This first design reflected the view that C's scan is caused by
A's limit. But for consistency with the rest of the interface, the
following is more natural:

  A's hierarchy scan = A'scan + B'scan + C'scan
  B's hierarchy scan = B'scan + C'scan
  C's hierarchy scan = C'scan

This patch changes the counting implementation accordingly.
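
The proposed walk can be sketched outside the kernel as follows. The
stopping rule mirrors the patch's intent (charge every level up to the
reclaim root, stop at an ancestor with use_hierarchy disabled), but
all names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of the proposed hierarchy accounting: charge the scanned group
 * and every ancestor up to and including the reclaim root, stopping
 * early at an ancestor that has use_hierarchy disabled.  Names are
 * illustrative, not the kernel's. */
struct memcg {
	struct memcg *parent;
	int use_hierarchy;
	unsigned long hierarchy_scan;
};

static void record_hierarchy_scan(struct memcg *scanned, struct memcg *root,
				  unsigned long nr)
{
	struct memcg *m = scanned;

	while (m) {
		m->hierarchy_scan += nr;
		if (m == root)
			break;
		m = m->parent;
		if (m && !m->use_hierarchy)
			break;	/* this ancestor does not aggregate children */
	}
}
```

For the A <- B <- C chain above, a scan of C under A's limit charges C,
B, and A, giving the "B's hierarchy scan = B'scan + C'scan" behavior
the message describes.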

Suggested-by: Johannes Weiner <jweiner@redhat.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 mm/memcontrol.c |   28 ++++++++++++++++++----------
 1 file changed, 18 insertions(+), 10 deletions(-)

Index: mmotm-Aug29/mm/memcontrol.c
===================================================================
--- mmotm-Aug29.orig/mm/memcontrol.c
+++ mmotm-Aug29/mm/memcontrol.c
@@ -229,7 +229,7 @@ enum {
 struct scanstat {
 	spinlock_t	lock;
 	unsigned long	stats[NR_SCAN_CONTEXT][NR_SCANSTATS];
-	unsigned long	rootstats[NR_SCAN_CONTEXT][NR_SCANSTATS];
+	unsigned long	hierarchy_stats[NR_SCAN_CONTEXT][NR_SCANSTATS];
 };
 
 const char *scanstat_string[NR_SCANSTATS] = {
@@ -1701,6 +1701,7 @@ static void __mem_cgroup_record_scanstat
 static void mem_cgroup_record_scanstat(struct memcg_scanrecord *rec)
 {
 	struct mem_cgroup *memcg;
+	struct cgroup *cgroup;
 	int context = rec->context;
 
 	if (context >= NR_SCAN_CONTEXT)
@@ -1710,11 +1711,18 @@ static void mem_cgroup_record_scanstat(s
 	spin_lock(&memcg->scanstat.lock);
 	__mem_cgroup_record_scanstat(memcg->scanstat.stats[context], rec);
 	spin_unlock(&memcg->scanstat.lock);
-
-	memcg = rec->root;
-	spin_lock(&memcg->scanstat.lock);
-	__mem_cgroup_record_scanstat(memcg->scanstat.rootstats[context], rec);
-	spin_unlock(&memcg->scanstat.lock);
+	cgroup = memcg->css.cgroup;
+	do {
+		spin_lock(&memcg->scanstat.lock);
+		__mem_cgroup_record_scanstat(
+			memcg->scanstat.hierarchy_stats[context], rec);
+		spin_unlock(&memcg->scanstat.lock);
+		if (!cgroup->parent)
+			break;
+		cgroup = cgroup->parent;
+		memcg = mem_cgroup_from_cont(cgroup);
+	} while (memcg->use_hierarchy && memcg != rec->root);
+	return;
 }
 
 /*
@@ -4733,14 +4741,14 @@ static int mem_cgroup_vmscan_stat_read(s
 		strcat(string, SCANSTAT_WORD_LIMIT);
 		strcat(string, SCANSTAT_WORD_HIERARCHY);
 		cb->fill(cb,
-			string, memcg->scanstat.rootstats[SCAN_BY_LIMIT][i]);
+		    string, memcg->scanstat.hierarchy_stats[SCAN_BY_LIMIT][i]);
 	}
 	for (i = 0; i < NR_SCANSTATS; i++) {
 		strcpy(string, scanstat_string[i]);
 		strcat(string, SCANSTAT_WORD_SYSTEM);
 		strcat(string, SCANSTAT_WORD_HIERARCHY);
 		cb->fill(cb,
-			string, memcg->scanstat.rootstats[SCAN_BY_SYSTEM][i]);
+		    string, memcg->scanstat.hierarchy_stats[SCAN_BY_SYSTEM][i]);
 	}
 	return 0;
 }
@@ -4752,8 +4760,8 @@ static int mem_cgroup_reset_vmscan_stat(
 
 	spin_lock(&memcg->scanstat.lock);
 	memset(&memcg->scanstat.stats, 0, sizeof(memcg->scanstat.stats));
-	memset(&memcg->scanstat.rootstats,
-		0, sizeof(memcg->scanstat.rootstats));
+	memset(&memcg->scanstat.hierarchy_stats,
+		0, sizeof(memcg->scanstat.hierarchy_stats));
 	spin_unlock(&memcg->scanstat.lock);
 	return 0;
 }







^ permalink raw reply	[flat|nested] 54+ messages in thread


* Re: [patch] Revert "memcg: add memory.vmscan_stat"
  2011-08-30  1:12         ` KAMEZAWA Hiroyuki
@ 2011-08-30  7:04           ` Johannes Weiner
  -1 siblings, 0 replies; 54+ messages in thread
From: Johannes Weiner @ 2011-08-30  7:04 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Andrew Morton, Daisuke Nishimura, Balbir Singh, Andrew Brestic,
	Ying Han, Michal Hocko, linux-mm, linux-kernel

On Tue, Aug 30, 2011 at 10:12:33AM +0900, KAMEZAWA Hiroyuki wrote:
> On Mon, 29 Aug 2011 17:51:13 +0200
> Johannes Weiner <jweiner@redhat.com> wrote:
> 
> > On Tue, Aug 09, 2011 at 08:33:45AM +0900, KAMEZAWA Hiroyuki wrote:
> > > On Mon, 8 Aug 2011 14:43:33 +0200
> > > Johannes Weiner <jweiner@redhat.com> wrote:
> > > 
> > > > On Fri, Jul 22, 2011 at 05:15:40PM +0900, KAMEZAWA Hiroyuki wrote:
> > > > > +When under_hierarchy is added in the tail, the number indicates the
> > > > > +total memcg scan of its children and itself.
> > > > 
> > > > In your implementation, statistics are only accounted to the memcg
> > > > triggering the limit and the respectively scanned memcgs.
> > > > 
> > > > Consider the following setup:
> > > > 
> > > >         A
> > > >        / \
> > > >       B   C
> > > >      /
> > > >     D
> > > > 
> > > > If D tries to charge but hits the limit of A, then B's hierarchy
> > > > counters do not reflect the reclaim activity resulting in D.
> > > > 
> > > yes, as I expected.
> > 
> > Andrew,
> > 
> > with a flawed design, the author unwilling to fix it, and two NAKs,
> > can we please revert this before the release?
> 
> How about this ?

> @@ -1710,11 +1711,18 @@ static void mem_cgroup_record_scanstat(s
>  	spin_lock(&memcg->scanstat.lock);
>  	__mem_cgroup_record_scanstat(memcg->scanstat.stats[context], rec);
>  	spin_unlock(&memcg->scanstat.lock);
> -
> -	memcg = rec->root;
> -	spin_lock(&memcg->scanstat.lock);
> -	__mem_cgroup_record_scanstat(memcg->scanstat.rootstats[context], rec);
> -	spin_unlock(&memcg->scanstat.lock);
> +	cgroup = memcg->css.cgroup;
> +	do {
> +		spin_lock(&memcg->scanstat.lock);
> +		__mem_cgroup_record_scanstat(
> +			memcg->scanstat.hierarchy_stats[context], rec);
> +		spin_unlock(&memcg->scanstat.lock);
> +		if (!cgroup->parent)
> +			break;
> +		cgroup = cgroup->parent;
> +		memcg = mem_cgroup_from_cont(cgroup);
> +	} while (memcg->use_hierarchy && memcg != rec->root);

Okay, so this looks correct, but it sums up all parents after each
memcg scanned, which could have a performance impact.  Usually,
hierarchy statistics are only summed up when a user reads them.

I don't get why this has to be done completely different from the way
we usually do things, without any justification, whatsoever.

Why do you want to pass a recording structure down the reclaim stack?
Why not make it per-cpu counters that are only summed up, together
with the hierarchy values, when someone is actually interested in
them?  With an interface like mem_cgroup_count_vm_event(), or maybe
even an extension of that function?

^ permalink raw reply	[flat|nested] 54+ messages in thread


* Re: [patch] Revert "memcg: add memory.vmscan_stat"
  2011-08-30  7:04           ` Johannes Weiner
@ 2011-08-30  7:20             ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 54+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-08-30  7:20 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Daisuke Nishimura, Balbir Singh, Andrew Brestic,
	Ying Han, Michal Hocko, linux-mm, linux-kernel

On Tue, 30 Aug 2011 09:04:24 +0200
Johannes Weiner <jweiner@redhat.com> wrote:

> On Tue, Aug 30, 2011 at 10:12:33AM +0900, KAMEZAWA Hiroyuki wrote:
> > On Mon, 29 Aug 2011 17:51:13 +0200
> > Johannes Weiner <jweiner@redhat.com> wrote:
> > 
> > > On Tue, Aug 09, 2011 at 08:33:45AM +0900, KAMEZAWA Hiroyuki wrote:
> > > > On Mon, 8 Aug 2011 14:43:33 +0200
> > > > Johannes Weiner <jweiner@redhat.com> wrote:
> > > > 
> > > > > On Fri, Jul 22, 2011 at 05:15:40PM +0900, KAMEZAWA Hiroyuki wrote:
> > > > > > +When under_hierarchy is added in the tail, the number indicates the
> > > > > > +total memcg scan of its children and itself.
> > > > > 
> > > > > In your implementation, statistics are only accounted to the memcg
> > > > > triggering the limit and the respectively scanned memcgs.
> > > > > 
> > > > > Consider the following setup:
> > > > > 
> > > > >         A
> > > > >        / \
> > > > >       B   C
> > > > >      /
> > > > >     D
> > > > > 
> > > > > If D tries to charge but hits the limit of A, then B's hierarchy
> > > > > counters do not reflect the reclaim activity resulting in D.
> > > > > 
> > > > yes, as I expected.
> > > 
> > > Andrew,
> > > 
> > > with a flawed design, the author unwilling to fix it, and two NAKs,
> > > can we please revert this before the release?
> > 
> > How about this ?
> 
> > @@ -1710,11 +1711,18 @@ static void mem_cgroup_record_scanstat(s
> >  	spin_lock(&memcg->scanstat.lock);
> >  	__mem_cgroup_record_scanstat(memcg->scanstat.stats[context], rec);
> >  	spin_unlock(&memcg->scanstat.lock);
> > -
> > -	memcg = rec->root;
> > -	spin_lock(&memcg->scanstat.lock);
> > -	__mem_cgroup_record_scanstat(memcg->scanstat.rootstats[context], rec);
> > -	spin_unlock(&memcg->scanstat.lock);
> > +	cgroup = memcg->css.cgroup;
> > +	do {
> > +		spin_lock(&memcg->scanstat.lock);
> > +		__mem_cgroup_record_scanstat(
> > +			memcg->scanstat.hierarchy_stats[context], rec);
> > +		spin_unlock(&memcg->scanstat.lock);
> > +		if (!cgroup->parent)
> > +			break;
> > +		cgroup = cgroup->parent;
> > +		memcg = mem_cgroup_from_cont(cgroup);
> > +	} while (memcg->use_hierarchy && memcg != rec->root);
> 
> Okay, so this looks correct, but it sums up all parents after each
> memcg scanned, which could have a performance impact.  Usually,
> hierarchy statistics are only summed up when a user reads them.
> 
Hmm. But sum-at-read doesn't work.

Assume 3 cgroups in a hierarchy.

	A
       /
      B
     /
    C

C's scans have three causes:
	C's scan caused by limit of A.
	C's scan caused by limit of B.
	C's scan caused by limit of C.

If we make the hierarchy sum at read time, we assume
	B's scan_stat = B's scan_stat + C's scan_stat
But precisely, this is

	B's scan_stat = B's scan_stat caused by B +
			B's scan_stat caused by A +
			C's scan_stat caused by C +
			C's scan_stat caused by B +
			C's scan_stat caused by A.

In the original version,
	B's scan_stat = B's scan_stat caused by B +
			C's scan_stat caused by B

After this patch,
	B's scan_stat = B's scan_stat caused by B +
			B's scan_stat caused by A +
			C's scan_stat caused by C +
			C's scan_stat caused by B +
			C's scan_stat caused by A.

Hmm...removing hierarchy part completely seems fine to me.


> I don't get why this has to be done completely different from the way
> we usually do things, without any justification, whatsoever.
> 
> Why do you want to pass a recording structure down the reclaim stack?

Just to reduce the number of passed variables.

> Why not make it per-cpu counters that are only summed up, together
> with the hierarchy values, when someone is actually interested in
> them?  With an interface like mem_cgroup_count_vm_event(), or maybe
> even an extension of that function?

A percpu counter seems like overkill to me because there is no heavy lock contention.


Thanks,
-Kame





^ permalink raw reply	[flat|nested] 54+ messages in thread


* Re: [patch] Revert "memcg: add memory.vmscan_stat"
  2011-08-30  7:20             ` KAMEZAWA Hiroyuki
@ 2011-08-30  7:35               ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 54+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-08-30  7:35 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Johannes Weiner, Andrew Morton, Daisuke Nishimura, Balbir Singh,
	Andrew Brestic, Ying Han, Michal Hocko, linux-mm, linux-kernel

On Tue, 30 Aug 2011 16:20:50 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> Hmm...removing hierarchy part completely seems fine to me.
> 
Another idea here.

==
Revert hierarchy support in vmscan_stat.

It turns out that further study and use cases are required.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 Documentation/cgroups/memory.txt |   27 ++-------------------------
 include/linux/memcontrol.h       |    1 -
 mm/memcontrol.c                  |   25 -------------------------
 3 files changed, 2 insertions(+), 51 deletions(-)

Index: mmotm-Aug29/Documentation/cgroups/memory.txt
===================================================================
--- mmotm-Aug29.orig/Documentation/cgroups/memory.txt
+++ mmotm-Aug29/Documentation/cgroups/memory.txt
@@ -448,8 +448,8 @@ memory cgroup creation and can be reset 
 
 This file contains following statistics.
 
-[param]_[file_or_anon]_pages_by_[reason]_[under_heararchy]
-[param]_elapsed_ns_by_[reason]_[under_hierarchy]
+[param]_[file_or_anon]_pages_by_[reason]
+[param]_elapsed_ns_by_[reason]
 
 For example,
 
@@ -470,9 +470,6 @@ Now, 2 reason are supported
   system - global memory pressure + softlimit
            (global memory pressure not under softlimit is not handled now)
 
-When under_hierarchy is added in the tail, the number indicates the
-total memcg scan of its children and itself.
-
 elapsed_ns is a elapsed time in nanosecond. This may include sleep time
 and not indicates CPU usage. So, please take this as just showing
 latency.
@@ -500,26 +497,6 @@ freed_pages_by_system 0
 freed_anon_pages_by_system 0
 freed_file_pages_by_system 0
 elapsed_ns_by_system 0
-scanned_pages_by_limit_under_hierarchy 9471864
-scanned_anon_pages_by_limit_under_hierarchy 6640629
-scanned_file_pages_by_limit_under_hierarchy 2831235
-rotated_pages_by_limit_under_hierarchy 4243974
-rotated_anon_pages_by_limit_under_hierarchy 3971968
-rotated_file_pages_by_limit_under_hierarchy 272006
-freed_pages_by_limit_under_hierarchy 2318492
-freed_anon_pages_by_limit_under_hierarchy 962052
-freed_file_pages_by_limit_under_hierarchy 1356440
-elapsed_ns_by_limit_under_hierarchy 351386416101
-scanned_pages_by_system_under_hierarchy 0
-scanned_anon_pages_by_system_under_hierarchy 0
-scanned_file_pages_by_system_under_hierarchy 0
-rotated_pages_by_system_under_hierarchy 0
-rotated_anon_pages_by_system_under_hierarchy 0
-rotated_file_pages_by_system_under_hierarchy 0
-freed_pages_by_system_under_hierarchy 0
-freed_anon_pages_by_system_under_hierarchy 0
-freed_file_pages_by_system_under_hierarchy 0
-elapsed_ns_by_system_under_hierarchy 0
 
 5.3 swappiness
 
Index: mmotm-Aug29/mm/memcontrol.c
===================================================================
--- mmotm-Aug29.orig/mm/memcontrol.c
+++ mmotm-Aug29/mm/memcontrol.c
@@ -229,7 +229,6 @@ enum {
 struct scanstat {
 	spinlock_t	lock;
 	unsigned long	stats[NR_SCAN_CONTEXT][NR_SCANSTATS];
-	unsigned long	rootstats[NR_SCAN_CONTEXT][NR_SCANSTATS];
 };
 
 const char *scanstat_string[NR_SCANSTATS] = {
@@ -246,7 +245,6 @@ const char *scanstat_string[NR_SCANSTATS
 };
 #define SCANSTAT_WORD_LIMIT	"_by_limit"
 #define SCANSTAT_WORD_SYSTEM	"_by_system"
-#define SCANSTAT_WORD_HIERARCHY	"_under_hierarchy"
 
 
 /*
@@ -1710,11 +1708,6 @@ static void mem_cgroup_record_scanstat(s
 	spin_lock(&memcg->scanstat.lock);
 	__mem_cgroup_record_scanstat(memcg->scanstat.stats[context], rec);
 	spin_unlock(&memcg->scanstat.lock);
-
-	memcg = rec->root;
-	spin_lock(&memcg->scanstat.lock);
-	__mem_cgroup_record_scanstat(memcg->scanstat.rootstats[context], rec);
-	spin_unlock(&memcg->scanstat.lock);
 }
 
 /*
@@ -1758,8 +1751,6 @@ static int mem_cgroup_hierarchical_recla
 	else
 		rec.context = SCAN_BY_LIMIT;
 
-	rec.root = root_memcg;
-
 	while (1) {
 		victim = mem_cgroup_select_victim(root_memcg);
 		if (victim == root_memcg) {
@@ -4728,20 +4719,6 @@ static int mem_cgroup_vmscan_stat_read(s
 		cb->fill(cb, string, memcg->scanstat.stats[SCAN_BY_SYSTEM][i]);
 	}
 
-	for (i = 0; i < NR_SCANSTATS; i++) {
-		strcpy(string, scanstat_string[i]);
-		strcat(string, SCANSTAT_WORD_LIMIT);
-		strcat(string, SCANSTAT_WORD_HIERARCHY);
-		cb->fill(cb,
-			string, memcg->scanstat.rootstats[SCAN_BY_LIMIT][i]);
-	}
-	for (i = 0; i < NR_SCANSTATS; i++) {
-		strcpy(string, scanstat_string[i]);
-		strcat(string, SCANSTAT_WORD_SYSTEM);
-		strcat(string, SCANSTAT_WORD_HIERARCHY);
-		cb->fill(cb,
-			string, memcg->scanstat.rootstats[SCAN_BY_SYSTEM][i]);
-	}
 	return 0;
 }
 
@@ -4752,8 +4729,6 @@ static int mem_cgroup_reset_vmscan_stat(
 
 	spin_lock(&memcg->scanstat.lock);
 	memset(&memcg->scanstat.stats, 0, sizeof(memcg->scanstat.stats));
-	memset(&memcg->scanstat.rootstats,
-		0, sizeof(memcg->scanstat.rootstats));
 	spin_unlock(&memcg->scanstat.lock);
 	return 0;
 }
Index: mmotm-Aug29/include/linux/memcontrol.h
===================================================================
--- mmotm-Aug29.orig/include/linux/memcontrol.h
+++ mmotm-Aug29/include/linux/memcontrol.h
@@ -42,7 +42,6 @@ extern unsigned long mem_cgroup_isolate_
 
 struct memcg_scanrecord {
 	struct mem_cgroup *mem; /* scanend memory cgroup */
-	struct mem_cgroup *root; /* scan target hierarchy root */
 	int context;		/* scanning context (see memcontrol.c) */
 	unsigned long nr_scanned[2]; /* the number of scanned pages */
 	unsigned long nr_rotated[2]; /* the number of rotated pages */



^ permalink raw reply	[flat|nested] 54+ messages in thread


* Re: [patch] Revert "memcg: add memory.vmscan_stat"
  2011-08-30  7:20             ` KAMEZAWA Hiroyuki
@ 2011-08-30  8:42               ` Johannes Weiner
  -1 siblings, 0 replies; 54+ messages in thread
From: Johannes Weiner @ 2011-08-30  8:42 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Andrew Morton, Daisuke Nishimura, Balbir Singh, Andrew Brestic,
	Ying Han, Michal Hocko, linux-mm, linux-kernel

On Tue, Aug 30, 2011 at 04:20:50PM +0900, KAMEZAWA Hiroyuki wrote:
> On Tue, 30 Aug 2011 09:04:24 +0200
> Johannes Weiner <jweiner@redhat.com> wrote:
> 
> > On Tue, Aug 30, 2011 at 10:12:33AM +0900, KAMEZAWA Hiroyuki wrote:
> > > @@ -1710,11 +1711,18 @@ static void mem_cgroup_record_scanstat(s
> > >  	spin_lock(&memcg->scanstat.lock);
> > >  	__mem_cgroup_record_scanstat(memcg->scanstat.stats[context], rec);
> > >  	spin_unlock(&memcg->scanstat.lock);
> > > -
> > > -	memcg = rec->root;
> > > -	spin_lock(&memcg->scanstat.lock);
> > > -	__mem_cgroup_record_scanstat(memcg->scanstat.rootstats[context], rec);
> > > -	spin_unlock(&memcg->scanstat.lock);
> > > +	cgroup = memcg->css.cgroup;
> > > +	do {
> > > +		spin_lock(&memcg->scanstat.lock);
> > > +		__mem_cgroup_record_scanstat(
> > > +			memcg->scanstat.hierarchy_stats[context], rec);
> > > +		spin_unlock(&memcg->scanstat.lock);
> > > +		if (!cgroup->parent)
> > > +			break;
> > > +		cgroup = cgroup->parent;
> > > +		memcg = mem_cgroup_from_cont(cgroup);
> > > +	} while (memcg->use_hierarchy && memcg != rec->root);
> > 
> > Okay, so this looks correct, but it sums up all parents after each
> > memcg scanned, which could have a performance impact.  Usually,
> > hierarchy statistics are only summed up when a user reads them.
> > 
> Hmm. But sum-at-read doesn't work.
> 
> Assume 3 cgroups in a hierarchy.
> 
> 	A
>        /
>       B
>      /
>     C
> 
> C's scan contains 3 causes.
> 	C's scan caused by limit of A.
> 	C's scan caused by limit of B.
> 	C's scan caused by limit of C.
>
> If we make hierarchy sum at read, we think
> 	B's scan_stat = B's scan_stat + C's scan_stat
> But to be precise, this is
> 
> 	B's scan_stat = B's scan_stat caused by B +
> 			B's scan_stat caused by A +
> 			C's scan_stat caused by C +
> 			C's scan_stat caused by B +
> 			C's scan_stat caused by A.
> 
> In the original version:
> 	B's scan_stat = B's scan_stat caused by B +
> 			C's scan_stat caused by B +
> 
> After this patch,
> 	B's scan_stat = B's scan_stat caused by B +
> 			B's scan_stat caused by A +
> 			C's scan_stat caused by C +
> 			C's scan_stat caused by B +
> 			C's scan_stat caused by A.
> 
> Hmm...removing hierarchy part completely seems fine to me.

I see.

You want to look at A and see whether its limit was responsible for
reclaim scans in any children.  IMO, that is asking the question
backwards.  Instead, there is a cgroup under reclaim and one wants to
find out the cause for that.  Not the other way round.

In my original proposal I suggested differentiating reclaim caused by
internal pressure (due to own limit) and reclaim caused by
external/hierarchical pressure (due to limits from parents).

If you want to find out why C is under reclaim, look at its reclaim
statistics.  If the _limit numbers are high, C's limit is the problem.
If the _hierarchical numbers are high, the problem is B, A, or
physical memory, so you check B for _limit and _hierarchical as well,
then move on to A.

Implementing this would be as easy as passing not only the memcg to
scan (victim) to the reclaim code, but also the memcg /causing/ the
reclaim (root_mem):

	root_mem == victim -> account to victim as _limit
	root_mem != victim -> account to victim as _hierarchical

This would make things much simpler and more natural, both the code
and the way of tracking down a problem, IMO.
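The rule above can be sketched in a few lines of plain userspace C. Note that the struct and function names below are hypothetical illustrations of the proposal, not code from any kernel tree:

```c
/* Toy model of the proposed accounting rule: a reclaim scan is charged
 * to the victim memcg as "_limit" when the memcg causing the pressure
 * (root_mem) is the victim itself, and as "_hierarchical" otherwise.
 * All names here are hypothetical. */
struct toy_memcg {
	unsigned long scanned_by_limit;		/* own limit caused the scan */
	unsigned long scanned_hierarchical;	/* a parent's limit caused it */
};

static void toy_record_scan(struct toy_memcg *victim,
			    struct toy_memcg *root_mem,
			    unsigned long nr_scanned)
{
	if (root_mem == victim)
		victim->scanned_by_limit += nr_scanned;		/* internal pressure */
	else
		victim->scanned_hierarchical += nr_scanned;	/* external pressure */
}
```

With this split, diagnosing why C is under reclaim starts with C's own two counters; only when the hierarchical numbers dominate does one walk up to B and then A.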

> > I don't get why this has to be done completely different from the way
> > we usually do things, without any justification, whatsoever.
> > 
> > Why do you want to pass a recording structure down the reclaim stack?
> 
> Just for reducing the number of passed variables.

It's still sitting at the bottom of the reclaim stack the whole time.

With my proposal, you would only need to pass the extra root_mem
pointer.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch] Revert "memcg: add memory.vmscan_stat"
  2011-08-30  8:42               ` Johannes Weiner
@ 2011-08-30  8:56                 ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 54+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-08-30  8:56 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Daisuke Nishimura, Balbir Singh, Andrew Brestic,
	Ying Han, Michal Hocko, linux-mm, linux-kernel

On Tue, 30 Aug 2011 10:42:45 +0200
Johannes Weiner <jweiner@redhat.com> wrote:

> On Tue, Aug 30, 2011 at 04:20:50PM +0900, KAMEZAWA Hiroyuki wrote:
> > On Tue, 30 Aug 2011 09:04:24 +0200
> > Johannes Weiner <jweiner@redhat.com> wrote:
> > 
> > > On Tue, Aug 30, 2011 at 10:12:33AM +0900, KAMEZAWA Hiroyuki wrote:
> > > > @@ -1710,11 +1711,18 @@ static void mem_cgroup_record_scanstat(s
> > > >  	spin_lock(&memcg->scanstat.lock);
> > > >  	__mem_cgroup_record_scanstat(memcg->scanstat.stats[context], rec);
> > > >  	spin_unlock(&memcg->scanstat.lock);
> > > > -
> > > > -	memcg = rec->root;
> > > > -	spin_lock(&memcg->scanstat.lock);
> > > > -	__mem_cgroup_record_scanstat(memcg->scanstat.rootstats[context], rec);
> > > > -	spin_unlock(&memcg->scanstat.lock);
> > > > +	cgroup = memcg->css.cgroup;
> > > > +	do {
> > > > +		spin_lock(&memcg->scanstat.lock);
> > > > +		__mem_cgroup_record_scanstat(
> > > > +			memcg->scanstat.hierarchy_stats[context], rec);
> > > > +		spin_unlock(&memcg->scanstat.lock);
> > > > +		if (!cgroup->parent)
> > > > +			break;
> > > > +		cgroup = cgroup->parent;
> > > > +		memcg = mem_cgroup_from_cont(cgroup);
> > > > +	} while (memcg->use_hierarchy && memcg != rec->root);
> > > 
> > > Okay, so this looks correct, but it sums up all parents after each
> > > memcg scanned, which could have a performance impact.  Usually,
> > > hierarchy statistics are only summed up when a user reads them.
> > > 
> > Hmm. But sum-at-read doesn't work.
> > 
> > Assume 3 cgroups in a hierarchy.
> > 
> > 	A
> >        /
> >       B
> >      /
> >     C
> > 
> > C's scan contains 3 causes.
> > 	C's scan caused by limit of A.
> > 	C's scan caused by limit of B.
> > 	C's scan caused by limit of C.
> >
> > If we make hierarchy sum at read, we think
> > 	B's scan_stat = B's scan_stat + C's scan_stat
> > But to be precise, this is
> > 
> > 	B's scan_stat = B's scan_stat caused by B +
> > 			B's scan_stat caused by A +
> > 			C's scan_stat caused by C +
> > 			C's scan_stat caused by B +
> > 			C's scan_stat caused by A.
> > 
> > In the original version:
> > 	B's scan_stat = B's scan_stat caused by B +
> > 			C's scan_stat caused by B +
> > 
> > After this patch,
> > 	B's scan_stat = B's scan_stat caused by B +
> > 			B's scan_stat caused by A +
> > 			C's scan_stat caused by C +
> > 			C's scan_stat caused by B +
> > 			C's scan_stat caused by A.
> > 
> > Hmm...removing hierarchy part completely seems fine to me.
> 
> I see.
> 
> You want to look at A and see whether its limit was responsible for
> reclaim scans in any children.  IMO, that is asking the question
> backwards.  Instead, there is a cgroup under reclaim and one wants to
> find out the cause for that.  Not the other way round.
> 
> In my original proposal I suggested differentiating reclaim caused by
> internal pressure (due to own limit) and reclaim caused by
> external/hierarchical pressure (due to limits from parents).
> 
> If you want to find out why C is under reclaim, look at its reclaim
> statistics.  If the _limit numbers are high, C's limit is the problem.
> If the _hierarchical numbers are high, the problem is B, A, or
> physical memory, so you check B for _limit and _hierarchical as well,
> then move on to A.
> 
> Implementing this would be as easy as passing not only the memcg to
> scan (victim) to the reclaim code, but also the memcg /causing/ the
> reclaim (root_mem):
> 
> 	root_mem == victim -> account to victim as _limit
> 	root_mem != victim -> account to victim as _hierarchical
> 
> This would make things much simpler and more natural, both the code
> and the way of tracking down a problem, IMO.
> 

hmm. I have no strong opinion.


> > > I don't get why this has to be done completely different from the way
> > > we usually do things, without any justification, whatsoever.
> > > 
> > > Why do you want to pass a recording structure down the reclaim stack?
> > 
> > Just for reducing the number of passed variables.
> 
> It's still sitting at the bottom of the reclaim stack the whole time.
> 
> With my proposal, you would only need to pass the extra root_mem
> pointer.
> 

I'm sorry if I'm missing something. Are you saying to add a function like

mem_cgroup_record_reclaim_stat(memcg, root_mem, anon_scan, anon_free, anon_rotate,
                               file_scan, file_free, elapsed_ns)

?

I'll prepare a patch, tomorrow.

Thanks,
-Kame







^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch] Revert "memcg: add memory.vmscan_stat"
  2011-08-30  8:56                 ` KAMEZAWA Hiroyuki
@ 2011-08-30 10:17                   ` Johannes Weiner
  -1 siblings, 0 replies; 54+ messages in thread
From: Johannes Weiner @ 2011-08-30 10:17 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Andrew Morton, Daisuke Nishimura, Balbir Singh, Andrew Brestic,
	Ying Han, Michal Hocko, linux-mm, linux-kernel

On Tue, Aug 30, 2011 at 05:56:09PM +0900, KAMEZAWA Hiroyuki wrote:
> On Tue, 30 Aug 2011 10:42:45 +0200
> Johannes Weiner <jweiner@redhat.com> wrote:
> 
> > On Tue, Aug 30, 2011 at 04:20:50PM +0900, KAMEZAWA Hiroyuki wrote:
> > > On Tue, 30 Aug 2011 09:04:24 +0200
> > > Johannes Weiner <jweiner@redhat.com> wrote:
> > > 
> > > > On Tue, Aug 30, 2011 at 10:12:33AM +0900, KAMEZAWA Hiroyuki wrote:
> > > > > @@ -1710,11 +1711,18 @@ static void mem_cgroup_record_scanstat(s
> > > > >  	spin_lock(&memcg->scanstat.lock);
> > > > >  	__mem_cgroup_record_scanstat(memcg->scanstat.stats[context], rec);
> > > > >  	spin_unlock(&memcg->scanstat.lock);
> > > > > -
> > > > > -	memcg = rec->root;
> > > > > -	spin_lock(&memcg->scanstat.lock);
> > > > > -	__mem_cgroup_record_scanstat(memcg->scanstat.rootstats[context], rec);
> > > > > -	spin_unlock(&memcg->scanstat.lock);
> > > > > +	cgroup = memcg->css.cgroup;
> > > > > +	do {
> > > > > +		spin_lock(&memcg->scanstat.lock);
> > > > > +		__mem_cgroup_record_scanstat(
> > > > > +			memcg->scanstat.hierarchy_stats[context], rec);
> > > > > +		spin_unlock(&memcg->scanstat.lock);
> > > > > +		if (!cgroup->parent)
> > > > > +			break;
> > > > > +		cgroup = cgroup->parent;
> > > > > +		memcg = mem_cgroup_from_cont(cgroup);
> > > > > +	} while (memcg->use_hierarchy && memcg != rec->root);
> > > > 
> > > > Okay, so this looks correct, but it sums up all parents after each
> > > > memcg scanned, which could have a performance impact.  Usually,
> > > > hierarchy statistics are only summed up when a user reads them.
> > > > 
> > > Hmm. But sum-at-read doesn't work.
> > > 
> > > Assume 3 cgroups in a hierarchy.
> > > 
> > > 	A
> > >        /
> > >       B
> > >      /
> > >     C
> > > 
> > > C's scan contains 3 causes.
> > > 	C's scan caused by limit of A.
> > > 	C's scan caused by limit of B.
> > > 	C's scan caused by limit of C.
> > >
> > > If we make hierarchy sum at read, we think
> > > 	B's scan_stat = B's scan_stat + C's scan_stat
> > > But to be precise, this is
> > > 
> > > 	B's scan_stat = B's scan_stat caused by B +
> > > 			B's scan_stat caused by A +
> > > 			C's scan_stat caused by C +
> > > 			C's scan_stat caused by B +
> > > 			C's scan_stat caused by A.
> > > 
> > > In the original version:
> > > 	B's scan_stat = B's scan_stat caused by B +
> > > 			C's scan_stat caused by B +
> > > 
> > > After this patch,
> > > 	B's scan_stat = B's scan_stat caused by B +
> > > 			B's scan_stat caused by A +
> > > 			C's scan_stat caused by C +
> > > 			C's scan_stat caused by B +
> > > 			C's scan_stat caused by A.
> > > 
> > > Hmm...removing hierarchy part completely seems fine to me.
> > 
> > I see.
> > 
> > You want to look at A and see whether its limit was responsible for
> > reclaim scans in any children.  IMO, that is asking the question
> > backwards.  Instead, there is a cgroup under reclaim and one wants to
> > find out the cause for that.  Not the other way round.
> > 
> > In my original proposal I suggested differentiating reclaim caused by
> > internal pressure (due to own limit) and reclaim caused by
> > external/hierarchical pressure (due to limits from parents).
> > 
> > If you want to find out why C is under reclaim, look at its reclaim
> > statistics.  If the _limit numbers are high, C's limit is the problem.
> > If the _hierarchical numbers are high, the problem is B, A, or
> > physical memory, so you check B for _limit and _hierarchical as well,
> > then move on to A.
> > 
> > Implementing this would be as easy as passing not only the memcg to
> > scan (victim) to the reclaim code, but also the memcg /causing/ the
> > reclaim (root_mem):
> > 
> > 	root_mem == victim -> account to victim as _limit
> > 	root_mem != victim -> account to victim as _hierarchical
> > 
> > This would make things much simpler and more natural, both the code
> > and the way of tracking down a problem, IMO.
> 
> hmm. I have no strong opinion.

I do :-)

> > > > I don't get why this has to be done completely different from the way
> > > > we usually do things, without any justification, whatsoever.
> > > > 
> > > > Why do you want to pass a recording structure down the reclaim stack?
> > > 
> > > Just for reducing the number of passed variables.
> > 
> > It's still sitting at the bottom of the reclaim stack the whole time.
> > 
> > With my proposal, you would only need to pass the extra root_mem
> > pointer.
> 
> I'm sorry if I'm missing something. Are you saying to add a function like
> 
> mem_cgroup_record_reclaim_stat(memcg, root_mem, anon_scan, anon_free, anon_rotate,
>                                file_scan, file_free, elapsed_ns)
> 
> ?

Exactly, though passing it a stat item index and a delta would
probably be closer to our other statistics accounting, i.e.:

	mem_cgroup_record_reclaim_stat(sc->mem_cgroup, sc->root_mem_cgroup,
				       MEM_CGROUP_SCAN_ANON, *nr_anon);

where sc->mem_cgroup is `victim' and sc->root_mem_cgroup is `root_mem'
from hierarchical_reclaim.  ->root_mem_cgroup might be confusing,
though.  I named it ->target_mem_cgroup in my patch set but I don't
feel too strongly about that.
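The suggested index/delta call shape can be sketched in plain userspace C. The enum values, struct, and function names below are hypothetical illustrations following the call shape quoted above, not code from any kernel tree:

```c
/* Hypothetical sketch of a stat-index + delta recording helper. */
enum toy_scan_stat {
	TOY_SCAN_ANON,
	TOY_SCAN_FILE,
	NR_TOY_SCAN_STATS,
};

struct toy_memcg {
	unsigned long limit[NR_TOY_SCAN_STATS];	/* internal pressure */
	unsigned long hier[NR_TOY_SCAN_STATS];	/* hierarchical pressure */
};

/* victim: memcg being scanned; root: memcg whose limit triggered it */
static void toy_record_reclaim_stat(struct toy_memcg *victim,
				    struct toy_memcg *root,
				    enum toy_scan_stat idx,
				    unsigned long delta)
{
	if (victim == root)
		victim->limit[idx] += delta;
	else
		victim->hier[idx] += delta;
}
```

At a call site this mirrors the mem_cgroup_record_reclaim_stat(sc->mem_cgroup, sc->root_mem_cgroup, ...) form suggested above, with one counter update per stat item instead of a record structure carried down the reclaim stack.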

Even better would be to reuse enum vm_event_item and at one point
merge all the accounting stuff into a single function and have one
single set of events that makes sense on a global level as well as on
a per-memcg level.

There is deviation in implementing similar things twice with slight
variations, and I don't see any justification for all that extra code
that needs maintaining.  Nor for counters that have similar names
globally and on a per-memcg level but with different meanings, like the
rotated counter.  Globally, a rotated page (PGROTATED) is one that is
moved back to the inactive list after writeback finishes.  Per-memcg,
the rotated counter is our internal heuristics value for balancing
pressure between the LRUs and means either rotated on the inactive
list, activated, or not activated but counted as activated because of
VM_EXEC etc.

I am still for reverting this patch before the release until we have
this all sorted out.  I feel rather strongly that these statistics
are in no way ready to be made part of the ABI and exported to
userspace as they are now.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch] Revert "memcg: add memory.vmscan_stat"
@ 2011-08-30 10:17                   ` Johannes Weiner
  0 siblings, 0 replies; 54+ messages in thread
From: Johannes Weiner @ 2011-08-30 10:17 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Andrew Morton, Daisuke Nishimura, Balbir Singh, Andrew Brestic,
	Ying Han, Michal Hocko, linux-mm, linux-kernel

On Tue, Aug 30, 2011 at 05:56:09PM +0900, KAMEZAWA Hiroyuki wrote:
> On Tue, 30 Aug 2011 10:42:45 +0200
> Johannes Weiner <jweiner@redhat.com> wrote:
> 
> > On Tue, Aug 30, 2011 at 04:20:50PM +0900, KAMEZAWA Hiroyuki wrote:
> > > On Tue, 30 Aug 2011 09:04:24 +0200
> > > Johannes Weiner <jweiner@redhat.com> wrote:
> > > 
> > > > On Tue, Aug 30, 2011 at 10:12:33AM +0900, KAMEZAWA Hiroyuki wrote:
> > > > > @@ -1710,11 +1711,18 @@ static void mem_cgroup_record_scanstat(s
> > > > >  	spin_lock(&memcg->scanstat.lock);
> > > > >  	__mem_cgroup_record_scanstat(memcg->scanstat.stats[context], rec);
> > > > >  	spin_unlock(&memcg->scanstat.lock);
> > > > > -
> > > > > -	memcg = rec->root;
> > > > > -	spin_lock(&memcg->scanstat.lock);
> > > > > -	__mem_cgroup_record_scanstat(memcg->scanstat.rootstats[context], rec);
> > > > > -	spin_unlock(&memcg->scanstat.lock);
> > > > > +	cgroup = memcg->css.cgroup;
> > > > > +	do {
> > > > > +		spin_lock(&memcg->scanstat.lock);
> > > > > +		__mem_cgroup_record_scanstat(
> > > > > +			memcg->scanstat.hierarchy_stats[context], rec);
> > > > > +		spin_unlock(&memcg->scanstat.lock);
> > > > > +		if (!cgroup->parent)
> > > > > +			break;
> > > > > +		cgroup = cgroup->parent;
> > > > > +		memcg = mem_cgroup_from_cont(cgroup);
> > > > > +	} while (memcg->use_hierarchy && memcg != rec->root);
> > > > 
> > > > Okay, so this looks correct, but it sums up all parents after each
> > > > memcg scanned, which could have a performance impact.  Usually,
> > > > hierarchy statistics are only summed up when a user reads them.
> > > > 
> > > Hmm. But sum-at-read doesn't work.
> > > 
> > > Assume 3 cgroups in a hierarchy.
> > > 
> > > 	A
> > >        /
> > >       B
> > >      /
> > >     C
> > > 
> > > C's scan contains 3 causes.
> > > 	C's scan caused by limit of A.
> > > 	C's scan caused by limit of B.
> > > 	C's scan caused by limit of C.
> > >
> > > If we make hierarchy sum at read, we think
> > > 	B's scan_stat = B's scan_stat + C's scan_stat
> > > But in precice, this is
> > > 
> > > 	B's scan_stat = B's scan_stat caused by B +
> > > 			B's scan_stat caused by A +
> > > 			C's scan_stat caused by C +
> > > 			C's scan_stat caused by B +
> > > 			C's scan_stat caused by A.
> > > 
> > > In orignal version.
> > > 	B's scan_stat = B's scan_stat caused by B +
> > > 			C's scan_stat caused by B +
> > > 
> > > After this patch,
> > > 	B's scan_stat = B's scan_stat caused by B +
> > > 			B's scan_stat caused by A +
> > > 			C's scan_stat caused by C +
> > > 			C's scan_stat caused by B +
> > > 			C's scan_stat caused by A.
> > > 
> > > Hmm...removing hierarchy part completely seems fine to me.
> > 
> > I see.
> > 
> > You want to look at A and see whether its limit was responsible for
> > reclaim scans in any children.  IMO, that is asking the question
> > backwards.  Instead, there is a cgroup under reclaim and one wants to
> > find out the cause for that.  Not the other way round.
> > 
> > In my original proposal I suggested differentiating reclaim caused by
> > internal pressure (due to own limit) and reclaim caused by
> > external/hierarchical pressure (due to limits from parents).
> > 
> > If you want to find out why C is under reclaim, look at its reclaim
> > statistics.  If the _limit numbers are high, C's limit is the problem.
> > If the _hierarchical numbers are high, the problem is B, A, or
> > physical memory, so you check B for _limit and _hierarchical as well,
> > then move on to A.
> > 
> > Implementing this would be as easy as passing not only the memcg to
> > scan (victim) to the reclaim code, but also the memcg /causing/ the
> > reclaim (root_mem):
> > 
> > 	root_mem == victim -> account to victim as _limit
> > 	root_mem != victim -> account to victim as _hierarchical
> > 
> > This would make things much simpler and more natural, both the code
> > and the way of tracking down a problem, IMO.
> 
> hmm. I have no strong opinion.

I do :-)

> > > > I don't get why this has to be done completely different from the way
> > > > we usually do things, without any justification, whatsoever.
> > > > 
> > > > Why do you want to pass a recording structure down the reclaim stack?
> > > 
> > > Just for reducing number of passed variables.
> > 
> > It's still sitting on bottom of the reclaim stack the whole time.
> > 
> > With my proposal, you would only need to pass the extra root_mem
> > pointer.
> 
> I'm sorry I miss something. Do you say to add a function like
> 
> mem_cgroup_record_reclaim_stat(memcg, root_mem, anon_scan, anon_free, anon_rotate,
>                                file_scan, file_free, elapsed_ns)
> 
> ?

Exactly, though passing it a stat item index and a delta would
probably be closer to our other statistics accounting, i.e.:

	mem_cgroup_record_reclaim_stat(sc->mem_cgroup, sc->root_mem_cgroup,
				       MEM_CGROUP_SCAN_ANON, *nr_anon);

where sc->mem_cgroup is `victim' and sc->root_mem_cgroup is `root_mem'
from hierarchical_reclaim.  ->root_mem_cgroup might be confusing,
though.  I named it ->target_mem_cgroup in my patch set but I don't
feel too strongly about that.

Even better would be to reuse enum vm_event_item and at one point
merge all the accounting stuff into a single function and have one
single set of events that makes sense on a global level as well as on
a per-memcg level.

There is deviation in implementing similar things twice with slight
variations, and I don't see any justification for all that extra code
that needs maintaining.  Or counters that have similar names globally
and on a per-memcg level but with different meanings, like the rotated
counter.  Globally, a rotated page (PGROTATED) is one that is moved
back to the inactive list after writeback finishes.  Per-memcg, the
rotated counter is our internal heuristics value to balance pressure
between LRUs and means either rotated on the inactive list, activated,
or not activated but counted as activated because of VM_EXEC etc.

I am still for reverting this patch before the release until we have
this all sorted out.  I feel rather strongly that these statistics are
in no way ready to make them part of the ABI and export them to
userspace as they are now.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch] Revert "memcg: add memory.vmscan_stat"
  2011-08-30 10:17                   ` Johannes Weiner
@ 2011-08-30 10:34                     ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 54+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-08-30 10:34 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Daisuke Nishimura, Balbir Singh, Andrew Brestic,
	Ying Han, Michal Hocko, linux-mm, linux-kernel

On Tue, 30 Aug 2011 12:17:26 +0200
Johannes Weiner <jweiner@redhat.com> wrote:

> On Tue, Aug 30, 2011 at 05:56:09PM +0900, KAMEZAWA Hiroyuki wrote:

> > > > > I don't get why this has to be done completely different from the way
> > > > > we usually do things, without any justification, whatsoever.
> > > > > 
> > > > > Why do you want to pass a recording structure down the reclaim stack?
> > > > 
> > > > Just for reducing the number of passed variables.
> > > 
> > > It's still sitting at the bottom of the reclaim stack the whole time.
> > > 
> > > With my proposal, you would only need to pass the extra root_mem
> > > pointer.
> > 
> > I'm sorry if I'm missing something. Are you suggesting adding a function like
> > 
> > mem_cgroup_record_reclaim_stat(memcg, root_mem, anon_scan, anon_free, anon_rotate,
> >                                file_scan, file_free, elapsed_ns)
> > 
> > ?
> 
> Exactly, though passing it a stat item index and a delta would
> probably be closer to our other statistics accounting, i.e.:
> 
> 	mem_cgroup_record_reclaim_stat(sc->mem_cgroup, sc->root_mem_cgroup,
> 				       MEM_CGROUP_SCAN_ANON, *nr_anon);
> 
> where sc->mem_cgroup is `victim' and sc->root_mem_cgroup is `root_mem'
> from hierarchical_reclaim.  ->root_mem_cgroup might be confusing,
> though.  I named it ->target_mem_cgroup in my patch set but I don't
> feel too strongly about that.
> 
> Even better would be to reuse enum vm_event_item and at one point
> merge all the accounting stuff into a single function and have one
> single set of events that makes sense on a global level as well as on
> a per-memcg level.
> 
> There is deviation in implementing similar things twice with slight
> variations, and I don't see any justification for all that extra code
> that needs maintaining.  Or counters that have similar names globally
> and on a per-memcg level but with different meanings, like the rotated
> counter.  Globally, a rotated page (PGROTATED) is one that is moved
> back to the inactive list after writeback finishes.  Per-memcg, the
> rotated counter is our internal heuristics value to balance pressure
> between LRUs and means either rotated on the inactive list, activated,
> or not activated but counted as activated because of VM_EXEC etc.
> 
> I am still for reverting this patch before the release until we have
> this all sorted out.  I feel rather strongly that these statistics are
> in no way ready to make them part of the ABI and export them to
> userspace as they are now.
> 

How about fixing the interface first?  The first version of this patch
was posted in April and there has been no big change since then.
I don't want to be starved any longer.

Thanks,
-Kame





* Re: [patch] Revert "memcg: add memory.vmscan_stat"
  2011-08-30 10:17                   ` Johannes Weiner
@ 2011-08-30 10:38                     ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 54+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-08-30 10:38 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Daisuke Nishimura, Balbir Singh, Andrew Brestic,
	Ying Han, Michal Hocko, linux-mm, linux-kernel

On Tue, 30 Aug 2011 12:17:26 +0200
Johannes Weiner <jweiner@redhat.com> wrote:

> On Tue, Aug 30, 2011 at 05:56:09PM +0900, KAMEZAWA Hiroyuki wrote:
> > On Tue, 30 Aug 2011 10:42:45 +0200
> > Johannes Weiner <jweiner@redhat.com> wrote:
 
> > > > Assume 3 cgroups in a hierarchy.
> > > > 
> > > > 	A
> > > >        /
> > > >       B
> > > >      /
> > > >     C
> > > > 
> > > > C's scan contains 3 causes.
> > > > 	C's scan caused by limit of A.
> > > > 	C's scan caused by limit of B.
> > > > 	C's scan caused by limit of C.
> > > >
> > > > If we compute the hierarchy sum at read time, we assume
> > > > 	B's scan_stat = B's scan_stat + C's scan_stat
> > > > But to be precise, this is
> > > > 
> > > > 	B's scan_stat = B's scan_stat caused by B +
> > > > 			B's scan_stat caused by A +
> > > > 			C's scan_stat caused by C +
> > > > 			C's scan_stat caused by B +
> > > > 			C's scan_stat caused by A.
> > > > 
> > > > In the original version:
> > > > 	B's scan_stat = B's scan_stat caused by B +
> > > > 			C's scan_stat caused by B +
> > > > 
> > > > After this patch,
> > > > 	B's scan_stat = B's scan_stat caused by B +
> > > > 			B's scan_stat caused by A +
> > > > 			C's scan_stat caused by C +
> > > > 			C's scan_stat caused by B +
> > > > 			C's scan_stat caused by A.
> > > > 
> > > > Hmm...removing hierarchy part completely seems fine to me.
> > > 
> > > I see.
> > > 
> > > You want to look at A and see whether its limit was responsible for
> > > reclaim scans in any children.  IMO, that is asking the question
> > > backwards.  Instead, there is a cgroup under reclaim and one wants to
> > > find out the cause for that.  Not the other way round.
> > > 
> > > In my original proposal I suggested differentiating reclaim caused by
> > > internal pressure (due to own limit) and reclaim caused by
> > > external/hierarchical pressure (due to limits from parents).
> > > 
> > > If you want to find out why C is under reclaim, look at its reclaim
> > > statistics.  If the _limit numbers are high, C's limit is the problem.
> > > If the _hierarchical numbers are high, the problem is B, A, or
> > > physical memory, so you check B for _limit and _hierarchical as well,
> > > then move on to A.
> > > 
> > > Implementing this would be as easy as passing not only the memcg to
> > > scan (victim) to the reclaim code, but also the memcg /causing/ the
> > > reclaim (root_mem):
> > > 
> > > 	root_mem == victim -> account to victim as _limit
> > > 	root_mem != victim -> account to victim as _hierarchical
> > > 
> > > This would make things much simpler and more natural, both the code
> > > and the way of tracking down a problem, IMO.
> > 
> > hmm. I have no strong opinion.
> 
> I do :-)
> 
BTW, how do we calculate C's LRU scans caused by A in the end?

            A
           /
          B
         /
        C

When C's LRU is scanned because of A's limit, where are the stats recorded?

If we record it in C, we lose where the memory pressure comes from.
If we record it in A, we lose where the scan happens.
I'm sorry, I'm a little confused.

Thanks,
-Kame







* Re: [patch] Revert "memcg: add memory.vmscan_stat"
  2011-08-30 10:34                     ` KAMEZAWA Hiroyuki
@ 2011-08-30 11:03                       ` Johannes Weiner
  -1 siblings, 0 replies; 54+ messages in thread
From: Johannes Weiner @ 2011-08-30 11:03 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Andrew Morton, Daisuke Nishimura, Balbir Singh, Andrew Brestic,
	Ying Han, Michal Hocko, linux-mm, linux-kernel

On Tue, Aug 30, 2011 at 07:34:06PM +0900, KAMEZAWA Hiroyuki wrote:
> On Tue, 30 Aug 2011 12:17:26 +0200
> Johannes Weiner <jweiner@redhat.com> wrote:
> 
> > On Tue, Aug 30, 2011 at 05:56:09PM +0900, KAMEZAWA Hiroyuki wrote:
> 
> > > > > > I don't get why this has to be done completely different from the way
> > > > > > we usually do things, without any justification, whatsoever.
> > > > > > 
> > > > > > Why do you want to pass a recording structure down the reclaim stack?
> > > > > 
> > > > > Just for reducing the number of passed variables.
> > > > 
> > > > It's still sitting at the bottom of the reclaim stack the whole time.
> > > > 
> > > > With my proposal, you would only need to pass the extra root_mem
> > > > pointer.
> > > 
> > > I'm sorry if I'm missing something. Are you suggesting adding a function like
> > > 
> > > mem_cgroup_record_reclaim_stat(memcg, root_mem, anon_scan, anon_free, anon_rotate,
> > >                                file_scan, file_free, elapsed_ns)
> > > 
> > > ?
> > 
> > Exactly, though passing it a stat item index and a delta would
> > probably be closer to our other statistics accounting, i.e.:
> > 
> > 	mem_cgroup_record_reclaim_stat(sc->mem_cgroup, sc->root_mem_cgroup,
> > 				       MEM_CGROUP_SCAN_ANON, *nr_anon);
> > 
> > where sc->mem_cgroup is `victim' and sc->root_mem_cgroup is `root_mem'
> > from hierarchical_reclaim.  ->root_mem_cgroup might be confusing,
> > though.  I named it ->target_mem_cgroup in my patch set but I don't
> > feel too strongly about that.
> > 
> > Even better would be to reuse enum vm_event_item and at one point
> > merge all the accounting stuff into a single function and have one
> > single set of events that makes sense on a global level as well as on
> > a per-memcg level.
> > 
> > There is deviation in implementing similar things twice with slight
> > variations, and I don't see any justification for all that extra code
> > that needs maintaining.  Or counters that have similar names globally
> > and on a per-memcg level but with different meanings, like the rotated
> > counter.  Globally, a rotated page (PGROTATED) is one that is moved
> > back to the inactive list after writeback finishes.  Per-memcg, the
> > rotated counter is our internal heuristics value to balance pressure
> > between LRUs and means either rotated on the inactive list, activated,
> > or not activated but counted as activated because of VM_EXEC etc.
> > 
> > I am still for reverting this patch before the release until we have
> > this all sorted out.  I feel rather strongly that these statistics are
> > in no way ready to make them part of the ABI and export them to
> > userspace as they are now.
> 
> How about fixing the interface first?  The first version of this patch
> was posted in April and there has been no big change since then.
> I don't want to be starved any longer.

Back then I mentioned all my concerns and alternate suggestions.
Unlike you, I explained and provided a reason for every single
counter I wanted to add, and suggested a basic pattern for how to
interpret them to gain insight into memcg configurations and their
behaviour.  No reaction.  If you want to make progress, then don't
ignore concerns and arguments.  If my arguments are crap, then tell me
why and we can move on.

What we have now is not ready.  It wasn't discussed properly, which
certainly wasn't for the lack of interest in this change.  I just got
tired of raising the same points over and over again without answer.

The amount of time a change has been around is not an argument for it
to get merged.  On the other hand, the fact that it hasn't changed
since April *even though* the implementation was opposed back then
doesn't really speak for your way of getting this upstream, does it?


* Re: [patch] Revert "memcg: add memory.vmscan_stat"
  2011-08-30 10:38                     ` KAMEZAWA Hiroyuki
@ 2011-08-30 11:32                       ` Johannes Weiner
  -1 siblings, 0 replies; 54+ messages in thread
From: Johannes Weiner @ 2011-08-30 11:32 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Andrew Morton, Daisuke Nishimura, Balbir Singh, Andrew Brestic,
	Ying Han, Michal Hocko, linux-mm, linux-kernel

On Tue, Aug 30, 2011 at 07:38:39PM +0900, KAMEZAWA Hiroyuki wrote:
> On Tue, 30 Aug 2011 12:17:26 +0200
> Johannes Weiner <jweiner@redhat.com> wrote:
> 
> > On Tue, Aug 30, 2011 at 05:56:09PM +0900, KAMEZAWA Hiroyuki wrote:
> > > On Tue, 30 Aug 2011 10:42:45 +0200
> > > Johannes Weiner <jweiner@redhat.com> wrote:
>  
> > > > > Assume 3 cgroups in a hierarchy.
> > > > > 
> > > > > 	A
> > > > >        /
> > > > >       B
> > > > >      /
> > > > >     C
> > > > > 
> > > > > C's scan contains 3 causes.
> > > > > 	C's scan caused by limit of A.
> > > > > 	C's scan caused by limit of B.
> > > > > 	C's scan caused by limit of C.
> > > > >
> > > > > If we compute the hierarchy sum at read time, we assume
> > > > > 	B's scan_stat = B's scan_stat + C's scan_stat
> > > > > But to be precise, this is
> > > > > 
> > > > > 	B's scan_stat = B's scan_stat caused by B +
> > > > > 			B's scan_stat caused by A +
> > > > > 			C's scan_stat caused by C +
> > > > > 			C's scan_stat caused by B +
> > > > > 			C's scan_stat caused by A.
> > > > > 
> > > > > In the original version:
> > > > > 	B's scan_stat = B's scan_stat caused by B +
> > > > > 			C's scan_stat caused by B +
> > > > > 
> > > > > After this patch,
> > > > > 	B's scan_stat = B's scan_stat caused by B +
> > > > > 			B's scan_stat caused by A +
> > > > > 			C's scan_stat caused by C +
> > > > > 			C's scan_stat caused by B +
> > > > > 			C's scan_stat caused by A.
> > > > > 
> > > > > Hmm...removing hierarchy part completely seems fine to me.
> > > > 
> > > > I see.
> > > > 
> > > > You want to look at A and see whether its limit was responsible for
> > > > reclaim scans in any children.  IMO, that is asking the question
> > > > backwards.  Instead, there is a cgroup under reclaim and one wants to
> > > > find out the cause for that.  Not the other way round.
> > > > 
> > > > In my original proposal I suggested differentiating reclaim caused by
> > > > internal pressure (due to own limit) and reclaim caused by
> > > > external/hierarchical pressure (due to limits from parents).
> > > > 
> > > > If you want to find out why C is under reclaim, look at its reclaim
> > > > statistics.  If the _limit numbers are high, C's limit is the problem.
> > > > If the _hierarchical numbers are high, the problem is B, A, or
> > > > physical memory, so you check B for _limit and _hierarchical as well,
> > > > then move on to A.
> > > > 
> > > > Implementing this would be as easy as passing not only the memcg to
> > > > scan (victim) to the reclaim code, but also the memcg /causing/ the
> > > > reclaim (root_mem):
> > > > 
> > > > 	root_mem == victim -> account to victim as _limit
> > > > 	root_mem != victim -> account to victim as _hierarchical
> > > > 
> > > > This would make things much simpler and more natural, both the code
> > > > and the way of tracking down a problem, IMO.
> > > 
> > > hmm. I have no strong opinion.
> > 
> > I do :-)
> > 
> BTW, how do we calculate C's LRU scans caused by A in the end?
> 
>             A
>            /
>           B
>          /
>         C
> 
> When C's LRU is scanned because of A's limit, where are the stats recorded?
> 
> If we record it in C, we lose where the memory pressure comes from.

It's recorded in C as 'scanned due to parent'.

If you want to track down where pressure comes from, you check the
outer container, B.  If B is scanned due to internal pressure, you
know that C's external pressure comes from B.  If B is scanned due to
external pressure, you know that B's and C's pressure comes from A or
the physical memory limit (the outermost container, so to speak).

The containers are nested.  If C is scanned because of the limit in A,
then this concerns B as well and B must be scanned too, as
C's usage is fully contained in B.

There is not really a direct connection between C and A that is
irrelevant to B, so I see no need to record in C which parent was the
cause of the pressure.  Just that it was /a/ parent and not itself.
Then you can follow the pressure up the hierarchy tree.

Answer to your original question:

	C_scan_due_to_A = C_scan_external - B_scan_internal - A_scan_external

But IMO, having this exact number is not necessary to find the reason
for why C is experiencing memory pressure in the first place, and I
assume that this is the goal.


* Re: [patch] Revert "memcg: add memory.vmscan_stat"
@ 2011-08-30 11:32                       ` Johannes Weiner
  0 siblings, 0 replies; 54+ messages in thread
From: Johannes Weiner @ 2011-08-30 11:32 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Andrew Morton, Daisuke Nishimura, Balbir Singh, Andrew Brestic,
	Ying Han, Michal Hocko, linux-mm, linux-kernel

On Tue, Aug 30, 2011 at 07:38:39PM +0900, KAMEZAWA Hiroyuki wrote:
> On Tue, 30 Aug 2011 12:17:26 +0200
> Johannes Weiner <jweiner@redhat.com> wrote:
> 
> > On Tue, Aug 30, 2011 at 05:56:09PM +0900, KAMEZAWA Hiroyuki wrote:
> > > On Tue, 30 Aug 2011 10:42:45 +0200
> > > Johannes Weiner <jweiner@redhat.com> wrote:
>  
> > > > > Assume 3 cgroups in a hierarchy.
> > > > > 
> > > > > 	A
> > > > >        /
> > > > >       B
> > > > >      /
> > > > >     C
> > > > > 
> > > > > C's scan contains 3 causes.
> > > > > 	C's scan caused by limit of A.
> > > > > 	C's scan caused by limit of B.
> > > > > 	C's scan caused by limit of C.
> > > > >
> > > > > If we make hierarchy sum at read, we think
> > > > > 	B's scan_stat = B's scan_stat + C's scan_stat
> > > > > But in precice, this is
> > > > > 
> > > > > 	B's scan_stat = B's scan_stat caused by B +
> > > > > 			B's scan_stat caused by A +
> > > > > 			C's scan_stat caused by C +
> > > > > 			C's scan_stat caused by B +
> > > > > 			C's scan_stat caused by A.
> > > > > 
> > > > > In orignal version.
> > > > > 	B's scan_stat = B's scan_stat caused by B +
> > > > > 			C's scan_stat caused by B +
> > > > > 
> > > > > After this patch,
> > > > > 	B's scan_stat = B's scan_stat caused by B +
> > > > > 			B's scan_stat caused by A +
> > > > > 			C's scan_stat caused by C +
> > > > > 			C's scan_stat caused by B +
> > > > > 			C's scan_stat caused by A.
> > > > > 
> > > > > Hmm...removing hierarchy part completely seems fine to me.
> > > > 
> > > > I see.
> > > > 
> > > > You want to look at A and see whether its limit was responsible for
> > > > reclaim scans in any children.  IMO, that is asking the question
> > > > backwards.  Instead, there is a cgroup under reclaim and one wants to
> > > > find out the cause for that.  Not the other way round.
> > > > 
> > > > In my original proposal I suggested differentiating reclaim caused by
> > > > internal pressure (due to own limit) and reclaim caused by
> > > > external/hierarchical pressure (due to limits from parents).
> > > > 
> > > > If you want to find out why C is under reclaim, look at its reclaim
> > > > statistics.  If the _limit numbers are high, C's limit is the problem.
> > > > If the _hierarchical numbers are high, the problem is B, A, or
> > > > physical memory, so you check B for _limit and _hierarchical as well,
> > > > then move on to A.
> > > > 
> > > > Implementing this would be as easy as passing not only the memcg to
> > > > scan (victim) to the reclaim code, but also the memcg /causing/ the
> > > > reclaim (root_mem):
> > > > 
> > > > 	root_mem == victim -> account to victim as _limit
> > > > 	root_mem != victim -> account to victim as _hierarchical
> > > > 
> > > > This would make things much simpler and more natural, both the code
> > > > and the way of tracking down a problem, IMO.
> > > 
> > > hmm. I have no strong opinion.
> > 
> > I do :-)
> > 
> BTW,  how to calculate C's lru scan caused by A finally ?
> 
>             A
>            /
>           B
>          /
>         C
> 
> At scanning LRU of C because of A's limit, where stats are recorded ?
> 
> If we record it in C, we lose where the memory pressure comes from.

It's recorded in C as 'scanned due to parent'.

If you want to track down where pressure comes from, you check the
outer container, B.  If B is scanned due to internal pressure, you
know that C's external pressure comes from B.  If B is scanned due to
external pressure, you know that B's and C's pressure comes from A or
the physical memory limit (the outermost container, so to speak).

The containers are nested.  If C is scanned because of the limit in A,
then this concerns B as well, and B must be scanned too, as C's usage
is fully contained in B.

There is not really a direct connection between C and A that is
irrelevant to B, so I see no need to record in C which parent was the
cause of the pressure.  Just that it was /a/ parent and not itself.
Then you can follow the pressure up the hierarchy tree.

Answer to your original question:

	C_scan_due_to_A = C_scan_external - B_scan_internal - A_scan_external

But IMO, having this exact number is not necessary to find the reason
for why C is experiencing memory pressure in the first place, and I
assume that this is the goal.
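As a rough illustration of the scheme proposed above, here is a small simulation in plain Python (not the kernel code; the class and function names are invented for this sketch): a scan of a victim memcg is accounted to the victim itself, as internal when the memcg causing the reclaim is the victim, and as external when the pressure comes from an ancestor, and the pressure source is then found by walking up the hierarchy.

```python
# Hypothetical sketch of the proposed internal/external scan accounting.
# Not kernel code; names are invented for illustration.

class Memcg:
    def __init__(self, name, parent=None):
        self.name, self.parent = name, parent
        self.scan_internal = 0   # scanned due to the memcg's own limit
        self.scan_external = 0   # scanned due to a parent's limit

def account_scan(victim, root_mem, pages):
    # root_mem is the memcg whose limit caused the reclaim.
    if victim is root_mem:
        victim.scan_internal += pages
    else:
        victim.scan_external += pages

def pressure_source(memcg):
    """Walk outward to find where the reclaim pressure originates."""
    node = memcg
    while node.parent is not None:
        if node.scan_internal >= node.scan_external:
            return node          # this node's own limit dominates
        node = node.parent
    return node                  # reached the top: global pressure

# A -> B -> C, with B's limit driving reclaim of both B and C:
A = Memcg("A"); B = Memcg("B", A); C = Memcg("C", B)
account_scan(B, B, 100)          # B scanned due to its own limit
account_scan(C, B, 80)           # C scanned hierarchically, caused by B
assert C.scan_external == 80 and C.scan_internal == 0
assert pressure_source(C).name == "B"
```

Following the pattern described above, C only records that *a* parent caused the scanning; identifying which parent is done by checking each ancestor's internal vs. external counters in turn.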

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch] Revert "memcg: add memory.vmscan_stat"
  2011-08-30 11:32                       ` Johannes Weiner
@ 2011-08-30 23:29                         ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 54+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-08-30 23:29 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Daisuke Nishimura, Balbir Singh, Andrew Brestic,
	Ying Han, Michal Hocko, linux-mm, linux-kernel

On Tue, 30 Aug 2011 13:32:21 +0200
Johannes Weiner <jweiner@redhat.com> wrote:

> On Tue, Aug 30, 2011 at 07:38:39PM +0900, KAMEZAWA Hiroyuki wrote:
> > On Tue, 30 Aug 2011 12:17:26 +0200
> > Johannes Weiner <jweiner@redhat.com> wrote:
> > 
> > > On Tue, Aug 30, 2011 at 05:56:09PM +0900, KAMEZAWA Hiroyuki wrote:
> > > > On Tue, 30 Aug 2011 10:42:45 +0200
> > > > Johannes Weiner <jweiner@redhat.com> wrote:
> >  
> > > > > > Assume 3 cgroups in a hierarchy.
> > > > > > 
> > > > > > 	A
> > > > > >        /
> > > > > >       B
> > > > > >      /
> > > > > >     C
> > > > > > 
> > > > > > C's scan contains 3 causes.
> > > > > > 	C's scan caused by limit of A.
> > > > > > 	C's scan caused by limit of B.
> > > > > > 	C's scan caused by limit of C.
> > > > > >
> > > > > > If we make hierarchy sum at read, we think
> > > > > > 	B's scan_stat = B's scan_stat + C's scan_stat
> > > > > > But to be precise, this is
> > > > > > 
> > > > > > 	B's scan_stat = B's scan_stat caused by B +
> > > > > > 			B's scan_stat caused by A +
> > > > > > 			C's scan_stat caused by C +
> > > > > > 			C's scan_stat caused by B +
> > > > > > 			C's scan_stat caused by A.
> > > > > > 
> > > > > > In the original version:
> > > > > > 	B's scan_stat = B's scan_stat caused by B +
> > > > > > 			C's scan_stat caused by B +
> > > > > > 
> > > > > > After this patch,
> > > > > > 	B's scan_stat = B's scan_stat caused by B +
> > > > > > 			B's scan_stat caused by A +
> > > > > > 			C's scan_stat caused by C +
> > > > > > 			C's scan_stat caused by B +
> > > > > > 			C's scan_stat caused by A.
> > > > > > 
> > > > > > Hmm...removing hierarchy part completely seems fine to me.
> > > > > 
> > > > > I see.
> > > > > 
> > > > > You want to look at A and see whether its limit was responsible for
> > > > > reclaim scans in any children.  IMO, that is asking the question
> > > > > backwards.  Instead, there is a cgroup under reclaim and one wants to
> > > > > find out the cause for that.  Not the other way round.
> > > > > 
> > > > > In my original proposal I suggested differentiating reclaim caused by
> > > > > internal pressure (due to own limit) and reclaim caused by
> > > > > external/hierarchical pressure (due to limits from parents).
> > > > > 
> > > > > If you want to find out why C is under reclaim, look at its reclaim
> > > > > statistics.  If the _limit numbers are high, C's limit is the problem.
> > > > > If the _hierarchical numbers are high, the problem is B, A, or
> > > > > physical memory, so you check B for _limit and _hierarchical as well,
> > > > > then move on to A.
> > > > > 
> > > > > Implementing this would be as easy as passing not only the memcg to
> > > > > scan (victim) to the reclaim code, but also the memcg /causing/ the
> > > > > reclaim (root_mem):
> > > > > 
> > > > > 	root_mem == victim -> account to victim as _limit
> > > > > 	root_mem != victim -> account to victim as _hierarchical
> > > > > 
> > > > > This would make things much simpler and more natural, both the code
> > > > > and the way of tracking down a problem, IMO.
> > > > 
> > > > hmm. I have no strong opinion.
> > > 
> > > I do :-)
> > > 
> > BTW,  how to calculate C's lru scan caused by A finally ?
> > 
> >             A
> >            /
> >           B
> >          /
> >         C
> > 
> > At scanning LRU of C because of A's limit, where stats are recorded ?
> > 
> > If we record it in C, we lose where the memory pressure comes from.
> 
> It's recorded in C as 'scanned due to parent'.
> 
> If you want to track down where pressure comes from, you check the
> outer container, B.  If B is scanned due to internal pressure, you
> know that C's external pressure comes from B.  If B is scanned due to
> external pressure, you know that B's and C's pressure comes from A or
> the physical memory limit (the outermost container, so to speak).
> 
> The containers are nested.  If C is scanned because of the limit in A,
> then this concerns B as well, and B must be scanned too, as
> C's usage is fully contained in B.
> 
> There is not really a direct connection between C and A that is
> irrelevant to B, so I see no need to record in C which parent was the
> cause of the pressure.  Just that it was /a/ parent and not itself.
> Then you can follow the pressure up the hierarchy tree.
> 
> Answer to your original question:
> 
> 	C_scan_due_to_A = C_scan_external - B_scan_internal - A_scan_external
> 

I'm confused. 

If vmscan is scanning in C's LRU,
	(memcg == root) : C_scan_internal ++
	(memcg != root) : C_scan_external ++

Why does A_scan_external exist? Is it always 0?

I think we can never get these numbers.

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch] Revert "memcg: add memory.vmscan_stat"
  2011-08-30 11:03                       ` Johannes Weiner
@ 2011-08-30 23:38                         ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 54+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-08-30 23:38 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Daisuke Nishimura, Balbir Singh, Andrew Brestic,
	Ying Han, Michal Hocko, linux-mm, linux-kernel

On Tue, 30 Aug 2011 13:03:37 +0200
Johannes Weiner <jweiner@redhat.com> wrote:

> On Tue, Aug 30, 2011 at 07:34:06PM +0900, KAMEZAWA Hiroyuki wrote:
> > On Tue, 30 Aug 2011 12:17:26 +0200
> > Johannes Weiner <jweiner@redhat.com> wrote:

> > How about fixing the interface first? The 1st version of this patch
> > was posted in April, and there has been no big change since then.
> > I don't want to be starved any longer.
> 
> Back then I mentioned all my concerns and alternate suggestions.
> Different from you, I explained and provided a reason for every single
> counter I wanted to add, suggested a basic pattern for how to
> interpret them to gain insight into memcg configurations and their
> behaviour.  No reaction.  If you want to make progress, then don't
> ignore concerns and arguments.  If my arguments are crap, then tell me
> why and we can move on.
> 

I think having a percpu counter brings no performance benefit here; it
just wastes extra memory on the percpu allocation.
Anyway, the internal implementation can be changed whenever necessary.

But OK, I agree that using the same style as the zone counters is better.

> What we have now is not ready.  It wasn't discussed properly, which
> certainly wasn't for the lack of interest in this change.  I just got
> tired of raising the same points over and over again without answer.
> 
> The amount of time a change has been around is not an argument for it
> to get merged.  On the other hand, the fact that it hasn't changed
> since April *even though* the implementation was opposed back then
> doesn't really speak for your way of getting this upstream, does it?

The fact is that the patch is merged in mmotm, so you should revert it there.

Please revert the patch and merge your own version.
Anyway, I don't have much interest in the hierarchy part.

Bye,
-Kame





^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch] Revert "memcg: add memory.vmscan_stat"
  2011-08-30 23:29                         ` KAMEZAWA Hiroyuki
@ 2011-08-31  6:23                           ` Johannes Weiner
  -1 siblings, 0 replies; 54+ messages in thread
From: Johannes Weiner @ 2011-08-31  6:23 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Andrew Morton, Daisuke Nishimura, Balbir Singh, Andrew Brestic,
	Ying Han, Michal Hocko, linux-mm, linux-kernel

On Wed, Aug 31, 2011 at 08:29:24AM +0900, KAMEZAWA Hiroyuki wrote:
> On Tue, 30 Aug 2011 13:32:21 +0200
> Johannes Weiner <jweiner@redhat.com> wrote:
> 
> > On Tue, Aug 30, 2011 at 07:38:39PM +0900, KAMEZAWA Hiroyuki wrote:
> > > On Tue, 30 Aug 2011 12:17:26 +0200
> > > Johannes Weiner <jweiner@redhat.com> wrote:
> > > 
> > > > On Tue, Aug 30, 2011 at 05:56:09PM +0900, KAMEZAWA Hiroyuki wrote:
> > > > > On Tue, 30 Aug 2011 10:42:45 +0200
> > > > > Johannes Weiner <jweiner@redhat.com> wrote:
> > >  
> > > > > > > Assume 3 cgroups in a hierarchy.
> > > > > > > 
> > > > > > > 	A
> > > > > > >        /
> > > > > > >       B
> > > > > > >      /
> > > > > > >     C
> > > > > > > 
> > > > > > > C's scan contains 3 causes.
> > > > > > > 	C's scan caused by limit of A.
> > > > > > > 	C's scan caused by limit of B.
> > > > > > > 	C's scan caused by limit of C.
> > > > > > >
> > > > > > > If we make hierarchy sum at read, we think
> > > > > > > 	B's scan_stat = B's scan_stat + C's scan_stat
> > > > > > > But to be precise, this is
> > > > > > > 
> > > > > > > 	B's scan_stat = B's scan_stat caused by B +
> > > > > > > 			B's scan_stat caused by A +
> > > > > > > 			C's scan_stat caused by C +
> > > > > > > 			C's scan_stat caused by B +
> > > > > > > 			C's scan_stat caused by A.
> > > > > > > 
> > > > > > > In the original version:
> > > > > > > 	B's scan_stat = B's scan_stat caused by B +
> > > > > > > 			C's scan_stat caused by B +
> > > > > > > 
> > > > > > > After this patch,
> > > > > > > 	B's scan_stat = B's scan_stat caused by B +
> > > > > > > 			B's scan_stat caused by A +
> > > > > > > 			C's scan_stat caused by C +
> > > > > > > 			C's scan_stat caused by B +
> > > > > > > 			C's scan_stat caused by A.
> > > > > > > 
> > > > > > > Hmm...removing hierarchy part completely seems fine to me.
> > > > > > 
> > > > > > I see.
> > > > > > 
> > > > > > You want to look at A and see whether its limit was responsible for
> > > > > > reclaim scans in any children.  IMO, that is asking the question
> > > > > > backwards.  Instead, there is a cgroup under reclaim and one wants to
> > > > > > find out the cause for that.  Not the other way round.
> > > > > > 
> > > > > > In my original proposal I suggested differentiating reclaim caused by
> > > > > > internal pressure (due to own limit) and reclaim caused by
> > > > > > external/hierarchical pressure (due to limits from parents).
> > > > > > 
> > > > > > If you want to find out why C is under reclaim, look at its reclaim
> > > > > > statistics.  If the _limit numbers are high, C's limit is the problem.
> > > > > > If the _hierarchical numbers are high, the problem is B, A, or
> > > > > > physical memory, so you check B for _limit and _hierarchical as well,
> > > > > > then move on to A.
> > > > > > 
> > > > > > Implementing this would be as easy as passing not only the memcg to
> > > > > > scan (victim) to the reclaim code, but also the memcg /causing/ the
> > > > > > reclaim (root_mem):
> > > > > > 
> > > > > > 	root_mem == victim -> account to victim as _limit
> > > > > > 	root_mem != victim -> account to victim as _hierarchical
> > > > > > 
> > > > > > This would make things much simpler and more natural, both the code
> > > > > > and the way of tracking down a problem, IMO.
> > > > > 
> > > > > hmm. I have no strong opinion.
> > > > 
> > > > I do :-)
> > > > 
> > > BTW,  how to calculate C's lru scan caused by A finally ?
> > > 
> > >             A
> > >            /
> > >           B
> > >          /
> > >         C
> > > 
> > > At scanning LRU of C because of A's limit, where stats are recorded ?
> > > 
> > > If we record it in C, we lose where the memory pressure comes from.
> > 
> > It's recorded in C as 'scanned due to parent'.
> > 
> > If you want to track down where pressure comes from, you check the
> > outer container, B.  If B is scanned due to internal pressure, you
> > know that C's external pressure comes from B.  If B is scanned due to
> > external pressure, you know that B's and C's pressure comes from A or
> > the physical memory limit (the outermost container, so to speak).
> > 
> > The containers are nested.  If C is scanned because of the limit in A,
> > then this concerns B as well, and B must be scanned too, as
> > C's usage is fully contained in B.
> > 
> > There is not really a direct connection between C and A that is
> > irrelevant to B, so I see no need to record in C which parent was the
> > cause of the pressure.  Just that it was /a/ parent and not itself.
> > Then you can follow the pressure up the hierarchy tree.
> > 
> > Answer to your original question:
> > 
> > 	C_scan_due_to_A = C_scan_external - B_scan_internal - A_scan_external
> > 
> 
> I'm confused. 
> 
> If vmscan is scanning in C's LRU,
> 	(memcg == root) : C_scan_internal ++
> 	(memcg != root) : C_scan_external ++

Yes.

> Why A_scan_external exists ? It's 0 ?
> 
> I think we can never get numbers.

Kswapd/direct reclaim should probably be accounted as A_external:
A has no limit, so the reclaim pressure cannot be internal.

On the other hand, one could see the amount of physical memory in the
machine as A's limit and account global reclaim as A_internal.

I think the former may be more natural.

That aside, all memcgs should have the same statistics, obviously.
Scripts can easily deal with counters being zero.  If items differ
between cgroups, that would suck a lot.
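The convention suggested here can be sketched with a tiny, self-contained simulation in plain Python (names invented; not the kernel implementation): with no causing memcg standing in for global kswapd/direct reclaim, pressure on a top-level group A is still accounted as external, since A's own limit did not trigger it.

```python
# Hypothetical sketch: root_mem=None stands for global (kswapd / direct)
# reclaim.  A top-level memcg A then accumulates external scans, because
# reclaim not caused by A's own limit is never "internal".

class Memcg:
    def __init__(self, name):
        self.name = name
        self.scan_internal = 0   # scanned due to own limit
        self.scan_external = 0   # scanned due to outside pressure

def account_scan(victim, root_mem, pages):
    # Global reclaim passes root_mem=None, which never matches the
    # victim, so it always lands in the external counter.
    if root_mem is victim:
        victim.scan_internal += pages
    else:
        victim.scan_external += pages

A = Memcg("A")
account_scan(A, None, 50)        # kswapd scanning A's pages
assert A.scan_external == 50 and A.scan_internal == 0
```

With this choice, every memcg carries the same pair of counters, and a counter that does not apply simply stays zero, which keeps parsing scripts uniform.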

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch] Revert "memcg: add memory.vmscan_stat"
  2011-08-31  6:23                           ` Johannes Weiner
@ 2011-08-31  6:30                             ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 54+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-08-31  6:30 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Daisuke Nishimura, Balbir Singh, Andrew Brestic,
	Ying Han, Michal Hocko, linux-mm, linux-kernel

On Wed, 31 Aug 2011 08:23:54 +0200
Johannes Weiner <jweiner@redhat.com> wrote:

> On Wed, Aug 31, 2011 at 08:29:24AM +0900, KAMEZAWA Hiroyuki wrote:
> > On Tue, 30 Aug 2011 13:32:21 +0200
> > Johannes Weiner <jweiner@redhat.com> wrote:
> > 
> > > On Tue, Aug 30, 2011 at 07:38:39PM +0900, KAMEZAWA Hiroyuki wrote:
> > > > On Tue, 30 Aug 2011 12:17:26 +0200
> > > > Johannes Weiner <jweiner@redhat.com> wrote:
> > > > 
> > > > > On Tue, Aug 30, 2011 at 05:56:09PM +0900, KAMEZAWA Hiroyuki wrote:
> > > > > > On Tue, 30 Aug 2011 10:42:45 +0200
> > > > > > Johannes Weiner <jweiner@redhat.com> wrote:
>
> > I'm confused. 
> > 
> > If vmscan is scanning in C's LRU,
> > 	(memcg == root) : C_scan_internal ++
> > 	(memcg != root) : C_scan_external ++
> 
> Yes.
> 
> > Why A_scan_external exists ? It's 0 ?
> > 
> > I think we can never get numbers.
> 
> Kswapd/direct reclaim should probably be accounted as A_external,
> since A has no limit, so reclaim pressure can not be internal.
> 

Hmm, OK. All memory pressure coming from any memcg or the system, other
than the memcg itself, is external.

> On the other hand, one could see the amount of physical memory in the
> machine as A's limit and account global reclaim as A_internal.
> 
> I think the former may be more natural.
> 
> That aside, all memcgs should have the same statistics, obviously.
> Scripts can easily deal with counters being zero.  If items differ
> between cgroups, that would suck a lot.

So, when I improve the direct-reclaim path, I will need to look at the
scan_internal numbers.

What do you think about per-memcg background reclaim?
Should it be counted into scan_internal?

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch] Revert "memcg: add memory.vmscan_stat"
  2011-08-31  6:30                             ` KAMEZAWA Hiroyuki
@ 2011-08-31  8:33                               ` Johannes Weiner
  -1 siblings, 0 replies; 54+ messages in thread
From: Johannes Weiner @ 2011-08-31  8:33 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Andrew Morton, Daisuke Nishimura, Balbir Singh, Andrew Brestic,
	Ying Han, Michal Hocko, linux-mm, linux-kernel

On Wed, Aug 31, 2011 at 03:30:25PM +0900, KAMEZAWA Hiroyuki wrote:
> On Wed, 31 Aug 2011 08:23:54 +0200
> Johannes Weiner <jweiner@redhat.com> wrote:
> 
> > On Wed, Aug 31, 2011 at 08:29:24AM +0900, KAMEZAWA Hiroyuki wrote:
> > > On Tue, 30 Aug 2011 13:32:21 +0200
> > > Johannes Weiner <jweiner@redhat.com> wrote:
> > > 
> > > > On Tue, Aug 30, 2011 at 07:38:39PM +0900, KAMEZAWA Hiroyuki wrote:
> > > > > On Tue, 30 Aug 2011 12:17:26 +0200
> > > > > Johannes Weiner <jweiner@redhat.com> wrote:
> > > > > 
> > > > > > On Tue, Aug 30, 2011 at 05:56:09PM +0900, KAMEZAWA Hiroyuki wrote:
> > > > > > > On Tue, 30 Aug 2011 10:42:45 +0200
> > > > > > > Johannes Weiner <jweiner@redhat.com> wrote:
> >
> > > I'm confused. 
> > > 
> > > If vmscan is scanning in C's LRU,
> > > 	(memcg == root) : C_scan_internal ++
> > > 	(memcg != root) : C_scan_external ++
> > 
> > Yes.
> > 
> > > Why A_scan_external exists ? It's 0 ?
> > > 
> > > I think we can never get numbers.
> > 
> > Kswapd/direct reclaim should probably be accounted as A_external,
> > since A has no limit, so reclaim pressure can not be internal.
> > 
> 
> hmm, ok. All memory pressure from a memcg or the system, other than from
> the memcg itself, is external.
>
> > On the other hand, one could see the amount of physical memory in the
> > machine as A's limit and account global reclaim as A_internal.
> > 
> > I think the former may be more natural.
> > 
> > That aside, all memcgs should have the same statistics, obviously.
> > Scripts can easily deal with counters being zero.  If items differ
> > between cgroups, that would suck a lot.
> 
> So, when I improve the direct-reclaim path, I need to look at the numbers in scan_internal.

Direct reclaim because of the limit or because of global pressure?  I
am going to assume because of the limit because global reclaim is not
yet accounted to memcgs even though their pages are scanned.  Please
correct me if I'm wrong.

        A
       /
      B
     /
    C

If A hits the limit and does direct reclaim in A, B, and C, then the
scans in A get accounted as internal while the scans in B and C get
accounted as external.

> What do you think about per-memcg background reclaim?
> Should it be counted into scan_internal?

Background reclaim is still triggered by the limit, just that the
condition is 'close to limit' instead of 'reached limit'.

So when per-memcg background reclaim goes off because A is close to
its limit, then it will scan A (internal) and B + C (external).

It's always the same code:

	record_reclaim_stat(culprit, victim, item, delta)

In direct limit reclaim, the culprit is the one hitting its limit.  In
background reclaim, the culprit is the one getting close to its limit.

And then again the accounting is

	culprit == victim -> victim_internal++ (own fault)
	culprit != victim -> victim_external++ (parent's fault)


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch] Revert "memcg: add memory.vmscan_stat"
  2011-08-30  8:42               ` Johannes Weiner
@ 2011-09-01  6:05                 ` Ying Han
  -1 siblings, 0 replies; 54+ messages in thread
From: Ying Han @ 2011-09-01  6:05 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: KAMEZAWA Hiroyuki, Andrew Morton, Daisuke Nishimura,
	Balbir Singh, Andrew Brestic, Michal Hocko, linux-mm,
	linux-kernel

On Tue, Aug 30, 2011 at 1:42 AM, Johannes Weiner <jweiner@redhat.com> wrote:
> On Tue, Aug 30, 2011 at 04:20:50PM +0900, KAMEZAWA Hiroyuki wrote:
>> On Tue, 30 Aug 2011 09:04:24 +0200
>> Johannes Weiner <jweiner@redhat.com> wrote:
>>
>> > On Tue, Aug 30, 2011 at 10:12:33AM +0900, KAMEZAWA Hiroyuki wrote:
>> > > @@ -1710,11 +1711,18 @@ static void mem_cgroup_record_scanstat(s
>> > >   spin_lock(&memcg->scanstat.lock);
>> > >   __mem_cgroup_record_scanstat(memcg->scanstat.stats[context], rec);
>> > >   spin_unlock(&memcg->scanstat.lock);
>> > > -
>> > > - memcg = rec->root;
>> > > - spin_lock(&memcg->scanstat.lock);
>> > > - __mem_cgroup_record_scanstat(memcg->scanstat.rootstats[context], rec);
>> > > - spin_unlock(&memcg->scanstat.lock);
>> > > + cgroup = memcg->css.cgroup;
>> > > + do {
>> > > +         spin_lock(&memcg->scanstat.lock);
>> > > +         __mem_cgroup_record_scanstat(
>> > > +                 memcg->scanstat.hierarchy_stats[context], rec);
>> > > +         spin_unlock(&memcg->scanstat.lock);
>> > > +         if (!cgroup->parent)
>> > > +                 break;
>> > > +         cgroup = cgroup->parent;
>> > > +         memcg = mem_cgroup_from_cont(cgroup);
>> > > + } while (memcg->use_hierarchy && memcg != rec->root);
>> >
>> > Okay, so this looks correct, but it sums up all parents after each
>> > memcg scanned, which could have a performance impact.  Usually,
>> > hierarchy statistics are only summed up when a user reads them.
>> >
>> Hmm. But sum-at-read doesn't work.
>>
>> Assume 3 cgroups in a hierarchy.
>>
>>       A
>>        /
>>       B
>>      /
>>     C
>>
>> C's scan contains 3 causes.
>>       C's scan caused by limit of A.
>>       C's scan caused by limit of B.
>>       C's scan caused by limit of C.
>>
>> If we make hierarchy sum at read, we think
>>       B's scan_stat = B's scan_stat + C's scan_stat
>> But to be precise, this is
>>
>>       B's scan_stat = B's scan_stat caused by B +
>>                       B's scan_stat caused by A +
>>                       C's scan_stat caused by C +
>>                       C's scan_stat caused by B +
>>                       C's scan_stat caused by A.
>>
>> In the original version:
>>       B's scan_stat = B's scan_stat caused by B +
>>                       C's scan_stat caused by B +
>>
>> After this patch,
>>       B's scan_stat = B's scan_stat caused by B +
>>                       B's scan_stat caused by A +
>>                       C's scan_stat caused by C +
>>                       C's scan_stat caused by B +
>>                       C's scan_stat caused by A.
>>
>> Hmm...removing hierarchy part completely seems fine to me.
>
> I see.
>
> You want to look at A and see whether its limit was responsible for
> reclaim scans in any children.  IMO, that is asking the question
> backwards.  Instead, there is a cgroup under reclaim and one wants to
> find out the cause for that.  Not the other way round.
>
> In my original proposal I suggested differentiating reclaim caused by
> internal pressure (due to own limit) and reclaim caused by
> external/hierarchical pressure (due to limits from parents).
>
> If you want to find out why C is under reclaim, look at its reclaim
> statistics.  If the _limit numbers are high, C's limit is the problem.
> If the _hierarchical numbers are high, the problem is B, A, or
> physical memory, so you check B for _limit and _hierarchical as well,
> then move on to A.
>
> Implementing this would be as easy as passing not only the memcg to
> scan (victim) to the reclaim code, but also the memcg /causing/ the
> reclaim (root_mem):
>
>        root_mem == victim -> account to victim as _limit
>        root_mem != victim -> account to victim as _hierarchical
>
> This would make things much simpler and more natural, both the code
> and the way of tracking down a problem, IMO.

This is pretty much the stats I am currently using for debugging the
reclaim patches. For example:

scanned_pages_by_system 0
scanned_pages_by_system_under_hierarchy 50989

scanned_pages_by_limit 0
scanned_pages_by_limit_under_hierarchy 0

"_system" is counted under global reclaim, and "_limit" is counted under
per-memcg reclaim.
"_under_hierarchy" is set if the memcg is not the one triggering the pressure.

So in the previous example:

>       A (root)
>        /
>       B
>      /
>     C

For cgroup C:
scanned_pages_by_system:
scanned_pages_by_system_under_hierarchy: # of pages scanned under global memory pressure

scanned_pages_by_limit: # of pages scanned while C hits the limit
scanned_pages_by_limit_under_hierarchy: # of pages scanned while B hits the limit

--Ying

>
>> > I don't get why this has to be done completely different from the way
>> > we usually do things, without any justification, whatsoever.
>> >
>> > Why do you want to pass a recording structure down the reclaim stack?
>>
>> Just for reducing number of passed variables.
>
> It's still sitting on bottom of the reclaim stack the whole time.
>
> With my proposal, you would only need to pass the extra root_mem
> pointer.
>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch] Revert "memcg: add memory.vmscan_stat"
  2011-09-01  6:05                 ` Ying Han
@ 2011-09-01  6:40                   ` Johannes Weiner
  -1 siblings, 0 replies; 54+ messages in thread
From: Johannes Weiner @ 2011-09-01  6:40 UTC (permalink / raw)
  To: Ying Han
  Cc: KAMEZAWA Hiroyuki, Andrew Morton, Daisuke Nishimura,
	Balbir Singh, Andrew Brestic, Michal Hocko, linux-mm,
	linux-kernel

On Wed, Aug 31, 2011 at 11:05:51PM -0700, Ying Han wrote:
> On Tue, Aug 30, 2011 at 1:42 AM, Johannes Weiner <jweiner@redhat.com> wrote:
> > You want to look at A and see whether its limit was responsible for
> > reclaim scans in any children.  IMO, that is asking the question
> > backwards.  Instead, there is a cgroup under reclaim and one wants to
> > find out the cause for that.  Not the other way round.
> >
> > In my original proposal I suggested differentiating reclaim caused by
> > internal pressure (due to own limit) and reclaim caused by
> > external/hierarchical pressure (due to limits from parents).
> >
> > If you want to find out why C is under reclaim, look at its reclaim
> > statistics.  If the _limit numbers are high, C's limit is the problem.
> > If the _hierarchical numbers are high, the problem is B, A, or
> > physical memory, so you check B for _limit and _hierarchical as well,
> > then move on to A.
> >
> > Implementing this would be as easy as passing not only the memcg to
> > scan (victim) to the reclaim code, but also the memcg /causing/ the
> > reclaim (root_mem):
> >
> >        root_mem == victim -> account to victim as _limit
> >        root_mem != victim -> account to victim as _hierarchical
> >
> > This would make things much simpler and more natural, both the code
> > and the way of tracking down a problem, IMO.
> 
> This is pretty much the stats I am currently using for debugging the
> reclaim patches. For example:
> 
> scanned_pages_by_system 0
> scanned_pages_by_system_under_hierarchy 50989
> 
> scanned_pages_by_limit 0
> scanned_pages_by_limit_under_hierarchy 0
> 
> "_system" is count under global reclaim, and "_limit" is count under
> per-memcg reclaim.
> "_under_hierarchy" is set if memcg is not the one triggering pressure.

I don't get this distinction between _system and _limit.  How is it
orthogonal to _limit vs. _hierarchy, i.e. internal vs. external?

If the system scans memcgs then no limit is at fault.  It's just
external pressure.

For example, what is the distinction between scanned_pages_by_system
and scanned_pages_by_system_under_hierarchy?  The reason for
scanned_pages_by_system would be, per your definition, neither due to
the limit (_by_system -> global reclaim) nor not due to the limit
(!_under_hierarchy -> memcg is the one triggering pressure).

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch] Revert "memcg: add memory.vmscan_stat"
  2011-09-01  6:40                   ` Johannes Weiner
@ 2011-09-01  7:04                     ` Ying Han
  -1 siblings, 0 replies; 54+ messages in thread
From: Ying Han @ 2011-09-01  7:04 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: KAMEZAWA Hiroyuki, Andrew Morton, Daisuke Nishimura,
	Balbir Singh, Michal Hocko, linux-mm, linux-kernel

On Wed, Aug 31, 2011 at 11:40 PM, Johannes Weiner <jweiner@redhat.com> wrote:
> On Wed, Aug 31, 2011 at 11:05:51PM -0700, Ying Han wrote:
>> On Tue, Aug 30, 2011 at 1:42 AM, Johannes Weiner <jweiner@redhat.com> wrote:
>> > You want to look at A and see whether its limit was responsible for
>> > reclaim scans in any children.  IMO, that is asking the question
>> > backwards.  Instead, there is a cgroup under reclaim and one wants to
>> > find out the cause for that.  Not the other way round.
>> >
>> > In my original proposal I suggested differentiating reclaim caused by
>> > internal pressure (due to own limit) and reclaim caused by
>> > external/hierarchical pressure (due to limits from parents).
>> >
>> > If you want to find out why C is under reclaim, look at its reclaim
>> > statistics.  If the _limit numbers are high, C's limit is the problem.
>> > If the _hierarchical numbers are high, the problem is B, A, or
>> > physical memory, so you check B for _limit and _hierarchical as well,
>> > then move on to A.
>> >
>> > Implementing this would be as easy as passing not only the memcg to
>> > scan (victim) to the reclaim code, but also the memcg /causing/ the
>> > reclaim (root_mem):
>> >
>> >        root_mem == victim -> account to victim as _limit
>> >        root_mem != victim -> account to victim as _hierarchical
>> >
>> > This would make things much simpler and more natural, both the code
>> > and the way of tracking down a problem, IMO.
>>
>> This is pretty much the stats I am currently using for debugging the
>> reclaim patches. For example:
>>
>> scanned_pages_by_system 0
>> scanned_pages_by_system_under_hierarchy 50989
>>
>> scanned_pages_by_limit 0
>> scanned_pages_by_limit_under_hierarchy 0
>>
>> "_system" is count under global reclaim, and "_limit" is count under
>> per-memcg reclaim.
>> "_under_hierarchy" is set if memcg is not the one triggering pressure.
>
> I don't get this distinction between _system and _limit.  How is it
> orthogonal to _limit vs. _hierarchy, i.e. internal vs. external?

Something like:

+enum mem_cgroup_scan_context {
+       SCAN_BY_SYSTEM,
+       SCAN_BY_SYSTEM_UNDER_HIERARCHY,
+       SCAN_BY_LIMIT,
+       SCAN_BY_LIMIT_UNDER_HIERARCHY,
+       NR_SCAN_CONTEXT,
+};

if (global_reclaim(sc))
   context = scan_by_system
else
   context = scan_by_limit

if (target != mem)
   context++;

>
> If the system scans memcgs then no limit is at fault.  It's just
> external pressure.
>
> For example, what is the distinction between scanned_pages_by_system
> and scanned_pages_by_system_under_hierarchy?

You are right about this; there is not much difference between these, since
it is counting global reclaim and everyone is under_hierarchy except the
root cgroup. For the root cgroup, it is counted in "_system". (internal)

> The reason for scanned_pages_by_system would be, per your definition,
> neither due to
> the limit (_by_system -> global reclaim) nor not due to the limit
> (!_under_hierarchy -> memcg is the one triggering pressure)

The value "scanned_pages_by_system" only makes sense for the root cgroup,
where it can be read as "# of pages scanned in the root lru under global
reclaim".

--Ying

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch] Revert "memcg: add memory.vmscan_stat"
@ 2011-09-01  7:04                     ` Ying Han
  0 siblings, 0 replies; 54+ messages in thread
From: Ying Han @ 2011-09-01  7:04 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: KAMEZAWA Hiroyuki, Andrew Morton, Daisuke Nishimura,
	Balbir Singh, Michal Hocko, linux-mm, linux-kernel

On Wed, Aug 31, 2011 at 11:40 PM, Johannes Weiner <jweiner@redhat.com> wrote:
> On Wed, Aug 31, 2011 at 11:05:51PM -0700, Ying Han wrote:
>> On Tue, Aug 30, 2011 at 1:42 AM, Johannes Weiner <jweiner@redhat.com> wrote:
>> > You want to look at A and see whether its limit was responsible for
>> > reclaim scans in any children.  IMO, that is asking the question
>> > backwards.  Instead, there is a cgroup under reclaim and one wants to
>> > find out the cause for that.  Not the other way round.
>> >
>> > In my original proposal I suggested differentiating reclaim caused by
>> > internal pressure (due to own limit) and reclaim caused by
>> > external/hierarchical pressure (due to limits from parents).
>> >
>> > If you want to find out why C is under reclaim, look at its reclaim
>> > statistics.  If the _limit numbers are high, C's limit is the problem.
>> > If the _hierarchical numbers are high, the problem is B, A, or
>> > physical memory, so you check B for _limit and _hierarchical as well,
>> > then move on to A.
>> >
>> > Implementing this would be as easy as passing not only the memcg to
>> > scan (victim) to the reclaim code, but also the memcg /causing/ the
>> > reclaim (root_mem):
>> >
>> >        root_mem == victim -> account to victim as _limit
>> >        root_mem != victim -> account to victim as _hierarchical
>> >
>> > This would make things much simpler and more natural, both the code
>> > and the way of tracking down a problem, IMO.
>>
>> This is pretty much the stats I am currently using for debugging the
>> reclaim patches. For example:
>>
>> scanned_pages_by_system 0
>> scanned_pages_by_system_under_hierarchy 50989
>>
>> scanned_pages_by_limit 0
>> scanned_pages_by_limit_under_hierarchy 0
>>
>> "_system" is count under global reclaim, and "_limit" is count under
>> per-memcg reclaim.
>> "_under_hiearchy" is set if memcg is not the one triggering pressure.
>
> I don't get this distinction between _system and _limit.  How is it
> orthogonal to _limit vs. _hierarchy, i.e. internal vs. external?

Something like :

+enum mem_cgroup_scan_context {
+       SCAN_BY_SYSTEM,
+       SCAN_BY_SYSTEM_UNDER_HIERARCHY,
+       SCAN_BY_LIMIT,
+       SCAN_BY_LIMIT_UNDER_HIERARCHY,
+       NR_SCAN_CONTEXT,
+};

if (global_reclaim(sc))
   context = scan_by_system
else
   context = scan_by_limit

if (target != mem)
   context++;

>
> If the system scans memcgs then no limit is at fault.  It's just
> external pressure.
>
> For example, what is the distinction between scanned_pages_by_system
> and scanned_pages_by_system_under_hierarchy?

You are right about this; there is not much difference between the two,
since both count global reclaim and every memcg except the root cgroup
is under_hierarchy. For the root cgroup, it is counted in "_system"
(internal).

Per your definition, scanned_pages_by_system would count scans that are
neither due to the limit (_by_system -> global reclaim) nor due to
external pressure (!_under_hierarchy -> the memcg is the one triggering
the pressure).

The value "scanned_pages_by_system" only makes sense for the root
cgroup, where it can be read as "# of pages scanned in the root lru
under global reclaim".

--Ying

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch] Revert "memcg: add memory.vmscan_stat"
  2011-09-01  7:04                     ` Ying Han
@ 2011-09-01  8:27                       ` Johannes Weiner
  -1 siblings, 0 replies; 54+ messages in thread
From: Johannes Weiner @ 2011-09-01  8:27 UTC (permalink / raw)
  To: Ying Han
  Cc: KAMEZAWA Hiroyuki, Andrew Morton, Daisuke Nishimura,
	Balbir Singh, Michal Hocko, linux-mm, linux-kernel

On Thu, Sep 01, 2011 at 12:04:24AM -0700, Ying Han wrote:
> On Wed, Aug 31, 2011 at 11:40 PM, Johannes Weiner <jweiner@redhat.com> wrote:
> > On Wed, Aug 31, 2011 at 11:05:51PM -0700, Ying Han wrote:
> >> On Tue, Aug 30, 2011 at 1:42 AM, Johannes Weiner <jweiner@redhat.com> wrote:
> >> > You want to look at A and see whether its limit was responsible for
> >> > reclaim scans in any children.  IMO, that is asking the question
> >> > backwards.  Instead, there is a cgroup under reclaim and one wants to
> >> > find out the cause for that.  Not the other way round.
> >> >
> >> > In my original proposal I suggested differentiating reclaim caused by
> >> > internal pressure (due to own limit) and reclaim caused by
> >> > external/hierarchical pressure (due to limits from parents).
> >> >
> >> > If you want to find out why C is under reclaim, look at its reclaim
> >> > statistics.  If the _limit numbers are high, C's limit is the problem.
> >> > If the _hierarchical numbers are high, the problem is B, A, or
> >> > physical memory, so you check B for _limit and _hierarchical as well,
> >> > then move on to A.
> >> >
> >> > Implementing this would be as easy as passing not only the memcg to
> >> > scan (victim) to the reclaim code, but also the memcg /causing/ the
> >> > reclaim (root_mem):
> >> >
> >> >        root_mem == victim -> account to victim as _limit
> >> >        root_mem != victim -> account to victim as _hierarchical
> >> >
> >> > This would make things much simpler and more natural, both the code
> >> > and the way of tracking down a problem, IMO.
> >>
> >> This is pretty much the stats I am currently using for debugging the
> >> reclaim patches. For example:
> >>
> >> scanned_pages_by_system 0
> >> scanned_pages_by_system_under_hierarchy 50989
> >>
> >> scanned_pages_by_limit 0
> >> scanned_pages_by_limit_under_hierarchy 0
> >>
> >> "_system" is count under global reclaim, and "_limit" is count under
> >> per-memcg reclaim.
> >> "_under_hierarchy" is set if the memcg is not the one triggering pressure.
> >
> > I don't get this distinction between _system and _limit.  How is it
> > orthogonal to _limit vs. _hierarchy, i.e. internal vs. external?
> 
> Something like :
> 
> +enum mem_cgroup_scan_context {
> +       SCAN_BY_SYSTEM,
> +       SCAN_BY_SYSTEM_UNDER_HIERARCHY,
> +       SCAN_BY_LIMIT,
> +       SCAN_BY_LIMIT_UNDER_HIERARCHY,
> +       NR_SCAN_CONTEXT,
> +};
> 
> if (global_reclaim(sc))
>    context = scan_by_system
> else
>    context = scan_by_limit
> 
> if (target != mem)
>    context++;

I understand what you count, just not why.  If we just had

	SCAN_LIMIT
	SCAN_HIERARCHY

wouldn't that convey everything necessary?  Global pressure is just
hierarchical pressure; it comes from the outermost 'container', the
machine itself.

If you have just one memcg, SCAN_LIMIT shows reclaim pressure because
of its own limit and SCAN_HIERARCHY shows global pressure.

With a hierarchical setup, you can find pressure either in SCAN_LIMIT
or by looking at SCAN_HIERARCHY and recursively checking the parents.

        root_mem_cgroup
       /
      A
     /
    B

Where is the difference for B whether outside pressure is coming from
physical memory limitations or the limit in A?  The problem is not in
B, you have to check the parents anyway.

Or put differently:

                root_mem_cgroup
               /
              A
             /
            B
           /
          C

In C, you would account global pressure separately but would not make
a distinction between pressure from A's limit and pressure from B's
limit.

What makes the physical memory limit so special that the reclaims it
causes must be distinguished from reclaims due to other hierarchical
limits?

^ permalink raw reply	[flat|nested] 54+ messages in thread


end of thread, other threads:[~2011-09-01  8:28 UTC | newest]

Thread overview: 54+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-07-22  8:15 [PATCH v3] memcg: add memory.vmscan_stat KAMEZAWA Hiroyuki
2011-07-22  8:15 ` KAMEZAWA Hiroyuki
2011-08-08 12:43 ` Johannes Weiner
2011-08-08 12:43   ` Johannes Weiner
2011-08-08 23:33   ` KAMEZAWA Hiroyuki
2011-08-08 23:33     ` KAMEZAWA Hiroyuki
2011-08-09  8:01     ` Johannes Weiner
2011-08-09  8:01       ` Johannes Weiner
2011-08-09  8:01       ` KAMEZAWA Hiroyuki
2011-08-09  8:01         ` KAMEZAWA Hiroyuki
2011-08-13  1:04         ` Ying Han
2011-08-13  1:04           ` Ying Han
2011-08-29 15:51     ` [patch] Revert "memcg: add memory.vmscan_stat" Johannes Weiner
2011-08-29 15:51       ` Johannes Weiner
2011-08-30  1:12       ` KAMEZAWA Hiroyuki
2011-08-30  1:12         ` KAMEZAWA Hiroyuki
2011-08-30  7:04         ` Johannes Weiner
2011-08-30  7:04           ` Johannes Weiner
2011-08-30  7:20           ` KAMEZAWA Hiroyuki
2011-08-30  7:20             ` KAMEZAWA Hiroyuki
2011-08-30  7:35             ` KAMEZAWA Hiroyuki
2011-08-30  7:35               ` KAMEZAWA Hiroyuki
2011-08-30  8:42             ` Johannes Weiner
2011-08-30  8:42               ` Johannes Weiner
2011-08-30  8:56               ` KAMEZAWA Hiroyuki
2011-08-30  8:56                 ` KAMEZAWA Hiroyuki
2011-08-30 10:17                 ` Johannes Weiner
2011-08-30 10:17                   ` Johannes Weiner
2011-08-30 10:34                   ` KAMEZAWA Hiroyuki
2011-08-30 10:34                     ` KAMEZAWA Hiroyuki
2011-08-30 11:03                     ` Johannes Weiner
2011-08-30 11:03                       ` Johannes Weiner
2011-08-30 23:38                       ` KAMEZAWA Hiroyuki
2011-08-30 23:38                         ` KAMEZAWA Hiroyuki
2011-08-30 10:38                   ` KAMEZAWA Hiroyuki
2011-08-30 10:38                     ` KAMEZAWA Hiroyuki
2011-08-30 11:32                     ` Johannes Weiner
2011-08-30 11:32                       ` Johannes Weiner
2011-08-30 23:29                       ` KAMEZAWA Hiroyuki
2011-08-30 23:29                         ` KAMEZAWA Hiroyuki
2011-08-31  6:23                         ` Johannes Weiner
2011-08-31  6:23                           ` Johannes Weiner
2011-08-31  6:30                           ` KAMEZAWA Hiroyuki
2011-08-31  6:30                             ` KAMEZAWA Hiroyuki
2011-08-31  8:33                             ` Johannes Weiner
2011-08-31  8:33                               ` Johannes Weiner
2011-09-01  6:05               ` Ying Han
2011-09-01  6:05                 ` Ying Han
2011-09-01  6:40                 ` Johannes Weiner
2011-09-01  6:40                   ` Johannes Weiner
2011-09-01  7:04                   ` Ying Han
2011-09-01  7:04                     ` Ying Han
2011-09-01  8:27                     ` Johannes Weiner
2011-09-01  8:27                       ` Johannes Weiner
