linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 ch1 0/7] mm: some cleanup/rework before lru_lock splitting
@ 2012-02-29  9:15 Konstantin Khlebnikov
  2012-02-29  9:15 ` [PATCH 1/7] mm/memcg: scanning_global_lru means mem_cgroup_disabled Konstantin Khlebnikov
                   ` (6 more replies)
  0 siblings, 7 replies; 41+ messages in thread
From: Konstantin Khlebnikov @ 2012-02-29  9:15 UTC (permalink / raw)
  To: Andrew Morton, Hugh Dickins, Johannes Weiner, KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel

Here is some cleanup/rework patches from Hugh Dickins and me.
This is about one-third of my current patchset,
so I prefer to merge this set before going further.

---

Hugh Dickins (2):
      mm/memcg: scanning_global_lru means mem_cgroup_disabled
      mm/memcg: move reclaim_stat into lruvec

Konstantin Khlebnikov (5):
      mm: rework __isolate_lru_page() file/anon filter
      mm: push lru index into shrink_[in]active_list()
      mm: rework reclaim_stat counters
      mm/memcg: rework inactive_ratio calculation
      mm/memcg: use vm_swappiness from target memory cgroup


 include/linux/memcontrol.h |   25 -----
 include/linux/mmzone.h     |   43 ++++----
 include/linux/swap.h       |    2 
 mm/compaction.c            |    5 +
 mm/memcontrol.c            |   86 ++++-------------
 mm/page_alloc.c            |   50 ----------
 mm/swap.c                  |   40 ++------
 mm/vmscan.c                |  225 ++++++++++++++++++++++----------------------
 mm/vmstat.c                |    6 -
 9 files changed, 173 insertions(+), 309 deletions(-)

-- 
Signature

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 1/7] mm/memcg: scanning_global_lru means mem_cgroup_disabled
  2012-02-29  9:15 [PATCH v4 ch1 0/7] mm: some cleanup/rework before lru_lock splitting Konstantin Khlebnikov
@ 2012-02-29  9:15 ` Konstantin Khlebnikov
  2012-03-02  5:12   ` KAMEZAWA Hiroyuki
  2012-02-29  9:15 ` [PATCH 2/7] mm/memcg: move reclaim_stat into lruvec Konstantin Khlebnikov
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 41+ messages in thread
From: Konstantin Khlebnikov @ 2012-02-29  9:15 UTC (permalink / raw)
  To: Andrew Morton, Hugh Dickins, Johannes Weiner, KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel

From: Hugh Dickins <hughd@google.com>

Although one has to admire the skill with which it has been concealed,
scanning_global_lru(mz) is actually just an interesting way to test
mem_cgroup_disabled().  Too many developer hours have been wasted on
confusing it with global_reclaim(): just use mem_cgroup_disabled().

Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
---
 mm/vmscan.c |   18 ++++--------------
 1 files changed, 4 insertions(+), 14 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 003b3f5..082fbc2 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -164,26 +164,16 @@ static bool global_reclaim(struct scan_control *sc)
 {
 	return !sc->target_mem_cgroup;
 }
-
-static bool scanning_global_lru(struct mem_cgroup_zone *mz)
-{
-	return !mz->mem_cgroup;
-}
 #else
 static bool global_reclaim(struct scan_control *sc)
 {
 	return true;
 }
-
-static bool scanning_global_lru(struct mem_cgroup_zone *mz)
-{
-	return true;
-}
 #endif
 
 static struct zone_reclaim_stat *get_reclaim_stat(struct mem_cgroup_zone *mz)
 {
-	if (!scanning_global_lru(mz))
+	if (!mem_cgroup_disabled())
 		return mem_cgroup_get_reclaim_stat(mz->mem_cgroup, mz->zone);
 
 	return &mz->zone->reclaim_stat;
@@ -192,7 +182,7 @@ static struct zone_reclaim_stat *get_reclaim_stat(struct mem_cgroup_zone *mz)
 static unsigned long zone_nr_lru_pages(struct mem_cgroup_zone *mz,
 				       enum lru_list lru)
 {
-	if (!scanning_global_lru(mz))
+	if (!mem_cgroup_disabled())
 		return mem_cgroup_zone_nr_lru_pages(mz->mem_cgroup,
 						    zone_to_nid(mz->zone),
 						    zone_idx(mz->zone),
@@ -1806,7 +1796,7 @@ static int inactive_anon_is_low(struct mem_cgroup_zone *mz)
 	if (!total_swap_pages)
 		return 0;
 
-	if (!scanning_global_lru(mz))
+	if (!mem_cgroup_disabled())
 		return mem_cgroup_inactive_anon_is_low(mz->mem_cgroup,
 						       mz->zone);
 
@@ -1845,7 +1835,7 @@ static int inactive_file_is_low_global(struct zone *zone)
  */
 static int inactive_file_is_low(struct mem_cgroup_zone *mz)
 {
-	if (!scanning_global_lru(mz))
+	if (!mem_cgroup_disabled())
 		return mem_cgroup_inactive_file_is_low(mz->mem_cgroup,
 						       mz->zone);
 


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 2/7] mm/memcg: move reclaim_stat into lruvec
  2012-02-29  9:15 [PATCH v4 ch1 0/7] mm: some cleanup/rework before lru_lock splitting Konstantin Khlebnikov
  2012-02-29  9:15 ` [PATCH 1/7] mm/memcg: scanning_global_lru means mem_cgroup_disabled Konstantin Khlebnikov
@ 2012-02-29  9:15 ` Konstantin Khlebnikov
  2012-03-02  5:14   ` KAMEZAWA Hiroyuki
  2012-02-29  9:15 ` [PATCH 3/7] mm: rework __isolate_lru_page() file/anon filter Konstantin Khlebnikov
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 41+ messages in thread
From: Konstantin Khlebnikov @ 2012-02-29  9:15 UTC (permalink / raw)
  To: Andrew Morton, Hugh Dickins, Johannes Weiner, KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel

From: Hugh Dickins <hughd@google.com>

With mem_cgroup_disabled() now explicit, it becomes clear that the
zone_reclaim_stat structure actually belongs in lruvec, per-zone
when memcg is disabled but per-memcg per-zone when it's enabled.

We can delete mem_cgroup_get_reclaim_stat(), and change
update_page_reclaim_stat() to update just the one set of stats,
the one which get_scan_count() will actually use.

Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
---
 include/linux/memcontrol.h |    9 ---------
 include/linux/mmzone.h     |   29 ++++++++++++++---------------
 mm/memcontrol.c            |   27 +++++++--------------------
 mm/page_alloc.c            |    8 ++++----
 mm/swap.c                  |   14 ++++----------
 mm/vmscan.c                |    5 +----
 6 files changed, 30 insertions(+), 62 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index ddfb52e..e2e1fac 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -124,8 +124,6 @@ int mem_cgroup_inactive_file_is_low(struct mem_cgroup *memcg,
 int mem_cgroup_select_victim_node(struct mem_cgroup *memcg);
 unsigned long mem_cgroup_zone_nr_lru_pages(struct mem_cgroup *memcg,
 					int nid, int zid, unsigned int lrumask);
-struct zone_reclaim_stat *mem_cgroup_get_reclaim_stat(struct mem_cgroup *memcg,
-						      struct zone *zone);
 struct zone_reclaim_stat*
 mem_cgroup_get_reclaim_stat_from_page(struct page *page);
 extern void mem_cgroup_print_oom_info(struct mem_cgroup *memcg,
@@ -355,13 +353,6 @@ mem_cgroup_zone_nr_lru_pages(struct mem_cgroup *memcg, int nid, int zid,
 	return 0;
 }
 
-
-static inline struct zone_reclaim_stat*
-mem_cgroup_get_reclaim_stat(struct mem_cgroup *memcg, struct zone *zone)
-{
-	return NULL;
-}
-
 static inline struct zone_reclaim_stat*
 mem_cgroup_get_reclaim_stat_from_page(struct page *page)
 {
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 7d9d84e..eff4918 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -159,8 +159,22 @@ static inline int is_unevictable_lru(enum lru_list lru)
 	return (lru == LRU_UNEVICTABLE);
 }
 
+struct zone_reclaim_stat {
+	/*
+	 * The pageout code in vmscan.c keeps track of how many of the
+	 * mem/swap backed and file backed pages are refeferenced.
+	 * The higher the rotated/scanned ratio, the more valuable
+	 * that cache is.
+	 *
+	 * The anon LRU stats live in [0], file LRU stats in [1]
+	 */
+	unsigned long		recent_rotated[2];
+	unsigned long		recent_scanned[2];
+};
+
 struct lruvec {
 	struct list_head lists[NR_LRU_LISTS];
+	struct zone_reclaim_stat reclaim_stat;
 };
 
 /* Mask used at gathering information at once (see memcontrol.c) */
@@ -287,19 +301,6 @@ enum zone_type {
 #error ZONES_SHIFT -- too many zones configured adjust calculation
 #endif
 
-struct zone_reclaim_stat {
-	/*
-	 * The pageout code in vmscan.c keeps track of how many of the
-	 * mem/swap backed and file backed pages are refeferenced.
-	 * The higher the rotated/scanned ratio, the more valuable
-	 * that cache is.
-	 *
-	 * The anon LRU stats live in [0], file LRU stats in [1]
-	 */
-	unsigned long		recent_rotated[2];
-	unsigned long		recent_scanned[2];
-};
-
 struct zone {
 	/* Fields commonly accessed by the page allocator */
 
@@ -374,8 +375,6 @@ struct zone {
 	spinlock_t		lru_lock;
 	struct lruvec		lruvec;
 
-	struct zone_reclaim_stat reclaim_stat;
-
 	unsigned long		pages_scanned;	   /* since last reclaim */
 	unsigned long		flags;		   /* zone flags, see below */
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 5c9a98a..aeebb9e 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -138,7 +138,6 @@ struct mem_cgroup_per_zone {
 
 	struct mem_cgroup_reclaim_iter reclaim_iter[DEF_PRIORITY + 1];
 
-	struct zone_reclaim_stat reclaim_stat;
 	struct rb_node		tree_node;	/* RB tree node */
 	unsigned long long	usage_in_excess;/* Set to the value by which */
 						/* the soft limit is exceeded*/
@@ -1210,16 +1209,6 @@ int mem_cgroup_inactive_file_is_low(struct mem_cgroup *memcg, struct zone *zone)
 	return (active > inactive);
 }
 
-struct zone_reclaim_stat *mem_cgroup_get_reclaim_stat(struct mem_cgroup *memcg,
-						      struct zone *zone)
-{
-	int nid = zone_to_nid(zone);
-	int zid = zone_idx(zone);
-	struct mem_cgroup_per_zone *mz = mem_cgroup_zoneinfo(memcg, nid, zid);
-
-	return &mz->reclaim_stat;
-}
-
 struct zone_reclaim_stat *
 mem_cgroup_get_reclaim_stat_from_page(struct page *page)
 {
@@ -1235,7 +1224,7 @@ mem_cgroup_get_reclaim_stat_from_page(struct page *page)
 	/* Ensure pc->mem_cgroup is visible after reading PCG_USED. */
 	smp_rmb();
 	mz = page_cgroup_zoneinfo(pc->mem_cgroup, page);
-	return &mz->reclaim_stat;
+	return &mz->lruvec.reclaim_stat;
 }
 
 #define mem_cgroup_from_res_counter(counter, member)	\
@@ -4202,21 +4191,19 @@ static int mem_control_stat_show(struct cgroup *cont, struct cftype *cft,
 	{
 		int nid, zid;
 		struct mem_cgroup_per_zone *mz;
+		struct zone_reclaim_stat *rstat;
 		unsigned long recent_rotated[2] = {0, 0};
 		unsigned long recent_scanned[2] = {0, 0};
 
 		for_each_online_node(nid)
 			for (zid = 0; zid < MAX_NR_ZONES; zid++) {
 				mz = mem_cgroup_zoneinfo(memcg, nid, zid);
+				rstat = &mz->lruvec.reclaim_stat;
 
-				recent_rotated[0] +=
-					mz->reclaim_stat.recent_rotated[0];
-				recent_rotated[1] +=
-					mz->reclaim_stat.recent_rotated[1];
-				recent_scanned[0] +=
-					mz->reclaim_stat.recent_scanned[0];
-				recent_scanned[1] +=
-					mz->reclaim_stat.recent_scanned[1];
+				recent_rotated[0] += rstat->recent_rotated[0];
+				recent_rotated[1] += rstat->recent_rotated[1];
+				recent_scanned[0] += rstat->recent_scanned[0];
+				recent_scanned[1] += rstat->recent_scanned[1];
 			}
 		cb->fill(cb, "recent_rotated_anon", recent_rotated[0]);
 		cb->fill(cb, "recent_rotated_file", recent_rotated[1]);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2b7f07b..ab2d210 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4365,10 +4365,10 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
 		zone_pcp_init(zone);
 		for_each_lru(lru)
 			INIT_LIST_HEAD(&zone->lruvec.lists[lru]);
-		zone->reclaim_stat.recent_rotated[0] = 0;
-		zone->reclaim_stat.recent_rotated[1] = 0;
-		zone->reclaim_stat.recent_scanned[0] = 0;
-		zone->reclaim_stat.recent_scanned[1] = 0;
+		zone->lruvec.reclaim_stat.recent_rotated[0] = 0;
+		zone->lruvec.reclaim_stat.recent_rotated[1] = 0;
+		zone->lruvec.reclaim_stat.recent_scanned[0] = 0;
+		zone->lruvec.reclaim_stat.recent_scanned[1] = 0;
 		zap_zone_vm_stats(zone);
 		zone->flags = 0;
 		if (!size)
diff --git a/mm/swap.c b/mm/swap.c
index 38b2686..9a6850b 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -279,21 +279,15 @@ void rotate_reclaimable_page(struct page *page)
 static void update_page_reclaim_stat(struct zone *zone, struct page *page,
 				     int file, int rotated)
 {
-	struct zone_reclaim_stat *reclaim_stat = &zone->reclaim_stat;
-	struct zone_reclaim_stat *memcg_reclaim_stat;
+	struct zone_reclaim_stat *reclaim_stat;
 
-	memcg_reclaim_stat = mem_cgroup_get_reclaim_stat_from_page(page);
+	reclaim_stat = mem_cgroup_get_reclaim_stat_from_page(page);
+	if (!reclaim_stat)
+		reclaim_stat = &zone->lruvec.reclaim_stat;
 
 	reclaim_stat->recent_scanned[file]++;
 	if (rotated)
 		reclaim_stat->recent_rotated[file]++;
-
-	if (!memcg_reclaim_stat)
-		return;
-
-	memcg_reclaim_stat->recent_scanned[file]++;
-	if (rotated)
-		memcg_reclaim_stat->recent_rotated[file]++;
 }
 
 static void __activate_page(struct page *page, void *arg)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 082fbc2..af6cfe7 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -173,10 +173,7 @@ static bool global_reclaim(struct scan_control *sc)
 
 static struct zone_reclaim_stat *get_reclaim_stat(struct mem_cgroup_zone *mz)
 {
-	if (!mem_cgroup_disabled())
-		return mem_cgroup_get_reclaim_stat(mz->mem_cgroup, mz->zone);
-
-	return &mz->zone->reclaim_stat;
+	return &mem_cgroup_zone_lruvec(mz->zone, mz->mem_cgroup)->reclaim_stat;
 }
 
 static unsigned long zone_nr_lru_pages(struct mem_cgroup_zone *mz,


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 3/7] mm: rework __isolate_lru_page() file/anon filter
  2012-02-29  9:15 [PATCH v4 ch1 0/7] mm: some cleanup/rework before lru_lock splitting Konstantin Khlebnikov
  2012-02-29  9:15 ` [PATCH 1/7] mm/memcg: scanning_global_lru means mem_cgroup_disabled Konstantin Khlebnikov
  2012-02-29  9:15 ` [PATCH 2/7] mm/memcg: move reclaim_stat into lruvec Konstantin Khlebnikov
@ 2012-02-29  9:15 ` Konstantin Khlebnikov
  2012-03-02  5:17   ` KAMEZAWA Hiroyuki
                     ` (2 more replies)
  2012-02-29  9:15 ` [PATCH 4/7] mm: push lru index into shrink_[in]active_list() Konstantin Khlebnikov
                   ` (3 subsequent siblings)
  6 siblings, 3 replies; 41+ messages in thread
From: Konstantin Khlebnikov @ 2012-02-29  9:15 UTC (permalink / raw)
  To: Andrew Morton, Hugh Dickins, Johannes Weiner, KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel

This patch adds file/anon filter bits into isolate_mode_t,
this allows to simplify checks in __isolate_lru_page().

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
---
 include/linux/mmzone.h |    4 ++++
 include/linux/swap.h   |    2 +-
 mm/compaction.c        |    5 +++--
 mm/vmscan.c            |   27 +++++++++++++--------------
 4 files changed, 21 insertions(+), 17 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index eff4918..2fed935 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -193,6 +193,10 @@ struct lruvec {
 #define ISOLATE_UNMAPPED	((__force isolate_mode_t)0x8)
 /* Isolate for asynchronous migration */
 #define ISOLATE_ASYNC_MIGRATE	((__force isolate_mode_t)0x10)
+/* Isolate swap-backed pages */
+#define	ISOLATE_ANON		((__force isolate_mode_t)0x20)
+/* Isolate file-backed pages */
+#define	ISOLATE_FILE		((__force isolate_mode_t)0x40)
 
 /* LRU Isolation modes. */
 typedef unsigned __bitwise__ isolate_mode_t;
diff --git a/include/linux/swap.h b/include/linux/swap.h
index ba2c8d7..dc6e6a3 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -254,7 +254,7 @@ static inline void lru_cache_add_file(struct page *page)
 /* linux/mm/vmscan.c */
 extern unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
 					gfp_t gfp_mask, nodemask_t *mask);
-extern int __isolate_lru_page(struct page *page, isolate_mode_t mode, int file);
+extern int __isolate_lru_page(struct page *page, isolate_mode_t mode);
 extern unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem,
 						  gfp_t gfp_mask, bool noswap);
 extern unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
diff --git a/mm/compaction.c b/mm/compaction.c
index 74a8c82..cc054f7 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -261,7 +261,8 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
 	unsigned long last_pageblock_nr = 0, pageblock_nr;
 	unsigned long nr_scanned = 0, nr_isolated = 0;
 	struct list_head *migratelist = &cc->migratepages;
-	isolate_mode_t mode = ISOLATE_ACTIVE|ISOLATE_INACTIVE;
+	isolate_mode_t mode = ISOLATE_ACTIVE | ISOLATE_INACTIVE |
+			      ISOLATE_FILE | ISOLATE_ANON;
 
 	/* Do not scan outside zone boundaries */
 	low_pfn = max(cc->migrate_pfn, zone->zone_start_pfn);
@@ -375,7 +376,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
 			mode |= ISOLATE_ASYNC_MIGRATE;
 
 		/* Try isolate the page */
-		if (__isolate_lru_page(page, mode, 0) != 0)
+		if (__isolate_lru_page(page, mode) != 0)
 			continue;
 
 		VM_BUG_ON(PageTransCompound(page));
diff --git a/mm/vmscan.c b/mm/vmscan.c
index af6cfe7..1b70338 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1029,27 +1029,18 @@ keep_lumpy:
  *
  * returns 0 on success, -ve errno on failure.
  */
-int __isolate_lru_page(struct page *page, isolate_mode_t mode, int file)
+int __isolate_lru_page(struct page *page, isolate_mode_t mode)
 {
-	bool all_lru_mode;
 	int ret = -EINVAL;
 
 	/* Only take pages on the LRU. */
 	if (!PageLRU(page))
 		return ret;
 
-	all_lru_mode = (mode & (ISOLATE_ACTIVE|ISOLATE_INACTIVE)) ==
-		(ISOLATE_ACTIVE|ISOLATE_INACTIVE);
-
-	/*
-	 * When checking the active state, we need to be sure we are
-	 * dealing with comparible boolean values.  Take the logical not
-	 * of each.
-	 */
-	if (!all_lru_mode && !PageActive(page) != !(mode & ISOLATE_ACTIVE))
+	if (!(mode & (PageActive(page) ? ISOLATE_ACTIVE : ISOLATE_INACTIVE)))
 		return ret;
 
-	if (!all_lru_mode && !!page_is_file_cache(page) != file)
+	if (!(mode & (page_is_file_cache(page) ? ISOLATE_FILE : ISOLATE_ANON)))
 		return ret;
 
 	/*
@@ -1166,7 +1157,7 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
 
 		VM_BUG_ON(!PageLRU(page));
 
-		switch (__isolate_lru_page(page, mode, file)) {
+		switch (__isolate_lru_page(page, mode)) {
 		case 0:
 			mem_cgroup_lru_del(page);
 			list_move(&page->lru, dst);
@@ -1224,7 +1215,7 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
 			    !PageSwapCache(cursor_page))
 				break;
 
-			if (__isolate_lru_page(cursor_page, mode, file) == 0) {
+			if (__isolate_lru_page(cursor_page, mode) == 0) {
 				unsigned int isolated_pages;
 
 				mem_cgroup_lru_del(cursor_page);
@@ -1520,6 +1511,10 @@ shrink_inactive_list(unsigned long nr_to_scan, struct mem_cgroup_zone *mz,
 		isolate_mode |= ISOLATE_UNMAPPED;
 	if (!sc->may_writepage)
 		isolate_mode |= ISOLATE_CLEAN;
+	if (file)
+		isolate_mode |= ISOLATE_FILE;
+	else
+		isolate_mode |= ISOLATE_ANON;
 
 	spin_lock_irq(&zone->lru_lock);
 
@@ -1682,6 +1677,10 @@ static void shrink_active_list(unsigned long nr_to_scan,
 		isolate_mode |= ISOLATE_UNMAPPED;
 	if (!sc->may_writepage)
 		isolate_mode |= ISOLATE_CLEAN;
+	if (file)
+		isolate_mode |= ISOLATE_FILE;
+	else
+		isolate_mode |= ISOLATE_ANON;
 
 	spin_lock_irq(&zone->lru_lock);
 


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 4/7] mm: push lru index into shrink_[in]active_list()
  2012-02-29  9:15 [PATCH v4 ch1 0/7] mm: some cleanup/rework before lru_lock splitting Konstantin Khlebnikov
                   ` (2 preceding siblings ...)
  2012-02-29  9:15 ` [PATCH 3/7] mm: rework __isolate_lru_page() file/anon filter Konstantin Khlebnikov
@ 2012-02-29  9:15 ` Konstantin Khlebnikov
  2012-03-02  5:21   ` KAMEZAWA Hiroyuki
  2012-03-03  0:24   ` Hugh Dickins
  2012-02-29  9:15 ` [PATCH 5/7] mm: rework reclaim_stat counters Konstantin Khlebnikov
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 41+ messages in thread
From: Konstantin Khlebnikov @ 2012-02-29  9:15 UTC (permalink / raw)
  To: Andrew Morton, Hugh Dickins, Johannes Weiner, KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel

Let's toss lru index through call stack to isolate_lru_pages(),
this is better than its reconstructing from individual bits.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
---
 mm/vmscan.c |   42 +++++++++++++++++-------------------------
 1 files changed, 17 insertions(+), 25 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 1b70338..483b98e 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1119,15 +1119,14 @@ int __isolate_lru_page(struct page *page, isolate_mode_t mode)
  * @nr_scanned:	The number of pages that were scanned.
  * @sc:		The scan_control struct for this reclaim session
  * @mode:	One of the LRU isolation modes
- * @active:	True [1] if isolating active pages
- * @file:	True [1] if isolating file [!anon] pages
+ * @lru		LRU list id for isolating
  *
  * returns how many pages were moved onto *@dst.
  */
 static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
 		struct mem_cgroup_zone *mz, struct list_head *dst,
 		unsigned long *nr_scanned, struct scan_control *sc,
-		isolate_mode_t mode, int active, int file)
+		isolate_mode_t mode, enum lru_list lru)
 {
 	struct lruvec *lruvec;
 	struct list_head *src;
@@ -1136,13 +1135,8 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
 	unsigned long nr_lumpy_dirty = 0;
 	unsigned long nr_lumpy_failed = 0;
 	unsigned long scan;
-	int lru = LRU_BASE;
 
 	lruvec = mem_cgroup_zone_lruvec(mz->zone, mz->mem_cgroup);
-	if (active)
-		lru += LRU_ACTIVE;
-	if (file)
-		lru += LRU_FILE;
 	src = &lruvec->lists[lru];
 
 	for (scan = 0; scan < nr_to_scan && !list_empty(src); scan++) {
@@ -1258,7 +1252,7 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
 			nr_to_scan, scan,
 			nr_taken,
 			nr_lumpy_taken, nr_lumpy_dirty, nr_lumpy_failed,
-			mode, file);
+			mode, is_file_lru(lru));
 	return nr_taken;
 }
 
@@ -1479,7 +1473,7 @@ static inline bool should_reclaim_stall(unsigned long nr_taken,
  */
 static noinline_for_stack unsigned long
 shrink_inactive_list(unsigned long nr_to_scan, struct mem_cgroup_zone *mz,
-		     struct scan_control *sc, int priority, int file)
+		     struct scan_control *sc, int priority, enum lru_list lru)
 {
 	LIST_HEAD(page_list);
 	unsigned long nr_scanned;
@@ -1489,6 +1483,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct mem_cgroup_zone *mz,
 	unsigned long nr_file;
 	unsigned long nr_dirty = 0;
 	unsigned long nr_writeback = 0;
+	int file = is_file_lru(lru);
 	isolate_mode_t isolate_mode = ISOLATE_INACTIVE;
 	struct zone *zone = mz->zone;
 	struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(mz);
@@ -1519,7 +1514,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct mem_cgroup_zone *mz,
 	spin_lock_irq(&zone->lru_lock);
 
 	nr_taken = isolate_lru_pages(nr_to_scan, mz, &page_list, &nr_scanned,
-				     sc, isolate_mode, 0, file);
+				     sc, isolate_mode, lru);
 	if (global_reclaim(sc)) {
 		zone->pages_scanned += nr_scanned;
 		if (current_is_kswapd())
@@ -1657,7 +1652,7 @@ static void move_active_pages_to_lru(struct zone *zone,
 static void shrink_active_list(unsigned long nr_to_scan,
 			       struct mem_cgroup_zone *mz,
 			       struct scan_control *sc,
-			       int priority, int file)
+			       int priority, enum lru_list lru)
 {
 	unsigned long nr_taken;
 	unsigned long nr_scanned;
@@ -1668,6 +1663,7 @@ static void shrink_active_list(unsigned long nr_to_scan,
 	struct page *page;
 	struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(mz);
 	unsigned long nr_rotated = 0;
+	int file = is_file_lru(lru);
 	isolate_mode_t isolate_mode = ISOLATE_ACTIVE;
 	struct zone *zone = mz->zone;
 
@@ -1685,17 +1681,14 @@ static void shrink_active_list(unsigned long nr_to_scan,
 	spin_lock_irq(&zone->lru_lock);
 
 	nr_taken = isolate_lru_pages(nr_to_scan, mz, &l_hold, &nr_scanned, sc,
-				     isolate_mode, 1, file);
+				     isolate_mode, lru);
 	if (global_reclaim(sc))
 		zone->pages_scanned += nr_scanned;
 
 	reclaim_stat->recent_scanned[file] += nr_taken;
 
 	__count_zone_vm_events(PGREFILL, zone, nr_scanned);
-	if (file)
-		__mod_zone_page_state(zone, NR_ACTIVE_FILE, -nr_taken);
-	else
-		__mod_zone_page_state(zone, NR_ACTIVE_ANON, -nr_taken);
+	__mod_zone_page_state(zone, NR_LRU_BASE + lru, -nr_taken);
 	__mod_zone_page_state(zone, NR_ISOLATED_ANON + file, nr_taken);
 	spin_unlock_irq(&zone->lru_lock);
 
@@ -1751,10 +1744,8 @@ static void shrink_active_list(unsigned long nr_to_scan,
 	 */
 	reclaim_stat->recent_rotated[file] += nr_rotated;
 
-	move_active_pages_to_lru(zone, &l_active, &l_hold,
-						LRU_ACTIVE + file * LRU_FILE);
-	move_active_pages_to_lru(zone, &l_inactive, &l_hold,
-						LRU_BASE   + file * LRU_FILE);
+	move_active_pages_to_lru(zone, &l_active, &l_hold, lru);
+	move_active_pages_to_lru(zone, &l_inactive, &l_hold, lru - LRU_ACTIVE);
 	__mod_zone_page_state(zone, NR_ISOLATED_ANON + file, -nr_taken);
 	spin_unlock_irq(&zone->lru_lock);
 
@@ -1854,11 +1845,11 @@ static unsigned long shrink_list(enum lru_list lru, unsigned long nr_to_scan,
 
 	if (is_active_lru(lru)) {
 		if (inactive_list_is_low(mz, file))
-			shrink_active_list(nr_to_scan, mz, sc, priority, file);
+			shrink_active_list(nr_to_scan, mz, sc, priority, lru);
 		return 0;
 	}
 
-	return shrink_inactive_list(nr_to_scan, mz, sc, priority, file);
+	return shrink_inactive_list(nr_to_scan, mz, sc, priority, lru);
 }
 
 static int vmscan_swappiness(struct mem_cgroup_zone *mz,
@@ -2109,7 +2100,8 @@ restart:
 	 * rebalance the anon lru active/inactive ratio.
 	 */
 	if (inactive_anon_is_low(mz))
-		shrink_active_list(SWAP_CLUSTER_MAX, mz, sc, priority, 0);
+		shrink_active_list(SWAP_CLUSTER_MAX, mz,
+				   sc, priority, LRU_ACTIVE_ANON);
 
 	/* reclaim/compaction might need reclaim to continue */
 	if (should_continue_reclaim(mz, nr_reclaimed,
@@ -2551,7 +2543,7 @@ static void age_active_anon(struct zone *zone, struct scan_control *sc,
 
 		if (inactive_anon_is_low(&mz))
 			shrink_active_list(SWAP_CLUSTER_MAX, &mz,
-					   sc, priority, 0);
+					   sc, priority, LRU_ACTIVE_ANON);
 
 		memcg = mem_cgroup_iter(NULL, memcg, NULL);
 	} while (memcg);


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 5/7] mm: rework reclaim_stat counters
  2012-02-29  9:15 [PATCH v4 ch1 0/7] mm: some cleanup/rework before lru_lock splitting Konstantin Khlebnikov
                   ` (3 preceding siblings ...)
  2012-02-29  9:15 ` [PATCH 4/7] mm: push lru index into shrink_[in]active_list() Konstantin Khlebnikov
@ 2012-02-29  9:15 ` Konstantin Khlebnikov
  2012-03-02  5:28   ` KAMEZAWA Hiroyuki
  2012-02-29  9:16 ` [PATCH 6/7] mm/memcg: rework inactive_ratio calculation Konstantin Khlebnikov
  2012-02-29  9:16 ` [PATCH 7/7] mm/memcg: use vm_swappiness from target memory cgroup Konstantin Khlebnikov
  6 siblings, 1 reply; 41+ messages in thread
From: Konstantin Khlebnikov @ 2012-02-29  9:15 UTC (permalink / raw)
  To: Andrew Morton, Hugh Dickins, Johannes Weiner, KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel

Currently there is two types of reclaim-stat counters:
recent_scanned (pages picked from from lru),
recent_rotated (pages putted back to active lru).
Reclaimer uses ratio recent_rotated / recent_scanned
for balancing pressure between file and anon pages.

But if we pick page from lru we can either reclaim it or put it back to lru, thus:
recent_scanned == recent_rotated[inactive] + recent_rotated[active] + reclaimed
This can be called "The Law of Conservation of Memory" =)

Thus recent_rotated counters for each lru list is enough, reclaimed pages can be
counted as rotatation into inactive lru. After that reclaimer can use this ratio:
recent_rotated[active] / (recent_rotated[active] + recent_rotated[inactive])

After this patch struct zone_reclaimer_stat has only one array: recent_rotated,
which is directly indexed by lru list index.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
---
 include/linux/mmzone.h |   11 +++++------
 mm/memcontrol.c        |   29 +++++++++++++++++------------
 mm/page_alloc.c        |    6 ++----
 mm/swap.c              |   26 ++++++++------------------
 mm/vmscan.c            |   42 ++++++++++++++++++++++--------------------
 5 files changed, 54 insertions(+), 60 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 2fed935..fdcd683 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -137,12 +137,14 @@ enum lru_list {
 	LRU_INACTIVE_FILE = LRU_BASE + LRU_FILE,
 	LRU_ACTIVE_FILE = LRU_BASE + LRU_FILE + LRU_ACTIVE,
 	LRU_UNEVICTABLE,
-	NR_LRU_LISTS
+	NR_LRU_LISTS,
+	NR_EVICTABLE_LRU_LISTS = LRU_UNEVICTABLE,
 };
 
 #define for_each_lru(lru) for (lru = 0; lru < NR_LRU_LISTS; lru++)
 
-#define for_each_evictable_lru(lru) for (lru = 0; lru <= LRU_ACTIVE_FILE; lru++)
+#define for_each_evictable_lru(lru) \
+	for (lru = 0; lru < NR_EVICTABLE_LRU_LISTS; lru++)
 
 static inline int is_file_lru(enum lru_list lru)
 {
@@ -165,11 +167,8 @@ struct zone_reclaim_stat {
 	 * mem/swap backed and file backed pages are refeferenced.
 	 * The higher the rotated/scanned ratio, the more valuable
 	 * that cache is.
-	 *
-	 * The anon LRU stats live in [0], file LRU stats in [1]
 	 */
-	unsigned long		recent_rotated[2];
-	unsigned long		recent_scanned[2];
+	unsigned long		recent_rotated[NR_EVICTABLE_LRU_LISTS];
 };
 
 struct lruvec {
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index aeebb9e..2809531 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4189,26 +4189,31 @@ static int mem_control_stat_show(struct cgroup *cont, struct cftype *cft,
 
 #ifdef CONFIG_DEBUG_VM
 	{
-		int nid, zid;
+		int nid, zid, lru;
 		struct mem_cgroup_per_zone *mz;
 		struct zone_reclaim_stat *rstat;
-		unsigned long recent_rotated[2] = {0, 0};
-		unsigned long recent_scanned[2] = {0, 0};
+		unsigned long recent_rotated[NR_EVICTABLE_LRU_LISTS];
 
+		memset(recent_rotated, 0, sizeof(recent_rotated));
 		for_each_online_node(nid)
 			for (zid = 0; zid < MAX_NR_ZONES; zid++) {
 				mz = mem_cgroup_zoneinfo(memcg, nid, zid);
 				rstat = &mz->lruvec.reclaim_stat;
-
-				recent_rotated[0] += rstat->recent_rotated[0];
-				recent_rotated[1] += rstat->recent_rotated[1];
-				recent_scanned[0] += rstat->recent_scanned[0];
-				recent_scanned[1] += rstat->recent_scanned[1];
+				for_each_evictable_lru(lru)
+					recent_rotated[lru] +=
+						rstat->recent_rotated[lru];
 			}
-		cb->fill(cb, "recent_rotated_anon", recent_rotated[0]);
-		cb->fill(cb, "recent_rotated_file", recent_rotated[1]);
-		cb->fill(cb, "recent_scanned_anon", recent_scanned[0]);
-		cb->fill(cb, "recent_scanned_file", recent_scanned[1]);
+
+		cb->fill(cb, "recent_rotated_anon",
+				recent_rotated[LRU_ACTIVE_ANON]);
+		cb->fill(cb, "recent_rotated_file",
+				recent_rotated[LRU_ACTIVE_FILE]);
+		cb->fill(cb, "recent_scanned_anon",
+				recent_rotated[LRU_ACTIVE_ANON] +
+				recent_rotated[LRU_INACTIVE_ANON]);
+		cb->fill(cb, "recent_scanned_file",
+				recent_rotated[LRU_ACTIVE_FILE] +
+				recent_rotated[LRU_INACTIVE_FILE]);
 	}
 #endif
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ab2d210..ea40034 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4365,10 +4365,8 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
 		zone_pcp_init(zone);
 		for_each_lru(lru)
 			INIT_LIST_HEAD(&zone->lruvec.lists[lru]);
-		zone->lruvec.reclaim_stat.recent_rotated[0] = 0;
-		zone->lruvec.reclaim_stat.recent_rotated[1] = 0;
-		zone->lruvec.reclaim_stat.recent_scanned[0] = 0;
-		zone->lruvec.reclaim_stat.recent_scanned[1] = 0;
+		memset(&zone->lruvec.reclaim_stat, 0,
+				sizeof(struct zone_reclaim_stat));
 		zap_zone_vm_stats(zone);
 		zone->flags = 0;
 		if (!size)
diff --git a/mm/swap.c b/mm/swap.c
index 9a6850b..c7bcde7 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -277,7 +277,7 @@ void rotate_reclaimable_page(struct page *page)
 }
 
 static void update_page_reclaim_stat(struct zone *zone, struct page *page,
-				     int file, int rotated)
+				     enum lru_list lru)
 {
 	struct zone_reclaim_stat *reclaim_stat;
 
@@ -285,9 +285,7 @@ static void update_page_reclaim_stat(struct zone *zone, struct page *page,
 	if (!reclaim_stat)
 		reclaim_stat = &zone->lruvec.reclaim_stat;
 
-	reclaim_stat->recent_scanned[file]++;
-	if (rotated)
-		reclaim_stat->recent_rotated[file]++;
+	reclaim_stat->recent_rotated[lru]++;
 }
 
 static void __activate_page(struct page *page, void *arg)
@@ -295,7 +293,6 @@ static void __activate_page(struct page *page, void *arg)
 	struct zone *zone = page_zone(page);
 
 	if (PageLRU(page) && !PageActive(page) && !PageUnevictable(page)) {
-		int file = page_is_file_cache(page);
 		int lru = page_lru_base_type(page);
 		del_page_from_lru_list(zone, page, lru);
 
@@ -304,7 +301,7 @@ static void __activate_page(struct page *page, void *arg)
 		add_page_to_lru_list(zone, page, lru);
 		__count_vm_event(PGACTIVATE);
 
-		update_page_reclaim_stat(zone, page, file, 1);
+		update_page_reclaim_stat(zone, page, lru);
 	}
 }
 
@@ -482,7 +479,7 @@ static void lru_deactivate_fn(struct page *page, void *arg)
 
 	if (active)
 		__count_vm_event(PGDEACTIVATE);
-	update_page_reclaim_stat(zone, page, file, 0);
+	update_page_reclaim_stat(zone, page, lru);
 }
 
 /*
@@ -646,9 +643,7 @@ EXPORT_SYMBOL(__pagevec_release);
 void lru_add_page_tail(struct zone* zone,
 		       struct page *page, struct page *page_tail)
 {
-	int active;
 	enum lru_list lru;
-	const int file = 0;
 
 	VM_BUG_ON(!PageHead(page));
 	VM_BUG_ON(PageCompound(page_tail));
@@ -660,13 +655,10 @@ void lru_add_page_tail(struct zone* zone,
 	if (page_evictable(page_tail, NULL)) {
 		if (PageActive(page)) {
 			SetPageActive(page_tail);
-			active = 1;
 			lru = LRU_ACTIVE_ANON;
-		} else {
-			active = 0;
+		} else
 			lru = LRU_INACTIVE_ANON;
-		}
-		update_page_reclaim_stat(zone, page_tail, file, active);
+		update_page_reclaim_stat(zone, page_tail, lru);
 	} else {
 		SetPageUnevictable(page_tail);
 		lru = LRU_UNEVICTABLE;
@@ -694,17 +686,15 @@ static void __pagevec_lru_add_fn(struct page *page, void *arg)
 {
 	enum lru_list lru = (enum lru_list)arg;
 	struct zone *zone = page_zone(page);
-	int file = is_file_lru(lru);
-	int active = is_active_lru(lru);
 
 	VM_BUG_ON(PageActive(page));
 	VM_BUG_ON(PageUnevictable(page));
 	VM_BUG_ON(PageLRU(page));
 
 	SetPageLRU(page);
-	if (active)
+	if (is_active_lru(lru))
 		SetPageActive(page);
-	update_page_reclaim_stat(zone, page, file, active);
+	update_page_reclaim_stat(zone, page, lru);
 	add_page_to_lru_list(zone, page, lru);
 }
 
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 483b98e..fe00a22 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1355,11 +1355,7 @@ putback_inactive_pages(struct mem_cgroup_zone *mz,
 		SetPageLRU(page);
 		lru = page_lru(page);
 		add_page_to_lru_list(zone, page, lru);
-		if (is_active_lru(lru)) {
-			int file = is_file_lru(lru);
-			int numpages = hpage_nr_pages(page);
-			reclaim_stat->recent_rotated[file] += numpages;
-		}
+		reclaim_stat->recent_rotated[lru] += hpage_nr_pages(page);
 		if (put_page_testzero(page)) {
 			__ClearPageLRU(page);
 			__ClearPageActive(page);
@@ -1543,8 +1539,11 @@ shrink_inactive_list(unsigned long nr_to_scan, struct mem_cgroup_zone *mz,
 
 	spin_lock_irq(&zone->lru_lock);
 
-	reclaim_stat->recent_scanned[0] += nr_anon;
-	reclaim_stat->recent_scanned[1] += nr_file;
+	/*
+	 * Count reclaimed pages as rotated, this helps balance scan pressure
+	 * between file and anonymous pages in get_scan_ratio.
+	 */
+	reclaim_stat->recent_rotated[lru] += nr_reclaimed;
 
 	if (current_is_kswapd())
 		__count_vm_events(KSWAPD_STEAL, nr_reclaimed);
@@ -1685,8 +1684,6 @@ static void shrink_active_list(unsigned long nr_to_scan,
 	if (global_reclaim(sc))
 		zone->pages_scanned += nr_scanned;
 
-	reclaim_stat->recent_scanned[file] += nr_taken;
-
 	__count_zone_vm_events(PGREFILL, zone, nr_scanned);
 	__mod_zone_page_state(zone, NR_LRU_BASE + lru, -nr_taken);
 	__mod_zone_page_state(zone, NR_ISOLATED_ANON + file, nr_taken);
@@ -1742,7 +1739,7 @@ static void shrink_active_list(unsigned long nr_to_scan,
 	 * helps balance scan pressure between file and anonymous pages in
 	 * get_scan_ratio.
 	 */
-	reclaim_stat->recent_rotated[file] += nr_rotated;
+	reclaim_stat->recent_rotated[lru] += nr_rotated;
 
 	move_active_pages_to_lru(zone, &l_active, &l_hold, lru);
 	move_active_pages_to_lru(zone, &l_inactive, &l_hold, lru - LRU_ACTIVE);
@@ -1875,6 +1872,7 @@ static void get_scan_count(struct mem_cgroup_zone *mz, struct scan_control *sc,
 	unsigned long anon_prio, file_prio;
 	unsigned long ap, fp;
 	struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(mz);
+	unsigned long *recent_rotated = reclaim_stat->recent_rotated;
 	u64 fraction[2], denominator;
 	enum lru_list lru;
 	int noswap = 0;
@@ -1940,14 +1938,16 @@ static void get_scan_count(struct mem_cgroup_zone *mz, struct scan_control *sc,
 	 * anon in [0], file in [1]
 	 */
 	spin_lock_irq(&mz->zone->lru_lock);
-	if (unlikely(reclaim_stat->recent_scanned[0] > anon / 4)) {
-		reclaim_stat->recent_scanned[0] /= 2;
-		reclaim_stat->recent_rotated[0] /= 2;
+	if (unlikely(recent_rotated[LRU_INACTIVE_ANON] +
+		     recent_rotated[LRU_ACTIVE_ANON] > anon / 4)) {
+		recent_rotated[LRU_INACTIVE_ANON] /= 2;
+		recent_rotated[LRU_ACTIVE_ANON] /= 2;
 	}
 
-	if (unlikely(reclaim_stat->recent_scanned[1] > file / 4)) {
-		reclaim_stat->recent_scanned[1] /= 2;
-		reclaim_stat->recent_rotated[1] /= 2;
+	if (unlikely(recent_rotated[LRU_INACTIVE_FILE] +
+		     recent_rotated[LRU_ACTIVE_FILE] > file / 4)) {
+		recent_rotated[LRU_INACTIVE_FILE] /= 2;
+		recent_rotated[LRU_ACTIVE_FILE] /= 2;
 	}
 
 	/*
@@ -1955,11 +1955,13 @@ static void get_scan_count(struct mem_cgroup_zone *mz, struct scan_control *sc,
 	 * proportional to the fraction of recently scanned pages on
 	 * each list that were recently referenced and in active use.
 	 */
-	ap = (anon_prio + 1) * (reclaim_stat->recent_scanned[0] + 1);
-	ap /= reclaim_stat->recent_rotated[0] + 1;
+	ap = (anon_prio + 1) * (recent_rotated[LRU_INACTIVE_ANON] +
+				recent_rotated[LRU_ACTIVE_ANON] + 1);
+	ap /= recent_rotated[LRU_ACTIVE_ANON] + 1;
 
-	fp = (file_prio + 1) * (reclaim_stat->recent_scanned[1] + 1);
-	fp /= reclaim_stat->recent_rotated[1] + 1;
+	fp = (file_prio + 1) * (recent_rotated[LRU_INACTIVE_FILE] +
+				recent_rotated[LRU_ACTIVE_FILE] + 1);
+	fp /= recent_rotated[LRU_ACTIVE_FILE] + 1;
 	spin_unlock_irq(&mz->zone->lru_lock);
 
 	fraction[0] = ap;


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 6/7] mm/memcg: rework inactive_ratio calculation
  2012-02-29  9:15 [PATCH v4 ch1 0/7] mm: some cleanup/rework before lru_lock splitting Konstantin Khlebnikov
                   ` (4 preceding siblings ...)
  2012-02-29  9:15 ` [PATCH 5/7] mm: rework reclaim_stat counters Konstantin Khlebnikov
@ 2012-02-29  9:16 ` Konstantin Khlebnikov
  2012-03-02  5:31   ` KAMEZAWA Hiroyuki
  2012-02-29  9:16 ` [PATCH 7/7] mm/memcg: use vm_swappiness from target memory cgroup Konstantin Khlebnikov
  6 siblings, 1 reply; 41+ messages in thread
From: Konstantin Khlebnikov @ 2012-02-29  9:16 UTC (permalink / raw)
  To: Andrew Morton, Hugh Dickins, Johannes Weiner, KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel

This patch removes precalculated zone->inactive_ratio.
Now it always calculated in inactive_anon_is_low() from current lru size.
After that we can merge memcg and non-memcg cases and drop duplicated code.

We can drop precalculated ratio, because its calculation fast enough to do it
each time. Plus precalculation uses zone size as basis, this estimation not
always match with page lru size, for example if a significant proportion
of memory occupied by kernel objects.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
---
 include/linux/memcontrol.h |   16 --------
 include/linux/mmzone.h     |    7 ----
 mm/memcontrol.c            |   38 -------------------
 mm/page_alloc.c            |   44 ----------------------
 mm/vmscan.c                |   88 ++++++++++++++++++++++++++++----------------
 mm/vmstat.c                |    6 +--
 6 files changed, 58 insertions(+), 141 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index e2e1fac..7e114f8 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -117,10 +117,6 @@ void mem_cgroup_iter_break(struct mem_cgroup *, struct mem_cgroup *);
 /*
  * For memory reclaim.
  */
-int mem_cgroup_inactive_anon_is_low(struct mem_cgroup *memcg,
-				    struct zone *zone);
-int mem_cgroup_inactive_file_is_low(struct mem_cgroup *memcg,
-				    struct zone *zone);
 int mem_cgroup_select_victim_node(struct mem_cgroup *memcg);
 unsigned long mem_cgroup_zone_nr_lru_pages(struct mem_cgroup *memcg,
 					int nid, int zid, unsigned int lrumask);
@@ -334,18 +330,6 @@ static inline bool mem_cgroup_disabled(void)
 	return true;
 }
 
-static inline int
-mem_cgroup_inactive_anon_is_low(struct mem_cgroup *memcg, struct zone *zone)
-{
-	return 1;
-}
-
-static inline int
-mem_cgroup_inactive_file_is_low(struct mem_cgroup *memcg, struct zone *zone)
-{
-	return 1;
-}
-
 static inline unsigned long
 mem_cgroup_zone_nr_lru_pages(struct mem_cgroup *memcg, int nid, int zid,
 				unsigned int lru_mask)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index fdcd683..7edcf17 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -384,13 +384,6 @@ struct zone {
 	/* Zone statistics */
 	atomic_long_t		vm_stat[NR_VM_ZONE_STAT_ITEMS];
 
-	/*
-	 * The target ratio of ACTIVE_ANON to INACTIVE_ANON pages on
-	 * this zone's LRU.  Maintained by the pageout code.
-	 */
-	unsigned int inactive_ratio;
-
-
 	ZONE_PADDING(_pad2_)
 	/* Rarely used or read-mostly fields */
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 2809531..4bc6835 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1171,44 +1171,6 @@ int task_in_mem_cgroup(struct task_struct *task, const struct mem_cgroup *memcg)
 	return ret;
 }
 
-int mem_cgroup_inactive_anon_is_low(struct mem_cgroup *memcg, struct zone *zone)
-{
-	unsigned long inactive_ratio;
-	int nid = zone_to_nid(zone);
-	int zid = zone_idx(zone);
-	unsigned long inactive;
-	unsigned long active;
-	unsigned long gb;
-
-	inactive = mem_cgroup_zone_nr_lru_pages(memcg, nid, zid,
-						BIT(LRU_INACTIVE_ANON));
-	active = mem_cgroup_zone_nr_lru_pages(memcg, nid, zid,
-					      BIT(LRU_ACTIVE_ANON));
-
-	gb = (inactive + active) >> (30 - PAGE_SHIFT);
-	if (gb)
-		inactive_ratio = int_sqrt(10 * gb);
-	else
-		inactive_ratio = 1;
-
-	return inactive * inactive_ratio < active;
-}
-
-int mem_cgroup_inactive_file_is_low(struct mem_cgroup *memcg, struct zone *zone)
-{
-	unsigned long active;
-	unsigned long inactive;
-	int zid = zone_idx(zone);
-	int nid = zone_to_nid(zone);
-
-	inactive = mem_cgroup_zone_nr_lru_pages(memcg, nid, zid,
-						BIT(LRU_INACTIVE_FILE));
-	active = mem_cgroup_zone_nr_lru_pages(memcg, nid, zid,
-					      BIT(LRU_ACTIVE_FILE));
-
-	return (active > inactive);
-}
-
 struct zone_reclaim_stat *
 mem_cgroup_get_reclaim_stat_from_page(struct page *page)
 {
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ea40034..2e90931 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5051,49 +5051,6 @@ void setup_per_zone_wmarks(void)
 }
 
 /*
- * The inactive anon list should be small enough that the VM never has to
- * do too much work, but large enough that each inactive page has a chance
- * to be referenced again before it is swapped out.
- *
- * The inactive_anon ratio is the target ratio of ACTIVE_ANON to
- * INACTIVE_ANON pages on this zone's LRU, maintained by the
- * pageout code. A zone->inactive_ratio of 3 means 3:1 or 25% of
- * the anonymous pages are kept on the inactive list.
- *
- * total     target    max
- * memory    ratio     inactive anon
- * -------------------------------------
- *   10MB       1         5MB
- *  100MB       1        50MB
- *    1GB       3       250MB
- *   10GB      10       0.9GB
- *  100GB      31         3GB
- *    1TB     101        10GB
- *   10TB     320        32GB
- */
-static void __meminit calculate_zone_inactive_ratio(struct zone *zone)
-{
-	unsigned int gb, ratio;
-
-	/* Zone size in gigabytes */
-	gb = zone->present_pages >> (30 - PAGE_SHIFT);
-	if (gb)
-		ratio = int_sqrt(10 * gb);
-	else
-		ratio = 1;
-
-	zone->inactive_ratio = ratio;
-}
-
-static void __meminit setup_per_zone_inactive_ratio(void)
-{
-	struct zone *zone;
-
-	for_each_zone(zone)
-		calculate_zone_inactive_ratio(zone);
-}
-
-/*
  * Initialise min_free_kbytes.
  *
  * For small machines we want it small (128k min).  For large machines
@@ -5131,7 +5088,6 @@ int __meminit init_per_zone_wmark_min(void)
 	setup_per_zone_wmarks();
 	refresh_zone_stat_thresholds();
 	setup_per_zone_lowmem_reserve();
-	setup_per_zone_inactive_ratio();
 	return 0;
 }
 module_init(init_per_zone_wmark_min)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index fe00a22..ab447df 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1750,29 +1750,38 @@ static void shrink_active_list(unsigned long nr_to_scan,
 }
 
 #ifdef CONFIG_SWAP
-static int inactive_anon_is_low_global(struct zone *zone)
-{
-	unsigned long active, inactive;
-
-	active = zone_page_state(zone, NR_ACTIVE_ANON);
-	inactive = zone_page_state(zone, NR_INACTIVE_ANON);
-
-	if (inactive * zone->inactive_ratio < active)
-		return 1;
-
-	return 0;
-}
-
 /**
  * inactive_anon_is_low - check if anonymous pages need to be deactivated
  * @zone: zone to check
- * @sc:   scan control of this context
  *
  * Returns true if the zone does not have enough inactive anon pages,
  * meaning some active anon pages need to be deactivated.
+ *
+ * The inactive anon list should be small enough that the VM never has to
+ * do too much work, but large enough that each inactive page has a chance
+ * to be referenced again before it is swapped out.
+ *
+ * The inactive_anon ratio is the target ratio of ACTIVE_ANON to
+ * INACTIVE_ANON pages on this zone's LRU, maintained by the
+ * pageout code. A zone->inactive_ratio of 3 means 3:1 or 25% of
+ * the anonymous pages are kept on the inactive list.
+ *
+ * total     target    max
+ * memory    ratio     inactive anon
+ * -------------------------------------
+ *   10MB       1         5MB
+ *  100MB       1        50MB
+ *    1GB       3       250MB
+ *   10GB      10       0.9GB
+ *  100GB      31         3GB
+ *    1TB     101        10GB
+ *   10TB     320        32GB
  */
 static int inactive_anon_is_low(struct mem_cgroup_zone *mz)
 {
+	unsigned long active, inactive;
+	unsigned int gb, ratio;
+
 	/*
 	 * If we don't have swap space, anonymous page deactivation
 	 * is pointless.
@@ -1780,11 +1789,26 @@ static int inactive_anon_is_low(struct mem_cgroup_zone *mz)
 	if (!total_swap_pages)
 		return 0;
 
-	if (!mem_cgroup_disabled())
-		return mem_cgroup_inactive_anon_is_low(mz->mem_cgroup,
-						       mz->zone);
+	if (mem_cgroup_disabled()) {
+		active = zone_page_state(mz->zone, NR_ACTIVE_ANON);
+		inactive = zone_page_state(mz->zone, NR_INACTIVE_ANON);
+	} else {
+		active = mem_cgroup_zone_nr_lru_pages(mz->mem_cgroup,
+				zone_to_nid(mz->zone), zone_idx(mz->zone),
+				BIT(LRU_ACTIVE_ANON));
+		inactive = mem_cgroup_zone_nr_lru_pages(mz->mem_cgroup,
+				zone_to_nid(mz->zone), zone_idx(mz->zone),
+				BIT(LRU_INACTIVE_ANON));
+	}
+
+	/* Total size in gigabytes */
+	gb = (active + inactive) >> (30 - PAGE_SHIFT);
+	if (gb)
+		ratio = int_sqrt(10 * gb);
+	else
+		ratio = 1;
 
-	return inactive_anon_is_low_global(mz->zone);
+	return inactive * ratio < active;
 }
 #else
 static inline int inactive_anon_is_low(struct mem_cgroup_zone *mz)
@@ -1793,16 +1817,6 @@ static inline int inactive_anon_is_low(struct mem_cgroup_zone *mz)
 }
 #endif
 
-static int inactive_file_is_low_global(struct zone *zone)
-{
-	unsigned long active, inactive;
-
-	active = zone_page_state(zone, NR_ACTIVE_FILE);
-	inactive = zone_page_state(zone, NR_INACTIVE_FILE);
-
-	return (active > inactive);
-}
-
 /**
  * inactive_file_is_low - check if file pages need to be deactivated
  * @mz: memory cgroup and zone to check
@@ -1819,11 +1833,21 @@ static int inactive_file_is_low_global(struct zone *zone)
  */
 static int inactive_file_is_low(struct mem_cgroup_zone *mz)
 {
-	if (!mem_cgroup_disabled())
-		return mem_cgroup_inactive_file_is_low(mz->mem_cgroup,
-						       mz->zone);
+	unsigned long active, inactive;
+
+	if (mem_cgroup_disabled()) {
+		active = zone_page_state(mz->zone, NR_ACTIVE_FILE);
+		inactive = zone_page_state(mz->zone, NR_INACTIVE_FILE);
+	} else {
+		active = mem_cgroup_zone_nr_lru_pages(mz->mem_cgroup,
+				zone_to_nid(mz->zone), zone_idx(mz->zone),
+				BIT(LRU_ACTIVE_FILE));
+		inactive = mem_cgroup_zone_nr_lru_pages(mz->mem_cgroup,
+				zone_to_nid(mz->zone), zone_idx(mz->zone),
+				BIT(LRU_INACTIVE_FILE));
+	}
 
-	return inactive_file_is_low_global(mz->zone);
+	return inactive < active;
 }
 
 static int inactive_list_is_low(struct mem_cgroup_zone *mz, int file)
diff --git a/mm/vmstat.c b/mm/vmstat.c
index f600557..2c813e1 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1017,11 +1017,9 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
 	}
 	seq_printf(m,
 		   "\n  all_unreclaimable: %u"
-		   "\n  start_pfn:         %lu"
-		   "\n  inactive_ratio:    %u",
+		   "\n  start_pfn:         %lu",
 		   zone->all_unreclaimable,
-		   zone->zone_start_pfn,
-		   zone->inactive_ratio);
+		   zone->zone_start_pfn);
 	seq_putc(m, '\n');
 }
 


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 7/7] mm/memcg: use vm_swappiness from target memory cgroup
  2012-02-29  9:15 [PATCH v4 ch1 0/7] mm: some cleanup/rework before lru_lock splitting Konstantin Khlebnikov
                   ` (5 preceding siblings ...)
  2012-02-29  9:16 ` [PATCH 6/7] mm/memcg: rework inactive_ratio calculation Konstantin Khlebnikov
@ 2012-02-29  9:16 ` Konstantin Khlebnikov
  2012-03-02  5:32   ` KAMEZAWA Hiroyuki
  6 siblings, 1 reply; 41+ messages in thread
From: Konstantin Khlebnikov @ 2012-02-29  9:16 UTC (permalink / raw)
  To: Andrew Morton, Hugh Dickins, Johannes Weiner, KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel

Use vm_swappiness from memory cgroup which is triggered this memory reclaim.
This is more reasonable and allows to kill one argument.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
---
 mm/vmscan.c |    9 ++++-----
 1 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index ab447df..fd96eb8 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1873,12 +1873,11 @@ static unsigned long shrink_list(enum lru_list lru, unsigned long nr_to_scan,
 	return shrink_inactive_list(nr_to_scan, mz, sc, priority, lru);
 }
 
-static int vmscan_swappiness(struct mem_cgroup_zone *mz,
-			     struct scan_control *sc)
+static int vmscan_swappiness(struct scan_control *sc)
 {
 	if (global_reclaim(sc))
 		return vm_swappiness;
-	return mem_cgroup_swappiness(mz->mem_cgroup);
+	return mem_cgroup_swappiness(sc->target_mem_cgroup);
 }
 
 /*
@@ -1947,8 +1946,8 @@ static void get_scan_count(struct mem_cgroup_zone *mz, struct scan_control *sc,
 	 * With swappiness at 100, anonymous and file have the same priority.
 	 * This scanning priority is essentially the inverse of IO cost.
 	 */
-	anon_prio = vmscan_swappiness(mz, sc);
-	file_prio = 200 - vmscan_swappiness(mz, sc);
+	anon_prio = vmscan_swappiness(sc);
+	file_prio = 200 - vmscan_swappiness(sc);
 
 	/*
 	 * OK, so we have swap space and a fair amount of page cache


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/7] mm/memcg: scanning_global_lru means mem_cgroup_disabled
  2012-02-29  9:15 ` [PATCH 1/7] mm/memcg: scanning_global_lru means mem_cgroup_disabled Konstantin Khlebnikov
@ 2012-03-02  5:12   ` KAMEZAWA Hiroyuki
  2012-03-06 11:46     ` Glauber Costa
  0 siblings, 1 reply; 41+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-03-02  5:12 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Andrew Morton, Hugh Dickins, Johannes Weiner, linux-mm, linux-kernel

On Wed, 29 Feb 2012 13:15:39 +0400
Konstantin Khlebnikov <khlebnikov@openvz.org> wrote:

> From: Hugh Dickins <hughd@google.com>
> 
> Although one has to admire the skill with which it has been concealed,
> scanning_global_lru(mz) is actually just an interesting way to test
> mem_cgroup_disabled().  Too many developer hours have been wasted on
> confusing it with global_reclaim(): just use mem_cgroup_disabled().
> 
> Signed-off-by: Hugh Dickins <hughd@google.com>
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>

Acked-by: KAMEZWA Hiroyuki <kamezawa.hiroyu@jp.fujitu.com>


> ---
>  mm/vmscan.c |   18 ++++--------------
>  1 files changed, 4 insertions(+), 14 deletions(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 003b3f5..082fbc2 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -164,26 +164,16 @@ static bool global_reclaim(struct scan_control *sc)
>  {
>  	return !sc->target_mem_cgroup;
>  }
> -
> -static bool scanning_global_lru(struct mem_cgroup_zone *mz)
> -{
> -	return !mz->mem_cgroup;
> -}
>  #else
>  static bool global_reclaim(struct scan_control *sc)
>  {
>  	return true;
>  }
> -
> -static bool scanning_global_lru(struct mem_cgroup_zone *mz)
> -{
> -	return true;
> -}
>  #endif
>  
>  static struct zone_reclaim_stat *get_reclaim_stat(struct mem_cgroup_zone *mz)
>  {
> -	if (!scanning_global_lru(mz))
> +	if (!mem_cgroup_disabled())
>  		return mem_cgroup_get_reclaim_stat(mz->mem_cgroup, mz->zone);
>  
>  	return &mz->zone->reclaim_stat;
> @@ -192,7 +182,7 @@ static struct zone_reclaim_stat *get_reclaim_stat(struct mem_cgroup_zone *mz)
>  static unsigned long zone_nr_lru_pages(struct mem_cgroup_zone *mz,
>  				       enum lru_list lru)
>  {
> -	if (!scanning_global_lru(mz))
> +	if (!mem_cgroup_disabled())
>  		return mem_cgroup_zone_nr_lru_pages(mz->mem_cgroup,
>  						    zone_to_nid(mz->zone),
>  						    zone_idx(mz->zone),
> @@ -1806,7 +1796,7 @@ static int inactive_anon_is_low(struct mem_cgroup_zone *mz)
>  	if (!total_swap_pages)
>  		return 0;
>  
> -	if (!scanning_global_lru(mz))
> +	if (!mem_cgroup_disabled())
>  		return mem_cgroup_inactive_anon_is_low(mz->mem_cgroup,
>  						       mz->zone);
>  
> @@ -1845,7 +1835,7 @@ static int inactive_file_is_low_global(struct zone *zone)
>   */
>  static int inactive_file_is_low(struct mem_cgroup_zone *mz)
>  {
> -	if (!scanning_global_lru(mz))
> +	if (!mem_cgroup_disabled())
>  		return mem_cgroup_inactive_file_is_low(mz->mem_cgroup,
>  						       mz->zone);
>  
> 
> 


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/7] mm/memcg: move reclaim_stat into lruvec
  2012-02-29  9:15 ` [PATCH 2/7] mm/memcg: move reclaim_stat into lruvec Konstantin Khlebnikov
@ 2012-03-02  5:14   ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 41+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-03-02  5:14 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Andrew Morton, Hugh Dickins, Johannes Weiner, linux-mm, linux-kernel

On Wed, 29 Feb 2012 13:15:43 +0400
Konstantin Khlebnikov <khlebnikov@openvz.org> wrote:

> From: Hugh Dickins <hughd@google.com>
> 
> With mem_cgroup_disabled() now explicit, it becomes clear that the
> zone_reclaim_stat structure actually belongs in lruvec, per-zone
> when memcg is disabled but per-memcg per-zone when it's enabled.
> 
> We can delete mem_cgroup_get_reclaim_stat(), and change
> update_page_reclaim_stat() to update just the one set of stats,
> the one which get_scan_count() will actually use.
> 
> Signed-off-by: Hugh Dickins <hughd@google.com>
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 3/7] mm: rework __isolate_lru_page() file/anon filter
  2012-02-29  9:15 ` [PATCH 3/7] mm: rework __isolate_lru_page() file/anon filter Konstantin Khlebnikov
@ 2012-03-02  5:17   ` KAMEZAWA Hiroyuki
  2012-03-02  5:51     ` Konstantin Khlebnikov
  2012-03-03  0:22   ` Hugh Dickins
  2012-03-03  9:16   ` [PATCH 3/7 v2] " Konstantin Khlebnikov
  2 siblings, 1 reply; 41+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-03-02  5:17 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Andrew Morton, Hugh Dickins, Johannes Weiner, linux-mm, linux-kernel

On Wed, 29 Feb 2012 13:15:47 +0400
Konstantin Khlebnikov <khlebnikov@openvz.org> wrote:

> This patch adds file/anon filter bits into isolate_mode_t,
> this allows to simplify checks in __isolate_lru_page().
> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>

Hmm.. I like idea but..

> ---
>  include/linux/mmzone.h |    4 ++++
>  include/linux/swap.h   |    2 +-
>  mm/compaction.c        |    5 +++--
>  mm/vmscan.c            |   27 +++++++++++++--------------
>  4 files changed, 21 insertions(+), 17 deletions(-)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index eff4918..2fed935 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -193,6 +193,10 @@ struct lruvec {
>  #define ISOLATE_UNMAPPED	((__force isolate_mode_t)0x8)
>  /* Isolate for asynchronous migration */
>  #define ISOLATE_ASYNC_MIGRATE	((__force isolate_mode_t)0x10)
> +/* Isolate swap-backed pages */
> +#define	ISOLATE_ANON		((__force isolate_mode_t)0x20)
> +/* Isolate file-backed pages */
> +#define	ISOLATE_FILE		((__force isolate_mode_t)0x40)
>  
>  /* LRU Isolation modes. */
>  typedef unsigned __bitwise__ isolate_mode_t;
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index ba2c8d7..dc6e6a3 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -254,7 +254,7 @@ static inline void lru_cache_add_file(struct page *page)
>  /* linux/mm/vmscan.c */
>  extern unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
>  					gfp_t gfp_mask, nodemask_t *mask);
> -extern int __isolate_lru_page(struct page *page, isolate_mode_t mode, int file);
> +extern int __isolate_lru_page(struct page *page, isolate_mode_t mode);
>  extern unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem,
>  						  gfp_t gfp_mask, bool noswap);
>  extern unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 74a8c82..cc054f7 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -261,7 +261,8 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
>  	unsigned long last_pageblock_nr = 0, pageblock_nr;
>  	unsigned long nr_scanned = 0, nr_isolated = 0;
>  	struct list_head *migratelist = &cc->migratepages;
> -	isolate_mode_t mode = ISOLATE_ACTIVE|ISOLATE_INACTIVE;
> +	isolate_mode_t mode = ISOLATE_ACTIVE | ISOLATE_INACTIVE |
> +			      ISOLATE_FILE | ISOLATE_ANON;
>  
>  	/* Do not scan outside zone boundaries */
>  	low_pfn = max(cc->migrate_pfn, zone->zone_start_pfn);
> @@ -375,7 +376,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
>  			mode |= ISOLATE_ASYNC_MIGRATE;
>  
>  		/* Try isolate the page */
> -		if (__isolate_lru_page(page, mode, 0) != 0)
> +		if (__isolate_lru_page(page, mode) != 0)
>  			continue;
>  
>  		VM_BUG_ON(PageTransCompound(page));
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index af6cfe7..1b70338 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1029,27 +1029,18 @@ keep_lumpy:
>   *
>   * returns 0 on success, -ve errno on failure.
>   */
> -int __isolate_lru_page(struct page *page, isolate_mode_t mode, int file)
> +int __isolate_lru_page(struct page *page, isolate_mode_t mode)
>  {
> -	bool all_lru_mode;
>  	int ret = -EINVAL;
>  
>  	/* Only take pages on the LRU. */
>  	if (!PageLRU(page))
>  		return ret;
>  
> -	all_lru_mode = (mode & (ISOLATE_ACTIVE|ISOLATE_INACTIVE)) ==
> -		(ISOLATE_ACTIVE|ISOLATE_INACTIVE);
> -
> -	/*
> -	 * When checking the active state, we need to be sure we are
> -	 * dealing with comparible boolean values.  Take the logical not
> -	 * of each.
> -	 */
> -	if (!all_lru_mode && !PageActive(page) != !(mode & ISOLATE_ACTIVE))
> +	if (!(mode & (PageActive(page) ? ISOLATE_ACTIVE : ISOLATE_INACTIVE)))
>  		return ret;

Isn't this complicated ?

>  
> -	if (!all_lru_mode && !!page_is_file_cache(page) != file)
> +	if (!(mode & (page_is_file_cache(page) ? ISOLATE_FILE : ISOLATE_ANON)))
>  		return ret;
>  
>  	/*

ditto.

Where is simple  ? 

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 4/7] mm: push lru index into shrink_[in]active_list()
  2012-02-29  9:15 ` [PATCH 4/7] mm: push lru index into shrink_[in]active_list() Konstantin Khlebnikov
@ 2012-03-02  5:21   ` KAMEZAWA Hiroyuki
  2012-03-03  0:24   ` Hugh Dickins
  1 sibling, 0 replies; 41+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-03-02  5:21 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Andrew Morton, Hugh Dickins, Johannes Weiner, linux-mm, linux-kernel

On Wed, 29 Feb 2012 13:15:52 +0400
Konstantin Khlebnikov <khlebnikov@openvz.org> wrote:

> Let's toss lru index through call stack to isolate_lru_pages(),
> this is better than its reconstructing from individual bits.
> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>

I like this.

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 5/7] mm: rework reclaim_stat counters
  2012-02-29  9:15 ` [PATCH 5/7] mm: rework reclaim_stat counters Konstantin Khlebnikov
@ 2012-03-02  5:28   ` KAMEZAWA Hiroyuki
  2012-03-02  6:11     ` Konstantin Khlebnikov
  0 siblings, 1 reply; 41+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-03-02  5:28 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Andrew Morton, Hugh Dickins, Johannes Weiner, linux-mm, linux-kernel

On Wed, 29 Feb 2012 13:15:56 +0400
Konstantin Khlebnikov <khlebnikov@openvz.org> wrote:

> Currently there is two types of reclaim-stat counters:
> recent_scanned (pages picked from from lru),
> recent_rotated (pages putted back to active lru).
> Reclaimer uses ratio recent_rotated / recent_scanned
> for balancing pressure between file and anon pages.
> 
> But if we pick page from lru we can either reclaim it or put it back to lru, thus:
> recent_scanned == recent_rotated[inactive] + recent_rotated[active] + reclaimed
> This can be called "The Law of Conservation of Memory" =)
> 
I'm sorry....where is the count for active->incative ?



> Thus recent_rotated counters for each lru list is enough, reclaimed pages can be
> counted as rotatation into inactive lru. After that reclaimer can use this ratio:
> recent_rotated[active] / (recent_rotated[active] + recent_rotated[inactive])
> 
> After this patch struct zone_reclaimer_stat has only one array: recent_rotated,
> which is directly indexed by lru list index.
> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>

I'm sorry if I misunderstand..
Recent_scanned can be update by some logics other than vmscan..

For example, how lru_deactivate_fn() is handled ?

Thanks,
-Kame




> ---
>  include/linux/mmzone.h |   11 +++++------
>  mm/memcontrol.c        |   29 +++++++++++++++++------------
>  mm/page_alloc.c        |    6 ++----
>  mm/swap.c              |   26 ++++++++------------------
>  mm/vmscan.c            |   42 ++++++++++++++++++++++--------------------
>  5 files changed, 54 insertions(+), 60 deletions(-)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 2fed935..fdcd683 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -137,12 +137,14 @@ enum lru_list {
>  	LRU_INACTIVE_FILE = LRU_BASE + LRU_FILE,
>  	LRU_ACTIVE_FILE = LRU_BASE + LRU_FILE + LRU_ACTIVE,
>  	LRU_UNEVICTABLE,
> -	NR_LRU_LISTS
> +	NR_LRU_LISTS,
> +	NR_EVICTABLE_LRU_LISTS = LRU_UNEVICTABLE,
>  };
>  
>  #define for_each_lru(lru) for (lru = 0; lru < NR_LRU_LISTS; lru++)
>  
> -#define for_each_evictable_lru(lru) for (lru = 0; lru <= LRU_ACTIVE_FILE; lru++)
> +#define for_each_evictable_lru(lru) \
> +	for (lru = 0; lru < NR_EVICTABLE_LRU_LISTS; lru++)
>  
>  static inline int is_file_lru(enum lru_list lru)
>  {
> @@ -165,11 +167,8 @@ struct zone_reclaim_stat {
>  	 * mem/swap backed and file backed pages are refeferenced.
>  	 * The higher the rotated/scanned ratio, the more valuable
>  	 * that cache is.
> -	 *
> -	 * The anon LRU stats live in [0], file LRU stats in [1]
>  	 */
> -	unsigned long		recent_rotated[2];
> -	unsigned long		recent_scanned[2];
> +	unsigned long		recent_rotated[NR_EVICTABLE_LRU_LISTS];
>  };
>  
>  struct lruvec {
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index aeebb9e..2809531 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -4189,26 +4189,31 @@ static int mem_control_stat_show(struct cgroup *cont, struct cftype *cft,
>  
>  #ifdef CONFIG_DEBUG_VM
>  	{
> -		int nid, zid;
> +		int nid, zid, lru;
>  		struct mem_cgroup_per_zone *mz;
>  		struct zone_reclaim_stat *rstat;
> -		unsigned long recent_rotated[2] = {0, 0};
> -		unsigned long recent_scanned[2] = {0, 0};
> +		unsigned long recent_rotated[NR_EVICTABLE_LRU_LISTS];
>  
> +		memset(recent_rotated, 0, sizeof(recent_rotated));
>  		for_each_online_node(nid)
>  			for (zid = 0; zid < MAX_NR_ZONES; zid++) {
>  				mz = mem_cgroup_zoneinfo(memcg, nid, zid);
>  				rstat = &mz->lruvec.reclaim_stat;
> -
> -				recent_rotated[0] += rstat->recent_rotated[0];
> -				recent_rotated[1] += rstat->recent_rotated[1];
> -				recent_scanned[0] += rstat->recent_scanned[0];
> -				recent_scanned[1] += rstat->recent_scanned[1];
> +				for_each_evictable_lru(lru)
> +					recent_rotated[lru] +=
> +						rstat->recent_rotated[lru];
>  			}
> -		cb->fill(cb, "recent_rotated_anon", recent_rotated[0]);
> -		cb->fill(cb, "recent_rotated_file", recent_rotated[1]);
> -		cb->fill(cb, "recent_scanned_anon", recent_scanned[0]);
> -		cb->fill(cb, "recent_scanned_file", recent_scanned[1]);
> +
> +		cb->fill(cb, "recent_rotated_anon",
> +				recent_rotated[LRU_ACTIVE_ANON]);
> +		cb->fill(cb, "recent_rotated_file",
> +				recent_rotated[LRU_ACTIVE_FILE]);
> +		cb->fill(cb, "recent_scanned_anon",
> +				recent_rotated[LRU_ACTIVE_ANON] +
> +				recent_rotated[LRU_INACTIVE_ANON]);
> +		cb->fill(cb, "recent_scanned_file",
> +				recent_rotated[LRU_ACTIVE_FILE] +
> +				recent_rotated[LRU_INACTIVE_FILE]);
>  	}
>  #endif
>  
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index ab2d210..ea40034 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4365,10 +4365,8 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
>  		zone_pcp_init(zone);
>  		for_each_lru(lru)
>  			INIT_LIST_HEAD(&zone->lruvec.lists[lru]);
> -		zone->lruvec.reclaim_stat.recent_rotated[0] = 0;
> -		zone->lruvec.reclaim_stat.recent_rotated[1] = 0;
> -		zone->lruvec.reclaim_stat.recent_scanned[0] = 0;
> -		zone->lruvec.reclaim_stat.recent_scanned[1] = 0;
> +		memset(&zone->lruvec.reclaim_stat, 0,
> +				sizeof(struct zone_reclaim_stat));
>  		zap_zone_vm_stats(zone);
>  		zone->flags = 0;
>  		if (!size)
> diff --git a/mm/swap.c b/mm/swap.c
> index 9a6850b..c7bcde7 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -277,7 +277,7 @@ void rotate_reclaimable_page(struct page *page)
>  }
>  
>  static void update_page_reclaim_stat(struct zone *zone, struct page *page,
> -				     int file, int rotated)
> +				     enum lru_list lru)
>  {
>  	struct zone_reclaim_stat *reclaim_stat;
>  
> @@ -285,9 +285,7 @@ static void update_page_reclaim_stat(struct zone *zone, struct page *page,
>  	if (!reclaim_stat)
>  		reclaim_stat = &zone->lruvec.reclaim_stat;
>  
> -	reclaim_stat->recent_scanned[file]++;
> -	if (rotated)
> -		reclaim_stat->recent_rotated[file]++;
> +	reclaim_stat->recent_rotated[lru]++;
>  }
>  
>  static void __activate_page(struct page *page, void *arg)
> @@ -295,7 +293,6 @@ static void __activate_page(struct page *page, void *arg)
>  	struct zone *zone = page_zone(page);
>  
>  	if (PageLRU(page) && !PageActive(page) && !PageUnevictable(page)) {
> -		int file = page_is_file_cache(page);
>  		int lru = page_lru_base_type(page);
>  		del_page_from_lru_list(zone, page, lru);
>  
> @@ -304,7 +301,7 @@ static void __activate_page(struct page *page, void *arg)
>  		add_page_to_lru_list(zone, page, lru);
>  		__count_vm_event(PGACTIVATE);
>  
> -		update_page_reclaim_stat(zone, page, file, 1);
> +		update_page_reclaim_stat(zone, page, lru);
>  	}
>  }
>  
> @@ -482,7 +479,7 @@ static void lru_deactivate_fn(struct page *page, void *arg)
>  
>  	if (active)
>  		__count_vm_event(PGDEACTIVATE);
> -	update_page_reclaim_stat(zone, page, file, 0);
> +	update_page_reclaim_stat(zone, page, lru);
>  }
>  
>  /*
> @@ -646,9 +643,7 @@ EXPORT_SYMBOL(__pagevec_release);
>  void lru_add_page_tail(struct zone* zone,
>  		       struct page *page, struct page *page_tail)
>  {
> -	int active;
>  	enum lru_list lru;
> -	const int file = 0;
>  
>  	VM_BUG_ON(!PageHead(page));
>  	VM_BUG_ON(PageCompound(page_tail));
> @@ -660,13 +655,10 @@ void lru_add_page_tail(struct zone* zone,
>  	if (page_evictable(page_tail, NULL)) {
>  		if (PageActive(page)) {
>  			SetPageActive(page_tail);
> -			active = 1;
>  			lru = LRU_ACTIVE_ANON;
> -		} else {
> -			active = 0;
> +		} else
>  			lru = LRU_INACTIVE_ANON;
> -		}
> -		update_page_reclaim_stat(zone, page_tail, file, active);
> +		update_page_reclaim_stat(zone, page_tail, lru);
>  	} else {
>  		SetPageUnevictable(page_tail);
>  		lru = LRU_UNEVICTABLE;
> @@ -694,17 +686,15 @@ static void __pagevec_lru_add_fn(struct page *page, void *arg)
>  {
>  	enum lru_list lru = (enum lru_list)arg;
>  	struct zone *zone = page_zone(page);
> -	int file = is_file_lru(lru);
> -	int active = is_active_lru(lru);
>  
>  	VM_BUG_ON(PageActive(page));
>  	VM_BUG_ON(PageUnevictable(page));
>  	VM_BUG_ON(PageLRU(page));
>  
>  	SetPageLRU(page);
> -	if (active)
> +	if (is_active_lru(lru))
>  		SetPageActive(page);
> -	update_page_reclaim_stat(zone, page, file, active);
> +	update_page_reclaim_stat(zone, page, lru);
>  	add_page_to_lru_list(zone, page, lru);
>  }
>  
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 483b98e..fe00a22 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1355,11 +1355,7 @@ putback_inactive_pages(struct mem_cgroup_zone *mz,
>  		SetPageLRU(page);
>  		lru = page_lru(page);
>  		add_page_to_lru_list(zone, page, lru);
> -		if (is_active_lru(lru)) {
> -			int file = is_file_lru(lru);
> -			int numpages = hpage_nr_pages(page);
> -			reclaim_stat->recent_rotated[file] += numpages;
> -		}
> +		reclaim_stat->recent_rotated[lru] += hpage_nr_pages(page);
>  		if (put_page_testzero(page)) {
>  			__ClearPageLRU(page);
>  			__ClearPageActive(page);
> @@ -1543,8 +1539,11 @@ shrink_inactive_list(unsigned long nr_to_scan, struct mem_cgroup_zone *mz,
>  
>  	spin_lock_irq(&zone->lru_lock);
>  
> -	reclaim_stat->recent_scanned[0] += nr_anon;
> -	reclaim_stat->recent_scanned[1] += nr_file;
> +	/*
> +	 * Count reclaimed pages as rotated, this helps balance scan pressure
> +	 * between file and anonymous pages in get_scan_ratio.
> +	 */
> +	reclaim_stat->recent_rotated[lru] += nr_reclaimed;
>  
>  	if (current_is_kswapd())
>  		__count_vm_events(KSWAPD_STEAL, nr_reclaimed);
> @@ -1685,8 +1684,6 @@ static void shrink_active_list(unsigned long nr_to_scan,
>  	if (global_reclaim(sc))
>  		zone->pages_scanned += nr_scanned;
>  
> -	reclaim_stat->recent_scanned[file] += nr_taken;
> -
>  	__count_zone_vm_events(PGREFILL, zone, nr_scanned);
>  	__mod_zone_page_state(zone, NR_LRU_BASE + lru, -nr_taken);
>  	__mod_zone_page_state(zone, NR_ISOLATED_ANON + file, nr_taken);
> @@ -1742,7 +1739,7 @@ static void shrink_active_list(unsigned long nr_to_scan,
>  	 * helps balance scan pressure between file and anonymous pages in
>  	 * get_scan_ratio.
>  	 */
> -	reclaim_stat->recent_rotated[file] += nr_rotated;
> +	reclaim_stat->recent_rotated[lru] += nr_rotated;
>  
>  	move_active_pages_to_lru(zone, &l_active, &l_hold, lru);
>  	move_active_pages_to_lru(zone, &l_inactive, &l_hold, lru - LRU_ACTIVE);
> @@ -1875,6 +1872,7 @@ static void get_scan_count(struct mem_cgroup_zone *mz, struct scan_control *sc,
>  	unsigned long anon_prio, file_prio;
>  	unsigned long ap, fp;
>  	struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(mz);
> +	unsigned long *recent_rotated = reclaim_stat->recent_rotated;
>  	u64 fraction[2], denominator;
>  	enum lru_list lru;
>  	int noswap = 0;
> @@ -1940,14 +1938,16 @@ static void get_scan_count(struct mem_cgroup_zone *mz, struct scan_control *sc,
>  	 * anon in [0], file in [1]
>  	 */
>  	spin_lock_irq(&mz->zone->lru_lock);
> -	if (unlikely(reclaim_stat->recent_scanned[0] > anon / 4)) {
> -		reclaim_stat->recent_scanned[0] /= 2;
> -		reclaim_stat->recent_rotated[0] /= 2;
> +	if (unlikely(recent_rotated[LRU_INACTIVE_ANON] +
> +		     recent_rotated[LRU_ACTIVE_ANON] > anon / 4)) {
> +		recent_rotated[LRU_INACTIVE_ANON] /= 2;
> +		recent_rotated[LRU_ACTIVE_ANON] /= 2;
>  	}
>  
> -	if (unlikely(reclaim_stat->recent_scanned[1] > file / 4)) {
> -		reclaim_stat->recent_scanned[1] /= 2;
> -		reclaim_stat->recent_rotated[1] /= 2;
> +	if (unlikely(recent_rotated[LRU_INACTIVE_FILE] +
> +		     recent_rotated[LRU_ACTIVE_FILE] > file / 4)) {
> +		recent_rotated[LRU_INACTIVE_FILE] /= 2;
> +		recent_rotated[LRU_ACTIVE_FILE] /= 2;
>  	}
>  
>  	/*
> @@ -1955,11 +1955,13 @@ static void get_scan_count(struct mem_cgroup_zone *mz, struct scan_control *sc,
>  	 * proportional to the fraction of recently scanned pages on
>  	 * each list that were recently referenced and in active use.
>  	 */
> -	ap = (anon_prio + 1) * (reclaim_stat->recent_scanned[0] + 1);
> -	ap /= reclaim_stat->recent_rotated[0] + 1;
> +	ap = (anon_prio + 1) * (recent_rotated[LRU_INACTIVE_ANON] +
> +				recent_rotated[LRU_ACTIVE_ANON] + 1);
> +	ap /= recent_rotated[LRU_ACTIVE_ANON] + 1;
>  
> -	fp = (file_prio + 1) * (reclaim_stat->recent_scanned[1] + 1);
> -	fp /= reclaim_stat->recent_rotated[1] + 1;
> +	fp = (file_prio + 1) * (recent_rotated[LRU_INACTIVE_FILE] +
> +				recent_rotated[LRU_ACTIVE_FILE] + 1);
> +	fp /= recent_rotated[LRU_ACTIVE_FILE] + 1;
>  	spin_unlock_irq(&mz->zone->lru_lock);
>  
>  	fraction[0] = ap;
> 
> 


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 6/7] mm/memcg: rework inactive_ratio calculation
  2012-02-29  9:16 ` [PATCH 6/7] mm/memcg: rework inactive_ratio calculation Konstantin Khlebnikov
@ 2012-03-02  5:31   ` KAMEZAWA Hiroyuki
  2012-03-02  6:24     ` Konstantin Khlebnikov
  0 siblings, 1 reply; 41+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-03-02  5:31 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Andrew Morton, Hugh Dickins, Johannes Weiner, linux-mm, linux-kernel

On Wed, 29 Feb 2012 13:16:00 +0400
Konstantin Khlebnikov <khlebnikov@openvz.org> wrote:

> This patch removes precalculated zone->inactive_ratio.
> Now it always calculated in inactive_anon_is_low() from current lru size.
> After that we can merge memcg and non-memcg cases and drop duplicated code.
> 
> We can drop precalculated ratio, because its calculation fast enough to do it
> each time. Plus precalculation uses zone size as basis, this estimation not
> always match with page lru size, for example if a significant proportion
> of memory occupied by kernel objects.
> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>

Maybe good....but please don't change the user interface /proc/zoneinfo implicitly.
How about calculating inactive_ratio at reading /proc/zoneinfo ?

Thanks,
-Kame




> ---
>  include/linux/memcontrol.h |   16 --------
>  include/linux/mmzone.h     |    7 ----
>  mm/memcontrol.c            |   38 -------------------
>  mm/page_alloc.c            |   44 ----------------------
>  mm/vmscan.c                |   88 ++++++++++++++++++++++++++++----------------
>  mm/vmstat.c                |    6 +--
>  6 files changed, 58 insertions(+), 141 deletions(-)
> 
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index e2e1fac..7e114f8 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -117,10 +117,6 @@ void mem_cgroup_iter_break(struct mem_cgroup *, struct mem_cgroup *);
>  /*
>   * For memory reclaim.
>   */
> -int mem_cgroup_inactive_anon_is_low(struct mem_cgroup *memcg,
> -				    struct zone *zone);
> -int mem_cgroup_inactive_file_is_low(struct mem_cgroup *memcg,
> -				    struct zone *zone);
>  int mem_cgroup_select_victim_node(struct mem_cgroup *memcg);
>  unsigned long mem_cgroup_zone_nr_lru_pages(struct mem_cgroup *memcg,
>  					int nid, int zid, unsigned int lrumask);
> @@ -334,18 +330,6 @@ static inline bool mem_cgroup_disabled(void)
>  	return true;
>  }
>  
> -static inline int
> -mem_cgroup_inactive_anon_is_low(struct mem_cgroup *memcg, struct zone *zone)
> -{
> -	return 1;
> -}
> -
> -static inline int
> -mem_cgroup_inactive_file_is_low(struct mem_cgroup *memcg, struct zone *zone)
> -{
> -	return 1;
> -}
> -
>  static inline unsigned long
>  mem_cgroup_zone_nr_lru_pages(struct mem_cgroup *memcg, int nid, int zid,
>  				unsigned int lru_mask)
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index fdcd683..7edcf17 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -384,13 +384,6 @@ struct zone {
>  	/* Zone statistics */
>  	atomic_long_t		vm_stat[NR_VM_ZONE_STAT_ITEMS];
>  
> -	/*
> -	 * The target ratio of ACTIVE_ANON to INACTIVE_ANON pages on
> -	 * this zone's LRU.  Maintained by the pageout code.
> -	 */
> -	unsigned int inactive_ratio;
> -
> -
>  	ZONE_PADDING(_pad2_)
>  	/* Rarely used or read-mostly fields */
>  
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 2809531..4bc6835 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1171,44 +1171,6 @@ int task_in_mem_cgroup(struct task_struct *task, const struct mem_cgroup *memcg)
>  	return ret;
>  }
>  
> -int mem_cgroup_inactive_anon_is_low(struct mem_cgroup *memcg, struct zone *zone)
> -{
> -	unsigned long inactive_ratio;
> -	int nid = zone_to_nid(zone);
> -	int zid = zone_idx(zone);
> -	unsigned long inactive;
> -	unsigned long active;
> -	unsigned long gb;
> -
> -	inactive = mem_cgroup_zone_nr_lru_pages(memcg, nid, zid,
> -						BIT(LRU_INACTIVE_ANON));
> -	active = mem_cgroup_zone_nr_lru_pages(memcg, nid, zid,
> -					      BIT(LRU_ACTIVE_ANON));
> -
> -	gb = (inactive + active) >> (30 - PAGE_SHIFT);
> -	if (gb)
> -		inactive_ratio = int_sqrt(10 * gb);
> -	else
> -		inactive_ratio = 1;
> -
> -	return inactive * inactive_ratio < active;
> -}
> -
> -int mem_cgroup_inactive_file_is_low(struct mem_cgroup *memcg, struct zone *zone)
> -{
> -	unsigned long active;
> -	unsigned long inactive;
> -	int zid = zone_idx(zone);
> -	int nid = zone_to_nid(zone);
> -
> -	inactive = mem_cgroup_zone_nr_lru_pages(memcg, nid, zid,
> -						BIT(LRU_INACTIVE_FILE));
> -	active = mem_cgroup_zone_nr_lru_pages(memcg, nid, zid,
> -					      BIT(LRU_ACTIVE_FILE));
> -
> -	return (active > inactive);
> -}
> -
>  struct zone_reclaim_stat *
>  mem_cgroup_get_reclaim_stat_from_page(struct page *page)
>  {
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index ea40034..2e90931 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5051,49 +5051,6 @@ void setup_per_zone_wmarks(void)
>  }
>  
>  /*
> - * The inactive anon list should be small enough that the VM never has to
> - * do too much work, but large enough that each inactive page has a chance
> - * to be referenced again before it is swapped out.
> - *
> - * The inactive_anon ratio is the target ratio of ACTIVE_ANON to
> - * INACTIVE_ANON pages on this zone's LRU, maintained by the
> - * pageout code. A zone->inactive_ratio of 3 means 3:1 or 25% of
> - * the anonymous pages are kept on the inactive list.
> - *
> - * total     target    max
> - * memory    ratio     inactive anon
> - * -------------------------------------
> - *   10MB       1         5MB
> - *  100MB       1        50MB
> - *    1GB       3       250MB
> - *   10GB      10       0.9GB
> - *  100GB      31         3GB
> - *    1TB     101        10GB
> - *   10TB     320        32GB
> - */
> -static void __meminit calculate_zone_inactive_ratio(struct zone *zone)
> -{
> -	unsigned int gb, ratio;
> -
> -	/* Zone size in gigabytes */
> -	gb = zone->present_pages >> (30 - PAGE_SHIFT);
> -	if (gb)
> -		ratio = int_sqrt(10 * gb);
> -	else
> -		ratio = 1;
> -
> -	zone->inactive_ratio = ratio;
> -}
> -
> -static void __meminit setup_per_zone_inactive_ratio(void)
> -{
> -	struct zone *zone;
> -
> -	for_each_zone(zone)
> -		calculate_zone_inactive_ratio(zone);
> -}
> -
> -/*
>   * Initialise min_free_kbytes.
>   *
>   * For small machines we want it small (128k min).  For large machines
> @@ -5131,7 +5088,6 @@ int __meminit init_per_zone_wmark_min(void)
>  	setup_per_zone_wmarks();
>  	refresh_zone_stat_thresholds();
>  	setup_per_zone_lowmem_reserve();
> -	setup_per_zone_inactive_ratio();
>  	return 0;
>  }
>  module_init(init_per_zone_wmark_min)
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index fe00a22..ab447df 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1750,29 +1750,38 @@ static void shrink_active_list(unsigned long nr_to_scan,
>  }
>  
>  #ifdef CONFIG_SWAP
> -static int inactive_anon_is_low_global(struct zone *zone)
> -{
> -	unsigned long active, inactive;
> -
> -	active = zone_page_state(zone, NR_ACTIVE_ANON);
> -	inactive = zone_page_state(zone, NR_INACTIVE_ANON);
> -
> -	if (inactive * zone->inactive_ratio < active)
> -		return 1;
> -
> -	return 0;
> -}
> -
>  /**
>   * inactive_anon_is_low - check if anonymous pages need to be deactivated
>   * @zone: zone to check
> - * @sc:   scan control of this context
>   *
>   * Returns true if the zone does not have enough inactive anon pages,
>   * meaning some active anon pages need to be deactivated.
> + *
> + * The inactive anon list should be small enough that the VM never has to
> + * do too much work, but large enough that each inactive page has a chance
> + * to be referenced again before it is swapped out.
> + *
> + * The inactive_anon ratio is the target ratio of ACTIVE_ANON to
> + * INACTIVE_ANON pages on this zone's LRU, maintained by the
> + * pageout code. A zone->inactive_ratio of 3 means 3:1 or 25% of
> + * the anonymous pages are kept on the inactive list.
> + *
> + * total     target    max
> + * memory    ratio     inactive anon
> + * -------------------------------------
> + *   10MB       1         5MB
> + *  100MB       1        50MB
> + *    1GB       3       250MB
> + *   10GB      10       0.9GB
> + *  100GB      31         3GB
> + *    1TB     101        10GB
> + *   10TB     320        32GB
>   */
>  static int inactive_anon_is_low(struct mem_cgroup_zone *mz)
>  {
> +	unsigned long active, inactive;
> +	unsigned int gb, ratio;
> +
>  	/*
>  	 * If we don't have swap space, anonymous page deactivation
>  	 * is pointless.
> @@ -1780,11 +1789,26 @@ static int inactive_anon_is_low(struct mem_cgroup_zone *mz)
>  	if (!total_swap_pages)
>  		return 0;
>  
> -	if (!mem_cgroup_disabled())
> -		return mem_cgroup_inactive_anon_is_low(mz->mem_cgroup,
> -						       mz->zone);
> +	if (mem_cgroup_disabled()) {
> +		active = zone_page_state(mz->zone, NR_ACTIVE_ANON);
> +		inactive = zone_page_state(mz->zone, NR_INACTIVE_ANON);
> +	} else {
> +		active = mem_cgroup_zone_nr_lru_pages(mz->mem_cgroup,
> +				zone_to_nid(mz->zone), zone_idx(mz->zone),
> +				BIT(LRU_ACTIVE_ANON));
> +		inactive = mem_cgroup_zone_nr_lru_pages(mz->mem_cgroup,
> +				zone_to_nid(mz->zone), zone_idx(mz->zone),
> +				BIT(LRU_INACTIVE_ANON));
> +	}
> +
> +	/* Total size in gigabytes */
> +	gb = (active + inactive) >> (30 - PAGE_SHIFT);
> +	if (gb)
> +		ratio = int_sqrt(10 * gb);
> +	else
> +		ratio = 1;
>  
> -	return inactive_anon_is_low_global(mz->zone);
> +	return inactive * ratio < active;
>  }
>  #else
>  static inline int inactive_anon_is_low(struct mem_cgroup_zone *mz)
> @@ -1793,16 +1817,6 @@ static inline int inactive_anon_is_low(struct mem_cgroup_zone *mz)
>  }
>  #endif
>  
> -static int inactive_file_is_low_global(struct zone *zone)
> -{
> -	unsigned long active, inactive;
> -
> -	active = zone_page_state(zone, NR_ACTIVE_FILE);
> -	inactive = zone_page_state(zone, NR_INACTIVE_FILE);
> -
> -	return (active > inactive);
> -}
> -
>  /**
>   * inactive_file_is_low - check if file pages need to be deactivated
>   * @mz: memory cgroup and zone to check
> @@ -1819,11 +1833,21 @@ static int inactive_file_is_low_global(struct zone *zone)
>   */
>  static int inactive_file_is_low(struct mem_cgroup_zone *mz)
>  {
> -	if (!mem_cgroup_disabled())
> -		return mem_cgroup_inactive_file_is_low(mz->mem_cgroup,
> -						       mz->zone);
> +	unsigned long active, inactive;
> +
> +	if (mem_cgroup_disabled()) {
> +		active = zone_page_state(mz->zone, NR_ACTIVE_FILE);
> +		inactive = zone_page_state(mz->zone, NR_INACTIVE_FILE);
> +	} else {
> +		active = mem_cgroup_zone_nr_lru_pages(mz->mem_cgroup,
> +				zone_to_nid(mz->zone), zone_idx(mz->zone),
> +				BIT(LRU_ACTIVE_FILE));
> +		inactive = mem_cgroup_zone_nr_lru_pages(mz->mem_cgroup,
> +				zone_to_nid(mz->zone), zone_idx(mz->zone),
> +				BIT(LRU_INACTIVE_FILE));
> +	}
>  
> -	return inactive_file_is_low_global(mz->zone);
> +	return inactive < active;
>  }
>  
>  static int inactive_list_is_low(struct mem_cgroup_zone *mz, int file)
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index f600557..2c813e1 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -1017,11 +1017,9 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
>  	}
>  	seq_printf(m,
>  		   "\n  all_unreclaimable: %u"
> -		   "\n  start_pfn:         %lu"
> -		   "\n  inactive_ratio:    %u",
> +		   "\n  start_pfn:         %lu",
>  		   zone->all_unreclaimable,
> -		   zone->zone_start_pfn,
> -		   zone->inactive_ratio);
> +		   zone->zone_start_pfn);
>  	seq_putc(m, '\n');
>  }
>  
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 7/7] mm/memcg: use vm_swappiness from target memory cgroup
  2012-02-29  9:16 ` [PATCH 7/7] mm/memcg: use vm_swappiness from target memory cgroup Konstantin Khlebnikov
@ 2012-03-02  5:32   ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 41+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-03-02  5:32 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Andrew Morton, Hugh Dickins, Johannes Weiner, linux-mm, linux-kernel

On Wed, 29 Feb 2012 13:16:04 +0400
Konstantin Khlebnikov <khlebnikov@openvz.org> wrote:

> Use vm_swappiness from memory cgroup which is triggered this memory reclaim.
> This is more reasonable and allows to kill one argument.
> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>

seems reasonable to me.

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 3/7] mm: rework __isolate_lru_page() file/anon filter
  2012-03-02  5:17   ` KAMEZAWA Hiroyuki
@ 2012-03-02  5:51     ` Konstantin Khlebnikov
  2012-03-02  8:17       ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 41+ messages in thread
From: Konstantin Khlebnikov @ 2012-03-02  5:51 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Andrew Morton, Hugh Dickins, Johannes Weiner, linux-mm, linux-kernel

KAMEZAWA Hiroyuki wrote:
> On Wed, 29 Feb 2012 13:15:47 +0400
> Konstantin Khlebnikov<khlebnikov@openvz.org>  wrote:
>
>> This patch adds file/anon filter bits into isolate_mode_t,
>> this allows to simplify checks in __isolate_lru_page().
>>
>> Signed-off-by: Konstantin Khlebnikov<khlebnikov@openvz.org>
>
> Hmm.. I like idea but..
>
>> ---
>>   include/linux/mmzone.h |    4 ++++
>>   include/linux/swap.h   |    2 +-
>>   mm/compaction.c        |    5 +++--
>>   mm/vmscan.c            |   27 +++++++++++++--------------
>>   4 files changed, 21 insertions(+), 17 deletions(-)
>>
>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>> index eff4918..2fed935 100644
>> --- a/include/linux/mmzone.h
>> +++ b/include/linux/mmzone.h
>> @@ -193,6 +193,10 @@ struct lruvec {
>>   #define ISOLATE_UNMAPPED	((__force isolate_mode_t)0x8)
>>   /* Isolate for asynchronous migration */
>>   #define ISOLATE_ASYNC_MIGRATE	((__force isolate_mode_t)0x10)
>> +/* Isolate swap-backed pages */
>> +#define	ISOLATE_ANON		((__force isolate_mode_t)0x20)
>> +/* Isolate file-backed pages */
>> +#define	ISOLATE_FILE		((__force isolate_mode_t)0x40)
>>
>>   /* LRU Isolation modes. */
>>   typedef unsigned __bitwise__ isolate_mode_t;
>> diff --git a/include/linux/swap.h b/include/linux/swap.h
>> index ba2c8d7..dc6e6a3 100644
>> --- a/include/linux/swap.h
>> +++ b/include/linux/swap.h
>> @@ -254,7 +254,7 @@ static inline void lru_cache_add_file(struct page *page)
>>   /* linux/mm/vmscan.c */
>>   extern unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
>>   					gfp_t gfp_mask, nodemask_t *mask);
>> -extern int __isolate_lru_page(struct page *page, isolate_mode_t mode, int file);
>> +extern int __isolate_lru_page(struct page *page, isolate_mode_t mode);
>>   extern unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem,
>>   						  gfp_t gfp_mask, bool noswap);
>>   extern unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index 74a8c82..cc054f7 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -261,7 +261,8 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
>>   	unsigned long last_pageblock_nr = 0, pageblock_nr;
>>   	unsigned long nr_scanned = 0, nr_isolated = 0;
>>   	struct list_head *migratelist =&cc->migratepages;
>> -	isolate_mode_t mode = ISOLATE_ACTIVE|ISOLATE_INACTIVE;
>> +	isolate_mode_t mode = ISOLATE_ACTIVE | ISOLATE_INACTIVE |
>> +			      ISOLATE_FILE | ISOLATE_ANON;
>>
>>   	/* Do not scan outside zone boundaries */
>>   	low_pfn = max(cc->migrate_pfn, zone->zone_start_pfn);
>> @@ -375,7 +376,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
>>   			mode |= ISOLATE_ASYNC_MIGRATE;
>>
>>   		/* Try isolate the page */
>> -		if (__isolate_lru_page(page, mode, 0) != 0)
>> +		if (__isolate_lru_page(page, mode) != 0)
>>   			continue;
>>
>>   		VM_BUG_ON(PageTransCompound(page));
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index af6cfe7..1b70338 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -1029,27 +1029,18 @@ keep_lumpy:
>>    *
>>    * returns 0 on success, -ve errno on failure.
>>    */
>> -int __isolate_lru_page(struct page *page, isolate_mode_t mode, int file)
>> +int __isolate_lru_page(struct page *page, isolate_mode_t mode)
>>   {
>> -	bool all_lru_mode;
>>   	int ret = -EINVAL;
>>
>>   	/* Only take pages on the LRU. */
>>   	if (!PageLRU(page))
>>   		return ret;
>>
>> -	all_lru_mode = (mode&  (ISOLATE_ACTIVE|ISOLATE_INACTIVE)) ==
>> -		(ISOLATE_ACTIVE|ISOLATE_INACTIVE);
>> -
>> -	/*
>> -	 * When checking the active state, we need to be sure we are
>> -	 * dealing with comparible boolean values.  Take the logical not
>> -	 * of each.
>> -	 */
>> -	if (!all_lru_mode&&  !PageActive(page) != !(mode&  ISOLATE_ACTIVE))
>> +	if (!(mode&  (PageActive(page) ? ISOLATE_ACTIVE : ISOLATE_INACTIVE)))
>>   		return ret;
>
> Isn't this complicated ?

But it doesn't blows my mind as old code does =)

Maybe someone can propose more clear variant?

>
>>
>> -	if (!all_lru_mode&&  !!page_is_file_cache(page) != file)
>> +	if (!(mode&  (page_is_file_cache(page) ? ISOLATE_FILE : ISOLATE_ANON)))
>>   		return ret;
>>
>>   	/*
>
> ditto.
>
> Where is simple  ?
>
> Thanks,
> -Kame
>


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 5/7] mm: rework reclaim_stat counters
  2012-03-02  5:28   ` KAMEZAWA Hiroyuki
@ 2012-03-02  6:11     ` Konstantin Khlebnikov
  2012-03-02  8:03       ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 41+ messages in thread
From: Konstantin Khlebnikov @ 2012-03-02  6:11 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Andrew Morton, Hugh Dickins, Johannes Weiner, linux-mm, linux-kernel

KAMEZAWA Hiroyuki wrote:
> On Wed, 29 Feb 2012 13:15:56 +0400
> Konstantin Khlebnikov<khlebnikov@openvz.org>  wrote:
>
>> Currently there is two types of reclaim-stat counters:
>> recent_scanned (pages picked from from lru),
>> recent_rotated (pages putted back to active lru).
>> Reclaimer uses ratio recent_rotated / recent_scanned
>> for balancing pressure between file and anon pages.
>>
>> But if we pick page from lru we can either reclaim it or put it back to lru, thus:
>> recent_scanned == recent_rotated[inactive] + recent_rotated[active] + reclaimed
>> This can be called "The Law of Conservation of Memory" =)
>>
> I'm sorry....where is the count for active->incative ?

If reclaimer deactivates page it will bump recent_rotated[LRU_INACTIVE_ANON/FILE],
(if I understand your question right) recent_rotated[] now count each evictable lru independently

>
>
>
>> Thus recent_rotated counters for each lru list is enough, reclaimed pages can be
>> counted as rotatation into inactive lru. After that reclaimer can use this ratio:
>> recent_rotated[active] / (recent_rotated[active] + recent_rotated[inactive])
>>
>> After this patch struct zone_reclaimer_stat has only one array: recent_rotated,
>> which is directly indexed by lru list index.
>>
>> Signed-off-by: Konstantin Khlebnikov<khlebnikov@openvz.org>
>
> I'm sorry if I misunderstand..
> Recent_scanned can be update by some logics other than vmscan..
>
> For example, how lru_deactivate_fn() is handled ?

It increase recent_rotated[LRU_INACTIVE_ANON/LRU_INACTIVE_FILE]

>
> Thanks,
> -Kame
>
>
>
>
>> ---
>>   include/linux/mmzone.h |   11 +++++------
>>   mm/memcontrol.c        |   29 +++++++++++++++++------------
>>   mm/page_alloc.c        |    6 ++----
>>   mm/swap.c              |   26 ++++++++------------------
>>   mm/vmscan.c            |   42 ++++++++++++++++++++++--------------------
>>   5 files changed, 54 insertions(+), 60 deletions(-)
>>
>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>> index 2fed935..fdcd683 100644
>> --- a/include/linux/mmzone.h
>> +++ b/include/linux/mmzone.h
>> @@ -137,12 +137,14 @@ enum lru_list {
>>        LRU_INACTIVE_FILE = LRU_BASE + LRU_FILE,
>>        LRU_ACTIVE_FILE = LRU_BASE + LRU_FILE + LRU_ACTIVE,
>>        LRU_UNEVICTABLE,
>> -     NR_LRU_LISTS
>> +     NR_LRU_LISTS,
>> +     NR_EVICTABLE_LRU_LISTS = LRU_UNEVICTABLE,
>>   };
>>
>>   #define for_each_lru(lru) for (lru = 0; lru<  NR_LRU_LISTS; lru++)
>>
>> -#define for_each_evictable_lru(lru) for (lru = 0; lru<= LRU_ACTIVE_FILE; lru++)
>> +#define for_each_evictable_lru(lru) \
>> +     for (lru = 0; lru<  NR_EVICTABLE_LRU_LISTS; lru++)
>>
>>   static inline int is_file_lru(enum lru_list lru)
>>   {
>> @@ -165,11 +167,8 @@ struct zone_reclaim_stat {
>>         * mem/swap backed and file backed pages are refeferenced.
>>         * The higher the rotated/scanned ratio, the more valuable
>>         * that cache is.
>> -      *
>> -      * The anon LRU stats live in [0], file LRU stats in [1]
>>         */
>> -     unsigned long           recent_rotated[2];
>> -     unsigned long           recent_scanned[2];
>> +     unsigned long           recent_rotated[NR_EVICTABLE_LRU_LISTS];
>>   };
>>
>>   struct lruvec {
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index aeebb9e..2809531 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -4189,26 +4189,31 @@ static int mem_control_stat_show(struct cgroup *cont, struct cftype *cft,
>>
>>   #ifdef CONFIG_DEBUG_VM
>>        {
>> -             int nid, zid;
>> +             int nid, zid, lru;
>>                struct mem_cgroup_per_zone *mz;
>>                struct zone_reclaim_stat *rstat;
>> -             unsigned long recent_rotated[2] = {0, 0};
>> -             unsigned long recent_scanned[2] = {0, 0};
>> +             unsigned long recent_rotated[NR_EVICTABLE_LRU_LISTS];
>>
>> +             memset(recent_rotated, 0, sizeof(recent_rotated));
>>                for_each_online_node(nid)
>>                        for (zid = 0; zid<  MAX_NR_ZONES; zid++) {
>>                                mz = mem_cgroup_zoneinfo(memcg, nid, zid);
>>                                rstat =&mz->lruvec.reclaim_stat;
>> -
>> -                             recent_rotated[0] += rstat->recent_rotated[0];
>> -                             recent_rotated[1] += rstat->recent_rotated[1];
>> -                             recent_scanned[0] += rstat->recent_scanned[0];
>> -                             recent_scanned[1] += rstat->recent_scanned[1];
>> +                             for_each_evictable_lru(lru)
>> +                                     recent_rotated[lru] +=
>> +                                             rstat->recent_rotated[lru];
>>                        }
>> -             cb->fill(cb, "recent_rotated_anon", recent_rotated[0]);
>> -             cb->fill(cb, "recent_rotated_file", recent_rotated[1]);
>> -             cb->fill(cb, "recent_scanned_anon", recent_scanned[0]);
>> -             cb->fill(cb, "recent_scanned_file", recent_scanned[1]);
>> +
>> +             cb->fill(cb, "recent_rotated_anon",
>> +                             recent_rotated[LRU_ACTIVE_ANON]);
>> +             cb->fill(cb, "recent_rotated_file",
>> +                             recent_rotated[LRU_ACTIVE_FILE]);
>> +             cb->fill(cb, "recent_scanned_anon",
>> +                             recent_rotated[LRU_ACTIVE_ANON] +
>> +                             recent_rotated[LRU_INACTIVE_ANON]);
>> +             cb->fill(cb, "recent_scanned_file",
>> +                             recent_rotated[LRU_ACTIVE_FILE] +
>> +                             recent_rotated[LRU_INACTIVE_FILE]);
>>        }
>>   #endif
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index ab2d210..ea40034 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -4365,10 +4365,8 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
>>                zone_pcp_init(zone);
>>                for_each_lru(lru)
>>                        INIT_LIST_HEAD(&zone->lruvec.lists[lru]);
>> -             zone->lruvec.reclaim_stat.recent_rotated[0] = 0;
>> -             zone->lruvec.reclaim_stat.recent_rotated[1] = 0;
>> -             zone->lruvec.reclaim_stat.recent_scanned[0] = 0;
>> -             zone->lruvec.reclaim_stat.recent_scanned[1] = 0;
>> +             memset(&zone->lruvec.reclaim_stat, 0,
>> +                             sizeof(struct zone_reclaim_stat));
>>                zap_zone_vm_stats(zone);
>>                zone->flags = 0;
>>                if (!size)
>> diff --git a/mm/swap.c b/mm/swap.c
>> index 9a6850b..c7bcde7 100644
>> --- a/mm/swap.c
>> +++ b/mm/swap.c
>> @@ -277,7 +277,7 @@ void rotate_reclaimable_page(struct page *page)
>>   }
>>
>>   static void update_page_reclaim_stat(struct zone *zone, struct page *page,
>> -                                  int file, int rotated)
>> +                                  enum lru_list lru)
>>   {
>>        struct zone_reclaim_stat *reclaim_stat;
>>
>> @@ -285,9 +285,7 @@ static void update_page_reclaim_stat(struct zone *zone, struct page *page,
>>        if (!reclaim_stat)
>>                reclaim_stat =&zone->lruvec.reclaim_stat;
>>
>> -     reclaim_stat->recent_scanned[file]++;
>> -     if (rotated)
>> -             reclaim_stat->recent_rotated[file]++;
>> +     reclaim_stat->recent_rotated[lru]++;
>>   }
>>
>>   static void __activate_page(struct page *page, void *arg)
>> @@ -295,7 +293,6 @@ static void __activate_page(struct page *page, void *arg)
>>        struct zone *zone = page_zone(page);
>>
>>        if (PageLRU(page)&&  !PageActive(page)&&  !PageUnevictable(page)) {
>> -             int file = page_is_file_cache(page);
>>                int lru = page_lru_base_type(page);
>>                del_page_from_lru_list(zone, page, lru);
>>
>> @@ -304,7 +301,7 @@ static void __activate_page(struct page *page, void *arg)
>>                add_page_to_lru_list(zone, page, lru);
>>                __count_vm_event(PGACTIVATE);
>>
>> -             update_page_reclaim_stat(zone, page, file, 1);
>> +             update_page_reclaim_stat(zone, page, lru);
>>        }
>>   }
>>
>> @@ -482,7 +479,7 @@ static void lru_deactivate_fn(struct page *page, void *arg)
>>
>>        if (active)
>>                __count_vm_event(PGDEACTIVATE);
>> -     update_page_reclaim_stat(zone, page, file, 0);
>> +     update_page_reclaim_stat(zone, page, lru);
>>   }
>>
>>   /*
>> @@ -646,9 +643,7 @@ EXPORT_SYMBOL(__pagevec_release);
>>   void lru_add_page_tail(struct zone* zone,
>>                       struct page *page, struct page *page_tail)
>>   {
>> -     int active;
>>        enum lru_list lru;
>> -     const int file = 0;
>>
>>        VM_BUG_ON(!PageHead(page));
>>        VM_BUG_ON(PageCompound(page_tail));
>> @@ -660,13 +655,10 @@ void lru_add_page_tail(struct zone* zone,
>>        if (page_evictable(page_tail, NULL)) {
>>                if (PageActive(page)) {
>>                        SetPageActive(page_tail);
>> -                     active = 1;
>>                        lru = LRU_ACTIVE_ANON;
>> -             } else {
>> -                     active = 0;
>> +             } else
>>                        lru = LRU_INACTIVE_ANON;
>> -             }
>> -             update_page_reclaim_stat(zone, page_tail, file, active);
>> +             update_page_reclaim_stat(zone, page_tail, lru);
>>        } else {
>>                SetPageUnevictable(page_tail);
>>                lru = LRU_UNEVICTABLE;
>> @@ -694,17 +686,15 @@ static void __pagevec_lru_add_fn(struct page *page, void *arg)
>>   {
>>        enum lru_list lru = (enum lru_list)arg;
>>        struct zone *zone = page_zone(page);
>> -     int file = is_file_lru(lru);
>> -     int active = is_active_lru(lru);
>>
>>        VM_BUG_ON(PageActive(page));
>>        VM_BUG_ON(PageUnevictable(page));
>>        VM_BUG_ON(PageLRU(page));
>>
>>        SetPageLRU(page);
>> -     if (active)
>> +     if (is_active_lru(lru))
>>                SetPageActive(page);
>> -     update_page_reclaim_stat(zone, page, file, active);
>> +     update_page_reclaim_stat(zone, page, lru);
>>        add_page_to_lru_list(zone, page, lru);
>>   }
>>
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index 483b98e..fe00a22 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -1355,11 +1355,7 @@ putback_inactive_pages(struct mem_cgroup_zone *mz,
>>                SetPageLRU(page);
>>                lru = page_lru(page);
>>                add_page_to_lru_list(zone, page, lru);
>> -             if (is_active_lru(lru)) {
>> -                     int file = is_file_lru(lru);
>> -                     int numpages = hpage_nr_pages(page);
>> -                     reclaim_stat->recent_rotated[file] += numpages;
>> -             }
>> +             reclaim_stat->recent_rotated[lru] += hpage_nr_pages(page);
>>                if (put_page_testzero(page)) {
>>                        __ClearPageLRU(page);
>>                        __ClearPageActive(page);
>> @@ -1543,8 +1539,11 @@ shrink_inactive_list(unsigned long nr_to_scan, struct mem_cgroup_zone *mz,
>>
>>        spin_lock_irq(&zone->lru_lock);
>>
>> -     reclaim_stat->recent_scanned[0] += nr_anon;
>> -     reclaim_stat->recent_scanned[1] += nr_file;
>> +     /*
>> +      * Count reclaimed pages as rotated, this helps balance scan pressure
>> +      * between file and anonymous pages in get_scan_ratio.
>> +      */
>> +     reclaim_stat->recent_rotated[lru] += nr_reclaimed;
>>
>>        if (current_is_kswapd())
>>                __count_vm_events(KSWAPD_STEAL, nr_reclaimed);
>> @@ -1685,8 +1684,6 @@ static void shrink_active_list(unsigned long nr_to_scan,
>>        if (global_reclaim(sc))
>>                zone->pages_scanned += nr_scanned;
>>
>> -     reclaim_stat->recent_scanned[file] += nr_taken;
>> -
>>        __count_zone_vm_events(PGREFILL, zone, nr_scanned);
>>        __mod_zone_page_state(zone, NR_LRU_BASE + lru, -nr_taken);
>>        __mod_zone_page_state(zone, NR_ISOLATED_ANON + file, nr_taken);
>> @@ -1742,7 +1739,7 @@ static void shrink_active_list(unsigned long nr_to_scan,
>>         * helps balance scan pressure between file and anonymous pages in
>>         * get_scan_ratio.
>>         */
>> -     reclaim_stat->recent_rotated[file] += nr_rotated;
>> +     reclaim_stat->recent_rotated[lru] += nr_rotated;
>>
>>        move_active_pages_to_lru(zone,&l_active,&l_hold, lru);
>>        move_active_pages_to_lru(zone,&l_inactive,&l_hold, lru - LRU_ACTIVE);
>> @@ -1875,6 +1872,7 @@ static void get_scan_count(struct mem_cgroup_zone *mz, struct scan_control *sc,
>>        unsigned long anon_prio, file_prio;
>>        unsigned long ap, fp;
>>        struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(mz);
>> +     unsigned long *recent_rotated = reclaim_stat->recent_rotated;
>>        u64 fraction[2], denominator;
>>        enum lru_list lru;
>>        int noswap = 0;
>> @@ -1940,14 +1938,16 @@ static void get_scan_count(struct mem_cgroup_zone *mz, struct scan_control *sc,
>>         * anon in [0], file in [1]
>>         */
>>        spin_lock_irq(&mz->zone->lru_lock);
>> -     if (unlikely(reclaim_stat->recent_scanned[0]>  anon / 4)) {
>> -             reclaim_stat->recent_scanned[0] /= 2;
>> -             reclaim_stat->recent_rotated[0] /= 2;
>> +     if (unlikely(recent_rotated[LRU_INACTIVE_ANON] +
>> +                  recent_rotated[LRU_ACTIVE_ANON]>  anon / 4)) {
>> +             recent_rotated[LRU_INACTIVE_ANON] /= 2;
>> +             recent_rotated[LRU_ACTIVE_ANON] /= 2;
>>        }
>>
>> -     if (unlikely(reclaim_stat->recent_scanned[1]>  file / 4)) {
>> -             reclaim_stat->recent_scanned[1] /= 2;
>> -             reclaim_stat->recent_rotated[1] /= 2;
>> +     if (unlikely(recent_rotated[LRU_INACTIVE_FILE] +
>> +                  recent_rotated[LRU_ACTIVE_FILE]>  file / 4)) {
>> +             recent_rotated[LRU_INACTIVE_FILE] /= 2;
>> +             recent_rotated[LRU_ACTIVE_FILE] /= 2;
>>        }
>>
>>        /*
>> @@ -1955,11 +1955,13 @@ static void get_scan_count(struct mem_cgroup_zone *mz, struct scan_control *sc,
>>         * proportional to the fraction of recently scanned pages on
>>         * each list that were recently referenced and in active use.
>>         */
>> -     ap = (anon_prio + 1) * (reclaim_stat->recent_scanned[0] + 1);
>> -     ap /= reclaim_stat->recent_rotated[0] + 1;
>> +     ap = (anon_prio + 1) * (recent_rotated[LRU_INACTIVE_ANON] +
>> +                             recent_rotated[LRU_ACTIVE_ANON] + 1);
>> +     ap /= recent_rotated[LRU_ACTIVE_ANON] + 1;
>>
>> -     fp = (file_prio + 1) * (reclaim_stat->recent_scanned[1] + 1);
>> -     fp /= reclaim_stat->recent_rotated[1] + 1;
>> +     fp = (file_prio + 1) * (recent_rotated[LRU_INACTIVE_FILE] +
>> +                             recent_rotated[LRU_ACTIVE_FILE] + 1);
>> +     fp /= recent_rotated[LRU_ACTIVE_FILE] + 1;
>>        spin_unlock_irq(&mz->zone->lru_lock);
>>
>>        fraction[0] = ap;
>>
>>
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
> Don't email:<a href=mailto:"dont@kvack.org">  email@kvack.org</a>


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 6/7] mm/memcg: rework inactive_ratio calculation
  2012-03-02  5:31   ` KAMEZAWA Hiroyuki
@ 2012-03-02  6:24     ` Konstantin Khlebnikov
  2012-03-08  5:36       ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 41+ messages in thread
From: Konstantin Khlebnikov @ 2012-03-02  6:24 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Andrew Morton, Hugh Dickins, Johannes Weiner, linux-mm, linux-kernel

KAMEZAWA Hiroyuki wrote:
> On Wed, 29 Feb 2012 13:16:00 +0400
> Konstantin Khlebnikov<khlebnikov@openvz.org>  wrote:
>
>> This patch removes precalculated zone->inactive_ratio.
>> Now it always calculated in inactive_anon_is_low() from current lru size.
>> After that we can merge memcg and non-memcg cases and drop duplicated code.
>>
>> We can drop precalculated ratio, because its calculation fast enough to do it
>> each time. Plus precalculation uses zone size as basis, this estimation not
>> always match with page lru size, for example if a significant proportion
>> of memory occupied by kernel objects.om memory cgroup which is triggered this memory reclaim.
This is more reason
>>
>> Signed-off-by: Konstantin Khlebnikov<khlebnikov@openvz.org>
>
> Maybe good....but please don't change the user interface /proc/zoneinfo implicitly.
> How about calculating inactive_ratio at reading /proc/zoneinfo ?

I don't know... Anybody need this?
Plus now it work in per-lruvec manner, why we should show it per-zone?

This field was introduced not long ago, in v2.6.27-5589-g556adec
For example prev_priority was there before v2.6.12 and was killed in v2.6.35-5854-g25edde0

>
> Thanks,
> -Kame
>
>
>
>
>> ---
>>   include/linux/memcontrol.h |   16 --------
>>   include/linux/mmzone.h     |    7 ----
>>   mm/memcontrol.c            |   38 -------------------
>>   mm/page_alloc.c            |   44 ----------------------
>>   mm/vmscan.c                |   88 ++++++++++++++++++++++++++++----------------
>>   mm/vmstat.c                |    6 +--
>>   6 files changed, 58 insertions(+), 141 deletions(-)
>>
>> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
>> index e2e1fac..7e114f8 100644
>> --- a/include/linux/memcontrol.h
>> +++ b/include/linux/memcontrol.h
>> @@ -117,10 +117,6 @@ void mem_cgroup_iter_break(struct mem_cgroup *, struct mem_cgroup *);
>>   /*
>>    * For memory reclaim.
>>    */
>> -int mem_cgroup_inactive_anon_is_low(struct mem_cgroup *memcg,
>> -                                 struct zone *zone);
>> -int mem_cgroup_inactive_file_is_low(struct mem_cgroup *memcg,
>> -                                 struct zone *zone);
>>   int mem_cgroup_select_victim_node(struct mem_cgroup *memcg);
>>   unsigned long mem_cgroup_zone_nr_lru_pages(struct mem_cgroup *memcg,
>>                                        int nid, int zid, unsigned int lrumask);
>> @@ -334,18 +330,6 @@ static inline bool mem_cgroup_disabled(void)
>>        return true;
>>   }
>>
>> -static inline int
>> -mem_cgroup_inactive_anon_is_low(struct mem_cgroup *memcg, struct zone *zone)
>> -{
>> -     return 1;
>> -}
>> -
>> -static inline int
>> -mem_cgroup_inactive_file_is_low(struct mem_cgroup *memcg, struct zone *zone)
>> -{
>> -     return 1;
>> -}
>> -
>>   static inline unsigned long
>>   mem_cgroup_zone_nr_lru_pages(struct mem_cgroup *memcg, int nid, int zid,
>>                                unsigned int lru_mask)
>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>> index fdcd683..7edcf17 100644
>> --- a/include/linux/mmzone.h
>> +++ b/include/linux/mmzone.h
>> @@ -384,13 +384,6 @@ struct zone {
>>        /* Zone statistics */
>>        atomic_long_t           vm_stat[NR_VM_ZONE_STAT_ITEMS];
>>
>> -     /*
>> -      * The target ratio of ACTIVE_ANON to INACTIVE_ANON pages on
>> -      * this zone's LRU.  Maintained by the pageout code.
>> -      */
>> -     unsigned int inactive_ratio;
>> -
>> -
>>        ZONE_PADDING(_pad2_)
>>        /* Rarely used or read-mostly fields */
>>
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 2809531..4bc6835 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -1171,44 +1171,6 @@ int task_in_mem_cgroup(struct task_struct *task, const struct mem_cgroup *memcg)
>>        return ret;
>>   }
>>
>> -int mem_cgroup_inactive_anon_is_low(struct mem_cgroup *memcg, struct zone *zone)
>> -{
>> -     unsigned long inactive_ratio;
>> -     int nid = zone_to_nid(zone);
>> -     int zid = zone_idx(zone);
>> -     unsigned long inactive;
>> -     unsigned long active;
>> -     unsigned long gb;
>> -
>> -     inactive = mem_cgroup_zone_nr_lru_pages(memcg, nid, zid,
>> -                                             BIT(LRU_INACTIVE_ANON));
>> -     active = mem_cgroup_zone_nr_lru_pages(memcg, nid, zid,
>> -                                           BIT(LRU_ACTIVE_ANON));
>> -
>> -     gb = (inactive + active)>>  (30 - PAGE_SHIFT);
>> -     if (gb)
>> -             inactive_ratio = int_sqrt(10 * gb);
>> -     else
>> -             inactive_ratio = 1;
>> -
>> -     return inactive * inactive_ratio<  active;
>> -}
>> -
>> -int mem_cgroup_inactive_file_is_low(struct mem_cgroup *memcg, struct zone *zone)
>> -{
>> -     unsigned long active;
>> -     unsigned long inactive;
>> -     int zid = zone_idx(zone);
>> -     int nid = zone_to_nid(zone);
>> -
>> -     inactive = mem_cgroup_zone_nr_lru_pages(memcg, nid, zid,
>> -                                             BIT(LRU_INACTIVE_FILE));
>> -     active = mem_cgroup_zone_nr_lru_pages(memcg, nid, zid,
>> -                                           BIT(LRU_ACTIVE_FILE));
>> -
>> -     return (active>  inactive);
>> -}
>> -
>>   struct zone_reclaim_stat *
>>   mem_cgroup_get_reclaim_stat_from_page(struct page *page)
>>   {
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index ea40034..2e90931 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -5051,49 +5051,6 @@ void setup_per_zone_wmarks(void)
>>   }
>>
>>   /*
>> - * The inactive anon list should be small enough that the VM never has to
>> - * do too much work, but large enough that each inactive page has a chance
>> - * to be referenced again before it is swapped out.
>> - *
>> - * The inactive_anon ratio is the target ratio of ACTIVE_ANON to
>> - * INACTIVE_ANON pages on this zone's LRU, maintained by the
>> - * pageout code. A zone->inactive_ratio of 3 means 3:1 or 25% of
>> - * the anonymous pages are kept on the inactive list.
>> - *
>> - * total     target    max
>> - * memory    ratio     inactive anon
>> - * -------------------------------------
>> - *   10MB       1         5MB
>> - *  100MB       1        50MB
>> - *    1GB       3       250MB
>> - *   10GB      10       0.9GB
>> - *  100GB      31         3GB
>> - *    1TB     101        10GB
>> - *   10TB     320        32GB
>> - */
>> -static void __meminit calculate_zone_inactive_ratio(struct zone *zone)
>> -{
>> -     unsigned int gb, ratio;
>> -
>> -     /* Zone size in gigabytes */
>> -     gb = zone->present_pages>>  (30 - PAGE_SHIFT);
>> -     if (gb)
>> -             ratio = int_sqrt(10 * gb);
>> -     else
>> -             ratio = 1;
>> -
>> -     zone->inactive_ratio = ratio;
>> -}
>> -
>> -static void __meminit setup_per_zone_inactive_ratio(void)
>> -{
>> -     struct zone *zone;
>> -
>> -     for_each_zone(zone)
>> -             calculate_zone_inactive_ratio(zone);
>> -}
>> -
>> -/*
>>    * Initialise min_free_kbytes.
>>    *
>>    * For small machines we want it small (128k min).  For large machines
>> @@ -5131,7 +5088,6 @@ int __meminit init_per_zone_wmark_min(void)
>>        setup_per_zone_wmarks();
>>        refresh_zone_stat_thresholds();
>>        setup_per_zone_lowmem_reserve();
>> -     setup_per_zone_inactive_ratio();
>>        return 0;
>>   }
>>   module_init(init_per_zone_wmark_min)
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index fe00a22..ab447df 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -1750,29 +1750,38 @@ static void shrink_active_list(unsigned long nr_to_scan,
>>   }
>>
>>   #ifdef CONFIG_SWAP
>> -static int inactive_anon_is_low_global(struct zone *zone)
>> -{
>> -     unsigned long active, inactive;
>> -
>> -     active = zone_page_state(zone, NR_ACTIVE_ANON);
>> -     inactive = zone_page_state(zone, NR_INACTIVE_ANON);
>> -
>> -     if (inactive * zone->inactive_ratio<  active)
>> -             return 1;
>> -
>> -     return 0;
>> -}
>> -
>>   /**
>>    * inactive_anon_is_low - check if anonymous pages need to be deactivated
>>    * @zone: zone to check
>> - * @sc:   scan control of this context
>>    *
>>    * Returns true if the zone does not have enough inactive anon pages,
>>    * meaning some active anon pages need to be deactivated.
>> + *
>> + * The inactive anon list should be small enough that the VM never has to
>> + * do too much work, but large enough that each inactive page has a chance
>> + * to be referenced again before it is swapped out.
>> + *
>> + * The inactive_anon ratio is the target ratio of ACTIVE_ANON to
>> + * INACTIVE_ANON pages on this zone's LRU, maintained by the
>> + * pageout code. A zone->inactive_ratio of 3 means 3:1 or 25% of
>> + * the anonymous pages are kept on the inactive list.
>> + *
>> + * total     target    max
>> + * memory    ratio     inactive anon
>> + * -------------------------------------
>> + *   10MB       1         5MB
>> + *  100MB       1        50MB
>> + *    1GB       3       250MB
>> + *   10GB      10       0.9GB
>> + *  100GB      31         3GB
>> + *    1TB     101        10GB
>> + *   10TB     320        32GB
>>    */
>>   static int inactive_anon_is_low(struct mem_cgroup_zone *mz)
>>   {
>> +     unsigned long active, inactive;
>> +     unsigned int gb, ratio;
>> +
>>        /*
>>         * If we don't have swap space, anonymous page deactivation
>>         * is pointless.
>> @@ -1780,11 +1789,26 @@ static int inactive_anon_is_low(struct mem_cgroup_zone *mz)
>>        if (!total_swap_pages)
>>                return 0;
>>
>> -     if (!mem_cgroup_disabled())
>> -             return mem_cgroup_inactive_anon_is_low(mz->mem_cgroup,
>> -                                                    mz->zone);
>> +     if (mem_cgroup_disabled()) {
>> +             active = zone_page_state(mz->zone, NR_ACTIVE_ANON);
>> +             inactive = zone_page_state(mz->zone, NR_INACTIVE_ANON);
>> +     } else {
>> +             active = mem_cgroup_zone_nr_lru_pages(mz->mem_cgroup,
>> +                             zone_to_nid(mz->zone), zone_idx(mz->zone),
>> +                             BIT(LRU_ACTIVE_ANON));
>> +             inactive = mem_cgroup_zone_nr_lru_pages(mz->mem_cgroup,
>> +                             zone_to_nid(mz->zone), zone_idx(mz->zone),
>> +                             BIT(LRU_INACTIVE_ANON));
>> +     }
>> +
>> +     /* Total size in gigabytes */
>> +     gb = (active + inactive)>>  (30 - PAGE_SHIFT);
>> +     if (gb)
>> +             ratio = int_sqrt(10 * gb);
>> +     else
>> +             ratio = 1;
>>
>> -     return inactive_anon_is_low_global(mz->zone);
>> +     return inactive * ratio<  active;
>>   }
>>   #else
>>   static inline int inactive_anon_is_low(struct mem_cgroup_zone *mz)
>> @@ -1793,16 +1817,6 @@ static inline int inactive_anon_is_low(struct mem_cgroup_zone *mz)
>>   }
>>   #endif
>>
>> -static int inactive_file_is_low_global(struct zone *zone)
>> -{
>> -     unsigned long active, inactive;
>> -
>> -     active = zone_page_state(zone, NR_ACTIVE_FILE);
>> -     inactive = zone_page_state(zone, NR_INACTIVE_FILE);
>> -
>> -     return (active>  inactive);
>> -}
>> -
>>   /**
>>    * inactive_file_is_low - check if file pages need to be deactivated
>>    * @mz: memory cgroup and zone to check
>> @@ -1819,11 +1833,21 @@ static int inactive_file_is_low_global(struct zone *zone)
>>    */
>>   static int inactive_file_is_low(struct mem_cgroup_zone *mz)
>>   {
>> -     if (!mem_cgroup_disabled())
>> -             return mem_cgroup_inactive_file_is_low(mz->mem_cgroup,
>> -                                                    mz->zone);
>> +     unsigned long active, inactive;
>> +
>> +     if (mem_cgroup_disabled()) {
>> +             active = zone_page_state(mz->zone, NR_ACTIVE_FILE);
>> +             inactive = zone_page_state(mz->zone, NR_INACTIVE_FILE);
>> +     } else {
>> +             active = mem_cgroup_zone_nr_lru_pages(mz->mem_cgroup,
>> +                             zone_to_nid(mz->zone), zone_idx(mz->zone),
>> +                             BIT(LRU_ACTIVE_FILE));
>> +             inactive = mem_cgroup_zone_nr_lru_pages(mz->mem_cgroup,
>> +                             zone_to_nid(mz->zone), zone_idx(mz->zone),
>> +                             BIT(LRU_INACTIVE_FILE));
>> +     }
>>
>> -     return inactive_file_is_low_global(mz->zone);
>> +     return inactive<  active;
>>   }
>>
>>   static int inactive_list_is_low(struct mem_cgroup_zone *mz, int file)
>> diff --git a/mm/vmstat.c b/mm/vmstat.c
>> index f600557..2c813e1 100644
>> --- a/mm/vmstat.c
>> +++ b/mm/vmstat.c
>> @@ -1017,11 +1017,9 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
>>        }
>>        seq_printf(m,
>>                   "\n  all_unreclaimable: %u"
>> -                "\n  start_pfn:         %lu"
>> -                "\n  inactive_ratio:    %u",
>> +                "\n  start_pfn:         %lu",
>>                   zone->all_unreclaimable,
>> -                zone->zone_start_pfn,
>> -                zone->inactive_ratio);
>> +                zone->zone_start_pfn);
>>        seq_putc(m, '\n');
>>   }
>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at  http://www.tux.org/lkml/
>>
>


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 5/7] mm: rework reclaim_stat counters
  2012-03-02  6:11     ` Konstantin Khlebnikov
@ 2012-03-02  8:03       ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 41+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-03-02  8:03 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Andrew Morton, Hugh Dickins, Johannes Weiner, linux-mm, linux-kernel

On Fri, 02 Mar 2012 10:11:18 +0400
Konstantin Khlebnikov <khlebnikov@openvz.org> wrote:

> KAMEZAWA Hiroyuki wrote:
> > On Wed, 29 Feb 2012 13:15:56 +0400
> > Konstantin Khlebnikov<khlebnikov@openvz.org>  wrote:
> >
> >> Currently there is two types of reclaim-stat counters:
> >> recent_scanned (pages picked from from lru),
> >> recent_rotated (pages putted back to active lru).
> >> Reclaimer uses ratio recent_rotated / recent_scanned
> >> for balancing pressure between file and anon pages.
> >>
> >> But if we pick page from lru we can either reclaim it or put it back to lru, thus:
> >> recent_scanned == recent_rotated[inactive] + recent_rotated[active] + reclaimed
> >> This can be called "The Law of Conservation of Memory" =)
> >>
> > I'm sorry....where is the count for active->incative ?
> 
> If reclaimer deactivates page it will bump recent_rotated[LRU_INACTIVE_ANON/FILE],
> (if I understand your question right) recent_rotated[] now count each evictable lru independently
> 

Hm, then

	active -> active   : recent_rotated[active]   += 1 
	active -> inactive : recent_rotated[inacitve] += 1
	inactive->inactive : recent_rotated[inactive] += 1
	inactive->active   : recent_rotated[active]   += 1 ?

Ok, it seems rotated[active] + rotated[inactive] == scan.

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 3/7] mm: rework __isolate_lru_page() file/anon filter
  2012-03-02  5:51     ` Konstantin Khlebnikov
@ 2012-03-02  8:17       ` KAMEZAWA Hiroyuki
  2012-03-02  8:53         ` Konstantin Khlebnikov
  2012-03-06 11:57         ` Glauber Costa
  0 siblings, 2 replies; 41+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-03-02  8:17 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Andrew Morton, Hugh Dickins, Johannes Weiner, linux-mm, linux-kernel

On Fri, 02 Mar 2012 09:51:27 +0400
Konstantin Khlebnikov <khlebnikov@openvz.org> wrote:

> KAMEZAWA Hiroyuki wrote:
> > On Wed, 29 Feb 2012 13:15:47 +0400
> > Konstantin Khlebnikov<khlebnikov@openvz.org>  wrote:
> >
> >> This patch adds file/anon filter bits into isolate_mode_t,
> >> this allows to simplify checks in __isolate_lru_page().
> >>
> >> Signed-off-by: Konstantin Khlebnikov<khlebnikov@openvz.org>
> >
> > Hmm.. I like idea but..
> >
> >> ---
> >>   include/linux/mmzone.h |    4 ++++
> >>   include/linux/swap.h   |    2 +-
> >>   mm/compaction.c        |    5 +++--
> >>   mm/vmscan.c            |   27 +++++++++++++--------------
> >>   4 files changed, 21 insertions(+), 17 deletions(-)
> >>
> >> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> >> index eff4918..2fed935 100644
> >> --- a/include/linux/mmzone.h
> >> +++ b/include/linux/mmzone.h
> >> @@ -193,6 +193,10 @@ struct lruvec {
> >>   #define ISOLATE_UNMAPPED	((__force isolate_mode_t)0x8)
> >>   /* Isolate for asynchronous migration */
> >>   #define ISOLATE_ASYNC_MIGRATE	((__force isolate_mode_t)0x10)
> >> +/* Isolate swap-backed pages */
> >> +#define	ISOLATE_ANON		((__force isolate_mode_t)0x20)
> >> +/* Isolate file-backed pages */
> >> +#define	ISOLATE_FILE		((__force isolate_mode_t)0x40)
> >>
> >>   /* LRU Isolation modes. */
> >>   typedef unsigned __bitwise__ isolate_mode_t;
> >> diff --git a/include/linux/swap.h b/include/linux/swap.h
> >> index ba2c8d7..dc6e6a3 100644
> >> --- a/include/linux/swap.h
> >> +++ b/include/linux/swap.h
> >> @@ -254,7 +254,7 @@ static inline void lru_cache_add_file(struct page *page)
> >>   /* linux/mm/vmscan.c */
> >>   extern unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
> >>   					gfp_t gfp_mask, nodemask_t *mask);
> >> -extern int __isolate_lru_page(struct page *page, isolate_mode_t mode, int file);
> >> +extern int __isolate_lru_page(struct page *page, isolate_mode_t mode);
> >>   extern unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem,
> >>   						  gfp_t gfp_mask, bool noswap);
> >>   extern unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
> >> diff --git a/mm/compaction.c b/mm/compaction.c
> >> index 74a8c82..cc054f7 100644
> >> --- a/mm/compaction.c
> >> +++ b/mm/compaction.c
> >> @@ -261,7 +261,8 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
> >>   	unsigned long last_pageblock_nr = 0, pageblock_nr;
> >>   	unsigned long nr_scanned = 0, nr_isolated = 0;
> >>   	struct list_head *migratelist =&cc->migratepages;
> >> -	isolate_mode_t mode = ISOLATE_ACTIVE|ISOLATE_INACTIVE;
> >> +	isolate_mode_t mode = ISOLATE_ACTIVE | ISOLATE_INACTIVE |
> >> +			      ISOLATE_FILE | ISOLATE_ANON;
> >>
> >>   	/* Do not scan outside zone boundaries */
> >>   	low_pfn = max(cc->migrate_pfn, zone->zone_start_pfn);
> >> @@ -375,7 +376,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
> >>   			mode |= ISOLATE_ASYNC_MIGRATE;
> >>
> >>   		/* Try isolate the page */
> >> -		if (__isolate_lru_page(page, mode, 0) != 0)
> >> +		if (__isolate_lru_page(page, mode) != 0)
> >>   			continue;
> >>
> >>   		VM_BUG_ON(PageTransCompound(page));
> >> diff --git a/mm/vmscan.c b/mm/vmscan.c
> >> index af6cfe7..1b70338 100644
> >> --- a/mm/vmscan.c
> >> +++ b/mm/vmscan.c
> >> @@ -1029,27 +1029,18 @@ keep_lumpy:
> >>    *
> >>    * returns 0 on success, -ve errno on failure.
> >>    */
> >> -int __isolate_lru_page(struct page *page, isolate_mode_t mode, int file)
> >> +int __isolate_lru_page(struct page *page, isolate_mode_t mode)
> >>   {
> >> -	bool all_lru_mode;
> >>   	int ret = -EINVAL;
> >>
> >>   	/* Only take pages on the LRU. */
> >>   	if (!PageLRU(page))
> >>   		return ret;
> >>
> >> -	all_lru_mode = (mode&  (ISOLATE_ACTIVE|ISOLATE_INACTIVE)) ==
> >> -		(ISOLATE_ACTIVE|ISOLATE_INACTIVE);
> >> -
> >> -	/*
> >> -	 * When checking the active state, we need to be sure we are
> >> -	 * dealing with comparible boolean values.  Take the logical not
> >> -	 * of each.
> >> -	 */
> >> -	if (!all_lru_mode&&  !PageActive(page) != !(mode&  ISOLATE_ACTIVE))
> >> +	if (!(mode&  (PageActive(page) ? ISOLATE_ACTIVE : ISOLATE_INACTIVE)))
> >>   		return ret;
> >
> > Isn't this complicated ?
> 
> But it doesn't blows my mind as old code does =)
> 
> Maybe someone can propose more clear variant?
> 

switch (mode & (ISOLATE_ACTIVE | ISOLATE_INACTIVE)) {
	case ISOLATE_ACTIVE :
		if (!PageActive(page))
			return ret;
	case ISOLATE_INACTIVE :
		if (PageActive(page))
			return ret;
	default:
		break;
	}
}

?

Thanks,
-Kame

> >
> >>
> >> -	if (!all_lru_mode&&  !!page_is_file_cache(page) != file)
> >> +	if (!(mode&  (page_is_file_cache(page) ? ISOLATE_FILE : ISOLATE_ANON)))
> >>   		return ret;
> >>
> >>   	/*
> >
> > ditto.
> >
> > Where is simple  ?
> >
> > Thanks,
> > -Kame
> >
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> 


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 3/7] mm: rework __isolate_lru_page() file/anon filter
  2012-03-02  8:17       ` KAMEZAWA Hiroyuki
@ 2012-03-02  8:53         ` Konstantin Khlebnikov
  2012-03-06 11:57         ` Glauber Costa
  1 sibling, 0 replies; 41+ messages in thread
From: Konstantin Khlebnikov @ 2012-03-02  8:53 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Andrew Morton, Hugh Dickins, Johannes Weiner, linux-mm, linux-kernel

KAMEZAWA Hiroyuki wrote:
> On Fri, 02 Mar 2012 09:51:27 +0400
> Konstantin Khlebnikov<khlebnikov@openvz.org>  wrote:
>
>> KAMEZAWA Hiroyuki wrote:
>>> On Wed, 29 Feb 2012 13:15:47 +0400
>>> Konstantin Khlebnikov<khlebnikov@openvz.org>   wrote:
>>>
>>>> This patch adds file/anon filter bits into isolate_mode_t,
>>>> this allows to simplify checks in __isolate_lru_page().
>>>>
>>>> Signed-off-by: Konstantin Khlebnikov<khlebnikov@openvz.org>
>>>
>>> Hmm.. I like idea but..
>>>
>>>> ---
>>>>    include/linux/mmzone.h |    4 ++++
>>>>    include/linux/swap.h   |    2 +-
>>>>    mm/compaction.c        |    5 +++--
>>>>    mm/vmscan.c            |   27 +++++++++++++--------------
>>>>    4 files changed, 21 insertions(+), 17 deletions(-)
>>>>
>>>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>>>> index eff4918..2fed935 100644
>>>> --- a/include/linux/mmzone.h
>>>> +++ b/include/linux/mmzone.h
>>>> @@ -193,6 +193,10 @@ struct lruvec {
>>>>    #define ISOLATE_UNMAPPED	((__force isolate_mode_t)0x8)
>>>>    /* Isolate for asynchronous migration */
>>>>    #define ISOLATE_ASYNC_MIGRATE	((__force isolate_mode_t)0x10)
>>>> +/* Isolate swap-backed pages */
>>>> +#define	ISOLATE_ANON		((__force isolate_mode_t)0x20)
>>>> +/* Isolate file-backed pages */
>>>> +#define	ISOLATE_FILE		((__force isolate_mode_t)0x40)
>>>>
>>>>    /* LRU Isolation modes. */
>>>>    typedef unsigned __bitwise__ isolate_mode_t;
>>>> diff --git a/include/linux/swap.h b/include/linux/swap.h
>>>> index ba2c8d7..dc6e6a3 100644
>>>> --- a/include/linux/swap.h
>>>> +++ b/include/linux/swap.h
>>>> @@ -254,7 +254,7 @@ static inline void lru_cache_add_file(struct page *page)
>>>>    /* linux/mm/vmscan.c */
>>>>    extern unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
>>>>    					gfp_t gfp_mask, nodemask_t *mask);
>>>> -extern int __isolate_lru_page(struct page *page, isolate_mode_t mode, int file);
>>>> +extern int __isolate_lru_page(struct page *page, isolate_mode_t mode);
>>>>    extern unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem,
>>>>    						  gfp_t gfp_mask, bool noswap);
>>>>    extern unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
>>>> diff --git a/mm/compaction.c b/mm/compaction.c
>>>> index 74a8c82..cc054f7 100644
>>>> --- a/mm/compaction.c
>>>> +++ b/mm/compaction.c
>>>> @@ -261,7 +261,8 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
>>>>    	unsigned long last_pageblock_nr = 0, pageblock_nr;
>>>>    	unsigned long nr_scanned = 0, nr_isolated = 0;
>>>>    	struct list_head *migratelist =&cc->migratepages;
>>>> -	isolate_mode_t mode = ISOLATE_ACTIVE|ISOLATE_INACTIVE;
>>>> +	isolate_mode_t mode = ISOLATE_ACTIVE | ISOLATE_INACTIVE |
>>>> +			      ISOLATE_FILE | ISOLATE_ANON;
>>>>
>>>>    	/* Do not scan outside zone boundaries */
>>>>    	low_pfn = max(cc->migrate_pfn, zone->zone_start_pfn);
>>>> @@ -375,7 +376,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
>>>>    			mode |= ISOLATE_ASYNC_MIGRATE;
>>>>
>>>>    		/* Try isolate the page */
>>>> -		if (__isolate_lru_page(page, mode, 0) != 0)
>>>> +		if (__isolate_lru_page(page, mode) != 0)
>>>>    			continue;
>>>>
>>>>    		VM_BUG_ON(PageTransCompound(page));
>>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>>> index af6cfe7..1b70338 100644
>>>> --- a/mm/vmscan.c
>>>> +++ b/mm/vmscan.c
>>>> @@ -1029,27 +1029,18 @@ keep_lumpy:
>>>>     *
>>>>     * returns 0 on success, -ve errno on failure.
>>>>     */
>>>> -int __isolate_lru_page(struct page *page, isolate_mode_t mode, int file)
>>>> +int __isolate_lru_page(struct page *page, isolate_mode_t mode)
>>>>    {
>>>> -	bool all_lru_mode;
>>>>    	int ret = -EINVAL;
>>>>
>>>>    	/* Only take pages on the LRU. */
>>>>    	if (!PageLRU(page))
>>>>    		return ret;
>>>>
>>>> -	all_lru_mode = (mode&   (ISOLATE_ACTIVE|ISOLATE_INACTIVE)) ==
>>>> -		(ISOLATE_ACTIVE|ISOLATE_INACTIVE);
>>>> -
>>>> -	/*
>>>> -	 * When checking the active state, we need to be sure we are
>>>> -	 * dealing with comparible boolean values.  Take the logical not
>>>> -	 * of each.
>>>> -	 */
>>>> -	if (!all_lru_mode&&   !PageActive(page) != !(mode&   ISOLATE_ACTIVE))
>>>> +	if (!(mode&   (PageActive(page) ? ISOLATE_ACTIVE : ISOLATE_INACTIVE)))
>>>>    		return ret;
>>>
>>> Isn't this complicated ?
>>
>> But it doesn't blows my mind as old code does =)
>>
>> Maybe someone can propose more clear variant?
>>
>
> switch (mode&  (ISOLATE_ACTIVE | ISOLATE_INACTIVE)) {
> 	case ISOLATE_ACTIVE :
> 		if (!PageActive(page))
> 			return ret;
> 	case ISOLATE_INACTIVE :
> 		if (PageActive(page))
> 			return ret;
> 	default:
> 		break;
> 	}
> }
>
> ?

Thanks. It is more optimal and looks more simple. but you lost one "break"

>
> Thanks,
> -Kame
>
>>>
>>>>
>>>> -	if (!all_lru_mode&&   !!page_is_file_cache(page) != file)
>>>> +	if (!(mode&   (page_is_file_cache(page) ? ISOLATE_FILE : ISOLATE_ANON)))
>>>>    		return ret;
>>>>
>>>>    	/*
>>>
>>> ditto.
>>>
>>> Where is simple  ?
>>>
>>> Thanks,
>>> -Kame
>>>
>>
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to majordomo@kvack.org.  For more info on Linux MM,
>> see: http://www.linux-mm.org/ .
>> Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
>> Don't email:<a href=mailto:"dont@kvack.org">  email@kvack.org</a>
>>
>


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 3/7] mm: rework __isolate_lru_page() file/anon filter
  2012-02-29  9:15 ` [PATCH 3/7] mm: rework __isolate_lru_page() file/anon filter Konstantin Khlebnikov
  2012-03-02  5:17   ` KAMEZAWA Hiroyuki
@ 2012-03-03  0:22   ` Hugh Dickins
  2012-03-03  8:27     ` Konstantin Khlebnikov
  2012-03-03  9:16   ` [PATCH 3/7 v2] " Konstantin Khlebnikov
  2 siblings, 1 reply; 41+ messages in thread
From: Hugh Dickins @ 2012-03-03  0:22 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Andrew Morton, Johannes Weiner, KAMEZAWA Hiroyuki, linux-mm,
	linux-kernel

On Wed, 29 Feb 2012, Konstantin Khlebnikov wrote:

> This patch adds file/anon filter bits into isolate_mode_t,
> this allows to simplify checks in __isolate_lru_page().
> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>

Almost-Acked-by: Hugh Dickins <hughd@google.com>

with one whitespace nit, and one functional addition requested.

I'm perfectly happy with your :?s myself, but some people do dislike
them.  I'm happy with the switch alternative if it's as efficient:
something that surprised me very much when trying to get convincing
performance numbers for per-memcg per-zone lru_lock at home...

... __isolate_lru_page() featured astonishly high on the perf report
of streaming from files on ext4 on /dev/ram0 to /dev/null, coming
immediately below the obvious zeroing and copying: okay, the zeroing
and copying were around 30% each, and __isolate_lru_page() down around
2% or below, but even so it seemed very odd that it should feature so
high, and any optimizations to it very welcome - unless it was purely
some bogus result.

> ---
>  include/linux/mmzone.h |    4 ++++
>  include/linux/swap.h   |    2 +-
>  mm/compaction.c        |    5 +++--
>  mm/vmscan.c            |   27 +++++++++++++--------------
>  4 files changed, 21 insertions(+), 17 deletions(-)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index eff4918..2fed935 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -193,6 +193,10 @@ struct lruvec {
>  #define ISOLATE_UNMAPPED	((__force isolate_mode_t)0x8)
>  /* Isolate for asynchronous migration */
>  #define ISOLATE_ASYNC_MIGRATE	((__force isolate_mode_t)0x10)
> +/* Isolate swap-backed pages */
> +#define	ISOLATE_ANON		((__force isolate_mode_t)0x20)
> +/* Isolate file-backed pages */
> +#define	ISOLATE_FILE		((__force isolate_mode_t)0x40)

>From the patch you can see that the #defines above yours used a
space where you have used a tab: better to use a space as above.

> @@ -375,7 +376,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
>  			mode |= ISOLATE_ASYNC_MIGRATE;
>  
>  		/* Try isolate the page */
> -		if (__isolate_lru_page(page, mode, 0) != 0)
> +		if (__isolate_lru_page(page, mode) != 0)
>  			continue;

I thought you were missing something there, but no, that's rather
the case you are simplifying.  However...

> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index af6cfe7..1b70338 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1520,6 +1511,10 @@ shrink_inactive_list(unsigned long nr_to_scan, struct mem_cgroup_zone *mz,
>  		isolate_mode |= ISOLATE_UNMAPPED;
>  	if (!sc->may_writepage)
>  		isolate_mode |= ISOLATE_CLEAN;
> +	if (file)
> +		isolate_mode |= ISOLATE_FILE;
> +	else
> +		isolate_mode |= ISOLATE_ANON;

Above here, under "if (sc->reclaim_mode & RECLAIM_MODE_LUMPYRECLAIM)",
don't you need

		isolate_mode |= ISOLATE_ACTIVE | ISOLATE_FILE | ISOLATE_ANON;

now to reproduce the same "all_lru_mode" behaviour as before?

Hugh

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 4/7] mm: push lru index into shrink_[in]active_list()
  2012-02-29  9:15 ` [PATCH 4/7] mm: push lru index into shrink_[in]active_list() Konstantin Khlebnikov
  2012-03-02  5:21   ` KAMEZAWA Hiroyuki
@ 2012-03-03  0:24   ` Hugh Dickins
  1 sibling, 0 replies; 41+ messages in thread
From: Hugh Dickins @ 2012-03-03  0:24 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Andrew Morton, Johannes Weiner, KAMEZAWA Hiroyuki, linux-mm,
	linux-kernel

On Wed, 29 Feb 2012, Konstantin Khlebnikov wrote:

> Let's toss lru index through call stack to isolate_lru_pages(),
> this is better than its reconstructing from individual bits.
> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>

Yes, this is an improvement, thanks:

Acked-by: Hugh Dickins <hughd@google.com>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 3/7] mm: rework __isolate_lru_page() file/anon filter
  2012-03-03  0:22   ` Hugh Dickins
@ 2012-03-03  8:27     ` Konstantin Khlebnikov
  2012-03-03  9:20       ` Konstantin Khlebnikov
  0 siblings, 1 reply; 41+ messages in thread
From: Konstantin Khlebnikov @ 2012-03-03  8:27 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Johannes Weiner, KAMEZAWA Hiroyuki, linux-mm,
	linux-kernel

Hugh Dickins wrote:
> On Wed, 29 Feb 2012, Konstantin Khlebnikov wrote:
>
>> This patch adds file/anon filter bits into isolate_mode_t,
>> this allows to simplify checks in __isolate_lru_page().
>>
>> Signed-off-by: Konstantin Khlebnikov<khlebnikov@openvz.org>
>
> Almost-Acked-by: Hugh Dickins<hughd@google.com>
>
> with one whitespace nit, and one functional addition requested.
>
> I'm perfectly happy with your :?s myself, but some people do dislike
> them.  I'm happy with the switch alternative if it's as efficient:
> something that surprised me very much when trying to get convincing
> performance numbers for per-memcg per-zone lru_lock at home...
>
> ... __isolate_lru_page() featured astonishly high on the perf report
> of streaming from files on ext4 on /dev/ram0 to /dev/null, coming
> immediately below the obvious zeroing and copying: okay, the zeroing
> and copying were around 30% each, and __isolate_lru_page() down around
> 2% or below, but even so it seemed very odd that it should feature so
> high, and any optimizations to it very welcome - unless it was purely
> some bogus result.

Actually ANON/FILE ACTIVE/INACTIVE checks does not required at non-lumpy reclaim
(all pages are picked from right lru list) and compaction (it does not care).
But seems like removing these two bit-checks cannot give noticeable performance gain.

This patch can be postponed. It does not so important and
it does not share context with other patches in this set.

>
>> ---
>>   include/linux/mmzone.h |    4 ++++
>>   include/linux/swap.h   |    2 +-
>>   mm/compaction.c        |    5 +++--
>>   mm/vmscan.c            |   27 +++++++++++++--------------
>>   4 files changed, 21 insertions(+), 17 deletions(-)
>>
>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>> index eff4918..2fed935 100644
>> --- a/include/linux/mmzone.h
>> +++ b/include/linux/mmzone.h
>> @@ -193,6 +193,10 @@ struct lruvec {
>>   #define ISOLATE_UNMAPPED	((__force isolate_mode_t)0x8)
>>   /* Isolate for asynchronous migration */
>>   #define ISOLATE_ASYNC_MIGRATE	((__force isolate_mode_t)0x10)
>> +/* Isolate swap-backed pages */
>> +#define	ISOLATE_ANON		((__force isolate_mode_t)0x20)
>> +/* Isolate file-backed pages */
>> +#define	ISOLATE_FILE		((__force isolate_mode_t)0x40)
>
>  From the patch you can see that the #defines above yours used a
> space where you have used a tab: better to use a space as above.
>
>> @@ -375,7 +376,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
>>   			mode |= ISOLATE_ASYNC_MIGRATE;
>>
>>   		/* Try isolate the page */
>> -		if (__isolate_lru_page(page, mode, 0) != 0)
>> +		if (__isolate_lru_page(page, mode) != 0)
>>   			continue;
>
> I thought you were missing something there, but no, that's rather
> the case you are simplifying.  However...
>
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index af6cfe7..1b70338 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -1520,6 +1511,10 @@ shrink_inactive_list(unsigned long nr_to_scan, struct mem_cgroup_zone *mz,
>>   		isolate_mode |= ISOLATE_UNMAPPED;
>>   	if (!sc->may_writepage)
>>   		isolate_mode |= ISOLATE_CLEAN;
>> +	if (file)
>> +		isolate_mode |= ISOLATE_FILE;
>> +	else
>> +		isolate_mode |= ISOLATE_ANON;
>
> Above here, under "if (sc->reclaim_mode&  RECLAIM_MODE_LUMPYRECLAIM)",
> don't you need
>
> 		isolate_mode |= ISOLATE_ACTIVE | ISOLATE_FILE | ISOLATE_ANON;
>
> now to reproduce the same "all_lru_mode" behaviour as before?

Yes, I missed this. Thanks.

>
> Hugh


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 3/7 v2] mm: rework __isolate_lru_page() file/anon filter
  2012-02-29  9:15 ` [PATCH 3/7] mm: rework __isolate_lru_page() file/anon filter Konstantin Khlebnikov
  2012-03-02  5:17   ` KAMEZAWA Hiroyuki
  2012-03-03  0:22   ` Hugh Dickins
@ 2012-03-03  9:16   ` Konstantin Khlebnikov
  2012-03-05  0:27     ` KAMEZAWA Hiroyuki
  2012-03-07  3:22     ` Hugh Dickins
  2 siblings, 2 replies; 41+ messages in thread
From: Konstantin Khlebnikov @ 2012-03-03  9:16 UTC (permalink / raw)
  To: Andrew Morton, Hugh Dickins, Johannes Weiner, KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel

This patch adds file/anon filter bits into isolate_mode_t,
this allows to simplify checks in __isolate_lru_page().

v2:
* use switch () instead of if ()
* fixed lumpy-reclaim isolation mode

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
---
 include/linux/mmzone.h |    4 ++++
 include/linux/swap.h   |    2 +-
 mm/compaction.c        |    5 +++--
 mm/vmscan.c            |   49 +++++++++++++++++++++++++++++++-----------------
 4 files changed, 40 insertions(+), 20 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 5f1e4ee..e60dcbd 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -192,6 +192,10 @@ struct lruvec {
 #define ISOLATE_UNMAPPED	((__force isolate_mode_t)0x8)
 /* Isolate for asynchronous migration */
 #define ISOLATE_ASYNC_MIGRATE	((__force isolate_mode_t)0x10)
+/* Isolate swap-backed pages */
+#define ISOLATE_ANON		((__force isolate_mode_t)0x20)
+/* Isolate file-backed pages */
+#define ISOLATE_FILE		((__force isolate_mode_t)0x40)
 
 /* LRU Isolation modes. */
 typedef unsigned __bitwise__ isolate_mode_t;
diff --git a/include/linux/swap.h b/include/linux/swap.h
index ba2c8d7..dc6e6a3 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -254,7 +254,7 @@ static inline void lru_cache_add_file(struct page *page)
 /* linux/mm/vmscan.c */
 extern unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
 					gfp_t gfp_mask, nodemask_t *mask);
-extern int __isolate_lru_page(struct page *page, isolate_mode_t mode, int file);
+extern int __isolate_lru_page(struct page *page, isolate_mode_t mode);
 extern unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem,
 						  gfp_t gfp_mask, bool noswap);
 extern unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
diff --git a/mm/compaction.c b/mm/compaction.c
index 74a8c82..cc054f7 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -261,7 +261,8 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
 	unsigned long last_pageblock_nr = 0, pageblock_nr;
 	unsigned long nr_scanned = 0, nr_isolated = 0;
 	struct list_head *migratelist = &cc->migratepages;
-	isolate_mode_t mode = ISOLATE_ACTIVE|ISOLATE_INACTIVE;
+	isolate_mode_t mode = ISOLATE_ACTIVE | ISOLATE_INACTIVE |
+			      ISOLATE_FILE | ISOLATE_ANON;
 
 	/* Do not scan outside zone boundaries */
 	low_pfn = max(cc->migrate_pfn, zone->zone_start_pfn);
@@ -375,7 +376,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
 			mode |= ISOLATE_ASYNC_MIGRATE;
 
 		/* Try isolate the page */
-		if (__isolate_lru_page(page, mode, 0) != 0)
+		if (__isolate_lru_page(page, mode) != 0)
 			continue;
 
 		VM_BUG_ON(PageTransCompound(page));
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 7de3acc..cce1e14 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1029,28 +1029,35 @@ keep_lumpy:
  *
  * returns 0 on success, -ve errno on failure.
  */
-int __isolate_lru_page(struct page *page, isolate_mode_t mode, int file)
+int __isolate_lru_page(struct page *page, isolate_mode_t mode)
 {
-	bool all_lru_mode;
 	int ret = -EINVAL;
 
 	/* Only take pages on the LRU. */
 	if (!PageLRU(page))
 		return ret;
 
-	all_lru_mode = (mode & (ISOLATE_ACTIVE|ISOLATE_INACTIVE)) ==
-		(ISOLATE_ACTIVE|ISOLATE_INACTIVE);
-
-	/*
-	 * When checking the active state, we need to be sure we are
-	 * dealing with comparible boolean values.  Take the logical not
-	 * of each.
-	 */
-	if (!all_lru_mode && !PageActive(page) != !(mode & ISOLATE_ACTIVE))
-		return ret;
+	switch (mode & (ISOLATE_ACTIVE | ISOLATE_INACTIVE)) {
+		case ISOLATE_ACTIVE:
+			if (!PageActive(page))
+				return ret;
+			break;
+		case ISOLATE_INACTIVE:
+			if (PageActive(page))
+				return ret;
+			break;
+	}
 
-	if (!all_lru_mode && !!page_is_file_cache(page) != file)
-		return ret;
+	switch (mode & (ISOLATE_FILE | ISOLATE_ANON)) {
+		case ISOLATE_FILE:
+			if (!page_is_file_cache(page))
+				return ret;
+			break;
+		case ISOLATE_ANON:
+			if (page_is_file_cache(page))
+				return ret;
+			break;
+	}
 
 	/*
 	 * When this function is being called for lumpy reclaim, we
@@ -1160,7 +1167,7 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
 
 		VM_BUG_ON(!PageLRU(page));
 
-		switch (__isolate_lru_page(page, mode, file)) {
+		switch (__isolate_lru_page(page, mode)) {
 		case 0:
 			mem_cgroup_lru_del(page);
 			list_move(&page->lru, dst);
@@ -1218,7 +1225,7 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
 			    !PageSwapCache(cursor_page))
 				break;
 
-			if (__isolate_lru_page(cursor_page, mode, file) == 0) {
+			if (__isolate_lru_page(cursor_page, mode) == 0) {
 				unsigned int isolated_pages;
 
 				mem_cgroup_lru_del(cursor_page);
@@ -1503,7 +1510,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct mem_cgroup_zone *mz,
 
 	set_reclaim_mode(priority, sc, false);
 	if (sc->reclaim_mode & RECLAIM_MODE_LUMPYRECLAIM)
-		isolate_mode |= ISOLATE_ACTIVE;
+		isolate_mode |= ISOLATE_ACTIVE | ISOLATE_FILE | ISOLATE_ANON;
 
 	lru_add_drain();
 
@@ -1511,6 +1518,10 @@ shrink_inactive_list(unsigned long nr_to_scan, struct mem_cgroup_zone *mz,
 		isolate_mode |= ISOLATE_UNMAPPED;
 	if (!sc->may_writepage)
 		isolate_mode |= ISOLATE_CLEAN;
+	if (file)
+		isolate_mode |= ISOLATE_FILE;
+	else
+		isolate_mode |= ISOLATE_ANON;
 
 	spin_lock_irq(&zone->lru_lock);
 
@@ -1677,6 +1688,10 @@ static void shrink_active_list(unsigned long nr_to_scan,
 		isolate_mode |= ISOLATE_UNMAPPED;
 	if (!sc->may_writepage)
 		isolate_mode |= ISOLATE_CLEAN;
+	if (file)
+		isolate_mode |= ISOLATE_FILE;
+	else
+		isolate_mode |= ISOLATE_ANON;
 
 	spin_lock_irq(&zone->lru_lock);
 


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH 3/7] mm: rework __isolate_lru_page() file/anon filter
  2012-03-03  8:27     ` Konstantin Khlebnikov
@ 2012-03-03  9:20       ` Konstantin Khlebnikov
  0 siblings, 0 replies; 41+ messages in thread
From: Konstantin Khlebnikov @ 2012-03-03  9:20 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Johannes Weiner, KAMEZAWA Hiroyuki, linux-mm,
	linux-kernel

Konstantin Khlebnikov wrote:
> Hugh Dickins wrote:
>> On Wed, 29 Feb 2012, Konstantin Khlebnikov wrote:
>>
>>> This patch adds file/anon filter bits into isolate_mode_t,
>>> this allows to simplify checks in __isolate_lru_page().
>>>
>>> Signed-off-by: Konstantin Khlebnikov<khlebnikov@openvz.org>
>>
>> Almost-Acked-by: Hugh Dickins<hughd@google.com>
>>
>> with one whitespace nit, and one functional addition requested.
>>
>> I'm perfectly happy with your :?s myself, but some people do dislike
>> them.  I'm happy with the switch alternative if it's as efficient:
>> something that surprised me very much when trying to get convincing
>> performance numbers for per-memcg per-zone lru_lock at home...
>>
>> ... __isolate_lru_page() featured astonishly high on the perf report
>> of streaming from files on ext4 on /dev/ram0 to /dev/null, coming
>> immediately below the obvious zeroing and copying: okay, the zeroing
>> and copying were around 30% each, and __isolate_lru_page() down around
>> 2% or below, but even so it seemed very odd that it should feature so
>> high, and any optimizations to it very welcome - unless it was purely
>> some bogus result.
>
> Actually ANON/FILE ACTIVE/INACTIVE checks does not required at non-lumpy reclaim
> (all pages are picked from right lru list) and compaction (it does not care).
> But seems like removing these two bit-checks cannot give noticeable performance gain.
>
> This patch can be postponed. It does not so important and
> it does not share context with other patches in this set.

Oops, no it cannot be dropped. Next patch "mm: push lru index into shrink_[in]active_list()"
kills "file" variable in isolate_lru_pages(). I sent v2 for this patch only.

>
>>
>>> ---
>>>    include/linux/mmzone.h |    4 ++++
>>>    include/linux/swap.h   |    2 +-
>>>    mm/compaction.c        |    5 +++--
>>>    mm/vmscan.c            |   27 +++++++++++++--------------
>>>    4 files changed, 21 insertions(+), 17 deletions(-)
>>>
>>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>>> index eff4918..2fed935 100644
>>> --- a/include/linux/mmzone.h
>>> +++ b/include/linux/mmzone.h
>>> @@ -193,6 +193,10 @@ struct lruvec {
>>>    #define ISOLATE_UNMAPPED	((__force isolate_mode_t)0x8)
>>>    /* Isolate for asynchronous migration */
>>>    #define ISOLATE_ASYNC_MIGRATE	((__force isolate_mode_t)0x10)
>>> +/* Isolate swap-backed pages */
>>> +#define	ISOLATE_ANON		((__force isolate_mode_t)0x20)
>>> +/* Isolate file-backed pages */
>>> +#define	ISOLATE_FILE		((__force isolate_mode_t)0x40)
>>
>>    From the patch you can see that the #defines above yours used a
>> space where you have used a tab: better to use a space as above.
>>
>>> @@ -375,7 +376,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
>>>    			mode |= ISOLATE_ASYNC_MIGRATE;
>>>
>>>    		/* Try isolate the page */
>>> -		if (__isolate_lru_page(page, mode, 0) != 0)
>>> +		if (__isolate_lru_page(page, mode) != 0)
>>>    			continue;
>>
>> I thought you were missing something there, but no, that's rather
>> the case you are simplifying.  However...
>>
>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>> index af6cfe7..1b70338 100644
>>> --- a/mm/vmscan.c
>>> +++ b/mm/vmscan.c
>>> @@ -1520,6 +1511,10 @@ shrink_inactive_list(unsigned long nr_to_scan, struct mem_cgroup_zone *mz,
>>>    		isolate_mode |= ISOLATE_UNMAPPED;
>>>    	if (!sc->may_writepage)
>>>    		isolate_mode |= ISOLATE_CLEAN;
>>> +	if (file)
>>> +		isolate_mode |= ISOLATE_FILE;
>>> +	else
>>> +		isolate_mode |= ISOLATE_ANON;
>>
>> Above here, under "if (sc->reclaim_mode&   RECLAIM_MODE_LUMPYRECLAIM)",
>> don't you need
>>
>> 		isolate_mode |= ISOLATE_ACTIVE | ISOLATE_FILE | ISOLATE_ANON;
>>
>> now to reproduce the same "all_lru_mode" behaviour as before?
>
> Yes, I missed this. Thanks.
>
>>
>> Hugh
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
> Don't email:<a href=mailto:"dont@kvack.org">  email@kvack.org</a>


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 3/7 v2] mm: rework __isolate_lru_page() file/anon filter
  2012-03-03  9:16   ` [PATCH 3/7 v2] " Konstantin Khlebnikov
@ 2012-03-05  0:27     ` KAMEZAWA Hiroyuki
  2012-03-07  3:22     ` Hugh Dickins
  1 sibling, 0 replies; 41+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-03-05  0:27 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Andrew Morton, Hugh Dickins, Johannes Weiner, linux-mm, linux-kernel

On Sat, 03 Mar 2012 13:16:48 +0400
Konstantin Khlebnikov <khlebnikov@openvz.org> wrote:

> This patch adds file/anon filter bits into isolate_mode_t,
> this allows to simplify checks in __isolate_lru_page().
> 
> v2:
> * use switch () instead of if ()
> * fixed lumpy-reclaim isolation mode
> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>

seems simple.

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/7] mm/memcg: scanning_global_lru means mem_cgroup_disabled
  2012-03-02  5:12   ` KAMEZAWA Hiroyuki
@ 2012-03-06 11:46     ` Glauber Costa
  0 siblings, 0 replies; 41+ messages in thread
From: Glauber Costa @ 2012-03-06 11:46 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Konstantin Khlebnikov, Andrew Morton, Hugh Dickins,
	Johannes Weiner, linux-mm, linux-kernel

On 03/02/2012 09:12 AM, KAMEZAWA Hiroyuki wrote:
> On Wed, 29 Feb 2012 13:15:39 +0400
> Konstantin Khlebnikov<khlebnikov@openvz.org>  wrote:
>
>> From: Hugh Dickins<hughd@google.com>
>>
>> Although one has to admire the skill with which it has been concealed,
>> scanning_global_lru(mz) is actually just an interesting way to test
>> mem_cgroup_disabled().  Too many developer hours have been wasted on
>> confusing it with global_reclaim(): just use mem_cgroup_disabled().
>>
>> Signed-off-by: Hugh Dickins<hughd@google.com>
>> Signed-off-by: Konstantin Khlebnikov<khlebnikov@openvz.org>
>
> Acked-by: KAMEZWA Hiroyuki<kamezawa.hiroyu@jp.fujitu.com>
>
>
Acked-by: Glauber Costa <glommer@parallels.com>


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 3/7] mm: rework __isolate_lru_page() file/anon filter
  2012-03-02  8:17       ` KAMEZAWA Hiroyuki
  2012-03-02  8:53         ` Konstantin Khlebnikov
@ 2012-03-06 11:57         ` Glauber Costa
  2012-03-06 12:53           ` Konstantin Khlebnikov
  1 sibling, 1 reply; 41+ messages in thread
From: Glauber Costa @ 2012-03-06 11:57 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Konstantin Khlebnikov, Andrew Morton, Hugh Dickins,
	Johannes Weiner, linux-mm, linux-kernel

On 03/02/2012 12:17 PM, KAMEZAWA Hiroyuki wrote:
> On Fri, 02 Mar 2012 09:51:27 +0400
> Konstantin Khlebnikov<khlebnikov@openvz.org>  wrote:
>
>> KAMEZAWA Hiroyuki wrote:
>>> On Wed, 29 Feb 2012 13:15:47 +0400
>>> Konstantin Khlebnikov<khlebnikov@openvz.org>   wrote:
>>>
>>>> This patch adds file/anon filter bits into isolate_mode_t,
>>>> this allows to simplify checks in __isolate_lru_page().
>>>>
>>>> Signed-off-by: Konstantin Khlebnikov<khlebnikov@openvz.org>
>>>
>>> Hmm.. I like idea but..
>>>
>>>> ---
>>>>    include/linux/mmzone.h |    4 ++++
>>>>    include/linux/swap.h   |    2 +-
>>>>    mm/compaction.c        |    5 +++--
>>>>    mm/vmscan.c            |   27 +++++++++++++--------------
>>>>    4 files changed, 21 insertions(+), 17 deletions(-)
>>>>
>>>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>>>> index eff4918..2fed935 100644
>>>> --- a/include/linux/mmzone.h
>>>> +++ b/include/linux/mmzone.h
>>>> @@ -193,6 +193,10 @@ struct lruvec {
>>>>    #define ISOLATE_UNMAPPED	((__force isolate_mode_t)0x8)
>>>>    /* Isolate for asynchronous migration */
>>>>    #define ISOLATE_ASYNC_MIGRATE	((__force isolate_mode_t)0x10)
>>>> +/* Isolate swap-backed pages */
>>>> +#define	ISOLATE_ANON		((__force isolate_mode_t)0x20)
>>>> +/* Isolate file-backed pages */
>>>> +#define	ISOLATE_FILE		((__force isolate_mode_t)0x40)
>>>>
>>>>    /* LRU Isolation modes. */
>>>>    typedef unsigned __bitwise__ isolate_mode_t;
>>>> diff --git a/include/linux/swap.h b/include/linux/swap.h
>>>> index ba2c8d7..dc6e6a3 100644
>>>> --- a/include/linux/swap.h
>>>> +++ b/include/linux/swap.h
>>>> @@ -254,7 +254,7 @@ static inline void lru_cache_add_file(struct page *page)
>>>>    /* linux/mm/vmscan.c */
>>>>    extern unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
>>>>    					gfp_t gfp_mask, nodemask_t *mask);
>>>> -extern int __isolate_lru_page(struct page *page, isolate_mode_t mode, int file);
>>>> +extern int __isolate_lru_page(struct page *page, isolate_mode_t mode);
>>>>    extern unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem,
>>>>    						  gfp_t gfp_mask, bool noswap);
>>>>    extern unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
>>>> diff --git a/mm/compaction.c b/mm/compaction.c
>>>> index 74a8c82..cc054f7 100644
>>>> --- a/mm/compaction.c
>>>> +++ b/mm/compaction.c
>>>> @@ -261,7 +261,8 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
>>>>    	unsigned long last_pageblock_nr = 0, pageblock_nr;
>>>>    	unsigned long nr_scanned = 0, nr_isolated = 0;
>>>>    	struct list_head *migratelist =&cc->migratepages;
>>>> -	isolate_mode_t mode = ISOLATE_ACTIVE|ISOLATE_INACTIVE;
>>>> +	isolate_mode_t mode = ISOLATE_ACTIVE | ISOLATE_INACTIVE |
>>>> +			      ISOLATE_FILE | ISOLATE_ANON;
>>>>
>>>>    	/* Do not scan outside zone boundaries */
>>>>    	low_pfn = max(cc->migrate_pfn, zone->zone_start_pfn);
>>>> @@ -375,7 +376,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
>>>>    			mode |= ISOLATE_ASYNC_MIGRATE;
>>>>
>>>>    		/* Try isolate the page */
>>>> -		if (__isolate_lru_page(page, mode, 0) != 0)
>>>> +		if (__isolate_lru_page(page, mode) != 0)
>>>>    			continue;
>>>>
>>>>    		VM_BUG_ON(PageTransCompound(page));
>>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>>> index af6cfe7..1b70338 100644
>>>> --- a/mm/vmscan.c
>>>> +++ b/mm/vmscan.c
>>>> @@ -1029,27 +1029,18 @@ keep_lumpy:
>>>>     *
>>>>     * returns 0 on success, -ve errno on failure.
>>>>     */
>>>> -int __isolate_lru_page(struct page *page, isolate_mode_t mode, int file)
>>>> +int __isolate_lru_page(struct page *page, isolate_mode_t mode)
>>>>    {
>>>> -	bool all_lru_mode;
>>>>    	int ret = -EINVAL;
>>>>
>>>>    	/* Only take pages on the LRU. */
>>>>    	if (!PageLRU(page))
>>>>    		return ret;
>>>>
>>>> -	all_lru_mode = (mode&   (ISOLATE_ACTIVE|ISOLATE_INACTIVE)) ==
>>>> -		(ISOLATE_ACTIVE|ISOLATE_INACTIVE);
>>>> -
>>>> -	/*
>>>> -	 * When checking the active state, we need to be sure we are
>>>> -	 * dealing with comparible boolean values.  Take the logical not
>>>> -	 * of each.
>>>> -	 */
>>>> -	if (!all_lru_mode&&   !PageActive(page) != !(mode&   ISOLATE_ACTIVE))
>>>> +	if (!(mode&   (PageActive(page) ? ISOLATE_ACTIVE : ISOLATE_INACTIVE)))
>>>>    		return ret;
>>>
>>> Isn't this complicated ?
>>
>> But it doesn't blows my mind as old code does =)
>>
>> Maybe someone can propose more clear variant?
>>
>
> switch (mode&  (ISOLATE_ACTIVE | ISOLATE_INACTIVE)) {
> 	case ISOLATE_ACTIVE :
> 		if (!PageActive(page))
> 			return ret;
> 	case ISOLATE_INACTIVE :
> 		if (PageActive(page))
> 			return ret;
> 	default:
> 		break;
> 	}
> }
>
> ?
>
> Thanks,
> -Kame
>

The switch gets a little bit too big (vertical-wise). Maybe just 
splitting it into two lines is enough to clarify its purpose.
How about:

int tmp_var = PageActive(page) ? ISOLATE_ACTIVE : ISOLATE_INACTIVE
if (!(mode & tmp_var))
    ret;

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 3/7] mm: rework __isolate_lru_page() file/anon filter
  2012-03-06 11:57         ` Glauber Costa
@ 2012-03-06 12:53           ` Konstantin Khlebnikov
  0 siblings, 0 replies; 41+ messages in thread
From: Konstantin Khlebnikov @ 2012-03-06 12:53 UTC (permalink / raw)
  To: Glauber Costa
  Cc: KAMEZAWA Hiroyuki, Andrew Morton, Hugh Dickins, Johannes Weiner,
	linux-mm, linux-kernel

Glauber Costa wrote:
> On 03/02/2012 12:17 PM, KAMEZAWA Hiroyuki wrote:
>> On Fri, 02 Mar 2012 09:51:27 +0400
>> Konstantin Khlebnikov<khlebnikov@openvz.org>   wrote:
>>
>>> KAMEZAWA Hiroyuki wrote:
>>>> On Wed, 29 Feb 2012 13:15:47 +0400
>>>> Konstantin Khlebnikov<khlebnikov@openvz.org>    wrote:
>>>>
>>>>> This patch adds file/anon filter bits into isolate_mode_t,
>>>>> this allows to simplify checks in __isolate_lru_page().
>>>>>
>>>>> Signed-off-by: Konstantin Khlebnikov<khlebnikov@openvz.org>
>>>>
>>>> Hmm.. I like idea but..
>>>>
>>>>> ---
>>>>>     include/linux/mmzone.h |    4 ++++
>>>>>     include/linux/swap.h   |    2 +-
>>>>>     mm/compaction.c        |    5 +++--
>>>>>     mm/vmscan.c            |   27 +++++++++++++--------------
>>>>>     4 files changed, 21 insertions(+), 17 deletions(-)
>>>>>
>>>>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>>>>> index eff4918..2fed935 100644
>>>>> --- a/include/linux/mmzone.h
>>>>> +++ b/include/linux/mmzone.h
>>>>> @@ -193,6 +193,10 @@ struct lruvec {
>>>>>     #define ISOLATE_UNMAPPED	((__force isolate_mode_t)0x8)
>>>>>     /* Isolate for asynchronous migration */
>>>>>     #define ISOLATE_ASYNC_MIGRATE	((__force isolate_mode_t)0x10)
>>>>> +/* Isolate swap-backed pages */
>>>>> +#define	ISOLATE_ANON		((__force isolate_mode_t)0x20)
>>>>> +/* Isolate file-backed pages */
>>>>> +#define	ISOLATE_FILE		((__force isolate_mode_t)0x40)
>>>>>
>>>>>     /* LRU Isolation modes. */
>>>>>     typedef unsigned __bitwise__ isolate_mode_t;
>>>>> diff --git a/include/linux/swap.h b/include/linux/swap.h
>>>>> index ba2c8d7..dc6e6a3 100644
>>>>> --- a/include/linux/swap.h
>>>>> +++ b/include/linux/swap.h
>>>>> @@ -254,7 +254,7 @@ static inline void lru_cache_add_file(struct page *page)
>>>>>     /* linux/mm/vmscan.c */
>>>>>     extern unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
>>>>>     					gfp_t gfp_mask, nodemask_t *mask);
>>>>> -extern int __isolate_lru_page(struct page *page, isolate_mode_t mode, int file);
>>>>> +extern int __isolate_lru_page(struct page *page, isolate_mode_t mode);
>>>>>     extern unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem,
>>>>>     						  gfp_t gfp_mask, bool noswap);
>>>>>     extern unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
>>>>> diff --git a/mm/compaction.c b/mm/compaction.c
>>>>> index 74a8c82..cc054f7 100644
>>>>> --- a/mm/compaction.c
>>>>> +++ b/mm/compaction.c
>>>>> @@ -261,7 +261,8 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
>>>>>     	unsigned long last_pageblock_nr = 0, pageblock_nr;
>>>>>     	unsigned long nr_scanned = 0, nr_isolated = 0;
>>>>>     	struct list_head *migratelist =&cc->migratepages;
>>>>> -	isolate_mode_t mode = ISOLATE_ACTIVE|ISOLATE_INACTIVE;
>>>>> +	isolate_mode_t mode = ISOLATE_ACTIVE | ISOLATE_INACTIVE |
>>>>> +			      ISOLATE_FILE | ISOLATE_ANON;
>>>>>
>>>>>     	/* Do not scan outside zone boundaries */
>>>>>     	low_pfn = max(cc->migrate_pfn, zone->zone_start_pfn);
>>>>> @@ -375,7 +376,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
>>>>>     			mode |= ISOLATE_ASYNC_MIGRATE;
>>>>>
>>>>>     		/* Try isolate the page */
>>>>> -		if (__isolate_lru_page(page, mode, 0) != 0)
>>>>> +		if (__isolate_lru_page(page, mode) != 0)
>>>>>     			continue;
>>>>>
>>>>>     		VM_BUG_ON(PageTransCompound(page));
>>>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>>>> index af6cfe7..1b70338 100644
>>>>> --- a/mm/vmscan.c
>>>>> +++ b/mm/vmscan.c
>>>>> @@ -1029,27 +1029,18 @@ keep_lumpy:
>>>>>      *
>>>>>      * returns 0 on success, -ve errno on failure.
>>>>>      */
>>>>> -int __isolate_lru_page(struct page *page, isolate_mode_t mode, int file)
>>>>> +int __isolate_lru_page(struct page *page, isolate_mode_t mode)
>>>>>     {
>>>>> -	bool all_lru_mode;
>>>>>     	int ret = -EINVAL;
>>>>>
>>>>>     	/* Only take pages on the LRU. */
>>>>>     	if (!PageLRU(page))
>>>>>     		return ret;
>>>>>
>>>>> -	all_lru_mode = (mode&    (ISOLATE_ACTIVE|ISOLATE_INACTIVE)) ==
>>>>> -		(ISOLATE_ACTIVE|ISOLATE_INACTIVE);
>>>>> -
>>>>> -	/*
>>>>> -	 * When checking the active state, we need to be sure we are
>>>>> -	 * dealing with comparible boolean values.  Take the logical not
>>>>> -	 * of each.
>>>>> -	 */
>>>>> -	if (!all_lru_mode&&    !PageActive(page) != !(mode&    ISOLATE_ACTIVE))
>>>>> +	if (!(mode&    (PageActive(page) ? ISOLATE_ACTIVE : ISOLATE_INACTIVE)))
>>>>>     		return ret;
>>>>
>>>> Isn't this complicated ?
>>>
>>> But it doesn't blows my mind as old code does =)
>>>
>>> Maybe someone can propose more clear variant?
>>>
>>
>> switch (mode&   (ISOLATE_ACTIVE | ISOLATE_INACTIVE)) {
>> 	case ISOLATE_ACTIVE :
>> 		if (!PageActive(page))
>> 			return ret;
>> 	case ISOLATE_INACTIVE :
>> 		if (PageActive(page))
>> 			return ret;
>> 	default:
>> 		break;
>> 	}
>> }
>>
>> ?
>>
>> Thanks,
>> -Kame
>>
>
> The switch gets a little bit too big (vertical-wise). Maybe just
> splitting it into two lines is enough to clarify its purpose.
> How about:
>
> int tmp_var = PageActive(page) ? ISOLATE_ACTIVE : ISOLATE_INACTIVE
> if (!(mode&  tmp_var))
>      ret;

Code lines are cheap, if code is clear.
I already sent [PATCH 3/7 v2] in reply to this patch.

>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
> Don't email:<a href=mailto:"dont@kvack.org">  email@kvack.org</a>


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 3/7 v2] mm: rework __isolate_lru_page() file/anon filter
  2012-03-03  9:16   ` [PATCH 3/7 v2] " Konstantin Khlebnikov
  2012-03-05  0:27     ` KAMEZAWA Hiroyuki
@ 2012-03-07  3:22     ` Hugh Dickins
  2012-03-08  5:30       ` KAMEZAWA Hiroyuki
  1 sibling, 1 reply; 41+ messages in thread
From: Hugh Dickins @ 2012-03-07  3:22 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Andrew Morton, Johannes Weiner, KAMEZAWA Hiroyuki, linux-mm,
	linux-kernel

On Sat, 3 Mar 2012, Konstantin Khlebnikov wrote:

> This patch adds file/anon filter bits into isolate_mode_t,
> this allows to simplify checks in __isolate_lru_page().
> 
> v2:
> * use switch () instead of if ()
> * fixed lumpy-reclaim isolation mode
> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>

I'm sorry to be messing you around on this one, Konstantin, but...

(a) if you do go with the switch statements,
in kernel we align the "case"s underneath the "switch"

but

(b) I seem to be at odds with Kamezawa-san, I much preferred your
original, which did in 2 lines what the switches do in 10 lines.
And I'd say there's more opportunity for error in 10 lines than 2.

What does the compiler say (4.5.1 here, OPTIMIZE_FOR_SIZE off)?
   text	   data	    bss	    dec	    hex	filename
  17723	    113	     17	  17853	   45bd	vmscan.o.0
  17671	    113	     17	  17801	   4589	vmscan.o.1
  17803	    113	     17	  17933	   460d	vmscan.o.2

That suggests that your v2 is the worst and your v1 the best.
Kame, can I persuade you to let the compiler decide on this?

If people have to think for a moment about the concise expression,
well, great, I don't mind people having to think about kernel code.

Hugh

> ---
>  include/linux/mmzone.h |    4 ++++
>  include/linux/swap.h   |    2 +-
>  mm/compaction.c        |    5 +++--
>  mm/vmscan.c            |   49 +++++++++++++++++++++++++++++++-----------------
>  4 files changed, 40 insertions(+), 20 deletions(-)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 5f1e4ee..e60dcbd 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -192,6 +192,10 @@ struct lruvec {
>  #define ISOLATE_UNMAPPED	((__force isolate_mode_t)0x8)
>  /* Isolate for asynchronous migration */
>  #define ISOLATE_ASYNC_MIGRATE	((__force isolate_mode_t)0x10)
> +/* Isolate swap-backed pages */
> +#define ISOLATE_ANON		((__force isolate_mode_t)0x20)
> +/* Isolate file-backed pages */
> +#define ISOLATE_FILE		((__force isolate_mode_t)0x40)
>  
>  /* LRU Isolation modes. */
>  typedef unsigned __bitwise__ isolate_mode_t;
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index ba2c8d7..dc6e6a3 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -254,7 +254,7 @@ static inline void lru_cache_add_file(struct page *page)
>  /* linux/mm/vmscan.c */
>  extern unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
>  					gfp_t gfp_mask, nodemask_t *mask);
> -extern int __isolate_lru_page(struct page *page, isolate_mode_t mode, int file);
> +extern int __isolate_lru_page(struct page *page, isolate_mode_t mode);
>  extern unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem,
>  						  gfp_t gfp_mask, bool noswap);
>  extern unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 74a8c82..cc054f7 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -261,7 +261,8 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
>  	unsigned long last_pageblock_nr = 0, pageblock_nr;
>  	unsigned long nr_scanned = 0, nr_isolated = 0;
>  	struct list_head *migratelist = &cc->migratepages;
> -	isolate_mode_t mode = ISOLATE_ACTIVE|ISOLATE_INACTIVE;
> +	isolate_mode_t mode = ISOLATE_ACTIVE | ISOLATE_INACTIVE |
> +			      ISOLATE_FILE | ISOLATE_ANON;
>  
>  	/* Do not scan outside zone boundaries */
>  	low_pfn = max(cc->migrate_pfn, zone->zone_start_pfn);
> @@ -375,7 +376,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
>  			mode |= ISOLATE_ASYNC_MIGRATE;
>  
>  		/* Try isolate the page */
> -		if (__isolate_lru_page(page, mode, 0) != 0)
> +		if (__isolate_lru_page(page, mode) != 0)
>  			continue;
>  
>  		VM_BUG_ON(PageTransCompound(page));
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 7de3acc..cce1e14 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1029,28 +1029,35 @@ keep_lumpy:
>   *
>   * returns 0 on success, -ve errno on failure.
>   */
> -int __isolate_lru_page(struct page *page, isolate_mode_t mode, int file)
> +int __isolate_lru_page(struct page *page, isolate_mode_t mode)
>  {
> -	bool all_lru_mode;
>  	int ret = -EINVAL;
>  
>  	/* Only take pages on the LRU. */
>  	if (!PageLRU(page))
>  		return ret;
>  
> -	all_lru_mode = (mode & (ISOLATE_ACTIVE|ISOLATE_INACTIVE)) ==
> -		(ISOLATE_ACTIVE|ISOLATE_INACTIVE);
> -
> -	/*
> -	 * When checking the active state, we need to be sure we are
> -	 * dealing with comparible boolean values.  Take the logical not
> -	 * of each.
> -	 */
> -	if (!all_lru_mode && !PageActive(page) != !(mode & ISOLATE_ACTIVE))
> -		return ret;
> +	switch (mode & (ISOLATE_ACTIVE | ISOLATE_INACTIVE)) {
> +		case ISOLATE_ACTIVE:
> +			if (!PageActive(page))
> +				return ret;
> +			break;
> +		case ISOLATE_INACTIVE:
> +			if (PageActive(page))
> +				return ret;
> +			break;
> +	}
>  
> -	if (!all_lru_mode && !!page_is_file_cache(page) != file)
> -		return ret;
> +	switch (mode & (ISOLATE_FILE | ISOLATE_ANON)) {
> +		case ISOLATE_FILE:
> +			if (!page_is_file_cache(page))
> +				return ret;
> +			break;
> +		case ISOLATE_ANON:
> +			if (page_is_file_cache(page))
> +				return ret;
> +			break;
> +	}
>  
>  	/*
>  	 * When this function is being called for lumpy reclaim, we
> @@ -1160,7 +1167,7 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
>  
>  		VM_BUG_ON(!PageLRU(page));
>  
> -		switch (__isolate_lru_page(page, mode, file)) {
> +		switch (__isolate_lru_page(page, mode)) {
>  		case 0:
>  			mem_cgroup_lru_del(page);
>  			list_move(&page->lru, dst);
> @@ -1218,7 +1225,7 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
>  			    !PageSwapCache(cursor_page))
>  				break;
>  
> -			if (__isolate_lru_page(cursor_page, mode, file) == 0) {
> +			if (__isolate_lru_page(cursor_page, mode) == 0) {
>  				unsigned int isolated_pages;
>  
>  				mem_cgroup_lru_del(cursor_page);
> @@ -1503,7 +1510,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct mem_cgroup_zone *mz,
>  
>  	set_reclaim_mode(priority, sc, false);
>  	if (sc->reclaim_mode & RECLAIM_MODE_LUMPYRECLAIM)
> -		isolate_mode |= ISOLATE_ACTIVE;
> +		isolate_mode |= ISOLATE_ACTIVE | ISOLATE_FILE | ISOLATE_ANON;
>  
>  	lru_add_drain();
>  
> @@ -1511,6 +1518,10 @@ shrink_inactive_list(unsigned long nr_to_scan, struct mem_cgroup_zone *mz,
>  		isolate_mode |= ISOLATE_UNMAPPED;
>  	if (!sc->may_writepage)
>  		isolate_mode |= ISOLATE_CLEAN;
> +	if (file)
> +		isolate_mode |= ISOLATE_FILE;
> +	else
> +		isolate_mode |= ISOLATE_ANON;
>  
>  	spin_lock_irq(&zone->lru_lock);
>  
> @@ -1677,6 +1688,10 @@ static void shrink_active_list(unsigned long nr_to_scan,
>  		isolate_mode |= ISOLATE_UNMAPPED;
>  	if (!sc->may_writepage)
>  		isolate_mode |= ISOLATE_CLEAN;
> +	if (file)
> +		isolate_mode |= ISOLATE_FILE;
> +	else
> +		isolate_mode |= ISOLATE_ANON;
>  
>  	spin_lock_irq(&zone->lru_lock);
>  

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 3/7 v2] mm: rework __isolate_lru_page() file/anon filter
  2012-03-07  3:22     ` Hugh Dickins
@ 2012-03-08  5:30       ` KAMEZAWA Hiroyuki
  2012-03-09  2:06         ` Hugh Dickins
  0 siblings, 1 reply; 41+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-03-08  5:30 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Konstantin Khlebnikov, Andrew Morton, Johannes Weiner, linux-mm,
	linux-kernel

On Tue, 6 Mar 2012 19:22:21 -0800 (PST)
Hugh Dickins <hughd@google.com> wrote:

> On Sat, 3 Mar 2012, Konstantin Khlebnikov wrote:
> 
> > This patch adds file/anon filter bits into isolate_mode_t,
> > this allows to simplify checks in __isolate_lru_page().
> > 
> > v2:
> > * use switch () instead of if ()
> > * fixed lumpy-reclaim isolation mode
> > 
> > Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
> 
> I'm sorry to be messing you around on this one, Konstantin, but...
> 
> (a) if you do go with the switch statements,
> in kernel we align the "case"s underneath the "switch"
> 
> but
> 
> (b) I seem to be at odds with Kamezawa-san, I much preferred your
> original, which did in 2 lines what the switches do in 10 lines.
> And I'd say there's more opportunity for error in 10 lines than 2.
> 
> What does the compiler say (4.5.1 here, OPTIMIZE_FOR_SIZE off)?
>    text	   data	    bss	    dec	    hex	filename
>   17723	    113	     17	  17853	   45bd	vmscan.o.0
>   17671	    113	     17	  17801	   4589	vmscan.o.1
>   17803	    113	     17	  17933	   460d	vmscan.o.2
> 
> That suggests that your v2 is the worst and your v1 the best.
> Kame, can I persuade you to let the compiler decide on this?
> 

Hmm. How about Costa' proposal ? as

int tmp_var = PageActive(page) ? ISOLATE_ACTIVE : ISOLATE_INACTIVE
if (!(mode & tmp_var))
    ret;

Thanks,
-Kame



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 6/7] mm/memcg: rework inactive_ratio calculation
  2012-03-02  6:24     ` Konstantin Khlebnikov
@ 2012-03-08  5:36       ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 41+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-03-08  5:36 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Andrew Morton, Hugh Dickins, Johannes Weiner, linux-mm, linux-kernel

On Fri, 02 Mar 2012 10:24:12 +0400
Konstantin Khlebnikov <khlebnikov@openvz.org> wrote:

> KAMEZAWA Hiroyuki wrote:
> > On Wed, 29 Feb 2012 13:16:00 +0400
> > Konstantin Khlebnikov<khlebnikov@openvz.org>  wrote:
> >
> >> This patch removes precalculated zone->inactive_ratio.
> >> Now it always calculated in inactive_anon_is_low() from current lru size.
> >> After that we can merge memcg and non-memcg cases and drop duplicated code.
> >>
> >> We can drop precalculated ratio, because its calculation fast enough to do it
> >> each time. Plus precalculation uses zone size as basis, this estimation not
> >> always match with page lru size, for example if a significant proportion
> >> of memory occupied by kernel objects.om memory cgroup which is triggered this memory reclaim.
> This is more reason
> >>
> >> Signed-off-by: Konstantin Khlebnikov<khlebnikov@openvz.org>
> >
> > Maybe good....but please don't change the user interface /proc/zoneinfo implicitly.
> > How about calculating inactive_ratio at reading /proc/zoneinfo ?
> 
> I don't know... Anybody need this?

I don't like changes in user interface without any caution in feature-removal-schedule.txt
I just don't Ack. If others say ok, please go ahead with them.

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 3/7 v2] mm: rework __isolate_lru_page() file/anon filter
  2012-03-08  5:30       ` KAMEZAWA Hiroyuki
@ 2012-03-09  2:06         ` Hugh Dickins
  2012-03-09  7:16           ` Konstantin Khlebnikov
  0 siblings, 1 reply; 41+ messages in thread
From: Hugh Dickins @ 2012-03-09  2:06 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Konstantin Khlebnikov, Andrew Morton, Johannes Weiner, linux-mm,
	linux-kernel

On Thu, 8 Mar 2012, KAMEZAWA Hiroyuki wrote:
> On Tue, 6 Mar 2012 19:22:21 -0800 (PST)
> Hugh Dickins <hughd@google.com> wrote:
> > 
> > What does the compiler say (4.5.1 here, OPTIMIZE_FOR_SIZE off)?
> >    text	   data	    bss	    dec	    hex	filename
> >   17723	    113	     17	  17853	   45bd	vmscan.o.0
> >   17671	    113	     17	  17801	   4589	vmscan.o.1
> >   17803	    113	     17	  17933	   460d	vmscan.o.2
> > 
> > That suggests that your v2 is the worst and your v1 the best.
> > Kame, can I persuade you to let the compiler decide on this?
> > 
> 
> Hmm. How about Costa' proposal ? as
> 
> int tmp_var = PageActive(page) ? ISOLATE_ACTIVE : ISOLATE_INACTIVE
> if (!(mode & tmp_var))
>     ret;

Yes, that would have been a good compromise (given a better name
than "tmp_var"!), I didn't realize that one was acceptable to you.

But I see that Konstantin has been inspired by our disagreement to a
more creative solution.

I like very much the look of what he's come up with, but I'm still
puzzling over why it barely makes any improvement to __isolate_lru_page():
seems significantly inferior (in code size terms) to his original (which
I imagine Glauber's compromise would be equivalent to).

At some point I ought to give up on niggling about this,
but I haven't quite got there yet.

Hugh

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 3/7 v2] mm: rework __isolate_lru_page() file/anon filter
  2012-03-09  2:06         ` Hugh Dickins
@ 2012-03-09  7:16           ` Konstantin Khlebnikov
  2012-03-10  0:04             ` Hugh Dickins
  0 siblings, 1 reply; 41+ messages in thread
From: Konstantin Khlebnikov @ 2012-03-09  7:16 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: KAMEZAWA Hiroyuki, Andrew Morton, Johannes Weiner, linux-mm,
	linux-kernel

Hugh Dickins wrote:
> On Thu, 8 Mar 2012, KAMEZAWA Hiroyuki wrote:
>> On Tue, 6 Mar 2012 19:22:21 -0800 (PST)
>> Hugh Dickins<hughd@google.com>  wrote:
>>>
>>> What does the compiler say (4.5.1 here, OPTIMIZE_FOR_SIZE off)?
>>>     text	   data	    bss	    dec	    hex	filename
>>>    17723	    113	     17	  17853	   45bd	vmscan.o.0
>>>    17671	    113	     17	  17801	   4589	vmscan.o.1
>>>    17803	    113	     17	  17933	   460d	vmscan.o.2
>>>
>>> That suggests that your v2 is the worst and your v1 the best.
>>> Kame, can I persuade you to let the compiler decide on this?
>>>
>>
>> Hmm. How about Costa' proposal ? as
>>
>> int tmp_var = PageActive(page) ? ISOLATE_ACTIVE : ISOLATE_INACTIVE
>> if (!(mode&  tmp_var))
>>      ret;
>
> Yes, that would have been a good compromise (given a better name
> than "tmp_var"!), I didn't realize that one was acceptable to you.
>
> But I see that Konstantin has been inspired by our disagreement to a
> more creative solution.
>
> I like very much the look of what he's come up with, but I'm still
> puzzling over why it barely makes any improvement to __isolate_lru_page():
> seems significantly inferior (in code size terms) to his original (which
> I imagine Glauber's compromise would be equivalent to).
>
> At some point I ought to give up on niggling about this,
> but I haven't quite got there yet.

(with if())
$ ./scripts/bloat-o-meter built-in.o built-in.o-v1
add/remove: 0/0 grow/shrink: 2/1 up/down: 32/-20 (12)
function                                     old     new   delta
static.shrink_active_list                    837     853     +16
shrink_inactive_list                        1259    1275     +16
static.isolate_lru_pages                    1055    1035     -20

(with switch())
$ ./scripts/bloat-o-meter built-in.o built-in.o-v2
add/remove: 0/0 grow/shrink: 4/2 up/down: 111/-23 (88)
function                                     old     new   delta
__isolate_lru_page                           301     377     +76
static.shrink_active_list                    837     853     +16
shrink_inactive_list                        1259    1275     +16
page_evictable                               170     173      +3
__remove_mapping                             322     319      -3
static.isolate_lru_pages                    1055    1035     -20

(without __always_inline on page_lru())
$ ./scripts/bloat-o-meter built-in.o built-in.o-v5-noinline
add/remove: 0/0 grow/shrink: 5/2 up/down: 93/-23 (70)
function                                     old     new   delta
__isolate_lru_page                           301     333     +32
isolate_lru_page                             359     385     +26
static.shrink_active_list                    837     853     +16
putback_inactive_pages                       635     651     +16
page_evictable                               170     173      +3
__remove_mapping                             322     319      -3
static.isolate_lru_pages                    1055    1035     -20

$ ./scripts/bloat-o-meter built-in.o built-in.o-v5
add/remove: 0/0 grow/shrink: 3/4 up/down: 35/-67 (-32)
function                                     old     new   delta
static.shrink_active_list                    837     853     +16
__isolate_lru_page                           301     317     +16
page_evictable                               170     173      +3
__remove_mapping                             322     319      -3
mem_cgroup_lru_del                            73      65      -8
static.isolate_lru_pages                    1055    1035     -20
__mem_cgroup_commit_charge                   676     640     -36

Actually __isolate_lru_page() even little bit bigger

>
> Hugh


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 3/7 v2] mm: rework __isolate_lru_page() file/anon filter
  2012-03-09  7:16           ` Konstantin Khlebnikov
@ 2012-03-10  0:04             ` Hugh Dickins
  2012-03-10  6:55               ` Konstantin Khlebnikov
  0 siblings, 1 reply; 41+ messages in thread
From: Hugh Dickins @ 2012-03-10  0:04 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: KAMEZAWA Hiroyuki, Andrew Morton, Johannes Weiner, linux-mm,
	linux-kernel

On Fri, 9 Mar 2012, Konstantin Khlebnikov wrote:
> Hugh Dickins wrote:
> > 
> > I like very much the look of what he's come up with, but I'm still
> > puzzling over why it barely makes any improvement to __isolate_lru_page():
> > seems significantly inferior (in code size terms) to his original (which
> > I imagine Glauber's compromise would be equivalent to).
> > 
> > At some point I ought to give up on niggling about this,
> > but I haven't quite got there yet.
> 
> (with if())
> $ ./scripts/bloat-o-meter built-in.o built-in.o-v1
> add/remove: 0/0 grow/shrink: 2/1 up/down: 32/-20 (12)
> function                                     old     new   delta
> static.shrink_active_list                    837     853     +16
> shrink_inactive_list                        1259    1275     +16
> static.isolate_lru_pages                    1055    1035     -20
> 
> (with switch())
> $ ./scripts/bloat-o-meter built-in.o built-in.o-v2
> add/remove: 0/0 grow/shrink: 4/2 up/down: 111/-23 (88)
> function                                     old     new   delta
> __isolate_lru_page                           301     377     +76
> static.shrink_active_list                    837     853     +16
> shrink_inactive_list                        1259    1275     +16
> page_evictable                               170     173      +3
> __remove_mapping                             322     319      -3
> static.isolate_lru_pages                    1055    1035     -20
> 
> (without __always_inline on page_lru())
> $ ./scripts/bloat-o-meter built-in.o built-in.o-v5-noinline
> add/remove: 0/0 grow/shrink: 5/2 up/down: 93/-23 (70)
> function                                     old     new   delta
> __isolate_lru_page                           301     333     +32
> isolate_lru_page                             359     385     +26
> static.shrink_active_list                    837     853     +16
> putback_inactive_pages                       635     651     +16
> page_evictable                               170     173      +3
> __remove_mapping                             322     319      -3
> static.isolate_lru_pages                    1055    1035     -20
> 
> $ ./scripts/bloat-o-meter built-in.o built-in.o-v5
> add/remove: 0/0 grow/shrink: 3/4 up/down: 35/-67 (-32)
> function                                     old     new   delta
> static.shrink_active_list                    837     853     +16
> __isolate_lru_page                           301     317     +16
> page_evictable                               170     173      +3
> __remove_mapping                             322     319      -3
> mem_cgroup_lru_del                            73      65      -8
> static.isolate_lru_pages                    1055    1035     -20
> __mem_cgroup_commit_charge                   676     640     -36
> 
> Actually __isolate_lru_page() even little bit bigger

I was coming to realize that it must be your page_lru()ing:
although it's dressed up in one line, there's several branches there.

I think you'll find you have a clear winner at last, if you just pass
lru on down as third arg to __isolate_lru_page(), where file used to
be passed, instead of re-evaluating it inside.

shrink callers already have the lru, and compaction works it out
immediately afterwards.

Though we do need to be careful: the lumpy case would then have to
pass page_lru(cursor_page).  Oh, actually no (though it would deserve
a comment): since the lumpy case selects LRU_ALL_EVICTABLE, it's
irrelevant what it passes for lru, so might as well stick with
the one passed down.  Though you may decide I'm being too tricky
there, and prefer to calculate page_lru(cursor_page) anyway, it
not being the hottest path.

Whether you'd still want page_lru(page) __always_inline, I don't know.

Hugh

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 3/7 v2] mm: rework __isolate_lru_page() file/anon filter
  2012-03-10  0:04             ` Hugh Dickins
@ 2012-03-10  6:55               ` Konstantin Khlebnikov
  2012-03-10  9:46                 ` Konstantin Khlebnikov
  0 siblings, 1 reply; 41+ messages in thread
From: Konstantin Khlebnikov @ 2012-03-10  6:55 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: KAMEZAWA Hiroyuki, Andrew Morton, Johannes Weiner, linux-mm,
	linux-kernel

Hugh Dickins wrote:
> On Fri, 9 Mar 2012, Konstantin Khlebnikov wrote:
>> Hugh Dickins wrote:
>>>
>>> I like very much the look of what he's come up with, but I'm still
>>> puzzling over why it barely makes any improvement to __isolate_lru_page():
>>> seems significantly inferior (in code size terms) to his original (which
>>> I imagine Glauber's compromise would be equivalent to).
>>>
>>> At some point I ought to give up on niggling about this,
>>> but I haven't quite got there yet.
>>
>> (with if())
>> $ ./scripts/bloat-o-meter built-in.o built-in.o-v1
>> add/remove: 0/0 grow/shrink: 2/1 up/down: 32/-20 (12)
>> function                                     old     new   delta
>> static.shrink_active_list                    837     853     +16
>> shrink_inactive_list                        1259    1275     +16
>> static.isolate_lru_pages                    1055    1035     -20
>>
>> (with switch())
>> $ ./scripts/bloat-o-meter built-in.o built-in.o-v2
>> add/remove: 0/0 grow/shrink: 4/2 up/down: 111/-23 (88)
>> function                                     old     new   delta
>> __isolate_lru_page                           301     377     +76
>> static.shrink_active_list                    837     853     +16
>> shrink_inactive_list                        1259    1275     +16
>> page_evictable                               170     173      +3
>> __remove_mapping                             322     319      -3
>> static.isolate_lru_pages                    1055    1035     -20
>>
>> (without __always_inline on page_lru())
>> $ ./scripts/bloat-o-meter built-in.o built-in.o-v5-noinline
>> add/remove: 0/0 grow/shrink: 5/2 up/down: 93/-23 (70)
>> function                                     old     new   delta
>> __isolate_lru_page                           301     333     +32
>> isolate_lru_page                             359     385     +26
>> static.shrink_active_list                    837     853     +16
>> putback_inactive_pages                       635     651     +16
>> page_evictable                               170     173      +3
>> __remove_mapping                             322     319      -3
>> static.isolate_lru_pages                    1055    1035     -20
>>
>> $ ./scripts/bloat-o-meter built-in.o built-in.o-v5
>> add/remove: 0/0 grow/shrink: 3/4 up/down: 35/-67 (-32)
>> function                                     old     new   delta
>> static.shrink_active_list                    837     853     +16
>> __isolate_lru_page                           301     317     +16
>> page_evictable                               170     173      +3
>> __remove_mapping                             322     319      -3
>> mem_cgroup_lru_del                            73      65      -8
>> static.isolate_lru_pages                    1055    1035     -20
>> __mem_cgroup_commit_charge                   676     640     -36
>>
>> Actually __isolate_lru_page() even little bit bigger
>
> I was coming to realize that it must be your page_lru()ing:
> although it's dressed up in one line, there's several branches there.

Yes, but I think we can optimize page_lru(): we can prepare ready-to-use
page lru index in lower bits of page->flags, if we swap page flags and split
LRU_UNEVICTABLE into FILE/ANON parts.

>
> I think you'll find you have a clear winner at last, if you just pass
> lru on down as third arg to __isolate_lru_page(), where file used to
> be passed, instead of re-evaluating it inside.
>
> shrink callers already have the lru, and compaction works it out
> immediately afterwards.

No, for non-lumpy isolation we don't need this check at all,
because all pages already picked from right lru list.

I'll send separate patch for this (on top v5 patchset), after meditation =)

>
> Though we do need to be careful: the lumpy case would then have to
> pass page_lru(cursor_page).  Oh, actually no (though it would deserve
> a comment): since the lumpy case selects LRU_ALL_EVICTABLE, it's
> irrelevant what it passes for lru, so might as well stick with
> the one passed down.  Though you may decide I'm being too tricky
> there, and prefer to calculate page_lru(cursor_page) anyway, it
> not being the hottest path.
>
> Whether you'd still want page_lru(page) __always_inline, I don't know.
>
> Hugh


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 3/7 v2] mm: rework __isolate_lru_page() file/anon filter
  2012-03-10  6:55               ` Konstantin Khlebnikov
@ 2012-03-10  9:46                 ` Konstantin Khlebnikov
  2012-03-15  1:47                   ` Hugh Dickins
  0 siblings, 1 reply; 41+ messages in thread
From: Konstantin Khlebnikov @ 2012-03-10  9:46 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: KAMEZAWA Hiroyuki, Andrew Morton, Johannes Weiner, linux-mm,
	linux-kernel

Konstantin Khlebnikov wrote:
> Hugh Dickins wrote:
>> On Fri, 9 Mar 2012, Konstantin Khlebnikov wrote:
>>>
>>> Actually __isolate_lru_page() even little bit bigger
>>
>> I was coming to realize that it must be your page_lru()ing:
>> although it's dressed up in one line, there's several branches there.
>
> Yes, but I think we can optimize page_lru(): we can prepare ready-to-use
> page lru index in lower bits of page->flags, if we swap page flags and split
> LRU_UNEVICTABLE into FILE/ANON parts.
>
>>
>> I think you'll find you have a clear winner at last, if you just pass
>> lru on down as third arg to __isolate_lru_page(), where file used to
>> be passed, instead of re-evaluating it inside.
>>
>> shrink callers already have the lru, and compaction works it out
>> immediately afterwards.
>
> No, for non-lumpy isolation we don't need this check at all,
> because all pages already picked from right lru list.
>
> I'll send separate patch for this (on top v5 patchset), after meditation =)

Heh, looks like we don't need these checks at all:
without RECLAIM_MODE_LUMPYRECLAIM we isolate only pages from right lru,
with RECLAIM_MODE_LUMPYRECLAIM we isolate pages from all evictable lru.
Thus we should check only PageUnevictable() on lumpy reclaim.

>
>>
>> Though we do need to be careful: the lumpy case would then have to
>> pass page_lru(cursor_page).  Oh, actually no (though it would deserve
>> a comment): since the lumpy case selects LRU_ALL_EVICTABLE, it's
>> irrelevant what it passes for lru, so might as well stick with
>> the one passed down.  Though you may decide I'm being too tricky
>> there, and prefer to calculate page_lru(cursor_page) anyway, it
>> not being the hottest path.
>>
>> Whether you'd still want page_lru(page) __always_inline, I don't know.
>>
>> Hugh
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
> Don't email:<a href=mailto:"dont@kvack.org">  email@kvack.org</a>


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 3/7 v2] mm: rework __isolate_lru_page() file/anon filter
  2012-03-10  9:46                 ` Konstantin Khlebnikov
@ 2012-03-15  1:47                   ` Hugh Dickins
  2012-03-15  6:03                     ` Konstantin Khlebnikov
  0 siblings, 1 reply; 41+ messages in thread
From: Hugh Dickins @ 2012-03-15  1:47 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: KAMEZAWA Hiroyuki, Andrew Morton, Johannes Weiner, linux-mm,
	linux-kernel

On Sat, 10 Mar 2012, Konstantin Khlebnikov wrote:
> Konstantin Khlebnikov wrote:
> > 
> > No, for non-lumpy isolation we don't need this check at all,
> > because all pages already picked from right lru list.
> > 
> > I'll send separate patch for this (on top v5 patchset), after meditation =)
> 
> Heh, looks like we don't need these checks at all:
> without RECLAIM_MODE_LUMPYRECLAIM we isolate only pages from right lru,
> with RECLAIM_MODE_LUMPYRECLAIM we isolate pages from all evictable lru.
> Thus we should check only PageUnevictable() on lumpy reclaim.

Yes, those were great simplfying insights: I'm puzzling over why you
didn't follow through on them in your otherwise nice 4.5/7, which
still involves lru bits in the isolate mode?

Hugh

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 3/7 v2] mm: rework __isolate_lru_page() file/anon filter
  2012-03-15  1:47                   ` Hugh Dickins
@ 2012-03-15  6:03                     ` Konstantin Khlebnikov
  2012-03-15 23:58                       ` Hugh Dickins
  0 siblings, 1 reply; 41+ messages in thread
From: Konstantin Khlebnikov @ 2012-03-15  6:03 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: KAMEZAWA Hiroyuki, Andrew Morton, Johannes Weiner, linux-mm,
	linux-kernel

Hugh Dickins wrote:
> On Sat, 10 Mar 2012, Konstantin Khlebnikov wrote:
>> Konstantin Khlebnikov wrote:
>>>
>>> No, for non-lumpy isolation we don't need this check at all,
>>> because all pages already picked from right lru list.
>>>
>>> I'll send separate patch for this (on top v5 patchset), after meditation =)
>>
>> Heh, looks like we don't need these checks at all:
>> without RECLAIM_MODE_LUMPYRECLAIM we isolate only pages from right lru,
>> with RECLAIM_MODE_LUMPYRECLAIM we isolate pages from all evictable lru.
>> Thus we should check only PageUnevictable() on lumpy reclaim.
>
> Yes, those were great simplfying insights: I'm puzzling over why you
> didn't follow through on them in your otherwise nice 4.5/7, which
> still involves lru bits in the isolate mode?

Actually filter is required for single case: lumpy isolation for shrink_active_list().
Maybe I'm wrong, or this is bug, but I don't see any reasons why this can not happen:
sc->reclaim_mode manipulations are very tricky.

>
> Hugh
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
> Don't email:<a href=mailto:"dont@kvack.org">  email@kvack.org</a>


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 3/7 v2] mm: rework __isolate_lru_page() file/anon filter
  2012-03-15  6:03                     ` Konstantin Khlebnikov
@ 2012-03-15 23:58                       ` Hugh Dickins
  0 siblings, 0 replies; 41+ messages in thread
From: Hugh Dickins @ 2012-03-15 23:58 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: KAMEZAWA Hiroyuki, Andrew Morton, Johannes Weiner, linux-mm,
	linux-kernel

On Thu, 15 Mar 2012, Konstantin Khlebnikov wrote:
> Hugh Dickins wrote:
> > On Sat, 10 Mar 2012, Konstantin Khlebnikov wrote:
> > > Konstantin Khlebnikov wrote:
> > > 
> > > Heh, looks like we don't need these checks at all:
> > > without RECLAIM_MODE_LUMPYRECLAIM we isolate only pages from right lru,
> > > with RECLAIM_MODE_LUMPYRECLAIM we isolate pages from all evictable lru.
> > > Thus we should check only PageUnevictable() on lumpy reclaim.
> > 
> > Yes, those were great simplfying insights: I'm puzzling over why you
> > didn't follow through on them in your otherwise nice 4.5/7, which
> > still involves lru bits in the isolate mode?
> 
> Actually filter is required for single case: lumpy isolation for
> shrink_active_list().
> Maybe I'm wrong,

You are right.  I thought you were wrong, but testing proved you right.

> or this is bug, but I don't see any reasons why this can not happen:

It's very close to being a bug: perhaps I'd call it overlooked silliness.

> sc->reclaim_mode manipulations are very tricky.

And you are right on that too.  Particularly those reset_reclaim_mode()s
in shrink_page_list(), when the set_reclaim_mode() is done at the head of
shrink_inactive_list().

With no set_reclaim_mode() or reset_reclaim_mode() in shrink_active_list(),
so its isolate_lru_pages() test for RECLAIM_MODE_LUMPYRECLAIM just picks up
whatever sc->reclaim_mode was left over from earlier shrink_inactive_list().

Or none: the age_active_anon() call to shrink_active_list() never sets
sc->reclaim_mode, and is lucky that the only test for RECLAIM_MODE_SINGLE
is in code that it won't reach.

(Or maybe I've got some of those details wrong again, it's convoluted.)

I contend that what we want is

--- next/mm/vmscan.c	2012-03-13 03:52:15.360030839 -0700
+++ linux/mm/vmscan.c	2012-03-15 14:53:47.035519540 -0700
@@ -1690,6 +1690,8 @@ static void shrink_active_list(unsigned
 
 	lru_add_drain();
 
+	reset_reclaim_mode(sc);
+
 	if (!sc->may_unmap)
 		isolate_mode |= ISOLATE_UNMAPPED;
 	if (!sc->may_writepage)

but admit that's a slight change in behaviour - though I think only
a sensible one.  It's silly to embark upon lumpy reclaim of adjacent
pages, while tying your hands to pull only from file for file or from
anon for anon (though not so silly to restrict in/activity).

Shrinking the active list is about repopulating an inactive list when
it's low: shrinking the active list is not going to free any pages
(except when they're concurrently freed while it holds them isolated),
it's just going to move them to inactive; so aiming for adjacency at
this stage is pointless.  Counter-productive even: if it's going to
make any contribution to the lumpy reclaim, it should be populating
the inactive list with a variety of good candidates to seed the next
lump (whose adjacent pages will be isolated whichever list they're on):
by populating with adjacent pages itself, it lowers the chance of
later success, and increases the perturbation of lru-ness.

And if we do a reset_reclaim_mode(sc) in shrink_active_list(), then you
can remove the leftover lru stuff which spoils the simplicity of 4.5/7.

But you are right not to mix such a change in with your reorganization:
would you like to add the patch above into your series as a separate
patch (Cc Rik and Mel), or would you like me to send it separately
for discusssion, or do you see reason to disagree with it?

Hugh

^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2012-03-15 23:59 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-02-29  9:15 [PATCH v4 ch1 0/7] mm: some cleanup/rework before lru_lock splitting Konstantin Khlebnikov
2012-02-29  9:15 ` [PATCH 1/7] mm/memcg: scanning_global_lru means mem_cgroup_disabled Konstantin Khlebnikov
2012-03-02  5:12   ` KAMEZAWA Hiroyuki
2012-03-06 11:46     ` Glauber Costa
2012-02-29  9:15 ` [PATCH 2/7] mm/memcg: move reclaim_stat into lruvec Konstantin Khlebnikov
2012-03-02  5:14   ` KAMEZAWA Hiroyuki
2012-02-29  9:15 ` [PATCH 3/7] mm: rework __isolate_lru_page() file/anon filter Konstantin Khlebnikov
2012-03-02  5:17   ` KAMEZAWA Hiroyuki
2012-03-02  5:51     ` Konstantin Khlebnikov
2012-03-02  8:17       ` KAMEZAWA Hiroyuki
2012-03-02  8:53         ` Konstantin Khlebnikov
2012-03-06 11:57         ` Glauber Costa
2012-03-06 12:53           ` Konstantin Khlebnikov
2012-03-03  0:22   ` Hugh Dickins
2012-03-03  8:27     ` Konstantin Khlebnikov
2012-03-03  9:20       ` Konstantin Khlebnikov
2012-03-03  9:16   ` [PATCH 3/7 v2] " Konstantin Khlebnikov
2012-03-05  0:27     ` KAMEZAWA Hiroyuki
2012-03-07  3:22     ` Hugh Dickins
2012-03-08  5:30       ` KAMEZAWA Hiroyuki
2012-03-09  2:06         ` Hugh Dickins
2012-03-09  7:16           ` Konstantin Khlebnikov
2012-03-10  0:04             ` Hugh Dickins
2012-03-10  6:55               ` Konstantin Khlebnikov
2012-03-10  9:46                 ` Konstantin Khlebnikov
2012-03-15  1:47                   ` Hugh Dickins
2012-03-15  6:03                     ` Konstantin Khlebnikov
2012-03-15 23:58                       ` Hugh Dickins
2012-02-29  9:15 ` [PATCH 4/7] mm: push lru index into shrink_[in]active_list() Konstantin Khlebnikov
2012-03-02  5:21   ` KAMEZAWA Hiroyuki
2012-03-03  0:24   ` Hugh Dickins
2012-02-29  9:15 ` [PATCH 5/7] mm: rework reclaim_stat counters Konstantin Khlebnikov
2012-03-02  5:28   ` KAMEZAWA Hiroyuki
2012-03-02  6:11     ` Konstantin Khlebnikov
2012-03-02  8:03       ` KAMEZAWA Hiroyuki
2012-02-29  9:16 ` [PATCH 6/7] mm/memcg: rework inactive_ratio calculation Konstantin Khlebnikov
2012-03-02  5:31   ` KAMEZAWA Hiroyuki
2012-03-02  6:24     ` Konstantin Khlebnikov
2012-03-08  5:36       ` KAMEZAWA Hiroyuki
2012-02-29  9:16 ` [PATCH 7/7] mm/memcg: use vm_swappiness from target memory cgroup Konstantin Khlebnikov
2012-03-02  5:32   ` KAMEZAWA Hiroyuki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).