linux-kernel.vger.kernel.org archive mirror
* [PATCH v3 00/21] mm: lru_lock splitting
@ 2012-02-23 13:51 Konstantin Khlebnikov
  2012-02-23 13:51 ` [PATCH v3 01/21] memcg: unify inactive_ratio calculation Konstantin Khlebnikov
                   ` (23 more replies)
  0 siblings, 24 replies; 65+ messages in thread
From: Konstantin Khlebnikov @ 2012-02-23 13:51 UTC (permalink / raw)
  To: Hugh Dickins, linux-kernel, linux-mm, Johannes Weiner,
	Andrew Morton, KAMEZAWA Hiroyuki
  Cc: Andi Kleen

v3 changes:
* inactive_ratio reworked again; it is now always calculated from scratch
* hierarchical pte reference bits filter in memory-cgroup reclaimer
* fixed two bugs in locking, found by Hugh Dickins
* locking functions slightly simplified
* new patch for isolated pages accounting
* new patch with lru interleaving

This patchset is based on next-20120210

git: https://github.com/koct9i/linux/commits/lruvec-v3

---

Konstantin Khlebnikov (21):
      memcg: unify inactive_ratio calculation
      memcg: make mm_match_cgroup() hierarchical
      memcg: fix page_referenced cgroup filter on global reclaim
      memcg: use vm_swappiness from target memory cgroup
      mm: rename lruvec->lists into lruvec->pages_lru
      mm: lruvec linking functions
      mm: add lruvec->pages_count
      mm: unify inactive_list_is_low()
      mm: add lruvec->reclaim_stat
      mm: kill struct mem_cgroup_zone
      mm: move page-to-lruvec translation upper
      mm: push lruvec into update_page_reclaim_stat()
      mm: push lruvecs from pagevec_lru_move_fn() to iterator
      mm: introduce lruvec locking primitives
      mm: handle lruvec relocks on lumpy reclaim
      mm: handle lruvec relocks in compaction
      mm: handle lruvec relock in memory controller
      mm: add to lruvec isolated pages counters
      memcg: check lru vectors emptiness in pre-destroy
      mm: split zone->lru_lock
      mm: zone lru vectors interleaving


 include/linux/huge_mm.h    |    3 
 include/linux/memcontrol.h |   75 ------
 include/linux/mm.h         |   66 +++++
 include/linux/mm_inline.h  |   19 +-
 include/linux/mmzone.h     |   39 ++-
 include/linux/swap.h       |    6 
 mm/Kconfig                 |   16 +
 mm/compaction.c            |   31 +--
 mm/huge_memory.c           |   14 +
 mm/internal.h              |  204 +++++++++++++++++
 mm/ksm.c                   |    2 
 mm/memcontrol.c            |  343 +++++++++++-----------------
 mm/migrate.c               |    2 
 mm/page_alloc.c            |   70 +-----
 mm/rmap.c                  |    2 
 mm/swap.c                  |  217 ++++++++++--------
 mm/vmscan.c                |  534 ++++++++++++++++++++++++--------------------
 mm/vmstat.c                |    6 
 18 files changed, 932 insertions(+), 717 deletions(-)

-- 
Signature


* [PATCH v3 01/21] memcg: unify inactive_ratio calculation
  2012-02-23 13:51 [PATCH v3 00/21] mm: lru_lock splitting Konstantin Khlebnikov
@ 2012-02-23 13:51 ` Konstantin Khlebnikov
  2012-02-28  0:05   ` KAMEZAWA Hiroyuki
  2012-02-23 13:51 ` [PATCH v3 02/21] memcg: make mm_match_cgroup() hierarchical Konstantin Khlebnikov
                   ` (22 subsequent siblings)
  23 siblings, 1 reply; 65+ messages in thread
From: Konstantin Khlebnikov @ 2012-02-23 13:51 UTC (permalink / raw)
  To: Hugh Dickins, linux-kernel, linux-mm, Johannes Weiner,
	Andrew Morton, KAMEZAWA Hiroyuki
  Cc: Andi Kleen

This patch removes the precalculated zone->inactive_ratio.
The ratio is now always calculated in inactive_anon_is_low() from the current lru sizes.
After that we can merge the memcg and non-memcg cases and drop the duplicated code.
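
A condensed view of the unified check, for illustration only (not a hunk from this patch):

        gb = (active + inactive) >> (30 - PAGE_SHIFT);  /* total anon size in GB */
        ratio = gb ? int_sqrt(10 * gb) : 1;
        /* the inactive list is "low" when inactive * ratio < active */

For example, with about 4GB of anonymous pages gb = 4 and ratio = int_sqrt(40) = 6,
so deactivation kicks in once less than roughly 1/7 of the anon pages sit on the
inactive list; below 1GB the ratio stays 1 and the two lists are simply kept
balanced 1:1.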

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
---
 include/linux/memcontrol.h |   16 --------
 include/linux/mmzone.h     |    7 ----
 mm/memcontrol.c            |   38 -------------------
 mm/page_alloc.c            |   44 ----------------------
 mm/vmscan.c                |   88 ++++++++++++++++++++++++++++----------------
 mm/vmstat.c                |    6 +--
 6 files changed, 58 insertions(+), 141 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index bf4e1f4..8c4d74f 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -113,10 +113,6 @@ void mem_cgroup_iter_break(struct mem_cgroup *, struct mem_cgroup *);
 /*
  * For memory reclaim.
  */
-int mem_cgroup_inactive_anon_is_low(struct mem_cgroup *memcg,
-				    struct zone *zone);
-int mem_cgroup_inactive_file_is_low(struct mem_cgroup *memcg,
-				    struct zone *zone);
 int mem_cgroup_select_victim_node(struct mem_cgroup *memcg);
 unsigned long mem_cgroup_zone_nr_lru_pages(struct mem_cgroup *memcg,
 					int nid, int zid, unsigned int lrumask);
@@ -319,18 +315,6 @@ static inline bool mem_cgroup_disabled(void)
 	return true;
 }
 
-static inline int
-mem_cgroup_inactive_anon_is_low(struct mem_cgroup *memcg, struct zone *zone)
-{
-	return 1;
-}
-
-static inline int
-mem_cgroup_inactive_file_is_low(struct mem_cgroup *memcg, struct zone *zone)
-{
-	return 1;
-}
-
 static inline unsigned long
 mem_cgroup_zone_nr_lru_pages(struct mem_cgroup *memcg, int nid, int zid,
 				unsigned int lru_mask)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index f10a54c..3e1f7ff 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -382,13 +382,6 @@ struct zone {
 	/* Zone statistics */
 	atomic_long_t		vm_stat[NR_VM_ZONE_STAT_ITEMS];
 
-	/*
-	 * The target ratio of ACTIVE_ANON to INACTIVE_ANON pages on
-	 * this zone's LRU.  Maintained by the pageout code.
-	 */
-	unsigned int inactive_ratio;
-
-
 	ZONE_PADDING(_pad2_)
 	/* Rarely used or read-mostly fields */
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index ab315ab..b8039d2 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1157,44 +1157,6 @@ int task_in_mem_cgroup(struct task_struct *task, const struct mem_cgroup *memcg)
 	return ret;
 }
 
-int mem_cgroup_inactive_anon_is_low(struct mem_cgroup *memcg, struct zone *zone)
-{
-	unsigned long inactive_ratio;
-	int nid = zone_to_nid(zone);
-	int zid = zone_idx(zone);
-	unsigned long inactive;
-	unsigned long active;
-	unsigned long gb;
-
-	inactive = mem_cgroup_zone_nr_lru_pages(memcg, nid, zid,
-						BIT(LRU_INACTIVE_ANON));
-	active = mem_cgroup_zone_nr_lru_pages(memcg, nid, zid,
-					      BIT(LRU_ACTIVE_ANON));
-
-	gb = (inactive + active) >> (30 - PAGE_SHIFT);
-	if (gb)
-		inactive_ratio = int_sqrt(10 * gb);
-	else
-		inactive_ratio = 1;
-
-	return inactive * inactive_ratio < active;
-}
-
-int mem_cgroup_inactive_file_is_low(struct mem_cgroup *memcg, struct zone *zone)
-{
-	unsigned long active;
-	unsigned long inactive;
-	int zid = zone_idx(zone);
-	int nid = zone_to_nid(zone);
-
-	inactive = mem_cgroup_zone_nr_lru_pages(memcg, nid, zid,
-						BIT(LRU_INACTIVE_FILE));
-	active = mem_cgroup_zone_nr_lru_pages(memcg, nid, zid,
-					      BIT(LRU_ACTIVE_FILE));
-
-	return (active > inactive);
-}
-
 struct zone_reclaim_stat *mem_cgroup_get_reclaim_stat(struct mem_cgroup *memcg,
 						      struct zone *zone)
 {
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a547177..38f6744 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5051,49 +5051,6 @@ void setup_per_zone_wmarks(void)
 }
 
 /*
- * The inactive anon list should be small enough that the VM never has to
- * do too much work, but large enough that each inactive page has a chance
- * to be referenced again before it is swapped out.
- *
- * The inactive_anon ratio is the target ratio of ACTIVE_ANON to
- * INACTIVE_ANON pages on this zone's LRU, maintained by the
- * pageout code. A zone->inactive_ratio of 3 means 3:1 or 25% of
- * the anonymous pages are kept on the inactive list.
- *
- * total     target    max
- * memory    ratio     inactive anon
- * -------------------------------------
- *   10MB       1         5MB
- *  100MB       1        50MB
- *    1GB       3       250MB
- *   10GB      10       0.9GB
- *  100GB      31         3GB
- *    1TB     101        10GB
- *   10TB     320        32GB
- */
-static void __meminit calculate_zone_inactive_ratio(struct zone *zone)
-{
-	unsigned int gb, ratio;
-
-	/* Zone size in gigabytes */
-	gb = zone->present_pages >> (30 - PAGE_SHIFT);
-	if (gb)
-		ratio = int_sqrt(10 * gb);
-	else
-		ratio = 1;
-
-	zone->inactive_ratio = ratio;
-}
-
-static void __meminit setup_per_zone_inactive_ratio(void)
-{
-	struct zone *zone;
-
-	for_each_zone(zone)
-		calculate_zone_inactive_ratio(zone);
-}
-
-/*
  * Initialise min_free_kbytes.
  *
  * For small machines we want it small (128k min).  For large machines
@@ -5131,7 +5088,6 @@ int __meminit init_per_zone_wmark_min(void)
 	setup_per_zone_wmarks();
 	refresh_zone_stat_thresholds();
 	setup_per_zone_lowmem_reserve();
-	setup_per_zone_inactive_ratio();
 	return 0;
 }
 module_init(init_per_zone_wmark_min)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 87e4d6a..39aa4d7 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1779,29 +1779,38 @@ static void shrink_active_list(unsigned long nr_to_scan,
 }
 
 #ifdef CONFIG_SWAP
-static int inactive_anon_is_low_global(struct zone *zone)
-{
-	unsigned long active, inactive;
-
-	active = zone_page_state(zone, NR_ACTIVE_ANON);
-	inactive = zone_page_state(zone, NR_INACTIVE_ANON);
-
-	if (inactive * zone->inactive_ratio < active)
-		return 1;
-
-	return 0;
-}
-
 /**
  * inactive_anon_is_low - check if anonymous pages need to be deactivated
  * @zone: zone to check
- * @sc:   scan control of this context
  *
  * Returns true if the zone does not have enough inactive anon pages,
  * meaning some active anon pages need to be deactivated.
+ *
+ * The inactive anon list should be small enough that the VM never has to
+ * do too much work, but large enough that each inactive page has a chance
+ * to be referenced again before it is swapped out.
+ *
+ * The inactive_anon ratio is the target ratio of ACTIVE_ANON to
+ * INACTIVE_ANON pages on this zone's LRU, maintained by the
+ * pageout code. A zone->inactive_ratio of 3 means 3:1 or 25% of
+ * the anonymous pages are kept on the inactive list.
+ *
+ * total     target    max
+ * memory    ratio     inactive anon
+ * -------------------------------------
+ *   10MB       1         5MB
+ *  100MB       1        50MB
+ *    1GB       3       250MB
+ *   10GB      10       0.9GB
+ *  100GB      31         3GB
+ *    1TB     101        10GB
+ *   10TB     320        32GB
  */
 static int inactive_anon_is_low(struct mem_cgroup_zone *mz)
 {
+	unsigned long active, inactive;
+	unsigned int gb, ratio;
+
 	/*
 	 * If we don't have swap space, anonymous page deactivation
 	 * is pointless.
@@ -1809,11 +1818,26 @@ static int inactive_anon_is_low(struct mem_cgroup_zone *mz)
 	if (!total_swap_pages)
 		return 0;
 
-	if (!scanning_global_lru(mz))
-		return mem_cgroup_inactive_anon_is_low(mz->mem_cgroup,
-						       mz->zone);
+	if (scanning_global_lru(mz)) {
+		active = zone_page_state(mz->zone, NR_ACTIVE_ANON);
+		inactive = zone_page_state(mz->zone, NR_INACTIVE_ANON);
+	} else {
+		active = mem_cgroup_zone_nr_lru_pages(mz->mem_cgroup,
+				zone_to_nid(mz->zone), zone_idx(mz->zone),
+				BIT(LRU_ACTIVE_ANON));
+		inactive = mem_cgroup_zone_nr_lru_pages(mz->mem_cgroup,
+				zone_to_nid(mz->zone), zone_idx(mz->zone),
+				BIT(LRU_INACTIVE_ANON));
+	}
+
+	/* Total size in gigabytes */
+	gb = (active + inactive) >> (30 - PAGE_SHIFT);
+	if (gb)
+		ratio = int_sqrt(10 * gb);
+	else
+		ratio = 1;
 
-	return inactive_anon_is_low_global(mz->zone);
+	return inactive * ratio < active;
 }
 #else
 static inline int inactive_anon_is_low(struct mem_cgroup_zone *mz)
@@ -1822,16 +1846,6 @@ static inline int inactive_anon_is_low(struct mem_cgroup_zone *mz)
 }
 #endif
 
-static int inactive_file_is_low_global(struct zone *zone)
-{
-	unsigned long active, inactive;
-
-	active = zone_page_state(zone, NR_ACTIVE_FILE);
-	inactive = zone_page_state(zone, NR_INACTIVE_FILE);
-
-	return (active > inactive);
-}
-
 /**
  * inactive_file_is_low - check if file pages need to be deactivated
  * @mz: memory cgroup and zone to check
@@ -1848,11 +1862,21 @@ static int inactive_file_is_low_global(struct zone *zone)
  */
 static int inactive_file_is_low(struct mem_cgroup_zone *mz)
 {
-	if (!scanning_global_lru(mz))
-		return mem_cgroup_inactive_file_is_low(mz->mem_cgroup,
-						       mz->zone);
+	unsigned long active, inactive;
+
+	if (scanning_global_lru(mz)) {
+		active = zone_page_state(mz->zone, NR_ACTIVE_FILE);
+		inactive = zone_page_state(mz->zone, NR_INACTIVE_FILE);
+	} else {
+		active = mem_cgroup_zone_nr_lru_pages(mz->mem_cgroup,
+				zone_to_nid(mz->zone), zone_idx(mz->zone),
+				BIT(LRU_ACTIVE_FILE));
+		inactive = mem_cgroup_zone_nr_lru_pages(mz->mem_cgroup,
+				zone_to_nid(mz->zone), zone_idx(mz->zone),
+				BIT(LRU_INACTIVE_FILE));
+	}
 
-	return inactive_file_is_low_global(mz->zone);
+	return inactive < active;
 }
 
 static int inactive_list_is_low(struct mem_cgroup_zone *mz, int file)
diff --git a/mm/vmstat.c b/mm/vmstat.c
index f600557..2c813e1 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1017,11 +1017,9 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
 	}
 	seq_printf(m,
 		   "\n  all_unreclaimable: %u"
-		   "\n  start_pfn:         %lu"
-		   "\n  inactive_ratio:    %u",
+		   "\n  start_pfn:         %lu",
 		   zone->all_unreclaimable,
-		   zone->zone_start_pfn,
-		   zone->inactive_ratio);
+		   zone->zone_start_pfn);
 	seq_putc(m, '\n');
 }
 



* [PATCH v3 02/21] memcg: make mm_match_cgroup() hierarchical
  2012-02-23 13:51 [PATCH v3 00/21] mm: lru_lock splitting Konstantin Khlebnikov
  2012-02-23 13:51 ` [PATCH v3 01/21] memcg: unify inactive_ratio calculation Konstantin Khlebnikov
@ 2012-02-23 13:51 ` Konstantin Khlebnikov
  2012-02-23 18:03   ` Johannes Weiner
  2012-02-28  0:11   ` KAMEZAWA Hiroyuki
  2012-02-23 13:51 ` [PATCH v3 03/21] memcg: fix page_referenced cgroup filter on global reclaim Konstantin Khlebnikov
                   ` (21 subsequent siblings)
  23 siblings, 2 replies; 65+ messages in thread
From: Konstantin Khlebnikov @ 2012-02-23 13:51 UTC (permalink / raw)
  To: Hugh Dickins, linux-kernel, linux-mm, Johannes Weiner,
	Andrew Morton, KAMEZAWA Hiroyuki
  Cc: Andi Kleen

Check mm-owner cgroup membership hierarchically.
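
For example (illustrative, not from the patch): if mm->owner sits in cgroup A/B and
use_hierarchy is enabled, mm_match_cgroup(mm, A) now walks B -> A via
parent_mem_cgroup() and returns true, whereas the old flat comparison matched
only B itself.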

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
---
 include/linux/memcontrol.h |   11 ++---------
 mm/memcontrol.c            |   20 ++++++++++++++++++++
 2 files changed, 22 insertions(+), 9 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 8c4d74f..4822d53 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -87,15 +87,8 @@ extern struct mem_cgroup *try_get_mem_cgroup_from_mm(struct mm_struct *mm);
 extern struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *memcg);
 extern struct mem_cgroup *mem_cgroup_from_cont(struct cgroup *cont);
 
-static inline
-int mm_match_cgroup(const struct mm_struct *mm, const struct mem_cgroup *cgroup)
-{
-	struct mem_cgroup *memcg;
-	rcu_read_lock();
-	memcg = mem_cgroup_from_task(rcu_dereference((mm)->owner));
-	rcu_read_unlock();
-	return cgroup == memcg;
-}
+extern int mm_match_cgroup(const struct mm_struct *mm,
+			   const struct mem_cgroup *cgroup);
 
 extern struct cgroup_subsys_state *mem_cgroup_css(struct mem_cgroup *memcg);
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index b8039d2..77f5d48 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -821,6 +821,26 @@ struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p)
 				struct mem_cgroup, css);
 }
 
+/**
+ * mm_match_cgroup - cgroup hierarchy mm membership test
+ * @mm		mm_struct to test
+ * @cgroup	target cgroup
+ *
+ * Returns true if mm belong this cgroup or any its child in hierarchy
+ */
+int mm_match_cgroup(const struct mm_struct *mm, const struct mem_cgroup *cgroup)
+{
+	struct mem_cgroup *memcg;
+
+	rcu_read_lock();
+	memcg = mem_cgroup_from_task(rcu_dereference((mm)->owner));
+	while (memcg != cgroup && memcg && memcg->use_hierarchy)
+		memcg = parent_mem_cgroup(memcg);
+	rcu_read_unlock();
+
+	return cgroup == memcg;
+}
+
 struct mem_cgroup *try_get_mem_cgroup_from_mm(struct mm_struct *mm)
 {
 	struct mem_cgroup *memcg = NULL;



* [PATCH v3 03/21] memcg: fix page_referenced cgroup filter on global reclaim
  2012-02-23 13:51 [PATCH v3 00/21] mm: lru_lock splitting Konstantin Khlebnikov
  2012-02-23 13:51 ` [PATCH v3 01/21] memcg: unify inactive_ratio calculation Konstantin Khlebnikov
  2012-02-23 13:51 ` [PATCH v3 02/21] memcg: make mm_match_cgroup() hierarchical Konstantin Khlebnikov
@ 2012-02-23 13:51 ` Konstantin Khlebnikov
  2012-02-28  0:13   ` KAMEZAWA Hiroyuki
  2012-02-23 13:51 ` [PATCH v3 04/21] memcg: use vm_swappiness from target memory cgroup Konstantin Khlebnikov
                   ` (20 subsequent siblings)
  23 siblings, 1 reply; 65+ messages in thread
From: Konstantin Khlebnikov @ 2012-02-23 13:51 UTC (permalink / raw)
  To: Hugh Dickins, linux-kernel, linux-mm, Johannes Weiner,
	Andrew Morton, KAMEZAWA Hiroyuki
  Cc: Andi Kleen

The global memory reclaimer shouldn't skip any page references.

This patch passes sc->target_mem_cgroup into page_referenced().
On global memory reclaim it is always NULL, so all references are accounted.
The cgroup reclaimer will account only references from the target cgroup and its children.
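
Concretely: during global reclaim the filter used to be the memcg currently being
scanned (mz->mem_cgroup), so references from tasks in other cgroups to a shared
page were ignored and such a page could look idle. With sc->target_mem_cgroup the
filter is NULL for global reclaim, so every reference counts, and for limit reclaim
it is the target cgroup, so only references from it and its children count.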

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
---
 mm/vmscan.c |    9 +++++----
 1 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 39aa4d7..d133ac6 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -701,13 +701,13 @@ enum page_references {
 };
 
 static enum page_references page_check_references(struct page *page,
-						  struct mem_cgroup_zone *mz,
 						  struct scan_control *sc)
 {
 	int referenced_ptes, referenced_page;
 	unsigned long vm_flags;
 
-	referenced_ptes = page_referenced(page, 1, mz->mem_cgroup, &vm_flags);
+	referenced_ptes = page_referenced(page, 1,
+					  sc->target_mem_cgroup, &vm_flags);
 	referenced_page = TestClearPageReferenced(page);
 
 	/* Lumpy reclaim - ignore references */
@@ -828,7 +828,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 			}
 		}
 
-		references = page_check_references(page, mz, sc);
+		references = page_check_references(page, sc);
 		switch (references) {
 		case PAGEREF_ACTIVATE:
 			goto activate_locked;
@@ -1735,7 +1735,8 @@ static void shrink_active_list(unsigned long nr_to_scan,
 			continue;
 		}
 
-		if (page_referenced(page, 0, mz->mem_cgroup, &vm_flags)) {
+		if (page_referenced(page, 0,
+				    sc->target_mem_cgroup, &vm_flags)) {
 			nr_rotated += hpage_nr_pages(page);
 			/*
 			 * Identify referenced, file-backed active pages and



* [PATCH v3 04/21] memcg: use vm_swappiness from target memory cgroup
  2012-02-23 13:51 [PATCH v3 00/21] mm: lru_lock splitting Konstantin Khlebnikov
                   ` (2 preceding siblings ...)
  2012-02-23 13:51 ` [PATCH v3 03/21] memcg: fix page_referenced cgroup filter on global reclaim Konstantin Khlebnikov
@ 2012-02-23 13:51 ` Konstantin Khlebnikov
  2012-02-28  0:15   ` KAMEZAWA Hiroyuki
  2012-02-23 13:52 ` [PATCH v3 05/21] mm: rename lruvec->lists into lruvec->pages_lru Konstantin Khlebnikov
                   ` (19 subsequent siblings)
  23 siblings, 1 reply; 65+ messages in thread
From: Konstantin Khlebnikov @ 2012-02-23 13:51 UTC (permalink / raw)
  To: Hugh Dickins, linux-kernel, linux-mm, Johannes Weiner,
	Andrew Morton, KAMEZAWA Hiroyuki
  Cc: Andi Kleen

Use vm_swappiness from the memory cgroup which triggered this memory reclaim.
This is more reasonable and allows us to drop one argument.
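
In other words, for global reclaim vmscan_swappiness() still returns the global
vm_swappiness sysctl, while for memcg reclaim it now reads memory.swappiness from
sc->target_mem_cgroup, the cgroup that actually hit its limit, rather than from
whichever mem_cgroup_zone happens to be under scan.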

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
---
 mm/vmscan.c |    9 ++++-----
 1 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index d133ac6..8b59cb5 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1903,12 +1903,11 @@ static unsigned long shrink_list(enum lru_list lru, unsigned long nr_to_scan,
 	return shrink_inactive_list(nr_to_scan, mz, sc, priority, file);
 }
 
-static int vmscan_swappiness(struct mem_cgroup_zone *mz,
-			     struct scan_control *sc)
+static int vmscan_swappiness(struct scan_control *sc)
 {
 	if (global_reclaim(sc))
 		return vm_swappiness;
-	return mem_cgroup_swappiness(mz->mem_cgroup);
+	return mem_cgroup_swappiness(sc->target_mem_cgroup);
 }
 
 /*
@@ -1976,8 +1975,8 @@ static void get_scan_count(struct mem_cgroup_zone *mz, struct scan_control *sc,
 	 * With swappiness at 100, anonymous and file have the same priority.
 	 * This scanning priority is essentially the inverse of IO cost.
 	 */
-	anon_prio = vmscan_swappiness(mz, sc);
-	file_prio = 200 - vmscan_swappiness(mz, sc);
+	anon_prio = vmscan_swappiness(sc);
+	file_prio = 200 - vmscan_swappiness(sc);
 
 	/*
 	 * OK, so we have swap space and a fair amount of page cache



* [PATCH v3 05/21] mm: rename lruvec->lists into lruvec->pages_lru
  2012-02-23 13:51 [PATCH v3 00/21] mm: lru_lock splitting Konstantin Khlebnikov
                   ` (3 preceding siblings ...)
  2012-02-23 13:51 ` [PATCH v3 04/21] memcg: use vm_swappiness from target memory cgroup Konstantin Khlebnikov
@ 2012-02-23 13:52 ` Konstantin Khlebnikov
  2012-02-28  0:20   ` KAMEZAWA Hiroyuki
  2012-02-23 13:52 ` [PATCH v3 06/21] mm: lruvec linking functions Konstantin Khlebnikov
                   ` (18 subsequent siblings)
  23 siblings, 1 reply; 65+ messages in thread
From: Konstantin Khlebnikov @ 2012-02-23 13:52 UTC (permalink / raw)
  To: Hugh Dickins, linux-kernel, linux-mm, Johannes Weiner,
	Andrew Morton, KAMEZAWA Hiroyuki
  Cc: Andi Kleen

This is a much more unique and grep-friendly name.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
---
 include/linux/mm_inline.h |    2 +-
 include/linux/mmzone.h    |    2 +-
 mm/memcontrol.c           |    6 +++---
 mm/page_alloc.c           |    2 +-
 mm/swap.c                 |    4 ++--
 mm/vmscan.c               |    6 +++---
 6 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
index 227fd3e..8415596 100644
--- a/include/linux/mm_inline.h
+++ b/include/linux/mm_inline.h
@@ -27,7 +27,7 @@ add_page_to_lru_list(struct zone *zone, struct page *page, enum lru_list lru)
 	struct lruvec *lruvec;
 
 	lruvec = mem_cgroup_lru_add_list(zone, page, lru);
-	list_add(&page->lru, &lruvec->lists[lru]);
+	list_add(&page->lru, &lruvec->pages_lru[lru]);
 	__mod_zone_page_state(zone, NR_LRU_BASE + lru, hpage_nr_pages(page));
 }
 
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 3e1f7ff..ddd0fd2 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -160,7 +160,7 @@ static inline int is_unevictable_lru(enum lru_list lru)
 }
 
 struct lruvec {
-	struct list_head lists[NR_LRU_LISTS];
+	struct list_head pages_lru[NR_LRU_LISTS];
 };
 
 /* Mask used at gathering information at once (see memcontrol.c) */
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 77f5d48..8f8c7c4 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1050,7 +1050,7 @@ struct lruvec *mem_cgroup_zone_lruvec(struct zone *zone,
  * the lruvec for the given @zone and the memcg @page is charged to.
  *
  * The callsite is then responsible for physically linking the page to
- * the returned lruvec->lists[@lru].
+ * the returned lruvec->pages_lru[@lru].
  */
 struct lruvec *mem_cgroup_lru_add_list(struct zone *zone, struct page *page,
 				       enum lru_list lru)
@@ -3592,7 +3592,7 @@ static int mem_cgroup_force_empty_list(struct mem_cgroup *memcg,
 
 	zone = &NODE_DATA(node)->node_zones[zid];
 	mz = mem_cgroup_zoneinfo(memcg, node, zid);
-	list = &mz->lruvec.lists[lru];
+	list = &mz->lruvec.pages_lru[lru];
 
 	loop = mz->lru_size[lru];
 	/* give some margin against EBUSY etc...*/
@@ -4716,7 +4716,7 @@ static int alloc_mem_cgroup_per_zone_info(struct mem_cgroup *memcg, int node)
 	for (zone = 0; zone < MAX_NR_ZONES; zone++) {
 		mz = &pn->zoneinfo[zone];
 		for_each_lru(lru)
-			INIT_LIST_HEAD(&mz->lruvec.lists[lru]);
+			INIT_LIST_HEAD(&mz->lruvec.pages_lru[lru]);
 		mz->usage_in_excess = 0;
 		mz->on_tree = false;
 		mz->memcg = memcg;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 38f6744..5f19392 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4363,7 +4363,7 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
 
 		zone_pcp_init(zone);
 		for_each_lru(lru)
-			INIT_LIST_HEAD(&zone->lruvec.lists[lru]);
+			INIT_LIST_HEAD(&zone->lruvec.pages_lru[lru]);
 		zone->reclaim_stat.recent_rotated[0] = 0;
 		zone->reclaim_stat.recent_rotated[1] = 0;
 		zone->reclaim_stat.recent_scanned[0] = 0;
diff --git a/mm/swap.c b/mm/swap.c
index fff1ff7..17993c0 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -238,7 +238,7 @@ static void pagevec_move_tail_fn(struct page *page, void *arg)
 
 		lruvec = mem_cgroup_lru_move_lists(page_zone(page),
 						   page, lru, lru);
-		list_move_tail(&page->lru, &lruvec->lists[lru]);
+		list_move_tail(&page->lru, &lruvec->pages_lru[lru]);
 		(*pgmoved)++;
 	}
 }
@@ -482,7 +482,7 @@ static void lru_deactivate_fn(struct page *page, void *arg)
 		 * We moves tha page into tail of inactive.
 		 */
 		lruvec = mem_cgroup_lru_move_lists(zone, page, lru, lru);
-		list_move_tail(&page->lru, &lruvec->lists[lru]);
+		list_move_tail(&page->lru, &lruvec->pages_lru[lru]);
 		__count_vm_event(PGROTATED);
 	}
 
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 8b59cb5..e41ad52 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1164,7 +1164,7 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
 		lru += LRU_ACTIVE;
 	if (file)
 		lru += LRU_FILE;
-	src = &lruvec->lists[lru];
+	src = &lruvec->pages_lru[lru];
 
 	for (scan = 0; scan < nr_to_scan && !list_empty(src); scan++) {
 		struct page *page;
@@ -1663,7 +1663,7 @@ static void move_active_pages_to_lru(struct zone *zone,
 		SetPageLRU(page);
 
 		lruvec = mem_cgroup_lru_add_list(zone, page, lru);
-		list_move(&page->lru, &lruvec->lists[lru]);
+		list_move(&page->lru, &lruvec->pages_lru[lru]);
 		pgmoved += hpage_nr_pages(page);
 
 		if (put_page_testzero(page)) {
@@ -3592,7 +3592,7 @@ void check_move_unevictable_pages(struct page **pages, int nr_pages)
 			__dec_zone_state(zone, NR_UNEVICTABLE);
 			lruvec = mem_cgroup_lru_move_lists(zone, page,
 						LRU_UNEVICTABLE, lru);
-			list_move(&page->lru, &lruvec->lists[lru]);
+			list_move(&page->lru, &lruvec->pages_lru[lru]);
 			__inc_zone_state(zone, NR_INACTIVE_ANON + lru);
 			pgrescued++;
 		}



* [PATCH v3 06/21] mm: lruvec linking functions
  2012-02-23 13:51 [PATCH v3 00/21] mm: lru_lock splitting Konstantin Khlebnikov
                   ` (4 preceding siblings ...)
  2012-02-23 13:52 ` [PATCH v3 05/21] mm: rename lruvec->lists into lruvec->pages_lru Konstantin Khlebnikov
@ 2012-02-23 13:52 ` Konstantin Khlebnikov
  2012-02-28  0:27   ` KAMEZAWA Hiroyuki
  2012-02-23 13:52 ` [PATCH v3 07/21] mm: add lruvec->pages_count Konstantin Khlebnikov
                   ` (17 subsequent siblings)
  23 siblings, 1 reply; 65+ messages in thread
From: Konstantin Khlebnikov @ 2012-02-23 13:52 UTC (permalink / raw)
  To: Hugh Dickins, linux-kernel, linux-mm, Johannes Weiner,
	Andrew Morton, KAMEZAWA Hiroyuki
  Cc: Andi Kleen

This patch adds links from a page to its lruvec and from a lruvec to its zone and node.
If CONFIG_CGROUP_MEM_RES_CTLR=n these are just page_zone() and container_of().
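
A minimal usage sketch of the new helpers (not part of the patch; it assumes the
page is stable, e.g. isolated or protected by lru_lock, as the page_lruvec()
comment requires):

        struct lruvec *lruvec = page_lruvec(page);       /* page -> lruvec */
        struct zone *zone = lruvec_zone(lruvec);         /* lruvec -> zone */
        struct pglist_data *node = lruvec_node(lruvec);  /* lruvec -> node */

With CONFIG_CGROUP_MEM_RES_CTLR=n the three helpers compile down to
page_zone(page), container_of() and zone->zone_pgdat.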

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
---
 include/linux/mm.h     |   37 +++++++++++++++++++++++++++++++++++++
 include/linux/mmzone.h |   12 ++++++++----
 mm/internal.h          |    1 +
 mm/memcontrol.c        |   27 ++++++++++++++++++++++++---
 mm/page_alloc.c        |   17 ++++++++++++++---
 5 files changed, 84 insertions(+), 10 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index ee3ebc1..c6dc4ab 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -728,6 +728,43 @@ static inline void set_page_links(struct page *page, enum zone_type zone,
 #endif
 }
 
+#ifdef CONFIG_CGROUP_MEM_RES_CTLR
+
+/* Multiple lruvecs in zone */
+
+extern struct lruvec *page_lruvec(struct page *page);
+
+static inline struct zone *lruvec_zone(struct lruvec *lruvec)
+{
+	return lruvec->zone;
+}
+
+static inline struct pglist_data *lruvec_node(struct lruvec *lruvec)
+{
+	return lruvec->node;
+}
+
+#else /* CONFIG_CGROUP_MEM_RES_CTLR */
+
+/* Single lruvec in zone */
+
+static inline struct lruvec *page_lruvec(struct page *page)
+{
+	return &page_zone(page)->lruvec;
+}
+
+static inline struct zone *lruvec_zone(struct lruvec *lruvec)
+{
+	return container_of(lruvec, struct zone, lruvec);
+}
+
+static inline struct pglist_data *lruvec_node(struct lruvec *lruvec)
+{
+	return lruvec_zone(lruvec)->zone_pgdat;
+}
+
+#endif /* CONFIG_CGROUP_MEM_RES_CTLR */
+
 /*
  * Some inline functions in vmstat.h depend on page_zone()
  */
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index ddd0fd2..be8873a 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -159,10 +159,6 @@ static inline int is_unevictable_lru(enum lru_list lru)
 	return (lru == LRU_UNEVICTABLE);
 }
 
-struct lruvec {
-	struct list_head pages_lru[NR_LRU_LISTS];
-};
-
 /* Mask used at gathering information at once (see memcontrol.c) */
 #define LRU_ALL_FILE (BIT(LRU_INACTIVE_FILE) | BIT(LRU_ACTIVE_FILE))
 #define LRU_ALL_ANON (BIT(LRU_INACTIVE_ANON) | BIT(LRU_ACTIVE_ANON))
@@ -300,6 +296,14 @@ struct zone_reclaim_stat {
 	unsigned long		recent_scanned[2];
 };
 
+struct lruvec {
+	struct list_head	pages_lru[NR_LRU_LISTS];
+#ifdef CONFIG_CGROUP_MEM_RES_CTLR
+	struct zone		*zone;
+	struct pglist_data	*node;
+#endif
+};
+
 struct zone {
 	/* Fields commonly accessed by the page allocator */
 
diff --git a/mm/internal.h b/mm/internal.h
index 2189af4..ef49dbf 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -100,6 +100,7 @@ extern void prep_compound_page(struct page *page, unsigned long order);
 extern bool is_free_buddy_page(struct page *page);
 #endif
 
+extern void init_zone_lruvec(struct zone *zone, struct lruvec *lruvec);
 
 /*
  * function for dealing with page's order in buddy system.
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 8f8c7c4..8b53150 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1026,6 +1026,28 @@ struct lruvec *mem_cgroup_zone_lruvec(struct zone *zone,
 	return &mz->lruvec;
 }
 
+/**
+ * page_lruvec - get the lruvec there this page is located
+ * @page: the struct page pointer with stable reference
+ *
+ * page_cgroup->mem_cgroup pointer validity guaranteed by caller.
+ *
+ * Returns pointer to struct lruvec.
+ */
+struct lruvec *page_lruvec(struct page *page)
+{
+	struct mem_cgroup_per_zone *mz;
+	struct page_cgroup *pc;
+
+	if (mem_cgroup_disabled())
+		return &page_zone(page)->lruvec;
+
+	pc = lookup_page_cgroup(page);
+	mz = mem_cgroup_zoneinfo(pc->mem_cgroup,
+			page_to_nid(page), page_zonenum(page));
+	return &mz->lruvec;
+}
+
 /*
  * Following LRU functions are allowed to be used without PCG_LOCK.
  * Operations are called by routine of global LRU independently from memcg.
@@ -4697,7 +4719,6 @@ static int alloc_mem_cgroup_per_zone_info(struct mem_cgroup *memcg, int node)
 {
 	struct mem_cgroup_per_node *pn;
 	struct mem_cgroup_per_zone *mz;
-	enum lru_list lru;
 	int zone, tmp = node;
 	/*
 	 * This routine is called against possible nodes.
@@ -4715,8 +4736,8 @@ static int alloc_mem_cgroup_per_zone_info(struct mem_cgroup *memcg, int node)
 
 	for (zone = 0; zone < MAX_NR_ZONES; zone++) {
 		mz = &pn->zoneinfo[zone];
-		for_each_lru(lru)
-			INIT_LIST_HEAD(&mz->lruvec.pages_lru[lru]);
+		init_zone_lruvec(&NODE_DATA(node)->node_zones[zone],
+				 &mz->lruvec);
 		mz->usage_in_excess = 0;
 		mz->on_tree = false;
 		mz->memcg = memcg;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5f19392..1cc3afe 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4289,6 +4289,19 @@ static inline int pageblock_default_order(unsigned int order)
 
 #endif /* CONFIG_HUGETLB_PAGE_SIZE_VARIABLE */
 
+void init_zone_lruvec(struct zone *zone, struct lruvec *lruvec)
+{
+	enum lru_list lru;
+
+	memset(lruvec, 0, sizeof(struct lruvec));
+	for_each_lru(lru)
+		INIT_LIST_HEAD(&lruvec->pages_lru[lru]);
+#ifdef CONFIG_CGROUP_MEM_RES_CTLR
+	lruvec->node = zone->zone_pgdat;
+	lruvec->zone = zone;
+#endif
+}
+
 /*
  * Set up the zone data structures:
  *   - mark all pages reserved
@@ -4312,7 +4325,6 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
 	for (j = 0; j < MAX_NR_ZONES; j++) {
 		struct zone *zone = pgdat->node_zones + j;
 		unsigned long size, realsize, memmap_pages;
-		enum lru_list lru;
 
 		size = zone_spanned_pages_in_node(nid, j, zones_size);
 		realsize = size - zone_absent_pages_in_node(nid, j,
@@ -4362,8 +4374,7 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
 		zone->zone_pgdat = pgdat;
 
 		zone_pcp_init(zone);
-		for_each_lru(lru)
-			INIT_LIST_HEAD(&zone->lruvec.pages_lru[lru]);
+		init_zone_lruvec(zone, &zone->lruvec);
 		zone->reclaim_stat.recent_rotated[0] = 0;
 		zone->reclaim_stat.recent_rotated[1] = 0;
 		zone->reclaim_stat.recent_scanned[0] = 0;



* [PATCH v3 07/21] mm: add lruvec->pages_count
  2012-02-23 13:51 [PATCH v3 00/21] mm: lru_lock splitting Konstantin Khlebnikov
                   ` (5 preceding siblings ...)
  2012-02-23 13:52 ` [PATCH v3 06/21] mm: lruvec linking functions Konstantin Khlebnikov
@ 2012-02-23 13:52 ` Konstantin Khlebnikov
  2012-02-28  0:35   ` KAMEZAWA Hiroyuki
  2012-02-23 13:52 ` [PATCH v3 08/21] mm: unify inactive_list_is_low() Konstantin Khlebnikov
                   ` (16 subsequent siblings)
  23 siblings, 1 reply; 65+ messages in thread
From: Konstantin Khlebnikov @ 2012-02-23 13:52 UTC (permalink / raw)
  To: Hugh Dickins, linux-kernel, linux-mm, Johannes Weiner,
	Andrew Morton, KAMEZAWA Hiroyuki
  Cc: Andi Kleen

Move the lru pages counter from mem_cgroup_per_zone->lru_size[] to lruvec->pages_count[].

Account pages in all lruvecs, including the root one;
this isn't a huge overhead, but it greatly simplifies all the code.

Redundant page_lruvec() calls will be optimized in further patches.
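
For example, mem_cgroup_zone_nr_lru_pages() now simply sums lruvec.pages_count[]
over the requested lru mask, and mem_cgroup_force_empty_list() takes its loop count
from the same array; both were previously fed from the separate
mem_cgroup_per_zone->lru_size[] counters.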

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
---
 include/linux/memcontrol.h |   29 --------------
 include/linux/mm_inline.h  |   15 +++++--
 include/linux/mmzone.h     |    1 
 mm/memcontrol.c            |   93 +-------------------------------------------
 mm/swap.c                  |    7 +--
 mm/vmscan.c                |   25 +++++++++---
 6 files changed, 34 insertions(+), 136 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 4822d53..b9d555b 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -63,12 +63,6 @@ extern int mem_cgroup_cache_charge(struct page *page, struct mm_struct *mm,
 					gfp_t gfp_mask);
 
 struct lruvec *mem_cgroup_zone_lruvec(struct zone *, struct mem_cgroup *);
-struct lruvec *mem_cgroup_lru_add_list(struct zone *, struct page *,
-				       enum lru_list);
-void mem_cgroup_lru_del_list(struct page *, enum lru_list);
-void mem_cgroup_lru_del(struct page *);
-struct lruvec *mem_cgroup_lru_move_lists(struct zone *, struct page *,
-					 enum lru_list, enum lru_list);
 
 /* For coalescing uncharge for reducing memcg' overhead*/
 extern void mem_cgroup_uncharge_start(void);
@@ -212,29 +206,6 @@ static inline struct lruvec *mem_cgroup_zone_lruvec(struct zone *zone,
 	return &zone->lruvec;
 }
 
-static inline struct lruvec *mem_cgroup_lru_add_list(struct zone *zone,
-						     struct page *page,
-						     enum lru_list lru)
-{
-	return &zone->lruvec;
-}
-
-static inline void mem_cgroup_lru_del_list(struct page *page, enum lru_list lru)
-{
-}
-
-static inline void mem_cgroup_lru_del(struct page *page)
-{
-}
-
-static inline struct lruvec *mem_cgroup_lru_move_lists(struct zone *zone,
-						       struct page *page,
-						       enum lru_list from,
-						       enum lru_list to)
-{
-	return &zone->lruvec;
-}
-
 static inline struct mem_cgroup *try_get_mem_cgroup_from_page(struct page *page)
 {
 	return NULL;
diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
index 8415596..daa3d15 100644
--- a/include/linux/mm_inline.h
+++ b/include/linux/mm_inline.h
@@ -24,19 +24,24 @@ static inline int page_is_file_cache(struct page *page)
 static inline void
 add_page_to_lru_list(struct zone *zone, struct page *page, enum lru_list lru)
 {
-	struct lruvec *lruvec;
+	struct lruvec *lruvec = page_lruvec(page);
+	int numpages = hpage_nr_pages(page);
 
-	lruvec = mem_cgroup_lru_add_list(zone, page, lru);
 	list_add(&page->lru, &lruvec->pages_lru[lru]);
-	__mod_zone_page_state(zone, NR_LRU_BASE + lru, hpage_nr_pages(page));
+	lruvec->pages_count[lru] += numpages;
+	__mod_zone_page_state(zone, NR_LRU_BASE + lru, numpages);
 }
 
 static inline void
 del_page_from_lru_list(struct zone *zone, struct page *page, enum lru_list lru)
 {
-	mem_cgroup_lru_del_list(page, lru);
+	struct lruvec *lruvec = page_lruvec(page);
+	int numpages = hpage_nr_pages(page);
+
 	list_del(&page->lru);
-	__mod_zone_page_state(zone, NR_LRU_BASE + lru, -hpage_nr_pages(page));
+	lruvec->pages_count[lru] -= numpages;
+	VM_BUG_ON((long)lruvec->pages_count[lru] < 0);
+	__mod_zone_page_state(zone, NR_LRU_BASE + lru, -numpages);
 }
 
 /**
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index be8873a..69b0f31 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -298,6 +298,7 @@ struct zone_reclaim_stat {
 
 struct lruvec {
 	struct list_head	pages_lru[NR_LRU_LISTS];
+	unsigned long		pages_count[NR_LRU_LISTS];
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR
 	struct zone		*zone;
 	struct pglist_data	*node;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 8b53150..80ce60c 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -135,7 +135,6 @@ struct mem_cgroup_reclaim_iter {
  */
 struct mem_cgroup_per_zone {
 	struct lruvec		lruvec;
-	unsigned long		lru_size[NR_LRU_LISTS];
 
 	struct mem_cgroup_reclaim_iter reclaim_iter[DEF_PRIORITY + 1];
 
@@ -710,7 +709,7 @@ mem_cgroup_zone_nr_lru_pages(struct mem_cgroup *memcg, int nid, int zid,
 
 	for_each_lru(lru) {
 		if (BIT(lru) & lru_mask)
-			ret += mz->lru_size[lru];
+			ret += mz->lruvec.pages_count[lru];
 	}
 	return ret;
 }
@@ -1062,93 +1061,6 @@ struct lruvec *page_lruvec(struct page *page)
  * When moving account, the page is not on LRU. It's isolated.
  */
 
-/**
- * mem_cgroup_lru_add_list - account for adding an lru page and return lruvec
- * @zone: zone of the page
- * @page: the page
- * @lru: current lru
- *
- * This function accounts for @page being added to @lru, and returns
- * the lruvec for the given @zone and the memcg @page is charged to.
- *
- * The callsite is then responsible for physically linking the page to
- * the returned lruvec->pages_lru[@lru].
- */
-struct lruvec *mem_cgroup_lru_add_list(struct zone *zone, struct page *page,
-				       enum lru_list lru)
-{
-	struct mem_cgroup_per_zone *mz;
-	struct mem_cgroup *memcg;
-	struct page_cgroup *pc;
-
-	if (mem_cgroup_disabled())
-		return &zone->lruvec;
-
-	pc = lookup_page_cgroup(page);
-	memcg = pc->mem_cgroup;
-	mz = page_cgroup_zoneinfo(memcg, page);
-	/* compound_order() is stabilized through lru_lock */
-	mz->lru_size[lru] += 1 << compound_order(page);
-	return &mz->lruvec;
-}
-
-/**
- * mem_cgroup_lru_del_list - account for removing an lru page
- * @page: the page
- * @lru: target lru
- *
- * This function accounts for @page being removed from @lru.
- *
- * The callsite is then responsible for physically unlinking
- * @page->lru.
- */
-void mem_cgroup_lru_del_list(struct page *page, enum lru_list lru)
-{
-	struct mem_cgroup_per_zone *mz;
-	struct mem_cgroup *memcg;
-	struct page_cgroup *pc;
-
-	if (mem_cgroup_disabled())
-		return;
-
-	pc = lookup_page_cgroup(page);
-	memcg = pc->mem_cgroup;
-	VM_BUG_ON(!memcg);
-	mz = page_cgroup_zoneinfo(memcg, page);
-	/* huge page split is done under lru_lock. so, we have no races. */
-	VM_BUG_ON(mz->lru_size[lru] < (1 << compound_order(page)));
-	mz->lru_size[lru] -= 1 << compound_order(page);
-}
-
-void mem_cgroup_lru_del(struct page *page)
-{
-	mem_cgroup_lru_del_list(page, page_lru(page));
-}
-
-/**
- * mem_cgroup_lru_move_lists - account for moving a page between lrus
- * @zone: zone of the page
- * @page: the page
- * @from: current lru
- * @to: target lru
- *
- * This function accounts for @page being moved between the lrus @from
- * and @to, and returns the lruvec for the given @zone and the memcg
- * @page is charged to.
- *
- * The callsite is then responsible for physically relinking
- * @page->lru to the returned lruvec->lists[@to].
- */
-struct lruvec *mem_cgroup_lru_move_lists(struct zone *zone,
-					 struct page *page,
-					 enum lru_list from,
-					 enum lru_list to)
-{
-	/* XXX: Optimize this, especially for @from == @to */
-	mem_cgroup_lru_del_list(page, from);
-	return mem_cgroup_lru_add_list(zone, page, to);
-}
-
 /*
  * Checks whether given mem is same or in the root_mem_cgroup's
  * hierarchy subtree
@@ -3615,8 +3527,7 @@ static int mem_cgroup_force_empty_list(struct mem_cgroup *memcg,
 	zone = &NODE_DATA(node)->node_zones[zid];
 	mz = mem_cgroup_zoneinfo(memcg, node, zid);
 	list = &mz->lruvec.pages_lru[lru];
-
-	loop = mz->lru_size[lru];
+	loop = mz->lruvec.pages_count[lru];
 	/* give some margin against EBUSY etc...*/
 	loop += 256;
 	busy = NULL;
diff --git a/mm/swap.c b/mm/swap.c
index 17993c0..2afe02c 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -234,10 +234,8 @@ static void pagevec_move_tail_fn(struct page *page, void *arg)
 
 	if (PageLRU(page) && !PageActive(page) && !PageUnevictable(page)) {
 		enum lru_list lru = page_lru_base_type(page);
-		struct lruvec *lruvec;
+		struct lruvec *lruvec = page_lruvec(page);
 
-		lruvec = mem_cgroup_lru_move_lists(page_zone(page),
-						   page, lru, lru);
 		list_move_tail(&page->lru, &lruvec->pages_lru[lru]);
 		(*pgmoved)++;
 	}
@@ -476,12 +474,11 @@ static void lru_deactivate_fn(struct page *page, void *arg)
 		 */
 		SetPageReclaim(page);
 	} else {
-		struct lruvec *lruvec;
+		struct lruvec *lruvec = page_lruvec(page);
 		/*
 		 * The page's writeback ends up during pagevec
 		 * We moves tha page into tail of inactive.
 		 */
-		lruvec = mem_cgroup_lru_move_lists(zone, page, lru, lru);
 		list_move_tail(&page->lru, &lruvec->pages_lru[lru]);
 		__count_vm_event(PGROTATED);
 	}
diff --git a/mm/vmscan.c b/mm/vmscan.c
index e41ad52..f3c0fbe 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1180,7 +1180,6 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
 
 		switch (__isolate_lru_page(page, mode, file)) {
 		case 0:
-			mem_cgroup_lru_del(page);
 			list_move(&page->lru, dst);
 			nr_taken += hpage_nr_pages(page);
 			break;
@@ -1238,10 +1237,16 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
 
 			if (__isolate_lru_page(cursor_page, mode, file) == 0) {
 				unsigned int isolated_pages;
+				struct lruvec *cursor_lruvec;
+				int cursor_lru = page_lru(cursor_page);
 
-				mem_cgroup_lru_del(cursor_page);
 				list_move(&cursor_page->lru, dst);
 				isolated_pages = hpage_nr_pages(cursor_page);
+				cursor_lruvec = page_lruvec(cursor_page);
+				cursor_lruvec->pages_count[cursor_lru] -=
+								isolated_pages;
+				VM_BUG_ON((long)cursor_lruvec->
+						pages_count[cursor_lru] < 0);
 				nr_taken += isolated_pages;
 				nr_lumpy_taken += isolated_pages;
 				if (PageDirty(cursor_page))
@@ -1273,6 +1278,9 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
 			nr_lumpy_failed++;
 	}
 
+	lruvec->pages_count[lru] -= nr_taken - nr_lumpy_taken;
+	VM_BUG_ON((long)lruvec->pages_count[lru] < 0);
+
 	*nr_scanned = scan;
 
 	trace_mm_vmscan_lru_isolate(sc->order,
@@ -1656,15 +1664,18 @@ static void move_active_pages_to_lru(struct zone *zone,
 
 	while (!list_empty(list)) {
 		struct lruvec *lruvec;
+		int numpages;
 
 		page = lru_to_page(list);
 
 		VM_BUG_ON(PageLRU(page));
 		SetPageLRU(page);
 
-		lruvec = mem_cgroup_lru_add_list(zone, page, lru);
+		lruvec = page_lruvec(page);
 		list_move(&page->lru, &lruvec->pages_lru[lru]);
-		pgmoved += hpage_nr_pages(page);
+		numpages = hpage_nr_pages(page);
+		lruvec->pages_count[lru] += numpages;
+		pgmoved += numpages;
 
 		if (put_page_testzero(page)) {
 			__ClearPageLRU(page);
@@ -3590,8 +3601,10 @@ void check_move_unevictable_pages(struct page **pages, int nr_pages)
 			VM_BUG_ON(PageActive(page));
 			ClearPageUnevictable(page);
 			__dec_zone_state(zone, NR_UNEVICTABLE);
-			lruvec = mem_cgroup_lru_move_lists(zone, page,
-						LRU_UNEVICTABLE, lru);
+			lruvec = page_lruvec(page);
+			lruvec->pages_count[LRU_UNEVICTABLE]--;
+			VM_BUG_ON((long)lruvec->pages_count[LRU_UNEVICTABLE] < 0);
+			lruvec->pages_count[lru]++;
 			list_move(&page->lru, &lruvec->pages_lru[lru]);
 			__inc_zone_state(zone, NR_INACTIVE_ANON + lru);
 			pgrescued++;



* [PATCH v3 08/21] mm: unify inactive_list_is_low()
  2012-02-23 13:51 [PATCH v3 00/21] mm: lru_lock splitting Konstantin Khlebnikov
                   ` (6 preceding siblings ...)
  2012-02-23 13:52 ` [PATCH v3 07/21] mm: add lruvec->pages_count Konstantin Khlebnikov
@ 2012-02-23 13:52 ` Konstantin Khlebnikov
  2012-02-28  0:36   ` KAMEZAWA Hiroyuki
  2012-02-23 13:52 ` [PATCH v3 09/21] mm: add lruvec->reclaim_stat Konstantin Khlebnikov
                   ` (15 subsequent siblings)
  23 siblings, 1 reply; 65+ messages in thread
From: Konstantin Khlebnikov @ 2012-02-23 13:52 UTC (permalink / raw)
  To: Hugh Dickins, linux-kernel, linux-mm, Johannes Weiner,
	Andrew Morton, KAMEZAWA Hiroyuki
  Cc: Andi Kleen

Unify the memcg and non-memcg logic and always use the exact counters from struct lruvec.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
---
 mm/vmscan.c |   30 ++++++++----------------------
 1 files changed, 8 insertions(+), 22 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index f3c0fbe..b3e8bab 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1822,6 +1822,7 @@ static int inactive_anon_is_low(struct mem_cgroup_zone *mz)
 {
 	unsigned long active, inactive;
 	unsigned int gb, ratio;
+	struct lruvec *lruvec;
 
 	/*
 	 * If we don't have swap space, anonymous page deactivation
@@ -1830,17 +1831,9 @@ static int inactive_anon_is_low(struct mem_cgroup_zone *mz)
 	if (!total_swap_pages)
 		return 0;
 
-	if (scanning_global_lru(mz)) {
-		active = zone_page_state(mz->zone, NR_ACTIVE_ANON);
-		inactive = zone_page_state(mz->zone, NR_INACTIVE_ANON);
-	} else {
-		active = mem_cgroup_zone_nr_lru_pages(mz->mem_cgroup,
-				zone_to_nid(mz->zone), zone_idx(mz->zone),
-				BIT(LRU_ACTIVE_ANON));
-		inactive = mem_cgroup_zone_nr_lru_pages(mz->mem_cgroup,
-				zone_to_nid(mz->zone), zone_idx(mz->zone),
-				BIT(LRU_INACTIVE_ANON));
-	}
+	lruvec = mem_cgroup_zone_lruvec(mz->zone, mz->mem_cgroup);
+	active = lruvec->pages_count[LRU_ACTIVE_ANON];
+	inactive = lruvec->pages_count[LRU_INACTIVE_ANON];
 
 	/* Total size in gigabytes */
 	gb = (active + inactive) >> (30 - PAGE_SHIFT);
@@ -1875,18 +1868,11 @@ static inline int inactive_anon_is_low(struct mem_cgroup_zone *mz)
 static int inactive_file_is_low(struct mem_cgroup_zone *mz)
 {
 	unsigned long active, inactive;
+	struct lruvec *lruvec;
 
-	if (scanning_global_lru(mz)) {
-		active = zone_page_state(mz->zone, NR_ACTIVE_FILE);
-		inactive = zone_page_state(mz->zone, NR_INACTIVE_FILE);
-	} else {
-		active = mem_cgroup_zone_nr_lru_pages(mz->mem_cgroup,
-				zone_to_nid(mz->zone), zone_idx(mz->zone),
-				BIT(LRU_ACTIVE_FILE));
-		inactive = mem_cgroup_zone_nr_lru_pages(mz->mem_cgroup,
-				zone_to_nid(mz->zone), zone_idx(mz->zone),
-				BIT(LRU_INACTIVE_FILE));
-	}
+	lruvec = mem_cgroup_zone_lruvec(mz->zone, mz->mem_cgroup);
+	active = lruvec->pages_count[LRU_ACTIVE_FILE];
+	inactive = lruvec->pages_count[LRU_INACTIVE_FILE];
 
 	return inactive < active;
 }



* [PATCH v3 09/21] mm: add lruvec->reclaim_stat
  2012-02-23 13:51 [PATCH v3 00/21] mm: lru_lock splitting Konstantin Khlebnikov
                   ` (7 preceding siblings ...)
  2012-02-23 13:52 ` [PATCH v3 08/21] mm: unify inactive_list_is_low() Konstantin Khlebnikov
@ 2012-02-23 13:52 ` Konstantin Khlebnikov
  2012-02-28  0:38   ` KAMEZAWA Hiroyuki
  2012-02-23 13:52 ` [PATCH v3 10/21] mm: kill struct mem_cgroup_zone Konstantin Khlebnikov
                   ` (14 subsequent siblings)
  23 siblings, 1 reply; 65+ messages in thread
From: Konstantin Khlebnikov @ 2012-02-23 13:52 UTC (permalink / raw)
  To: Hugh Dickins, linux-kernel, linux-mm, Johannes Weiner,
	Andrew Morton, KAMEZAWA Hiroyuki
  Cc: Andi Kleen

Merge the memcg and non-memcg reclaim stats, so we need to update only one.
Move zone->reclaim_stat and mem_cgroup_per_zone->reclaim_stat to struct lruvec.

struct lruvec will become the operating unit for the reclaimer logic,
thus this is the perfect place for these counters.
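
With the stats attached to the lruvec, both paths update a single structure:
update_page_reclaim_stat() can take &page_lruvec(page)->reclaim_stat directly, and
get_reclaim_stat() just returns the reclaim_stat of the zone's or memcg's lruvec,
as the hunks below show.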

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
---
 include/linux/memcontrol.h |   17 --------------
 include/linux/mmzone.h     |    5 +++-
 mm/memcontrol.c            |   52 +++++---------------------------------------
 mm/page_alloc.c            |    4 ---
 mm/swap.c                  |   12 ++--------
 mm/vmscan.c                |    5 +++-
 6 files changed, 14 insertions(+), 81 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index b9d555b..c3e46b0 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -103,10 +103,6 @@ void mem_cgroup_iter_break(struct mem_cgroup *, struct mem_cgroup *);
 int mem_cgroup_select_victim_node(struct mem_cgroup *memcg);
 unsigned long mem_cgroup_zone_nr_lru_pages(struct mem_cgroup *memcg,
 					int nid, int zid, unsigned int lrumask);
-struct zone_reclaim_stat *mem_cgroup_get_reclaim_stat(struct mem_cgroup *memcg,
-						      struct zone *zone);
-struct zone_reclaim_stat*
-mem_cgroup_get_reclaim_stat_from_page(struct page *page);
 extern void mem_cgroup_print_oom_info(struct mem_cgroup *memcg,
 					struct task_struct *p);
 extern void mem_cgroup_replace_page_cache(struct page *oldpage,
@@ -286,19 +282,6 @@ mem_cgroup_zone_nr_lru_pages(struct mem_cgroup *memcg, int nid, int zid,
 	return 0;
 }
 
-
-static inline struct zone_reclaim_stat*
-mem_cgroup_get_reclaim_stat(struct mem_cgroup *memcg, struct zone *zone)
-{
-	return NULL;
-}
-
-static inline struct zone_reclaim_stat*
-mem_cgroup_get_reclaim_stat_from_page(struct page *page)
-{
-	return NULL;
-}
-
 static inline void
 mem_cgroup_print_oom_info(struct mem_cgroup *memcg, struct task_struct *p)
 {
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 69b0f31..82d5ff3 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -299,6 +299,9 @@ struct zone_reclaim_stat {
 struct lruvec {
 	struct list_head	pages_lru[NR_LRU_LISTS];
 	unsigned long		pages_count[NR_LRU_LISTS];
+
+	struct zone_reclaim_stat	reclaim_stat;
+
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR
 	struct zone		*zone;
 	struct pglist_data	*node;
@@ -379,8 +382,6 @@ struct zone {
 	spinlock_t		lru_lock;
 	struct lruvec		lruvec;
 
-	struct zone_reclaim_stat reclaim_stat;
-
 	unsigned long		pages_scanned;	   /* since last reclaim */
 	unsigned long		flags;		   /* zone flags, see below */
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 80ce60c..bef57db 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -138,7 +138,6 @@ struct mem_cgroup_per_zone {
 
 	struct mem_cgroup_reclaim_iter reclaim_iter[DEF_PRIORITY + 1];
 
-	struct zone_reclaim_stat reclaim_stat;
 	struct rb_node		tree_node;	/* RB tree node */
 	unsigned long long	usage_in_excess;/* Set to the value by which */
 						/* the soft limit is exceeded*/
@@ -441,15 +440,6 @@ struct cgroup_subsys_state *mem_cgroup_css(struct mem_cgroup *memcg)
 	return &memcg->css;
 }
 
-static struct mem_cgroup_per_zone *
-page_cgroup_zoneinfo(struct mem_cgroup *memcg, struct page *page)
-{
-	int nid = page_to_nid(page);
-	int zid = page_zonenum(page);
-
-	return mem_cgroup_zoneinfo(memcg, nid, zid);
-}
-
 static struct mem_cgroup_tree_per_zone *
 soft_limit_tree_node_zone(int nid, int zid)
 {
@@ -1111,34 +1101,6 @@ int task_in_mem_cgroup(struct task_struct *task, const struct mem_cgroup *memcg)
 	return ret;
 }
 
-struct zone_reclaim_stat *mem_cgroup_get_reclaim_stat(struct mem_cgroup *memcg,
-						      struct zone *zone)
-{
-	int nid = zone_to_nid(zone);
-	int zid = zone_idx(zone);
-	struct mem_cgroup_per_zone *mz = mem_cgroup_zoneinfo(memcg, nid, zid);
-
-	return &mz->reclaim_stat;
-}
-
-struct zone_reclaim_stat *
-mem_cgroup_get_reclaim_stat_from_page(struct page *page)
-{
-	struct page_cgroup *pc;
-	struct mem_cgroup_per_zone *mz;
-
-	if (mem_cgroup_disabled())
-		return NULL;
-
-	pc = lookup_page_cgroup(page);
-	if (!PageCgroupUsed(pc))
-		return NULL;
-	/* Ensure pc->mem_cgroup is visible after reading PCG_USED. */
-	smp_rmb();
-	mz = page_cgroup_zoneinfo(pc->mem_cgroup, page);
-	return &mz->reclaim_stat;
-}
-
 #define mem_cgroup_from_res_counter(counter, member)	\
 	container_of(counter, struct mem_cgroup, member)
 
@@ -4073,21 +4035,19 @@ static int mem_control_stat_show(struct cgroup *cont, struct cftype *cft,
 	{
 		int nid, zid;
 		struct mem_cgroup_per_zone *mz;
+		struct zone_reclaim_stat *rs;
 		unsigned long recent_rotated[2] = {0, 0};
 		unsigned long recent_scanned[2] = {0, 0};
 
 		for_each_online_node(nid)
 			for (zid = 0; zid < MAX_NR_ZONES; zid++) {
 				mz = mem_cgroup_zoneinfo(memcg, nid, zid);
+				rs = &mz->lruvec.reclaim_stat;
 
-				recent_rotated[0] +=
-					mz->reclaim_stat.recent_rotated[0];
-				recent_rotated[1] +=
-					mz->reclaim_stat.recent_rotated[1];
-				recent_scanned[0] +=
-					mz->reclaim_stat.recent_scanned[0];
-				recent_scanned[1] +=
-					mz->reclaim_stat.recent_scanned[1];
+				recent_rotated[0] += rs->recent_rotated[0];
+				recent_rotated[1] += rs->recent_rotated[1];
+				recent_scanned[0] += rs->recent_scanned[0];
+				recent_scanned[1] += rs->recent_scanned[1];
 			}
 		cb->fill(cb, "recent_rotated_anon", recent_rotated[0]);
 		cb->fill(cb, "recent_rotated_file", recent_rotated[1]);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1cc3afe..ab42446 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4375,10 +4375,6 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
 
 		zone_pcp_init(zone);
 		init_zone_lruvec(zone, &zone->lruvec);
-		zone->reclaim_stat.recent_rotated[0] = 0;
-		zone->reclaim_stat.recent_rotated[1] = 0;
-		zone->reclaim_stat.recent_scanned[0] = 0;
-		zone->reclaim_stat.recent_scanned[1] = 0;
 		zap_zone_vm_stats(zone);
 		zone->flags = 0;
 		if (!size)
diff --git a/mm/swap.c b/mm/swap.c
index 2afe02c..70d1542 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -277,21 +277,13 @@ void rotate_reclaimable_page(struct page *page)
 static void update_page_reclaim_stat(struct zone *zone, struct page *page,
 				     int file, int rotated)
 {
-	struct zone_reclaim_stat *reclaim_stat = &zone->reclaim_stat;
-	struct zone_reclaim_stat *memcg_reclaim_stat;
+	struct zone_reclaim_stat *reclaim_stat;
 
-	memcg_reclaim_stat = mem_cgroup_get_reclaim_stat_from_page(page);
+	reclaim_stat = &page_lruvec(page)->reclaim_stat;
 
 	reclaim_stat->recent_scanned[file]++;
 	if (rotated)
 		reclaim_stat->recent_rotated[file]++;
-
-	if (!memcg_reclaim_stat)
-		return;
-
-	memcg_reclaim_stat->recent_scanned[file]++;
-	if (rotated)
-		memcg_reclaim_stat->recent_rotated[file]++;
 }
 
 static void __activate_page(struct page *page, void *arg)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index b3e8bab..98bd61f 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -184,9 +184,10 @@ static bool scanning_global_lru(struct mem_cgroup_zone *mz)
 static struct zone_reclaim_stat *get_reclaim_stat(struct mem_cgroup_zone *mz)
 {
 	if (!scanning_global_lru(mz))
-		return mem_cgroup_get_reclaim_stat(mz->mem_cgroup, mz->zone);
+		return &mem_cgroup_zone_lruvec(mz->zone,
+				mz->mem_cgroup)->reclaim_stat;
 
-	return &mz->zone->reclaim_stat;
+	return &mz->zone->lruvec.reclaim_stat;
 }
 
 static unsigned long zone_nr_lru_pages(struct mem_cgroup_zone *mz,



* [PATCH v3 10/21] mm: kill struct mem_cgroup_zone
  2012-02-23 13:51 [PATCH v3 00/21] mm: lru_lock splitting Konstantin Khlebnikov
                   ` (8 preceding siblings ...)
  2012-02-23 13:52 ` [PATCH v3 09/21] mm: add lruvec->reclaim_stat Konstantin Khlebnikov
@ 2012-02-23 13:52 ` Konstantin Khlebnikov
  2012-02-28  0:41   ` KAMEZAWA Hiroyuki
  2012-02-23 13:52 ` [PATCH v3 11/21] mm: move page-to-lruvec translation upper Konstantin Khlebnikov
                   ` (13 subsequent siblings)
  23 siblings, 1 reply; 65+ messages in thread
From: Konstantin Khlebnikov @ 2012-02-23 13:52 UTC (permalink / raw)
  To: Hugh Dickins, linux-kernel, linux-mm, Johannes Weiner,
	Andrew Morton, KAMEZAWA Hiroyuki
  Cc: Andi Kleen

struct mem_cgroup_zone always describes exactly one lruvec: either the root
zone->lruvec or one belonging to a memcg. So this indirect pair can be replaced
with a direct pointer to struct lruvec, because all the required information is
already collected on the lruvec.
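
For reference, the conversion at the call sites boils down to looking the
lruvec up once and passing it down the reclaim chain. A condensed sketch of
the shrink_zone() loop from the hunk below (illustration only, not an extra
hunk; the limit-reclaim break is omitted for brevity):

	struct mem_cgroup *memcg;
	struct lruvec *lruvec;

	memcg = mem_cgroup_iter(root, NULL, &reclaim);
	do {
		/* one lookup replaces the old {memcg, zone} pair */
		lruvec = mem_cgroup_zone_lruvec(zone, memcg);
		shrink_lruvec(priority, lruvec, sc);
		memcg = mem_cgroup_iter(root, memcg, &reclaim);
	} while (memcg);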

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
---
 mm/vmscan.c |  186 ++++++++++++++++++++++-------------------------------------
 1 files changed, 70 insertions(+), 116 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 98bd61f..f5e7046 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -115,11 +115,6 @@ struct scan_control {
 	nodemask_t	*nodemask;
 };
 
-struct mem_cgroup_zone {
-	struct mem_cgroup *mem_cgroup;
-	struct zone *zone;
-};
-
 #define lru_to_page(_head) (list_entry((_head)->prev, struct page, lru))
 
 #ifdef ARCH_HAS_PREFETCH
@@ -164,45 +159,13 @@ static bool global_reclaim(struct scan_control *sc)
 {
 	return !sc->target_mem_cgroup;
 }
-
-static bool scanning_global_lru(struct mem_cgroup_zone *mz)
-{
-	return !mz->mem_cgroup;
-}
 #else
 static bool global_reclaim(struct scan_control *sc)
 {
 	return true;
 }
-
-static bool scanning_global_lru(struct mem_cgroup_zone *mz)
-{
-	return true;
-}
 #endif
 
-static struct zone_reclaim_stat *get_reclaim_stat(struct mem_cgroup_zone *mz)
-{
-	if (!scanning_global_lru(mz))
-		return &mem_cgroup_zone_lruvec(mz->zone,
-				mz->mem_cgroup)->reclaim_stat;
-
-	return &mz->zone->lruvec.reclaim_stat;
-}
-
-static unsigned long zone_nr_lru_pages(struct mem_cgroup_zone *mz,
-				       enum lru_list lru)
-{
-	if (!scanning_global_lru(mz))
-		return mem_cgroup_zone_nr_lru_pages(mz->mem_cgroup,
-						    zone_to_nid(mz->zone),
-						    zone_idx(mz->zone),
-						    BIT(lru));
-
-	return zone_page_state(mz->zone, NR_LRU_BASE + lru);
-}
-
-
 /*
  * Add a shrinker callback to be called from the vm
  */
@@ -764,7 +727,7 @@ static enum page_references page_check_references(struct page *page,
  * shrink_page_list() returns the number of reclaimed pages
  */
 static unsigned long shrink_page_list(struct list_head *page_list,
-				      struct mem_cgroup_zone *mz,
+				      struct lruvec *lruvec,
 				      struct scan_control *sc,
 				      int priority,
 				      unsigned long *ret_nr_dirty,
@@ -795,7 +758,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 			goto keep;
 
 		VM_BUG_ON(PageActive(page));
-		VM_BUG_ON(page_zone(page) != mz->zone);
+		VM_BUG_ON(page_zone(page) != lruvec_zone(lruvec));
 
 		sc->nr_scanned++;
 
@@ -1021,7 +984,7 @@ keep_lumpy:
 	 * will encounter the same problem
 	 */
 	if (nr_dirty && nr_dirty == nr_congested && global_reclaim(sc))
-		zone_set_flag(mz->zone, ZONE_CONGESTED);
+		zone_set_flag(lruvec_zone(lruvec), ZONE_CONGESTED);
 
 	free_hot_cold_page_list(&free_pages, 1);
 
@@ -1136,7 +1099,7 @@ int __isolate_lru_page(struct page *page, isolate_mode_t mode, int file)
  * Appropriate locks must be held before calling this function.
  *
  * @nr_to_scan:	The number of pages to look through on the list.
- * @mz:		The mem_cgroup_zone to pull pages from.
+ * @lruvec:	The struct lruvec to pull pages from.
  * @dst:	The temp list to put pages on to.
  * @nr_scanned:	The number of pages that were scanned.
  * @sc:		The scan_control struct for this reclaim session
@@ -1147,11 +1110,10 @@ int __isolate_lru_page(struct page *page, isolate_mode_t mode, int file)
  * returns how many pages were moved onto *@dst.
  */
 static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
-		struct mem_cgroup_zone *mz, struct list_head *dst,
+		struct lruvec *lruvec, struct list_head *dst,
 		unsigned long *nr_scanned, struct scan_control *sc,
 		isolate_mode_t mode, int active, int file)
 {
-	struct lruvec *lruvec;
 	struct list_head *src;
 	unsigned long nr_taken = 0;
 	unsigned long nr_lumpy_taken = 0;
@@ -1160,7 +1122,6 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
 	unsigned long scan;
 	int lru = LRU_BASE;
 
-	lruvec = mem_cgroup_zone_lruvec(mz->zone, mz->mem_cgroup);
 	if (active)
 		lru += LRU_ACTIVE;
 	if (file)
@@ -1366,11 +1327,11 @@ static int too_many_isolated(struct zone *zone, int file,
 }
 
 static noinline_for_stack void
-putback_inactive_pages(struct mem_cgroup_zone *mz,
+putback_inactive_pages(struct lruvec *lruvec,
 		       struct list_head *page_list)
 {
-	struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(mz);
-	struct zone *zone = mz->zone;
+	struct zone_reclaim_stat *reclaim_stat = &lruvec->reclaim_stat;
+	struct zone *zone = lruvec_zone(lruvec);
 	LIST_HEAD(pages_to_free);
 
 	/*
@@ -1417,13 +1378,13 @@ putback_inactive_pages(struct mem_cgroup_zone *mz,
 }
 
 static noinline_for_stack void
-update_isolated_counts(struct mem_cgroup_zone *mz,
+update_isolated_counts(struct lruvec *lruvec,
 		       struct list_head *page_list,
 		       unsigned long *nr_anon,
 		       unsigned long *nr_file)
 {
-	struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(mz);
-	struct zone *zone = mz->zone;
+	struct zone_reclaim_stat *reclaim_stat = &lruvec->reclaim_stat;
+	struct zone *zone = lruvec_zone(lruvec);
 	unsigned int count[NR_LRU_LISTS] = { 0, };
 	unsigned long nr_active = 0;
 	struct page *page;
@@ -1507,7 +1468,7 @@ static inline bool should_reclaim_stall(unsigned long nr_taken,
  * of reclaimed pages
  */
 static noinline_for_stack unsigned long
-shrink_inactive_list(unsigned long nr_to_scan, struct mem_cgroup_zone *mz,
+shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
 		     struct scan_control *sc, int priority, int file)
 {
 	LIST_HEAD(page_list);
@@ -1519,7 +1480,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct mem_cgroup_zone *mz,
 	unsigned long nr_dirty = 0;
 	unsigned long nr_writeback = 0;
 	isolate_mode_t isolate_mode = ISOLATE_INACTIVE;
-	struct zone *zone = mz->zone;
+	struct zone *zone = lruvec_zone(lruvec);
 
 	while (unlikely(too_many_isolated(zone, file, sc))) {
 		congestion_wait(BLK_RW_ASYNC, HZ/10);
@@ -1542,8 +1503,9 @@ shrink_inactive_list(unsigned long nr_to_scan, struct mem_cgroup_zone *mz,
 
 	spin_lock_irq(&zone->lru_lock);
 
-	nr_taken = isolate_lru_pages(nr_to_scan, mz, &page_list, &nr_scanned,
-				     sc, isolate_mode, 0, file);
+	nr_taken = isolate_lru_pages(nr_to_scan, lruvec, &page_list,
+				     &nr_scanned, sc, isolate_mode, 0, file);
+
 	if (global_reclaim(sc)) {
 		zone->pages_scanned += nr_scanned;
 		if (current_is_kswapd())
@@ -1559,20 +1521,20 @@ shrink_inactive_list(unsigned long nr_to_scan, struct mem_cgroup_zone *mz,
 		return 0;
 	}
 
-	update_isolated_counts(mz, &page_list, &nr_anon, &nr_file);
+	update_isolated_counts(lruvec, &page_list, &nr_anon, &nr_file);
 
 	__mod_zone_page_state(zone, NR_ISOLATED_ANON, nr_anon);
 	__mod_zone_page_state(zone, NR_ISOLATED_FILE, nr_file);
 
 	spin_unlock_irq(&zone->lru_lock);
 
-	nr_reclaimed = shrink_page_list(&page_list, mz, sc, priority,
+	nr_reclaimed = shrink_page_list(&page_list, lruvec, sc, priority,
 						&nr_dirty, &nr_writeback);
 
 	/* Check if we should syncronously wait for writeback */
 	if (should_reclaim_stall(nr_taken, nr_reclaimed, priority, sc)) {
 		set_reclaim_mode(priority, sc, true);
-		nr_reclaimed += shrink_page_list(&page_list, mz, sc,
+		nr_reclaimed += shrink_page_list(&page_list, lruvec, sc,
 					priority, &nr_dirty, &nr_writeback);
 	}
 
@@ -1582,7 +1544,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct mem_cgroup_zone *mz,
 		__count_vm_events(KSWAPD_STEAL, nr_reclaimed);
 	__count_zone_vm_events(PGSTEAL, zone, nr_reclaimed);
 
-	putback_inactive_pages(mz, &page_list);
+	putback_inactive_pages(lruvec, &page_list);
 
 	__mod_zone_page_state(zone, NR_ISOLATED_ANON, -nr_anon);
 	__mod_zone_page_state(zone, NR_ISOLATED_FILE, -nr_file);
@@ -1697,7 +1659,7 @@ static void move_active_pages_to_lru(struct zone *zone,
 }
 
 static void shrink_active_list(unsigned long nr_to_scan,
-			       struct mem_cgroup_zone *mz,
+			       struct lruvec *lruvec,
 			       struct scan_control *sc,
 			       int priority, int file)
 {
@@ -1708,10 +1670,10 @@ static void shrink_active_list(unsigned long nr_to_scan,
 	LIST_HEAD(l_active);
 	LIST_HEAD(l_inactive);
 	struct page *page;
-	struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(mz);
+	struct zone_reclaim_stat *reclaim_stat = &lruvec->reclaim_stat;
 	unsigned long nr_rotated = 0;
 	isolate_mode_t isolate_mode = ISOLATE_ACTIVE;
-	struct zone *zone = mz->zone;
+	struct zone *zone = lruvec_zone(lruvec);
 
 	lru_add_drain();
 
@@ -1722,8 +1684,9 @@ static void shrink_active_list(unsigned long nr_to_scan,
 
 	spin_lock_irq(&zone->lru_lock);
 
-	nr_taken = isolate_lru_pages(nr_to_scan, mz, &l_hold, &nr_scanned, sc,
-				     isolate_mode, 1, file);
+	nr_taken = isolate_lru_pages(nr_to_scan, lruvec, &l_hold, &nr_scanned,
+				     sc, isolate_mode, 1, file);
+
 	if (global_reclaim(sc))
 		zone->pages_scanned += nr_scanned;
 
@@ -1819,11 +1782,10 @@ static void shrink_active_list(unsigned long nr_to_scan,
  *    1TB     101        10GB
  *   10TB     320        32GB
  */
-static int inactive_anon_is_low(struct mem_cgroup_zone *mz)
+static int inactive_anon_is_low(struct lruvec *lruvec)
 {
 	unsigned long active, inactive;
 	unsigned int gb, ratio;
-	struct lruvec *lruvec;
 
 	/*
 	 * If we don't have swap space, anonymous page deactivation
@@ -1832,7 +1794,6 @@ static int inactive_anon_is_low(struct mem_cgroup_zone *mz)
 	if (!total_swap_pages)
 		return 0;
 
-	lruvec = mem_cgroup_zone_lruvec(mz->zone, mz->mem_cgroup);
 	active = lruvec->pages_count[LRU_ACTIVE_ANON];
 	inactive = lruvec->pages_count[LRU_INACTIVE_ANON];
 
@@ -1846,7 +1807,7 @@ static int inactive_anon_is_low(struct mem_cgroup_zone *mz)
 	return inactive * ratio < active;
 }
 #else
-static inline int inactive_anon_is_low(struct mem_cgroup_zone *mz)
+static inline int inactive_anon_is_low(struct lruvec *lruvec)
 {
 	return 0;
 }
@@ -1866,39 +1827,38 @@ static inline int inactive_anon_is_low(struct mem_cgroup_zone *mz)
  * This uses a different ratio than the anonymous pages, because
  * the page cache uses a use-once replacement algorithm.
  */
-static int inactive_file_is_low(struct mem_cgroup_zone *mz)
+static int inactive_file_is_low(struct lruvec *lruvec)
 {
 	unsigned long active, inactive;
-	struct lruvec *lruvec;
 
-	lruvec = mem_cgroup_zone_lruvec(mz->zone, mz->mem_cgroup);
 	active = lruvec->pages_count[LRU_ACTIVE_FILE];
 	inactive = lruvec->pages_count[LRU_INACTIVE_FILE];
 
 	return inactive < active;
 }
 
-static int inactive_list_is_low(struct mem_cgroup_zone *mz, int file)
+static int inactive_list_is_low(struct lruvec *lruvec, int file)
 {
 	if (file)
-		return inactive_file_is_low(mz);
+		return inactive_file_is_low(lruvec);
 	else
-		return inactive_anon_is_low(mz);
+		return inactive_anon_is_low(lruvec);
 }
 
 static unsigned long shrink_list(enum lru_list lru, unsigned long nr_to_scan,
-				 struct mem_cgroup_zone *mz,
+				 struct lruvec *lruvec,
 				 struct scan_control *sc, int priority)
 {
 	int file = is_file_lru(lru);
 
 	if (is_active_lru(lru)) {
-		if (inactive_list_is_low(mz, file))
-			shrink_active_list(nr_to_scan, mz, sc, priority, file);
+		if (inactive_list_is_low(lruvec, file))
+			shrink_active_list(nr_to_scan, lruvec,
+					   sc, priority, file);
 		return 0;
 	}
 
-	return shrink_inactive_list(nr_to_scan, mz, sc, priority, file);
+	return shrink_inactive_list(nr_to_scan, lruvec, sc, priority, file);
 }
 
 static int vmscan_swappiness(struct scan_control *sc)
@@ -1916,17 +1876,18 @@ static int vmscan_swappiness(struct scan_control *sc)
  *
  * nr[0] = anon pages to scan; nr[1] = file pages to scan
  */
-static void get_scan_count(struct mem_cgroup_zone *mz, struct scan_control *sc,
+static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
 			   unsigned long *nr, int priority)
 {
 	unsigned long anon, file, free;
 	unsigned long anon_prio, file_prio;
 	unsigned long ap, fp;
-	struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(mz);
+	struct zone_reclaim_stat *reclaim_stat = &lruvec->reclaim_stat;
 	u64 fraction[2], denominator;
 	enum lru_list lru;
 	int noswap = 0;
 	bool force_scan = false;
+	struct zone *zone = lruvec_zone(lruvec);
 
 	/*
 	 * If the zone or memcg is small, nr[l] can be 0.  This
@@ -1938,7 +1899,7 @@ static void get_scan_count(struct mem_cgroup_zone *mz, struct scan_control *sc,
 	 * latencies, so it's better to scan a minimum amount there as
 	 * well.
 	 */
-	if (current_is_kswapd() && mz->zone->all_unreclaimable)
+	if (current_is_kswapd() && zone->all_unreclaimable)
 		force_scan = true;
 	if (!global_reclaim(sc))
 		force_scan = true;
@@ -1952,16 +1913,16 @@ static void get_scan_count(struct mem_cgroup_zone *mz, struct scan_control *sc,
 		goto out;
 	}
 
-	anon  = zone_nr_lru_pages(mz, LRU_ACTIVE_ANON) +
-		zone_nr_lru_pages(mz, LRU_INACTIVE_ANON);
-	file  = zone_nr_lru_pages(mz, LRU_ACTIVE_FILE) +
-		zone_nr_lru_pages(mz, LRU_INACTIVE_FILE);
+	anon  = lruvec->pages_count[LRU_ACTIVE_ANON] +
+		lruvec->pages_count[LRU_INACTIVE_ANON];
+	file  = lruvec->pages_count[LRU_ACTIVE_FILE] +
+		lruvec->pages_count[LRU_INACTIVE_FILE];
 
 	if (global_reclaim(sc)) {
-		free  = zone_page_state(mz->zone, NR_FREE_PAGES);
+		free  = zone_page_state(zone, NR_FREE_PAGES);
 		/* If we have very few page cache pages,
 		   force-scan anon pages. */
-		if (unlikely(file + free <= high_wmark_pages(mz->zone))) {
+		if (unlikely(file + free <= high_wmark_pages(zone))) {
 			fraction[0] = 1;
 			fraction[1] = 0;
 			denominator = 1;
@@ -1987,7 +1948,7 @@ static void get_scan_count(struct mem_cgroup_zone *mz, struct scan_control *sc,
 	 *
 	 * anon in [0], file in [1]
 	 */
-	spin_lock_irq(&mz->zone->lru_lock);
+	spin_lock_irq(&zone->lru_lock);
 	if (unlikely(reclaim_stat->recent_scanned[0] > anon / 4)) {
 		reclaim_stat->recent_scanned[0] /= 2;
 		reclaim_stat->recent_rotated[0] /= 2;
@@ -2008,7 +1969,7 @@ static void get_scan_count(struct mem_cgroup_zone *mz, struct scan_control *sc,
 
 	fp = (file_prio + 1) * (reclaim_stat->recent_scanned[1] + 1);
 	fp /= reclaim_stat->recent_rotated[1] + 1;
-	spin_unlock_irq(&mz->zone->lru_lock);
+	spin_unlock_irq(&zone->lru_lock);
 
 	fraction[0] = ap;
 	fraction[1] = fp;
@@ -2018,7 +1979,7 @@ out:
 		int file = is_file_lru(lru);
 		unsigned long scan;
 
-		scan = zone_nr_lru_pages(mz, lru);
+		scan = lruvec->pages_count[lru];
 		if (priority || noswap) {
 			scan >>= priority;
 			if (!scan && force_scan)
@@ -2036,7 +1997,7 @@ out:
  * back to the allocator and call try_to_compact_zone(), we ensure that
  * there are enough free pages for it to be likely successful
  */
-static inline bool should_continue_reclaim(struct mem_cgroup_zone *mz,
+static inline bool should_continue_reclaim(struct lruvec *lruvec,
 					unsigned long nr_reclaimed,
 					unsigned long nr_scanned,
 					struct scan_control *sc)
@@ -2076,15 +2037,15 @@ static inline bool should_continue_reclaim(struct mem_cgroup_zone *mz,
 	 * inactive lists are large enough, continue reclaiming
 	 */
 	pages_for_compaction = (2UL << sc->order);
-	inactive_lru_pages = zone_nr_lru_pages(mz, LRU_INACTIVE_FILE);
+	inactive_lru_pages = lruvec->pages_count[LRU_INACTIVE_FILE];
 	if (nr_swap_pages > 0)
-		inactive_lru_pages += zone_nr_lru_pages(mz, LRU_INACTIVE_ANON);
+		inactive_lru_pages += lruvec->pages_count[LRU_INACTIVE_ANON];
 	if (sc->nr_reclaimed < pages_for_compaction &&
 			inactive_lru_pages > pages_for_compaction)
 		return true;
 
 	/* If compaction would go ahead or the allocation would succeed, stop */
-	switch (compaction_suitable(mz->zone, sc->order)) {
+	switch (compaction_suitable(lruvec_zone(lruvec), sc->order)) {
 	case COMPACT_PARTIAL:
 	case COMPACT_CONTINUE:
 		return false;
@@ -2096,8 +2057,8 @@ static inline bool should_continue_reclaim(struct mem_cgroup_zone *mz,
 /*
  * This is a basic per-zone page freer.  Used by both kswapd and direct reclaim.
  */
-static void shrink_mem_cgroup_zone(int priority, struct mem_cgroup_zone *mz,
-				   struct scan_control *sc)
+static void shrink_lruvec(int priority, struct lruvec *lruvec,
+			struct scan_control *sc)
 {
 	unsigned long nr[NR_LRU_LISTS];
 	unsigned long nr_to_scan;
@@ -2109,7 +2070,7 @@ static void shrink_mem_cgroup_zone(int priority, struct mem_cgroup_zone *mz,
 restart:
 	nr_reclaimed = 0;
 	nr_scanned = sc->nr_scanned;
-	get_scan_count(mz, sc, nr, priority);
+	get_scan_count(lruvec, sc, nr, priority);
 
 	blk_start_plug(&plug);
 	while (nr[LRU_INACTIVE_ANON] || nr[LRU_ACTIVE_FILE] ||
@@ -2121,7 +2082,7 @@ restart:
 				nr[lru] -= nr_to_scan;
 
 				nr_reclaimed += shrink_list(lru, nr_to_scan,
-							    mz, sc, priority);
+							    lruvec, sc, priority);
 			}
 		}
 		/*
@@ -2147,11 +2108,11 @@ restart:
 	 * Even if we did not try to evict anon pages at all, we want to
 	 * rebalance the anon lru active/inactive ratio.
 	 */
-	if (inactive_anon_is_low(mz))
-		shrink_active_list(SWAP_CLUSTER_MAX, mz, sc, priority, 0);
+	if (inactive_anon_is_low(lruvec))
+		shrink_active_list(SWAP_CLUSTER_MAX, lruvec, sc, priority, 0);
 
 	/* reclaim/compaction might need reclaim to continue */
-	if (should_continue_reclaim(mz, nr_reclaimed,
+	if (should_continue_reclaim(lruvec, nr_reclaimed,
 					sc->nr_scanned - nr_scanned, sc))
 		goto restart;
 
@@ -2167,15 +2128,14 @@ static void shrink_zone(int priority, struct zone *zone,
 		.priority = priority,
 	};
 	struct mem_cgroup *memcg;
+	struct lruvec *lruvec;
 
 	memcg = mem_cgroup_iter(root, NULL, &reclaim);
 	do {
-		struct mem_cgroup_zone mz = {
-			.mem_cgroup = memcg,
-			.zone = zone,
-		};
+		lruvec = mem_cgroup_zone_lruvec(zone, memcg);
+
+		shrink_lruvec(priority, lruvec, sc);
 
-		shrink_mem_cgroup_zone(priority, &mz, sc);
 		/*
 		 * Limit reclaim has historically picked one memcg and
 		 * scanned it with decreasing priority levels until
@@ -2495,10 +2455,7 @@ unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *memcg,
 		.order = 0,
 		.target_mem_cgroup = memcg,
 	};
-	struct mem_cgroup_zone mz = {
-		.mem_cgroup = memcg,
-		.zone = zone,
-	};
+	struct lruvec *lruvec = mem_cgroup_zone_lruvec(zone, memcg);
 
 	sc.gfp_mask = (gfp_mask & GFP_RECLAIM_MASK) |
 			(GFP_HIGHUSER_MOVABLE & ~GFP_RECLAIM_MASK);
@@ -2514,7 +2471,7 @@ unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *memcg,
 	 * will pick up pages from other mem cgroup's as well. We hack
 	 * the priority and make it zero.
 	 */
-	shrink_mem_cgroup_zone(0, &mz, &sc);
+	shrink_lruvec(0, lruvec, &sc);
 
 	trace_mm_vmscan_memcg_softlimit_reclaim_end(sc.nr_reclaimed);
 
@@ -2575,13 +2532,10 @@ static void age_active_anon(struct zone *zone, struct scan_control *sc,
 
 	memcg = mem_cgroup_iter(NULL, NULL, NULL);
 	do {
-		struct mem_cgroup_zone mz = {
-			.mem_cgroup = memcg,
-			.zone = zone,
-		};
+		struct lruvec *lruvec = mem_cgroup_zone_lruvec(zone, memcg);
 
-		if (inactive_anon_is_low(&mz))
-			shrink_active_list(SWAP_CLUSTER_MAX, &mz,
+		if (inactive_anon_is_low(lruvec))
+			shrink_active_list(SWAP_CLUSTER_MAX, lruvec,
 					   sc, priority, 0);
 
 		memcg = mem_cgroup_iter(NULL, memcg, NULL);


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v3 11/21] mm: move page-to-lruvec translation upper
  2012-02-23 13:51 [PATCH v3 00/21] mm: lru_lock splitting Konstantin Khlebnikov
                   ` (9 preceding siblings ...)
  2012-02-23 13:52 ` [PATCH v3 10/21] mm: kill struct mem_cgroup_zone Konstantin Khlebnikov
@ 2012-02-23 13:52 ` Konstantin Khlebnikov
  2012-02-28  0:42   ` KAMEZAWA Hiroyuki
  2012-02-23 13:52 ` [PATCH v3 12/21] mm: push lruvec into update_page_reclaim_stat() Konstantin Khlebnikov
                   ` (12 subsequent siblings)
  23 siblings, 1 reply; 65+ messages in thread
From: Konstantin Khlebnikov @ 2012-02-23 13:52 UTC (permalink / raw)
  To: Hugh Dickins, linux-kernel, linux-mm, Johannes Weiner,
	Andrew Morton, KAMEZAWA Hiroyuki
  Cc: Andi Kleen

Move the page_lruvec() call out of add_page_to_lru_list() and
del_page_from_lru_list(), and switch their first argument from zone to lruvec.
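
For illustration only (not an additional hunk), a typical converted call site
now resolves the lruvec explicitly before touching the lists, roughly as in
__page_cache_release() below:

	struct lruvec *lruvec;
	unsigned long flags;

	spin_lock_irqsave(&zone->lru_lock, flags);
	__ClearPageLRU(page);
	lruvec = page_lruvec(page);
	del_page_from_lru_list(lruvec, page, page_off_lru(page));
	spin_unlock_irqrestore(&zone->lru_lock, flags);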

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
---
 include/linux/mm_inline.h |   10 ++++------
 mm/compaction.c           |    4 +++-
 mm/memcontrol.c           |    7 +++++--
 mm/swap.c                 |   33 +++++++++++++++++++++------------
 mm/vmscan.c               |   14 +++++++++-----
 5 files changed, 42 insertions(+), 26 deletions(-)

diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
index daa3d15..143a2e8 100644
--- a/include/linux/mm_inline.h
+++ b/include/linux/mm_inline.h
@@ -22,26 +22,24 @@ static inline int page_is_file_cache(struct page *page)
 }
 
 static inline void
-add_page_to_lru_list(struct zone *zone, struct page *page, enum lru_list lru)
+add_page_to_lru_list(struct lruvec *lruvec, struct page *page, enum lru_list lru)
 {
-	struct lruvec *lruvec = page_lruvec(page);
 	int numpages = hpage_nr_pages(page);
 
 	list_add(&page->lru, &lruvec->pages_lru[lru]);
 	lruvec->pages_count[lru] += numpages;
-	__mod_zone_page_state(zone, NR_LRU_BASE + lru, numpages);
+	__mod_zone_page_state(lruvec_zone(lruvec), NR_LRU_BASE + lru, numpages);
 }
 
 static inline void
-del_page_from_lru_list(struct zone *zone, struct page *page, enum lru_list lru)
+del_page_from_lru_list(struct lruvec *lruvec, struct page *page, enum lru_list lru)
 {
-	struct lruvec *lruvec = page_lruvec(page);
 	int numpages = hpage_nr_pages(page);
 
 	list_del(&page->lru);
 	lruvec->pages_count[lru] -= numpages;
 	VM_BUG_ON((long)lruvec->pages_count[lru] < 0);
-	__mod_zone_page_state(zone, NR_LRU_BASE + lru, -numpages);
+	__mod_zone_page_state(lruvec_zone(lruvec), NR_LRU_BASE + lru, -numpages);
 }
 
 /**
diff --git a/mm/compaction.c b/mm/compaction.c
index 74a8c82..a976b28 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -262,6 +262,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
 	unsigned long nr_scanned = 0, nr_isolated = 0;
 	struct list_head *migratelist = &cc->migratepages;
 	isolate_mode_t mode = ISOLATE_ACTIVE|ISOLATE_INACTIVE;
+	struct lruvec *lruvec;
 
 	/* Do not scan outside zone boundaries */
 	low_pfn = max(cc->migrate_pfn, zone->zone_start_pfn);
@@ -381,7 +382,8 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
 		VM_BUG_ON(PageTransCompound(page));
 
 		/* Successfully isolated */
-		del_page_from_lru_list(zone, page, page_lru(page));
+		lruvec = page_lruvec(page);
+		del_page_from_lru_list(lruvec, page, page_lru(page));
 		list_add(&page->lru, migratelist);
 		cc->nr_migratepages++;
 		nr_isolated++;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index bef57db..83fa99b 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2543,6 +2543,7 @@ __mem_cgroup_commit_charge_lrucare(struct page *page, struct mem_cgroup *memcg,
 {
 	struct page_cgroup *pc = lookup_page_cgroup(page);
 	struct zone *zone = page_zone(page);
+	struct lruvec *lruvec;
 	unsigned long flags;
 	bool removed = false;
 
@@ -2553,13 +2554,15 @@ __mem_cgroup_commit_charge_lrucare(struct page *page, struct mem_cgroup *memcg,
 	 */
 	spin_lock_irqsave(&zone->lru_lock, flags);
 	if (PageLRU(page)) {
-		del_page_from_lru_list(zone, page, page_lru(page));
+		lruvec = page_lruvec(page);
+		del_page_from_lru_list(lruvec, page, page_lru(page));
 		ClearPageLRU(page);
 		removed = true;
 	}
 	__mem_cgroup_commit_charge(memcg, page, 1, pc, ctype);
 	if (removed) {
-		add_page_to_lru_list(zone, page, page_lru(page));
+		lruvec = page_lruvec(page);
+		add_page_to_lru_list(lruvec, page, page_lru(page));
 		SetPageLRU(page);
 	}
 	spin_unlock_irqrestore(&zone->lru_lock, flags);
diff --git a/mm/swap.c b/mm/swap.c
index 70d1542..0cbc558 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -49,11 +49,13 @@ static void __page_cache_release(struct page *page)
 	if (PageLRU(page)) {
 		unsigned long flags;
 		struct zone *zone = page_zone(page);
+		struct lruvec *lruvec;
 
 		spin_lock_irqsave(&zone->lru_lock, flags);
 		VM_BUG_ON(!PageLRU(page));
 		__ClearPageLRU(page);
-		del_page_from_lru_list(zone, page, page_off_lru(page));
+		lruvec = page_lruvec(page);
+		del_page_from_lru_list(lruvec, page, page_off_lru(page));
 		spin_unlock_irqrestore(&zone->lru_lock, flags);
 	}
 }
@@ -293,11 +295,13 @@ static void __activate_page(struct page *page, void *arg)
 	if (PageLRU(page) && !PageActive(page) && !PageUnevictable(page)) {
 		int file = page_is_file_cache(page);
 		int lru = page_lru_base_type(page);
-		del_page_from_lru_list(zone, page, lru);
+		struct lruvec *lruvec = page_lruvec(page);
+
+		del_page_from_lru_list(lruvec, page, lru);
 
 		SetPageActive(page);
 		lru += LRU_ACTIVE;
-		add_page_to_lru_list(zone, page, lru);
+		add_page_to_lru_list(lruvec, page, lru);
 		__count_vm_event(PGACTIVATE);
 
 		update_page_reclaim_stat(zone, page, file, 1);
@@ -404,11 +408,13 @@ void lru_cache_add_lru(struct page *page, enum lru_list lru)
 void add_page_to_unevictable_list(struct page *page)
 {
 	struct zone *zone = page_zone(page);
+	struct lruvec *lruvec;
 
 	spin_lock_irq(&zone->lru_lock);
 	SetPageUnevictable(page);
 	SetPageLRU(page);
-	add_page_to_lru_list(zone, page, LRU_UNEVICTABLE);
+	lruvec = page_lruvec(page);
+	add_page_to_lru_list(lruvec, page, LRU_UNEVICTABLE);
 	spin_unlock_irq(&zone->lru_lock);
 }
 
@@ -438,6 +444,7 @@ static void lru_deactivate_fn(struct page *page, void *arg)
 	int lru, file;
 	bool active;
 	struct zone *zone = page_zone(page);
+	struct lruvec *lruvec;
 
 	if (!PageLRU(page))
 		return;
@@ -453,10 +460,11 @@ static void lru_deactivate_fn(struct page *page, void *arg)
 
 	file = page_is_file_cache(page);
 	lru = page_lru_base_type(page);
-	del_page_from_lru_list(zone, page, lru + active);
+	lruvec = page_lruvec(page);
+	del_page_from_lru_list(lruvec, page, lru + active);
 	ClearPageActive(page);
 	ClearPageReferenced(page);
-	add_page_to_lru_list(zone, page, lru);
+	add_page_to_lru_list(lruvec, page, lru);
 
 	if (PageWriteback(page) || PageDirty(page)) {
 		/*
@@ -466,7 +474,6 @@ static void lru_deactivate_fn(struct page *page, void *arg)
 		 */
 		SetPageReclaim(page);
 	} else {
-		struct lruvec *lruvec = page_lruvec(page);
 		/*
 		 * The page's writeback ends up during pagevec
 		 * We moves tha page into tail of inactive.
@@ -596,6 +603,7 @@ void release_pages(struct page **pages, int nr, int cold)
 
 		if (PageLRU(page)) {
 			struct zone *pagezone = page_zone(page);
+			struct lruvec *lruvec = page_lruvec(page);
 
 			if (pagezone != zone) {
 				if (zone)
@@ -606,7 +614,7 @@ void release_pages(struct page **pages, int nr, int cold)
 			}
 			VM_BUG_ON(!PageLRU(page));
 			__ClearPageLRU(page);
-			del_page_from_lru_list(zone, page, page_off_lru(page));
+			del_page_from_lru_list(lruvec, page, page_off_lru(page));
 		}
 
 		list_add(&page->lru, &pages_to_free);
@@ -644,6 +652,7 @@ void lru_add_page_tail(struct zone* zone,
 	int active;
 	enum lru_list lru;
 	const int file = 0;
+	struct lruvec *lruvec = page_lruvec(page);
 
 	VM_BUG_ON(!PageHead(page));
 	VM_BUG_ON(PageCompound(page_tail));
@@ -678,7 +687,7 @@ void lru_add_page_tail(struct zone* zone,
 		 * Use the standard add function to put page_tail on the list,
 		 * but then correct its position so they all end up in order.
 		 */
-		add_page_to_lru_list(zone, page_tail, lru);
+		add_page_to_lru_list(lruvec, page_tail, lru);
 		list_head = page_tail->lru.prev;
 		list_move_tail(&page_tail->lru, list_head);
 	}
@@ -688,7 +697,7 @@ void lru_add_page_tail(struct zone* zone,
 static void __pagevec_lru_add_fn(struct page *page, void *arg)
 {
 	enum lru_list lru = (enum lru_list)arg;
-	struct zone *zone = page_zone(page);
+	struct lruvec *lruvec = page_lruvec(page);
 	int file = is_file_lru(lru);
 	int active = is_active_lru(lru);
 
@@ -699,8 +708,8 @@ static void __pagevec_lru_add_fn(struct page *page, void *arg)
 	SetPageLRU(page);
 	if (active)
 		SetPageActive(page);
-	update_page_reclaim_stat(zone, page, file, active);
-	add_page_to_lru_list(zone, page, lru);
+	update_page_reclaim_stat(lruvec_zone(lruvec), page, file, active);
+	add_page_to_lru_list(lruvec, page, lru);
 }
 
 /*
diff --git a/mm/vmscan.c b/mm/vmscan.c
index f5e7046..ebb5d99 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1286,6 +1286,7 @@ int isolate_lru_page(struct page *page)
 
 	if (PageLRU(page)) {
 		struct zone *zone = page_zone(page);
+		struct lruvec *lruvec;
 
 		spin_lock_irq(&zone->lru_lock);
 		if (PageLRU(page)) {
@@ -1293,8 +1294,8 @@ int isolate_lru_page(struct page *page)
 			ret = 0;
 			get_page(page);
 			ClearPageLRU(page);
-
-			del_page_from_lru_list(zone, page, lru);
+			lruvec = page_lruvec(page);
+			del_page_from_lru_list(lruvec, page, lru);
 		}
 		spin_unlock_irq(&zone->lru_lock);
 	}
@@ -1339,6 +1340,7 @@ putback_inactive_pages(struct lruvec *lruvec,
 	 */
 	while (!list_empty(page_list)) {
 		struct page *page = lru_to_page(page_list);
+		struct lruvec *lruvec;
 		int lru;
 
 		VM_BUG_ON(PageLRU(page));
@@ -1351,7 +1353,9 @@ putback_inactive_pages(struct lruvec *lruvec,
 		}
 		SetPageLRU(page);
 		lru = page_lru(page);
-		add_page_to_lru_list(zone, page, lru);
+
+		lruvec = page_lruvec(page);
+		add_page_to_lru_list(lruvec, page, lru);
 		if (is_active_lru(lru)) {
 			int file = is_file_lru(lru);
 			int numpages = hpage_nr_pages(page);
@@ -1360,7 +1364,7 @@ putback_inactive_pages(struct lruvec *lruvec,
 		if (put_page_testzero(page)) {
 			__ClearPageLRU(page);
 			__ClearPageActive(page);
-			del_page_from_lru_list(zone, page, lru);
+			del_page_from_lru_list(lruvec, page, lru);
 
 			if (unlikely(PageCompound(page))) {
 				spin_unlock_irq(&zone->lru_lock);
@@ -1643,7 +1647,7 @@ static void move_active_pages_to_lru(struct zone *zone,
 		if (put_page_testzero(page)) {
 			__ClearPageLRU(page);
 			__ClearPageActive(page);
-			del_page_from_lru_list(zone, page, lru);
+			del_page_from_lru_list(lruvec, page, lru);
 
 			if (unlikely(PageCompound(page))) {
 				spin_unlock_irq(&zone->lru_lock);


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v3 12/21] mm: push lruvec into update_page_reclaim_stat()
  2012-02-23 13:51 [PATCH v3 00/21] mm: lru_lock splitting Konstantin Khlebnikov
                   ` (10 preceding siblings ...)
  2012-02-23 13:52 ` [PATCH v3 11/21] mm: move page-to-lruvec translation upper Konstantin Khlebnikov
@ 2012-02-23 13:52 ` Konstantin Khlebnikov
  2012-02-28  0:44   ` KAMEZAWA Hiroyuki
  2012-02-23 13:52 ` [PATCH v3 13/21] mm: push lruvecs from pagevec_lru_move_fn() to iterator Konstantin Khlebnikov
                   ` (11 subsequent siblings)
  23 siblings, 1 reply; 65+ messages in thread
From: Konstantin Khlebnikov @ 2012-02-23 13:52 UTC (permalink / raw)
  To: Hugh Dickins, linux-kernel, linux-mm, Johannes Weiner,
	Andrew Morton, KAMEZAWA Hiroyuki
  Cc: Andi Kleen

Push the lruvec pointer into update_page_reclaim_stat():
* drop the page argument
* drop the active and file arguments and derive them from lru instead
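
A sketch of the resulting caller side (based on the __activate_page() hunk
below, shown only to illustrate the new signature):

	del_page_from_lru_list(lruvec, page, lru);
	SetPageActive(page);
	lru += LRU_ACTIVE;
	add_page_to_lru_list(lruvec, page, lru);
	__count_vm_event(PGACTIVATE);
	update_page_reclaim_stat(lruvec, lru);	/* was (zone, page, file, 1) */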

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
---
 mm/swap.c |   30 +++++++++---------------------
 1 files changed, 9 insertions(+), 21 deletions(-)

diff --git a/mm/swap.c b/mm/swap.c
index 0cbc558..1f5731e 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -276,24 +276,19 @@ void rotate_reclaimable_page(struct page *page)
 	}
 }
 
-static void update_page_reclaim_stat(struct zone *zone, struct page *page,
-				     int file, int rotated)
+static void update_page_reclaim_stat(struct lruvec *lruvec, enum lru_list lru)
 {
-	struct zone_reclaim_stat *reclaim_stat;
-
-	reclaim_stat = &page_lruvec(page)->reclaim_stat;
+	struct zone_reclaim_stat *reclaim_stat = &lruvec->reclaim_stat;
+	int file = is_file_lru(lru);
 
 	reclaim_stat->recent_scanned[file]++;
-	if (rotated)
+	if (is_active_lru(lru))
 		reclaim_stat->recent_rotated[file]++;
 }
 
 static void __activate_page(struct page *page, void *arg)
 {
-	struct zone *zone = page_zone(page);
-
 	if (PageLRU(page) && !PageActive(page) && !PageUnevictable(page)) {
-		int file = page_is_file_cache(page);
 		int lru = page_lru_base_type(page);
 		struct lruvec *lruvec = page_lruvec(page);
 
@@ -304,7 +299,7 @@ static void __activate_page(struct page *page, void *arg)
 		add_page_to_lru_list(lruvec, page, lru);
 		__count_vm_event(PGACTIVATE);
 
-		update_page_reclaim_stat(zone, page, file, 1);
+		update_page_reclaim_stat(lruvec, lru);
 	}
 }
 
@@ -443,7 +438,6 @@ static void lru_deactivate_fn(struct page *page, void *arg)
 {
 	int lru, file;
 	bool active;
-	struct zone *zone = page_zone(page);
 	struct lruvec *lruvec;
 
 	if (!PageLRU(page))
@@ -484,7 +478,7 @@ static void lru_deactivate_fn(struct page *page, void *arg)
 
 	if (active)
 		__count_vm_event(PGDEACTIVATE);
-	update_page_reclaim_stat(zone, page, file, 0);
+	update_page_reclaim_stat(lruvec, lru);
 }
 
 /*
@@ -649,9 +643,7 @@ EXPORT_SYMBOL(__pagevec_release);
 void lru_add_page_tail(struct zone* zone,
 		       struct page *page, struct page *page_tail)
 {
-	int active;
 	enum lru_list lru;
-	const int file = 0;
 	struct lruvec *lruvec = page_lruvec(page);
 
 	VM_BUG_ON(!PageHead(page));
@@ -664,13 +656,11 @@ void lru_add_page_tail(struct zone* zone,
 	if (page_evictable(page_tail, NULL)) {
 		if (PageActive(page)) {
 			SetPageActive(page_tail);
-			active = 1;
 			lru = LRU_ACTIVE_ANON;
 		} else {
-			active = 0;
 			lru = LRU_INACTIVE_ANON;
 		}
-		update_page_reclaim_stat(zone, page_tail, file, active);
+		update_page_reclaim_stat(lruvec, lru);
 	} else {
 		SetPageUnevictable(page_tail);
 		lru = LRU_UNEVICTABLE;
@@ -698,17 +688,15 @@ static void __pagevec_lru_add_fn(struct page *page, void *arg)
 {
 	enum lru_list lru = (enum lru_list)arg;
 	struct lruvec *lruvec = page_lruvec(page);
-	int file = is_file_lru(lru);
-	int active = is_active_lru(lru);
 
 	VM_BUG_ON(PageActive(page));
 	VM_BUG_ON(PageUnevictable(page));
 	VM_BUG_ON(PageLRU(page));
 
 	SetPageLRU(page);
-	if (active)
+	if (is_active_lru(lru))
 		SetPageActive(page);
-	update_page_reclaim_stat(lruvec_zone(lruvec), page, file, active);
+	update_page_reclaim_stat(lruvec, lru);
 	add_page_to_lru_list(lruvec, page, lru);
 }
 


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v3 13/21] mm: push lruvecs from pagevec_lru_move_fn() to iterator
  2012-02-23 13:51 [PATCH v3 00/21] mm: lru_lock splitting Konstantin Khlebnikov
                   ` (11 preceding siblings ...)
  2012-02-23 13:52 ` [PATCH v3 12/21] mm: push lruvec into update_page_reclaim_stat() Konstantin Khlebnikov
@ 2012-02-23 13:52 ` Konstantin Khlebnikov
  2012-02-28  0:45   ` KAMEZAWA Hiroyuki
  2012-02-23 13:52 ` [PATCH v3 14/21] mm: introduce lruvec locking primitives Konstantin Khlebnikov
                   ` (10 subsequent siblings)
  23 siblings, 1 reply; 65+ messages in thread
From: Konstantin Khlebnikov @ 2012-02-23 13:52 UTC (permalink / raw)
  To: Hugh Dickins, linux-kernel, linux-mm, Johannes Weiner,
	Andrew Morton, KAMEZAWA Hiroyuki
  Cc: Andi Kleen

Push the lruvec pointer from pagevec_lru_move_fn() down into the iterator
functions, and push it into lru_add_page_tail() as well.
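
After this change an iterator callback receives the lruvec that
pagevec_lru_move_fn() has already resolved for the page. For illustration,
the converted pagevec_move_tail_fn() from the hunk below then reads:

	static void pagevec_move_tail_fn(struct lruvec *lruvec,
					 struct page *page, void *arg)
	{
		int *pgmoved = arg;

		if (PageLRU(page) && !PageActive(page) && !PageUnevictable(page)) {
			enum lru_list lru = page_lru_base_type(page);

			/* no page_lruvec() lookup needed here any more */
			list_move_tail(&page->lru, &lruvec->pages_lru[lru]);
			(*pgmoved)++;
		}
	}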

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
---
 include/linux/swap.h |    2 +-
 mm/huge_memory.c     |    4 +++-
 mm/swap.c            |   27 +++++++++++++--------------
 3 files changed, 17 insertions(+), 16 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index f7df3ea..8630354 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -224,7 +224,7 @@ extern unsigned int nr_free_pagecache_pages(void);
 /* linux/mm/swap.c */
 extern void __lru_cache_add(struct page *, enum lru_list lru);
 extern void lru_cache_add_lru(struct page *, enum lru_list lru);
-extern void lru_add_page_tail(struct zone* zone,
+extern void lru_add_page_tail(struct lruvec *lruvec,
 			      struct page *page, struct page *page_tail);
 extern void activate_page(struct page *);
 extern void mark_page_accessed(struct page *);
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 91d3efb..09e7069 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1229,10 +1229,12 @@ static void __split_huge_page_refcount(struct page *page)
 {
 	int i;
 	struct zone *zone = page_zone(page);
+	struct lruvec *lruvec;
 	int tail_count = 0;
 
 	/* prevent PageLRU to go away from under us, and freeze lru stats */
 	spin_lock_irq(&zone->lru_lock);
+	lruvec = page_lruvec(page);
 	compound_lock(page);
 	/* complete memcg works before add pages to LRU */
 	mem_cgroup_split_huge_fixup(page);
@@ -1308,7 +1310,7 @@ static void __split_huge_page_refcount(struct page *page)
 		BUG_ON(!PageSwapBacked(page_tail));
 
 
-		lru_add_page_tail(zone, page, page_tail);
+		lru_add_page_tail(lruvec, page, page_tail);
 	}
 	atomic_sub(tail_count, &page->_count);
 	BUG_ON(atomic_read(&page->_count) <= 0);
diff --git a/mm/swap.c b/mm/swap.c
index 1f5731e..f7b5896 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -204,7 +204,8 @@ void put_pages_list(struct list_head *pages)
 EXPORT_SYMBOL(put_pages_list);
 
 static void pagevec_lru_move_fn(struct pagevec *pvec,
-				void (*move_fn)(struct page *page, void *arg),
+				void (*move_fn)(struct lruvec *lruvec,
+						struct page *page, void *arg),
 				void *arg)
 {
 	int i;
@@ -214,6 +215,7 @@ static void pagevec_lru_move_fn(struct pagevec *pvec,
 	for (i = 0; i < pagevec_count(pvec); i++) {
 		struct page *page = pvec->pages[i];
 		struct zone *pagezone = page_zone(page);
+		struct lruvec *lruvec;
 
 		if (pagezone != zone) {
 			if (zone)
@@ -222,7 +224,8 @@ static void pagevec_lru_move_fn(struct pagevec *pvec,
 			spin_lock_irqsave(&zone->lru_lock, flags);
 		}
 
-		(*move_fn)(page, arg);
+		lruvec = page_lruvec(page);
+		(*move_fn)(lruvec, page, arg);
 	}
 	if (zone)
 		spin_unlock_irqrestore(&zone->lru_lock, flags);
@@ -230,13 +233,13 @@ static void pagevec_lru_move_fn(struct pagevec *pvec,
 	pagevec_reinit(pvec);
 }
 
-static void pagevec_move_tail_fn(struct page *page, void *arg)
+static void pagevec_move_tail_fn(struct lruvec *lruvec,
+				 struct page *page, void *arg)
 {
 	int *pgmoved = arg;
 
 	if (PageLRU(page) && !PageActive(page) && !PageUnevictable(page)) {
 		enum lru_list lru = page_lru_base_type(page);
-		struct lruvec *lruvec = page_lruvec(page);
 
 		list_move_tail(&page->lru, &lruvec->pages_lru[lru]);
 		(*pgmoved)++;
@@ -286,11 +289,10 @@ static void update_page_reclaim_stat(struct lruvec *lruvec, enum lru_list lru)
 		reclaim_stat->recent_rotated[file]++;
 }
 
-static void __activate_page(struct page *page, void *arg)
+static void __activate_page(struct lruvec *lruvec, struct page *page, void *arg)
 {
 	if (PageLRU(page) && !PageActive(page) && !PageUnevictable(page)) {
 		int lru = page_lru_base_type(page);
-		struct lruvec *lruvec = page_lruvec(page);
 
 		del_page_from_lru_list(lruvec, page, lru);
 
@@ -434,11 +436,10 @@ void add_page_to_unevictable_list(struct page *page)
  * be write it out by flusher threads as this is much more effective
  * than the single-page writeout from reclaim.
  */
-static void lru_deactivate_fn(struct page *page, void *arg)
+static void lru_deactivate_fn(struct lruvec *lruvec, struct page *page, void *arg)
 {
 	int lru, file;
 	bool active;
-	struct lruvec *lruvec;
 
 	if (!PageLRU(page))
 		return;
@@ -454,7 +455,6 @@ static void lru_deactivate_fn(struct page *page, void *arg)
 
 	file = page_is_file_cache(page);
 	lru = page_lru_base_type(page);
-	lruvec = page_lruvec(page);
 	del_page_from_lru_list(lruvec, page, lru + active);
 	ClearPageActive(page);
 	ClearPageReferenced(page);
@@ -640,16 +640,15 @@ EXPORT_SYMBOL(__pagevec_release);
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 /* used by __split_huge_page_refcount() */
-void lru_add_page_tail(struct zone* zone,
+void lru_add_page_tail(struct lruvec *lruvec,
 		       struct page *page, struct page *page_tail)
 {
 	enum lru_list lru;
-	struct lruvec *lruvec = page_lruvec(page);
 
 	VM_BUG_ON(!PageHead(page));
 	VM_BUG_ON(PageCompound(page_tail));
 	VM_BUG_ON(PageLRU(page_tail));
-	VM_BUG_ON(NR_CPUS != 1 && !spin_is_locked(&zone->lru_lock));
+	VM_BUG_ON(NR_CPUS != 1 && !spin_is_locked(&lruvec_zone(lruvec)->lru_lock));
 
 	SetPageLRU(page_tail);
 
@@ -684,10 +683,10 @@ void lru_add_page_tail(struct zone* zone,
 }
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
-static void __pagevec_lru_add_fn(struct page *page, void *arg)
+static void __pagevec_lru_add_fn(struct lruvec *lruvec,
+				 struct page *page, void *arg)
 {
 	enum lru_list lru = (enum lru_list)arg;
-	struct lruvec *lruvec = page_lruvec(page);
 
 	VM_BUG_ON(PageActive(page));
 	VM_BUG_ON(PageUnevictable(page));


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v3 14/21] mm: introduce lruvec locking primitives
  2012-02-23 13:51 [PATCH v3 00/21] mm: lru_lock splitting Konstantin Khlebnikov
                   ` (12 preceding siblings ...)
  2012-02-23 13:52 ` [PATCH v3 13/21] mm: push lruvecs from pagevec_lru_move_fn() to iterator Konstantin Khlebnikov
@ 2012-02-23 13:52 ` Konstantin Khlebnikov
  2012-02-28  0:56   ` KAMEZAWA Hiroyuki
  2012-02-23 13:52 ` [PATCH v3 15/21] mm: handle lruvec relocks on lumpy reclaim Konstantin Khlebnikov
                   ` (9 subsequent siblings)
  23 siblings, 1 reply; 65+ messages in thread
From: Konstantin Khlebnikov @ 2012-02-23 13:52 UTC (permalink / raw)
  To: Hugh Dickins, linux-kernel, linux-mm, Johannes Weiner,
	Andrew Morton, KAMEZAWA Hiroyuki
  Cc: Andi Kleen

This is initial preparation for lru_lock splitting.

These locking primitives are designed to hide the split nature of the lru_lock
and to avoid overhead for the non-split lru_lock in the non-memcg case.

* Lock via lruvec reference

lock_lruvec(lruvec, flags)
lock_lruvec_irq(lruvec)

* Lock via page reference

lock_page_lruvec(page, flags)
lock_page_lruvec_irq(page)
relock_page_lruvec(lruvec, page, flags)
relock_page_lruvec_irq(lruvec, page)
__relock_page_lruvec(lruvec, page) ( lruvec != NULL, page in same zone )

These always return a pointer to some locked lruvec. The page may still be
off the LRU, but the PageLRU() flag is stable while the returned lruvec lock
is held. The caller must guarantee that the page-to-lruvec reference stays valid.

* Lock via page, without stable page reference

__lock_page_lruvec_irq(&lruvec, page)

It returns true if the lruvec was successfully locked and PageLRU is set.
The initial lruvec may be NULL. Subsequent calls must stay within the same zone.

* Unlock

unlock_lruvec(lruvec, flags)
unlock_lruvec_irq(lruvec)

* Wait

wait_lruvec_unlock(lruvec)
Wait for the lruvec to be unlocked; the caller must hold a stable reference to the lruvec.

__wait_lruvec_unlock(lruvec)
Wait for the lruvec to be unlocked before taking another lru lock for the same
page; a no-op when there is only one possible lruvec per page.
Used when switching the page-to-lruvec reference to stabilize the PageLRU flag.
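
For illustration, the typical pattern after conversion (a sketch following the
release_pages() and pagevec_lru_move_fn() hunks below; 'pages' and 'nr' stand
for whatever page array is being walked):

	struct lruvec *lruvec = NULL;
	unsigned long flags;
	int i;

	for (i = 0; i < nr; i++) {
		struct page *page = pages[i];

		/* takes the right lock, drops the previous one if it differs */
		lruvec = relock_page_lruvec(lruvec, page, &flags);
		if (PageLRU(page)) {
			/* PageLRU() is stable while the lruvec lock is held */
			__ClearPageLRU(page);
			del_page_from_lru_list(lruvec, page, page_off_lru(page));
		}
	}
	if (lruvec)
		unlock_lruvec(lruvec, &flags);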

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
---
 mm/huge_memory.c |    8 +-
 mm/internal.h    |  176 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 mm/memcontrol.c  |   14 ++--
 mm/swap.c        |   58 ++++++------------
 mm/vmscan.c      |   77 ++++++++++--------------
 5 files changed, 237 insertions(+), 96 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 09e7069..74996b8 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1228,13 +1228,11 @@ static int __split_huge_page_splitting(struct page *page,
 static void __split_huge_page_refcount(struct page *page)
 {
 	int i;
-	struct zone *zone = page_zone(page);
 	struct lruvec *lruvec;
 	int tail_count = 0;
 
 	/* prevent PageLRU to go away from under us, and freeze lru stats */
-	spin_lock_irq(&zone->lru_lock);
-	lruvec = page_lruvec(page);
+	lruvec = lock_page_lruvec_irq(page);
 	compound_lock(page);
 	/* complete memcg works before add pages to LRU */
 	mem_cgroup_split_huge_fixup(page);
@@ -1316,11 +1314,11 @@ static void __split_huge_page_refcount(struct page *page)
 	BUG_ON(atomic_read(&page->_count) <= 0);
 
 	__dec_zone_page_state(page, NR_ANON_TRANSPARENT_HUGEPAGES);
-	__mod_zone_page_state(zone, NR_ANON_PAGES, HPAGE_PMD_NR);
+	__mod_zone_page_state(lruvec_zone(lruvec), NR_ANON_PAGES, HPAGE_PMD_NR);
 
 	ClearPageCompound(page);
 	compound_unlock(page);
-	spin_unlock_irq(&zone->lru_lock);
+	unlock_lruvec_irq(lruvec);
 
 	for (i = 1; i < HPAGE_PMD_NR; i++) {
 		struct page *page_tail = page + i;
diff --git a/mm/internal.h b/mm/internal.h
index ef49dbf..9454752 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -13,6 +13,182 @@
 
 #include <linux/mm.h>
 
+static inline void lock_lruvec(struct lruvec *lruvec, unsigned long *flags)
+{
+	spin_lock_irqsave(&lruvec_zone(lruvec)->lru_lock, *flags);
+}
+
+static inline void lock_lruvec_irq(struct lruvec *lruvec)
+{
+	spin_lock_irq(&lruvec_zone(lruvec)->lru_lock);
+}
+
+static inline void unlock_lruvec(struct lruvec *lruvec, unsigned long *flags)
+{
+	spin_unlock_irqrestore(&lruvec_zone(lruvec)->lru_lock, *flags);
+}
+
+static inline void unlock_lruvec_irq(struct lruvec *lruvec)
+{
+	spin_unlock_irq(&lruvec_zone(lruvec)->lru_lock);
+}
+
+static inline void wait_lruvec_unlock(struct lruvec *lruvec)
+{
+	spin_unlock_wait(&lruvec_zone(lruvec)->lru_lock);
+}
+
+#ifdef CONFIG_CGROUP_MEM_RES_CTLR
+
+/* Dynamic page to lruvec mapping */
+
+/* Lock other lruvec for other page in the same zone */
+static inline struct lruvec *__relock_page_lruvec(struct lruvec *locked_lruvec,
+						  struct page *page)
+{
+	/* Currently only one lru_lock per-zone */
+	return page_lruvec(page);
+}
+
+static inline struct lruvec *relock_page_lruvec_irq(struct lruvec *lruvec,
+						    struct page *page)
+{
+	struct zone *zone = page_zone(page);
+
+	if (!lruvec) {
+		spin_lock_irq(&zone->lru_lock);
+	} else if (zone != lruvec_zone(lruvec)) {
+		unlock_lruvec_irq(lruvec);
+		spin_lock_irq(&zone->lru_lock);
+	}
+	return page_lruvec(page);
+}
+
+static inline struct lruvec *relock_page_lruvec(struct lruvec *lruvec,
+						struct page *page,
+						unsigned long *flags)
+{
+	struct zone *zone = page_zone(page);
+
+	if (!lruvec) {
+		spin_lock_irqsave(&zone->lru_lock, *flags);
+	} else if (zone != lruvec_zone(lruvec)) {
+		unlock_lruvec(lruvec, flags);
+		spin_lock_irqsave(&zone->lru_lock, *flags);
+	}
+	return page_lruvec(page);
+}
+
+/*
+ * Caller may not have stable reference to page.
+ * Page for next call must be from the same zone.
+ * Returns true if the page was successfully caught in the LRU.
+ */
+static inline bool __lock_page_lruvec_irq(struct lruvec **lruvec,
+					  struct page *page)
+{
+	struct zone *zone;
+	bool ret = false;
+
+	if (PageLRU(page)) {
+		if (!*lruvec) {
+			zone = page_zone(page);
+			spin_lock_irq(&zone->lru_lock);
+		} else
+			zone = lruvec_zone(*lruvec);
+
+		if (PageLRU(page)) {
+			*lruvec = page_lruvec(page);
+			ret = true;
+		} else
+			*lruvec = &zone->lruvec;
+	}
+
+	return ret;
+}
+
+/* Wait for lruvec unlock before locking other lruvec for the same page */
+static inline void __wait_lruvec_unlock(struct lruvec *lruvec)
+{
+	/* Currently only one lru_lock per-zone */
+}
+
+#else /* CONFIG_CGROUP_MEM_RES_CTLR */
+
+/* Fixed page to lruvec mapping */
+
+/* Lock lruvec for other page in the same zone */
+static inline struct lruvec *__relock_page_lruvec(struct lruvec *locked_lruvec,
+						  struct page *page)
+{
+	/* Currently only one lruvec per-zone */
+	return locked_lruvec;
+}
+
+static inline struct lruvec *relock_page_lruvec(struct lruvec *locked_lruvec,
+						struct page *page,
+						unsigned long *flags)
+{
+	struct lruvec *lruvec = page_lruvec(page);
+
+	if (!locked_lruvec) {
+		lock_lruvec(lruvec, flags);
+	} else if (locked_lruvec != lruvec) {
+		unlock_lruvec(locked_lruvec, flags);
+		lock_lruvec(lruvec, flags);
+	}
+
+	return lruvec;
+}
+
+static inline struct lruvec *relock_page_lruvec_irq(
+		struct lruvec *locked_lruvec, struct page *page)
+{
+	struct lruvec *lruvec = page_lruvec(page);
+
+	if (!locked_lruvec) {
+		lock_lruvec_irq(lruvec);
+	} else if (locked_lruvec != lruvec) {
+		unlock_lruvec_irq(locked_lruvec);
+		lock_lruvec_irq(lruvec);
+	}
+
+	return lruvec;
+}
+
+static inline bool __lock_page_lruvec_irq(struct lruvec **lruvec,
+					  struct page *page)
+{
+	bool ret = false;
+
+	if (PageLRU(page)) {
+		*lruvec = relock_page_lruvec_irq(*lruvec, page);
+		if (PageLRU(page))
+			ret = true;
+	}
+
+	return ret;
+}
+
+/* Wait for lruvec unlock before locking other lruvec for the same page */
+static inline void __wait_lruvec_unlock(struct lruvec *lruvec)
+{
+	/* Fixed page to lruvec mapping, there only one possible lruvec */
+}
+
+#endif /* CONFIG_CGROUP_MEM_RES_CTLR */
+
+static inline struct lruvec *lock_page_lruvec(struct page *page,
+					      unsigned long *flags)
+{
+	return relock_page_lruvec(NULL, page, flags);
+}
+
+static inline struct lruvec *lock_page_lruvec_irq(struct page *page)
+{
+	return relock_page_lruvec_irq(NULL, page);
+}
+
 void free_pgtables(struct mmu_gather *tlb, struct vm_area_struct *start_vma,
 		unsigned long floor, unsigned long ceiling);
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 83fa99b..aed1360 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3487,12 +3487,14 @@ static int mem_cgroup_force_empty_list(struct mem_cgroup *memcg,
 	struct list_head *list;
 	struct page *busy;
 	struct zone *zone;
+	struct lruvec *lruvec;
 	int ret = 0;
 
 	zone = &NODE_DATA(node)->node_zones[zid];
 	mz = mem_cgroup_zoneinfo(memcg, node, zid);
-	list = &mz->lruvec.pages_lru[lru];
-	loop = mz->lruvec.pages_count[lru];
+	lruvec = &mz->lruvec;
+	list = &lruvec->pages_lru[lru];
+	loop = lruvec->pages_count[lru];
 	/* give some margin against EBUSY etc...*/
 	loop += 256;
 	busy = NULL;
@@ -3501,19 +3503,19 @@ static int mem_cgroup_force_empty_list(struct mem_cgroup *memcg,
 		struct page *page;
 
 		ret = 0;
-		spin_lock_irqsave(&zone->lru_lock, flags);
+		lock_lruvec(lruvec, &flags);
 		if (list_empty(list)) {
-			spin_unlock_irqrestore(&zone->lru_lock, flags);
+			unlock_lruvec(lruvec, &flags);
 			break;
 		}
 		page = list_entry(list->prev, struct page, lru);
 		if (busy == page) {
 			list_move(&page->lru, list);
 			busy = NULL;
-			spin_unlock_irqrestore(&zone->lru_lock, flags);
+			unlock_lruvec(lruvec, &flags);
 			continue;
 		}
-		spin_unlock_irqrestore(&zone->lru_lock, flags);
+		unlock_lruvec(lruvec, &flags);
 
 		pc = lookup_page_cgroup(page);
 
diff --git a/mm/swap.c b/mm/swap.c
index f7b5896..3689e3d 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -48,15 +48,13 @@ static void __page_cache_release(struct page *page)
 {
 	if (PageLRU(page)) {
 		unsigned long flags;
-		struct zone *zone = page_zone(page);
 		struct lruvec *lruvec;
 
-		spin_lock_irqsave(&zone->lru_lock, flags);
+		lruvec = lock_page_lruvec(page, &flags);
 		VM_BUG_ON(!PageLRU(page));
 		__ClearPageLRU(page);
-		lruvec = page_lruvec(page);
 		del_page_from_lru_list(lruvec, page, page_off_lru(page));
-		spin_unlock_irqrestore(&zone->lru_lock, flags);
+		unlock_lruvec(lruvec, &flags);
 	}
 }
 
@@ -209,26 +207,17 @@ static void pagevec_lru_move_fn(struct pagevec *pvec,
 				void *arg)
 {
 	int i;
-	struct zone *zone = NULL;
+	struct lruvec *lruvec = NULL;
 	unsigned long flags = 0;
 
 	for (i = 0; i < pagevec_count(pvec); i++) {
 		struct page *page = pvec->pages[i];
-		struct zone *pagezone = page_zone(page);
-		struct lruvec *lruvec;
-
-		if (pagezone != zone) {
-			if (zone)
-				spin_unlock_irqrestore(&zone->lru_lock, flags);
-			zone = pagezone;
-			spin_lock_irqsave(&zone->lru_lock, flags);
-		}
 
-		lruvec = page_lruvec(page);
+		lruvec = relock_page_lruvec(lruvec, page, &flags);
 		(*move_fn)(lruvec, page, arg);
 	}
-	if (zone)
-		spin_unlock_irqrestore(&zone->lru_lock, flags);
+	if (lruvec)
+		unlock_lruvec(lruvec, &flags);
 	release_pages(pvec->pages, pvec->nr, pvec->cold);
 	pagevec_reinit(pvec);
 }
@@ -335,11 +324,11 @@ static inline void activate_page_drain(int cpu)
 
 void activate_page(struct page *page)
 {
-	struct zone *zone = page_zone(page);
+	struct lruvec *lruvec;
 
-	spin_lock_irq(&zone->lru_lock);
+	lruvec = lock_page_lruvec_irq(page);
 	__activate_page(page, NULL);
-	spin_unlock_irq(&zone->lru_lock);
+	unlock_lruvec_irq(lruvec);
 }
 #endif
 
@@ -404,15 +393,13 @@ void lru_cache_add_lru(struct page *page, enum lru_list lru)
  */
 void add_page_to_unevictable_list(struct page *page)
 {
-	struct zone *zone = page_zone(page);
 	struct lruvec *lruvec;
 
-	spin_lock_irq(&zone->lru_lock);
+	lruvec = lock_page_lruvec_irq(page);
 	SetPageUnevictable(page);
 	SetPageLRU(page);
-	lruvec = page_lruvec(page);
 	add_page_to_lru_list(lruvec, page, LRU_UNEVICTABLE);
-	spin_unlock_irq(&zone->lru_lock);
+	unlock_lruvec_irq(lruvec);
 }
 
 /*
@@ -577,16 +564,16 @@ void release_pages(struct page **pages, int nr, int cold)
 {
 	int i;
 	LIST_HEAD(pages_to_free);
-	struct zone *zone = NULL;
+	struct lruvec *lruvec = NULL;
 	unsigned long uninitialized_var(flags);
 
 	for (i = 0; i < nr; i++) {
 		struct page *page = pages[i];
 
 		if (unlikely(PageCompound(page))) {
-			if (zone) {
-				spin_unlock_irqrestore(&zone->lru_lock, flags);
-				zone = NULL;
+			if (lruvec) {
+				unlock_lruvec(lruvec, &flags);
+				lruvec = NULL;
 			}
 			put_compound_page(page);
 			continue;
@@ -596,16 +583,7 @@ void release_pages(struct page **pages, int nr, int cold)
 			continue;
 
 		if (PageLRU(page)) {
-			struct zone *pagezone = page_zone(page);
-			struct lruvec *lruvec = page_lruvec(page);
-
-			if (pagezone != zone) {
-				if (zone)
-					spin_unlock_irqrestore(&zone->lru_lock,
-									flags);
-				zone = pagezone;
-				spin_lock_irqsave(&zone->lru_lock, flags);
-			}
+			lruvec = relock_page_lruvec(lruvec, page, &flags);
 			VM_BUG_ON(!PageLRU(page));
 			__ClearPageLRU(page);
 			del_page_from_lru_list(lruvec, page, page_off_lru(page));
@@ -613,8 +591,8 @@ void release_pages(struct page **pages, int nr, int cold)
 
 		list_add(&page->lru, &pages_to_free);
 	}
-	if (zone)
-		spin_unlock_irqrestore(&zone->lru_lock, flags);
+	if (lruvec)
+		unlock_lruvec(lruvec, &flags);
 
 	free_hot_cold_page_list(&pages_to_free, cold);
 }
diff --git a/mm/vmscan.c b/mm/vmscan.c
index ebb5d99..a3941d1 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1285,19 +1285,17 @@ int isolate_lru_page(struct page *page)
 	VM_BUG_ON(!page_count(page));
 
 	if (PageLRU(page)) {
-		struct zone *zone = page_zone(page);
 		struct lruvec *lruvec;
 
-		spin_lock_irq(&zone->lru_lock);
+		lruvec = lock_page_lruvec_irq(page);
 		if (PageLRU(page)) {
 			int lru = page_lru(page);
 			ret = 0;
 			get_page(page);
 			ClearPageLRU(page);
-			lruvec = page_lruvec(page);
 			del_page_from_lru_list(lruvec, page, lru);
 		}
-		spin_unlock_irq(&zone->lru_lock);
+		unlock_lruvec_irq(lruvec);
 	}
 	return ret;
 }
@@ -1332,7 +1330,6 @@ putback_inactive_pages(struct lruvec *lruvec,
 		       struct list_head *page_list)
 {
 	struct zone_reclaim_stat *reclaim_stat = &lruvec->reclaim_stat;
-	struct zone *zone = lruvec_zone(lruvec);
 	LIST_HEAD(pages_to_free);
 
 	/*
@@ -1340,15 +1337,14 @@ putback_inactive_pages(struct lruvec *lruvec,
 	 */
 	while (!list_empty(page_list)) {
 		struct page *page = lru_to_page(page_list);
-		struct lruvec *lruvec;
 		int lru;
 
 		VM_BUG_ON(PageLRU(page));
 		list_del(&page->lru);
 		if (unlikely(!page_evictable(page, NULL))) {
-			spin_unlock_irq(&zone->lru_lock);
+			unlock_lruvec_irq(lruvec);
 			putback_lru_page(page);
-			spin_lock_irq(&zone->lru_lock);
+			lock_lruvec_irq(lruvec);
 			continue;
 		}
 		SetPageLRU(page);
@@ -1367,9 +1363,9 @@ putback_inactive_pages(struct lruvec *lruvec,
 			del_page_from_lru_list(lruvec, page, lru);
 
 			if (unlikely(PageCompound(page))) {
-				spin_unlock_irq(&zone->lru_lock);
+				unlock_lruvec_irq(lruvec);
 				(*get_compound_page_dtor(page))(page);
-				spin_lock_irq(&zone->lru_lock);
+				lock_lruvec_irq(lruvec);
 			} else
 				list_add(&page->lru, &pages_to_free);
 		}
@@ -1505,7 +1501,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
 	if (!sc->may_writepage)
 		isolate_mode |= ISOLATE_CLEAN;
 
-	spin_lock_irq(&zone->lru_lock);
+	lock_lruvec_irq(lruvec);
 
 	nr_taken = isolate_lru_pages(nr_to_scan, lruvec, &page_list,
 				     &nr_scanned, sc, isolate_mode, 0, file);
@@ -1521,7 +1517,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
 	}
 
 	if (nr_taken == 0) {
-		spin_unlock_irq(&zone->lru_lock);
+		unlock_lruvec_irq(lruvec);
 		return 0;
 	}
 
@@ -1530,7 +1526,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
 	__mod_zone_page_state(zone, NR_ISOLATED_ANON, nr_anon);
 	__mod_zone_page_state(zone, NR_ISOLATED_FILE, nr_file);
 
-	spin_unlock_irq(&zone->lru_lock);
+	unlock_lruvec_irq(lruvec);
 
 	nr_reclaimed = shrink_page_list(&page_list, lruvec, sc, priority,
 						&nr_dirty, &nr_writeback);
@@ -1542,7 +1538,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
 					priority, &nr_dirty, &nr_writeback);
 	}
 
-	spin_lock_irq(&zone->lru_lock);
+	lock_lruvec_irq(lruvec);
 
 	if (current_is_kswapd())
 		__count_vm_events(KSWAPD_STEAL, nr_reclaimed);
@@ -1553,7 +1549,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
 	__mod_zone_page_state(zone, NR_ISOLATED_ANON, -nr_anon);
 	__mod_zone_page_state(zone, NR_ISOLATED_FILE, -nr_file);
 
-	spin_unlock_irq(&zone->lru_lock);
+	unlock_lruvec_irq(lruvec);
 
 	free_hot_cold_page_list(&page_list, 1);
 
@@ -1609,7 +1605,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
  * But we had to alter page->flags anyway.
  */
 
-static void move_active_pages_to_lru(struct zone *zone,
+static void move_active_pages_to_lru(struct lruvec *lruvec,
 				     struct list_head *list,
 				     struct list_head *pages_to_free,
 				     enum lru_list lru)
@@ -1618,7 +1614,7 @@ static void move_active_pages_to_lru(struct zone *zone,
 	struct page *page;
 
 	if (buffer_heads_over_limit) {
-		spin_unlock_irq(&zone->lru_lock);
+		unlock_lruvec_irq(lruvec);
 		list_for_each_entry(page, list, lru) {
 			if (page_has_private(page) && trylock_page(page)) {
 				if (page_has_private(page))
@@ -1626,11 +1622,10 @@ static void move_active_pages_to_lru(struct zone *zone,
 				unlock_page(page);
 			}
 		}
-		spin_lock_irq(&zone->lru_lock);
+		lock_lruvec_irq(lruvec);
 	}
 
 	while (!list_empty(list)) {
-		struct lruvec *lruvec;
 		int numpages;
 
 		page = lru_to_page(list);
@@ -1650,14 +1645,14 @@ static void move_active_pages_to_lru(struct zone *zone,
 			del_page_from_lru_list(lruvec, page, lru);
 
 			if (unlikely(PageCompound(page))) {
-				spin_unlock_irq(&zone->lru_lock);
+				unlock_lruvec_irq(lruvec);
 				(*get_compound_page_dtor(page))(page);
-				spin_lock_irq(&zone->lru_lock);
+				lock_lruvec_irq(lruvec);
 			} else
 				list_add(&page->lru, pages_to_free);
 		}
 	}
-	__mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);
+	__mod_zone_page_state(lruvec_zone(lruvec), NR_LRU_BASE + lru, pgmoved);
 	if (!is_active_lru(lru))
 		__count_vm_events(PGDEACTIVATE, pgmoved);
 }
@@ -1686,7 +1681,7 @@ static void shrink_active_list(unsigned long nr_to_scan,
 	if (!sc->may_writepage)
 		isolate_mode |= ISOLATE_CLEAN;
 
-	spin_lock_irq(&zone->lru_lock);
+	lock_lruvec_irq(lruvec);
 
 	nr_taken = isolate_lru_pages(nr_to_scan, lruvec, &l_hold, &nr_scanned,
 				     sc, isolate_mode, 1, file);
@@ -1702,7 +1697,8 @@ static void shrink_active_list(unsigned long nr_to_scan,
 	else
 		__mod_zone_page_state(zone, NR_ACTIVE_ANON, -nr_taken);
 	__mod_zone_page_state(zone, NR_ISOLATED_ANON + file, nr_taken);
-	spin_unlock_irq(&zone->lru_lock);
+
+	unlock_lruvec_irq(lruvec);
 
 	while (!list_empty(&l_hold)) {
 		cond_resched();
@@ -1739,7 +1735,7 @@ static void shrink_active_list(unsigned long nr_to_scan,
 	/*
 	 * Move pages back to the lru list.
 	 */
-	spin_lock_irq(&zone->lru_lock);
+	lock_lruvec_irq(lruvec);
 	/*
 	 * Count referenced pages from currently used mappings as rotated,
 	 * even though only some of them are actually re-activated.  This
@@ -1748,12 +1744,12 @@ static void shrink_active_list(unsigned long nr_to_scan,
 	 */
 	reclaim_stat->recent_rotated[file] += nr_rotated;
 
-	move_active_pages_to_lru(zone, &l_active, &l_hold,
+	move_active_pages_to_lru(lruvec, &l_active, &l_hold,
 						LRU_ACTIVE + file * LRU_FILE);
-	move_active_pages_to_lru(zone, &l_inactive, &l_hold,
+	move_active_pages_to_lru(lruvec, &l_inactive, &l_hold,
 						LRU_BASE   + file * LRU_FILE);
 	__mod_zone_page_state(zone, NR_ISOLATED_ANON + file, -nr_taken);
-	spin_unlock_irq(&zone->lru_lock);
+	unlock_lruvec_irq(lruvec);
 
 	free_hot_cold_page_list(&l_hold, 1);
 }
@@ -1952,7 +1948,7 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
 	 *
 	 * anon in [0], file in [1]
 	 */
-	spin_lock_irq(&zone->lru_lock);
+	lock_lruvec_irq(lruvec);
 	if (unlikely(reclaim_stat->recent_scanned[0] > anon / 4)) {
 		reclaim_stat->recent_scanned[0] /= 2;
 		reclaim_stat->recent_rotated[0] /= 2;
@@ -1973,7 +1969,7 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
 
 	fp = (file_prio + 1) * (reclaim_stat->recent_scanned[1] + 1);
 	fp /= reclaim_stat->recent_rotated[1] + 1;
-	spin_unlock_irq(&zone->lru_lock);
+	unlock_lruvec_irq(lruvec);
 
 	fraction[0] = ap;
 	fraction[1] = fp;
@@ -3518,24 +3514,16 @@ int page_evictable(struct page *page, struct vm_area_struct *vma)
  */
 void check_move_unevictable_pages(struct page **pages, int nr_pages)
 {
-	struct lruvec *lruvec;
-	struct zone *zone = NULL;
+	struct lruvec *lruvec = NULL;
 	int pgscanned = 0;
 	int pgrescued = 0;
 	int i;
 
 	for (i = 0; i < nr_pages; i++) {
 		struct page *page = pages[i];
-		struct zone *pagezone;
 
 		pgscanned++;
-		pagezone = page_zone(page);
-		if (pagezone != zone) {
-			if (zone)
-				spin_unlock_irq(&zone->lru_lock);
-			zone = pagezone;
-			spin_lock_irq(&zone->lru_lock);
-		}
+		lruvec = relock_page_lruvec_irq(lruvec, page);
 
 		if (!PageLRU(page) || !PageUnevictable(page))
 			continue;
@@ -3545,21 +3533,20 @@ void check_move_unevictable_pages(struct page **pages, int nr_pages)
 
 			VM_BUG_ON(PageActive(page));
 			ClearPageUnevictable(page);
-			__dec_zone_state(zone, NR_UNEVICTABLE);
-			lruvec = page_lruvec(page);
+			__dec_zone_state(lruvec_zone(lruvec), NR_UNEVICTABLE);
 			lruvec->pages_count[LRU_UNEVICTABLE]--;
 			VM_BUG_ON((long)lruvec->pages_count[LRU_UNEVICTABLE] < 0);
 			lruvec->pages_count[lru]++;
 			list_move(&page->lru, &lruvec->pages_lru[lru]);
-			__inc_zone_state(zone, NR_INACTIVE_ANON + lru);
+			__inc_zone_state(lruvec_zone(lruvec), NR_INACTIVE_ANON + lru);
 			pgrescued++;
 		}
 	}
 
-	if (zone) {
+	if (lruvec) {
 		__count_vm_events(UNEVICTABLE_PGRESCUED, pgrescued);
 		__count_vm_events(UNEVICTABLE_PGSCANNED, pgscanned);
-		spin_unlock_irq(&zone->lru_lock);
+		unlock_lruvec_irq(lruvec);
 	}
 }
 #endif /* CONFIG_SHMEM */


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v3 15/21] mm: handle lruvec relocks on lumpy reclaim
  2012-02-23 13:51 [PATCH v3 00/21] mm: lru_lock splitting Konstantin Khlebnikov
                   ` (13 preceding siblings ...)
  2012-02-23 13:52 ` [PATCH v3 14/21] mm: introduce lruvec locking primitives Konstantin Khlebnikov
@ 2012-02-23 13:52 ` Konstantin Khlebnikov
  2012-02-28  1:01   ` KAMEZAWA Hiroyuki
  2012-02-23 13:52 ` [PATCH v3 16/21] mm: handle lruvec relocks in compaction Konstantin Khlebnikov
                   ` (8 subsequent siblings)
  23 siblings, 1 reply; 65+ messages in thread
From: Konstantin Khlebnikov @ 2012-02-23 13:52 UTC (permalink / raw)
  To: Hugh Dickins, linux-kernel, linux-mm, Johannes Weiner,
	Andrew Morton, KAMEZAWA Hiroyuki
  Cc: Andi Kleen

Prepare for lock splitting in the lumpy reclaim logic.
Now move_active_pages_to_lru() and putback_inactive_pages()
can put pages into different lruvecs (see the sketch below the list).

* relock the lruvec before SetPageLRU()
* update the reclaim_stat pointer after relocks
* return the currently locked lruvec
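
A minimal userspace sketch of that pattern (not kernel code: the struct fields,
pthread spinlocks and helper names below are stand-ins for the real lruvec,
lru_lock and the locking primitives from patch 14):

#include <pthread.h>

struct lruvec {
	pthread_spinlock_t lru_lock;
	/* ... lists, counters ... */
};

struct page {
	struct lruvec *lruvec;		/* stand-in for page_lruvec(page) */
	struct page *next;
};

/* Drop the held lock and take the lock of the page's own lruvec, if different. */
static struct lruvec *relock_page_lruvec(struct lruvec *locked, struct page *page)
{
	struct lruvec *lruvec = page->lruvec;

	if (lruvec != locked) {
		pthread_spin_unlock(&locked->lru_lock);
		pthread_spin_lock(&lruvec->lru_lock);
	}
	return lruvec;
}

/* Caller holds lruvec->lru_lock; returns whichever lruvec is locked at the end. */
static struct lruvec *putback_pages(struct lruvec *lruvec, struct page *list)
{
	for (struct page *page = list; page; page = page->next) {
		lruvec = relock_page_lruvec(lruvec, page);
		/* ... SetPageLRU(), add_page_to_lru_list(), reclaim_stat ... */
	}
	return lruvec;
}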

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
---
 mm/vmscan.c |   45 +++++++++++++++++++++++++++++++++------------
 1 files changed, 33 insertions(+), 12 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index a3941d1..6eeeb4b 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1114,6 +1114,7 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
 		unsigned long *nr_scanned, struct scan_control *sc,
 		isolate_mode_t mode, int active, int file)
 {
+	struct lruvec *cursor_lruvec = lruvec;
 	struct list_head *src;
 	unsigned long nr_taken = 0;
 	unsigned long nr_lumpy_taken = 0;
@@ -1197,14 +1198,17 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
 			    !PageSwapCache(cursor_page))
 				break;
 
+			/* Switch cursor_lruvec lock for lumpy isolate */
+			if (!__lock_page_lruvec_irq(&cursor_lruvec,
+						    cursor_page))
+				continue;
+
 			if (__isolate_lru_page(cursor_page, mode, file) == 0) {
 				unsigned int isolated_pages;
-				struct lruvec *cursor_lruvec;
 				int cursor_lru = page_lru(cursor_page);
 
 				list_move(&cursor_page->lru, dst);
 				isolated_pages = hpage_nr_pages(cursor_page);
-				cursor_lruvec = page_lruvec(cursor_page);
 				cursor_lruvec->pages_count[cursor_lru] -=
 								isolated_pages;
 				VM_BUG_ON((long)cursor_lruvec->
@@ -1235,6 +1239,9 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
 			}
 		}
 
+		/* Restore original lruvec lock */
+		cursor_lruvec = __relock_page_lruvec(cursor_lruvec, page);
+
 		/* If we break out of the loop above, lumpy reclaim failed */
 		if (pfn < end_pfn)
 			nr_lumpy_failed++;
@@ -1325,7 +1332,10 @@ static int too_many_isolated(struct zone *zone, int file,
 	return isolated > inactive;
 }
 
-static noinline_for_stack void
+/*
+ * Returns currently locked lruvec
+ */
+static noinline_for_stack struct lruvec *
 putback_inactive_pages(struct lruvec *lruvec,
 		       struct list_head *page_list)
 {
@@ -1347,10 +1357,13 @@ putback_inactive_pages(struct lruvec *lruvec,
 			lock_lruvec_irq(lruvec);
 			continue;
 		}
+
+		lruvec = __relock_page_lruvec(lruvec, page);
+		reclaim_stat = &lruvec->reclaim_stat;
+
 		SetPageLRU(page);
 		lru = page_lru(page);
 
-		lruvec = page_lruvec(page);
 		add_page_to_lru_list(lruvec, page, lru);
 		if (is_active_lru(lru)) {
 			int file = is_file_lru(lru);
@@ -1375,6 +1388,8 @@ putback_inactive_pages(struct lruvec *lruvec,
 	 * To save our caller's stack, now use input list for pages to free.
 	 */
 	list_splice(&pages_to_free, page_list);
+
+	return lruvec;
 }
 
 static noinline_for_stack void
@@ -1544,7 +1559,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
 		__count_vm_events(KSWAPD_STEAL, nr_reclaimed);
 	__count_zone_vm_events(PGSTEAL, zone, nr_reclaimed);
 
-	putback_inactive_pages(lruvec, &page_list);
+	lruvec = putback_inactive_pages(lruvec, &page_list);
 
 	__mod_zone_page_state(zone, NR_ISOLATED_ANON, -nr_anon);
 	__mod_zone_page_state(zone, NR_ISOLATED_FILE, -nr_file);
@@ -1603,12 +1618,15 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
  *
  * The downside is that we have to touch page->_count against each page.
  * But we had to alter page->flags anyway.
+ *
+ * Returns currently locked lruvec
  */
 
-static void move_active_pages_to_lru(struct lruvec *lruvec,
-				     struct list_head *list,
-				     struct list_head *pages_to_free,
-				     enum lru_list lru)
+static struct lruvec *
+move_active_pages_to_lru(struct lruvec *lruvec,
+			 struct list_head *list,
+			 struct list_head *pages_to_free,
+			 enum lru_list lru)
 {
 	unsigned long pgmoved = 0;
 	struct page *page;
@@ -1630,10 +1648,11 @@ static void move_active_pages_to_lru(struct lruvec *lruvec,
 
 		page = lru_to_page(list);
 
+		lruvec = __relock_page_lruvec(lruvec, page);
+
 		VM_BUG_ON(PageLRU(page));
 		SetPageLRU(page);
 
-		lruvec = page_lruvec(page);
 		list_move(&page->lru, &lruvec->pages_lru[lru]);
 		numpages = hpage_nr_pages(page);
 		lruvec->pages_count[lru] += numpages;
@@ -1655,6 +1674,8 @@ static void move_active_pages_to_lru(struct lruvec *lruvec,
 	__mod_zone_page_state(lruvec_zone(lruvec), NR_LRU_BASE + lru, pgmoved);
 	if (!is_active_lru(lru))
 		__count_vm_events(PGDEACTIVATE, pgmoved);
+
+	return lruvec;
 }
 
 static void shrink_active_list(unsigned long nr_to_scan,
@@ -1744,9 +1765,9 @@ static void shrink_active_list(unsigned long nr_to_scan,
 	 */
 	reclaim_stat->recent_rotated[file] += nr_rotated;
 
-	move_active_pages_to_lru(lruvec, &l_active, &l_hold,
+	lruvec = move_active_pages_to_lru(lruvec, &l_active, &l_hold,
 						LRU_ACTIVE + file * LRU_FILE);
-	move_active_pages_to_lru(lruvec, &l_inactive, &l_hold,
+	lruvec = move_active_pages_to_lru(lruvec, &l_inactive, &l_hold,
 						LRU_BASE   + file * LRU_FILE);
 	__mod_zone_page_state(zone, NR_ISOLATED_ANON + file, -nr_taken);
 	unlock_lruvec_irq(lruvec);


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v3 16/21] mm: handle lruvec relocks in compaction
  2012-02-23 13:51 [PATCH v3 00/21] mm: lru_lock splitting Konstantin Khlebnikov
                   ` (14 preceding siblings ...)
  2012-02-23 13:52 ` [PATCH v3 15/21] mm: handle lruvec relocks on lumpy reclaim Konstantin Khlebnikov
@ 2012-02-23 13:52 ` Konstantin Khlebnikov
  2012-02-28  1:13   ` KAMEZAWA Hiroyuki
  2012-02-23 13:53 ` [PATCH v3 17/21] mm: handle lruvec relock in memory controller Konstantin Khlebnikov
                   ` (7 subsequent siblings)
  23 siblings, 1 reply; 65+ messages in thread
From: Konstantin Khlebnikov @ 2012-02-23 13:52 UTC (permalink / raw)
  To: Hugh Dickins, linux-kernel, linux-mm, Johannes Weiner,
	Andrew Morton, KAMEZAWA Hiroyuki
  Cc: Andi Kleen

Prepare for lru_lock splitting in the memory compaction code
(the resulting scan-loop shape is sketched below).

* disable irqs in acct_isolated() for __mod_zone_page_state();
  the lru_lock isn't required there.
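
As a rough illustration only, the migrate scanner now has the usual "drop the
lock periodically" shape; a hedged userspace sketch (invented names, a pthread
spinlock standing in for lru_lock, SCAN_BATCH standing in for SWAP_CLUSTER_MAX):

#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define SCAN_BATCH	32		/* stand-in for SWAP_CLUSTER_MAX */

struct lruvec {
	pthread_spinlock_t lru_lock;
};

static bool need_break(void)		/* stand-in for need_resched()/contention */
{
	return false;
}

static void scan_range(struct lruvec *(*lookup)(unsigned long pfn),
		       unsigned long start, unsigned long end)
{
	struct lruvec *lruvec = NULL;

	for (unsigned long pfn = start; pfn < end; pfn++) {
		/* give others a chance: drop the lock every SCAN_BATCH pages */
		if (lruvec && (!((pfn + 1) % SCAN_BATCH) || need_break())) {
			pthread_spin_unlock(&lruvec->lru_lock);
			lruvec = NULL;
		}
		if (!lruvec) {
			lruvec = lookup(pfn);
			pthread_spin_lock(&lruvec->lru_lock);
		}
		/* ... isolate the page at pfn under lruvec->lru_lock ... */
	}
	if (lruvec)
		pthread_spin_unlock(&lruvec->lru_lock);
}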

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
---
 mm/compaction.c |   30 ++++++++++++++++--------------
 1 files changed, 16 insertions(+), 14 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index a976b28..54340e4 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -224,8 +224,10 @@ static void acct_isolated(struct zone *zone, struct compact_control *cc)
 	list_for_each_entry(page, &cc->migratepages, lru)
 		count[!!page_is_file_cache(page)]++;
 
+	local_irq_disable();
 	__mod_zone_page_state(zone, NR_ISOLATED_ANON, count[0]);
 	__mod_zone_page_state(zone, NR_ISOLATED_FILE, count[1]);
+	local_irq_enable();
 }
 
 /* Similar to reclaim, but different enough that they don't share logic */
@@ -262,7 +264,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
 	unsigned long nr_scanned = 0, nr_isolated = 0;
 	struct list_head *migratelist = &cc->migratepages;
 	isolate_mode_t mode = ISOLATE_ACTIVE|ISOLATE_INACTIVE;
-	struct lruvec *lruvec;
+	struct lruvec *lruvec = NULL;
 
 	/* Do not scan outside zone boundaries */
 	low_pfn = max(cc->migrate_pfn, zone->zone_start_pfn);
@@ -294,25 +296,24 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
 
 	/* Time to isolate some pages for migration */
 	cond_resched();
-	spin_lock_irq(&zone->lru_lock);
 	for (; low_pfn < end_pfn; low_pfn++) {
 		struct page *page;
-		bool locked = true;
 
 		/* give a chance to irqs before checking need_resched() */
 		if (!((low_pfn+1) % SWAP_CLUSTER_MAX)) {
-			spin_unlock_irq(&zone->lru_lock);
-			locked = false;
+			if (lruvec)
+				unlock_lruvec_irq(lruvec);
+			lruvec = NULL;
 		}
-		if (need_resched() || spin_is_contended(&zone->lru_lock)) {
-			if (locked)
-				spin_unlock_irq(&zone->lru_lock);
+		if (need_resched() ||
+		    (lruvec && spin_is_contended(&zone->lru_lock))) {
+			if (lruvec)
+				unlock_lruvec_irq(lruvec);
+			lruvec = NULL;
 			cond_resched();
-			spin_lock_irq(&zone->lru_lock);
 			if (fatal_signal_pending(current))
 				break;
-		} else if (!locked)
-			spin_lock_irq(&zone->lru_lock);
+		}
 
 		/*
 		 * migrate_pfn does not necessarily start aligned to a
@@ -359,7 +360,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
 			continue;
 		}
 
-		if (!PageLRU(page))
+		if (!__lock_page_lruvec_irq(&lruvec, page))
 			continue;
 
 		/*
@@ -382,7 +383,6 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
 		VM_BUG_ON(PageTransCompound(page));
 
 		/* Successfully isolated */
-		lruvec = page_lruvec(page);
 		del_page_from_lru_list(lruvec, page, page_lru(page));
 		list_add(&page->lru, migratelist);
 		cc->nr_migratepages++;
@@ -395,9 +395,11 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
 		}
 	}
 
+	if (lruvec)
+		unlock_lruvec_irq(lruvec);
+
 	acct_isolated(zone, cc);
 
-	spin_unlock_irq(&zone->lru_lock);
 	cc->migrate_pfn = low_pfn;
 
 	trace_mm_compaction_isolate_migratepages(nr_scanned, nr_isolated);


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v3 17/21] mm: handle lruvec relock in memory controller
  2012-02-23 13:51 [PATCH v3 00/21] mm: lru_lock splitting Konstantin Khlebnikov
                   ` (15 preceding siblings ...)
  2012-02-23 13:52 ` [PATCH v3 16/21] mm: handle lruvec relocks in compaction Konstantin Khlebnikov
@ 2012-02-23 13:53 ` Konstantin Khlebnikov
  2012-02-28  1:22   ` KAMEZAWA Hiroyuki
  2012-02-23 13:53 ` [PATCH v3 18/21] mm: add to lruvec isolated pages counters Konstantin Khlebnikov
                   ` (6 subsequent siblings)
  23 siblings, 1 reply; 65+ messages in thread
From: Konstantin Khlebnikov @ 2012-02-23 13:53 UTC (permalink / raw)
  To: Hugh Dickins, linux-kernel, linux-mm, Johannes Weiner,
	Andrew Morton, KAMEZAWA Hiroyuki
  Cc: Andi Kleen

Carefully relock the lruvec lru_lock when changing a page's memory cgroup.

* In free_mem_cgroup_per_zone_info(), wait for lruvec lock release.
  The locking primitives keep referencing the lruvec pointer while the lock
  is held, so the lruvec must not be freed before that (see the sketch below).
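
The wait matters because another CPU may still be inside the locking primitives
with a pointer to this lruvec. A hedged userspace model of the idea (a pthread
spinlock plays the role of lru_lock; pthreads have no spin_unlock_wait(), so a
lock+unlock pair gives the same guarantee here):

#include <pthread.h>
#include <stdlib.h>

struct lruvec {
	pthread_spinlock_t lru_lock;
	/* ... */
};

static void wait_lruvec_unlock(struct lruvec *lruvec)
{
	pthread_spin_lock(&lruvec->lru_lock);
	pthread_spin_unlock(&lruvec->lru_lock);
}

static void free_lruvecs(struct lruvec *vecs, int nr)
{
	for (int i = 0; i < nr; i++)
		wait_lruvec_unlock(&vecs[i]);
	free(vecs);	/* earlier holders have released the lock by now */
}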

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
---
 mm/memcontrol.c |   19 +++++++++++++------
 1 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index aed1360..230f434 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2542,7 +2542,6 @@ __mem_cgroup_commit_charge_lrucare(struct page *page, struct mem_cgroup *memcg,
 					enum charge_type ctype)
 {
 	struct page_cgroup *pc = lookup_page_cgroup(page);
-	struct zone *zone = page_zone(page);
 	struct lruvec *lruvec;
 	unsigned long flags;
 	bool removed = false;
@@ -2552,20 +2551,19 @@ __mem_cgroup_commit_charge_lrucare(struct page *page, struct mem_cgroup *memcg,
 	 * is already on LRU. It means the page may on some other page_cgroup's
 	 * LRU. Take care of it.
 	 */
-	spin_lock_irqsave(&zone->lru_lock, flags);
+	lruvec = lock_page_lruvec(page, &flags);
 	if (PageLRU(page)) {
-		lruvec = page_lruvec(page);
 		del_page_from_lru_list(lruvec, page, page_lru(page));
 		ClearPageLRU(page);
 		removed = true;
 	}
 	__mem_cgroup_commit_charge(memcg, page, 1, pc, ctype);
 	if (removed) {
-		lruvec = page_lruvec(page);
+		lruvec = __relock_page_lruvec(lruvec, page);
 		add_page_to_lru_list(lruvec, page, page_lru(page));
 		SetPageLRU(page);
 	}
-	spin_unlock_irqrestore(&zone->lru_lock, flags);
+	unlock_lruvec(lruvec, &flags);
 }
 
 int mem_cgroup_cache_charge(struct page *page, struct mm_struct *mm,
@@ -4624,7 +4622,16 @@ static int alloc_mem_cgroup_per_zone_info(struct mem_cgroup *memcg, int node)
 
 static void free_mem_cgroup_per_zone_info(struct mem_cgroup *memcg, int node)
 {
-	kfree(memcg->info.nodeinfo[node]);
+	struct mem_cgroup_per_node *pn = memcg->info.nodeinfo[node];
+	int zone;
+
+	if (!pn)
+		return;
+
+	for (zone = 0; zone < MAX_NR_ZONES; zone++)
+		wait_lruvec_unlock(&pn->zoneinfo[zone].lruvec);
+
+	kfree(pn);
 }
 
 static struct mem_cgroup *mem_cgroup_alloc(void)


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v3 18/21] mm: add to lruvec isolated pages counters
  2012-02-23 13:51 [PATCH v3 00/21] mm: lru_lock splitting Konstantin Khlebnikov
                   ` (16 preceding siblings ...)
  2012-02-23 13:53 ` [PATCH v3 17/21] mm: handle lruvec relock in memory controller Konstantin Khlebnikov
@ 2012-02-23 13:53 ` Konstantin Khlebnikov
  2012-02-24  5:32   ` Konstantin Khlebnikov
  2012-02-28  1:38   ` KAMEZAWA Hiroyuki
  2012-02-23 13:53 ` [PATCH v3 19/21] memcg: check lru vectors emptiness in pre-destroy Konstantin Khlebnikov
                   ` (5 subsequent siblings)
  23 siblings, 2 replies; 65+ messages in thread
From: Konstantin Khlebnikov @ 2012-02-23 13:53 UTC (permalink / raw)
  To: Hugh Dickins, linux-kernel, linux-mm, Johannes Weiner,
	Andrew Morton, KAMEZAWA Hiroyuki
  Cc: Andi Kleen

This patch adds a counter of isolated pages into struct lruvec.
It is required to keep the lruvec alive until the isolated pages are put back.
We cannot rely on the resource counter in the memory controller, because it
does not account uncharged memory. And it is much better to have a common
engine for dynamic lruvec management than to tie all the logic to memory
cgroup magic. Plus, this is useful information for memory reclaimer balancing.

This patch also adds per-cpu page-vectors for putting isolated pages back,
and the function add_page_to_evictable_list(). It is similar to
lru_cache_add_lru(), but it reuses the page reference from the caller and can
adjust the isolated pages counters. There is also a new function,
free_isolated_page_list(), which is used at the end of shrink_page_list() for
freeing pages and adjusting the counters of isolated pages.

The memory cgroup code can shuffle pages between lruvecs without isolation if
a page is already isolated by someone else. Thus a page's lruvec reference is
unstable even if the page is isolated. It is stable only under the lru_lock or
if the page's reference count is zero. That's why we must always recheck
page_lruvec() even on non-lumpy 0-order reclaim, where all pages are isolated
from one lruvec.

When moving a page between cgroups, the memory controller now adjusts the
isolated pages counter of the old lruvec before inserting the page into the
new lruvec. Locking lruvec->lru_lock in mem_cgroup_adjust_isolated() also
effectively stabilizes the PageLRU() sign, so nobody will see PageLRU() under
the old lru_lock while the page has already moved into another lruvec.
A model of the counter bookkeeping is sketched below.

[ BTW, all lru-id arithmetic can be simplified if we divide the unevictable
  list into file and anon parts. After that we can swap bits in page->flags
  and calculate the lru-id with a single instruction ]
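
To make the bookkeeping concrete, a hedged userspace model of how the new
LRU_ISOLATED_* counters are meant to move in lockstep with the per-list
counters (enum values and helpers copied in simplified form, no locking shown):

enum { LRU_INACTIVE_ANON, LRU_ACTIVE_ANON, LRU_INACTIVE_FILE,
       LRU_ACTIVE_FILE, LRU_UNEVICTABLE, NR_LRU_LISTS,
       LRU_ISOLATED_ANON = NR_LRU_LISTS, LRU_ISOLATED_FILE,
       NR_LRU_COUNTERS };

struct lruvec {
	unsigned long pages_count[NR_LRU_COUNTERS];
};

static int is_file_lru(int lru)
{
	return lru == LRU_INACTIVE_FILE || lru == LRU_ACTIVE_FILE;
}

/* under lru_lock: pages leave an evictable list and become isolated */
static void account_isolate(struct lruvec *lruvec, int lru, unsigned long nr)
{
	lruvec->pages_count[lru] -= nr;
	lruvec->pages_count[LRU_ISOLATED_ANON + is_file_lru(lru)] += nr;
}

/* under lru_lock: isolated pages go back onto an lru list */
static void account_putback(struct lruvec *lruvec, int lru, unsigned long nr)
{
	lruvec->pages_count[LRU_ISOLATED_ANON + is_file_lru(lru)] -= nr;
	lruvec->pages_count[lru] += nr;
}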

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
---
 include/linux/mmzone.h |   11 ++++-
 include/linux/swap.h   |    4 +-
 mm/compaction.c        |    1 
 mm/huge_memory.c       |    4 ++
 mm/internal.h          |    6 ++
 mm/ksm.c               |    2 -
 mm/memcontrol.c        |   39 ++++++++++++++--
 mm/migrate.c           |    2 -
 mm/rmap.c              |    2 -
 mm/swap.c              |   76 +++++++++++++++++++++++++++++++
 mm/vmscan.c            |  116 +++++++++++++++++++++++++++++++++++-------------
 11 files changed, 218 insertions(+), 45 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 82d5ff3..2e3a298 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -137,13 +137,20 @@ enum lru_list {
 	LRU_INACTIVE_FILE = LRU_BASE + LRU_FILE,
 	LRU_ACTIVE_FILE = LRU_BASE + LRU_FILE + LRU_ACTIVE,
 	LRU_UNEVICTABLE,
-	NR_LRU_LISTS
+	NR_EVICTABLE_LRU_LISTS = LRU_UNEVICTABLE,
+	NR_LRU_LISTS,
+	LRU_ISOLATED = NR_LRU_LISTS,
+	LRU_ISOLATED_ANON = LRU_ISOLATED,
+	LRU_ISOLATED_FILE,
+	NR_LRU_COUNTERS,
 };
 
 #define for_each_lru(lru) for (lru = 0; lru < NR_LRU_LISTS; lru++)
 
 #define for_each_evictable_lru(lru) for (lru = 0; lru <= LRU_ACTIVE_FILE; lru++)
 
+#define for_each_lru_counter(cnt) for (cnt = 0; cnt < NR_LRU_COUNTERS; cnt++)
+
 static inline int is_file_lru(enum lru_list lru)
 {
 	return (lru == LRU_INACTIVE_FILE || lru == LRU_ACTIVE_FILE);
@@ -298,7 +305,7 @@ struct zone_reclaim_stat {
 
 struct lruvec {
 	struct list_head	pages_lru[NR_LRU_LISTS];
-	unsigned long		pages_count[NR_LRU_LISTS];
+	unsigned long		pages_count[NR_LRU_COUNTERS];
 
 	struct zone_reclaim_stat	reclaim_stat;
 
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 8630354..3a3ff2c 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -234,7 +234,9 @@ extern void rotate_reclaimable_page(struct page *page);
 extern void deactivate_page(struct page *page);
 extern void swap_setup(void);
 
-extern void add_page_to_unevictable_list(struct page *page);
+extern void add_page_to_unevictable_list(struct page *page, bool isolated);
+extern void add_page_to_evictable_list(struct page *page,
+					enum lru_list lru, bool isolated);
 
 /**
  * lru_cache_add: add a page to the page lists
diff --git a/mm/compaction.c b/mm/compaction.c
index 54340e4..fa74cbe 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -384,6 +384,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
 
 		/* Successfully isolated */
 		del_page_from_lru_list(lruvec, page, page_lru(page));
+		lruvec->pages_count[LRU_ISOLATED + page_is_file_cache(page)]++;
 		list_add(&page->lru, migratelist);
 		cc->nr_migratepages++;
 		nr_isolated++;
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 74996b8..46d9f44 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1316,6 +1316,10 @@ static void __split_huge_page_refcount(struct page *page)
 	__dec_zone_page_state(page, NR_ANON_TRANSPARENT_HUGEPAGES);
 	__mod_zone_page_state(lruvec_zone(lruvec), NR_ANON_PAGES, HPAGE_PMD_NR);
 
+	/* Fixup isolated pages counter if head page currently isolated */
+	if (!PageLRU(page))
+		lruvec->pages_count[LRU_ISOLATED_ANON] -= HPAGE_PMD_NR-1;
+
 	ClearPageCompound(page);
 	compound_unlock(page);
 	unlock_lruvec_irq(lruvec);
diff --git a/mm/internal.h b/mm/internal.h
index 9454752..6dd2e70 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -265,7 +265,11 @@ extern unsigned long highest_memmap_pfn;
  * in mm/vmscan.c:
  */
 extern int isolate_lru_page(struct page *page);
-extern void putback_lru_page(struct page *page);
+extern void __putback_lru_page(struct page *page, bool isolated);
+static inline void putback_lru_page(struct page *page)
+{
+	__putback_lru_page(page, true);
+}
 
 /*
  * in mm/page_alloc.c
diff --git a/mm/ksm.c b/mm/ksm.c
index e20de58..109e6ec 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1592,7 +1592,7 @@ struct page *ksm_does_need_to_copy(struct page *page,
 		if (page_evictable(new_page, vma))
 			lru_cache_add_lru(new_page, LRU_ACTIVE_ANON);
 		else
-			add_page_to_unevictable_list(new_page);
+			add_page_to_unevictable_list(new_page, false);
 	}
 
 	return new_page;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 230f434..4de8044 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -697,7 +697,7 @@ mem_cgroup_zone_nr_lru_pages(struct mem_cgroup *memcg, int nid, int zid,
 
 	mz = mem_cgroup_zoneinfo(memcg, nid, zid);
 
-	for_each_lru(lru) {
+	for_each_lru_counter(lru) {
 		if (BIT(lru) & lru_mask)
 			ret += mz->lruvec.pages_count[lru];
 	}
@@ -2354,6 +2354,17 @@ void mem_cgroup_split_huge_fixup(struct page *head)
 }
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
+static void mem_cgroup_adjust_isolated(struct lruvec *lruvec,
+				       struct page *page, int delta)
+{
+	int file = page_is_file_cache(page);
+	unsigned long flags;
+
+	lock_lruvec(lruvec, &flags);
+	lruvec->pages_count[LRU_ISOLATED + file] += delta;
+	unlock_lruvec(lruvec, &flags);
+}
+
 /**
  * mem_cgroup_move_account - move account of the page
  * @page: the page
@@ -2452,6 +2463,7 @@ static int mem_cgroup_move_parent(struct page *page,
 	struct mem_cgroup *parent;
 	unsigned int nr_pages;
 	unsigned long uninitialized_var(flags);
+	struct lruvec *lruvec;
 	int ret;
 
 	/* Is ROOT ? */
@@ -2471,6 +2483,8 @@ static int mem_cgroup_move_parent(struct page *page,
 	if (ret)
 		goto put_back;
 
+	lruvec = page_lruvec(page);
+
 	if (nr_pages > 1)
 		flags = compound_lock_irqsave(page);
 
@@ -2480,8 +2494,11 @@ static int mem_cgroup_move_parent(struct page *page,
 
 	if (nr_pages > 1)
 		compound_unlock_irqrestore(page, flags);
+	if (!ret)
+		/* This also stabilize PageLRU() sign for lruvec lock holder. */
+		mem_cgroup_adjust_isolated(lruvec, page, -nr_pages);
 put_back:
-	putback_lru_page(page);
+	__putback_lru_page(page, !ret);
 put:
 	put_page(page);
 out:
@@ -3879,6 +3896,8 @@ enum {
 	MCS_INACTIVE_FILE,
 	MCS_ACTIVE_FILE,
 	MCS_UNEVICTABLE,
+	MCS_ISOLATED_ANON,
+	MCS_ISOLATED_FILE,
 	NR_MCS_STAT,
 };
 
@@ -3902,7 +3921,9 @@ struct {
 	{"active_anon", "total_active_anon"},
 	{"inactive_file", "total_inactive_file"},
 	{"active_file", "total_active_file"},
-	{"unevictable", "total_unevictable"}
+	{"unevictable", "total_unevictable"},
+	{"isolated_anon", "total_isolated_anon"},
+	{"isolated_file", "total_isolated_file"},
 };
 
 
@@ -3942,6 +3963,10 @@ mem_cgroup_get_local_stat(struct mem_cgroup *memcg, struct mcs_total_stat *s)
 	s->stat[MCS_ACTIVE_FILE] += val * PAGE_SIZE;
 	val = mem_cgroup_nr_lru_pages(memcg, BIT(LRU_UNEVICTABLE));
 	s->stat[MCS_UNEVICTABLE] += val * PAGE_SIZE;
+	val = mem_cgroup_nr_lru_pages(memcg, BIT(LRU_ISOLATED_ANON));
+	s->stat[MCS_ISOLATED_ANON] += val * PAGE_SIZE;
+	val = mem_cgroup_nr_lru_pages(memcg, BIT(LRU_ISOLATED_FILE));
+	s->stat[MCS_ISOLATED_FILE] += val * PAGE_SIZE;
 }
 
 static void
@@ -5243,6 +5268,7 @@ retry:
 		struct page *page;
 		struct page_cgroup *pc;
 		swp_entry_t ent;
+		struct lruvec *lruvec;
 
 		if (!mc.precharge)
 			break;
@@ -5253,14 +5279,17 @@ retry:
 			page = target.page;
 			if (isolate_lru_page(page))
 				goto put;
+			lruvec = page_lruvec(page);
 			pc = lookup_page_cgroup(page);
 			if (!mem_cgroup_move_account(page, 1, pc,
 						     mc.from, mc.to, false)) {
 				mc.precharge--;
 				/* we uncharge from mc.from later. */
 				mc.moved_charge++;
-			}
-			putback_lru_page(page);
+				mem_cgroup_adjust_isolated(lruvec, page, -1);
+				__putback_lru_page(page, false);
+			} else
+				__putback_lru_page(page, true);
 put:			/* is_target_pte_for_mc() gets the page */
 			put_page(page);
 			break;
diff --git a/mm/migrate.c b/mm/migrate.c
index df141f6..de13a0e 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -868,7 +868,7 @@ out:
 	 * Move the new page to the LRU. If migration was not successful
 	 * then this will free the page.
 	 */
-	putback_lru_page(newpage);
+	__putback_lru_page(newpage, false);
 	if (result) {
 		if (rc)
 			*result = rc;
diff --git a/mm/rmap.c b/mm/rmap.c
index aa547d4..06b5def9 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1139,7 +1139,7 @@ void page_add_new_anon_rmap(struct page *page,
 	if (page_evictable(page, vma))
 		lru_cache_add_lru(page, LRU_ACTIVE_ANON);
 	else
-		add_page_to_unevictable_list(page);
+		add_page_to_unevictable_list(page, false);
 }
 
 /**
diff --git a/mm/swap.c b/mm/swap.c
index 3689e3d..998c71c 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -37,6 +37,8 @@
 int page_cluster;
 
 static DEFINE_PER_CPU(struct pagevec[NR_LRU_LISTS], lru_add_pvecs);
+static DEFINE_PER_CPU(struct pagevec[NR_EVICTABLE_LRU_LISTS],
+					   lru_add_isolated_pvecs);
 static DEFINE_PER_CPU(struct pagevec, lru_rotate_pvecs);
 static DEFINE_PER_CPU(struct pagevec, lru_deactivate_pvecs);
 
@@ -381,6 +383,67 @@ void lru_cache_add_lru(struct page *page, enum lru_list lru)
 	__lru_cache_add(page, lru);
 }
 
+static void __lru_add_isolated_fn(struct lruvec *lruvec,
+				  struct page *page, void *arg)
+{
+	enum lru_list lru = (enum lru_list)arg;
+
+	VM_BUG_ON(PageActive(page));
+	VM_BUG_ON(PageUnevictable(page));
+	VM_BUG_ON(PageLRU(page));
+
+	SetPageLRU(page);
+	if (is_active_lru(lru))
+		SetPageActive(page);
+	update_page_reclaim_stat(lruvec, lru);
+	add_page_to_lru_list(lruvec, page, lru);
+	lruvec->pages_count[LRU_ISOLATED + is_file_lru(lru)] -=
+						hpage_nr_pages(page);
+}
+
+static void __lru_add_isolated(struct pagevec *pvec, enum lru_list lru)
+{
+	VM_BUG_ON(is_unevictable_lru(lru));
+	pagevec_lru_move_fn(pvec, __lru_add_isolated_fn, (void *)lru);
+}
+
+/**
+ * add_page_to_evictable_list - add page to lru list
+ * @page	the page to be added into the lru list
+ * @lru		lru list id
+ * @isolated	need to adjust isolated pages counter
+ *
+ * Like lru_cache_add_lru() but reuses caller's reference to page and
+ * taking care about isolated pages counter on lruvec if isolated = true.
+ */
+void add_page_to_evictable_list(struct page *page,
+				enum lru_list lru, bool isolated)
+{
+	struct pagevec *pvec;
+
+	if (PageActive(page)) {
+		VM_BUG_ON(PageUnevictable(page));
+		ClearPageActive(page);
+	} else if (PageUnevictable(page)) {
+		VM_BUG_ON(PageActive(page));
+		ClearPageUnevictable(page);
+	}
+
+	VM_BUG_ON(PageLRU(page) || PageActive(page) || PageUnevictable(page));
+
+	preempt_disable();
+	if (isolated) {
+		pvec = __this_cpu_ptr(lru_add_isolated_pvecs + lru);
+		if (!pagevec_add(pvec, page))
+			__lru_add_isolated(pvec, lru);
+	} else {
+		pvec = __this_cpu_ptr(lru_add_pvecs + lru);
+		if (!pagevec_add(pvec, page))
+			__pagevec_lru_add(pvec, lru);
+	}
+	preempt_enable();
+}
+
 /**
  * add_page_to_unevictable_list - add a page to the unevictable list
  * @page:  the page to be added to the unevictable list
@@ -391,7 +454,7 @@ void lru_cache_add_lru(struct page *page, enum lru_list lru)
  * while it's locked or otherwise "invisible" to other tasks.  This is
  * difficult to do when using the pagevec cache, so bypass that.
  */
-void add_page_to_unevictable_list(struct page *page)
+void add_page_to_unevictable_list(struct page *page, bool isolated)
 {
 	struct lruvec *lruvec;
 
@@ -399,6 +462,10 @@ void add_page_to_unevictable_list(struct page *page)
 	SetPageUnevictable(page);
 	SetPageLRU(page);
 	add_page_to_lru_list(lruvec, page, LRU_UNEVICTABLE);
+	if (isolated) {
+		int type = LRU_ISOLATED + page_is_file_cache(page);
+		lruvec->pages_count[type] -= hpage_nr_pages(page);
+	}
 	unlock_lruvec_irq(lruvec);
 }
 
@@ -485,6 +552,13 @@ static void drain_cpu_pagevecs(int cpu)
 			__pagevec_lru_add(pvec, lru);
 	}
 
+	pvecs = per_cpu(lru_add_isolated_pvecs, cpu);
+	for_each_evictable_lru(lru) {
+		pvec = &pvecs[lru];
+		if (pagevec_count(pvec))
+			__lru_add_isolated(pvec, lru);
+	}
+
 	pvec = &per_cpu(lru_rotate_pvecs, cpu);
 	if (pagevec_count(pvec)) {
 		unsigned long flags;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 6eeeb4b..a1ff010 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -42,6 +42,7 @@
 #include <linux/sysctl.h>
 #include <linux/oom.h>
 #include <linux/prefetch.h>
+#include <trace/events/kmem.h>
 
 #include <asm/tlbflush.h>
 #include <asm/div64.h>
@@ -585,15 +586,16 @@ int remove_mapping(struct address_space *mapping, struct page *page)
 }
 
 /**
- * putback_lru_page - put previously isolated page onto appropriate LRU list
+ * __putback_lru_page - put previously isolated page onto appropriate LRU list
  * @page: page to be put back to appropriate lru list
+ * @isolated: isolated pages counter update required
  *
  * Add previously isolated @page to appropriate LRU list.
  * Page may still be unevictable for other reasons.
  *
  * lru_lock must not be held, interrupts must be enabled.
  */
-void putback_lru_page(struct page *page)
+void __putback_lru_page(struct page *page, bool isolated)
 {
 	int lru;
 	int active = !!TestClearPageActive(page);
@@ -612,14 +614,16 @@ redo:
 		 * We know how to handle that.
 		 */
 		lru = active + page_lru_base_type(page);
-		lru_cache_add_lru(page, lru);
+		add_page_to_evictable_list(page, lru, isolated);
+		if (was_unevictable)
+			count_vm_event(UNEVICTABLE_PGRESCUED);
 	} else {
 		/*
 		 * Put unevictable pages directly on zone's unevictable
 		 * list.
 		 */
 		lru = LRU_UNEVICTABLE;
-		add_page_to_unevictable_list(page);
+		add_page_to_unevictable_list(page, isolated);
 		/*
 		 * When racing with an mlock or AS_UNEVICTABLE clearing
 		 * (page is unlocked) make sure that if the other thread
@@ -631,30 +635,26 @@ redo:
 		 * The other side is TestClearPageMlocked() or shmem_lock().
 		 */
 		smp_mb();
-	}
-
-	/*
-	 * page's status can change while we move it among lru. If an evictable
-	 * page is on unevictable list, it never be freed. To avoid that,
-	 * check after we added it to the list, again.
-	 */
-	if (lru == LRU_UNEVICTABLE && page_evictable(page, NULL)) {
-		if (!isolate_lru_page(page)) {
-			put_page(page);
-			goto redo;
-		}
-		/* This means someone else dropped this page from LRU
-		 * So, it will be freed or putback to LRU again. There is
-		 * nothing to do here.
+		/*
+		 * page's status can change while we move it among lru.
+		 * If an evictable page is on unevictable list, it never be freed.
+		 * To avoid that, check after we added it to the list, again.
 		 */
+		if (page_evictable(page, NULL)) {
+			if (!isolate_lru_page(page)) {
+				isolated = true;
+				put_page(page);
+				goto redo;
+			}
+			/* This means someone else dropped this page from LRU
+			 * So, it will be freed or putback to LRU again. There is
+			 * nothing to do here.
+			 */
+		}
+		put_page(page);		/* drop ref from isolate */
+		if (!was_unevictable)
+			count_vm_event(UNEVICTABLE_PGCULLED);
 	}
-
-	if (was_unevictable && lru != LRU_UNEVICTABLE)
-		count_vm_event(UNEVICTABLE_PGRESCUED);
-	else if (!was_unevictable && lru == LRU_UNEVICTABLE)
-		count_vm_event(UNEVICTABLE_PGCULLED);
-
-	put_page(page);		/* drop ref from isolate */
 }
 
 enum page_references {
@@ -724,6 +724,48 @@ static enum page_references page_check_references(struct page *page,
 }
 
 /*
+ * Free a list of isolated 0-order pages
+ */
+static void free_isolated_page_list(struct lruvec *lruvec,
+				    struct list_head *list, int cold)
+{
+	struct page *page, *next;
+	unsigned long nr_pages[2];
+	struct list_head queue;
+
+again:
+	INIT_LIST_HEAD(&queue);
+	nr_pages[0] = nr_pages[1] = 0;
+
+	list_for_each_entry_safe(page, next, list, lru) {
+		if (unlikely(lruvec != page_lruvec(page))) {
+			list_add_tail(&page->lru, &queue);
+			continue;
+		}
+		nr_pages[page_is_file_cache(page)]++;
+		trace_mm_page_free_batched(page, cold);
+		free_hot_cold_page(page, cold);
+	}
+
+	lock_lruvec_irq(lruvec);
+	lruvec->pages_count[LRU_ISOLATED_ANON] -= nr_pages[0];
+	lruvec->pages_count[LRU_ISOLATED_FILE] -= nr_pages[1];
+	unlock_lruvec_irq(lruvec);
+
+	/*
+	 * Usually there will be only one iteration, because
+	 * at 0-order reclaim all pages are from one lruvec
+	 * if we didn't raced with memory cgroup shuffling.
+	 */
+	if (unlikely(!list_empty(&queue))) {
+		list_replace(&queue, list);
+		lruvec = page_lruvec(list_first_entry(list,
+					struct page, lru));
+		goto again;
+	}
+}
+
+/*
  * shrink_page_list() returns the number of reclaimed pages
  */
 static unsigned long shrink_page_list(struct list_head *page_list,
@@ -986,7 +1028,7 @@ keep_lumpy:
 	if (nr_dirty && nr_dirty == nr_congested && global_reclaim(sc))
 		zone_set_flag(lruvec_zone(lruvec), ZONE_CONGESTED);
 
-	free_hot_cold_page_list(&free_pages, 1);
+	free_isolated_page_list(lruvec, &free_pages, 1);
 
 	list_splice(&ret_pages, page_list);
 	count_vm_events(PGACTIVATE, pgactivate);
@@ -1206,11 +1248,14 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
 			if (__isolate_lru_page(cursor_page, mode, file) == 0) {
 				unsigned int isolated_pages;
 				int cursor_lru = page_lru(cursor_page);
+				int cur_file = page_is_file_cache(cursor_page);
 
 				list_move(&cursor_page->lru, dst);
 				isolated_pages = hpage_nr_pages(cursor_page);
 				cursor_lruvec->pages_count[cursor_lru] -=
 								isolated_pages;
+				cursor_lruvec->pages_count[LRU_ISOLATED +
+						cur_file] += isolated_pages;
 				VM_BUG_ON((long)cursor_lruvec->
 						pages_count[cursor_lru] < 0);
 				nr_taken += isolated_pages;
@@ -1248,6 +1293,7 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
 	}
 
 	lruvec->pages_count[lru] -= nr_taken - nr_lumpy_taken;
+	lruvec->pages_count[LRU_ISOLATED + file] += nr_taken - nr_lumpy_taken;
 	VM_BUG_ON((long)lruvec->pages_count[lru] < 0);
 
 	*nr_scanned = scan;
@@ -1296,11 +1342,14 @@ int isolate_lru_page(struct page *page)
 
 		lruvec = lock_page_lruvec_irq(page);
 		if (PageLRU(page)) {
+			int file = page_is_file_cache(page);
 			int lru = page_lru(page);
 			ret = 0;
 			get_page(page);
 			ClearPageLRU(page);
 			del_page_from_lru_list(lruvec, page, lru);
+			lruvec->pages_count[LRU_ISOLATED + file] +=
+							hpage_nr_pages(page);
 		}
 		unlock_lruvec_irq(lruvec);
 	}
@@ -1347,7 +1396,7 @@ putback_inactive_pages(struct lruvec *lruvec,
 	 */
 	while (!list_empty(page_list)) {
 		struct page *page = lru_to_page(page_list);
-		int lru;
+		int numpages, lru, file;
 
 		VM_BUG_ON(PageLRU(page));
 		list_del(&page->lru);
@@ -1363,13 +1412,13 @@ putback_inactive_pages(struct lruvec *lruvec,
 
 		SetPageLRU(page);
 		lru = page_lru(page);
+		file = is_file_lru(lru);
+		numpages = hpage_nr_pages(page);
 
 		add_page_to_lru_list(lruvec, page, lru);
-		if (is_active_lru(lru)) {
-			int file = is_file_lru(lru);
-			int numpages = hpage_nr_pages(page);
+		lruvec->pages_count[LRU_ISOLATED + file] -= numpages;
+		if (is_active_lru(lru))
 			reclaim_stat->recent_rotated[file] += numpages;
-		}
 		if (put_page_testzero(page)) {
 			__ClearPageLRU(page);
 			__ClearPageActive(page);
@@ -1656,6 +1705,9 @@ move_active_pages_to_lru(struct lruvec *lruvec,
 		list_move(&page->lru, &lruvec->pages_lru[lru]);
 		numpages = hpage_nr_pages(page);
 		lruvec->pages_count[lru] += numpages;
+		/* There should be no mess between file and anon pages */
+		lruvec->pages_count[LRU_ISOLATED +
+				    is_file_lru(lru)] -= numpages;
 		pgmoved += numpages;
 
 		if (put_page_testzero(page)) {


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v3 19/21] memcg: check lru vectors emptiness in pre-destroy
  2012-02-23 13:51 [PATCH v3 00/21] mm: lru_lock splitting Konstantin Khlebnikov
                   ` (17 preceding siblings ...)
  2012-02-23 13:53 ` [PATCH v3 18/21] mm: add to lruvec isolated pages counters Konstantin Khlebnikov
@ 2012-02-23 13:53 ` Konstantin Khlebnikov
  2012-02-28  1:43   ` KAMEZAWA Hiroyuki
  2012-02-23 13:53 ` [PATCH v3 20/21] mm: split zone->lru_lock Konstantin Khlebnikov
                   ` (4 subsequent siblings)
  23 siblings, 1 reply; 65+ messages in thread
From: Konstantin Khlebnikov @ 2012-02-23 13:53 UTC (permalink / raw)
  To: Hugh Dickins, linux-kernel, linux-mm, Johannes Weiner,
	Andrew Morton, KAMEZAWA Hiroyuki
  Cc: Andi Kleen

We must abort cgroup destruction if the cgroup is still not empty;
the resource counter cannot catch isolated uncharged pages.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
---
 mm/memcontrol.c |   10 +++++++++-
 1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 4de8044..fbeff85 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4859,8 +4859,16 @@ free_out:
 static int mem_cgroup_pre_destroy(struct cgroup *cont)
 {
 	struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
+	int ret;
+
+	ret = mem_cgroup_force_empty(memcg, false);
+	if (ret)
+		return ret;
 
-	return mem_cgroup_force_empty(memcg, false);
+	if (mem_cgroup_nr_lru_pages(memcg, -1))
+		return -EBUSY;
+
+	return 0;
 }
 
 static void mem_cgroup_destroy(struct cgroup *cont)


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v3 20/21] mm: split zone->lru_lock
  2012-02-23 13:51 [PATCH v3 00/21] mm: lru_lock splitting Konstantin Khlebnikov
                   ` (18 preceding siblings ...)
  2012-02-23 13:53 ` [PATCH v3 19/21] memcg: check lru vectors emptiness in pre-destroy Konstantin Khlebnikov
@ 2012-02-23 13:53 ` Konstantin Khlebnikov
  2012-02-28  1:49   ` KAMEZAWA Hiroyuki
  2012-02-23 13:53 ` [PATCH v3 21/21] mm: zone lru vectors interleaving Konstantin Khlebnikov
                   ` (3 subsequent siblings)
  23 siblings, 1 reply; 65+ messages in thread
From: Konstantin Khlebnikov @ 2012-02-23 13:53 UTC (permalink / raw)
  To: Hugh Dickins, linux-kernel, linux-mm, Johannes Weiner,
	Andrew Morton, KAMEZAWA Hiroyuki
  Cc: Andi Kleen

Looks like everything is ready for splitting zone->lru_lock into per-lruvec pieces.

The lruvec locking loop is protected by rcu; actually, irq-disabling is used
instead of rcu_read_lock(). The memory controller already releases its
lru-vectors after synchronize_rcu() in cgroup_diput(). Probably it should be
replaced with synchronize_sched(). The lock-and-recheck loop is sketched below.
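
A hedged userspace sketch of that lock-and-recheck loop (C11 atomics stand in
for the page->lruvec link that memcg can change under us, a pthread spinlock
stands in for lru_lock):

#include <pthread.h>
#include <stdatomic.h>

struct lruvec {
	pthread_spinlock_t lru_lock;
};

struct page {
	_Atomic(struct lruvec *) lruvec;	/* stand-in for page_lruvec(page) */
};

/* The page's lruvec link may change until we hold that lruvec's lock. */
static struct lruvec *relock_page_lruvec(struct lruvec *locked, struct page *page)
{
	struct lruvec *lruvec;

	do {
		lruvec = atomic_load(&page->lruvec);
		if (lruvec == locked)
			return lruvec;		/* already hold the right lock */
		pthread_spin_unlock(&locked->lru_lock);
		pthread_spin_lock(&lruvec->lru_lock);
		locked = lruvec;
		/* the link may have moved while we were switching locks */
	} while (1);
}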

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
---
 include/linux/mmzone.h |    3 +-
 mm/compaction.c        |    2 +
 mm/internal.h          |   66 +++++++++++++++++++++++++-----------------------
 mm/page_alloc.c        |    2 +
 mm/swap.c              |    2 +
 5 files changed, 40 insertions(+), 35 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 2e3a298..9880150 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -304,6 +304,8 @@ struct zone_reclaim_stat {
 };
 
 struct lruvec {
+	spinlock_t		lru_lock;
+
 	struct list_head	pages_lru[NR_LRU_LISTS];
 	unsigned long		pages_count[NR_LRU_COUNTERS];
 
@@ -386,7 +388,6 @@ struct zone {
 	ZONE_PADDING(_pad1_)
 
 	/* Fields commonly accessed by the page reclaim scanner */
-	spinlock_t		lru_lock;
 	struct lruvec		lruvec;
 
 	unsigned long		pages_scanned;	   /* since last reclaim */
diff --git a/mm/compaction.c b/mm/compaction.c
index fa74cbe..8661bb58 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -306,7 +306,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
 			lruvec = NULL;
 		}
 		if (need_resched() ||
-		    (lruvec && spin_is_contended(&zone->lru_lock))) {
+		    (lruvec && spin_is_contended(&lruvec->lru_lock))) {
 			if (lruvec)
 				unlock_lruvec_irq(lruvec);
 			lruvec = NULL;
diff --git a/mm/internal.h b/mm/internal.h
index 6dd2e70..9a9fd53 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -15,27 +15,27 @@
 
 static inline void lock_lruvec(struct lruvec *lruvec, unsigned long *flags)
 {
-	spin_lock_irqsave(&lruvec_zone(lruvec)->lru_lock, *flags);
+	spin_lock_irqsave(&lruvec->lru_lock, *flags);
 }
 
 static inline void lock_lruvec_irq(struct lruvec *lruvec)
 {
-	spin_lock_irq(&lruvec_zone(lruvec)->lru_lock);
+	spin_lock_irq(&lruvec->lru_lock);
 }
 
 static inline void unlock_lruvec(struct lruvec *lruvec, unsigned long *flags)
 {
-	spin_unlock_irqrestore(&lruvec_zone(lruvec)->lru_lock, *flags);
+	spin_unlock_irqrestore(&lruvec->lru_lock, *flags);
 }
 
 static inline void unlock_lruvec_irq(struct lruvec *lruvec)
 {
-	spin_unlock_irq(&lruvec_zone(lruvec)->lru_lock);
+	spin_unlock_irq(&lruvec->lru_lock);
 }
 
 static inline void wait_lruvec_unlock(struct lruvec *lruvec)
 {
-	spin_unlock_wait(&lruvec_zone(lruvec)->lru_lock);
+	spin_unlock_wait(&lruvec->lru_lock);
 }
 
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR
@@ -46,37 +46,39 @@ static inline void wait_lruvec_unlock(struct lruvec *lruvec)
 static inline struct lruvec *__relock_page_lruvec(struct lruvec *locked_lruvec,
 						  struct page *page)
 {
-	/* Currenyly only one lru_lock per-zone */
-	return page_lruvec(page);
+	struct lruvec *lruvec;
+
+	do {
+		lruvec = page_lruvec(page);
+		if (likely(lruvec == locked_lruvec))
+			return lruvec;
+		spin_unlock(&locked_lruvec->lru_lock);
+		spin_lock(&lruvec->lru_lock);
+		locked_lruvec = lruvec;
+	} while (1);
 }
 
 static inline struct lruvec *relock_page_lruvec_irq(struct lruvec *lruvec,
 						    struct page *page)
 {
-	struct zone *zone = page_zone(page);
-
 	if (!lruvec) {
-		spin_lock_irq(&zone->lru_lock);
-	} else if (zone != lruvec_zone(lruvec)) {
-		unlock_lruvec_irq(lruvec);
-		spin_lock_irq(&zone->lru_lock);
+		local_irq_disable();
+		lruvec = page_lruvec(page);
+		spin_lock(&lruvec->lru_lock);
 	}
-	return page_lruvec(page);
+	return __relock_page_lruvec(lruvec, page);
 }
 
 static inline struct lruvec *relock_page_lruvec(struct lruvec *lruvec,
 						struct page *page,
 						unsigned long *flags)
 {
-	struct zone *zone = page_zone(page);
-
 	if (!lruvec) {
-		spin_lock_irqsave(&zone->lru_lock, *flags);
-	} else if (zone != lruvec_zone(lruvec)) {
-		unlock_lruvec(lruvec, flags);
-		spin_lock_irqsave(&zone->lru_lock, *flags);
+		local_irq_save(*flags);
+		lruvec = page_lruvec(page);
+		spin_lock(&lruvec->lru_lock);
 	}
-	return page_lruvec(page);
+	return __relock_page_lruvec(lruvec, page);
 }
 
 /*
@@ -87,22 +89,24 @@ static inline struct lruvec *relock_page_lruvec(struct lruvec *lruvec,
 static inline bool __lock_page_lruvec_irq(struct lruvec **lruvec,
 					  struct page *page)
 {
-	struct zone *zone;
 	bool ret = false;
 
+	rcu_read_lock();
+	/*
+	 * If we see there PageLRU(), it means page has valid lruvec link.
+	 * We need protect whole operation with single rcu-interval, otherwise
+	 * lruvec which hold this LRU sign can run out before we secure it.
+	 */
 	if (PageLRU(page)) {
 		if (!*lruvec) {
-			zone = page_zone(page);
-			spin_lock_irq(&zone->lru_lock);
-		} else
-			zone = lruvec_zone(*lruvec);
-
-		if (PageLRU(page)) {
 			*lruvec = page_lruvec(page);
+			lock_lruvec_irq(*lruvec);
+		}
+		*lruvec = __relock_page_lruvec(*lruvec, page);
+		if (PageLRU(page))
 			ret = true;
-		} else
-			*lruvec = &zone->lruvec;
 	}
+	rcu_read_unlock();
 
 	return ret;
 }
@@ -110,7 +114,7 @@ static inline bool __lock_page_lruvec_irq(struct lruvec **lruvec,
 /* Wait for lruvec unlock before locking other lruvec for the same page */
 static inline void __wait_lruvec_unlock(struct lruvec *lruvec)
 {
-	/* Currently only one lru_lock per-zone */
+	wait_lruvec_unlock(lruvec);
 }
 
 #else /* CONFIG_CGROUP_MEM_RES_CTLR */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ab42446..beadcc9 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4294,6 +4294,7 @@ void init_zone_lruvec(struct zone *zone, struct lruvec *lruvec)
 	enum lru_list lru;
 
 	memset(lruvec, 0, sizeof(struct lruvec));
+	spin_lock_init(&lruvec->lru_lock);
 	for_each_lru(lru)
 		INIT_LIST_HEAD(&lruvec->pages_lru[lru]);
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR
@@ -4369,7 +4370,6 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
 #endif
 		zone->name = zone_names[j];
 		spin_lock_init(&zone->lock);
-		spin_lock_init(&zone->lru_lock);
 		zone_seqlock_init(zone);
 		zone->zone_pgdat = pgdat;
 
diff --git a/mm/swap.c b/mm/swap.c
index 998c71c..8156181 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -700,7 +700,7 @@ void lru_add_page_tail(struct lruvec *lruvec,
 	VM_BUG_ON(!PageHead(page));
 	VM_BUG_ON(PageCompound(page_tail));
 	VM_BUG_ON(PageLRU(page_tail));
-	VM_BUG_ON(NR_CPUS != 1 && !spin_is_locked(&lruvec_zone(lruvec)->lru_lock));
+	VM_BUG_ON(NR_CPUS != 1 && !spin_is_locked(&lruvec->lru_lock));
 
 	SetPageLRU(page_tail);
 


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v3 21/21] mm: zone lru vectors interleaving
  2012-02-23 13:51 [PATCH v3 00/21] mm: lru_lock splitting Konstantin Khlebnikov
                   ` (19 preceding siblings ...)
  2012-02-23 13:53 ` [PATCH v3 20/21] mm: split zone->lru_lock Konstantin Khlebnikov
@ 2012-02-23 13:53 ` Konstantin Khlebnikov
  2012-02-23 14:44   ` Hillf Danton
  2012-02-23 16:21   ` Andi Kleen
  2012-02-25  0:05 ` [PATCH v3 00/21] mm: lru_lock splitting Tim Chen
                   ` (2 subsequent siblings)
  23 siblings, 2 replies; 65+ messages in thread
From: Konstantin Khlebnikov @ 2012-02-23 13:53 UTC (permalink / raw)
  To: Hugh Dickins, linux-kernel, linux-mm, Johannes Weiner,
	Andrew Morton, KAMEZAWA Hiroyuki
  Cc: Andi Kleen

Split zones into several lru vectors with pfn-based interleaving.
Thus we can reduce lru_lock contention without using cgroups.

By default there are 4 lru vectors per zone with 16Mb interleaving;
the mapping math is sketched below.
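
The page-to-lruvec mapping is plain pfn arithmetic; a small userspace sketch
with the defaults quoted above hard-coded (PAGE_SHIFT=12 assumed, i.e. 4Kb pages):

#include <stdio.h>

#define PAGE_SHIFT		12	/* 4Kb pages */
#define LRU_SPLIT		4	/* CONFIG_PAGE_LRU_SPLIT */
#define LRU_INTERLEAVING	12	/* CONFIG_PAGE_LRU_INTERLEAVING: 2^12 pages = 16Mb */

static int page_lruvec_id(unsigned long pfn)
{
	return (pfn >> LRU_INTERLEAVING) % LRU_SPLIT;
}

int main(void)
{
	/* physical addresses 16Mb apart land on successive lru vectors */
	for (unsigned long addr = 0; addr < 5UL << 24; addr += 1UL << 24)
		printf("phys %#10lx -> lruvec %d\n", addr,
		       page_lruvec_id(addr >> PAGE_SHIFT));
	return 0;
}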

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
---
 include/linux/huge_mm.h    |    3 ++
 include/linux/memcontrol.h |    2 +-
 include/linux/mm.h         |   45 +++++++++++++++++++++++++++++------
 include/linux/mmzone.h     |    4 ++-
 mm/Kconfig                 |   16 +++++++++++++
 mm/internal.h              |   19 ++++++++++++++-
 mm/memcontrol.c            |   56 ++++++++++++++++++++++++--------------------
 mm/page_alloc.c            |    7 +++---
 mm/vmscan.c                |   18 ++++++++++----
 9 files changed, 124 insertions(+), 46 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 1b92129..3a45cb3 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -107,6 +107,9 @@ extern void __split_huge_page_pmd(struct mm_struct *mm, pmd_t *pmd);
 #if HPAGE_PMD_ORDER > MAX_ORDER
 #error "hugepages can't be allocated by the buddy allocator"
 #endif
+#if HPAGE_PMD_ORDER > CONFIG_PAGE_LRU_INTERLEAVING
+#error "zone lru interleaving order lower than huge page order"
+#endif
 extern int hugepage_madvise(struct vm_area_struct *vma,
 			    unsigned long *vm_flags, int advice);
 extern void __vma_adjust_trans_huge(struct vm_area_struct *vma,
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index c3e46b0..b137d4c 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -199,7 +199,7 @@ static inline void mem_cgroup_uncharge_cache_page(struct page *page)
 static inline struct lruvec *mem_cgroup_zone_lruvec(struct zone *zone,
 						    struct mem_cgroup *memcg)
 {
-	return &zone->lruvec;
+	return zone->lruvec;
 }
 
 static inline struct mem_cgroup *try_get_mem_cgroup_from_page(struct page *page)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index c6dc4ab..d14db10 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -728,12 +728,46 @@ static inline void set_page_links(struct page *page, enum zone_type zone,
 #endif
 }
 
+#if CONFIG_PAGE_LRU_SPLIT == 1
+
+static inline int page_lruvec_id(struct page *page)
+{
+	return 0;
+}
+
+#else /* CONFIG_PAGE_LRU_SPLIT */
+
+static inline int page_lruvec_id(struct page *page)
+{
+
+	unsigned long pfn = page_to_pfn(page);
+
+	return (pfn >> CONFIG_PAGE_LRU_INTERLEAVING) % CONFIG_PAGE_LRU_SPLIT;
+}
+
+#endif /* CONFIG_PAGE_LRU_SPLIT */
+
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR
 
-/* Multiple lruvecs in zone */
+/* Dynamic page to lruvec mapping */
 
 extern struct lruvec *page_lruvec(struct page *page);
 
+#else
+
+/* Fixed page to lruvecs mapping */
+
+static inline struct lruvec *page_lruvec(struct page *page)
+{
+	return page_zone(page)->lruvec + page_lruvec_id(page);
+}
+
+#endif
+
+#if defined(CONFIG_CGROUP_MEM_RES_CTLR) || (CONFIG_PAGE_LRU_SPLIT != 1)
+
+/* Multiple lruvecs in zone */
+
 static inline struct zone *lruvec_zone(struct lruvec *lruvec)
 {
 	return lruvec->zone;
@@ -744,15 +778,10 @@ static inline struct pglist_data *lruvec_node(struct lruvec *lruvec)
 	return lruvec->node;
 }
 
-#else /* CONFIG_CGROUP_MEM_RES_CTLR */
+#else /* defined(CONFIG_CGROUP_MEM_RES_CTLR) || (CONFIG_PAGE_LRU_SPLIT != 1) */
 
 /* Single lruvec in zone */
 
-static inline struct lruvec *page_lruvec(struct page *page)
-{
-	return &page_zone(page)->lruvec;
-}
-
 static inline struct zone *lruvec_zone(struct lruvec *lruvec)
 {
 	return container_of(lruvec, struct zone, lruvec);
@@ -763,7 +792,7 @@ static inline struct pglist_data *lruvec_node(struct lruvec *lruvec)
 	return lruvec_zone(lruvec)->zone_pgdat;
 }
 
-#endif /* CONFIG_CGROUP_MEM_RES_CTLR */
+#endif /* defined(CONFIG_CGROUP_MEM_RES_CTLR) || (CONFIG_PAGE_LRU_SPLIT != 1) */
 
 /*
  * Some inline functions in vmstat.h depend on page_zone()
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 9880150..a52f423 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -311,7 +311,7 @@ struct lruvec {
 
 	struct zone_reclaim_stat	reclaim_stat;
 
-#ifdef CONFIG_CGROUP_MEM_RES_CTLR
+#if defined(CONFIG_CGROUP_MEM_RES_CTLR) || (CONFIG_PAGE_LRU_SPLIT != 1)
 	struct zone		*zone;
 	struct pglist_data	*node;
 #endif
@@ -388,7 +388,7 @@ struct zone {
 	ZONE_PADDING(_pad1_)
 
 	/* Fields commonly accessed by the page reclaim scanner */
-	struct lruvec		lruvec;
+	struct lruvec		lruvec[CONFIG_PAGE_LRU_SPLIT];
 
 	unsigned long		pages_scanned;	   /* since last reclaim */
 	unsigned long		flags;		   /* zone flags, see below */
diff --git a/mm/Kconfig b/mm/Kconfig
index 2613c91..48ff866 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -183,6 +183,22 @@ config SPLIT_PTLOCK_CPUS
 	default "999999" if DEBUG_SPINLOCK || DEBUG_LOCK_ALLOC
 	default "4"
 
+config PAGE_LRU_SPLIT
+	int "Memory lru lists per zone"
+	default	4 if EXPERIMENTAL && SPARSEMEM_VMEMMAP
+	default 1
+	help
+	  The number of lru lists in each memory zone for interleaving.
+	  Allows to reduce lru_lock contention, but adds some overhead.
+	  Without SPARSEMEM_VMEMMAP might be costly. "1" means no split.
+
+config PAGE_LRU_INTERLEAVING
+	int "Memory lru lists interleaving page-order"
+	default	12
+	help
+	  Page order for lru lists interleaving. By default 12 (16Mb).
+	  Must be greater than huge-page order.
+	  With CONFIG_PAGE_LRU_SPLIT=1 has no effect.
 #
 # support for memory compaction
 config COMPACTION
diff --git a/mm/internal.h b/mm/internal.h
index 9a9fd53..f429911 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -13,6 +13,15 @@
 
 #include <linux/mm.h>
 
+#define for_each_zone_id(zone_id) \
+	for ( zone_id = 0 ; zone_id < MAX_NR_ZONES ; zone_id++ )
+
+#define for_each_lruvec_id(lruvec_id) \
+	for ( lruvec_id = 0 ; lruvec_id < CONFIG_PAGE_LRU_SPLIT ; lruvec_id++ )
+
+#define for_each_zone_and_lruvec_id(zone_id, lruvec_id) \
+	for_each_zone_id(zone_id) for_each_lruvec_id(lruvec_id)
+
 static inline void lock_lruvec(struct lruvec *lruvec, unsigned long *flags)
 {
 	spin_lock_irqsave(&lruvec->lru_lock, *flags);
@@ -125,7 +134,15 @@ static inline void __wait_lruvec_unlock(struct lruvec *lruvec)
 static inline struct lruvec *__relock_page_lruvec(struct lruvec *locked_lruvec,
 						  struct page *page)
 {
-	/* Currently ony one lruvec per-zone */
+#if CONFIG_PAGE_LRU_SPLIT != 1
+	struct lruvec *lruvec = page_lruvec(page);
+
+	if (unlikely(lruvec != locked_lruvec)) {
+		spin_unlock(&locked_lruvec->lru_lock);
+		spin_lock(&lruvec->lru_lock);
+		locked_lruvec = lruvec;
+	}
+#endif
 	return locked_lruvec;
 }
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index fbeff85..59fe4b0 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -134,7 +134,7 @@ struct mem_cgroup_reclaim_iter {
  * per-zone information in memory controller.
  */
 struct mem_cgroup_per_zone {
-	struct lruvec		lruvec;
+	struct lruvec		lruvec[CONFIG_PAGE_LRU_SPLIT];
 
 	struct mem_cgroup_reclaim_iter reclaim_iter[DEF_PRIORITY + 1];
 
@@ -694,12 +694,15 @@ mem_cgroup_zone_nr_lru_pages(struct mem_cgroup *memcg, int nid, int zid,
 	struct mem_cgroup_per_zone *mz;
 	enum lru_list lru;
 	unsigned long ret = 0;
+	int lruvec_id;
 
 	mz = mem_cgroup_zoneinfo(memcg, nid, zid);
 
-	for_each_lru_counter(lru) {
-		if (BIT(lru) & lru_mask)
-			ret += mz->lruvec.pages_count[lru];
+	for_each_lruvec_id(lruvec_id) {
+		for_each_lru_counter(lru) {
+			if (BIT(lru) & lru_mask)
+				ret += mz->lruvec[lruvec_id].pages_count[lru];
+		}
 	}
 	return ret;
 }
@@ -995,7 +998,7 @@ out:
 EXPORT_SYMBOL(mem_cgroup_count_vm_event);
 
 /**
- * mem_cgroup_zone_lruvec - get the lru list vector for a zone and memcg
+ * mem_cgroup_zone_lruvec - get the array of lruvecs for a zone and memcg
  * @zone: zone of the wanted lruvec
  * @mem: memcg of the wanted lruvec
  *
@@ -1009,10 +1012,10 @@ struct lruvec *mem_cgroup_zone_lruvec(struct zone *zone,
 	struct mem_cgroup_per_zone *mz;
 
 	if (mem_cgroup_disabled())
-		return &zone->lruvec;
+		return zone->lruvec;
 
 	mz = mem_cgroup_zoneinfo(memcg, zone_to_nid(zone), zone_idx(zone));
-	return &mz->lruvec;
+	return mz->lruvec;
 }
 
 /**
@@ -1027,14 +1030,15 @@ struct lruvec *page_lruvec(struct page *page)
 {
 	struct mem_cgroup_per_zone *mz;
 	struct page_cgroup *pc;
+	int lruvec_id = page_lruvec_id(page);
 
 	if (mem_cgroup_disabled())
-		return &page_zone(page)->lruvec;
+		return page_zone(page)->lruvec + lruvec_id;
 
 	pc = lookup_page_cgroup(page);
 	mz = mem_cgroup_zoneinfo(pc->mem_cgroup,
 			page_to_nid(page), page_zonenum(page));
-	return &mz->lruvec;
+	return mz->lruvec + lruvec_id;
 }
 
 /*
@@ -3495,7 +3499,7 @@ unsigned long mem_cgroup_soft_limit_reclaim(struct zone *zone, int order,
  * *And* this routine doesn't reclaim page itself, just removes page_cgroup.
  */
 static int mem_cgroup_force_empty_list(struct mem_cgroup *memcg,
-				int node, int zid, enum lru_list lru)
+				int node, int zid, int lid, enum lru_list lru)
 {
 	struct mem_cgroup_per_zone *mz;
 	unsigned long flags, loop;
@@ -3507,7 +3511,7 @@ static int mem_cgroup_force_empty_list(struct mem_cgroup *memcg,
 
 	zone = &NODE_DATA(node)->node_zones[zid];
 	mz = mem_cgroup_zoneinfo(memcg, node, zid);
-	lruvec = &mz->lruvec;
+	lruvec = mz->lruvec + lid;
 	list = &lruvec->pages_lru[lru];
 	loop = lruvec->pages_count[lru];
 	/* give some margin against EBUSY etc...*/
@@ -3558,7 +3562,7 @@ static int mem_cgroup_force_empty_list(struct mem_cgroup *memcg,
 static int mem_cgroup_force_empty(struct mem_cgroup *memcg, bool free_all)
 {
 	int ret;
-	int node, zid, shrink;
+	int node, zid, lid, shrink;
 	int nr_retries = MEM_CGROUP_RECLAIM_RETRIES;
 	struct cgroup *cgrp = memcg->css.cgroup;
 
@@ -3582,18 +3586,17 @@ move_account:
 		ret = 0;
 		mem_cgroup_start_move(memcg);
 		for_each_node_state(node, N_HIGH_MEMORY) {
-			for (zid = 0; !ret && zid < MAX_NR_ZONES; zid++) {
+			for_each_zone_and_lruvec_id(zid, lid) {
 				enum lru_list lru;
 				for_each_lru(lru) {
 					ret = mem_cgroup_force_empty_list(memcg,
-							node, zid, lru);
+							node, zid, lid, lru);
 					if (ret)
-						break;
+						goto abort;
 				}
 			}
-			if (ret)
-				break;
 		}
+abort:
 		mem_cgroup_end_move(memcg);
 		memcg_oom_recover(memcg);
 		/* it seems parent cgroup doesn't have enough mem */
@@ -4061,16 +4064,16 @@ static int mem_control_stat_show(struct cgroup *cont, struct cftype *cft,
 
 #ifdef CONFIG_DEBUG_VM
 	{
-		int nid, zid;
+		int nid, zid, lid;
 		struct mem_cgroup_per_zone *mz;
 		struct zone_reclaim_stat *rs;
 		unsigned long recent_rotated[2] = {0, 0};
 		unsigned long recent_scanned[2] = {0, 0};
 
 		for_each_online_node(nid)
-			for (zid = 0; zid < MAX_NR_ZONES; zid++) {
+			for_each_zone_and_lruvec_id(zid, lid) {
 				mz = mem_cgroup_zoneinfo(memcg, nid, zid);
-				rs = &mz->lruvec.reclaim_stat;
+				rs = &mz->lruvec[lid].reclaim_stat;
 
 				recent_rotated[0] += rs->recent_rotated[0];
 				recent_rotated[1] += rs->recent_rotated[1];
@@ -4618,7 +4621,7 @@ static int alloc_mem_cgroup_per_zone_info(struct mem_cgroup *memcg, int node)
 {
 	struct mem_cgroup_per_node *pn;
 	struct mem_cgroup_per_zone *mz;
-	int zone, tmp = node;
+	int zone, lruvec_id, tmp = node;
 	/*
 	 * This routine is called against possible nodes.
 	 * But it's BUG to call kmalloc() against offline node.
@@ -4635,8 +4638,9 @@ static int alloc_mem_cgroup_per_zone_info(struct mem_cgroup *memcg, int node)
 
 	for (zone = 0; zone < MAX_NR_ZONES; zone++) {
 		mz = &pn->zoneinfo[zone];
-		init_zone_lruvec(&NODE_DATA(node)->node_zones[zone],
-				 &mz->lruvec);
+		for_each_lruvec_id(lruvec_id)
+			init_zone_lruvec(&NODE_DATA(node)->node_zones[zone],
+					 &mz->lruvec[lruvec_id]);
 		mz->usage_in_excess = 0;
 		mz->on_tree = false;
 		mz->memcg = memcg;
@@ -4648,13 +4652,13 @@ static int alloc_mem_cgroup_per_zone_info(struct mem_cgroup *memcg, int node)
 static void free_mem_cgroup_per_zone_info(struct mem_cgroup *memcg, int node)
 {
 	struct mem_cgroup_per_node *pn = memcg->info.nodeinfo[node];
-	int zone;
+	int zone, lruvec;
 
 	if (!pn)
 		return;
 
-	for (zone = 0; zone < MAX_NR_ZONES; zone++)
-		wait_lruvec_unlock(&pn->zoneinfo[zone].lruvec);
+	for_each_zone_and_lruvec_id(zone, lruvec)
+		wait_lruvec_unlock(&pn->zoneinfo[zone].lruvec[lruvec]);
 
 	kfree(pn);
 }
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index beadcc9..9b0cc92 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4297,7 +4297,7 @@ void init_zone_lruvec(struct zone *zone, struct lruvec *lruvec)
 	spin_lock_init(&lruvec->lru_lock);
 	for_each_lru(lru)
 		INIT_LIST_HEAD(&lruvec->pages_lru[lru]);
-#ifdef CONFIG_CGROUP_MEM_RES_CTLR
+#if defined(CONFIG_CGROUP_MEM_RES_CTLR) || (CONFIG_PAGE_LRU_SPLIT != 1)
 	lruvec->node = zone->zone_pgdat;
 	lruvec->zone = zone;
 #endif
@@ -4312,7 +4312,7 @@ void init_zone_lruvec(struct zone *zone, struct lruvec *lruvec)
 static void __paginginit free_area_init_core(struct pglist_data *pgdat,
 		unsigned long *zones_size, unsigned long *zholes_size)
 {
-	enum zone_type j;
+	enum zone_type j, lruvec_id;
 	int nid = pgdat->node_id;
 	unsigned long zone_start_pfn = pgdat->node_start_pfn;
 	int ret;
@@ -4374,7 +4374,8 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
 		zone->zone_pgdat = pgdat;
 
 		zone_pcp_init(zone);
-		init_zone_lruvec(zone, &zone->lruvec);
+		for_each_lruvec_id(lruvec_id)
+			init_zone_lruvec(zone, &zone->lruvec[lruvec_id]);
 		zap_zone_vm_stats(zone);
 		zone->flags = 0;
 		if (!size)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index a1ff010..aaf2b0e 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2202,12 +2202,14 @@ static void shrink_zone(int priority, struct zone *zone,
 	};
 	struct mem_cgroup *memcg;
 	struct lruvec *lruvec;
+	int lruvec_id;
 
 	memcg = mem_cgroup_iter(root, NULL, &reclaim);
 	do {
 		lruvec = mem_cgroup_zone_lruvec(zone, memcg);
 
-		shrink_lruvec(priority, lruvec, sc);
+		for_each_lruvec_id(lruvec_id)
+			shrink_lruvec(priority, lruvec + lruvec_id, sc);
 
 		/*
 		 * Limit reclaim has historically picked one memcg and
@@ -2529,6 +2531,7 @@ unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *memcg,
 		.target_mem_cgroup = memcg,
 	};
 	struct lruvec *lruvec = mem_cgroup_zone_lruvec(zone, memcg);
+	int lruvec_id;
 
 	sc.gfp_mask = (gfp_mask & GFP_RECLAIM_MASK) |
 			(GFP_HIGHUSER_MOVABLE & ~GFP_RECLAIM_MASK);
@@ -2544,7 +2547,8 @@ unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *memcg,
 	 * will pick up pages from other mem cgroup's as well. We hack
 	 * the priority and make it zero.
 	 */
-	shrink_lruvec(0, lruvec, &sc);
+	for_each_lruvec_id(lruvec_id)
+		shrink_lruvec(0, lruvec + lruvec_id, &sc);
 
 	trace_mm_vmscan_memcg_softlimit_reclaim_end(sc.nr_reclaimed);
 
@@ -2599,6 +2603,7 @@ static void age_active_anon(struct zone *zone, struct scan_control *sc,
 			    int priority)
 {
 	struct mem_cgroup *memcg;
+	int lruvec_id;
 
 	if (!total_swap_pages)
 		return;
@@ -2607,9 +2612,12 @@ static void age_active_anon(struct zone *zone, struct scan_control *sc,
 	do {
 		struct lruvec *lruvec = mem_cgroup_zone_lruvec(zone, memcg);
 
-		if (inactive_anon_is_low(lruvec))
-			shrink_active_list(SWAP_CLUSTER_MAX, lruvec,
-					   sc, priority, 0);
+		for_each_lruvec_id(lruvec_id) {
+			if (inactive_anon_is_low(lruvec + lruvec_id))
+				shrink_active_list(SWAP_CLUSTER_MAX,
+						   lruvec + lruvec_id,
+						   sc, priority, 0);
+		}
 
 		memcg = mem_cgroup_iter(NULL, memcg, NULL);
 	} while (memcg);


^ permalink raw reply related	[flat|nested] 65+ messages in thread
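
For reference, the interleaving introduced above maps a page to one of the
per-zone lruvecs purely by its pfn: with the defaults CONFIG_PAGE_LRU_SPLIT=4
and CONFIG_PAGE_LRU_INTERLEAVING=12, consecutive 16MB ranges of physical memory
(with 4K pages) rotate through the four lruvecs.  Below is a minimal
stand-alone sketch of that mapping; the names are made up for the example, the
kernel side uses page_to_pfn() plus the config (or boot) values:

#include <stdio.h>

/* mirrors (pfn >> CONFIG_PAGE_LRU_INTERLEAVING) % CONFIG_PAGE_LRU_SPLIT */
static unsigned int example_lruvec_id(unsigned long pfn,
				      unsigned int order, unsigned int split)
{
	return (pfn >> order) % split;
}

int main(void)
{
	unsigned long pfn;

	/* with 4K pages, a step of 1 << 12 pfns is one 16MB chunk */
	for (pfn = 0; pfn <= (4UL << 12); pfn += 1UL << 12)
		printf("pfn %8lu -> lruvec %u\n",
		       pfn, example_lruvec_id(pfn, 12, 4));
	return 0;	/* prints lruvec ids 0, 1, 2, 3, 0 */
}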

* Re: [PATCH v3 21/21] mm: zone lru vectors interleaving
  2012-02-23 13:53 ` [PATCH v3 21/21] mm: zone lru vectors interleaving Konstantin Khlebnikov
@ 2012-02-23 14:44   ` Hillf Danton
  2012-02-23 16:21   ` Andi Kleen
  1 sibling, 0 replies; 65+ messages in thread
From: Hillf Danton @ 2012-02-23 14:44 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Hugh Dickins, linux-kernel, linux-mm, Johannes Weiner,
	Andrew Morton, KAMEZAWA Hiroyuki, Andi Kleen

On Thu, Feb 23, 2012 at 9:53 PM, Konstantin Khlebnikov
<khlebnikov@openvz.org> wrote:
> @@ -4312,7 +4312,7 @@ void init_zone_lruvec(struct zone *zone, struct lruvec *lruvec)
>  static void __paginginit free_area_init_core(struct pglist_data *pgdat,
>                unsigned long *zones_size, unsigned long *zholes_size)
>  {
> -       enum zone_type j;
> +       enum zone_type j, lruvec_id;

Like other cases in the patch,

          int lruvec_id;

looks clearer

>        int nid = pgdat->node_id;
>        unsigned long zone_start_pfn = pgdat->node_start_pfn;
>        int ret;
> @@ -4374,7 +4374,8 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
>                zone->zone_pgdat = pgdat;
>
>                zone_pcp_init(zone);
> -               init_zone_lruvec(zone, &zone->lruvec);
> +               for_each_lruvec_id(lruvec_id)
> +                       init_zone_lruvec(zone, &zone->lruvec[lruvec_id]);
>                zap_zone_vm_stats(zone);
>                zone->flags = 0;
>                if (!size)

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v3 21/21] mm: zone lru vectors interleaving
  2012-02-23 13:53 ` [PATCH v3 21/21] mm: zone lru vectors interleaving Konstantin Khlebnikov
  2012-02-23 14:44   ` Hillf Danton
@ 2012-02-23 16:21   ` Andi Kleen
  2012-02-23 18:48     ` [PATCH 1/2] mm: configure lruvec split by boot options Konstantin Khlebnikov
  2012-02-23 18:48     ` [PATCH 2/2] mm: show zone lruvec state in /proc/zoneinfo Konstantin Khlebnikov
  1 sibling, 2 replies; 65+ messages in thread
From: Andi Kleen @ 2012-02-23 16:21 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Hugh Dickins, linux-kernel, linux-mm, Johannes Weiner,
	Andrew Morton, KAMEZAWA Hiroyuki, Andi Kleen, tim.c.chen

> +config PAGE_LRU_SPLIT
> +	int "Memory lru lists per zone"
> +	default	4 if EXPERIMENTAL && SPARSEMEM_VMEMMAP
> +	default 1
> +	help
> +	  The number of lru lists in each memory zone for interleaving.
> +	  Allows reducing lru_lock contention, but adds some overhead.
> +	  Without SPARSEMEM_VMEMMAP it might be costly. "1" means no split.

Could you turn those two numbers into a boot option? Compile time 
parameters are nasty to use.

I suppose it's ok to have an upper limit.

> +
> +config PAGE_LRU_INTERLEAVING
> +	int "Memory lru lists interleaving page-order"
> +	default	12
> +	help
> +	  Page order for lru lists interleaving. By default 12 (16MB).
> +	  Must be greater than the huge-page order.
> +	  Has no effect with CONFIG_PAGE_LRU_SPLIT=1.

-Andi

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v3 02/21] memcg: make mm_match_cgroup() hirarchical
  2012-02-23 13:51 ` [PATCH v3 02/21] memcg: make mm_match_cgroup() hirarchical Konstantin Khlebnikov
@ 2012-02-23 18:03   ` Johannes Weiner
  2012-02-23 19:46     ` Konstantin Khlebnikov
  2012-02-28  0:11   ` KAMEZAWA Hiroyuki
  1 sibling, 1 reply; 65+ messages in thread
From: Johannes Weiner @ 2012-02-23 18:03 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Hugh Dickins, linux-kernel, linux-mm, Andrew Morton,
	KAMEZAWA Hiroyuki, Andi Kleen

On Thu, Feb 23, 2012 at 05:51:46PM +0400, Konstantin Khlebnikov wrote:
> Check mm-owner cgroup membership hierarchically.

I think this one cat just beat up this other cat in front of my
window, yelling something about money and missing product.  Anyway, I
already forgot why we want this patch.  Could you describe that in the
changelog, please?

> @@ -821,6 +821,26 @@ struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p)
>  				struct mem_cgroup, css);
>  }
>  
> +/**
> + * mm_match_cgroup - cgroup hierarchy mm membership test
> + * @mm		mm_struct to test
> + * @cgroup	target cgroup
> + *
> + * Returns true if mm belong this cgroup or any its child in hierarchy
> + */
> +int mm_match_cgroup(const struct mm_struct *mm, const struct mem_cgroup *cgroup)
> +{
> +	struct mem_cgroup *memcg;
> +
> +	rcu_read_lock();
> +	memcg = mem_cgroup_from_task(rcu_dereference((mm)->owner));
> +	while (memcg != cgroup && memcg && memcg->use_hierarchy)
> +		memcg = parent_mem_cgroup(memcg);
> +	rcu_read_unlock();
> +
> +	return cgroup == memcg;
> +}

Please don't duplicate mem_cgroup_same_or_subtree()'s functionality in
a worse way.  The hierarchy information is kept in a stack such that
ancestry can be detected in linear time, check out css_is_ancestor().

If you don't want to nest rcu_read_lock(), you could push the
rcu_read_lock() from css_is_ancestor() into its sole user and provide
a __mem_cgroup_is_ancestor() that assumes rcu already read-locked.

No?

^ permalink raw reply	[flat|nested] 65+ messages in thread
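
For readers without cgroup.c at hand, the "stack" mentioned above works roughly
as in the sketch below: each css id records its depth and the ids of all its
ancestors, so an ancestry test is one bounds check plus one array lookup.  All
names here are illustrative only, not the actual css_id/css_is_ancestor code:

#include <stdbool.h>

#define EXAMPLE_MAX_DEPTH 16	/* made-up bound, just for the sketch */

struct example_css_id {
	unsigned short id;
	unsigned short depth;				/* root has depth 0 */
	unsigned short stack[EXAMPLE_MAX_DEPTH];	/* ancestor ids, stack[depth] == id */
};

/* true if "ancestor" is the same node as "child" or one of its ancestors */
static bool example_is_ancestor(const struct example_css_id *child,
				const struct example_css_id *ancestor)
{
	return child->depth >= ancestor->depth &&
	       child->stack[ancestor->depth] == ancestor->id;
}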

* [PATCH 1/2] mm: configure lruvec split by boot options
  2012-02-23 16:21   ` Andi Kleen
@ 2012-02-23 18:48     ` Konstantin Khlebnikov
  2012-02-23 18:48     ` [PATCH 2/2] mm: show zone lruvec state in /proc/zoneinfo Konstantin Khlebnikov
  1 sibling, 0 replies; 65+ messages in thread
From: Konstantin Khlebnikov @ 2012-02-23 18:48 UTC (permalink / raw)
  To: Hugh Dickins, linux-kernel, linux-mm, Johannes Weiner,
	Andrew Morton, KAMEZAWA Hiroyuki
  Cc: Andi Kleen

This patch adds boot options:
lruvec_split=%u by default 1, limited by CONFIG_PAGE_LRU_SPLIT
lruvec_interleaving=%u by default CONFIG_PAGE_LRU_INTERLEAVING

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
---
 include/linux/mm.h |    5 ++++-
 mm/internal.h      |    2 +-
 mm/page_alloc.c    |   29 +++++++++++++++++++++++++++++
 3 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index d14db10..f042a34 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -737,12 +737,15 @@ static inline int page_lruvec_id(struct page *page)
 
 #else /* CONFIG_PAGE_LRU_SPLIT */
 
+extern unsigned lruvec_split;
+extern unsigned lruvec_interleaving;
+
 static inline int page_lruvec_id(struct page *page)
 {
 
 	unsigned long pfn = page_to_pfn(page);
 
-	return (pfn >> CONFIG_PAGE_LRU_INTERLEAVING) % CONFIG_PAGE_LRU_SPLIT;
+	return (pfn >> lruvec_interleaving) % lruvec_split;
 }
 
 #endif /* CONFIG_PAGE_LRU_SPLIT */
diff --git a/mm/internal.h b/mm/internal.h
index f429911..be7415b 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -17,7 +17,7 @@
 	for ( zone_id = 0 ; zone_id < MAX_NR_ZONES ; zone_id++ )
 
 #define for_each_lruvec_id(lruvec_id) \
-	for ( lruvec_id = 0 ; lruvec_id < CONFIG_PAGE_LRU_SPLIT ; lruvec_id++ )
+	for ( lruvec_id = 0 ; lruvec_id < lruvec_split ; lruvec_id++ )
 
 #define for_each_zone_and_lruvec_id(zone_id, lruvec_id) \
 	for_each_zone_id(zone_id) for_each_lruvec_id(lruvec_id)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 9b0cc92..1a899fa 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4303,6 +4303,35 @@ void init_zone_lruvec(struct zone *zone, struct lruvec *lruvec)
 #endif
 }
 
+#if CONFIG_PAGE_LRU_SPLIT != 1
+
+unsigned lruvec_split = 1;
+unsigned lruvec_interleaving = CONFIG_PAGE_LRU_INTERLEAVING;
+
+static int __init set_lruvec_split(char *arg)
+{
+	if (!kstrtouint(arg, 0, &lruvec_split) &&
+	    lruvec_split >= 1 &&
+	    lruvec_split <= CONFIG_PAGE_LRU_SPLIT)
+		return 0;
+	lruvec_split = 1;
+	return 1;
+}
+early_param("lruvec_split", set_lruvec_split);
+
+static int __init set_lruvec_interleaving(char *arg)
+{
+	if (!kstrtouint(arg, 0, &lruvec_interleaving) &&
+	    lruvec_interleaving >= HPAGE_PMD_ORDER &&
+	    lruvec_interleaving <= BITS_PER_LONG)
+		return 0;
+	lruvec_split = 1;
+	return 1;
+}
+early_param("lruvec_interleaving", set_lruvec_interleaving);
+
+#endif
+
 /*
  * Set up the zone data structures:
  *   - mark all pages reserved


^ permalink raw reply related	[flat|nested] 65+ messages in thread
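
A usage note for the patch above, assuming a kernel built with
CONFIG_PAGE_LRU_SPLIT > 1: the split factor and interleaving order can then be
chosen per boot on the kernel command line, for example:

	lruvec_split=4 lruvec_interleaving=12

Values outside the ranges checked above (1..CONFIG_PAGE_LRU_SPLIT for
lruvec_split, HPAGE_PMD_ORDER..BITS_PER_LONG for lruvec_interleaving) fall back
to lruvec_split=1, i.e. no splitting.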

* [PATCH 2/2] mm: show zone lruvec state in /proc/zoneinfo
  2012-02-23 16:21   ` Andi Kleen
  2012-02-23 18:48     ` [PATCH 1/2] mm: configure lruvec split by boot options Konstantin Khlebnikov
@ 2012-02-23 18:48     ` Konstantin Khlebnikov
  1 sibling, 0 replies; 65+ messages in thread
From: Konstantin Khlebnikov @ 2012-02-23 18:48 UTC (permalink / raw)
  To: Hugh Dickins, linux-kernel, linux-mm, Johannes Weiner,
	Andrew Morton, KAMEZAWA Hiroyuki
  Cc: Andi Kleen

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
---
 mm/vmstat.c |   23 +++++++++++++++++++++++
 1 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/mm/vmstat.c b/mm/vmstat.c
index 2c813e1..2e77a19 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -20,6 +20,8 @@
 #include <linux/writeback.h>
 #include <linux/compaction.h>
 
+#include "internal.h"
+
 #ifdef CONFIG_VM_EVENT_COUNTERS
 DEFINE_PER_CPU(struct vm_event_state, vm_event_states) = {{0}};
 EXPORT_PER_CPU_SYMBOL(vm_event_states);
@@ -1020,6 +1022,27 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
 		   "\n  start_pfn:         %lu",
 		   zone->all_unreclaimable,
 		   zone->zone_start_pfn);
+	seq_printf(m, "\n  lruvecs");
+	for_each_lruvec_id(i) {
+		struct lruvec *lruvec = zone->lruvec + i;
+		enum lru_list lru;
+
+		seq_printf(m,
+			   "\n    lruvec: %i",
+			   i);
+		for_each_lru(lru)
+			seq_printf(m,
+			   "\n              %s: %lu",
+			   vmstat_text[NR_LRU_BASE + lru],
+			   lruvec->pages_count[lru]);
+		seq_printf(m,
+			   "\n              %s: %lu"
+			   "\n              %s: %lu",
+			   vmstat_text[NR_ISOLATED_ANON],
+			   lruvec->pages_count[LRU_ISOLATED_ANON],
+			   vmstat_text[NR_ISOLATED_FILE],
+			   lruvec->pages_count[LRU_ISOLATED_FILE]);
+	}
 	seq_putc(m, '\n');
 }
 


^ permalink raw reply related	[flat|nested] 65+ messages in thread
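
For illustration, with the format strings above the new block in /proc/zoneinfo
should come out roughly like this (counter names are the vmstat_text[] strings,
the numbers are invented):

  lruvecs
    lruvec: 0
              nr_inactive_anon: 10873
              nr_active_anon: 4521
              nr_inactive_file: 82004
              nr_active_file: 31337
              nr_unevictable: 0
              nr_isolated_anon: 0
              nr_isolated_file: 0
    lruvec: 1
              ...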

* Re: [PATCH v3 02/21] memcg: make mm_match_cgroup() hirarchical
  2012-02-23 18:03   ` Johannes Weiner
@ 2012-02-23 19:46     ` Konstantin Khlebnikov
  2012-02-23 22:06       ` Johannes Weiner
  0 siblings, 1 reply; 65+ messages in thread
From: Konstantin Khlebnikov @ 2012-02-23 19:46 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Hugh Dickins, linux-kernel, linux-mm, Andrew Morton,
	KAMEZAWA Hiroyuki, Andi Kleen

Johannes Weiner wrote:
> On Thu, Feb 23, 2012 at 05:51:46PM +0400, Konstantin Khlebnikov wrote:
>> Check mm-owner cgroup membership hierarchically.
>
> I think this one cat just beat up this other cat in front of my
> window, yelling something about money and missing product.  Anyway, I
> already forgot why we want this patch.  Could you describe that in the
> changelog, please?

Yeah, sorry for lack of comment.

This test is used in the rmap walker when checking page references in the reclaimer.
The memory cgroup shrinker wants to skip all references from outside the cgroup hierarchy
which is currently under reclaim.

Actually this patch is not important for this set and can be dropped without problems;
it does not share any context with the other patches. The next patch is more important,
because it fixes the global reclaimer and is required for further cleanups.

>
>> @@ -821,6 +821,26 @@ struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p)
>>   				struct mem_cgroup, css);
>>   }
>>
>> +/**
>> + * mm_match_cgroup - cgroup hierarchy mm membership test
>> + * @mm		mm_struct to test
>> + * @cgroup	target cgroup
>> + *
>> + * Returns true if mm belong this cgroup or any its child in hierarchy
>> + */
>> +int mm_match_cgroup(const struct mm_struct *mm, const struct mem_cgroup *cgroup)
>> +{
>> +	struct mem_cgroup *memcg;
>> +
>> +	rcu_read_lock();
>> +	memcg = mem_cgroup_from_task(rcu_dereference((mm)->owner));
>> +	while (memcg != cgroup && memcg && memcg->use_hierarchy)
>> +		memcg = parent_mem_cgroup(memcg);
>> +	rcu_read_unlock();
>> +
>> +	return cgroup == memcg;
>> +}
>
> Please don't duplicate mem_cgroup_same_or_subtree()'s functionality in
> a worse way.  The hierarchy information is kept in a stack such that
> ancestry can be detected in linear time, check out css_is_ancestor().

Ok, there will be something like that:

+bool mm_match_cgroup(const struct mm_struct *mm,
+                    const struct mem_cgroup *cgroup)
+{
+       struct mem_cgroup *memcg;
+       bool ret;
+
+       rcu_read_lock();
+       memcg = mem_cgroup_from_task(rcu_dereference((mm)->owner));
+       ret = memcg && mem_cgroup_same_or_subtree(cgroup, memcg);
+       rcu_read_unlock();
+
+       return ret;
+}
+

>
> If you don't want to nest rcu_read_lock(), you could push the
> rcu_read_lock() from css_is_ancestor() into its sole user and provide
> a __mem_cgroup_is_ancestor() that assumes rcu already read-locked.
>
> No?

It is not a problem.

It looks like mem_cgroup_same_or_subtree() checks something different,
because it does not check the ->use_hierarchy flag on the tested cgroup, only on the target cgroup.

Or maybe all this hierarchical stuff is just out of sync in different parts of the code.
For example memcg_get_hierarchical_limit() starts from the deepest cgroup and walks upwards
while ->use_hierarchy is set.



^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v3 02/21] memcg: make mm_match_cgroup() hirarchical
  2012-02-23 19:46     ` Konstantin Khlebnikov
@ 2012-02-23 22:06       ` Johannes Weiner
  0 siblings, 0 replies; 65+ messages in thread
From: Johannes Weiner @ 2012-02-23 22:06 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Hugh Dickins, linux-kernel, linux-mm, Andrew Morton,
	KAMEZAWA Hiroyuki, Andi Kleen

On Thu, Feb 23, 2012 at 11:46:22PM +0400, Konstantin Khlebnikov wrote:
> Johannes Weiner wrote:
> >On Thu, Feb 23, 2012 at 05:51:46PM +0400, Konstantin Khlebnikov wrote:
> >>@@ -821,6 +821,26 @@ struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p)
> >>  				struct mem_cgroup, css);
> >>  }
> >>
> >>+/**
> >>+ * mm_match_cgroup - cgroup hierarchy mm membership test
> >>+ * @mm		mm_struct to test
> >>+ * @cgroup	target cgroup
> >>+ *
> >>+ * Returns true if mm belong this cgroup or any its child in hierarchy
> >>+ */
> >>+int mm_match_cgroup(const struct mm_struct *mm, const struct mem_cgroup *cgroup)
> >>+{
> >>+	struct mem_cgroup *memcg;
> >>+
> >>+	rcu_read_lock();
> >>+	memcg = mem_cgroup_from_task(rcu_dereference((mm)->owner));
> >>+	while (memcg != cgroup && memcg && memcg->use_hierarchy)
> >>+		memcg = parent_mem_cgroup(memcg);
> >>+	rcu_read_unlock();
> >>+
> >>+	return cgroup == memcg;
> >>+}
> >
> >Please don't duplicate mem_cgroup_same_or_subtree()'s functionality in
> >a worse way.  The hierarchy information is kept in a stack such that
> >ancestry can be detected in linear time, check out css_is_ancestor().
> 
> Ok, there will be something like that:
> 
> +bool mm_match_cgroup(const struct mm_struct *mm,
> +                    const struct mem_cgroup *cgroup)
> +{
> +       struct mem_cgroup *memcg;
> +       bool ret;
> +
> +       rcu_read_lock();
> +       memcg = mem_cgroup_from_task(rcu_dereference((mm)->owner));
> +       ret = memcg && mem_cgroup_same_or_subtree(cgroup, memcg);
> +       rcu_read_unlock();
> +
> +       return ret;
> +}
> +

It would be unfortunate to nest rcu_read_lock(), but I think this
looks good otherwise.

> >If you don't want to nest rcu_read_lock(), you could push the
> >rcu_read_lock() from css_is_ancestor() into its sole user and provide
> >a __mem_cgroup_is_ancestor() that assumes rcu already read-locked.
> >
> >No?
> 
> It is not a problem.
> 
> It looks like mem_cgroup_same_or_subtree() checks something different,
> because it does not check the ->use_hierarchy flag on the tested cgroup, only on the target cgroup.

If a memcg has hierarchy enabled, any memcg that turns out to be its
child is guaranteed to have hierarchy enabled.

> Or maybe all this hierarchical stuff is just out of sync in different parts of the code.
> For example memcg_get_hierarchical_limit() starts from the deepest cgroup and walks upwards
> while ->use_hierarchy is set.

This one really has to walk up and find the smallest applicable limit,
there is no way around looking at every single level.

But checking for ancestry can and has been optimized.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v3 18/21] mm: add to lruvec isolated pages counters
  2012-02-23 13:53 ` [PATCH v3 18/21] mm: add to lruvec isolated pages counters Konstantin Khlebnikov
@ 2012-02-24  5:32   ` Konstantin Khlebnikov
  2012-02-28  1:38   ` KAMEZAWA Hiroyuki
  1 sibling, 0 replies; 65+ messages in thread
From: Konstantin Khlebnikov @ 2012-02-24  5:32 UTC (permalink / raw)
  To: Hugh Dickins, linux-kernel, linux-mm, Johannes Weiner,
	Andrew Morton, KAMEZAWA Hiroyuki
  Cc: Andi Kleen

Konstantin Khlebnikov wrote:
> @@ -2480,8 +2494,11 @@ static int mem_cgroup_move_parent(struct page *page,
>
>          if (nr_pages > 1)
>                  compound_unlock_irqrestore(page, flags);
> +       if (!ret)
> +               /* This also stabilize PageLRU() sign for lruvec lock holder. */
> +               mem_cgroup_adjust_isolated(lruvec, page, -nr_pages);
>   put_back:
> -       putback_lru_page(page);
> +       __putback_lru_page(page, !ret);
>   put:
>          put_page(page);
>   out:

Oh, no. There must be !!ret

--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2482,7 +2482,7 @@ static int mem_cgroup_move_parent(struct page *page,
                 /* This also stabilize PageLRU() sign for lruvec lock holder. */
                 mem_cgroup_adjust_isolated(lruvec, page, -nr_pages);
  put_back:
-       __putback_lru_page(page, !ret);
+       __putback_lru_page(page, !!ret);
  put:
         put_page(page);
  out:

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v3 00/21] mm: lru_lock splitting
  2012-02-23 13:51 [PATCH v3 00/21] mm: lru_lock splitting Konstantin Khlebnikov
                   ` (20 preceding siblings ...)
  2012-02-23 13:53 ` [PATCH v3 21/21] mm: zone lru vectors interleaving Konstantin Khlebnikov
@ 2012-02-25  0:05 ` Tim Chen
  2012-02-25  5:34   ` Konstantin Khlebnikov
  2012-02-25  2:15 ` KAMEZAWA Hiroyuki
  2012-02-28  1:52 ` KAMEZAWA Hiroyuki
  23 siblings, 1 reply; 65+ messages in thread
From: Tim Chen @ 2012-02-25  0:05 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Hugh Dickins, linux-kernel, linux-mm, Johannes Weiner,
	Andrew Morton, KAMEZAWA Hiroyuki, Andi Kleen

On Thu, 2012-02-23 at 17:51 +0400, Konstantin Khlebnikov wrote:
> v3 changes:
> * inactive-ratio reworked again, now it always calculated from from scratch
> * hierarchical pte reference bits filter in memory-cgroup reclaimer
> * fixed two bugs in locking, found by Hugh Dickins
> * locking functions slightly simplified
> * new patch for isolated pages accounting
> * new patch with lru interleaving
> 
> This patchset is based on next-20120210
> 
> git: https://github.com/koct9i/linux/commits/lruvec-v3
> 
> ---

I am seeing an improvement of about 7% in throughput in a workload where
I am doing parallel reading of files that are mmaped. The contention on
lru_lock used to be 13% in the cpu profile on the __pagevec_lru_add code
path. Now lock contention on this path drops to about 0.6%.  I have 40
hyper-threaded enabled cpu cores running 80 mmaped file reading
processes.

So initial testing of this patch set looks encouraging.

Tim


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v3 00/21] mm: lru_lock splitting
  2012-02-23 13:51 [PATCH v3 00/21] mm: lru_lock splitting Konstantin Khlebnikov
                   ` (21 preceding siblings ...)
  2012-02-25  0:05 ` [PATCH v3 00/21] mm: lru_lock splitting Tim Chen
@ 2012-02-25  2:15 ` KAMEZAWA Hiroyuki
  2012-02-25  5:31   ` Konstantin Khlebnikov
  2012-02-28  1:52 ` KAMEZAWA Hiroyuki
  23 siblings, 1 reply; 65+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-02-25  2:15 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Hugh Dickins, linux-kernel, linux-mm, Johannes Weiner,
	Andrew Morton, Andi Kleen

On Thu, 23 Feb 2012 17:51:36 +0400
Konstantin Khlebnikov <khlebnikov@openvz.org> wrote:

> v3 changes:
> * inactive-ratio reworked again, now it always calculated from from scratch
> * hierarchical pte reference bits filter in memory-cgroup reclaimer
> * fixed two bugs in locking, found by Hugh Dickins
> * locking functions slightly simplified
> * new patch for isolated pages accounting
> * new patch with lru interleaving
> 
> This patchset is based on next-20120210
> 
> git: https://github.com/koct9i/linux/commits/lruvec-v3
> 

I wonder.... I just wonder...if we can split a lruvec in a zone into small
pieces of lruvec and have a split LRU-lock per piece, do we need per-memcg-lrulock ?

It seems a per-memcg-lrulock can be a much bigger lock than a small-lruvec-lock
(depends on configuration) and much more complicated.. and it has to take care
of many things.. If the unit of splitting can be specified by a boot option,
it seems admins can split a big memcg's per-memcg-lru lock into smaller pieces.

BTW, what should the default size of the split be ? I wonder whether splitting the lru into
the number of cpus per node could be a choice. Each cpu may have a chance to
set a preferred pfn range at page allocation with additional patches.

Thanks,
-Kame


> ---
> 
> Konstantin Khlebnikov (21):
>       memcg: unify inactive_ratio calculation
>       memcg: make mm_match_cgroup() hirarchical
>       memcg: fix page_referencies cgroup filter on global reclaim
>       memcg: use vm_swappiness from target memory cgroup
>       mm: rename lruvec->lists into lruvec->pages_lru
>       mm: lruvec linking functions
>       mm: add lruvec->pages_count
>       mm: unify inactive_list_is_low()
>       mm: add lruvec->reclaim_stat
>       mm: kill struct mem_cgroup_zone
>       mm: move page-to-lruvec translation upper
>       mm: push lruvec into update_page_reclaim_stat()
>       mm: push lruvecs from pagevec_lru_move_fn() to iterator
>       mm: introduce lruvec locking primitives
>       mm: handle lruvec relocks on lumpy reclaim
>       mm: handle lruvec relocks in compaction
>       mm: handle lruvec relock in memory controller
>       mm: add to lruvec isolated pages counters
>       memcg: check lru vectors emptiness in pre-destroy
>       mm: split zone->lru_lock
>       mm: zone lru vectors interleaving
> 
> 
>  include/linux/huge_mm.h    |    3 
>  include/linux/memcontrol.h |   75 ------
>  include/linux/mm.h         |   66 +++++
>  include/linux/mm_inline.h  |   19 +-
>  include/linux/mmzone.h     |   39 ++-
>  include/linux/swap.h       |    6 
>  mm/Kconfig                 |   16 +
>  mm/compaction.c            |   31 +--
>  mm/huge_memory.c           |   14 +
>  mm/internal.h              |  204 +++++++++++++++++
>  mm/ksm.c                   |    2 
>  mm/memcontrol.c            |  343 +++++++++++-----------------
>  mm/migrate.c               |    2 
>  mm/page_alloc.c            |   70 +-----
>  mm/rmap.c                  |    2 
>  mm/swap.c                  |  217 ++++++++++--------
>  mm/vmscan.c                |  534 ++++++++++++++++++++++++--------------------
>  mm/vmstat.c                |    6 
>  18 files changed, 932 insertions(+), 717 deletions(-)
> 
> -- 
> Signature
> 


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v3 00/21] mm: lru_lock splitting
  2012-02-25  2:15 ` KAMEZAWA Hiroyuki
@ 2012-02-25  5:31   ` Konstantin Khlebnikov
  2012-02-26 23:54     ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 65+ messages in thread
From: Konstantin Khlebnikov @ 2012-02-25  5:31 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Hugh Dickins, linux-kernel, linux-mm, Johannes Weiner,
	Andrew Morton, Andi Kleen

KAMEZAWA Hiroyuki wrote:
> On Thu, 23 Feb 2012 17:51:36 +0400
> Konstantin Khlebnikov<khlebnikov@openvz.org>  wrote:
>
>> v3 changes:
>> * inactive-ratio reworked again, now it always calculated from from scratch
>> * hierarchical pte reference bits filter in memory-cgroup reclaimer
>> * fixed two bugs in locking, found by Hugh Dickins
>> * locking functions slightly simplified
>> * new patch for isolated pages accounting
>> * new patch with lru interleaving
>>
>> This patchset is based on next-20120210
>>
>> git: https://github.com/koct9i/linux/commits/lruvec-v3
>>
>
> I wonder.... I just wonder...if we can split a lruvec in a zone into small
> pieces of lruvec and have a split LRU-lock per piece, do we need per-memcg-lrulock ?

What per-memcg-lrulock? I don't have it.
last patch splits lruvecs in memcg with the same factor.

>
> It seems a per-memcg-lrulock can be a much bigger lock than a small-lruvec-lock
> (depends on configuration) and much more complicated.. and it has to take care
> of many things.. If the unit of splitting can be specified by a boot option,
> it seems admins can split a big memcg's per-memcg-lru lock into smaller pieces.

The lruvec count per memcg can be arbitrary and changeable if the cgroup is empty.
This is not in this patch, but it's really easy.

>
> BTW, what should the default size of the split be ? I wonder whether splitting the lru into
> the number of cpus per node could be a choice. Each cpu may have a chance to
> set a preferred pfn range at page allocation with additional patches.

If we rework the page-to-memcg linking and add a direct lruvec id into page->flags,
we will be able to change the lruvec before inserting a page into the lru.
Thus each cpu will always insert pages into its own lruvec in the zone.
I have not thought about races yet, but this would be a perfect solution.

>
> Thanks,
> -Kame
>
>
>> ---
>>
>> Konstantin Khlebnikov (21):
>>        memcg: unify inactive_ratio calculation
>>        memcg: make mm_match_cgroup() hirarchical
>>        memcg: fix page_referencies cgroup filter on global reclaim
>>        memcg: use vm_swappiness from target memory cgroup
>>        mm: rename lruvec->lists into lruvec->pages_lru
>>        mm: lruvec linking functions
>>        mm: add lruvec->pages_count
>>        mm: unify inactive_list_is_low()
>>        mm: add lruvec->reclaim_stat
>>        mm: kill struct mem_cgroup_zone
>>        mm: move page-to-lruvec translation upper
>>        mm: push lruvec into update_page_reclaim_stat()
>>        mm: push lruvecs from pagevec_lru_move_fn() to iterator
>>        mm: introduce lruvec locking primitives
>>        mm: handle lruvec relocks on lumpy reclaim
>>        mm: handle lruvec relocks in compaction
>>        mm: handle lruvec relock in memory controller
>>        mm: add to lruvec isolated pages counters
>>        memcg: check lru vectors emptiness in pre-destroy
>>        mm: split zone->lru_lock
>>        mm: zone lru vectors interleaving
>>
>>
>>   include/linux/huge_mm.h    |    3
>>   include/linux/memcontrol.h |   75 ------
>>   include/linux/mm.h         |   66 +++++
>>   include/linux/mm_inline.h  |   19 +-
>>   include/linux/mmzone.h     |   39 ++-
>>   include/linux/swap.h       |    6
>>   mm/Kconfig                 |   16 +
>>   mm/compaction.c            |   31 +--
>>   mm/huge_memory.c           |   14 +
>>   mm/internal.h              |  204 +++++++++++++++++
>>   mm/ksm.c                   |    2
>>   mm/memcontrol.c            |  343 +++++++++++-----------------
>>   mm/migrate.c               |    2
>>   mm/page_alloc.c            |   70 +-----
>>   mm/rmap.c                  |    2
>>   mm/swap.c                  |  217 ++++++++++--------
>>   mm/vmscan.c                |  534 ++++++++++++++++++++++++--------------------
>>   mm/vmstat.c                |    6
>>   18 files changed, 932 insertions(+), 717 deletions(-)
>>
>> --
>> Signature
>>
>


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v3 00/21] mm: lru_lock splitting
  2012-02-25  0:05 ` [PATCH v3 00/21] mm: lru_lock splitting Tim Chen
@ 2012-02-25  5:34   ` Konstantin Khlebnikov
  0 siblings, 0 replies; 65+ messages in thread
From: Konstantin Khlebnikov @ 2012-02-25  5:34 UTC (permalink / raw)
  To: Tim Chen
  Cc: Hugh Dickins, linux-kernel, linux-mm, Johannes Weiner,
	Andrew Morton, KAMEZAWA Hiroyuki, Andi Kleen

Tim Chen wrote:
> On Thu, 2012-02-23 at 17:51 +0400, Konstantin Khlebnikov wrote:
>> v3 changes:
>> * inactive-ratio reworked again, now it always calculated from from scratch
>> * hierarchical pte reference bits filter in memory-cgroup reclaimer
>> * fixed two bugs in locking, found by Hugh Dickins
>> * locking functions slightly simplified
>> * new patch for isolated pages accounting
>> * new patch with lru interleaving
>>
>> This patchset is based on next-20120210
>>
>> git: https://github.com/koct9i/linux/commits/lruvec-v3
>>
>> ---
>
> I am seeing an improvement of about 7% in throughput in a workload where
> I am doing parallel reading of files that are mmaped. The contention on
> lru_lock used to be 13% in the cpu profile on the __pagevec_lru_add code
> path. Now lock contention on this path drops to about 0.6%.  I have 40
> hyper-threaded enabled cpu cores running 80 mmaped file reading
> processes.
>
> So initial testing of this patch set looks encouraging.

That's great!

>
> Tim
>


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v3 00/21] mm: lru_lock splitting
  2012-02-25  5:31   ` Konstantin Khlebnikov
@ 2012-02-26 23:54     ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 65+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-02-26 23:54 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Hugh Dickins, linux-kernel, linux-mm, Johannes Weiner,
	Andrew Morton, Andi Kleen

On Sat, 25 Feb 2012 09:31:01 +0400
Konstantin Khlebnikov <khlebnikov@openvz.org> wrote:

> KAMEZAWA Hiroyuki wrote:
> > On Thu, 23 Feb 2012 17:51:36 +0400
> > Konstantin Khlebnikov<khlebnikov@openvz.org>  wrote:
> >
> >> v3 changes:
> >> * inactive-ratio reworked again, now it always calculated from from scratch
> >> * hierarchical pte reference bits filter in memory-cgroup reclaimer
> >> * fixed two bugs in locking, found by Hugh Dickins
> >> * locking functions slightly simplified
> >> * new patch for isolated pages accounting
> >> * new patch with lru interleaving
> >>
> >> This patchset is based on next-20120210
> >>
> >> git: https://github.com/koct9i/linux/commits/lruvec-v3
> >>
> >
> > I wonder.... I just wonder...if we can split a lruvec in a zone into small
> > pieces of lruvec and have a split LRU-lock per piece, do we need per-memcg-lrulock ?
> 
> What per-memcg-lrulock? I don't have it.
> last patch splits lruvecs in memcg with the same factor.
> 
Okay, I missed it.

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v3 01/21] memcg: unify inactive_ratio calculation
  2012-02-23 13:51 ` [PATCH v3 01/21] memcg: unify inactive_ratio calculation Konstantin Khlebnikov
@ 2012-02-28  0:05   ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 65+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-02-28  0:05 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Hugh Dickins, linux-kernel, linux-mm, Johannes Weiner,
	Andrew Morton, Andi Kleen

On Thu, 23 Feb 2012 17:51:41 +0400
Konstantin Khlebnikov <khlebnikov@openvz.org> wrote:

> This patch removes precalculated zone->inactive_ratio.
> Now it always calculated in inactive_anon_is_low() from current lru sizes.
> After that we can merge memcg and non-memcg cases and drop duplicated code.
> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>

seems good to me.

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v3 02/21] memcg: make mm_match_cgroup() hirarchical
  2012-02-23 13:51 ` [PATCH v3 02/21] memcg: make mm_match_cgroup() hirarchical Konstantin Khlebnikov
  2012-02-23 18:03   ` Johannes Weiner
@ 2012-02-28  0:11   ` KAMEZAWA Hiroyuki
  2012-02-28  6:31     ` Konstantin Khlebnikov
  1 sibling, 1 reply; 65+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-02-28  0:11 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Hugh Dickins, linux-kernel, linux-mm, Johannes Weiner,
	Andrew Morton, Andi Kleen

On Thu, 23 Feb 2012 17:51:46 +0400
Konstantin Khlebnikov <khlebnikov@openvz.org> wrote:

> Check mm-owner cgroup membership hierarchically.
> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>


Ack. but ... see below.

> ---
>  include/linux/memcontrol.h |   11 ++---------
>  mm/memcontrol.c            |   20 ++++++++++++++++++++
>  2 files changed, 22 insertions(+), 9 deletions(-)
> 
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 8c4d74f..4822d53 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -87,15 +87,8 @@ extern struct mem_cgroup *try_get_mem_cgroup_from_mm(struct mm_struct *mm);
>  extern struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *memcg);
>  extern struct mem_cgroup *mem_cgroup_from_cont(struct cgroup *cont);
>  
> -static inline
> -int mm_match_cgroup(const struct mm_struct *mm, const struct mem_cgroup *cgroup)
> -{
> -	struct mem_cgroup *memcg;
> -	rcu_read_lock();
> -	memcg = mem_cgroup_from_task(rcu_dereference((mm)->owner));
> -	rcu_read_unlock();
> -	return cgroup == memcg;
> -}
> +extern int mm_match_cgroup(const struct mm_struct *mm,
> +			   const struct mem_cgroup *cgroup);
>  
>  extern struct cgroup_subsys_state *mem_cgroup_css(struct mem_cgroup *memcg);
>  
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index b8039d2..77f5d48 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -821,6 +821,26 @@ struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p)
>  				struct mem_cgroup, css);
>  }
>  
> +/**
> + * mm_match_cgroup - cgroup hierarchy mm membership test
> + * @mm		mm_struct to test
> + * @cgroup	target cgroup
> + *
> + * Returns true if mm belong this cgroup or any its child in hierarchy

belongs to ?

> + */
> +int mm_match_cgroup(const struct mm_struct *mm, const struct mem_cgroup *cgroup)
> +{

Please use "memcg" for representing "memory cgroup" (other functions' arguments use "memcg")

> +	struct mem_cgroup *memcg;

So, rename this as *cur_memcg or some.

> +
> +	rcu_read_lock();
> +	memcg = mem_cgroup_from_task(rcu_dereference((mm)->owner));
> +	while (memcg != cgroup && memcg && memcg->use_hierarchy)
> +		memcg = parent_mem_cgroup(memcg);

IIUC, parent_mem_cgroup() checks mem->res.parent. mem->res.parent is set only when
parent->use_hierarchy == true. Then, 

	while (memcg != cgroup)
		memcg = parent_mem_cgroup(memcg);

will be enough.

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 65+ messages in thread
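
A minimal sketch of that shorter walk, keeping the NULL check since
parent_mem_cgroup() returns NULL once the top of the hierarchy is reached
(otherwise the loop could walk off the root when @cgroup is not an ancestor):

	/* sketch only: walk towards the root until we hit @cgroup or run out */
	while (memcg && memcg != cgroup)
		memcg = parent_mem_cgroup(memcg);

	return memcg == cgroup;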

* Re: [PATCH v3 03/21] memcg: fix page_referencies cgroup filter on global reclaim
  2012-02-23 13:51 ` [PATCH v3 03/21] memcg: fix page_referencies cgroup filter on global reclaim Konstantin Khlebnikov
@ 2012-02-28  0:13   ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 65+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-02-28  0:13 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Hugh Dickins, linux-kernel, linux-mm, Johannes Weiner,
	Andrew Morton, Andi Kleen

On Thu, 23 Feb 2012 17:51:51 +0400
Konstantin Khlebnikov <khlebnikov@openvz.org> wrote:

> The global memory reclaimer shouldn't skip any page references.
> 
> This patch passes sc->target_mem_cgroup into page_referenced().
> On global memory reclaim it is always NULL, so we will account everything.
> The cgroup reclaimer will account only references from the target cgroup and its children.
> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>

seems nice to me.

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v3 04/21] memcg: use vm_swappiness from target memory cgroup
  2012-02-23 13:51 ` [PATCH v3 04/21] memcg: use vm_swappiness from target memory cgroup Konstantin Khlebnikov
@ 2012-02-28  0:15   ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 65+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-02-28  0:15 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Hugh Dickins, linux-kernel, linux-mm, Johannes Weiner,
	Andrew Morton, Andi Kleen

On Thu, 23 Feb 2012 17:51:56 +0400
Konstantin Khlebnikov <khlebnikov@openvz.org> wrote:

> Use vm_swappiness from the memory cgroup which triggered this memory reclaim.
> This is more reasonable and allows us to kill one argument.
> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>

Hmm... it may be better to disallow having different swappiness values in a hierarchy..

From me,
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

But I wonder other guys may have different idea.


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v3 05/21] mm: rename lruvec->lists into lruvec->pages_lru
  2012-02-23 13:52 ` [PATCH v3 05/21] mm: rename lruvec->lists into lruvec->pages_lru Konstantin Khlebnikov
@ 2012-02-28  0:20   ` KAMEZAWA Hiroyuki
  2012-02-28  6:04     ` Konstantin Khlebnikov
  0 siblings, 1 reply; 65+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-02-28  0:20 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Hugh Dickins, linux-kernel, linux-mm, Johannes Weiner,
	Andrew Morton, Andi Kleen

On Thu, 23 Feb 2012 17:52:00 +0400
Konstantin Khlebnikov <khlebnikov@openvz.org> wrote:

> This is much more unique and grep-friendly name.
> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>

I worry this kind of change can cause many hunks and make merging difficult..
But this seems not very destructive..

Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

I have no strong opinion on this naming. What do other mm developers think ?

I personally think making this kind of change at the head of a patch set tends to
make it difficult to merge the full patch series.

Thanks,
-Kame

> ---
>  include/linux/mm_inline.h |    2 +-
>  include/linux/mmzone.h    |    2 +-
>  mm/memcontrol.c           |    6 +++---
>  mm/page_alloc.c           |    2 +-
>  mm/swap.c                 |    4 ++--
>  mm/vmscan.c               |    6 +++---
>  6 files changed, 11 insertions(+), 11 deletions(-)
> 
> diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
> index 227fd3e..8415596 100644
> --- a/include/linux/mm_inline.h
> +++ b/include/linux/mm_inline.h
> @@ -27,7 +27,7 @@ add_page_to_lru_list(struct zone *zone, struct page *page, enum lru_list lru)
>  	struct lruvec *lruvec;
>  
>  	lruvec = mem_cgroup_lru_add_list(zone, page, lru);
> -	list_add(&page->lru, &lruvec->lists[lru]);
> +	list_add(&page->lru, &lruvec->pages_lru[lru]);
>  	__mod_zone_page_state(zone, NR_LRU_BASE + lru, hpage_nr_pages(page));
>  }
>  
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 3e1f7ff..ddd0fd2 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -160,7 +160,7 @@ static inline int is_unevictable_lru(enum lru_list lru)
>  }
>  
>  struct lruvec {
> -	struct list_head lists[NR_LRU_LISTS];
> +	struct list_head pages_lru[NR_LRU_LISTS];
>  };
>  
>  /* Mask used at gathering information at once (see memcontrol.c) */
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 77f5d48..8f8c7c4 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1050,7 +1050,7 @@ struct lruvec *mem_cgroup_zone_lruvec(struct zone *zone,
>   * the lruvec for the given @zone and the memcg @page is charged to.
>   *
>   * The callsite is then responsible for physically linking the page to
> - * the returned lruvec->lists[@lru].
> + * the returned lruvec->pages_lru[@lru].
>   */
>  struct lruvec *mem_cgroup_lru_add_list(struct zone *zone, struct page *page,
>  				       enum lru_list lru)
> @@ -3592,7 +3592,7 @@ static int mem_cgroup_force_empty_list(struct mem_cgroup *memcg,
>  
>  	zone = &NODE_DATA(node)->node_zones[zid];
>  	mz = mem_cgroup_zoneinfo(memcg, node, zid);
> -	list = &mz->lruvec.lists[lru];
> +	list = &mz->lruvec.pages_lru[lru];
>  
>  	loop = mz->lru_size[lru];
>  	/* give some margin against EBUSY etc...*/
> @@ -4716,7 +4716,7 @@ static int alloc_mem_cgroup_per_zone_info(struct mem_cgroup *memcg, int node)
>  	for (zone = 0; zone < MAX_NR_ZONES; zone++) {
>  		mz = &pn->zoneinfo[zone];
>  		for_each_lru(lru)
> -			INIT_LIST_HEAD(&mz->lruvec.lists[lru]);
> +			INIT_LIST_HEAD(&mz->lruvec.pages_lru[lru]);
>  		mz->usage_in_excess = 0;
>  		mz->on_tree = false;
>  		mz->memcg = memcg;
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 38f6744..5f19392 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4363,7 +4363,7 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
>  
>  		zone_pcp_init(zone);
>  		for_each_lru(lru)
> -			INIT_LIST_HEAD(&zone->lruvec.lists[lru]);
> +			INIT_LIST_HEAD(&zone->lruvec.pages_lru[lru]);
>  		zone->reclaim_stat.recent_rotated[0] = 0;
>  		zone->reclaim_stat.recent_rotated[1] = 0;
>  		zone->reclaim_stat.recent_scanned[0] = 0;
> diff --git a/mm/swap.c b/mm/swap.c
> index fff1ff7..17993c0 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -238,7 +238,7 @@ static void pagevec_move_tail_fn(struct page *page, void *arg)
>  
>  		lruvec = mem_cgroup_lru_move_lists(page_zone(page),
>  						   page, lru, lru);
> -		list_move_tail(&page->lru, &lruvec->lists[lru]);
> +		list_move_tail(&page->lru, &lruvec->pages_lru[lru]);
>  		(*pgmoved)++;
>  	}
>  }
> @@ -482,7 +482,7 @@ static void lru_deactivate_fn(struct page *page, void *arg)
>  		 * We moves tha page into tail of inactive.
>  		 */
>  		lruvec = mem_cgroup_lru_move_lists(zone, page, lru, lru);
> -		list_move_tail(&page->lru, &lruvec->lists[lru]);
> +		list_move_tail(&page->lru, &lruvec->pages_lru[lru]);
>  		__count_vm_event(PGROTATED);
>  	}
>  
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 8b59cb5..e41ad52 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1164,7 +1164,7 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
>  		lru += LRU_ACTIVE;
>  	if (file)
>  		lru += LRU_FILE;
> -	src = &lruvec->lists[lru];
> +	src = &lruvec->pages_lru[lru];
>  
>  	for (scan = 0; scan < nr_to_scan && !list_empty(src); scan++) {
>  		struct page *page;
> @@ -1663,7 +1663,7 @@ static void move_active_pages_to_lru(struct zone *zone,
>  		SetPageLRU(page);
>  
>  		lruvec = mem_cgroup_lru_add_list(zone, page, lru);
> -		list_move(&page->lru, &lruvec->lists[lru]);
> +		list_move(&page->lru, &lruvec->pages_lru[lru]);
>  		pgmoved += hpage_nr_pages(page);
>  
>  		if (put_page_testzero(page)) {
> @@ -3592,7 +3592,7 @@ void check_move_unevictable_pages(struct page **pages, int nr_pages)
>  			__dec_zone_state(zone, NR_UNEVICTABLE);
>  			lruvec = mem_cgroup_lru_move_lists(zone, page,
>  						LRU_UNEVICTABLE, lru);
> -			list_move(&page->lru, &lruvec->lists[lru]);
> +			list_move(&page->lru, &lruvec->pages_lru[lru]);
>  			__inc_zone_state(zone, NR_INACTIVE_ANON + lru);
>  			pgrescued++;
>  		}
> 


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v3 06/21] mm: lruvec linking functions
  2012-02-23 13:52 ` [PATCH v3 06/21] mm: lruvec linking functions Konstantin Khlebnikov
@ 2012-02-28  0:27   ` KAMEZAWA Hiroyuki
  2012-02-28  6:09     ` Konstantin Khlebnikov
  0 siblings, 1 reply; 65+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-02-28  0:27 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Hugh Dickins, linux-kernel, linux-mm, Johannes Weiner,
	Andrew Morton, Andi Kleen

On Thu, 23 Feb 2012 17:52:04 +0400
Konstantin Khlebnikov <khlebnikov@openvz.org> wrote:

> This patch adds links from page to its lruvec and from lruvec to its zone and node.
> If CONFIG_CGROUP_MEM_RES_CTLR=n they are just page_zone() and container_of().
> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>

small comments in below.

> ---
>  include/linux/mm.h     |   37 +++++++++++++++++++++++++++++++++++++
>  include/linux/mmzone.h |   12 ++++++++----
>  mm/internal.h          |    1 +
>  mm/memcontrol.c        |   27 ++++++++++++++++++++++++---
>  mm/page_alloc.c        |   17 ++++++++++++++---
>  5 files changed, 84 insertions(+), 10 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index ee3ebc1..c6dc4ab 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -728,6 +728,43 @@ static inline void set_page_links(struct page *page, enum zone_type zone,
>  #endif
>  }
>  
> +#ifdef CONFIG_CGROUP_MEM_RES_CTLR
> +
> +/* Multiple lruvecs in zone */
> +
> +extern struct lruvec *page_lruvec(struct page *page);
> +
> +static inline struct zone *lruvec_zone(struct lruvec *lruvec)
> +{
> +	return lruvec->zone;
> +}
> +
> +static inline struct pglist_data *lruvec_node(struct lruvec *lruvec)
> +{
> +	return lruvec->node;
> +}
> +
> +#else /* CONFIG_CGROUP_MEM_RES_CTLR */
> +
> +/* Single lruvec in zone */
> +
> +static inline struct lruvec *page_lruvec(struct page *page)
> +{
> +	return &page_zone(page)->lruvec;
> +}
> +
> +static inline struct zone *lruvec_zone(struct lruvec *lruvec)
> +{
> +	return container_of(lruvec, struct zone, lruvec);
> +}
> +
> +static inline struct pglist_data *lruvec_node(struct lruvec *lruvec)
> +{
> +	return lruvec_zone(lruvec)->zone_pgdat;
> +}
> +
> +#endif /* CONFIG_CGROUP_MEM_RES_CTLR */
> +
>  /*
>   * Some inline functions in vmstat.h depend on page_zone()
>   */
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index ddd0fd2..be8873a 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -159,10 +159,6 @@ static inline int is_unevictable_lru(enum lru_list lru)
>  	return (lru == LRU_UNEVICTABLE);
>  }
>  
> -struct lruvec {
> -	struct list_head pages_lru[NR_LRU_LISTS];
> -};
> -
>  /* Mask used at gathering information at once (see memcontrol.c) */
>  #define LRU_ALL_FILE (BIT(LRU_INACTIVE_FILE) | BIT(LRU_ACTIVE_FILE))
>  #define LRU_ALL_ANON (BIT(LRU_INACTIVE_ANON) | BIT(LRU_ACTIVE_ANON))
> @@ -300,6 +296,14 @@ struct zone_reclaim_stat {
>  	unsigned long		recent_scanned[2];
>  };
>  
> +struct lruvec {
> +	struct list_head	pages_lru[NR_LRU_LISTS];
> +#ifdef CONFIG_CGROUP_MEM_RES_CTLR
> +	struct zone		*zone;
> +	struct pglist_data	*node;
> +#endif

I don't think this #ifdef is very good.... it adds other #ifdefs in other headers.
How bad would it be if we removed this #ifdef and always used ->zone, ->pgdat in
lruvec_zone(), lruvec_node() ?

There may be concerns about fitting lruvec et al. into a cache line... but this set will add
a (big) hash here later..

I'm sorry if you were asked to add this #ifdef in v1 or v2.

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 65+ messages in thread
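
For comparison, the no-#ifdef variant suggested here (always carry the
back-pointers) would look something like the sketch below, assembled from the
pieces already in the patch:

struct lruvec {
	struct list_head	pages_lru[NR_LRU_LISTS];
	struct zone		*zone;
	struct pglist_data	*node;
};

static inline struct zone *lruvec_zone(struct lruvec *lruvec)
{
	return lruvec->zone;
}

static inline struct pglist_data *lruvec_node(struct lruvec *lruvec)
{
	return lruvec->node;
}

The cost is two extra pointers per lruvec in the !CONFIG_CGROUP_MEM_RES_CTLR
case, in exchange for dropping the container_of() variants and the #ifdefs in
mm.h.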

* Re: [PATCH v3 07/21] mm: add lruvec->pages_count
  2012-02-23 13:52 ` [PATCH v3 07/21] mm: add lruvec->pages_count Konstantin Khlebnikov
@ 2012-02-28  0:35   ` KAMEZAWA Hiroyuki
  2012-02-28  6:16     ` Konstantin Khlebnikov
  0 siblings, 1 reply; 65+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-02-28  0:35 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Hugh Dickins, linux-kernel, linux-mm, Johannes Weiner,
	Andrew Morton, Andi Kleen

On Thu, 23 Feb 2012 17:52:08 +0400
Konstantin Khlebnikov <khlebnikov@openvz.org> wrote:

> Move lru pages counter from mem_cgroup_per_zone->count[] to lruvec->pages_count[]
> 
> Account pages in all lruvecs, including root,
> this isn't a huge overhead, but it greatly simplifies all code.
> 
> Redundant page_lruvec() calls will be optimized in further patches.
> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>

Hmm, I like this but..a question below.

> ---
>  include/linux/memcontrol.h |   29 --------------
>  include/linux/mm_inline.h  |   15 +++++--
>  include/linux/mmzone.h     |    1 
>  mm/memcontrol.c            |   93 +-------------------------------------------
>  mm/swap.c                  |    7 +--
>  mm/vmscan.c                |   25 +++++++++---
>  6 files changed, 34 insertions(+), 136 deletions(-)
> 
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 4822d53..b9d555b 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -63,12 +63,6 @@ extern int mem_cgroup_cache_charge(struct page *page, struct mm_struct *mm,
>  					gfp_t gfp_mask);
>  
>  struct lruvec *mem_cgroup_zone_lruvec(struct zone *, struct mem_cgroup *);
> -struct lruvec *mem_cgroup_lru_add_list(struct zone *, struct page *,
> -				       enum lru_list);
> -void mem_cgroup_lru_del_list(struct page *, enum lru_list);
> -void mem_cgroup_lru_del(struct page *);
> -struct lruvec *mem_cgroup_lru_move_lists(struct zone *, struct page *,
> -					 enum lru_list, enum lru_list);
>  
>  /* For coalescing uncharge for reducing memcg' overhead*/
>  extern void mem_cgroup_uncharge_start(void);
> @@ -212,29 +206,6 @@ static inline struct lruvec *mem_cgroup_zone_lruvec(struct zone *zone,
>  	return &zone->lruvec;
>  }
>  
> -static inline struct lruvec *mem_cgroup_lru_add_list(struct zone *zone,
> -						     struct page *page,
> -						     enum lru_list lru)
> -{
> -	return &zone->lruvec;
> -}
> -
> -static inline void mem_cgroup_lru_del_list(struct page *page, enum lru_list lru)
> -{
> -}
> -
> -static inline void mem_cgroup_lru_del(struct page *page)
> -{
> -}
> -
> -static inline struct lruvec *mem_cgroup_lru_move_lists(struct zone *zone,
> -						       struct page *page,
> -						       enum lru_list from,
> -						       enum lru_list to)
> -{
> -	return &zone->lruvec;
> -}
> -
>  static inline struct mem_cgroup *try_get_mem_cgroup_from_page(struct page *page)
>  {
>  	return NULL;
> diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
> index 8415596..daa3d15 100644
> --- a/include/linux/mm_inline.h
> +++ b/include/linux/mm_inline.h
> @@ -24,19 +24,24 @@ static inline int page_is_file_cache(struct page *page)
>  static inline void
>  add_page_to_lru_list(struct zone *zone, struct page *page, enum lru_list lru)
>  {
> -	struct lruvec *lruvec;
> +	struct lruvec *lruvec = page_lruvec(page);
> +	int numpages = hpage_nr_pages(page);
>  
> -	lruvec = mem_cgroup_lru_add_list(zone, page, lru);
>  	list_add(&page->lru, &lruvec->pages_lru[lru]);
> -	__mod_zone_page_state(zone, NR_LRU_BASE + lru, hpage_nr_pages(page));
> +	lruvec->pages_count[lru] += numpages;
> +	__mod_zone_page_state(zone, NR_LRU_BASE + lru, numpages);
>  }
>  
>  static inline void
>  del_page_from_lru_list(struct zone *zone, struct page *page, enum lru_list lru)
>  {
> -	mem_cgroup_lru_del_list(page, lru);
> +	struct lruvec *lruvec = page_lruvec(page);
> +	int numpages = hpage_nr_pages(page);
> +
>  	list_del(&page->lru);
> -	__mod_zone_page_state(zone, NR_LRU_BASE + lru, -hpage_nr_pages(page));
> +	lruvec->pages_count[lru] -= numpages;
> +	VM_BUG_ON((long)lruvec->pages_count[lru] < 0);
> +	__mod_zone_page_state(zone, NR_LRU_BASE + lru, -numpages);
>  }
>  
>  /**
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index be8873a..69b0f31 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -298,6 +298,7 @@ struct zone_reclaim_stat {
>  
>  struct lruvec {
>  	struct list_head	pages_lru[NR_LRU_LISTS];
> +	unsigned long		pages_count[NR_LRU_LISTS];

This time you don't put these fields under an #ifdef... why?

How do you handle the duplication of "the number of pages on the LRU" between zone->vm_stat and this counter?

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v3 08/21] mm: unify inactive_list_is_low()
  2012-02-23 13:52 ` [PATCH v3 08/21] mm: unify inactive_list_is_low() Konstantin Khlebnikov
@ 2012-02-28  0:36   ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 65+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-02-28  0:36 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Hugh Dickins, linux-kernel, linux-mm, Johannes Weiner,
	Andrew Morton, Andi Kleen

On Thu, 23 Feb 2012 17:52:19 +0400
Konstantin Khlebnikov <khlebnikov@openvz.org> wrote:

> Unify memcg and non-memcg logic, always use exact counters from struct lruvec.
> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>

Nice.
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>



^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v3 09/21] mm: add lruvec->reclaim_stat
  2012-02-23 13:52 ` [PATCH v3 09/21] mm: add lruvec->reclaim_stat Konstantin Khlebnikov
@ 2012-02-28  0:38   ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 65+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-02-28  0:38 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Hugh Dickins, linux-kernel, linux-mm, Johannes Weiner,
	Andrew Morton, Andi Kleen

On Thu, 23 Feb 2012 17:52:24 +0400
Konstantin Khlebnikov <khlebnikov@openvz.org> wrote:

> Merge memcg and non-memcg reclaim stat. We need to update only one.
> Move zone->reclaimer_stat and mem_cgroup_per_zone->reclaimer_stat to struct lruvec.
> 
> struct lruvec will become the operating unit for reclaimer logic,
> thus this is the perfect place for these counters.
> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>

I like this.
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v3 10/21] mm: kill struct mem_cgroup_zone
  2012-02-23 13:52 ` [PATCH v3 10/21] mm: kill struct mem_cgroup_zone Konstantin Khlebnikov
@ 2012-02-28  0:41   ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 65+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-02-28  0:41 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Hugh Dickins, linux-kernel, linux-mm, Johannes Weiner,
	Andrew Morton, Andi Kleen

On Thu, 23 Feb 2012 17:52:29 +0400
Konstantin Khlebnikov <khlebnikov@openvz.org> wrote:

> struct mem_cgroup_zone always points to one lruvec, either the root zone->lruvec or
> to one from a memcg. So this fancy pointer can be replaced with a direct pointer to
> struct lruvec, because all the required information is already collected in the lruvec.
> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>



^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v3 11/21] mm: move page-to-lruvec translation upper
  2012-02-23 13:52 ` [PATCH v3 11/21] mm: move page-to-lruvec translation upper Konstantin Khlebnikov
@ 2012-02-28  0:42   ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 65+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-02-28  0:42 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Hugh Dickins, linux-kernel, linux-mm, Johannes Weiner,
	Andrew Morton, Andi Kleen

On Thu, 23 Feb 2012 17:52:33 +0400
Konstantin Khlebnikov <khlebnikov@openvz.org> wrote:

> Move page_lruvec() out of add_page_to_lru_list() and del_page_from_lru_list() and
> switch their first argument from zone to lruvec.
> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v3 12/21] mm: push lruvec into update_page_reclaim_stat()
  2012-02-23 13:52 ` [PATCH v3 12/21] mm: push lruvec into update_page_reclaim_stat() Konstantin Khlebnikov
@ 2012-02-28  0:44   ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 65+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-02-28  0:44 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Hugh Dickins, linux-kernel, linux-mm, Johannes Weiner,
	Andrew Morton, Andi Kleen

On Thu, 23 Feb 2012 17:52:38 +0400
Konstantin Khlebnikov <khlebnikov@openvz.org> wrote:

> Push lruvec pointer into update_page_reclaim_stat()
> * drop page argument
> * drop active and file arguments, use lru instead
> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>


> ---
>  mm/swap.c |   30 +++++++++---------------------
>  1 files changed, 9 insertions(+), 21 deletions(-)
> 
> diff --git a/mm/swap.c b/mm/swap.c
> index 0cbc558..1f5731e 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -276,24 +276,19 @@ void rotate_reclaimable_page(struct page *page)
>  	}
>  }
>  
> -static void update_page_reclaim_stat(struct zone *zone, struct page *page,
> -				     int file, int rotated)
> +static void update_page_reclaim_stat(struct lruvec *lruvec, enum lru_list lru)
>  {
> -	struct zone_reclaim_stat *reclaim_stat;
> -
> -	reclaim_stat = &page_lruvec(page)->reclaim_stat;
> +	struct zone_reclaim_stat *reclaim_stat = &lruvec->reclaim_stat;
> +	int file = is_file_lru(lru);
>  
>  	reclaim_stat->recent_scanned[file]++;
> -	if (rotated)
> +	if (is_active_lru(lru))
>  		reclaim_stat->recent_rotated[file]++;
>  }
>  
>  static void __activate_page(struct page *page, void *arg)
>  {
> -	struct zone *zone = page_zone(page);
> -
>  	if (PageLRU(page) && !PageActive(page) && !PageUnevictable(page)) {
> -		int file = page_is_file_cache(page);
>  		int lru = page_lru_base_type(page);
>  		struct lruvec *lruvec = page_lruvec(page);
>  
> @@ -304,7 +299,7 @@ static void __activate_page(struct page *page, void *arg)
>  		add_page_to_lru_list(lruvec, page, lru);
>  		__count_vm_event(PGACTIVATE);
>  
> -		update_page_reclaim_stat(zone, page, file, 1);
> +		update_page_reclaim_stat(lruvec, lru);
>  	}
>  }
>  
> @@ -443,7 +438,6 @@ static void lru_deactivate_fn(struct page *page, void *arg)
>  {
>  	int lru, file;
>  	bool active;
> -	struct zone *zone = page_zone(page);
>  	struct lruvec *lruvec;
>  
>  	if (!PageLRU(page))
> @@ -484,7 +478,7 @@ static void lru_deactivate_fn(struct page *page, void *arg)
>  
>  	if (active)
>  		__count_vm_event(PGDEACTIVATE);
> -	update_page_reclaim_stat(zone, page, file, 0);
> +	update_page_reclaim_stat(lruvec, lru);
>  }
>  
>  /*
> @@ -649,9 +643,7 @@ EXPORT_SYMBOL(__pagevec_release);
>  void lru_add_page_tail(struct zone* zone,
>  		       struct page *page, struct page *page_tail)
>  {
> -	int active;
>  	enum lru_list lru;
> -	const int file = 0;
>  	struct lruvec *lruvec = page_lruvec(page);
>  
>  	VM_BUG_ON(!PageHead(page));
> @@ -664,13 +656,11 @@ void lru_add_page_tail(struct zone* zone,
>  	if (page_evictable(page_tail, NULL)) {
>  		if (PageActive(page)) {
>  			SetPageActive(page_tail);
> -			active = 1;
>  			lru = LRU_ACTIVE_ANON;
>  		} else {
> -			active = 0;
>  			lru = LRU_INACTIVE_ANON;
>  		}
> -		update_page_reclaim_stat(zone, page_tail, file, active);
> +		update_page_reclaim_stat(lruvec, lru);
>  	} else {
>  		SetPageUnevictable(page_tail);
>  		lru = LRU_UNEVICTABLE;
> @@ -698,17 +688,15 @@ static void __pagevec_lru_add_fn(struct page *page, void *arg)
>  {
>  	enum lru_list lru = (enum lru_list)arg;
>  	struct lruvec *lruvec = page_lruvec(page);
> -	int file = is_file_lru(lru);
> -	int active = is_active_lru(lru);
>  
>  	VM_BUG_ON(PageActive(page));
>  	VM_BUG_ON(PageUnevictable(page));
>  	VM_BUG_ON(PageLRU(page));
>  
>  	SetPageLRU(page);
> -	if (active)
> +	if (is_active_lru(lru))
>  		SetPageActive(page);
> -	update_page_reclaim_stat(lruvec_zone(lruvec), page, file, active);
> +	update_page_reclaim_stat(lruvec, lru);
>  	add_page_to_lru_list(lruvec, page, lru);
>  }
>  
> 


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v3 13/21] mm: push lruvecs from pagevec_lru_move_fn() to iterator
  2012-02-23 13:52 ` [PATCH v3 13/21] mm: push lruvecs from pagevec_lru_move_fn() to iterator Konstantin Khlebnikov
@ 2012-02-28  0:45   ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 65+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-02-28  0:45 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Hugh Dickins, linux-kernel, linux-mm, Johannes Weiner,
	Andrew Morton, Andi Kleen

On Thu, 23 Feb 2012 17:52:42 +0400
Konstantin Khlebnikov <khlebnikov@openvz.org> wrote:

> Push lruvec pointer from pagevec_lru_move_fn() to iterator function.
> Push lruvec pointer into lru_add_page_tail()
> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>



^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v3 14/21] mm: introduce lruvec locking primitives
  2012-02-23 13:52 ` [PATCH v3 14/21] mm: introduce lruvec locking primitives Konstantin Khlebnikov
@ 2012-02-28  0:56   ` KAMEZAWA Hiroyuki
  2012-02-28  6:23     ` Konstantin Khlebnikov
  0 siblings, 1 reply; 65+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-02-28  0:56 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Hugh Dickins, linux-kernel, linux-mm, Johannes Weiner,
	Andrew Morton, Andi Kleen

On Thu, 23 Feb 2012 17:52:47 +0400
Konstantin Khlebnikov <khlebnikov@openvz.org> wrote:

> This is initial preparation for lru_lock splitting.
> 
> These locking primitives are designed to hide the split nature of the lru_lock
> and to avoid overhead for the non-split lru_lock in the non-memcg case.
> 
> * Lock via lruvec reference
> 
> lock_lruvec(lruvec, flags)
> lock_lruvec_irq(lruvec)
> 
> * Lock via page reference
> 
> lock_page_lruvec(page, flags)
> lock_page_lruvec_irq(page)
> relock_page_lruvec(lruvec, page, flags)
> relock_page_lruvec_irq(lruvec, page)
> __relock_page_lruvec(lruvec, page) ( lruvec != NULL, page in same zone )
> 
> They always return a pointer to some locked lruvec; the page may still not be
> on the LRU, but the PageLRU() sign is stable while we hold the returned lruvec lock.
> The caller must guarantee the validity of the page-to-lruvec reference.
> 
> * Lock via page, without stable page reference
> 
> __lock_page_lruvec_irq(&lruvec, page)
> 
> It returns true if the lruvec was successfully locked and PageLRU is set.
> The initial lruvec can be NULL. Consecutive calls must be in the same zone.
> 
> * Unlock
> 
> unlock_lruvec(lruvec, flags)
> unlock_lruvec_irq(lruvec)
> 
> * Wait
> 
> wait_lruvec_unlock(lruvec)
> Wait for lruvec unlock, caller must have stable reference to lruvec.
> 
> __wait_lruvec_unlock(lruvec)
> Wait for lruvec unlock before locking another lru_lock for the same page;
> a no-op if there is only one possible lruvec per page.
> Used when switching the page-to-lruvec reference, to stabilize the PageLRU sign.
> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>

O.K. I like this.

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Hmm... could you add a comment to the memcg part? (see below)



> ---
>  mm/huge_memory.c |    8 +-
>  mm/internal.h    |  176 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  mm/memcontrol.c  |   14 ++--
>  mm/swap.c        |   58 ++++++------------
>  mm/vmscan.c      |   77 ++++++++++--------------
>  5 files changed, 237 insertions(+), 96 deletions(-)
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 09e7069..74996b8 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1228,13 +1228,11 @@ static int __split_huge_page_splitting(struct page *page,
>  static void __split_huge_page_refcount(struct page *page)
>  {
>  	int i;
> -	struct zone *zone = page_zone(page);
>  	struct lruvec *lruvec;
>  	int tail_count = 0;
>  
>  	/* prevent PageLRU to go away from under us, and freeze lru stats */
> -	spin_lock_irq(&zone->lru_lock);
> -	lruvec = page_lruvec(page);
> +	lruvec = lock_page_lruvec_irq(page);
>  	compound_lock(page);
>  	/* complete memcg works before add pages to LRU */
>  	mem_cgroup_split_huge_fixup(page);
> @@ -1316,11 +1314,11 @@ static void __split_huge_page_refcount(struct page *page)
>  	BUG_ON(atomic_read(&page->_count) <= 0);
>  
>  	__dec_zone_page_state(page, NR_ANON_TRANSPARENT_HUGEPAGES);
> -	__mod_zone_page_state(zone, NR_ANON_PAGES, HPAGE_PMD_NR);
> +	__mod_zone_page_state(lruvec_zone(lruvec), NR_ANON_PAGES, HPAGE_PMD_NR);
>  
>  	ClearPageCompound(page);
>  	compound_unlock(page);
> -	spin_unlock_irq(&zone->lru_lock);
> +	unlock_lruvec_irq(lruvec);
>  
>  	for (i = 1; i < HPAGE_PMD_NR; i++) {
>  		struct page *page_tail = page + i;
> diff --git a/mm/internal.h b/mm/internal.h
> index ef49dbf..9454752 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -13,6 +13,182 @@
>  
>  #include <linux/mm.h>
>  
> +static inline void lock_lruvec(struct lruvec *lruvec, unsigned long *flags)
> +{
> +	spin_lock_irqsave(&lruvec_zone(lruvec)->lru_lock, *flags);
> +}
> +
> +static inline void lock_lruvec_irq(struct lruvec *lruvec)
> +{
> +	spin_lock_irq(&lruvec_zone(lruvec)->lru_lock);
> +}
> +
> +static inline void unlock_lruvec(struct lruvec *lruvec, unsigned long *flags)
> +{
> +	spin_unlock_irqrestore(&lruvec_zone(lruvec)->lru_lock, *flags);
> +}
> +
> +static inline void unlock_lruvec_irq(struct lruvec *lruvec)
> +{
> +	spin_unlock_irq(&lruvec_zone(lruvec)->lru_lock);
> +}
> +
> +static inline void wait_lruvec_unlock(struct lruvec *lruvec)
> +{
> +	spin_unlock_wait(&lruvec_zone(lruvec)->lru_lock);
> +}
> +
> +#ifdef CONFIG_CGROUP_MEM_RES_CTLR
> +
> +/* Dynamic page to lruvec mapping */
> +
> +/* Lock other lruvec for other page in the same zone */
> +static inline struct lruvec *__relock_page_lruvec(struct lruvec *locked_lruvec,
> +						  struct page *page)
> +{
> +	/* Currenyly only one lru_lock per-zone */
> +	return page_lruvec(page);
> +}
> +
> +static inline struct lruvec *relock_page_lruvec_irq(struct lruvec *lruvec,
> +						    struct page *page)
> +{
> +	struct zone *zone = page_zone(page);
> +
> +	if (!lruvec) {
> +		spin_lock_irq(&zone->lru_lock);
> +	} else if (zone != lruvec_zone(lruvec)) {
> +		unlock_lruvec_irq(lruvec);
> +		spin_lock_irq(&zone->lru_lock);
> +	}
> +	return page_lruvec(page);
> +}

Could you add comments/cautions for the caller, covering e.g.:

 - !PageLRU(page) case ?
 - Can the caller assume page_lruvec(page) == lruvec? If not, which lruvec is locked?

etc...
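
For example, something along these lines above relock_page_lruvec_irq() /
relock_page_lruvec() (rough wording based on my reading of the changelog,
please adjust):

/*
 * Lock the lruvec that @page currently belongs to, dropping @lruvec first
 * if it is a different (or NULL) one.
 *
 * The caller must guarantee that the page-to-lruvec reference stays valid.
 * PageLRU() is not checked here; it is only stable while the returned
 * lruvec lock is held, so the caller must re-test it after locking.
 * The returned lruvec is the one the page points to at locking time and
 * may differ from the lruvec passed in.
 */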


> +
> +static inline struct lruvec *relock_page_lruvec(struct lruvec *lruvec,
> +						struct page *page,
> +						unsigned long *flags)
> +{
> +	struct zone *zone = page_zone(page);
> +
> +	if (!lruvec) {
> +		spin_lock_irqsave(&zone->lru_lock, *flags);
> +	} else if (zone != lruvec_zone(lruvec)) {
> +		unlock_lruvec(lruvec, flags);
> +		spin_lock_irqsave(&zone->lru_lock, *flags);
> +	}
> +	return page_lruvec(page);
> +}


Same here.

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v3 15/21] mm: handle lruvec relocks on lumpy reclaim
  2012-02-23 13:52 ` [PATCH v3 15/21] mm: handle lruvec relocks on lumpy reclaim Konstantin Khlebnikov
@ 2012-02-28  1:01   ` KAMEZAWA Hiroyuki
  2012-02-28  6:25     ` Konstantin Khlebnikov
  0 siblings, 1 reply; 65+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-02-28  1:01 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Hugh Dickins, linux-kernel, linux-mm, Johannes Weiner,
	Andrew Morton, Andi Kleen

On Thu, 23 Feb 2012 17:52:52 +0400
Konstantin Khlebnikov <khlebnikov@openvz.org> wrote:

> Prepare for lock splitting in lumpy reclaim logic.
> Now move_active_pages_to_lru() and putback_inactive_pages()
> can put pages into different lruvecs.
> 
> * relock book before SetPageLRU()

lruvec ?

> * update reclaim_stat pointer after relocks
> * return currently locked lruvec
> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
> ---
>  mm/vmscan.c |   45 +++++++++++++++++++++++++++++++++------------
>  1 files changed, 33 insertions(+), 12 deletions(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index a3941d1..6eeeb4b 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1114,6 +1114,7 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
>  		unsigned long *nr_scanned, struct scan_control *sc,
>  		isolate_mode_t mode, int active, int file)
>  {
> +	struct lruvec *cursor_lruvec = lruvec;
>  	struct list_head *src;
>  	unsigned long nr_taken = 0;
>  	unsigned long nr_lumpy_taken = 0;
> @@ -1197,14 +1198,17 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
>  			    !PageSwapCache(cursor_page))
>  				break;
>  
> +			/* Switch cursor_lruvec lock for lumpy isolate */
> +			if (!__lock_page_lruvec_irq(&cursor_lruvec,
> +						    cursor_page))
> +				continue;
> +
>  			if (__isolate_lru_page(cursor_page, mode, file) == 0) {
>  				unsigned int isolated_pages;
> -				struct lruvec *cursor_lruvec;
>  				int cursor_lru = page_lru(cursor_page);
>  
>  				list_move(&cursor_page->lru, dst);
>  				isolated_pages = hpage_nr_pages(cursor_page);
> -				cursor_lruvec = page_lruvec(cursor_page);
>  				cursor_lruvec->pages_count[cursor_lru] -=
>  								isolated_pages;
>  				VM_BUG_ON((long)cursor_lruvec->
> @@ -1235,6 +1239,9 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
>  			}
>  		}
>  
> +		/* Restore original lruvec lock */
> +		cursor_lruvec = __relock_page_lruvec(cursor_lruvec, page);
> +
>  		/* If we break out of the loop above, lumpy reclaim failed */
>  		if (pfn < end_pfn)
>  			nr_lumpy_failed++;
> @@ -1325,7 +1332,10 @@ static int too_many_isolated(struct zone *zone, int file,
>  	return isolated > inactive;
>  }
>  
> -static noinline_for_stack void
> +/*
> + * Returns currently locked lruvec
> + */
> +static noinline_for_stack struct lruvec *
>  putback_inactive_pages(struct lruvec *lruvec,
>  		       struct list_head *page_list)
>  {
> @@ -1347,10 +1357,13 @@ putback_inactive_pages(struct lruvec *lruvec,
>  			lock_lruvec_irq(lruvec);
>  			continue;
>  		}
> +
> +		lruvec = __relock_page_lruvec(lruvec, page);
> +		reclaim_stat = &lruvec->reclaim_stat;
> +
>  		SetPageLRU(page);
>  		lru = page_lru(page);
>  
> -		lruvec = page_lruvec(page);
>  		add_page_to_lru_list(lruvec, page, lru);
>  		if (is_active_lru(lru)) {
>  			int file = is_file_lru(lru);
> @@ -1375,6 +1388,8 @@ putback_inactive_pages(struct lruvec *lruvec,
>  	 * To save our caller's stack, now use input list for pages to free.
>  	 */
>  	list_splice(&pages_to_free, page_list);
> +
> +	return lruvec;
>  }
>  
>  static noinline_for_stack void
> @@ -1544,7 +1559,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
>  		__count_vm_events(KSWAPD_STEAL, nr_reclaimed);
>  	__count_zone_vm_events(PGSTEAL, zone, nr_reclaimed);
>  
> -	putback_inactive_pages(lruvec, &page_list);
> +	lruvec = putback_inactive_pages(lruvec, &page_list);
>  
>  	__mod_zone_page_state(zone, NR_ISOLATED_ANON, -nr_anon);
>  	__mod_zone_page_state(zone, NR_ISOLATED_FILE, -nr_file);
> @@ -1603,12 +1618,15 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
>   *
>   * The downside is that we have to touch page->_count against each page.
>   * But we had to alter page->flags anyway.
> + *
> + * Returns currently locked lruvec
>   */
>  
> -static void move_active_pages_to_lru(struct lruvec *lruvec,
> -				     struct list_head *list,
> -				     struct list_head *pages_to_free,
> -				     enum lru_list lru)
> +static struct lruvec *
> +move_active_pages_to_lru(struct lruvec *lruvec,
> +			 struct list_head *list,
> +			 struct list_head *pages_to_free,
> +			 enum lru_list lru)
>  {
>  	unsigned long pgmoved = 0;
>  	struct page *page;
> @@ -1630,10 +1648,11 @@ static void move_active_pages_to_lru(struct lruvec *lruvec,
>  
>  		page = lru_to_page(list);
>  
> +		lruvec = __relock_page_lruvec(lruvec, page);
> +
>  		VM_BUG_ON(PageLRU(page));
>  		SetPageLRU(page);
>  
> -		lruvec = page_lruvec(page);
>  		list_move(&page->lru, &lruvec->pages_lru[lru]);
>  		numpages = hpage_nr_pages(page);
>  		lruvec->pages_count[lru] += numpages;
> @@ -1655,6 +1674,8 @@ static void move_active_pages_to_lru(struct lruvec *lruvec,
>  	__mod_zone_page_state(lruvec_zone(lruvec), NR_LRU_BASE + lru, pgmoved);
>  	if (!is_active_lru(lru))
>  		__count_vm_events(PGDEACTIVATE, pgmoved);
> +
> +	return lruvec;
>  }
>  
>  static void shrink_active_list(unsigned long nr_to_scan,
> @@ -1744,9 +1765,9 @@ static void shrink_active_list(unsigned long nr_to_scan,
>  	 */
>  	reclaim_stat->recent_rotated[file] += nr_rotated;
>  
> -	move_active_pages_to_lru(lruvec, &l_active, &l_hold,
> +	lruvec = move_active_pages_to_lru(lruvec, &l_active, &l_hold,
>  						LRU_ACTIVE + file * LRU_FILE);
> -	move_active_pages_to_lru(lruvec, &l_inactive, &l_hold,
> +	lruvec = move_active_pages_to_lru(lruvec, &l_inactive, &l_hold,
>  						LRU_BASE   + file * LRU_FILE);
>  	__mod_zone_page_state(zone, NR_ISOLATED_ANON + file, -nr_taken);
>  	unlock_lruvec_irq(lruvec);
> 

Hmm... could you add a comment to each of these functions stating that the caller
must have the lruvec locked before calling, that the function returns a locked
lruvec which may differ from the one passed in, and that the caller must unlock
the returned lruvec?
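
For instance, in code form (rough wording, feel free to rephrase):

/*
 * The caller must hold the lruvec lock on entry. Pages on the list may
 * belong to different lruvecs, so the lock can be switched internally;
 * the function returns the lruvec that is locked on return, which may
 * differ from the one passed in. The caller must unlock the returned
 * lruvec.
 */
static struct lruvec *
move_active_pages_to_lru(struct lruvec *lruvec,
			 struct list_head *list,
			 struct list_head *pages_to_free,
			 enum lru_list lru)

and the same for putback_inactive_pages().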


Thanks,
-Kame





^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v3 16/21] mm: handle lruvec relocks in compaction
  2012-02-23 13:52 ` [PATCH v3 16/21] mm: handle lruvec relocks in compaction Konstantin Khlebnikov
@ 2012-02-28  1:13   ` KAMEZAWA Hiroyuki
  2012-02-28  6:31     ` Konstantin Khlebnikov
  0 siblings, 1 reply; 65+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-02-28  1:13 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Hugh Dickins, linux-kernel, linux-mm, Johannes Weiner,
	Andrew Morton, Andi Kleen

On Thu, 23 Feb 2012 17:52:56 +0400
Konstantin Khlebnikov <khlebnikov@openvz.org> wrote:

> Prepare for lru_lock splitting in memory compaction code.
> 
> * disable irqs in acct_isolated() for __mod_zone_page_state(),
>   lru_lock isn't required there.
> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
> ---
>  mm/compaction.c |   30 ++++++++++++++++--------------
>  1 files changed, 16 insertions(+), 14 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index a976b28..54340e4 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -224,8 +224,10 @@ static void acct_isolated(struct zone *zone, struct compact_control *cc)
>  	list_for_each_entry(page, &cc->migratepages, lru)
>  		count[!!page_is_file_cache(page)]++;
>  
> +	local_irq_disable();
>  	__mod_zone_page_state(zone, NR_ISOLATED_ANON, count[0]);
>  	__mod_zone_page_state(zone, NR_ISOLATED_FILE, count[1]);
> +	local_irq_enable();

Why do we need to disable IRQs here?



>  }
>  
>  /* Similar to reclaim, but different enough that they don't share logic */
> @@ -262,7 +264,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
>  	unsigned long nr_scanned = 0, nr_isolated = 0;
>  	struct list_head *migratelist = &cc->migratepages;
>  	isolate_mode_t mode = ISOLATE_ACTIVE|ISOLATE_INACTIVE;
> -	struct lruvec *lruvec;
> +	struct lruvec *lruvec = NULL;
>  
>  	/* Do not scan outside zone boundaries */
>  	low_pfn = max(cc->migrate_pfn, zone->zone_start_pfn);
> @@ -294,25 +296,24 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
>  
>  	/* Time to isolate some pages for migration */
>  	cond_resched();
> -	spin_lock_irq(&zone->lru_lock);
>  	for (; low_pfn < end_pfn; low_pfn++) {
>  		struct page *page;
> -		bool locked = true;
>  
>  		/* give a chance to irqs before checking need_resched() */
>  		if (!((low_pfn+1) % SWAP_CLUSTER_MAX)) {
> -			spin_unlock_irq(&zone->lru_lock);
> -			locked = false;
> +			if (lruvec)
> +				unlock_lruvec_irq(lruvec);
> +			lruvec = NULL;
>  		}
> -		if (need_resched() || spin_is_contended(&zone->lru_lock)) {
> -			if (locked)
> -				spin_unlock_irq(&zone->lru_lock);
> +		if (need_resched() ||
> +		    (lruvec && spin_is_contended(&zone->lru_lock))) {
> +			if (lruvec)
> +				unlock_lruvec_irq(lruvec);
> +			lruvec = NULL;
>  			cond_resched();
> -			spin_lock_irq(&zone->lru_lock);
>  			if (fatal_signal_pending(current))
>  				break;
> -		} else if (!locked)
> -			spin_lock_irq(&zone->lru_lock);
> +		}
>  
>  		/*
>  		 * migrate_pfn does not necessarily start aligned to a
> @@ -359,7 +360,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
>  			continue;
>  		}
>  
> -		if (!PageLRU(page))
> +		if (!__lock_page_lruvec_irq(&lruvec, page))
>  			continue;

Could you add more comments to __lock_page_lruvec_irq()?
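
For instance, something like this on top of the function (my reading of the
changelog, please correct me where it is wrong):

/*
 * Lock the lruvec of @page without a stable page reference.
 * *@lruvec may be NULL or a lruvec locked by an earlier call; consecutive
 * calls must be for pages in the same zone.
 * Returns true if a lruvec was locked and PageLRU(page) is set; in that
 * case *@lruvec is the locked lruvec the page belongs to.
 */

It would also help to state explicitly what the state of *lruvec is when
false is returned.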

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v3 17/21] mm: handle lruvec relock in memory controller
  2012-02-23 13:53 ` [PATCH v3 17/21] mm: handle lruvec relock in memory controller Konstantin Khlebnikov
@ 2012-02-28  1:22   ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 65+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-02-28  1:22 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Hugh Dickins, linux-kernel, linux-mm, Johannes Weiner,
	Andrew Morton, Andi Kleen

On Thu, 23 Feb 2012 17:53:10 +0400
Konstantin Khlebnikov <khlebnikov@openvz.org> wrote:

> Carefully relock the lruvec lru_lock when a page changes its memory cgroup.
> 
> * In free_pn_rcu() wait for lruvec lock release.
>   Locking primitives keep lruvec pointer after successful lock held.
> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v3 18/21] mm: add to lruvec isolated pages counters
  2012-02-23 13:53 ` [PATCH v3 18/21] mm: add to lruvec isolated pages counters Konstantin Khlebnikov
  2012-02-24  5:32   ` Konstantin Khlebnikov
@ 2012-02-28  1:38   ` KAMEZAWA Hiroyuki
  1 sibling, 0 replies; 65+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-02-28  1:38 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Hugh Dickins, linux-kernel, linux-mm, Johannes Weiner,
	Andrew Morton, Andi Kleen

On Thu, 23 Feb 2012 17:53:14 +0400
Konstantin Khlebnikov <khlebnikov@openvz.org> wrote:

> This patch adds a counter of isolated pages into struct lruvec.
> It is required for keeping the lruvec alive until the isolated pages are put back.
> We cannot rely on the resource counter in the memory controller, because it
> does not account uncharged memory. And it is much better to have a common engine
> for dynamic lruvec management than to tie all the logic to memory cgroup magic.
> Plus this is useful information for memory reclaimer balancing.
> 
> This also adds per-cpu page vectors for putting isolated pages back,
> and the function add_page_to_evictable_list(). It is similar to lru_cache_add_lru()
> but it reuses the caller's page reference and can adjust the isolated pages counters.
> There is also a new function, free_isolated_page_list(), which is used at the end of
> shrink_page_list() for freeing pages and adjusting the counters of isolated pages.
> 
> Memory cgroup code can shuffle pages between lruvecs without isolation
> if the page is already isolated by someone else. Thus the page's lruvec reference is
> unstable even if the page is isolated. It is stable only under the lru_lock or if the
> page's reference count is zero. That's why we must always recheck page_lruvec() even
> on non-lumpy 0-order reclaim, where all pages are isolated from one lruvec.
> 
> When moving a page between cgroups, the memory controller now adjusts the isolated
> pages counter of the old lruvec before inserting the page into the new lruvec.
> Locking lruvec->lru_lock in mem_cgroup_adjust_isolated() also effectively
> stabilizes the PageLRU() sign, so nobody will see PageLRU() under the old lru_lock
> while the page has already been moved into another lruvec.
> 
> [ BTW, all lru-id arithmetic can be simplified if we divide the unevictable list
>   into file and anon parts. After that we can swap bits in page->flags and
>   calculate the lru id with a single instruction. ]
> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>

Could you split the part showing statistics of isolated pages into another patch ?



> ---
>  include/linux/mmzone.h |   11 ++++-
>  include/linux/swap.h   |    4 +-
>  mm/compaction.c        |    1 
>  mm/huge_memory.c       |    4 ++
>  mm/internal.h          |    6 ++
>  mm/ksm.c               |    2 -
>  mm/memcontrol.c        |   39 ++++++++++++++--
>  mm/migrate.c           |    2 -
>  mm/rmap.c              |    2 -
>  mm/swap.c              |   76 +++++++++++++++++++++++++++++++
>  mm/vmscan.c            |  116 +++++++++++++++++++++++++++++++++++-------------
>  11 files changed, 218 insertions(+), 45 deletions(-)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 82d5ff3..2e3a298 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -137,13 +137,20 @@ enum lru_list {
>  	LRU_INACTIVE_FILE = LRU_BASE + LRU_FILE,
>  	LRU_ACTIVE_FILE = LRU_BASE + LRU_FILE + LRU_ACTIVE,
>  	LRU_UNEVICTABLE,
> -	NR_LRU_LISTS
> +	NR_EVICTABLE_LRU_LISTS = LRU_UNEVICTABLE,
> +	NR_LRU_LISTS,
> +	LRU_ISOLATED = NR_LRU_LISTS,
> +	LRU_ISOLATED_ANON = LRU_ISOLATED,
> +	LRU_ISOLATED_FILE,
> +	NR_LRU_COUNTERS,
>  };
>  
>  #define for_each_lru(lru) for (lru = 0; lru < NR_LRU_LISTS; lru++)
>  
>  #define for_each_evictable_lru(lru) for (lru = 0; lru <= LRU_ACTIVE_FILE; lru++)
>  
> +#define for_each_lru_counter(cnt) for (cnt = 0; cnt < NR_LRU_COUNTERS; cnt++)
> +
>  static inline int is_file_lru(enum lru_list lru)
>  {
>  	return (lru == LRU_INACTIVE_FILE || lru == LRU_ACTIVE_FILE);
> @@ -298,7 +305,7 @@ struct zone_reclaim_stat {
>  
>  struct lruvec {
>  	struct list_head	pages_lru[NR_LRU_LISTS];
> -	unsigned long		pages_count[NR_LRU_LISTS];
> +	unsigned long		pages_count[NR_LRU_COUNTERS];
>  
>  	struct zone_reclaim_stat	reclaim_stat;
>  
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index 8630354..3a3ff2c 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -234,7 +234,9 @@ extern void rotate_reclaimable_page(struct page *page);
>  extern void deactivate_page(struct page *page);
>  extern void swap_setup(void);
>  
> -extern void add_page_to_unevictable_list(struct page *page);
> +extern void add_page_to_unevictable_list(struct page *page, bool isolated);
> +extern void add_page_to_evictable_list(struct page *page,
> +					enum lru_list lru, bool isolated);
>  
>  /**
>   * lru_cache_add: add a page to the page lists
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 54340e4..fa74cbe 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -384,6 +384,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
>  
>  		/* Successfully isolated */
>  		del_page_from_lru_list(lruvec, page, page_lru(page));
> +		lruvec->pages_count[LRU_ISOLATED + page_is_file_cache(page)]++;
>  		list_add(&page->lru, migratelist);
>  		cc->nr_migratepages++;
>  		nr_isolated++;
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 74996b8..46d9f44 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1316,6 +1316,10 @@ static void __split_huge_page_refcount(struct page *page)
>  	__dec_zone_page_state(page, NR_ANON_TRANSPARENT_HUGEPAGES);
>  	__mod_zone_page_state(lruvec_zone(lruvec), NR_ANON_PAGES, HPAGE_PMD_NR);
>  
> +	/* Fixup isolated pages counter if head page currently isolated */
> +	if (!PageLRU(page))
> +		lruvec->pages_count[LRU_ISOLATED_ANON] -= HPAGE_PMD_NR-1;
> +
>  	ClearPageCompound(page);
>  	compound_unlock(page);
>  	unlock_lruvec_irq(lruvec);
> diff --git a/mm/internal.h b/mm/internal.h
> index 9454752..6dd2e70 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -265,7 +265,11 @@ extern unsigned long highest_memmap_pfn;
>   * in mm/vmscan.c:
>   */
>  extern int isolate_lru_page(struct page *page);
> -extern void putback_lru_page(struct page *page);
> +extern void __putback_lru_page(struct page *page, bool isolated);
> +static inline void putback_lru_page(struct page *page)
> +{
> +	__putback_lru_page(page, true);
> +}
>  
>  /*
>   * in mm/page_alloc.c
> diff --git a/mm/ksm.c b/mm/ksm.c
> index e20de58..109e6ec 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -1592,7 +1592,7 @@ struct page *ksm_does_need_to_copy(struct page *page,
>  		if (page_evictable(new_page, vma))
>  			lru_cache_add_lru(new_page, LRU_ACTIVE_ANON);
>  		else
> -			add_page_to_unevictable_list(new_page);
> +			add_page_to_unevictable_list(new_page, false);
>  	}
>  
>  	return new_page;
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 230f434..4de8044 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -697,7 +697,7 @@ mem_cgroup_zone_nr_lru_pages(struct mem_cgroup *memcg, int nid, int zid,
>  
>  	mz = mem_cgroup_zoneinfo(memcg, nid, zid);
>  
> -	for_each_lru(lru) {
> +	for_each_lru_counter(lru) {
>  		if (BIT(lru) & lru_mask)
>  			ret += mz->lruvec.pages_count[lru];
>  	}
> @@ -2354,6 +2354,17 @@ void mem_cgroup_split_huge_fixup(struct page *head)
>  }
>  #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
>  
> +static void mem_cgroup_adjust_isolated(struct lruvec *lruvec,
> +				       struct page *page, int delta)
> +{
> +	int file = page_is_file_cache(page);
> +	unsigned long flags;
> +
> +	lock_lruvec(lruvec, &flags);
> +	lruvec->pages_count[LRU_ISOLATED + file] += delta;
> +	unlock_lruvec(lruvec, &flags);
> +}
> +
>  /**
>   * mem_cgroup_move_account - move account of the page
>   * @page: the page
> @@ -2452,6 +2463,7 @@ static int mem_cgroup_move_parent(struct page *page,
>  	struct mem_cgroup *parent;
>  	unsigned int nr_pages;
>  	unsigned long uninitialized_var(flags);
> +	struct lruvec *lruvec;
>  	int ret;
>  
>  	/* Is ROOT ? */
> @@ -2471,6 +2483,8 @@ static int mem_cgroup_move_parent(struct page *page,
>  	if (ret)
>  		goto put_back;
>  
> +	lruvec = page_lruvec(page);
> +
>  	if (nr_pages > 1)
>  		flags = compound_lock_irqsave(page);
>  
> @@ -2480,8 +2494,11 @@ static int mem_cgroup_move_parent(struct page *page,
>  
>  	if (nr_pages > 1)
>  		compound_unlock_irqrestore(page, flags);
> +	if (!ret)
> +		/* This also stabilize PageLRU() sign for lruvec lock holder. */
> +		mem_cgroup_adjust_isolated(lruvec, page, -nr_pages);
>  put_back:
> -	putback_lru_page(page);
> +	__putback_lru_page(page, !ret);
>  put:
>  	put_page(page);
>  out:
> @@ -3879,6 +3896,8 @@ enum {
>  	MCS_INACTIVE_FILE,
>  	MCS_ACTIVE_FILE,
>  	MCS_UNEVICTABLE,
> +	MCS_ISOLATED_ANON,
> +	MCS_ISOLATED_FILE,
>  	NR_MCS_STAT,
>  };
>  
> @@ -3902,7 +3921,9 @@ struct {
>  	{"active_anon", "total_active_anon"},
>  	{"inactive_file", "total_inactive_file"},
>  	{"active_file", "total_active_file"},
> -	{"unevictable", "total_unevictable"}
> +	{"unevictable", "total_unevictable"},
> +	{"isolated_anon", "total_isolated_anon"},
> +	{"isolated_file", "total_isolated_file"},
>  };
>  
>  
> @@ -3942,6 +3963,10 @@ mem_cgroup_get_local_stat(struct mem_cgroup *memcg, struct mcs_total_stat *s)
>  	s->stat[MCS_ACTIVE_FILE] += val * PAGE_SIZE;
>  	val = mem_cgroup_nr_lru_pages(memcg, BIT(LRU_UNEVICTABLE));
>  	s->stat[MCS_UNEVICTABLE] += val * PAGE_SIZE;
> +	val = mem_cgroup_nr_lru_pages(memcg, BIT(LRU_ISOLATED_ANON));
> +	s->stat[MCS_ISOLATED_ANON] += val * PAGE_SIZE;
> +	val = mem_cgroup_nr_lru_pages(memcg, BIT(LRU_ISOLATED_FILE));
> +	s->stat[MCS_ISOLATED_FILE] += val * PAGE_SIZE;
>  }
>  
>  static void
> @@ -5243,6 +5268,7 @@ retry:
>  		struct page *page;
>  		struct page_cgroup *pc;
>  		swp_entry_t ent;
> +		struct lruvec *lruvec;
>  
>  		if (!mc.precharge)
>  			break;
> @@ -5253,14 +5279,17 @@ retry:
>  			page = target.page;
>  			if (isolate_lru_page(page))
>  				goto put;
> +			lruvec = page_lruvec(page);
>  			pc = lookup_page_cgroup(page);
>  			if (!mem_cgroup_move_account(page, 1, pc,
>  						     mc.from, mc.to, false)) {
>  				mc.precharge--;
>  				/* we uncharge from mc.from later. */
>  				mc.moved_charge++;
> -			}
> -			putback_lru_page(page);
> +				mem_cgroup_adjust_isolated(lruvec, page, -1);
> +				__putback_lru_page(page, false);
> +			} else
> +				__putback_lru_page(page, true);
>  put:			/* is_target_pte_for_mc() gets the page */
>  			put_page(page);
>  			break;
> diff --git a/mm/migrate.c b/mm/migrate.c
> index df141f6..de13a0e 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -868,7 +868,7 @@ out:
>  	 * Move the new page to the LRU. If migration was not successful
>  	 * then this will free the page.
>  	 */
> -	putback_lru_page(newpage);
> +	__putback_lru_page(newpage, false);
>  	if (result) {
>  		if (rc)
>  			*result = rc;
> diff --git a/mm/rmap.c b/mm/rmap.c
> index aa547d4..06b5def9 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1139,7 +1139,7 @@ void page_add_new_anon_rmap(struct page *page,
>  	if (page_evictable(page, vma))
>  		lru_cache_add_lru(page, LRU_ACTIVE_ANON);
>  	else
> -		add_page_to_unevictable_list(page);
> +		add_page_to_unevictable_list(page, false);
>  }
>  
>  /**
> diff --git a/mm/swap.c b/mm/swap.c
> index 3689e3d..998c71c 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -37,6 +37,8 @@
>  int page_cluster;
>  
>  static DEFINE_PER_CPU(struct pagevec[NR_LRU_LISTS], lru_add_pvecs);
> +static DEFINE_PER_CPU(struct pagevec[NR_EVICTABLE_LRU_LISTS],
> +					   lru_add_isolated_pvecs);
>  static DEFINE_PER_CPU(struct pagevec, lru_rotate_pvecs);
>  static DEFINE_PER_CPU(struct pagevec, lru_deactivate_pvecs);
>  
> @@ -381,6 +383,67 @@ void lru_cache_add_lru(struct page *page, enum lru_list lru)
>  	__lru_cache_add(page, lru);
>  }
>  
> +static void __lru_add_isolated_fn(struct lruvec *lruvec,
> +				  struct page *page, void *arg)
> +{
> +	enum lru_list lru = (enum lru_list)arg;
> +
> +	VM_BUG_ON(PageActive(page));
> +	VM_BUG_ON(PageUnevictable(page));
> +	VM_BUG_ON(PageLRU(page));
> +
> +	SetPageLRU(page);
> +	if (is_active_lru(lru))
> +		SetPageActive(page);
> +	update_page_reclaim_stat(lruvec, lru);
> +	add_page_to_lru_list(lruvec, page, lru);
> +	lruvec->pages_count[LRU_ISOLATED + is_file_lru(lru)] -=
> +						hpage_nr_pages(page);
> +}
> +
> +static void __lru_add_isolated(struct pagevec *pvec, enum lru_list lru)
> +{
> +	VM_BUG_ON(is_unevictable_lru(lru));
> +	pagevec_lru_move_fn(pvec, __lru_add_isolated_fn, (void *)lru);
> +}
> +
> +/**
> + * add_page_to_evictable_list - add page to lru list
> + * @page	the page to be added into the lru list
> + * @lru		lru list id
> + * @isolated	need to adjust isolated pages counter
> + *
> + * Like lru_cache_add_lru() but reuses caller's reference to page and
> + * taking care about isolated pages counter on lruvec if isolated = true.
> + */
> +void add_page_to_evictable_list(struct page *page,
> +				enum lru_list lru, bool isolated)
> +{
> +	struct pagevec *pvec;
> +
> +	if (PageActive(page)) {
> +		VM_BUG_ON(PageUnevictable(page));
> +		ClearPageActive(page);
> +	} else if (PageUnevictable(page)) {
> +		VM_BUG_ON(PageActive(page));
> +		ClearPageUnevictable(page);
> +	}
> +
> +	VM_BUG_ON(PageLRU(page) || PageActive(page) || PageUnevictable(page));
> +
> +	preempt_disable();
> +	if (isolated) {
> +		pvec = __this_cpu_ptr(lru_add_isolated_pvecs + lru);
> +		if (!pagevec_add(pvec, page))
> +			__lru_add_isolated(pvec, lru);
> +	} else {
> +		pvec = __this_cpu_ptr(lru_add_pvecs + lru);
> +		if (!pagevec_add(pvec, page))
> +			__pagevec_lru_add(pvec, lru);
> +	}
> +	preempt_enable();
> +}
> +
>  /**
>   * add_page_to_unevictable_list - add a page to the unevictable list
>   * @page:  the page to be added to the unevictable list
> @@ -391,7 +454,7 @@ void lru_cache_add_lru(struct page *page, enum lru_list lru)
>   * while it's locked or otherwise "invisible" to other tasks.  This is
>   * difficult to do when using the pagevec cache, so bypass that.
>   */
> -void add_page_to_unevictable_list(struct page *page)
> +void add_page_to_unevictable_list(struct page *page, bool isolated)
>  {
>  	struct lruvec *lruvec;
>  
> @@ -399,6 +462,10 @@ void add_page_to_unevictable_list(struct page *page)
>  	SetPageUnevictable(page);
>  	SetPageLRU(page);
>  	add_page_to_lru_list(lruvec, page, LRU_UNEVICTABLE);
> +	if (isolated) {
> +		int type = LRU_ISOLATED + page_is_file_cache(page);
> +		lruvec->pages_count[type] -= hpage_nr_pages(page);
> +	}
>  	unlock_lruvec_irq(lruvec);
>  }
>  
> @@ -485,6 +552,13 @@ static void drain_cpu_pagevecs(int cpu)
>  			__pagevec_lru_add(pvec, lru);
>  	}
>  
> +	pvecs = per_cpu(lru_add_isolated_pvecs, cpu);
> +	for_each_evictable_lru(lru) {
> +		pvec = &pvecs[lru];
> +		if (pagevec_count(pvec))
> +			__lru_add_isolated(pvec, lru);
> +	}
> +
>  	pvec = &per_cpu(lru_rotate_pvecs, cpu);
>  	if (pagevec_count(pvec)) {
>  		unsigned long flags;
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 6eeeb4b..a1ff010 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -42,6 +42,7 @@
>  #include <linux/sysctl.h>
>  #include <linux/oom.h>
>  #include <linux/prefetch.h>
> +#include <trace/events/kmem.h>
>  
>  #include <asm/tlbflush.h>
>  #include <asm/div64.h>
> @@ -585,15 +586,16 @@ int remove_mapping(struct address_space *mapping, struct page *page)
>  }
>  
>  /**
> - * putback_lru_page - put previously isolated page onto appropriate LRU list
> + * __putback_lru_page - put previously isolated page onto appropriate LRU list
>   * @page: page to be put back to appropriate lru list
> + * @isolated: isolated pages counter update required
>   *
>   * Add previously isolated @page to appropriate LRU list.
>   * Page may still be unevictable for other reasons.
>   *
>   * lru_lock must not be held, interrupts must be enabled.
>   */
> -void putback_lru_page(struct page *page)
> +void __putback_lru_page(struct page *page, bool isolated)
>  {
>  	int lru;
>  	int active = !!TestClearPageActive(page);
> @@ -612,14 +614,16 @@ redo:
>  		 * We know how to handle that.
>  		 */
>  		lru = active + page_lru_base_type(page);
> -		lru_cache_add_lru(page, lru);
> +		add_page_to_evictable_list(page, lru, isolated);
> +		if (was_unevictable)
> +			count_vm_event(UNEVICTABLE_PGRESCUED);
>  	} else {
>  		/*
>  		 * Put unevictable pages directly on zone's unevictable
>  		 * list.
>  		 */
>  		lru = LRU_UNEVICTABLE;
> -		add_page_to_unevictable_list(page);
> +		add_page_to_unevictable_list(page, isolated);
>  		/*
>  		 * When racing with an mlock or AS_UNEVICTABLE clearing
>  		 * (page is unlocked) make sure that if the other thread
> @@ -631,30 +635,26 @@ redo:
>  		 * The other side is TestClearPageMlocked() or shmem_lock().
>  		 */
>  		smp_mb();
> -	}
> -
> -	/*
> -	 * page's status can change while we move it among lru. If an evictable
> -	 * page is on unevictable list, it never be freed. To avoid that,
> -	 * check after we added it to the list, again.
> -	 */
> -	if (lru == LRU_UNEVICTABLE && page_evictable(page, NULL)) {
> -		if (!isolate_lru_page(page)) {
> -			put_page(page);
> -			goto redo;
> -		}
> -		/* This means someone else dropped this page from LRU
> -		 * So, it will be freed or putback to LRU again. There is
> -		 * nothing to do here.
> +		/*
> +		 * page's status can change while we move it among lru.
> +		 * If an evictable page is on unevictable list, it never be freed.
> +		 * To avoid that, check after we added it to the list, again.
>  		 */
> +		if (page_evictable(page, NULL)) {
> +			if (!isolate_lru_page(page)) {
> +				isolated = true;
> +				put_page(page);
> +				goto redo;
> +			}
> +			/* This means someone else dropped this page from LRU
> +			 * So, it will be freed or putback to LRU again. There is
> +			 * nothing to do here.
> +			 */
> +		}
> +		put_page(page);		/* drop ref from isolate */
> +		if (!was_unevictable)
> +			count_vm_event(UNEVICTABLE_PGCULLED);
>  	}
> -
> -	if (was_unevictable && lru != LRU_UNEVICTABLE)
> -		count_vm_event(UNEVICTABLE_PGRESCUED);
> -	else if (!was_unevictable && lru == LRU_UNEVICTABLE)
> -		count_vm_event(UNEVICTABLE_PGCULLED);
> -
> -	put_page(page);		/* drop ref from isolate */
>  }
>  
>  enum page_references {
> @@ -724,6 +724,48 @@ static enum page_references page_check_references(struct page *page,
>  }
>  
>  /*
> + * Free a list of isolated 0-order pages
> + */
> +static void free_isolated_page_list(struct lruvec *lruvec,
> +				    struct list_head *list, int cold)
> +{
> +	struct page *page, *next;
> +	unsigned long nr_pages[2];
> +	struct list_head queue;
> +
> +again:
> +	INIT_LIST_HEAD(&queue);
> +	nr_pages[0] = nr_pages[1] = 0;
> +
> +	list_for_each_entry_safe(page, next, list, lru) {
> +		if (unlikely(lruvec != page_lruvec(page))) {
> +			list_add_tail(&page->lru, &queue);
> +			continue;
> +		}
> +		nr_pages[page_is_file_cache(page)]++;
> +		trace_mm_page_free_batched(page, cold);
> +		free_hot_cold_page(page, cold);
> +	}
> +
> +	lock_lruvec_irq(lruvec);
> +	lruvec->pages_count[LRU_ISOLATED_ANON] -= nr_pages[0];
> +	lruvec->pages_count[LRU_ISOLATED_FILE] -= nr_pages[1];
> +	unlock_lruvec_irq(lruvec);


Is it guaranteed that all pages are from the same lruvec ?

Thanks,
-Kame

> +
> +	/*
> +	 * Usually there will be only one iteration, because
> +	 * at 0-order reclaim all pages are from one lruvec
> +	 * if we didn't raced with memory cgroup shuffling.
> +	 */
> +	if (unlikely(!list_empty(&queue))) {
> +		list_replace(&queue, list);
> +		lruvec = page_lruvec(list_first_entry(list,
> +					struct page, lru));
> +		goto again;
> +	}
> +}
> +
> +/*
>   * shrink_page_list() returns the number of reclaimed pages
>   */
>  static unsigned long shrink_page_list(struct list_head *page_list,
> @@ -986,7 +1028,7 @@ keep_lumpy:
>  	if (nr_dirty && nr_dirty == nr_congested && global_reclaim(sc))
>  		zone_set_flag(lruvec_zone(lruvec), ZONE_CONGESTED);
>  
> -	free_hot_cold_page_list(&free_pages, 1);
> +	free_isolated_page_list(lruvec, &free_pages, 1);
>  
>  	list_splice(&ret_pages, page_list);
>  	count_vm_events(PGACTIVATE, pgactivate);
> @@ -1206,11 +1248,14 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
>  			if (__isolate_lru_page(cursor_page, mode, file) == 0) {
>  				unsigned int isolated_pages;
>  				int cursor_lru = page_lru(cursor_page);
> +				int cur_file = page_is_file_cache(cursor_page);
>  
>  				list_move(&cursor_page->lru, dst);
>  				isolated_pages = hpage_nr_pages(cursor_page);
>  				cursor_lruvec->pages_count[cursor_lru] -=
>  								isolated_pages;
> +				cursor_lruvec->pages_count[LRU_ISOLATED +
> +						cur_file] += isolated_pages;
>  				VM_BUG_ON((long)cursor_lruvec->
>  						pages_count[cursor_lru] < 0);
>  				nr_taken += isolated_pages;
> @@ -1248,6 +1293,7 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
>  	}
>  
>  	lruvec->pages_count[lru] -= nr_taken - nr_lumpy_taken;
> +	lruvec->pages_count[LRU_ISOLATED + file] += nr_taken - nr_lumpy_taken;
>  	VM_BUG_ON((long)lruvec->pages_count[lru] < 0);
>  
>  	*nr_scanned = scan;
> @@ -1296,11 +1342,14 @@ int isolate_lru_page(struct page *page)
>  
>  		lruvec = lock_page_lruvec_irq(page);
>  		if (PageLRU(page)) {
> +			int file = page_is_file_cache(page);
>  			int lru = page_lru(page);
>  			ret = 0;
>  			get_page(page);
>  			ClearPageLRU(page);
>  			del_page_from_lru_list(lruvec, page, lru);
> +			lruvec->pages_count[LRU_ISOLATED + file] +=
> +							hpage_nr_pages(page);
>  		}
>  		unlock_lruvec_irq(lruvec);
>  	}
> @@ -1347,7 +1396,7 @@ putback_inactive_pages(struct lruvec *lruvec,
>  	 */
>  	while (!list_empty(page_list)) {
>  		struct page *page = lru_to_page(page_list);
> -		int lru;
> +		int numpages, lru, file;
>  
>  		VM_BUG_ON(PageLRU(page));
>  		list_del(&page->lru);
> @@ -1363,13 +1412,13 @@ putback_inactive_pages(struct lruvec *lruvec,
>  
>  		SetPageLRU(page);
>  		lru = page_lru(page);
> +		file = is_file_lru(lru);
> +		numpages = hpage_nr_pages(page);
>  
>  		add_page_to_lru_list(lruvec, page, lru);
> -		if (is_active_lru(lru)) {
> -			int file = is_file_lru(lru);
> -			int numpages = hpage_nr_pages(page);
> +		lruvec->pages_count[LRU_ISOLATED + file] -= numpages;
> +		if (is_active_lru(lru))
>  			reclaim_stat->recent_rotated[file] += numpages;
> -		}
>  		if (put_page_testzero(page)) {
>  			__ClearPageLRU(page);
>  			__ClearPageActive(page);
> @@ -1656,6 +1705,9 @@ move_active_pages_to_lru(struct lruvec *lruvec,
>  		list_move(&page->lru, &lruvec->pages_lru[lru]);
>  		numpages = hpage_nr_pages(page);
>  		lruvec->pages_count[lru] += numpages;
> +		/* There should be no mess between file and anon pages */
> +		lruvec->pages_count[LRU_ISOLATED +
> +				    is_file_lru(lru)] -= numpages;
>  		pgmoved += numpages;
>  
>  		if (put_page_testzero(page)) {
> 


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v3 19/21] memcg: check lru vectors emptiness in pre-destroy
  2012-02-23 13:53 ` [PATCH v3 19/21] memcg: check lru vectors emptiness in pre-destroy Konstantin Khlebnikov
@ 2012-02-28  1:43   ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 65+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-02-28  1:43 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Hugh Dickins, linux-kernel, linux-mm, Johannes Weiner,
	Andrew Morton, Andi Kleen

On Thu, 23 Feb 2012 17:53:19 +0400
Konstantin Khlebnikov <khlebnikov@openvz.org> wrote:

> We must abort cgroup destruction if the cgroup is still not empty, because the
> resource counter cannot catch isolated uncharged pages.
> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>

I like this. 
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

> ---
>  mm/memcontrol.c |   10 +++++++++-
>  1 files changed, 9 insertions(+), 1 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 4de8044..fbeff85 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -4859,8 +4859,16 @@ free_out:
>  static int mem_cgroup_pre_destroy(struct cgroup *cont)
>  {
>  	struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
> +	int ret;
> +
> +	ret = mem_cgroup_force_empty(memcg, false);
> +	if (ret)
> +		return ret;
>  
> -	return mem_cgroup_force_empty(memcg, false);
> +	if (mem_cgroup_nr_lru_pages(memcg, -1))
> +		return -EBUSY;
> +
> +	return 0;
>  }
>  
>  static void mem_cgroup_destroy(struct cgroup *cont)
> 


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v3 20/21] mm: split zone->lru_lock
  2012-02-23 13:53 ` [PATCH v3 20/21] mm: split zone->lru_lock Konstantin Khlebnikov
@ 2012-02-28  1:49   ` KAMEZAWA Hiroyuki
  2012-02-28  6:39     ` Konstantin Khlebnikov
  0 siblings, 1 reply; 65+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-02-28  1:49 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Hugh Dickins, linux-kernel, linux-mm, Johannes Weiner,
	Andrew Morton, Andi Kleen

On Thu, 23 Feb 2012 17:53:23 +0400
Konstantin Khlebnikov <khlebnikov@openvz.org> wrote:

> Looks like everything is ready for splitting zone->lru_lock into per-lruvec pieces.
> 
> The lruvec locking loop is protected with RCU; actually there is irq-disabling instead
> of rcu_read_lock(). The memory controller already releases its lruvecs after
> synchronize_rcu() in cgroup_diput(). Probably it should be replaced with synchronize_sched().
> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>

Do we need rcu_read_lock() even if we check isolated pages at pre_destroy()?
If pre_destroy() has finished, the pages under the memcg being destroyed were already
moved to another cgroup, even while they were isolated.

So,
 - PageLRU(page) guarantees that the lruvec is valid.
 - if !PageLRU(page), the caller of the lru_lock should know what it is doing.
   Once a page is isolated, pre_destroy() never ends and page_lruvec(page) is always stable.
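
If that holds, I guess the fast path could look roughly like this, without
rcu_read_lock() (untested, only to illustrate my question):

static inline bool __lock_page_lruvec_irq(struct lruvec **lruvec,
					  struct page *page)
{
	bool ret = false;

	if (PageLRU(page)) {
		if (!*lruvec) {
			*lruvec = page_lruvec(page);
			lock_lruvec_irq(*lruvec);
		}
		/* the page may have moved, relock against its current lruvec */
		*lruvec = __relock_page_lruvec(*lruvec, page);
		if (PageLRU(page))
			ret = true;
	}
	return ret;
}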

Thanks,
-Kame

> ---
>  include/linux/mmzone.h |    3 +-
>  mm/compaction.c        |    2 +
>  mm/internal.h          |   66 +++++++++++++++++++++++++-----------------------
>  mm/page_alloc.c        |    2 +
>  mm/swap.c              |    2 +
>  5 files changed, 40 insertions(+), 35 deletions(-)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 2e3a298..9880150 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -304,6 +304,8 @@ struct zone_reclaim_stat {
>  };
>  
>  struct lruvec {
> +	spinlock_t		lru_lock;
> +
>  	struct list_head	pages_lru[NR_LRU_LISTS];
>  	unsigned long		pages_count[NR_LRU_COUNTERS];
>  
> @@ -386,7 +388,6 @@ struct zone {
>  	ZONE_PADDING(_pad1_)
>  
>  	/* Fields commonly accessed by the page reclaim scanner */
> -	spinlock_t		lru_lock;
>  	struct lruvec		lruvec;
>  
>  	unsigned long		pages_scanned;	   /* since last reclaim */
> diff --git a/mm/compaction.c b/mm/compaction.c
> index fa74cbe..8661bb58 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -306,7 +306,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
>  			lruvec = NULL;
>  		}
>  		if (need_resched() ||
> -		    (lruvec && spin_is_contended(&zone->lru_lock))) {
> +		    (lruvec && spin_is_contended(&lruvec->lru_lock))) {
>  			if (lruvec)
>  				unlock_lruvec_irq(lruvec);
>  			lruvec = NULL;
> diff --git a/mm/internal.h b/mm/internal.h
> index 6dd2e70..9a9fd53 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -15,27 +15,27 @@
>  
>  static inline void lock_lruvec(struct lruvec *lruvec, unsigned long *flags)
>  {
> -	spin_lock_irqsave(&lruvec_zone(lruvec)->lru_lock, *flags);
> +	spin_lock_irqsave(&lruvec->lru_lock, *flags);
>  }
>  
>  static inline void lock_lruvec_irq(struct lruvec *lruvec)
>  {
> -	spin_lock_irq(&lruvec_zone(lruvec)->lru_lock);
> +	spin_lock_irq(&lruvec->lru_lock);
>  }
>  
>  static inline void unlock_lruvec(struct lruvec *lruvec, unsigned long *flags)
>  {
> -	spin_unlock_irqrestore(&lruvec_zone(lruvec)->lru_lock, *flags);
> +	spin_unlock_irqrestore(&lruvec->lru_lock, *flags);
>  }
>  
>  static inline void unlock_lruvec_irq(struct lruvec *lruvec)
>  {
> -	spin_unlock_irq(&lruvec_zone(lruvec)->lru_lock);
> +	spin_unlock_irq(&lruvec->lru_lock);
>  }
>  
>  static inline void wait_lruvec_unlock(struct lruvec *lruvec)
>  {
> -	spin_unlock_wait(&lruvec_zone(lruvec)->lru_lock);
> +	spin_unlock_wait(&lruvec->lru_lock);
>  }
>  
>  #ifdef CONFIG_CGROUP_MEM_RES_CTLR
> @@ -46,37 +46,39 @@ static inline void wait_lruvec_unlock(struct lruvec *lruvec)
>  static inline struct lruvec *__relock_page_lruvec(struct lruvec *locked_lruvec,
>  						  struct page *page)
>  {
> -	/* Currenyly only one lru_lock per-zone */
> -	return page_lruvec(page);
> +	struct lruvec *lruvec;
> +
> +	do {
> +		lruvec = page_lruvec(page);
> +		if (likely(lruvec == locked_lruvec))
> +			return lruvec;
> +		spin_unlock(&locked_lruvec->lru_lock);
> +		spin_lock(&lruvec->lru_lock);
> +		locked_lruvec = lruvec;
> +	} while (1);
>  }
>  
>  static inline struct lruvec *relock_page_lruvec_irq(struct lruvec *lruvec,
>  						    struct page *page)
>  {
> -	struct zone *zone = page_zone(page);
> -
>  	if (!lruvec) {
> -		spin_lock_irq(&zone->lru_lock);
> -	} else if (zone != lruvec_zone(lruvec)) {
> -		unlock_lruvec_irq(lruvec);
> -		spin_lock_irq(&zone->lru_lock);
> +		local_irq_disable();
> +		lruvec = page_lruvec(page);
> +		spin_lock(&lruvec->lru_lock);
>  	}
> -	return page_lruvec(page);
> +	return __relock_page_lruvec(lruvec, page);
>  }
>  
>  static inline struct lruvec *relock_page_lruvec(struct lruvec *lruvec,
>  						struct page *page,
>  						unsigned long *flags)
>  {
> -	struct zone *zone = page_zone(page);
> -
>  	if (!lruvec) {
> -		spin_lock_irqsave(&zone->lru_lock, *flags);
> -	} else if (zone != lruvec_zone(lruvec)) {
> -		unlock_lruvec(lruvec, flags);
> -		spin_lock_irqsave(&zone->lru_lock, *flags);
> +		local_irq_save(*flags);
> +		lruvec = page_lruvec(page);
> +		spin_lock(&lruvec->lru_lock);
>  	}
> -	return page_lruvec(page);
> +	return __relock_page_lruvec(lruvec, page);
>  }
>  
>  /*
> @@ -87,22 +89,24 @@ static inline struct lruvec *relock_page_lruvec(struct lruvec *lruvec,
>  static inline bool __lock_page_lruvec_irq(struct lruvec **lruvec,
>  					  struct page *page)
>  {
> -	struct zone *zone;
>  	bool ret = false;
>  
> +	rcu_read_lock();
> +	/*
> +	 * If we see there PageLRU(), it means page has valid lruvec link.
> +	 * We need protect whole operation with single rcu-interval, otherwise
> +	 * lruvec which hold this LRU sign can run out before we secure it.
> +	 */
>  	if (PageLRU(page)) {
>  		if (!*lruvec) {
> -			zone = page_zone(page);
> -			spin_lock_irq(&zone->lru_lock);
> -		} else
> -			zone = lruvec_zone(*lruvec);
> -
> -		if (PageLRU(page)) {
>  			*lruvec = page_lruvec(page);
> +			lock_lruvec_irq(*lruvec);
> +		}
> +		*lruvec = __relock_page_lruvec(*lruvec, page);
> +		if (PageLRU(page))
>  			ret = true;
> -		} else
> -			*lruvec = &zone->lruvec;
>  	}
> +	rcu_read_unlock();
>  
>  	return ret;
>  }
> @@ -110,7 +114,7 @@ static inline bool __lock_page_lruvec_irq(struct lruvec **lruvec,
>  /* Wait for lruvec unlock before locking other lruvec for the same page */
>  static inline void __wait_lruvec_unlock(struct lruvec *lruvec)
>  {
> -	/* Currently only one lru_lock per-zone */
> +	wait_lruvec_unlock(lruvec);
>  }
>  
>  #else /* CONFIG_CGROUP_MEM_RES_CTLR */
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index ab42446..beadcc9 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4294,6 +4294,7 @@ void init_zone_lruvec(struct zone *zone, struct lruvec *lruvec)
>  	enum lru_list lru;
>  
>  	memset(lruvec, 0, sizeof(struct lruvec));
> +	spin_lock_init(&lruvec->lru_lock);
>  	for_each_lru(lru)
>  		INIT_LIST_HEAD(&lruvec->pages_lru[lru]);
>  #ifdef CONFIG_CGROUP_MEM_RES_CTLR
> @@ -4369,7 +4370,6 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
>  #endif
>  		zone->name = zone_names[j];
>  		spin_lock_init(&zone->lock);
> -		spin_lock_init(&zone->lru_lock);
>  		zone_seqlock_init(zone);
>  		zone->zone_pgdat = pgdat;
>  
> diff --git a/mm/swap.c b/mm/swap.c
> index 998c71c..8156181 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -700,7 +700,7 @@ void lru_add_page_tail(struct lruvec *lruvec,
>  	VM_BUG_ON(!PageHead(page));
>  	VM_BUG_ON(PageCompound(page_tail));
>  	VM_BUG_ON(PageLRU(page_tail));
> -	VM_BUG_ON(NR_CPUS != 1 && !spin_is_locked(&lruvec_zone(lruvec)->lru_lock));
> +	VM_BUG_ON(NR_CPUS != 1 && !spin_is_locked(&lruvec->lru_lock));
>  
>  	SetPageLRU(page_tail);
>  
> 
> 


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v3 00/21] mm: lru_lock splitting
  2012-02-23 13:51 [PATCH v3 00/21] mm: lru_lock splitting Konstantin Khlebnikov
                   ` (22 preceding siblings ...)
  2012-02-25  2:15 ` KAMEZAWA Hiroyuki
@ 2012-02-28  1:52 ` KAMEZAWA Hiroyuki
  2012-02-28  6:49   ` Konstantin Khlebnikov
  23 siblings, 1 reply; 65+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-02-28  1:52 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Hugh Dickins, linux-kernel, linux-mm, Johannes Weiner,
	Andrew Morton, Andi Kleen

On Thu, 23 Feb 2012 17:51:36 +0400
Konstantin Khlebnikov <khlebnikov@openvz.org> wrote:

> v3 changes:
> * inactive-ratio reworked again, now it always calculated from from scratch
> * hierarchical pte reference bits filter in memory-cgroup reclaimer
> * fixed two bugs in locking, found by Hugh Dickins
> * locking functions slightly simplified
> * new patch for isolated pages accounting
> * new patch with lru interleaving
> 
> This patchset is based on next-20120210
> 
> git: https://github.com/koct9i/linux/commits/lruvec-v3
> 
I'm sorry I can't find enough review time these days, but the whole series
seems good to me.


BTW, how about trying to merge patches 1/21 -> 13 or 14/21 first?
This series adds many changes to various places under /mm, so step-by-step
merging will be better, I think.
(I say this because I tend to split a long series of patches into
 small sets and merge them one by one to reduce my own maintenance cost.)

For the final lock splitting, performance numbers should be in the changelog.

Thanks,
-Kame

> ---
> 
> Konstantin Khlebnikov (21):
>       memcg: unify inactive_ratio calculation
>       memcg: make mm_match_cgroup() hirarchical
>       memcg: fix page_referencies cgroup filter on global reclaim
>       memcg: use vm_swappiness from target memory cgroup
>       mm: rename lruvec->lists into lruvec->pages_lru
>       mm: lruvec linking functions
>       mm: add lruvec->pages_count
>       mm: unify inactive_list_is_low()
>       mm: add lruvec->reclaim_stat
>       mm: kill struct mem_cgroup_zone
>       mm: move page-to-lruvec translation upper
>       mm: push lruvec into update_page_reclaim_stat()
>       mm: push lruvecs from pagevec_lru_move_fn() to iterator
>       mm: introduce lruvec locking primitives
>       mm: handle lruvec relocks on lumpy reclaim
>       mm: handle lruvec relocks in compaction
>       mm: handle lruvec relock in memory controller
>       mm: add to lruvec isolated pages counters
>       memcg: check lru vectors emptiness in pre-destroy
>       mm: split zone->lru_lock
>       mm: zone lru vectors interleaving
> 
> 
>  include/linux/huge_mm.h    |    3 
>  include/linux/memcontrol.h |   75 ------
>  include/linux/mm.h         |   66 +++++
>  include/linux/mm_inline.h  |   19 +-
>  include/linux/mmzone.h     |   39 ++-
>  include/linux/swap.h       |    6 
>  mm/Kconfig                 |   16 +
>  mm/compaction.c            |   31 +--
>  mm/huge_memory.c           |   14 +
>  mm/internal.h              |  204 +++++++++++++++++
>  mm/ksm.c                   |    2 
>  mm/memcontrol.c            |  343 +++++++++++-----------------
>  mm/migrate.c               |    2 
>  mm/page_alloc.c            |   70 +-----
>  mm/rmap.c                  |    2 
>  mm/swap.c                  |  217 ++++++++++--------
>  mm/vmscan.c                |  534 ++++++++++++++++++++++++--------------------
>  mm/vmstat.c                |    6 
>  18 files changed, 932 insertions(+), 717 deletions(-)
> 
> -- 
> Signature
> 
> 


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v3 05/21] mm: rename lruvec->lists into lruvec->pages_lru
  2012-02-28  0:20   ` KAMEZAWA Hiroyuki
@ 2012-02-28  6:04     ` Konstantin Khlebnikov
  0 siblings, 0 replies; 65+ messages in thread
From: Konstantin Khlebnikov @ 2012-02-28  6:04 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Hugh Dickins, linux-kernel, linux-mm, Johannes Weiner,
	Andrew Morton, Andi Kleen

KAMEZAWA Hiroyuki wrote:
> On Thu, 23 Feb 2012 17:52:00 +0400
> Konstantin Khlebnikov<khlebnikov@openvz.org>  wrote:
>
>> This is a much more unique and grep-friendly name.
>>
>> Signed-off-by: Konstantin Khlebnikov<khlebnikov@openvz.org>
>
> I worry that this kind of change can cause many hunks and make merging difficult,
> but this seems not very destructive.
>
> Reviewed-by: KAMEZAWA Hiroyuki<kamezawa.hiroyu@jp.fujitsu.com>
>
> I have no strong opinion on this naming. What do other mm developers think?
>
> I personally think that making this kind of change at the head of a patch set tends to
> make it difficult to merge the full series.

This rename makes it easy to spot lru-list manipulations,
because some of them are in non-trivial places.

>
> Thanks,
> -Kame
>
>> ---
>>   include/linux/mm_inline.h |    2 +-
>>   include/linux/mmzone.h    |    2 +-
>>   mm/memcontrol.c           |    6 +++---
>>   mm/page_alloc.c           |    2 +-
>>   mm/swap.c                 |    4 ++--
>>   mm/vmscan.c               |    6 +++---
>>   6 files changed, 11 insertions(+), 11 deletions(-)
>>
>> diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
>> index 227fd3e..8415596 100644
>> --- a/include/linux/mm_inline.h
>> +++ b/include/linux/mm_inline.h
>> @@ -27,7 +27,7 @@ add_page_to_lru_list(struct zone *zone, struct page *page, enum lru_list lru)
>>   	struct lruvec *lruvec;
>>
>>   	lruvec = mem_cgroup_lru_add_list(zone, page, lru);
>> -	list_add(&page->lru,&lruvec->lists[lru]);
>> +	list_add(&page->lru,&lruvec->pages_lru[lru]);
>>   	__mod_zone_page_state(zone, NR_LRU_BASE + lru, hpage_nr_pages(page));
>>   }
>>
>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>> index 3e1f7ff..ddd0fd2 100644
>> --- a/include/linux/mmzone.h
>> +++ b/include/linux/mmzone.h
>> @@ -160,7 +160,7 @@ static inline int is_unevictable_lru(enum lru_list lru)
>>   }
>>
>>   struct lruvec {
>> -	struct list_head lists[NR_LRU_LISTS];
>> +	struct list_head pages_lru[NR_LRU_LISTS];
>>   };
>>
>>   /* Mask used at gathering information at once (see memcontrol.c) */
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 77f5d48..8f8c7c4 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -1050,7 +1050,7 @@ struct lruvec *mem_cgroup_zone_lruvec(struct zone *zone,
>>    * the lruvec for the given @zone and the memcg @page is charged to.
>>    *
>>    * The callsite is then responsible for physically linking the page to
>> - * the returned lruvec->lists[@lru].
>> + * the returned lruvec->pages_lru[@lru].
>>    */
>>   struct lruvec *mem_cgroup_lru_add_list(struct zone *zone, struct page *page,
>>   				       enum lru_list lru)
>> @@ -3592,7 +3592,7 @@ static int mem_cgroup_force_empty_list(struct mem_cgroup *memcg,
>>
>>   	zone =&NODE_DATA(node)->node_zones[zid];
>>   	mz = mem_cgroup_zoneinfo(memcg, node, zid);
>> -	list =&mz->lruvec.lists[lru];
>> +	list =&mz->lruvec.pages_lru[lru];
>>
>>   	loop = mz->lru_size[lru];
>>   	/* give some margin against EBUSY etc...*/
>> @@ -4716,7 +4716,7 @@ static int alloc_mem_cgroup_per_zone_info(struct mem_cgroup *memcg, int node)
>>   	for (zone = 0; zone<  MAX_NR_ZONES; zone++) {
>>   		mz =&pn->zoneinfo[zone];
>>   		for_each_lru(lru)
>> -			INIT_LIST_HEAD(&mz->lruvec.lists[lru]);
>> +			INIT_LIST_HEAD(&mz->lruvec.pages_lru[lru]);
>>   		mz->usage_in_excess = 0;
>>   		mz->on_tree = false;
>>   		mz->memcg = memcg;
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 38f6744..5f19392 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -4363,7 +4363,7 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
>>
>>   		zone_pcp_init(zone);
>>   		for_each_lru(lru)
>> -			INIT_LIST_HEAD(&zone->lruvec.lists[lru]);
>> +			INIT_LIST_HEAD(&zone->lruvec.pages_lru[lru]);
>>   		zone->reclaim_stat.recent_rotated[0] = 0;
>>   		zone->reclaim_stat.recent_rotated[1] = 0;
>>   		zone->reclaim_stat.recent_scanned[0] = 0;
>> diff --git a/mm/swap.c b/mm/swap.c
>> index fff1ff7..17993c0 100644
>> --- a/mm/swap.c
>> +++ b/mm/swap.c
>> @@ -238,7 +238,7 @@ static void pagevec_move_tail_fn(struct page *page, void *arg)
>>
>>   		lruvec = mem_cgroup_lru_move_lists(page_zone(page),
>>   						   page, lru, lru);
>> -		list_move_tail(&page->lru,&lruvec->lists[lru]);
>> +		list_move_tail(&page->lru,&lruvec->pages_lru[lru]);
>>   		(*pgmoved)++;
>>   	}
>>   }
>> @@ -482,7 +482,7 @@ static void lru_deactivate_fn(struct page *page, void *arg)
>>   		 * We moves tha page into tail of inactive.
>>   		 */
>>   		lruvec = mem_cgroup_lru_move_lists(zone, page, lru, lru);
>> -		list_move_tail(&page->lru,&lruvec->lists[lru]);
>> +		list_move_tail(&page->lru,&lruvec->pages_lru[lru]);
>>   		__count_vm_event(PGROTATED);
>>   	}
>>
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index 8b59cb5..e41ad52 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -1164,7 +1164,7 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
>>   		lru += LRU_ACTIVE;
>>   	if (file)
>>   		lru += LRU_FILE;
>> -	src =&lruvec->lists[lru];
>> +	src =&lruvec->pages_lru[lru];
>>
>>   	for (scan = 0; scan<  nr_to_scan&&  !list_empty(src); scan++) {
>>   		struct page *page;
>> @@ -1663,7 +1663,7 @@ static void move_active_pages_to_lru(struct zone *zone,
>>   		SetPageLRU(page);
>>
>>   		lruvec = mem_cgroup_lru_add_list(zone, page, lru);
>> -		list_move(&page->lru,&lruvec->lists[lru]);
>> +		list_move(&page->lru,&lruvec->pages_lru[lru]);
>>   		pgmoved += hpage_nr_pages(page);
>>
>>   		if (put_page_testzero(page)) {
>> @@ -3592,7 +3592,7 @@ void check_move_unevictable_pages(struct page **pages, int nr_pages)
>>   			__dec_zone_state(zone, NR_UNEVICTABLE);
>>   			lruvec = mem_cgroup_lru_move_lists(zone, page,
>>   						LRU_UNEVICTABLE, lru);
>> -			list_move(&page->lru,&lruvec->lists[lru]);
>> +			list_move(&page->lru,&lruvec->pages_lru[lru]);
>>   			__inc_zone_state(zone, NR_INACTIVE_ANON + lru);
>>   			pgrescued++;
>>   		}
>>
>>
>


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v3 06/21] mm: lruvec linking functions
  2012-02-28  0:27   ` KAMEZAWA Hiroyuki
@ 2012-02-28  6:09     ` Konstantin Khlebnikov
  0 siblings, 0 replies; 65+ messages in thread
From: Konstantin Khlebnikov @ 2012-02-28  6:09 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Hugh Dickins, linux-kernel, linux-mm, Johannes Weiner,
	Andrew Morton, Andi Kleen

KAMEZAWA Hiroyuki wrote:
> On Thu, 23 Feb 2012 17:52:04 +0400
> Konstantin Khlebnikov<khlebnikov@openvz.org>  wrote:
>
>> This patch adds links from a page to its lruvec and from a lruvec to its zone and node.
>> If CONFIG_CGROUP_MEM_RES_CTLR=n these are just page_zone() and container_of().
>>
>> Signed-off-by: Konstantin Khlebnikov<khlebnikov@openvz.org>
>
> Small comments below.
>
>> ---
>>   include/linux/mm.h     |   37 +++++++++++++++++++++++++++++++++++++
>>   include/linux/mmzone.h |   12 ++++++++----
>>   mm/internal.h          |    1 +
>>   mm/memcontrol.c        |   27 ++++++++++++++++++++++++---
>>   mm/page_alloc.c        |   17 ++++++++++++++---
>>   5 files changed, 84 insertions(+), 10 deletions(-)
>>
>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index ee3ebc1..c6dc4ab 100644
>> --- a/include/linux/mm.h
>> +++ b/include/linux/mm.h
>> @@ -728,6 +728,43 @@ static inline void set_page_links(struct page *page, enum zone_type zone,
>>   #endif
>>   }
>>
>> +#ifdef CONFIG_CGROUP_MEM_RES_CTLR
>> +
>> +/* Multiple lruvecs in zone */
>> +
>> +extern struct lruvec *page_lruvec(struct page *page);
>> +
>> +static inline struct zone *lruvec_zone(struct lruvec *lruvec)
>> +{
>> +	return lruvec->zone;
>> +}
>> +
>> +static inline struct pglist_data *lruvec_node(struct lruvec *lruvec)
>> +{
>> +	return lruvec->node;
>> +}
>> +
>> +#else /* CONFIG_CGROUP_MEM_RES_CTLR */
>> +
>> +/* Single lruvec in zone */
>> +
>> +static inline struct lruvec *page_lruvec(struct page *page)
>> +{
>> +	return&page_zone(page)->lruvec;
>> +}
>> +
>> +static inline struct zone *lruvec_zone(struct lruvec *lruvec)
>> +{
>> +	return container_of(lruvec, struct zone, lruvec);
>> +}
>> +
>> +static inline struct pglist_data *lruvec_node(struct lruvec *lruvec)
>> +{
>> +	return lruvec_zone(lruvec)->zone_pgdat;
>> +}
>> +
>> +#endif /* CONFIG_CGROUP_MEM_RES_CTLR */
>> +
>>   /*
>>    * Some inline functions in vmstat.h depend on page_zone()
>>    */
>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>> index ddd0fd2..be8873a 100644
>> --- a/include/linux/mmzone.h
>> +++ b/include/linux/mmzone.h
>> @@ -159,10 +159,6 @@ static inline int is_unevictable_lru(enum lru_list lru)
>>   	return (lru == LRU_UNEVICTABLE);
>>   }
>>
>> -struct lruvec {
>> -	struct list_head pages_lru[NR_LRU_LISTS];
>> -};
>> -
>>   /* Mask used at gathering information at once (see memcontrol.c) */
>>   #define LRU_ALL_FILE (BIT(LRU_INACTIVE_FILE) | BIT(LRU_ACTIVE_FILE))
>>   #define LRU_ALL_ANON (BIT(LRU_INACTIVE_ANON) | BIT(LRU_ACTIVE_ANON))
>> @@ -300,6 +296,14 @@ struct zone_reclaim_stat {
>>   	unsigned long		recent_scanned[2];
>>   };
>>
>> +struct lruvec {
>> +	struct list_head	pages_lru[NR_LRU_LISTS];
>> +#ifdef CONFIG_CGROUP_MEM_RES_CTLR
>> +	struct zone		*zone;
>> +	struct pglist_data	*node;
>> +#endif
>
> I don't think this #ifdef is very good... it adds more #ifdefs in other headers.
> How bad would it be if we removed this #ifdef and always used ->zone and ->pgdat in
> lruvec_zone() and lruvec_node()?

That would add one extra dereference in lruvec_zone() when memcg is disabled in the config.
We can remove the #ifdef from the declaration and initialization, but keep the optimized variant of lruvec_zone().
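
For illustration, a rough sketch of that direction (only a sketch, not taken from the
posted patches; the exact field layout is an assumption):

struct lruvec {
	struct list_head	pages_lru[NR_LRU_LISTS];
	/* always declared and initialized, no #ifdef here */
	struct zone		*zone;
	struct pglist_data	*node;
};

#ifdef CONFIG_CGROUP_MEM_RES_CTLR
static inline struct zone *lruvec_zone(struct lruvec *lruvec)
{
	return lruvec->zone;		/* one extra pointer load */
}
#else
static inline struct zone *lruvec_zone(struct lruvec *lruvec)
{
	/* single lruvec embedded in struct zone: pure pointer arithmetic */
	return container_of(lruvec, struct zone, lruvec);
}
#endif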

>
> There may be concerns about fitting the lruvec et al. into a cache line... but this set will add
> a (big) hash here later.
>
> I'm sorry if you were asked to add this #ifdef in v1 or v2.
>
> Thanks,
> -Kame
>


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v3 07/21] mm: add lruvec->pages_count
  2012-02-28  0:35   ` KAMEZAWA Hiroyuki
@ 2012-02-28  6:16     ` Konstantin Khlebnikov
  0 siblings, 0 replies; 65+ messages in thread
From: Konstantin Khlebnikov @ 2012-02-28  6:16 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Hugh Dickins, linux-kernel, linux-mm, Johannes Weiner,
	Andrew Morton, Andi Kleen

KAMEZAWA Hiroyuki wrote:
> On Thu, 23 Feb 2012 17:52:08 +0400
> Konstantin Khlebnikov<khlebnikov@openvz.org>  wrote:
>
>> Move lru pages counter from mem_cgroup_per_zone->count[] to lruvec->pages_count[]
>>
>> Account pages in all lruvecs, including the root one;
>> this isn't a huge overhead, but it greatly simplifies all the code.
>>
>> Redundant page_lruvec() calls will be optimized in further patches.
>>
>> Signed-off-by: Konstantin Khlebnikov<khlebnikov@openvz.org>
>
> Hmm, I like this, but... a question below.
>
>> ---
>>   include/linux/memcontrol.h |   29 --------------
>>   include/linux/mm_inline.h  |   15 +++++--
>>   include/linux/mmzone.h     |    1
>>   mm/memcontrol.c            |   93 +-------------------------------------------
>>   mm/swap.c                  |    7 +--
>>   mm/vmscan.c                |   25 +++++++++---
>>   6 files changed, 34 insertions(+), 136 deletions(-)
>>
>> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
>> index 4822d53..b9d555b 100644
>> --- a/include/linux/memcontrol.h
>> +++ b/include/linux/memcontrol.h
>> @@ -63,12 +63,6 @@ extern int mem_cgroup_cache_charge(struct page *page, struct mm_struct *mm,
>>   					gfp_t gfp_mask);
>>
>>   struct lruvec *mem_cgroup_zone_lruvec(struct zone *, struct mem_cgroup *);
>> -struct lruvec *mem_cgroup_lru_add_list(struct zone *, struct page *,
>> -				       enum lru_list);
>> -void mem_cgroup_lru_del_list(struct page *, enum lru_list);
>> -void mem_cgroup_lru_del(struct page *);
>> -struct lruvec *mem_cgroup_lru_move_lists(struct zone *, struct page *,
>> -					 enum lru_list, enum lru_list);
>>
>>   /* For coalescing uncharge for reducing memcg' overhead*/
>>   extern void mem_cgroup_uncharge_start(void);
>> @@ -212,29 +206,6 @@ static inline struct lruvec *mem_cgroup_zone_lruvec(struct zone *zone,
>>   	return&zone->lruvec;
>>   }
>>
>> -static inline struct lruvec *mem_cgroup_lru_add_list(struct zone *zone,
>> -						     struct page *page,
>> -						     enum lru_list lru)
>> -{
>> -	return&zone->lruvec;
>> -}
>> -
>> -static inline void mem_cgroup_lru_del_list(struct page *page, enum lru_list lru)
>> -{
>> -}
>> -
>> -static inline void mem_cgroup_lru_del(struct page *page)
>> -{
>> -}
>> -
>> -static inline struct lruvec *mem_cgroup_lru_move_lists(struct zone *zone,
>> -						       struct page *page,
>> -						       enum lru_list from,
>> -						       enum lru_list to)
>> -{
>> -	return&zone->lruvec;
>> -}
>> -
>>   static inline struct mem_cgroup *try_get_mem_cgroup_from_page(struct page *page)
>>   {
>>   	return NULL;
>> diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
>> index 8415596..daa3d15 100644
>> --- a/include/linux/mm_inline.h
>> +++ b/include/linux/mm_inline.h
>> @@ -24,19 +24,24 @@ static inline int page_is_file_cache(struct page *page)
>>   static inline void
>>   add_page_to_lru_list(struct zone *zone, struct page *page, enum lru_list lru)
>>   {
>> -	struct lruvec *lruvec;
>> +	struct lruvec *lruvec = page_lruvec(page);
>> +	int numpages = hpage_nr_pages(page);
>>
>> -	lruvec = mem_cgroup_lru_add_list(zone, page, lru);
>>   	list_add(&page->lru,&lruvec->pages_lru[lru]);
>> -	__mod_zone_page_state(zone, NR_LRU_BASE + lru, hpage_nr_pages(page));
>> +	lruvec->pages_count[lru] += numpages;
>> +	__mod_zone_page_state(zone, NR_LRU_BASE + lru, numpages);
>>   }
>>
>>   static inline void
>>   del_page_from_lru_list(struct zone *zone, struct page *page, enum lru_list lru)
>>   {
>> -	mem_cgroup_lru_del_list(page, lru);
>> +	struct lruvec *lruvec = page_lruvec(page);
>> +	int numpages = hpage_nr_pages(page);
>> +
>>   	list_del(&page->lru);
>> -	__mod_zone_page_state(zone, NR_LRU_BASE + lru, -hpage_nr_pages(page));
>> +	lruvec->pages_count[lru] -= numpages;
>> +	VM_BUG_ON((long)lruvec->pages_count[lru]<  0);
>> +	__mod_zone_page_state(zone, NR_LRU_BASE + lru, -numpages);
>>   }
>>
>>   /**
>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>> index be8873a..69b0f31 100644
>> --- a/include/linux/mmzone.h
>> +++ b/include/linux/mmzone.h
>> @@ -298,6 +298,7 @@ struct zone_reclaim_stat {
>>
>>   struct lruvec {
>>   	struct list_head	pages_lru[NR_LRU_LISTS];
>> +	unsigned long		pages_count[NR_LRU_LISTS];
>
> This time you don't put these fields under #ifdef... why?

It would make the code much uglier and would not speed it up at all.

>
> How do you handle the duplication of "the number of pages in the LRU" between zone->vm_stat and this?

I don't think this is too bad: vmstat usually has per-cpu drift, while these numbers will be exact.

>
> Thanks,
> -Kame
>


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v3 14/21] mm: introduce lruvec locking primitives
  2012-02-28  0:56   ` KAMEZAWA Hiroyuki
@ 2012-02-28  6:23     ` Konstantin Khlebnikov
  0 siblings, 0 replies; 65+ messages in thread
From: Konstantin Khlebnikov @ 2012-02-28  6:23 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Hugh Dickins, linux-kernel, linux-mm, Johannes Weiner,
	Andrew Morton, Andi Kleen

KAMEZAWA Hiroyuki wrote:
> On Thu, 23 Feb 2012 17:52:47 +0400
> Konstantin Khlebnikov<khlebnikov@openvz.org>  wrote:
>
>> This is initial preparation for lru_lock splitting.
>>
>> These locking primitives are designed to hide the split nature of lru_lock
>> and to avoid overhead for the non-split lru_lock in the non-memcg case.
>>
>> * Lock via lruvec reference
>>
>> lock_lruvec(lruvec, flags)
>> lock_lruvec_irq(lruvec)
>>
>> * Lock via page reference
>>
>> lock_page_lruvec(page, flags)
>> lock_page_lruvec_irq(page)
>> relock_page_lruvec(lruvec, page, flags)
>> relock_page_lruvec_irq(lruvec, page)
>> __relock_page_lruvec(lruvec, page) ( lruvec != NULL, page in same zone )
>>
>> They always return a pointer to some locked lruvec; the page can still be off the LRU,
>> but the PageLRU() flag is stable while we hold the returned lruvec lock.
>> The caller must guarantee the validity of the page-to-lruvec reference.
>>
>> * Lock via page, without stable page reference
>>
>> __lock_page_lruvec_irq(&lruvec, page)
>>
>> It returns true if the lruvec was successfully locked and PageLRU is set.
>> The initial lruvec can be NULL. Subsequent calls must be in the same zone.
>>
>> * Unlock
>>
>> unlock_lruvec(lruvec, flags)
>> unlock_lruvec_irq(lruvec)
>>
>> * Wait
>>
>> wait_lruvec_unlock(lruvec)
>> Wait for the lruvec to be unlocked; the caller must have a stable reference to the lruvec.
>>
>> __wait_lruvec_unlock(lruvec)
>> Wait for the lruvec to be unlocked before locking another lru_lock for the same page;
>> a no-op if there is only one possible lruvec per page.
>> Used when switching the page-to-lruvec reference to stabilize the PageLRU flag.
>>
>> Signed-off-by: Konstantin Khlebnikov<khlebnikov@openvz.org>
>
> O.K. I like this.
>
> Acked-by: KAMEZAWA Hiroyuki<kamezawa.hiroyu@jp.fujitsu.com>
>
> Hmm... could you add a comment to the memcg part? (see below)
>
>
>
>> ---
>>   mm/huge_memory.c |    8 +-
>>   mm/internal.h    |  176 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>   mm/memcontrol.c  |   14 ++--
>>   mm/swap.c        |   58 ++++++------------
>>   mm/vmscan.c      |   77 ++++++++++--------------
>>   5 files changed, 237 insertions(+), 96 deletions(-)
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 09e7069..74996b8 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -1228,13 +1228,11 @@ static int __split_huge_page_splitting(struct page *page,
>>   static void __split_huge_page_refcount(struct page *page)
>>   {
>>   	int i;
>> -	struct zone *zone = page_zone(page);
>>   	struct lruvec *lruvec;
>>   	int tail_count = 0;
>>
>>   	/* prevent PageLRU to go away from under us, and freeze lru stats */
>> -	spin_lock_irq(&zone->lru_lock);
>> -	lruvec = page_lruvec(page);
>> +	lruvec = lock_page_lruvec_irq(page);
>>   	compound_lock(page);
>>   	/* complete memcg works before add pages to LRU */
>>   	mem_cgroup_split_huge_fixup(page);
>> @@ -1316,11 +1314,11 @@ static void __split_huge_page_refcount(struct page *page)
>>   	BUG_ON(atomic_read(&page->_count)<= 0);
>>
>>   	__dec_zone_page_state(page, NR_ANON_TRANSPARENT_HUGEPAGES);
>> -	__mod_zone_page_state(zone, NR_ANON_PAGES, HPAGE_PMD_NR);
>> +	__mod_zone_page_state(lruvec_zone(lruvec), NR_ANON_PAGES, HPAGE_PMD_NR);
>>
>>   	ClearPageCompound(page);
>>   	compound_unlock(page);
>> -	spin_unlock_irq(&zone->lru_lock);
>> +	unlock_lruvec_irq(lruvec);
>>
>>   	for (i = 1; i<  HPAGE_PMD_NR; i++) {
>>   		struct page *page_tail = page + i;
>> diff --git a/mm/internal.h b/mm/internal.h
>> index ef49dbf..9454752 100644
>> --- a/mm/internal.h
>> +++ b/mm/internal.h
>> @@ -13,6 +13,182 @@
>>
>>   #include<linux/mm.h>
>>
>> +static inline void lock_lruvec(struct lruvec *lruvec, unsigned long *flags)
>> +{
>> +	spin_lock_irqsave(&lruvec_zone(lruvec)->lru_lock, *flags);
>> +}
>> +
>> +static inline void lock_lruvec_irq(struct lruvec *lruvec)
>> +{
>> +	spin_lock_irq(&lruvec_zone(lruvec)->lru_lock);
>> +}
>> +
>> +static inline void unlock_lruvec(struct lruvec *lruvec, unsigned long *flags)
>> +{
>> +	spin_unlock_irqrestore(&lruvec_zone(lruvec)->lru_lock, *flags);
>> +}
>> +
>> +static inline void unlock_lruvec_irq(struct lruvec *lruvec)
>> +{
>> +	spin_unlock_irq(&lruvec_zone(lruvec)->lru_lock);
>> +}
>> +
>> +static inline void wait_lruvec_unlock(struct lruvec *lruvec)
>> +{
>> +	spin_unlock_wait(&lruvec_zone(lruvec)->lru_lock);
>> +}
>> +
>> +#ifdef CONFIG_CGROUP_MEM_RES_CTLR
>> +
>> +/* Dynamic page to lruvec mapping */
>> +
>> +/* Lock other lruvec for other page in the same zone */
>> +static inline struct lruvec *__relock_page_lruvec(struct lruvec *locked_lruvec,
>> +						  struct page *page)
>> +{
>> +	/* Currenyly only one lru_lock per-zone */
>> +	return page_lruvec(page);
>> +}
>> +
>> +static inline struct lruvec *relock_page_lruvec_irq(struct lruvec *lruvec,
>> +						    struct page *page)
>> +{
>> +	struct zone *zone = page_zone(page);
>> +
>> +	if (!lruvec) {
>> +		spin_lock_irq(&zone->lru_lock);
>> +	} else if (zone != lruvec_zone(lruvec)) {
>> +		unlock_lruvec_irq(lruvec);
>> +		spin_lock_irq(&zone->lru_lock);
>> +	}
>> +	return page_lruvec(page);
>> +}
>
> Could you add comments/cautions for the callers?
>
>   - What about the !PageLRU(page) case?
>   - Can the caller assume page_lruvec(page) == lruvec? If not, which lruvec is locked?

Yes, the caller can assume page_lruvec(page) == lruvec. And PageLRU() is stable,
meaning it stays either true or false while this lruvec is locked.
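
For example, a caller walking a pagevec would look roughly like this (a sketch of the
intended calling convention, not code from the series):

struct lruvec *lruvec = NULL;
int i;

for (i = 0; i < pagevec_count(pvec); i++) {
	struct page *page = pvec->pages[i];

	/* returns the locked lruvec of @page, dropping the old lock if needed */
	lruvec = relock_page_lruvec_irq(lruvec, page);

	if (PageLRU(page)) {
		/* here page_lruvec(page) == lruvec and PageLRU() is stable */
	}
}
if (lruvec)
	unlock_lruvec_irq(lruvec);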

>
> etc...
>
>
>> +
>> +static inline struct lruvec *relock_page_lruvec(struct lruvec *lruvec,
>> +						struct page *page,
>> +						unsigned long *flags)
>> +{
>> +	struct zone *zone = page_zone(page);
>> +
>> +	if (!lruvec) {
>> +		spin_lock_irqsave(&zone->lru_lock, *flags);
>> +	} else if (zone != lruvec_zone(lruvec)) {
>> +		unlock_lruvec(lruvec, flags);
>> +		spin_lock_irqsave(&zone->lru_lock, *flags);
>> +	}
>> +	return page_lruvec(page);
>> +}
>
>
> Same here.
>
> Thanks,
> -Kame
>


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v3 15/21] mm: handle lruvec relocks on lumpy reclaim
  2012-02-28  1:01   ` KAMEZAWA Hiroyuki
@ 2012-02-28  6:25     ` Konstantin Khlebnikov
  0 siblings, 0 replies; 65+ messages in thread
From: Konstantin Khlebnikov @ 2012-02-28  6:25 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Hugh Dickins, linux-kernel, linux-mm, Johannes Weiner,
	Andrew Morton, Andi Kleen

KAMEZAWA Hiroyuki wrote:
> On Thu, 23 Feb 2012 17:52:52 +0400
> Konstantin Khlebnikov<khlebnikov@openvz.org>  wrote:
>
>> Prepare for lock splitting in the lumpy reclaim logic.
>> Now move_active_pages_to_lru() and putback_inactive_pages()
>> can put pages into different lruvecs.
>>
>> * relock book before SetPageLRU()
>
> lruvec ?

yeah, this came from v1

>
>> * update reclaim_stat pointer after relocks
>> * return currently locked lruvec
>>
>> Signed-off-by: Konstantin Khlebnikov<khlebnikov@openvz.org>
>> ---
>>   mm/vmscan.c |   45 +++++++++++++++++++++++++++++++++------------
>>   1 files changed, 33 insertions(+), 12 deletions(-)
>>
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index a3941d1..6eeeb4b 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -1114,6 +1114,7 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
>>   		unsigned long *nr_scanned, struct scan_control *sc,
>>   		isolate_mode_t mode, int active, int file)
>>   {
>> +	struct lruvec *cursor_lruvec = lruvec;
>>   	struct list_head *src;
>>   	unsigned long nr_taken = 0;
>>   	unsigned long nr_lumpy_taken = 0;
>> @@ -1197,14 +1198,17 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
>>   			    !PageSwapCache(cursor_page))
>>   				break;
>>
>> +			/* Switch cursor_lruvec lock for lumpy isolate */
>> +			if (!__lock_page_lruvec_irq(&cursor_lruvec,
>> +						    cursor_page))
>> +				continue;
>> +
>>   			if (__isolate_lru_page(cursor_page, mode, file) == 0) {
>>   				unsigned int isolated_pages;
>> -				struct lruvec *cursor_lruvec;
>>   				int cursor_lru = page_lru(cursor_page);
>>
>>   				list_move(&cursor_page->lru, dst);
>>   				isolated_pages = hpage_nr_pages(cursor_page);
>> -				cursor_lruvec = page_lruvec(cursor_page);
>>   				cursor_lruvec->pages_count[cursor_lru] -=
>>   								isolated_pages;
>>   				VM_BUG_ON((long)cursor_lruvec->
>> @@ -1235,6 +1239,9 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
>>   			}
>>   		}
>>
>> +		/* Restore original lruvec lock */
>> +		cursor_lruvec = __relock_page_lruvec(cursor_lruvec, page);
>> +
>>   		/* If we break out of the loop above, lumpy reclaim failed */
>>   		if (pfn<  end_pfn)
>>   			nr_lumpy_failed++;
>> @@ -1325,7 +1332,10 @@ static int too_many_isolated(struct zone *zone, int file,
>>   	return isolated>  inactive;
>>   }
>>
>> -static noinline_for_stack void
>> +/*
>> + * Returns currently locked lruvec
>> + */
>> +static noinline_for_stack struct lruvec *
>>   putback_inactive_pages(struct lruvec *lruvec,
>>   		       struct list_head *page_list)
>>   {
>> @@ -1347,10 +1357,13 @@ putback_inactive_pages(struct lruvec *lruvec,
>>   			lock_lruvec_irq(lruvec);
>>   			continue;
>>   		}
>> +
>> +		lruvec = __relock_page_lruvec(lruvec, page);
>> +		reclaim_stat =&lruvec->reclaim_stat;
>> +
>>   		SetPageLRU(page);
>>   		lru = page_lru(page);
>>
>> -		lruvec = page_lruvec(page);
>>   		add_page_to_lru_list(lruvec, page, lru);
>>   		if (is_active_lru(lru)) {
>>   			int file = is_file_lru(lru);
>> @@ -1375,6 +1388,8 @@ putback_inactive_pages(struct lruvec *lruvec,
>>   	 * To save our caller's stack, now use input list for pages to free.
>>   	 */
>>   	list_splice(&pages_to_free, page_list);
>> +
>> +	return lruvec;
>>   }
>>
>>   static noinline_for_stack void
>> @@ -1544,7 +1559,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
>>   		__count_vm_events(KSWAPD_STEAL, nr_reclaimed);
>>   	__count_zone_vm_events(PGSTEAL, zone, nr_reclaimed);
>>
>> -	putback_inactive_pages(lruvec,&page_list);
>> +	lruvec = putback_inactive_pages(lruvec,&page_list);
>>
>>   	__mod_zone_page_state(zone, NR_ISOLATED_ANON, -nr_anon);
>>   	__mod_zone_page_state(zone, NR_ISOLATED_FILE, -nr_file);
>> @@ -1603,12 +1618,15 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
>>    *
>>    * The downside is that we have to touch page->_count against each page.
>>    * But we had to alter page->flags anyway.
>> + *
>> + * Returns currently locked lruvec
>>    */
>>
>> -static void move_active_pages_to_lru(struct lruvec *lruvec,
>> -				     struct list_head *list,
>> -				     struct list_head *pages_to_free,
>> -				     enum lru_list lru)
>> +static struct lruvec *
>> +move_active_pages_to_lru(struct lruvec *lruvec,
>> +			 struct list_head *list,
>> +			 struct list_head *pages_to_free,
>> +			 enum lru_list lru)
>>   {
>>   	unsigned long pgmoved = 0;
>>   	struct page *page;
>> @@ -1630,10 +1648,11 @@ static void move_active_pages_to_lru(struct lruvec *lruvec,
>>
>>   		page = lru_to_page(list);
>>
>> +		lruvec = __relock_page_lruvec(lruvec, page);
>> +
>>   		VM_BUG_ON(PageLRU(page));
>>   		SetPageLRU(page);
>>
>> -		lruvec = page_lruvec(page);
>>   		list_move(&page->lru,&lruvec->pages_lru[lru]);
>>   		numpages = hpage_nr_pages(page);
>>   		lruvec->pages_count[lru] += numpages;
>> @@ -1655,6 +1674,8 @@ static void move_active_pages_to_lru(struct lruvec *lruvec,
>>   	__mod_zone_page_state(lruvec_zone(lruvec), NR_LRU_BASE + lru, pgmoved);
>>   	if (!is_active_lru(lru))
>>   		__count_vm_events(PGDEACTIVATE, pgmoved);
>> +
>> +	return lruvec;
>>   }
>>
>>   static void shrink_active_list(unsigned long nr_to_scan,
>> @@ -1744,9 +1765,9 @@ static void shrink_active_list(unsigned long nr_to_scan,
>>   	 */
>>   	reclaim_stat->recent_rotated[file] += nr_rotated;
>>
>> -	move_active_pages_to_lru(lruvec,&l_active,&l_hold,
>> +	lruvec = move_active_pages_to_lru(lruvec,&l_active,&l_hold,
>>   						LRU_ACTIVE + file * LRU_FILE);
>> -	move_active_pages_to_lru(lruvec,&l_inactive,&l_hold,
>> +	lruvec = move_active_pages_to_lru(lruvec,&l_inactive,&l_hold,
>>   						LRU_BASE   + file * LRU_FILE);
>>   	__mod_zone_page_state(zone, NR_ISOLATED_ANON + file, -nr_taken);
>>   	unlock_lruvec_irq(lruvec);
>>
>
> Hmm... could you add a comment to each function, something like:
> "The caller should _lock_ the lruvec before calling this function.
>   This function returns a _locked_ lruvec, which may be different from the one passed in.
>   And the caller should unlock that lruvec."

OK. Documentation is not my strongest side =)
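
Something like this above putback_inactive_pages() and move_active_pages_to_lru(), for
example (the wording is only a suggestion):

/*
 * Must be called with @lruvec locked.  Returns a locked lruvec, which may
 * differ from the one passed in; the caller is responsible for unlocking
 * the returned lruvec.
 */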

>
>
> Thanks,
> -Kame
>
>
>
>


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v3 16/21] mm: handle lruvec relocks in compaction
  2012-02-28  1:13   ` KAMEZAWA Hiroyuki
@ 2012-02-28  6:31     ` Konstantin Khlebnikov
  0 siblings, 0 replies; 65+ messages in thread
From: Konstantin Khlebnikov @ 2012-02-28  6:31 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Hugh Dickins, linux-kernel, linux-mm, Johannes Weiner,
	Andrew Morton, Andi Kleen

KAMEZAWA Hiroyuki wrote:
> On Thu, 23 Feb 2012 17:52:56 +0400
> Konstantin Khlebnikov<khlebnikov@openvz.org>  wrote:
>
>> Prepare for lru_lock splitting in memory compaction code.
>>
>> * disable IRQs in acct_isolated() for __mod_zone_page_state();
>>    lru_lock isn't required there.
>>
>> Signed-off-by: Konstantin Khlebnikov<khlebnikov@openvz.org>
>> ---
>>   mm/compaction.c |   30 ++++++++++++++++--------------
>>   1 files changed, 16 insertions(+), 14 deletions(-)
>>
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index a976b28..54340e4 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -224,8 +224,10 @@ static void acct_isolated(struct zone *zone, struct compact_control *cc)
>>   	list_for_each_entry(page,&cc->migratepages, lru)
>>   		count[!!page_is_file_cache(page)]++;
>>
>> +	local_irq_disable();
>>   	__mod_zone_page_state(zone, NR_ISOLATED_ANON, count[0]);
>>   	__mod_zone_page_state(zone, NR_ISOLATED_FILE, count[1]);
>> +	local_irq_enable();
>
> Why do we need to disable IRQs here?

__mod_zone_page_state() wants this to protect its per-cpu counters; maybe preempt_disable() would be enough.
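
For comparison, the two variants discussed here would look like this (whether
preemption-off alone is safe depends on whether these counters can also be updated from
interrupt context):

/* variant from the patch: interrupts off around the per-cpu updates */
local_irq_disable();
__mod_zone_page_state(zone, NR_ISOLATED_ANON, count[0]);
__mod_zone_page_state(zone, NR_ISOLATED_FILE, count[1]);
local_irq_enable();

/* possibly sufficient lighter variant */
preempt_disable();
__mod_zone_page_state(zone, NR_ISOLATED_ANON, count[0]);
__mod_zone_page_state(zone, NR_ISOLATED_FILE, count[1]);
preempt_enable();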

>
>
>
>>   }
>>
>>   /* Similar to reclaim, but different enough that they don't share logic */
>> @@ -262,7 +264,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
>>   	unsigned long nr_scanned = 0, nr_isolated = 0;
>>   	struct list_head *migratelist =&cc->migratepages;
>>   	isolate_mode_t mode = ISOLATE_ACTIVE|ISOLATE_INACTIVE;
>> -	struct lruvec *lruvec;
>> +	struct lruvec *lruvec = NULL;
>>
>>   	/* Do not scan outside zone boundaries */
>>   	low_pfn = max(cc->migrate_pfn, zone->zone_start_pfn);
>> @@ -294,25 +296,24 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
>>
>>   	/* Time to isolate some pages for migration */
>>   	cond_resched();
>> -	spin_lock_irq(&zone->lru_lock);
>>   	for (; low_pfn<  end_pfn; low_pfn++) {
>>   		struct page *page;
>> -		bool locked = true;
>>
>>   		/* give a chance to irqs before checking need_resched() */
>>   		if (!((low_pfn+1) % SWAP_CLUSTER_MAX)) {
>> -			spin_unlock_irq(&zone->lru_lock);
>> -			locked = false;
>> +			if (lruvec)
>> +				unlock_lruvec_irq(lruvec);
>> +			lruvec = NULL;
>>   		}
>> -		if (need_resched() || spin_is_contended(&zone->lru_lock)) {
>> -			if (locked)
>> -				spin_unlock_irq(&zone->lru_lock);
>> +		if (need_resched() ||
>> +		    (lruvec&&  spin_is_contended(&zone->lru_lock))) {
>> +			if (lruvec)
>> +				unlock_lruvec_irq(lruvec);
>> +			lruvec = NULL;
>>   			cond_resched();
>> -			spin_lock_irq(&zone->lru_lock);
>>   			if (fatal_signal_pending(current))
>>   				break;
>> -		} else if (!locked)
>> -			spin_lock_irq(&zone->lru_lock);
>> +		}
>>
>>   		/*
>>   		 * migrate_pfn does not necessarily start aligned to a
>> @@ -359,7 +360,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
>>   			continue;
>>   		}
>>
>> -		if (!PageLRU(page))
>> +		if (!__lock_page_lruvec_irq(&lruvec, page))
>>   			continue;
>
> Could you add more comments to __lock_page_lruvec_irq()?

Actually, there is a very unlikely race with page free/realloc
(which is fixed in Hugh's patchset, and surprisingly was already fixed in my old memory controller),
so this part will be redesigned.

>
> Thanks,
> -Kame
>


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v3 02/21] memcg: make mm_match_cgroup() hirarchical
  2012-02-28  0:11   ` KAMEZAWA Hiroyuki
@ 2012-02-28  6:31     ` Konstantin Khlebnikov
  0 siblings, 0 replies; 65+ messages in thread
From: Konstantin Khlebnikov @ 2012-02-28  6:31 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Hugh Dickins, linux-kernel, linux-mm, Johannes Weiner,
	Andrew Morton, Andi Kleen

KAMEZAWA Hiroyuki wrote:
> On Thu, 23 Feb 2012 17:51:46 +0400
> Konstantin Khlebnikov<khlebnikov@openvz.org>  wrote:
>
>> Check mm-owner cgroup membership hierarchically.
>>
>> Signed-off-by: Konstantin Khlebnikov<khlebnikov@openvz.org>
>
>
> Ack. but ... see below.
>
>> ---
>>   include/linux/memcontrol.h |   11 ++---------
>>   mm/memcontrol.c            |   20 ++++++++++++++++++++
>>   2 files changed, 22 insertions(+), 9 deletions(-)
>>
>> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
>> index 8c4d74f..4822d53 100644
>> --- a/include/linux/memcontrol.h
>> +++ b/include/linux/memcontrol.h
>> @@ -87,15 +87,8 @@ extern struct mem_cgroup *try_get_mem_cgroup_from_mm(struct mm_struct *mm);
>>   extern struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *memcg);
>>   extern struct mem_cgroup *mem_cgroup_from_cont(struct cgroup *cont);
>>
>> -static inline
>> -int mm_match_cgroup(const struct mm_struct *mm, const struct mem_cgroup *cgroup)
>> -{
>> -	struct mem_cgroup *memcg;
>> -	rcu_read_lock();
>> -	memcg = mem_cgroup_from_task(rcu_dereference((mm)->owner));
>> -	rcu_read_unlock();
>> -	return cgroup == memcg;
>> -}
>> +extern int mm_match_cgroup(const struct mm_struct *mm,
>> +			   const struct mem_cgroup *cgroup);
>>
>>   extern struct cgroup_subsys_state *mem_cgroup_css(struct mem_cgroup *memcg);
>>
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index b8039d2..77f5d48 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -821,6 +821,26 @@ struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p)
>>   				struct mem_cgroup, css);
>>   }
>>
>> +/**
>> + * mm_match_cgroup - cgroup hierarchy mm membership test
>> + * @mm		mm_struct to test
>> + * @cgroup	target cgroup
>> + *
>> + * Returns true if mm belong this cgroup or any its child in hierarchy
>
> belongs to ?
>
>> + */
>> +int mm_match_cgroup(const struct mm_struct *mm, const struct mem_cgroup *cgroup)
>> +{
>
> Please use "memcg" for representing "memory cgroup" (other function's arguments uses "memcg")
>
>> +	struct mem_cgroup *memcg;
>
> So, rename this to *cur_memcg or something similar.
>
>> +
>> +	rcu_read_lock();
>> +	memcg = mem_cgroup_from_task(rcu_dereference((mm)->owner));
>> +	while (memcg != cgroup&&  memcg&&  memcg->use_hierarchy)
>> +		memcg = parent_mem_cgroup(memcg);
>
> IIUC, parent_mem_cgroup() checks mem->res.parent. mem->res.parent is set only when
> parent->use_hierarchy == true. Then,
>
> 	while (memcg != cgroup)
> 		memcg = parent_mem_cgroup(memcg);
>
> will be enough.

This will become mem_cgroup_same_or_subtree(); see the reply from Johannes Weiner.
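
Roughly along these lines (only a sketch of the direction, not the final patch):

int mm_match_cgroup(const struct mm_struct *mm, const struct mem_cgroup *cgroup)
{
	struct mem_cgroup *memcg;
	int match;

	rcu_read_lock();
	memcg = mem_cgroup_from_task(rcu_dereference(mm->owner));
	match = memcg && mem_cgroup_same_or_subtree(cgroup, memcg);
	rcu_read_unlock();
	return match;
}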

>
> Thanks,
> -Kame
>


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v3 20/21] mm: split zone->lru_lock
  2012-02-28  1:49   ` KAMEZAWA Hiroyuki
@ 2012-02-28  6:39     ` Konstantin Khlebnikov
  0 siblings, 0 replies; 65+ messages in thread
From: Konstantin Khlebnikov @ 2012-02-28  6:39 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Hugh Dickins, linux-kernel, linux-mm, Johannes Weiner,
	Andrew Morton, Andi Kleen

KAMEZAWA Hiroyuki wrote:
> On Thu, 23 Feb 2012 17:53:23 +0400
> Konstantin Khlebnikov<khlebnikov@openvz.org>  wrote:
>
>> Looks like everything is ready for splitting zone->lru_lock into per-lruvec pieces.
>>
>> The lruvec locking loop is protected with RCU; actually, IRQ disabling is used instead
>> of rcu_read_lock(). The memory controller already releases its lru-vectors after
>> synchronize_rcu() in cgroup_diput(). Probably it should be replaced with synchronize_sched().
>>
>> Signed-off-by: Konstantin Khlebnikov<khlebnikov@openvz.org>
>
> Do we need rcu_read_lock() even if we check isolated pages at pre_destroy()?
> If pre_destroy() has ended, any pages under a memcg being destroyed were already moved to
> another cgroup while they were isolated.
>
> So,
>   - PageLRU(page) guarantees the lruvec is valid.
>   - if !PageLRU(page), the caller of lru_lock should know what it is doing.
>     While a page is isolated, pre_destroy() cannot complete, so page_lruvec(page) is always stable.

This would be racy, because the lruvec can be changed and released between checking PageLRU() and page_lruvec(),
or between page_lruvec() and spin_lock(), so we need to protect the whole operation with a single locking interval.
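
To make the window concrete, a rough sketch of the two orderings (illustration only; in
the actual patch IRQ disabling stands in for rcu_read_lock()):

/* Racy: the lruvec may be repointed and freed between the steps. */
if (PageLRU(page)) {
	lruvec = page_lruvec(page);	/* may go stale right here */
	spin_lock(&lruvec->lru_lock);	/* may lock a dying lruvec  */
}

/*
 * Safe: a single read-side section covers the check, the lookup and the
 * locking, and both are re-validated under the lock; this is what
 * __lock_page_lruvec_irq() does, and the caller unlocks later.
 */
rcu_read_lock();
if (PageLRU(page)) {
	lruvec = page_lruvec(page);
	lock_lruvec_irq(lruvec);
	lruvec = __relock_page_lruvec(lruvec, page);
	if (PageLRU(page)) {
		/* the page is guaranteed to be on this locked lruvec */
	}
}
rcu_read_unlock();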

>
> Thanks,
> -Kame
>
>> ---
>>   include/linux/mmzone.h |    3 +-
>>   mm/compaction.c        |    2 +
>>   mm/internal.h          |   66 +++++++++++++++++++++++++-----------------------
>>   mm/page_alloc.c        |    2 +
>>   mm/swap.c              |    2 +
>>   5 files changed, 40 insertions(+), 35 deletions(-)
>>
>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>> index 2e3a298..9880150 100644
>> --- a/include/linux/mmzone.h
>> +++ b/include/linux/mmzone.h
>> @@ -304,6 +304,8 @@ struct zone_reclaim_stat {
>>   };
>>
>>   struct lruvec {
>> +	spinlock_t		lru_lock;
>> +
>>   	struct list_head	pages_lru[NR_LRU_LISTS];
>>   	unsigned long		pages_count[NR_LRU_COUNTERS];
>>
>> @@ -386,7 +388,6 @@ struct zone {
>>   	ZONE_PADDING(_pad1_)
>>
>>   	/* Fields commonly accessed by the page reclaim scanner */
>> -	spinlock_t		lru_lock;
>>   	struct lruvec		lruvec;
>>
>>   	unsigned long		pages_scanned;	   /* since last reclaim */
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index fa74cbe..8661bb58 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -306,7 +306,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
>>   			lruvec = NULL;
>>   		}
>>   		if (need_resched() ||
>> -		    (lruvec&&  spin_is_contended(&zone->lru_lock))) {
>> +		    (lruvec&&  spin_is_contended(&lruvec->lru_lock))) {
>>   			if (lruvec)
>>   				unlock_lruvec_irq(lruvec);
>>   			lruvec = NULL;
>> diff --git a/mm/internal.h b/mm/internal.h
>> index 6dd2e70..9a9fd53 100644
>> --- a/mm/internal.h
>> +++ b/mm/internal.h
>> @@ -15,27 +15,27 @@
>>
>>   static inline void lock_lruvec(struct lruvec *lruvec, unsigned long *flags)
>>   {
>> -	spin_lock_irqsave(&lruvec_zone(lruvec)->lru_lock, *flags);
>> +	spin_lock_irqsave(&lruvec->lru_lock, *flags);
>>   }
>>
>>   static inline void lock_lruvec_irq(struct lruvec *lruvec)
>>   {
>> -	spin_lock_irq(&lruvec_zone(lruvec)->lru_lock);
>> +	spin_lock_irq(&lruvec->lru_lock);
>>   }
>>
>>   static inline void unlock_lruvec(struct lruvec *lruvec, unsigned long *flags)
>>   {
>> -	spin_unlock_irqrestore(&lruvec_zone(lruvec)->lru_lock, *flags);
>> +	spin_unlock_irqrestore(&lruvec->lru_lock, *flags);
>>   }
>>
>>   static inline void unlock_lruvec_irq(struct lruvec *lruvec)
>>   {
>> -	spin_unlock_irq(&lruvec_zone(lruvec)->lru_lock);
>> +	spin_unlock_irq(&lruvec->lru_lock);
>>   }
>>
>>   static inline void wait_lruvec_unlock(struct lruvec *lruvec)
>>   {
>> -	spin_unlock_wait(&lruvec_zone(lruvec)->lru_lock);
>> +	spin_unlock_wait(&lruvec->lru_lock);
>>   }
>>
>>   #ifdef CONFIG_CGROUP_MEM_RES_CTLR
>> @@ -46,37 +46,39 @@ static inline void wait_lruvec_unlock(struct lruvec *lruvec)
>>   static inline struct lruvec *__relock_page_lruvec(struct lruvec *locked_lruvec,
>>   						  struct page *page)
>>   {
>> -	/* Currenyly only one lru_lock per-zone */
>> -	return page_lruvec(page);
>> +	struct lruvec *lruvec;
>> +
>> +	do {
>> +		lruvec = page_lruvec(page);
>> +		if (likely(lruvec == locked_lruvec))
>> +			return lruvec;
>> +		spin_unlock(&locked_lruvec->lru_lock);
>> +		spin_lock(&lruvec->lru_lock);
>> +		locked_lruvec = lruvec;
>> +	} while (1);
>>   }
>>
>>   static inline struct lruvec *relock_page_lruvec_irq(struct lruvec *lruvec,
>>   						    struct page *page)
>>   {
>> -	struct zone *zone = page_zone(page);
>> -
>>   	if (!lruvec) {
>> -		spin_lock_irq(&zone->lru_lock);
>> -	} else if (zone != lruvec_zone(lruvec)) {
>> -		unlock_lruvec_irq(lruvec);
>> -		spin_lock_irq(&zone->lru_lock);
>> +		local_irq_disable();
>> +		lruvec = page_lruvec(page);
>> +		spin_lock(&lruvec->lru_lock);
>>   	}
>> -	return page_lruvec(page);
>> +	return __relock_page_lruvec(lruvec, page);
>>   }
>>
>>   static inline struct lruvec *relock_page_lruvec(struct lruvec *lruvec,
>>   						struct page *page,
>>   						unsigned long *flags)
>>   {
>> -	struct zone *zone = page_zone(page);
>> -
>>   	if (!lruvec) {
>> -		spin_lock_irqsave(&zone->lru_lock, *flags);
>> -	} else if (zone != lruvec_zone(lruvec)) {
>> -		unlock_lruvec(lruvec, flags);
>> -		spin_lock_irqsave(&zone->lru_lock, *flags);
>> +		local_irq_save(*flags);
>> +		lruvec = page_lruvec(page);
>> +		spin_lock(&lruvec->lru_lock);
>>   	}
>> -	return page_lruvec(page);
>> +	return __relock_page_lruvec(lruvec, page);
>>   }
>>
>>   /*
>> @@ -87,22 +89,24 @@ static inline struct lruvec *relock_page_lruvec(struct lruvec *lruvec,
>>   static inline bool __lock_page_lruvec_irq(struct lruvec **lruvec,
>>   					  struct page *page)
>>   {
>> -	struct zone *zone;
>>   	bool ret = false;
>>
>> +	rcu_read_lock();
>> +	/*
>> +	 * If we see there PageLRU(), it means page has valid lruvec link.
>> +	 * We need protect whole operation with single rcu-interval, otherwise
>> +	 * lruvec which hold this LRU sign can run out before we secure it.
>> +	 */
>>   	if (PageLRU(page)) {
>>   		if (!*lruvec) {
>> -			zone = page_zone(page);
>> -			spin_lock_irq(&zone->lru_lock);
>> -		} else
>> -			zone = lruvec_zone(*lruvec);
>> -
>> -		if (PageLRU(page)) {
>>   			*lruvec = page_lruvec(page);
>> +			lock_lruvec_irq(*lruvec);
>> +		}
>> +		*lruvec = __relock_page_lruvec(*lruvec, page);
>> +		if (PageLRU(page))
>>   			ret = true;
>> -		} else
>> -			*lruvec =&zone->lruvec;
>>   	}
>> +	rcu_read_unlock();
>>
>>   	return ret;
>>   }
>> @@ -110,7 +114,7 @@ static inline bool __lock_page_lruvec_irq(struct lruvec **lruvec,
>>   /* Wait for lruvec unlock before locking other lruvec for the same page */
>>   static inline void __wait_lruvec_unlock(struct lruvec *lruvec)
>>   {
>> -	/* Currently only one lru_lock per-zone */
>> +	wait_lruvec_unlock(lruvec);
>>   }
>>
>>   #else /* CONFIG_CGROUP_MEM_RES_CTLR */
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index ab42446..beadcc9 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -4294,6 +4294,7 @@ void init_zone_lruvec(struct zone *zone, struct lruvec *lruvec)
>>   	enum lru_list lru;
>>
>>   	memset(lruvec, 0, sizeof(struct lruvec));
>> +	spin_lock_init(&lruvec->lru_lock);
>>   	for_each_lru(lru)
>>   		INIT_LIST_HEAD(&lruvec->pages_lru[lru]);
>>   #ifdef CONFIG_CGROUP_MEM_RES_CTLR
>> @@ -4369,7 +4370,6 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
>>   #endif
>>   		zone->name = zone_names[j];
>>   		spin_lock_init(&zone->lock);
>> -		spin_lock_init(&zone->lru_lock);
>>   		zone_seqlock_init(zone);
>>   		zone->zone_pgdat = pgdat;
>>
>> diff --git a/mm/swap.c b/mm/swap.c
>> index 998c71c..8156181 100644
>> --- a/mm/swap.c
>> +++ b/mm/swap.c
>> @@ -700,7 +700,7 @@ void lru_add_page_tail(struct lruvec *lruvec,
>>   	VM_BUG_ON(!PageHead(page));
>>   	VM_BUG_ON(PageCompound(page_tail));
>>   	VM_BUG_ON(PageLRU(page_tail));
>> -	VM_BUG_ON(NR_CPUS != 1&&  !spin_is_locked(&lruvec_zone(lruvec)->lru_lock));
>> +	VM_BUG_ON(NR_CPUS != 1&&  !spin_is_locked(&lruvec->lru_lock));
>>
>>   	SetPageLRU(page_tail);
>>
>>
>>
>


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v3 00/21] mm: lru_lock splitting
  2012-02-28  1:52 ` KAMEZAWA Hiroyuki
@ 2012-02-28  6:49   ` Konstantin Khlebnikov
  0 siblings, 0 replies; 65+ messages in thread
From: Konstantin Khlebnikov @ 2012-02-28  6:49 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Hugh Dickins, linux-kernel, linux-mm, Johannes Weiner,
	Andrew Morton, Andi Kleen

KAMEZAWA Hiroyuki wrote:
> On Thu, 23 Feb 2012 17:51:36 +0400
> Konstantin Khlebnikov<khlebnikov@openvz.org>  wrote:
>
>> v3 changes:
>> * inactive-ratio reworked again, now it always calculated from from scratch
>> * hierarchical pte reference bits filter in memory-cgroup reclaimer
>> * fixed two bugs in locking, found by Hugh Dickins
>> * locking functions slightly simplified
>> * new patch for isolated pages accounting
>> * new patch with lru interleaving
>>
>> This patchset is based on next-20120210
>>
>> git: https://github.com/koct9i/linux/commits/lruvec-v3
>>
> I'm sorry I can't find enough review time these days, but the whole series
> seems good to me.
>
>
> BTW, how about trying to merge patches 1/21 -> 13 or 14/21 first?
> This series adds many changes to various places under /mm, so step-by-step
> merging will be better, I think.
> (I say this because I tend to split a long series of patches into
>   small sets and merge them one by one to reduce my own maintenance cost.)

I agree, the number of required cleanups has exceeded the limit.
I'll send them as a separate patchset, maybe tomorrow.

>
> For the final lock splitting, performance numbers should be in the changelog.

For sure

>
> Thanks,
> -Kame
>
>> ---
>>
>> Konstantin Khlebnikov (21):
>>        memcg: unify inactive_ratio calculation
>>        memcg: make mm_match_cgroup() hirarchical
>>        memcg: fix page_referencies cgroup filter on global reclaim
>>        memcg: use vm_swappiness from target memory cgroup
>>        mm: rename lruvec->lists into lruvec->pages_lru
>>        mm: lruvec linking functions
>>        mm: add lruvec->pages_count
>>        mm: unify inactive_list_is_low()
>>        mm: add lruvec->reclaim_stat
>>        mm: kill struct mem_cgroup_zone
>>        mm: move page-to-lruvec translation upper
>>        mm: push lruvec into update_page_reclaim_stat()
>>        mm: push lruvecs from pagevec_lru_move_fn() to iterator
>>        mm: introduce lruvec locking primitives
>>        mm: handle lruvec relocks on lumpy reclaim
>>        mm: handle lruvec relocks in compaction
>>        mm: handle lruvec relock in memory controller
>>        mm: add to lruvec isolated pages counters
>>        memcg: check lru vectors emptiness in pre-destroy
>>        mm: split zone->lru_lock
>>        mm: zone lru vectors interleaving
>>
>>
>>   include/linux/huge_mm.h    |    3
>>   include/linux/memcontrol.h |   75 ------
>>   include/linux/mm.h         |   66 +++++
>>   include/linux/mm_inline.h  |   19 +-
>>   include/linux/mmzone.h     |   39 ++-
>>   include/linux/swap.h       |    6
>>   mm/Kconfig                 |   16 +
>>   mm/compaction.c            |   31 +--
>>   mm/huge_memory.c           |   14 +
>>   mm/internal.h              |  204 +++++++++++++++++
>>   mm/ksm.c                   |    2
>>   mm/memcontrol.c            |  343 +++++++++++-----------------
>>   mm/migrate.c               |    2
>>   mm/page_alloc.c            |   70 +-----
>>   mm/rmap.c                  |    2
>>   mm/swap.c                  |  217 ++++++++++--------
>>   mm/vmscan.c                |  534 ++++++++++++++++++++++++--------------------
>>   mm/vmstat.c                |    6
>>   18 files changed, 932 insertions(+), 717 deletions(-)
>>
>> --
>> Signature
>>
>>
>


^ permalink raw reply	[flat|nested] 65+ messages in thread

end of thread, other threads:[~2012-02-28  6:49 UTC | newest]

Thread overview: 65+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-02-23 13:51 [PATCH v3 00/21] mm: lru_lock splitting Konstantin Khlebnikov
2012-02-23 13:51 ` [PATCH v3 01/21] memcg: unify inactive_ratio calculation Konstantin Khlebnikov
2012-02-28  0:05   ` KAMEZAWA Hiroyuki
2012-02-23 13:51 ` [PATCH v3 02/21] memcg: make mm_match_cgroup() hirarchical Konstantin Khlebnikov
2012-02-23 18:03   ` Johannes Weiner
2012-02-23 19:46     ` Konstantin Khlebnikov
2012-02-23 22:06       ` Johannes Weiner
2012-02-28  0:11   ` KAMEZAWA Hiroyuki
2012-02-28  6:31     ` Konstantin Khlebnikov
2012-02-23 13:51 ` [PATCH v3 03/21] memcg: fix page_referencies cgroup filter on global reclaim Konstantin Khlebnikov
2012-02-28  0:13   ` KAMEZAWA Hiroyuki
2012-02-23 13:51 ` [PATCH v3 04/21] memcg: use vm_swappiness from target memory cgroup Konstantin Khlebnikov
2012-02-28  0:15   ` KAMEZAWA Hiroyuki
2012-02-23 13:52 ` [PATCH v3 05/21] mm: rename lruvec->lists into lruvec->pages_lru Konstantin Khlebnikov
2012-02-28  0:20   ` KAMEZAWA Hiroyuki
2012-02-28  6:04     ` Konstantin Khlebnikov
2012-02-23 13:52 ` [PATCH v3 06/21] mm: lruvec linking functions Konstantin Khlebnikov
2012-02-28  0:27   ` KAMEZAWA Hiroyuki
2012-02-28  6:09     ` Konstantin Khlebnikov
2012-02-23 13:52 ` [PATCH v3 07/21] mm: add lruvec->pages_count Konstantin Khlebnikov
2012-02-28  0:35   ` KAMEZAWA Hiroyuki
2012-02-28  6:16     ` Konstantin Khlebnikov
2012-02-23 13:52 ` [PATCH v3 08/21] mm: unify inactive_list_is_low() Konstantin Khlebnikov
2012-02-28  0:36   ` KAMEZAWA Hiroyuki
2012-02-23 13:52 ` [PATCH v3 09/21] mm: add lruvec->reclaim_stat Konstantin Khlebnikov
2012-02-28  0:38   ` KAMEZAWA Hiroyuki
2012-02-23 13:52 ` [PATCH v3 10/21] mm: kill struct mem_cgroup_zone Konstantin Khlebnikov
2012-02-28  0:41   ` KAMEZAWA Hiroyuki
2012-02-23 13:52 ` [PATCH v3 11/21] mm: move page-to-lruvec translation upper Konstantin Khlebnikov
2012-02-28  0:42   ` KAMEZAWA Hiroyuki
2012-02-23 13:52 ` [PATCH v3 12/21] mm: push lruvec into update_page_reclaim_stat() Konstantin Khlebnikov
2012-02-28  0:44   ` KAMEZAWA Hiroyuki
2012-02-23 13:52 ` [PATCH v3 13/21] mm: push lruvecs from pagevec_lru_move_fn() to iterator Konstantin Khlebnikov
2012-02-28  0:45   ` KAMEZAWA Hiroyuki
2012-02-23 13:52 ` [PATCH v3 14/21] mm: introduce lruvec locking primitives Konstantin Khlebnikov
2012-02-28  0:56   ` KAMEZAWA Hiroyuki
2012-02-28  6:23     ` Konstantin Khlebnikov
2012-02-23 13:52 ` [PATCH v3 15/21] mm: handle lruvec relocks on lumpy reclaim Konstantin Khlebnikov
2012-02-28  1:01   ` KAMEZAWA Hiroyuki
2012-02-28  6:25     ` Konstantin Khlebnikov
2012-02-23 13:52 ` [PATCH v3 16/21] mm: handle lruvec relocks in compaction Konstantin Khlebnikov
2012-02-28  1:13   ` KAMEZAWA Hiroyuki
2012-02-28  6:31     ` Konstantin Khlebnikov
2012-02-23 13:53 ` [PATCH v3 17/21] mm: handle lruvec relock in memory controller Konstantin Khlebnikov
2012-02-28  1:22   ` KAMEZAWA Hiroyuki
2012-02-23 13:53 ` [PATCH v3 18/21] mm: add to lruvec isolated pages counters Konstantin Khlebnikov
2012-02-24  5:32   ` Konstantin Khlebnikov
2012-02-28  1:38   ` KAMEZAWA Hiroyuki
2012-02-23 13:53 ` [PATCH v3 19/21] memcg: check lru vectors emptiness in pre-destroy Konstantin Khlebnikov
2012-02-28  1:43   ` KAMEZAWA Hiroyuki
2012-02-23 13:53 ` [PATCH v3 20/21] mm: split zone->lru_lock Konstantin Khlebnikov
2012-02-28  1:49   ` KAMEZAWA Hiroyuki
2012-02-28  6:39     ` Konstantin Khlebnikov
2012-02-23 13:53 ` [PATCH v3 21/21] mm: zone lru vectors interleaving Konstantin Khlebnikov
2012-02-23 14:44   ` Hillf Danton
2012-02-23 16:21   ` Andi Kleen
2012-02-23 18:48     ` [PATCH 1/2] mm: configure lruvec split by boot options Konstantin Khlebnikov
2012-02-23 18:48     ` [PATCH 2/2] mm: show zone lruvec state in /proc/zoneinfo Konstantin Khlebnikov
2012-02-25  0:05 ` [PATCH v3 00/21] mm: lru_lock splitting Tim Chen
2012-02-25  5:34   ` Konstantin Khlebnikov
2012-02-25  2:15 ` KAMEZAWA Hiroyuki
2012-02-25  5:31   ` Konstantin Khlebnikov
2012-02-26 23:54     ` KAMEZAWA Hiroyuki
2012-02-28  1:52 ` KAMEZAWA Hiroyuki
2012-02-28  6:49   ` Konstantin Khlebnikov

This is a public inbox; see the mirroring instructions
for how to clone and mirror all data and code used for this inbox,
as well as URLs for NNTP newsgroup(s).