* [PATCH 0/5] Follow-up fixes to node-lru series v2
@ 2016-07-15 13:09 Mel Gorman
  2016-07-15 13:09 ` [PATCH 1/5] mm, vmscan: make shrink_node decisions more node-centric -fix Mel Gorman
                   ` (4 more replies)
  0 siblings, 5 replies; 16+ messages in thread
From: Mel Gorman @ 2016-07-15 13:09 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Johannes Weiner, Minchan Kim, Vlastimil Babka, Linux-MM, LKML,
	Mel Gorman

This is another round of fixups to the node-lru series. The most
important patch is the last one which prevents a warning in memcg
from being triggered.

 include/linux/memcontrol.h |  2 +-
 include/linux/mm_inline.h  |  5 ++---
 mm/memcontrol.c            |  5 +----
 mm/page_alloc.c            |  6 +++---
 mm/swap.c                  | 20 ++++++++++----------
 mm/vmscan.c                | 43 +++++++++++++++++++++++++++++++++++--------
 6 files changed, 52 insertions(+), 29 deletions(-)

-- 
2.6.4

* [PATCH 1/5] mm, vmscan: make shrink_node decisions more node-centric -fix
  2016-07-15 13:09 [PATCH 0/5] Follow-up fixes to node-lru series v2 Mel Gorman
@ 2016-07-15 13:09 ` Mel Gorman
  2016-07-15 15:42   ` Minchan Kim
  2016-07-18 16:14   ` Johannes Weiner
  2016-07-15 13:09 ` [PATCH 2/5] mm, vmscan: avoid passing in classzone_idx unnecessarily to compaction_ready -fix Mel Gorman
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 16+ messages in thread
From: Mel Gorman @ 2016-07-15 13:09 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Johannes Weiner, Minchan Kim, Vlastimil Babka, Linux-MM, LKML,
	Mel Gorman

The patch "mm, vmscan: make shrink_node decisions more node-centric"
checks whether compaction is suitable on unpopulated zones. This is
wasteful rather than wrong, but it is worth fixing.
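
For context, a from-memory sketch of the helper the fix uses (shown
only for illustration; the real definition lives in
include/linux/mmzone.h):

	static inline int populated_zone(struct zone *zone)
	{
		return !!zone->present_pages;
	}

Skipping unpopulated zones avoids calling the comparatively heavy
compaction_suitable() watermark checks for zones that have no pages to
begin with.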

This is a fix to the mmotm patch
mm-vmscan-make-shrink_node-decisions-more-node-centric.patch

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/vmscan.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 864a3b1e5f8b..4fdb9e419588 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2408,6 +2408,8 @@ static inline bool should_continue_reclaim(struct pglist_data *pgdat,
 	/* If compaction would go ahead or the allocation would succeed, stop */
 	for (z = 0; z <= sc->reclaim_idx; z++) {
 		struct zone *zone = &pgdat->node_zones[z];
+		if (!populated_zone(zone))
+			continue;
 
 		switch (compaction_suitable(zone, sc->order, 0, sc->reclaim_idx)) {
 		case COMPACT_PARTIAL:
-- 
2.6.4

* [PATCH 2/5] mm, vmscan: avoid passing in classzone_idx unnecessarily to compaction_ready -fix
  2016-07-15 13:09 [PATCH 0/5] Follow-up fixes to node-lru series v2 Mel Gorman
  2016-07-15 13:09 ` [PATCH 1/5] mm, vmscan: make shrink_node decisions more node-centric -fix Mel Gorman
@ 2016-07-15 13:09 ` Mel Gorman
  2016-07-15 15:50   ` Minchan Kim
  2016-07-18 16:15   ` Johannes Weiner
  2016-07-15 13:09 ` [PATCH 3/5] mm, pagevec: Release/reacquire lru_lock on pgdat change Mel Gorman
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 16+ messages in thread
From: Mel Gorman @ 2016-07-15 13:09 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Johannes Weiner, Minchan Kim, Vlastimil Babka, Linux-MM, LKML,
	Mel Gorman

As pointed out by Vlastimil, there is a redundant check in shrink_zones
since commit "mm, vmscan: avoid passing in classzone_idx unnecessarily to
compaction_ready".  The zonelist iterator only returns zones that already
meet the requirements of the allocation request.
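
For context, shrink_zones() already walks the zonelist with the reclaim
index as the upper bound (a usage sketch with the loop body elided):

	for_each_zone_zonelist_nodemask(zone, z, zonelist,
					sc->reclaim_idx, sc->nodemask) {
		/*
		 * The iterator only yields zones whose index is
		 * <= sc->reclaim_idx, so re-checking
		 * zonelist_zone_idx(z) <= sc->reclaim_idx inside the
		 * loop can never fail.
		 */
		...
	}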

This is a fix to the mmotm patch
mm-vmscan-avoid-passing-in-classzone_idx-unnecessarily-to-compaction_ready.patch

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/vmscan.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 4fdb9e419588..c2ad4263f965 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2606,7 +2606,6 @@ static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
 			 */
 			if (IS_ENABLED(CONFIG_COMPACTION) &&
 			    sc->order > PAGE_ALLOC_COSTLY_ORDER &&
-			    zonelist_zone_idx(z) <= sc->reclaim_idx &&
 			    compaction_ready(zone, sc)) {
 				sc->compaction_ready = true;
 				continue;
-- 
2.6.4

* [PATCH 3/5] mm, pagevec: Release/reacquire lru_lock on pgdat change
  2016-07-15 13:09 [PATCH 0/5] Follow-up fixes to node-lru series v2 Mel Gorman
  2016-07-15 13:09 ` [PATCH 1/5] mm, vmscan: make shrink_node decisions more node-centric -fix Mel Gorman
  2016-07-15 13:09 ` [PATCH 2/5] mm, vmscan: avoid passing in classzone_idx unnecessarily to compaction_ready -fix Mel Gorman
@ 2016-07-15 13:09 ` Mel Gorman
  2016-07-15 15:53   ` Minchan Kim
  2016-07-18 16:20   ` Johannes Weiner
  2016-07-15 13:09 ` [PATCH 4/5] mm: show node_pages_scanned per node, not zone Mel Gorman
  2016-07-15 13:09 ` [PATCH 5/5] mm, vmscan: Update all zone LRU sizes before updating memcg Mel Gorman
  4 siblings, 2 replies; 16+ messages in thread
From: Mel Gorman @ 2016-07-15 13:09 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Johannes Weiner, Minchan Kim, Vlastimil Babka, Linux-MM, LKML,
	Mel Gorman

With node-lru, the locking is based on the pgdat. Previously it was
required that a pagevec drain released one zone lru_lock and acquired
another zone lru_lock on every zone change. Now, it's only necessary if
the node changes. The end-result is fewer lock release/acquires if the
pages are all on the same node but in different zones.
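
For context, a from-memory sketch of the helper that makes this work
(shown only to illustrate why the zone no longer matters):

	static inline struct pglist_data *page_pgdat(struct page *page)
	{
		return NODE_DATA(page_to_nid(page));
	}

All zones of a node map to the same pgdat, so pgdat->lru_lock only has
to be dropped and reacquired when page_to_nid() changes.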

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/swap.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/mm/swap.c b/mm/swap.c
index 77af473635fe..75c63bb2a1da 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -179,26 +179,26 @@ static void pagevec_lru_move_fn(struct pagevec *pvec,
 	void *arg)
 {
 	int i;
-	struct zone *zone = NULL;
+	struct pglist_data *pgdat = NULL;
 	struct lruvec *lruvec;
 	unsigned long flags = 0;
 
 	for (i = 0; i < pagevec_count(pvec); i++) {
 		struct page *page = pvec->pages[i];
-		struct zone *pagezone = page_zone(page);
+		struct pglist_data *pagepgdat = page_pgdat(page);
 
-		if (pagezone != zone) {
-			if (zone)
-				spin_unlock_irqrestore(zone_lru_lock(zone), flags);
-			zone = pagezone;
-			spin_lock_irqsave(zone_lru_lock(zone), flags);
+		if (pagepgdat != pgdat) {
+			if (pgdat)
+				spin_unlock_irqrestore(&pgdat->lru_lock, flags);
+			pgdat = pagepgdat;
+			spin_lock_irqsave(&pgdat->lru_lock, flags);
 		}
 
-		lruvec = mem_cgroup_page_lruvec(page, zone->zone_pgdat);
+		lruvec = mem_cgroup_page_lruvec(page, pgdat);
 		(*move_fn)(page, lruvec, arg);
 	}
-	if (zone)
-		spin_unlock_irqrestore(zone_lru_lock(zone), flags);
+	if (pgdat)
+		spin_unlock_irqrestore(&pgdat->lru_lock, flags);
 	release_pages(pvec->pages, pvec->nr, pvec->cold);
 	pagevec_reinit(pvec);
 }
-- 
2.6.4

* [PATCH 4/5] mm: show node_pages_scanned per node, not zone
  2016-07-15 13:09 [PATCH 0/5] Follow-up fixes to node-lru series v2 Mel Gorman
                   ` (2 preceding siblings ...)
  2016-07-15 13:09 ` [PATCH 3/5] mm, pagevec: Release/reacquire lru_lock on pgdat change Mel Gorman
@ 2016-07-15 13:09 ` Mel Gorman
  2016-07-16 10:14   ` Minchan Kim
  2016-07-15 13:09 ` [PATCH 5/5] mm, vmscan: Update all zone LRU sizes before updating memcg Mel Gorman
  4 siblings, 1 reply; 16+ messages in thread
From: Mel Gorman @ 2016-07-15 13:09 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Johannes Weiner, Minchan Kim, Vlastimil Babka, Linux-MM, LKML,
	Mel Gorman

From: Minchan Kim <minchan@kernel.org>

node_pages_scanned represents the number of pages scanned on a node
for reclaim, so it is pointless to show it in kilobytes.

In addition, node_pages_scanned is a per-node value, not a per-zone
one.

This patch changes the node_pages_scanned display from per-zone
kilobytes to a per-node count.
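
For context, show_free_areas() prints most fields through a local
helper that converts pages to kilobytes (a from-memory sketch):

	#define K(x) ((x) << (PAGE_SHIFT - 10))

Scanned pages are an event count, not an amount of memory, so the K()
conversion is dropped and the raw per-node counter is printed instead.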

Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/page_alloc.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f80a0e57dcc8..7edd311a63f1 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4345,6 +4345,7 @@ void show_free_areas(unsigned int filter)
 #endif
 			" writeback_tmp:%lukB"
 			" unstable:%lukB"
+			" pages_scanned:%lu"
 			" all_unreclaimable? %s"
 			"\n",
 			pgdat->node_id,
@@ -4367,6 +4368,7 @@ void show_free_areas(unsigned int filter)
 			K(node_page_state(pgdat, NR_SHMEM)),
 			K(node_page_state(pgdat, NR_WRITEBACK_TEMP)),
 			K(node_page_state(pgdat, NR_UNSTABLE_NFS)),
+			node_page_state(zone->zone_pgdat, NR_PAGES_SCANNED),
 			!pgdat_reclaimable(pgdat) ? "yes" : "no");
 	}
 
@@ -4397,7 +4399,6 @@ void show_free_areas(unsigned int filter)
 			" free_pcp:%lukB"
 			" local_pcp:%ukB"
 			" free_cma:%lukB"
-			" node_pages_scanned:%lu"
 			"\n",
 			zone->name,
 			K(zone_page_state(zone, NR_FREE_PAGES)),
@@ -4415,8 +4416,7 @@ void show_free_areas(unsigned int filter)
 			K(zone_page_state(zone, NR_BOUNCE)),
 			K(free_pcp),
 			K(this_cpu_read(zone->pageset->pcp.count)),
-			K(zone_page_state(zone, NR_FREE_CMA_PAGES)),
-			K(node_page_state(zone->zone_pgdat, NR_PAGES_SCANNED)));
+			K(zone_page_state(zone, NR_FREE_CMA_PAGES)));
 		printk("lowmem_reserve[]:");
 		for (i = 0; i < MAX_NR_ZONES; i++)
 			printk(" %ld", zone->lowmem_reserve[i]);
-- 
2.6.4

* [PATCH 5/5] mm, vmscan: Update all zone LRU sizes before updating memcg
  2016-07-15 13:09 [PATCH 0/5] Follow-up fixes to node-lru series v2 Mel Gorman
                   ` (3 preceding siblings ...)
  2016-07-15 13:09 ` [PATCH 4/5] mm: show node_pages_scanned per node, not zone Mel Gorman
@ 2016-07-15 13:09 ` Mel Gorman
  2016-07-15 14:45   ` Minchan Kim
  2016-07-15 15:54   ` Minchan Kim
  4 siblings, 2 replies; 16+ messages in thread
From: Mel Gorman @ 2016-07-15 13:09 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Johannes Weiner, Minchan Kim, Vlastimil Babka, Linux-MM, LKML,
	Mel Gorman

Minchan Kim reported hitting the following warning on a 32-bit system,
although it can also affect 64-bit systems.

  WARNING: CPU: 4 PID: 1322 at mm/memcontrol.c:998 mem_cgroup_update_lru_size+0x103/0x110
  mem_cgroup_update_lru_size(f44b4000, 1, -7): zid 1 lru_size 1 but empty
  Modules linked in:
  CPU: 4 PID: 1322 Comm: cp Not tainted 4.7.0-rc4-mm1+ #143
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
   00000086 00000086 c2bc5a10 db3e4a97 c2bc5a54 db9d4025 c2bc5a40 db07b82a
   db9d0594 c2bc5a70 0000052a db9d4025 000003e6 db208463 000003e6 00000001
   f44b4000 00000001 c2bc5a5c db07b88b 00000009 00000000 c2bc5a54 db9d0594
  Call Trace:
   [<db3e4a97>] dump_stack+0x76/0xaf
   [<db07b82a>] __warn+0xea/0x110
   [<db208463>] ? mem_cgroup_update_lru_size+0x103/0x110
   [<db07b88b>] warn_slowpath_fmt+0x3b/0x40
   [<db208463>] mem_cgroup_update_lru_size+0x103/0x110
   [<db1b52a2>] isolate_lru_pages.isra.61+0x2e2/0x360
   [<db1b6ffc>] shrink_active_list+0xac/0x2a0
   [<db3f136e>] ? __delay+0xe/0x10
   [<db1b772c>] shrink_node_memcg+0x53c/0x7a0
   [<db1b7a3b>] shrink_node+0xab/0x2a0
   [<db1b7cf6>] do_try_to_free_pages+0xc6/0x390
   [<db1b8205>] try_to_free_pages+0x245/0x590

LRU list contents and counts are updated separately. Counts are updated
before pages are added to the LRU and updated after pages are removed.
The warning above is from a check in mem_cgroup_update_lru_size that
ensures that list sizes of zero are empty.

The problem is that node-lru needs to account for highmem pages if
CONFIG_HIGHMEM is set. One impact of the implementation is that the
sizes are updated in multiple passes when pages from multiple zones are
isolated. This happens whether HIGHMEM is set or not. When multiple zones
are isolated, it's possible for a debugging check in memcg to be tripped.

This patch forces all the zone counts to be updated before the memcg
function is called.
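
For reference, the debugging check that fires is roughly of this shape
(paraphrased from mem_cgroup_update_lru_size(), not the exact code):

	size = *lru_size;
	empty = list_empty(lruvec->lists + lru);
	WARN_ONCE(size < 0 || empty != !size,
		  "lru_size %ld but %sempty\n", size, empty ? "" : "not ");

If the per-zone sizes and the memcg size are updated in separate
passes, the counter and the list can be momentarily out of sync and
the warning triggers even though nothing is actually wrong.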

Reported-and-tested-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 include/linux/memcontrol.h |  2 +-
 include/linux/mm_inline.h  |  5 ++---
 mm/memcontrol.c            |  5 +----
 mm/vmscan.c                | 40 +++++++++++++++++++++++++++++++++-------
 4 files changed, 37 insertions(+), 15 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 80bf8458148a..79c17e1732ae 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -431,7 +431,7 @@ static inline bool mem_cgroup_online(struct mem_cgroup *memcg)
 int mem_cgroup_select_victim_node(struct mem_cgroup *memcg);
 
 void mem_cgroup_update_lru_size(struct lruvec *lruvec, enum lru_list lru,
-		enum zone_type zid, int nr_pages);
+		int nr_pages);
 
 unsigned long mem_cgroup_node_nr_lru_pages(struct mem_cgroup *memcg,
 					   int nid, unsigned int lru_mask);
diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
index ccd40e357b56..d29237428199 100644
--- a/include/linux/mm_inline.h
+++ b/include/linux/mm_inline.h
@@ -56,10 +56,9 @@ static __always_inline void update_lru_size(struct lruvec *lruvec,
 				enum lru_list lru, enum zone_type zid,
 				int nr_pages)
 {
-#ifdef CONFIG_MEMCG
-	mem_cgroup_update_lru_size(lruvec, lru, zid, nr_pages);
-#else
 	__update_lru_size(lruvec, lru, zid, nr_pages);
+#ifdef CONFIG_MEMCG
+	mem_cgroup_update_lru_size(lruvec, lru, nr_pages);
 #endif
 }
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 9cbd40ebccd1..13be30c3ea78 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -965,7 +965,6 @@ struct lruvec *mem_cgroup_page_lruvec(struct page *page, struct pglist_data *pgd
  * mem_cgroup_update_lru_size - account for adding or removing an lru page
  * @lruvec: mem_cgroup per zone lru vector
  * @lru: index of lru list the page is sitting on
- * @zid: Zone ID of the zone pages have been added to
  * @nr_pages: positive when adding or negative when removing
  *
  * This function must be called under lru_lock, just before a page is added
@@ -973,15 +972,13 @@ struct lruvec *mem_cgroup_page_lruvec(struct page *page, struct pglist_data *pgd
  * so as to allow it to check that lru_size 0 is consistent with list_empty).
  */
 void mem_cgroup_update_lru_size(struct lruvec *lruvec, enum lru_list lru,
-				enum zone_type zid, int nr_pages)
+				int nr_pages)
 {
 	struct mem_cgroup_per_node *mz;
 	unsigned long *lru_size;
 	long size;
 	bool empty;
 
-	__update_lru_size(lruvec, lru, zid, nr_pages);
-
 	if (mem_cgroup_disabled())
 		return;
 
diff --git a/mm/vmscan.c b/mm/vmscan.c
index c2ad4263f965..3f06a7a0d135 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1350,6 +1350,38 @@ int __isolate_lru_page(struct page *page, isolate_mode_t mode)
 	return ret;
 }
 
+
+/*
+ * Update LRU sizes after isolating pages. The LRU size updates must
+ * be complete before mem_cgroup_update_lru_size due to a sanity check.
+ */
+static __always_inline void update_lru_sizes(struct lruvec *lruvec,
+			enum lru_list lru, unsigned long *nr_zone_taken,
+			unsigned long nr_taken)
+{
+#ifdef CONFIG_HIGHMEM
+	int zid;
+
+	/*
+	 * Highmem has separate accounting for highmem pages so each zone
+	 * is updated separately.
+	 */
+	for (zid = 0; zid < MAX_NR_ZONES; zid++) {
+		if (!nr_zone_taken[zid])
+			continue;
+
+		__update_lru_size(lruvec, lru, zid, -nr_zone_taken[zid]);
+	}
+#else
+	/* Zone ID does not matter on !HIGHMEM */
+	__update_lru_size(lruvec, lru, 0, -nr_taken);
+#endif
+
+#ifdef CONFIG_MEMCG
+	mem_cgroup_update_lru_size(lruvec, lru, -nr_taken);
+#endif
+}
+
 /*
  * zone_lru_lock is heavily contended.  Some of the functions that
  * shrink the lists perform better by taking out a batch of pages
@@ -1436,13 +1468,7 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
 	*nr_scanned = scan;
 	trace_mm_vmscan_lru_isolate(sc->reclaim_idx, sc->order, nr_to_scan, scan,
 				    nr_taken, mode, is_file_lru(lru));
-	for (scan = 0; scan < MAX_NR_ZONES; scan++) {
-		nr_pages = nr_zone_taken[scan];
-		if (!nr_pages)
-			continue;
-
-		update_lru_size(lruvec, lru, scan, -nr_pages);
-	}
+	update_lru_sizes(lruvec, lru, nr_zone_taken, nr_taken);
 	return nr_taken;
 }
 
-- 
2.6.4

* Re: [PATCH 5/5] mm, vmscan: Update all zone LRU sizes before updating memcg
  2016-07-15 13:09 ` [PATCH 5/5] mm, vmscan: Update all zone LRU sizes before updating memcg Mel Gorman
@ 2016-07-15 14:45   ` Minchan Kim
  2016-07-15 15:01     ` Mel Gorman
  2016-07-15 15:54   ` Minchan Kim
  1 sibling, 1 reply; 16+ messages in thread
From: Minchan Kim @ 2016-07-15 14:45 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Johannes Weiner, Vlastimil Babka, Linux-MM, LKML

On Fri, Jul 15, 2016 at 02:09:25PM +0100, Mel Gorman wrote:
> Minchan Kim reported hitting the following warning on a 32-bit system,
> although it can also affect 64-bit systems.
> 
>   WARNING: CPU: 4 PID: 1322 at mm/memcontrol.c:998 mem_cgroup_update_lru_size+0x103/0x110
>   mem_cgroup_update_lru_size(f44b4000, 1, -7): zid 1 lru_size 1 but empty
>   Modules linked in:
>   CPU: 4 PID: 1322 Comm: cp Not tainted 4.7.0-rc4-mm1+ #143
>   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>    00000086 00000086 c2bc5a10 db3e4a97 c2bc5a54 db9d4025 c2bc5a40 db07b82a
>    db9d0594 c2bc5a70 0000052a db9d4025 000003e6 db208463 000003e6 00000001
>    f44b4000 00000001 c2bc5a5c db07b88b 00000009 00000000 c2bc5a54 db9d0594
>   Call Trace:
>    [<db3e4a97>] dump_stack+0x76/0xaf
>    [<db07b82a>] __warn+0xea/0x110
>    [<db208463>] ? mem_cgroup_update_lru_size+0x103/0x110
>    [<db07b88b>] warn_slowpath_fmt+0x3b/0x40
>    [<db208463>] mem_cgroup_update_lru_size+0x103/0x110
>    [<db1b52a2>] isolate_lru_pages.isra.61+0x2e2/0x360
>    [<db1b6ffc>] shrink_active_list+0xac/0x2a0
>    [<db3f136e>] ? __delay+0xe/0x10
>    [<db1b772c>] shrink_node_memcg+0x53c/0x7a0
>    [<db1b7a3b>] shrink_node+0xab/0x2a0
>    [<db1b7cf6>] do_try_to_free_pages+0xc6/0x390
>    [<db1b8205>] try_to_free_pages+0x245/0x590
> 
> LRU list contents and counts are updated separately. Counts are updated
> before pages are added to the LRU and updated after pages are removed.
> The warning above is from a check in mem_cgroup_update_lru_size that
> ensures that list sizes of zero are empty.
> 
> The problem is that node-lru needs to account for highmem pages if
> CONFIG_HIGHMEM is set. One impact of the implementation is that the
> sizes are updated in multiple passes when pages from multiple zones are
> isolated. This happens whether HIGHMEM is set or not. When multiple zones
> are isolated, it's possible for a debugging check in memcg to be tripped.
> 
> This patch forces all the zone counts to be updated before the memcg
> function is called.
> 
> Reported-and-tested-by: Minchan Kim <minchan@kernel.org>
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> ---

< snip >

> +/*
> + * Update LRU sizes after isolating pages. The LRU size updates must
> + * be complete before mem_cgroup_update_lru_size due to a sanity check.
> + */

Looks better.

> +static __always_inline void update_lru_sizes(struct lruvec *lruvec,
> +			enum lru_list lru, unsigned long *nr_zone_taken,
> +			unsigned long nr_taken)
> +{
> +#ifdef CONFIG_HIGHMEM

If you think it's really worth optimizing this for non-highmem systems,
we don't need to account nr_zone_taken in *isolate_lru_pages*
in the first place on non-highmem systems, either.

> +	int zid;
> +
> +	/*
> +	 * Highmem has separate accounting for highmem pages so each zone

                                               highmem file pages

* Re: [PATCH 5/5] mm, vmscan: Update all zone LRU sizes before updating memcg
  2016-07-15 14:45   ` Minchan Kim
@ 2016-07-15 15:01     ` Mel Gorman
  0 siblings, 0 replies; 16+ messages in thread
From: Mel Gorman @ 2016-07-15 15:01 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, Johannes Weiner, Vlastimil Babka, Linux-MM, LKML

On Fri, Jul 15, 2016 at 11:45:34PM +0900, Minchan Kim wrote:
> > +static __always_inline void update_lru_sizes(struct lruvec *lruvec,
> > +			enum lru_list lru, unsigned long *nr_zone_taken,
> > +			unsigned long nr_taken)
> > +{
> > +#ifdef CONFIG_HIGHMEM
> 
> > If you think it's really worth optimizing this for non-highmem systems,
> > we don't need to account nr_zone_taken in *isolate_lru_pages*
> > in the first place on non-highmem systems, either.
> 

It becomes a mess of ifdefs and given the marginal overhead, I left it
for now.
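
(For illustration only, a hypothetical sketch of what that would look
like -- every declaration and accounting site in isolate_lru_pages()
grows an ifdef:)

	#ifdef CONFIG_HIGHMEM
		unsigned long nr_zone_taken[MAX_NR_ZONES] = { 0 };
	#endif
	...
	#ifdef CONFIG_HIGHMEM
		nr_zone_taken[page_zonenum(page)] += nr_pages;
	#endif

which is the mess being referred to.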

-- 
Mel Gorman
SUSE Labs

* Re: [PATCH 1/5] mm, vmscan: make shrink_node decisions more node-centric -fix
  2016-07-15 13:09 ` [PATCH 1/5] mm, vmscan: make shrink_node decisions more node-centric -fix Mel Gorman
@ 2016-07-15 15:42   ` Minchan Kim
  2016-07-18 16:14   ` Johannes Weiner
  1 sibling, 0 replies; 16+ messages in thread
From: Minchan Kim @ 2016-07-15 15:42 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Johannes Weiner, Vlastimil Babka, Linux-MM, LKML

On Fri, Jul 15, 2016 at 02:09:21PM +0100, Mel Gorman wrote:
> The patch "mm, vmscan: make shrink_node decisions more node-centric"
> checks whether compaction is suitable on empty nodes. This is expensive
> rather than wrong but is worth fixing.
> 
> This is a fix to the mmotm patch
> mm-vmscan-make-shrink_node-decisions-more-node-centric.patch
> 
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Minchan Kim <minchan@kernel.org>

* Re: [PATCH 2/5] mm, vmscan: avoid passing in classzone_idx unnecessarily to compaction_ready -fix
  2016-07-15 13:09 ` [PATCH 2/5] mm, vmscan: avoid passing in classzone_idx unnecessarily to compaction_ready -fix Mel Gorman
@ 2016-07-15 15:50   ` Minchan Kim
  2016-07-18 16:15   ` Johannes Weiner
  1 sibling, 0 replies; 16+ messages in thread
From: Minchan Kim @ 2016-07-15 15:50 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Johannes Weiner, Vlastimil Babka, Linux-MM, LKML

On Fri, Jul 15, 2016 at 02:09:22PM +0100, Mel Gorman wrote:
> As pointed out by Vlastimil, there is a redundant check in shrink_zones
> since commit "mm, vmscan: avoid passing in classzone_idx unnecessarily to
> compaction_ready".  The zonelist iterator only returns zones that already
> meet the requirements of the allocation request.
> 
> This is a fix to the mmotm patch
> mm-vmscan-avoid-passing-in-classzone_idx-unnecessarily-to-compaction_ready.patch
> 
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Minchan Kim <minchan@kernel.org>

Just a Nit:
It seems there is another redundant check in there.

shrink_zones
..
        for_each_zone_zonelist_nodemask(zone, z, zonelist,
                                        sc->reclaim_idx, sc->nodemask) {
                if (!populated_zone(zone)) <==
                        continue;

Of course, it's not your fault, but it would be a good chance to
remove such a trivial thing :)
If I'm not missing something, I hope this can be piggybacked onto this
patch. Andrew?

* Re: [PATCH 3/5] mm, pagevec: Release/reacquire lru_lock on pgdat change
  2016-07-15 13:09 ` [PATCH 3/5] mm, pagevec: Release/reacquire lru_lock on pgdat change Mel Gorman
@ 2016-07-15 15:53   ` Minchan Kim
  2016-07-18 16:20   ` Johannes Weiner
  1 sibling, 0 replies; 16+ messages in thread
From: Minchan Kim @ 2016-07-15 15:53 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Johannes Weiner, Vlastimil Babka, Linux-MM, LKML

On Fri, Jul 15, 2016 at 02:09:23PM +0100, Mel Gorman wrote:
> With node-lru, the locking is based on the pgdat. Previously it was
> required that a pagevec drain released one zone lru_lock and acquired
> another zone lru_lock on every zone change. Now, it's only necessary if
> the node changes. The end-result is fewer lock release/acquires if the
> pages are all on the same node but in different zones.
> 
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Minchan Kim <minchan@kernel.org>

check_move_unevictable_pages could be a candidate, too.
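
A rough, untested sketch of what that conversion might look like
(hypothetical, analogous to the pagevec change above -- not a posted
patch):

	void check_move_unevictable_pages(struct page **pages, int nr_pages)
	{
		struct pglist_data *pgdat = NULL;
		int i;

		for (i = 0; i < nr_pages; i++) {
			struct page *page = pages[i];
			struct pglist_data *pagepgdat = page_pgdat(page);

			if (pagepgdat != pgdat) {
				if (pgdat)
					spin_unlock_irq(&pgdat->lru_lock);
				pgdat = pagepgdat;
				spin_lock_irq(&pgdat->lru_lock);
			}
			/* existing per-page unevictable handling unchanged */
		}
		if (pgdat)
			spin_unlock_irq(&pgdat->lru_lock);
	}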

* Re: [PATCH 5/5] mm, vmscan: Update all zone LRU sizes before updating memcg
  2016-07-15 13:09 ` [PATCH 5/5] mm, vmscan: Update all zone LRU sizes before updating memcg Mel Gorman
  2016-07-15 14:45   ` Minchan Kim
@ 2016-07-15 15:54   ` Minchan Kim
  1 sibling, 0 replies; 16+ messages in thread
From: Minchan Kim @ 2016-07-15 15:54 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Johannes Weiner, Vlastimil Babka, Linux-MM, LKML

On Fri, Jul 15, 2016 at 02:09:25PM +0100, Mel Gorman wrote:
> Minchan Kim reported hitting the following warning on a 32-bit system,
> although it can also affect 64-bit systems.
> 
>   WARNING: CPU: 4 PID: 1322 at mm/memcontrol.c:998 mem_cgroup_update_lru_size+0x103/0x110
>   mem_cgroup_update_lru_size(f44b4000, 1, -7): zid 1 lru_size 1 but empty
>   Modules linked in:
>   CPU: 4 PID: 1322 Comm: cp Not tainted 4.7.0-rc4-mm1+ #143
>   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>    00000086 00000086 c2bc5a10 db3e4a97 c2bc5a54 db9d4025 c2bc5a40 db07b82a
>    db9d0594 c2bc5a70 0000052a db9d4025 000003e6 db208463 000003e6 00000001
>    f44b4000 00000001 c2bc5a5c db07b88b 00000009 00000000 c2bc5a54 db9d0594
>   Call Trace:
>    [<db3e4a97>] dump_stack+0x76/0xaf
>    [<db07b82a>] __warn+0xea/0x110
>    [<db208463>] ? mem_cgroup_update_lru_size+0x103/0x110
>    [<db07b88b>] warn_slowpath_fmt+0x3b/0x40
>    [<db208463>] mem_cgroup_update_lru_size+0x103/0x110
>    [<db1b52a2>] isolate_lru_pages.isra.61+0x2e2/0x360
>    [<db1b6ffc>] shrink_active_list+0xac/0x2a0
>    [<db3f136e>] ? __delay+0xe/0x10
>    [<db1b772c>] shrink_node_memcg+0x53c/0x7a0
>    [<db1b7a3b>] shrink_node+0xab/0x2a0
>    [<db1b7cf6>] do_try_to_free_pages+0xc6/0x390
>    [<db1b8205>] try_to_free_pages+0x245/0x590
> 
> LRU list contents and counts are updated separately. Counts are updated
> before pages are added to the LRU and updated after pages are removed.
> The warning above is from a check in mem_cgroup_update_lru_size that
> ensures that list sizes of zero are empty.
> 
> The problem is that node-lru needs to account for highmem pages if
> CONFIG_HIGHMEM is set. One impact of the implementation is that the
> sizes are updated in multiple passes when pages from multiple zones are
> isolated. This happens whether HIGHMEM is set or not. When multiple zones
> are isolated, it's possible for a debugging check in memcg to be tripped.
> 
> This patch forces all the zone counts to be updated before the memcg
> function is called.
> 
> Reported-and-tested-by: Minchan Kim <minchan@kernel.org>
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Minchan Kim <minchan@kernel.org>

* Re: [PATCH 4/5] mm: show node_pages_scanned per node, not zone
  2016-07-15 13:09 ` [PATCH 4/5] mm: show node_pages_scanned per node, not zone Mel Gorman
@ 2016-07-16 10:14   ` Minchan Kim
  0 siblings, 0 replies; 16+ messages in thread
From: Minchan Kim @ 2016-07-16 10:14 UTC (permalink / raw)
  To: Mel Gorman, Andrew Morton
  Cc: Andrew Morton, Johannes Weiner, Vlastimil Babka, Linux-MM, LKML

On Fri, Jul 15, 2016 at 02:09:24PM +0100, Mel Gorman wrote:
> From: Minchan Kim <minchan@kernel.org>
> 
> node_pages_scanned represents the number of pages scanned on a node
> for reclaim, so it is pointless to show it in kilobytes.
> 
> In addition, node_pages_scanned is a per-node value, not a per-zone
> one.
> 
> This patch changes the node_pages_scanned display from per-zone
> kilobytes to a per-node count.
> 
> Signed-off-by: Minchan Kim <minchan@kernel.org>
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> ---
>  mm/page_alloc.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index f80a0e57dcc8..7edd311a63f1 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4345,6 +4345,7 @@ void show_free_areas(unsigned int filter)
>  #endif
>  			" writeback_tmp:%lukB"
>  			" unstable:%lukB"
> +			" pages_scanned:%lu"
>  			" all_unreclaimable? %s"
>  			"\n",
>  			pgdat->node_id,
> @@ -4367,6 +4368,7 @@ void show_free_areas(unsigned int filter)
>  			K(node_page_state(pgdat, NR_SHMEM)),
>  			K(node_page_state(pgdat, NR_WRITEBACK_TEMP)),
>  			K(node_page_state(pgdat, NR_UNSTABLE_NFS)),
> +			node_page_state(zone->zone_pgdat, NR_PAGES_SCANNED),

Oops, it should be pgdat, not zone->zone_pgdat.
Andrew, please fold it.

From 0b058f64335764b82439a3c24fad8573cc1474d7 Mon Sep 17 00:00:00 2001
From: Minchan Kim <minchan@kernel.org>
Date: Sat, 16 Jul 2016 19:07:51 +0900
Subject: [PATCH] mm: fix node_pages_scanned

Use pgdat for the node stat instead of zone->zone_pgdat.

Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 mm/page_alloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7edd311..7547b6b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4368,7 +4368,7 @@ void show_free_areas(unsigned int filter)
 			K(node_page_state(pgdat, NR_SHMEM)),
 			K(node_page_state(pgdat, NR_WRITEBACK_TEMP)),
 			K(node_page_state(pgdat, NR_UNSTABLE_NFS)),
-			node_page_state(zone->zone_pgdat, NR_PAGES_SCANNED),
+			node_page_state(pgdat, NR_PAGES_SCANNED),
 			!pgdat_reclaimable(pgdat) ? "yes" : "no");
 	}
 
-- 
1.9.1

* Re: [PATCH 1/5] mm, vmscan: make shrink_node decisions more node-centric -fix
  2016-07-15 13:09 ` [PATCH 1/5] mm, vmscan: make shrink_node decisions more node-centric -fix Mel Gorman
  2016-07-15 15:42   ` Minchan Kim
@ 2016-07-18 16:14   ` Johannes Weiner
  1 sibling, 0 replies; 16+ messages in thread
From: Johannes Weiner @ 2016-07-18 16:14 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Andrew Morton, Minchan Kim, Vlastimil Babka, Linux-MM, LKML

On Fri, Jul 15, 2016 at 02:09:21PM +0100, Mel Gorman wrote:
> The patch "mm, vmscan: make shrink_node decisions more node-centric"
> checks whether compaction is suitable on empty nodes. This is expensive
> rather than wrong but is worth fixing.
> 
> This is a fix to the mmotm patch
> mm-vmscan-make-shrink_node-decisions-more-node-centric.patch
> 
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

* Re: [PATCH 2/5] mm, vmscan: avoid passing in classzone_idx unnecessarily to compaction_ready -fix
  2016-07-15 13:09 ` [PATCH 2/5] mm, vmscan: avoid passing in classzone_idx unnecessarily to compaction_ready -fix Mel Gorman
  2016-07-15 15:50   ` Minchan Kim
@ 2016-07-18 16:15   ` Johannes Weiner
  1 sibling, 0 replies; 16+ messages in thread
From: Johannes Weiner @ 2016-07-18 16:15 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Andrew Morton, Minchan Kim, Vlastimil Babka, Linux-MM, LKML

On Fri, Jul 15, 2016 at 02:09:22PM +0100, Mel Gorman wrote:
> As pointed out by Vlastimil, there is a redundant check in shrink_zones
> since commit "mm, vmscan: avoid passing in classzone_idx unnecessarily to
> compaction_ready".  The zonelist iterator only returns zones that already
> meet the requirements of the allocation request.
> 
> This is a fix to the mmotm patch
> mm-vmscan-avoid-passing-in-classzone_idx-unnecessarily-to-compaction_ready.patch
> 
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

* Re: [PATCH 3/5] mm, pagevec: Release/reacquire lru_lock on pgdat change
  2016-07-15 13:09 ` [PATCH 3/5] mm, pagevec: Release/reacquire lru_lock on pgdat change Mel Gorman
  2016-07-15 15:53   ` Minchan Kim
@ 2016-07-18 16:20   ` Johannes Weiner
  1 sibling, 0 replies; 16+ messages in thread
From: Johannes Weiner @ 2016-07-18 16:20 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Andrew Morton, Minchan Kim, Vlastimil Babka, Linux-MM, LKML

On Fri, Jul 15, 2016 at 02:09:23PM +0100, Mel Gorman wrote:
> With node-lru, the locking is based on the pgdat. Previously it was
> required that a pagevec drain released one zone lru_lock and acquired
> another zone lru_lock on every zone change. Now, it's only necessary if
> the node changes. The end-result is fewer lock release/acquires if the
> pages are all on the same node but in different zones.
> 
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>

This could make quite a difference on some workloads, from a whole
series perspective, when considering that we had the round robin fair
zone allocator on top of this. Page batches that span multiple nodes
on the other hand are much less likely.

Acked-by: Johannes Weiner <hannes@cmpxchg.org>
