* [PATCH 0/6] mm: per-lruvec slab stats
From: Johannes Weiner @ 2017-05-30 18:17 UTC
  To: Josef Bacik
  Cc: Michal Hocko, Vladimir Davydov, Andrew Morton, Rik van Riel,
	linux-mm, cgroups, linux-kernel, kernel-team

Hi everyone,

Josef is working on a new approach to balancing slab caches and the
page cache. For this to work, he needs slab cache statistics on the
lruvec level. These patches implement that by adding infrastructure
for updating and reading generic VM stat items per lruvec, and then
switching some existing VM accounting sites, including the slab
accounting ones, over to this new cgroup-aware API.

I'll follow up with more patches on this, because the memory
controller can be simplified substantially once private memcg
accounting is replaced by making the existing VM accounting sites
cgroup-aware. But this is enough for Josef to base his slab reclaim
work on, so here goes.
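
For a rough sense of where the series ends up (patch 5), an
accounting site that currently has to update the node and the memcg
counters separately can issue a single lruvec-level call instead.
The function names below are taken from the patches; the pairing is
just an illustration, not a hunk from the series:

	/* before: two separate updates for the same event */
	__inc_node_page_state(page, NR_FILE_DIRTY);
	inc_memcg_page_state(page, NR_FILE_DIRTY);

	/* after: one cgroup-aware update that covers the node, the
	 * memcg and the lruvec counters in one go */
	__inc_lruvec_page_state(page, NR_FILE_DIRTY);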

 drivers/base/node.c        |  10 +-
 include/linux/memcontrol.h | 257 ++++++++++++++++++++++++++++++++++++---------
 include/linux/mmzone.h     |   4 +-
 include/linux/swap.h       |   1 -
 include/linux/vmstat.h     |   1 -
 kernel/fork.c              |   8 +-
 mm/memcontrol.c            |  14 ++-
 mm/page-writeback.c        |  15 +--
 mm/page_alloc.c            |   4 -
 mm/rmap.c                  |   8 +-
 mm/slab.c                  |  12 +--
 mm/slab.h                  |  18 +---
 mm/slub.c                  |   4 +-
 mm/vmscan.c                |  18 +---
 mm/vmstat.c                |   4 +-
 mm/workingset.c            |   9 +-
 16 files changed, 250 insertions(+), 137 deletions(-)

* [PATCH 1/6] mm: vmscan: delete unused pgdat_reclaimable_pages()
From: Johannes Weiner @ 2017-05-30 18:17 UTC
  To: Josef Bacik
  Cc: Michal Hocko, Vladimir Davydov, Andrew Morton, Rik van Riel,
	linux-mm, cgroups, linux-kernel, kernel-team

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 include/linux/swap.h |  1 -
 mm/vmscan.c          | 16 ----------------
 2 files changed, 17 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index ba5882419a7d..6e3d1d0a7f48 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -289,7 +289,6 @@ extern void lru_cache_add_active_or_unevictable(struct page *page,
 
 /* linux/mm/vmscan.c */
 extern unsigned long zone_reclaimable_pages(struct zone *zone);
-extern unsigned long pgdat_reclaimable_pages(struct pglist_data *pgdat);
 extern unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
 					gfp_t gfp_mask, nodemask_t *mask);
 extern int __isolate_lru_page(struct page *page, isolate_mode_t mode);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 8ad39bbc79e6..c5f9d1673392 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -219,22 +219,6 @@ unsigned long zone_reclaimable_pages(struct zone *zone)
 	return nr;
 }
 
-unsigned long pgdat_reclaimable_pages(struct pglist_data *pgdat)
-{
-	unsigned long nr;
-
-	nr = node_page_state_snapshot(pgdat, NR_ACTIVE_FILE) +
-	     node_page_state_snapshot(pgdat, NR_INACTIVE_FILE) +
-	     node_page_state_snapshot(pgdat, NR_ISOLATED_FILE);
-
-	if (get_nr_swap_pages() > 0)
-		nr += node_page_state_snapshot(pgdat, NR_ACTIVE_ANON) +
-		      node_page_state_snapshot(pgdat, NR_INACTIVE_ANON) +
-		      node_page_state_snapshot(pgdat, NR_ISOLATED_ANON);
-
-	return nr;
-}
-
 /**
  * lruvec_lru_size -  Returns the number of pages on the given LRU list.
  * @lruvec: lru vector
-- 
2.12.2

* [PATCH 2/6] mm: vmstat: move slab statistics from zone to node counters
From: Johannes Weiner @ 2017-05-30 18:17 UTC
  To: Josef Bacik
  Cc: Michal Hocko, Vladimir Davydov, Andrew Morton, Rik van Riel,
	linux-mm, cgroups, linux-kernel, kernel-team

To re-implement slab cache vs. page cache balancing, we'll need the
slab counters at the lruvec level. Ever since LRU reclaim was moved
from the zone to the node, the lruvec is the intersection of the
node (not the zone) and the memcg.

We could retain the per-zone counters for when the page allocator
dumps its memory information on failures, and have counters on both
levels - which on all but NUMA node 0 is usually redundant. But let's
keep it simple for now and just move them. If anybody complains we can
restore the per-zone counters.
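
Concretely, readers of the slab counters go from summing up the
per-zone counters of a node to reading the node-level counter
directly; both calls exist in the tree, the snippet just contrasts
them outside of the diff below:

	/* before: fold the per-zone counters up to the node */
	nr = sum_zone_node_page_state(pgdat->node_id, NR_SLAB_RECLAIMABLE);

	/* after: the counter is maintained at the node level */
	nr = node_page_state(pgdat, NR_SLAB_RECLAIMABLE);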

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 drivers/base/node.c    | 10 +++++-----
 include/linux/mmzone.h |  4 ++--
 mm/page_alloc.c        |  4 ----
 mm/slab.c              |  8 ++++----
 mm/slub.c              |  4 ++--
 mm/vmscan.c            |  2 +-
 mm/vmstat.c            |  4 ++--
 7 files changed, 16 insertions(+), 20 deletions(-)

diff --git a/drivers/base/node.c b/drivers/base/node.c
index 5548f9686016..e57e06e6df4c 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -129,11 +129,11 @@ static ssize_t node_read_meminfo(struct device *dev,
 		       nid, K(node_page_state(pgdat, NR_UNSTABLE_NFS)),
 		       nid, K(sum_zone_node_page_state(nid, NR_BOUNCE)),
 		       nid, K(node_page_state(pgdat, NR_WRITEBACK_TEMP)),
-		       nid, K(sum_zone_node_page_state(nid, NR_SLAB_RECLAIMABLE) +
-				sum_zone_node_page_state(nid, NR_SLAB_UNRECLAIMABLE)),
-		       nid, K(sum_zone_node_page_state(nid, NR_SLAB_RECLAIMABLE)),
+		       nid, K(node_page_state(pgdat, NR_SLAB_RECLAIMABLE) +
+			      node_page_state(pgdat, NR_SLAB_UNRECLAIMABLE)),
+		       nid, K(node_page_state(pgdat, NR_SLAB_RECLAIMABLE)),
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
-		       nid, K(sum_zone_node_page_state(nid, NR_SLAB_UNRECLAIMABLE)),
+		       nid, K(node_page_state(pgdat, NR_SLAB_UNRECLAIMABLE)),
 		       nid, K(node_page_state(pgdat, NR_ANON_THPS) *
 				       HPAGE_PMD_NR),
 		       nid, K(node_page_state(pgdat, NR_SHMEM_THPS) *
@@ -141,7 +141,7 @@ static ssize_t node_read_meminfo(struct device *dev,
 		       nid, K(node_page_state(pgdat, NR_SHMEM_PMDMAPPED) *
 				       HPAGE_PMD_NR));
 #else
-		       nid, K(sum_zone_node_page_state(nid, NR_SLAB_UNRECLAIMABLE)));
+		       nid, K(node_page_state(pgdat, NR_SLAB_UNRECLAIMABLE)));
 #endif
 	n += hugetlb_report_node_meminfo(nid, buf + n);
 	return n;
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index ebaccd4e7d8c..eacadee83964 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -125,8 +125,6 @@ enum zone_stat_item {
 	NR_ZONE_UNEVICTABLE,
 	NR_ZONE_WRITE_PENDING,	/* Count of dirty, writeback and unstable pages */
 	NR_MLOCK,		/* mlock()ed pages found and moved off LRU */
-	NR_SLAB_RECLAIMABLE,
-	NR_SLAB_UNRECLAIMABLE,
 	NR_PAGETABLE,		/* used for pagetables */
 	NR_KERNEL_STACK_KB,	/* measured in KiB */
 	/* Second 128 byte cacheline */
@@ -152,6 +150,8 @@ enum node_stat_item {
 	NR_INACTIVE_FILE,	/*  "     "     "   "       "         */
 	NR_ACTIVE_FILE,		/*  "     "     "   "       "         */
 	NR_UNEVICTABLE,		/*  "     "     "   "       "         */
+	NR_SLAB_RECLAIMABLE,
+	NR_SLAB_UNRECLAIMABLE,
 	NR_ISOLATED_ANON,	/* Temporary isolated pages from anon lru */
 	NR_ISOLATED_FILE,	/* Temporary isolated pages from file lru */
 	WORKINGSET_REFAULT,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f9e450c6b6e4..5f89cfaddc4b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4601,8 +4601,6 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
 			" present:%lukB"
 			" managed:%lukB"
 			" mlocked:%lukB"
-			" slab_reclaimable:%lukB"
-			" slab_unreclaimable:%lukB"
 			" kernel_stack:%lukB"
 			" pagetables:%lukB"
 			" bounce:%lukB"
@@ -4624,8 +4622,6 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
 			K(zone->present_pages),
 			K(zone->managed_pages),
 			K(zone_page_state(zone, NR_MLOCK)),
-			K(zone_page_state(zone, NR_SLAB_RECLAIMABLE)),
-			K(zone_page_state(zone, NR_SLAB_UNRECLAIMABLE)),
 			zone_page_state(zone, NR_KERNEL_STACK_KB),
 			K(zone_page_state(zone, NR_PAGETABLE)),
 			K(zone_page_state(zone, NR_BOUNCE)),
diff --git a/mm/slab.c b/mm/slab.c
index 2a31ee3c5814..b55853399559 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1425,10 +1425,10 @@ static struct page *kmem_getpages(struct kmem_cache *cachep, gfp_t flags,
 
 	nr_pages = (1 << cachep->gfporder);
 	if (cachep->flags & SLAB_RECLAIM_ACCOUNT)
-		add_zone_page_state(page_zone(page),
+		add_node_page_state(page_pgdat(page),
 			NR_SLAB_RECLAIMABLE, nr_pages);
 	else
-		add_zone_page_state(page_zone(page),
+		add_node_page_state(page_pgdat(page),
 			NR_SLAB_UNRECLAIMABLE, nr_pages);
 
 	__SetPageSlab(page);
@@ -1459,10 +1459,10 @@ static void kmem_freepages(struct kmem_cache *cachep, struct page *page)
 	kmemcheck_free_shadow(page, order);
 
 	if (cachep->flags & SLAB_RECLAIM_ACCOUNT)
-		sub_zone_page_state(page_zone(page),
+		sub_node_page_state(page_pgdat(page),
 				NR_SLAB_RECLAIMABLE, nr_freed);
 	else
-		sub_zone_page_state(page_zone(page),
+		sub_node_page_state(page_pgdat(page),
 				NR_SLAB_UNRECLAIMABLE, nr_freed);
 
 	BUG_ON(!PageSlab(page));
diff --git a/mm/slub.c b/mm/slub.c
index 57e5156f02be..673e72698d9b 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1615,7 +1615,7 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
 	if (!page)
 		return NULL;
 
-	mod_zone_page_state(page_zone(page),
+	mod_node_page_state(page_pgdat(page),
 		(s->flags & SLAB_RECLAIM_ACCOUNT) ?
 		NR_SLAB_RECLAIMABLE : NR_SLAB_UNRECLAIMABLE,
 		1 << oo_order(oo));
@@ -1655,7 +1655,7 @@ static void __free_slab(struct kmem_cache *s, struct page *page)
 
 	kmemcheck_free_shadow(page, compound_order(page));
 
-	mod_zone_page_state(page_zone(page),
+	mod_node_page_state(page_pgdat(page),
 		(s->flags & SLAB_RECLAIM_ACCOUNT) ?
 		NR_SLAB_RECLAIMABLE : NR_SLAB_UNRECLAIMABLE,
 		-pages);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index c5f9d1673392..5d187ee618c0 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3815,7 +3815,7 @@ int node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned int order)
 	 * unmapped file backed pages.
 	 */
 	if (node_pagecache_reclaimable(pgdat) <= pgdat->min_unmapped_pages &&
-	    sum_zone_node_page_state(pgdat->node_id, NR_SLAB_RECLAIMABLE) <= pgdat->min_slab_pages)
+	    node_page_state(pgdat, NR_SLAB_RECLAIMABLE) <= pgdat->min_slab_pages)
 		return NODE_RECLAIM_FULL;
 
 	/*
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 76f73670200a..a64f1c764f17 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -928,8 +928,6 @@ const char * const vmstat_text[] = {
 	"nr_zone_unevictable",
 	"nr_zone_write_pending",
 	"nr_mlock",
-	"nr_slab_reclaimable",
-	"nr_slab_unreclaimable",
 	"nr_page_table_pages",
 	"nr_kernel_stack",
 	"nr_bounce",
@@ -952,6 +950,8 @@ const char * const vmstat_text[] = {
 	"nr_inactive_file",
 	"nr_active_file",
 	"nr_unevictable",
+	"nr_slab_reclaimable",
+	"nr_slab_unreclaimable",
 	"nr_isolated_anon",
 	"nr_isolated_file",
 	"workingset_refault",
-- 
2.12.2

* [PATCH 3/6] mm: memcontrol: use the node-native slab memory counters
From: Johannes Weiner @ 2017-05-30 18:17 UTC
  To: Josef Bacik
  Cc: Michal Hocko, Vladimir Davydov, Andrew Morton, Rik van Riel,
	linux-mm, cgroups, linux-kernel, kernel-team

Now that the slab counters have been moved from the zone to the
node level, we can drop the private memcg node stats and use the
official ones.
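
This relies on the memcg statistics array being indexable by the
generic node stat items. A minimal sketch of that layout, assuming
the enum arrangement from the earlier memcontrol cleanups in this
tree (the memcg-private items continue the node_stat_item
namespace):

	enum memcg_stat_item {
		MEMCG_CACHE = NR_VM_NODE_STAT_ITEMS,
		MEMCG_RSS,
		/* ... */
		MEMCG_SOCK,
		MEMCG_KERNEL_STACK_KB,
		MEMCG_NR_STAT,
	};

With that, stat[NR_SLAB_RECLAIMABLE] and stat[MEMCG_SOCK] can index
the same per-cpu counter array without colliding.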

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 include/linux/memcontrol.h | 2 --
 mm/memcontrol.c            | 8 ++++----
 mm/slab.h                  | 4 ++--
 3 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 899949bbb2f9..7b8f0f239fd6 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -44,8 +44,6 @@ enum memcg_stat_item {
 	MEMCG_SOCK,
 	/* XXX: why are these zone and not node counters? */
 	MEMCG_KERNEL_STACK_KB,
-	MEMCG_SLAB_RECLAIMABLE,
-	MEMCG_SLAB_UNRECLAIMABLE,
 	MEMCG_NR_STAT,
 };
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 94172089f52f..9c68a40c83e3 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5197,8 +5197,8 @@ static int memory_stat_show(struct seq_file *m, void *v)
 	seq_printf(m, "kernel_stack %llu\n",
 		   (u64)stat[MEMCG_KERNEL_STACK_KB] * 1024);
 	seq_printf(m, "slab %llu\n",
-		   (u64)(stat[MEMCG_SLAB_RECLAIMABLE] +
-			 stat[MEMCG_SLAB_UNRECLAIMABLE]) * PAGE_SIZE);
+		   (u64)(stat[NR_SLAB_RECLAIMABLE] +
+			 stat[NR_SLAB_UNRECLAIMABLE]) * PAGE_SIZE);
 	seq_printf(m, "sock %llu\n",
 		   (u64)stat[MEMCG_SOCK] * PAGE_SIZE);
 
@@ -5222,9 +5222,9 @@ static int memory_stat_show(struct seq_file *m, void *v)
 	}
 
 	seq_printf(m, "slab_reclaimable %llu\n",
-		   (u64)stat[MEMCG_SLAB_RECLAIMABLE] * PAGE_SIZE);
+		   (u64)stat[NR_SLAB_RECLAIMABLE] * PAGE_SIZE);
 	seq_printf(m, "slab_unreclaimable %llu\n",
-		   (u64)stat[MEMCG_SLAB_UNRECLAIMABLE] * PAGE_SIZE);
+		   (u64)stat[NR_SLAB_UNRECLAIMABLE] * PAGE_SIZE);
 
 	/* Accumulated memory events */
 
diff --git a/mm/slab.h b/mm/slab.h
index 9cfcf099709c..69f0579cb5aa 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -287,7 +287,7 @@ static __always_inline int memcg_charge_slab(struct page *page,
 
 	memcg_kmem_update_page_stat(page,
 			(s->flags & SLAB_RECLAIM_ACCOUNT) ?
-			MEMCG_SLAB_RECLAIMABLE : MEMCG_SLAB_UNRECLAIMABLE,
+			NR_SLAB_RECLAIMABLE : NR_SLAB_UNRECLAIMABLE,
 			1 << order);
 	return 0;
 }
@@ -300,7 +300,7 @@ static __always_inline void memcg_uncharge_slab(struct page *page, int order,
 
 	memcg_kmem_update_page_stat(page,
 			(s->flags & SLAB_RECLAIM_ACCOUNT) ?
-			MEMCG_SLAB_RECLAIMABLE : MEMCG_SLAB_UNRECLAIMABLE,
+			NR_SLAB_RECLAIMABLE : NR_SLAB_UNRECLAIMABLE,
 			-(1 << order));
 	memcg_kmem_uncharge(page, order);
 }
-- 
2.12.2

* [PATCH 4/6] mm: memcontrol: use generic mod_memcg_page_state for kmem pages
From: Johannes Weiner @ 2017-05-30 18:17 UTC
  To: Josef Bacik
  Cc: Michal Hocko, Vladimir Davydov, Andrew Morton, Rik van Riel,
	linux-mm, cgroups, linux-kernel, kernel-team

The kmem-specific functions do the same thing. Switch and drop.
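
For reference, the generic helper that takes over (already in
memcontrol.h) reads roughly:

	static inline void mod_memcg_page_state(struct page *page,
						enum memcg_stat_item idx, int val)
	{
		if (page->mem_cgroup)
			mod_memcg_state(page->mem_cgroup, idx, val);
	}

which matches the deleted memcg_kmem_update_page_stat() except for
the dropped memcg_kmem_enabled() check - presumably harmless at
these call sites, since page->mem_cgroup is only set on pages that
were actually charged.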

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 include/linux/memcontrol.h | 17 -----------------
 kernel/fork.c              |  8 ++++----
 mm/slab.h                  | 16 ++++++++--------
 3 files changed, 12 insertions(+), 29 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 7b8f0f239fd6..62139aff6033 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -884,19 +884,6 @@ static inline int memcg_cache_id(struct mem_cgroup *memcg)
 	return memcg ? memcg->kmemcg_id : -1;
 }
 
-/**
- * memcg_kmem_update_page_stat - update kmem page state statistics
- * @page: the page
- * @idx: page state item to account
- * @val: number of pages (positive or negative)
- */
-static inline void memcg_kmem_update_page_stat(struct page *page,
-				enum memcg_stat_item idx, int val)
-{
-	if (memcg_kmem_enabled() && page->mem_cgroup)
-		this_cpu_add(page->mem_cgroup->stat->count[idx], val);
-}
-
 #else
 #define for_each_memcg_cache_index(_idx)	\
 	for (; NULL; )
@@ -919,10 +906,6 @@ static inline void memcg_put_cache_ids(void)
 {
 }
 
-static inline void memcg_kmem_update_page_stat(struct page *page,
-				enum memcg_stat_item idx, int val)
-{
-}
 #endif /* CONFIG_MEMCG && !CONFIG_SLOB */
 
 #endif /* _LINUX_MEMCONTROL_H */
diff --git a/kernel/fork.c b/kernel/fork.c
index aa1076c5e4a9..b5f45fe81a43 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -326,8 +326,8 @@ static void account_kernel_stack(struct task_struct *tsk, int account)
 		}
 
 		/* All stack pages belong to the same memcg. */
-		memcg_kmem_update_page_stat(vm->pages[0], MEMCG_KERNEL_STACK_KB,
-					    account * (THREAD_SIZE / 1024));
+		mod_memcg_page_state(vm->pages[0], MEMCG_KERNEL_STACK_KB,
+				     account * (THREAD_SIZE / 1024));
 	} else {
 		/*
 		 * All stack pages are in the same zone and belong to the
@@ -338,8 +338,8 @@ static void account_kernel_stack(struct task_struct *tsk, int account)
 		mod_zone_page_state(page_zone(first_page), NR_KERNEL_STACK_KB,
 				    THREAD_SIZE / 1024 * account);
 
-		memcg_kmem_update_page_stat(first_page, MEMCG_KERNEL_STACK_KB,
-					    account * (THREAD_SIZE / 1024));
+		mod_memcg_page_state(first_page, MEMCG_KERNEL_STACK_KB,
+				     account * (THREAD_SIZE / 1024));
 	}
 }
 
diff --git a/mm/slab.h b/mm/slab.h
index 69f0579cb5aa..7b84e3839dfe 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -285,10 +285,10 @@ static __always_inline int memcg_charge_slab(struct page *page,
 	if (ret)
 		return ret;
 
-	memcg_kmem_update_page_stat(page,
-			(s->flags & SLAB_RECLAIM_ACCOUNT) ?
-			NR_SLAB_RECLAIMABLE : NR_SLAB_UNRECLAIMABLE,
-			1 << order);
+	mod_memcg_page_state(page,
+			     (s->flags & SLAB_RECLAIM_ACCOUNT) ?
+			     NR_SLAB_RECLAIMABLE : NR_SLAB_UNRECLAIMABLE,
+			     1 << order);
 	return 0;
 }
 
@@ -298,10 +298,10 @@ static __always_inline void memcg_uncharge_slab(struct page *page, int order,
 	if (!memcg_kmem_enabled())
 		return;
 
-	memcg_kmem_update_page_stat(page,
-			(s->flags & SLAB_RECLAIM_ACCOUNT) ?
-			NR_SLAB_RECLAIMABLE : NR_SLAB_UNRECLAIMABLE,
-			-(1 << order));
+	mod_memcg_page_state(page,
+			     (s->flags & SLAB_RECLAIM_ACCOUNT) ?
+			     NR_SLAB_RECLAIMABLE : NR_SLAB_UNRECLAIMABLE,
+			     -(1 << order));
 	memcg_kmem_uncharge(page, order);
 }
 
-- 
2.12.2

* [PATCH 5/6] mm: memcontrol: per-lruvec stats infrastructure
From: Johannes Weiner @ 2017-05-30 18:17 UTC
  To: Josef Bacik
  Cc: Michal Hocko, Vladimir Davydov, Andrew Morton, Rik van Riel,
	linux-mm, cgroups, linux-kernel, kernel-team

lruvecs are at the intersection of the NUMA node and memcg, which is
the scope for most paging activity.

Introduce a convenient accounting infrastructure that maintains
statistics per node, per memcg, and the lruvec itself.

Then convert over accounting sites for statistics that are already
tracked in both nodes and memcgs and can be easily switched.
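
A minimal usage sketch: the accessor calls are real (added by this
patch), the surrounding function is made up. One update fans out to
the node counter, the memcg counter, and the new per-lruvec counter:

	/* hypothetical accounting site; the __ variant is irq-unsafe,
	 * so the caller must not race with interrupt-context updates */
	static void account_file_dirty(struct page *page)
	{
		__mod_lruvec_page_state(page, NR_FILE_DIRTY, 1);
	}

Reading back is symmetrical, e.g. for the slab reclaim work this
series is meant to enable:

	nr = lruvec_page_state(lruvec, NR_SLAB_RECLAIMABLE);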

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 include/linux/memcontrol.h | 238 +++++++++++++++++++++++++++++++++++++++------
 include/linux/vmstat.h     |   1 -
 mm/memcontrol.c            |   6 ++
 mm/page-writeback.c        |  15 +--
 mm/rmap.c                  |   8 +-
 mm/workingset.c            |   9 +-
 6 files changed, 225 insertions(+), 52 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 62139aff6033..a282eb2a6cc3 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -26,7 +26,8 @@
 #include <linux/page_counter.h>
 #include <linux/vmpressure.h>
 #include <linux/eventfd.h>
-#include <linux/mmzone.h>
+#include <linux/mm.h>
+#include <linux/vmstat.h>
 #include <linux/writeback.h>
 #include <linux/page-flags.h>
 
@@ -98,11 +99,16 @@ struct mem_cgroup_reclaim_iter {
 	unsigned int generation;
 };
 
+struct lruvec_stat {
+	long count[NR_VM_NODE_STAT_ITEMS];
+};
+
 /*
  * per-zone information in memory controller.
  */
 struct mem_cgroup_per_node {
 	struct lruvec		lruvec;
+	struct lruvec_stat __percpu *lruvec_stat;
 	unsigned long		lru_zone_size[MAX_NR_ZONES][NR_LRU_LISTS];
 
 	struct mem_cgroup_reclaim_iter	iter[DEF_PRIORITY + 1];
@@ -485,23 +491,18 @@ static inline unsigned long memcg_page_state(struct mem_cgroup *memcg,
 	return val;
 }
 
-static inline void mod_memcg_state(struct mem_cgroup *memcg,
-				   enum memcg_stat_item idx, int val)
+static inline void __mod_memcg_state(struct mem_cgroup *memcg,
+				     enum memcg_stat_item idx, int val)
 {
 	if (!mem_cgroup_disabled())
-		this_cpu_add(memcg->stat->count[idx], val);
-}
-
-static inline void inc_memcg_state(struct mem_cgroup *memcg,
-				   enum memcg_stat_item idx)
-{
-	mod_memcg_state(memcg, idx, 1);
+		__this_cpu_add(memcg->stat->count[idx], val);
 }
 
-static inline void dec_memcg_state(struct mem_cgroup *memcg,
-				   enum memcg_stat_item idx)
+static inline void mod_memcg_state(struct mem_cgroup *memcg,
+				   enum memcg_stat_item idx, int val)
 {
-	mod_memcg_state(memcg, idx, -1);
+	if (!mem_cgroup_disabled())
+		this_cpu_add(memcg->stat->count[idx], val);
 }
 
 /**
@@ -521,6 +522,13 @@ static inline void dec_memcg_state(struct mem_cgroup *memcg,
  *
  * Kernel pages are an exception to this, since they'll never move.
  */
+static inline void __mod_memcg_page_state(struct page *page,
+					  enum memcg_stat_item idx, int val)
+{
+	if (page->mem_cgroup)
+		__mod_memcg_state(page->mem_cgroup, idx, val);
+}
+
 static inline void mod_memcg_page_state(struct page *page,
 					enum memcg_stat_item idx, int val)
 {
@@ -528,16 +536,68 @@ static inline void mod_memcg_page_state(struct page *page,
 		mod_memcg_state(page->mem_cgroup, idx, val);
 }
 
-static inline void inc_memcg_page_state(struct page *page,
-					enum memcg_stat_item idx)
+static inline unsigned long lruvec_page_state(struct lruvec *lruvec,
+					      enum node_stat_item idx)
 {
-	mod_memcg_page_state(page, idx, 1);
+	struct mem_cgroup_per_node *pn;
+	long val = 0;
+	int cpu;
+
+	if (mem_cgroup_disabled())
+		return node_page_state(lruvec_pgdat(lruvec), idx);
+
+	pn = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
+	for_each_possible_cpu(cpu)
+		val += per_cpu(pn->lruvec_stat->count[idx], cpu);
+
+	if (val < 0)
+		val = 0;
+
+	return val;
 }
 
-static inline void dec_memcg_page_state(struct page *page,
-					enum memcg_stat_item idx)
+static inline void __mod_lruvec_state(struct lruvec *lruvec,
+				      enum node_stat_item idx, int val)
 {
-	mod_memcg_page_state(page, idx, -1);
+	struct mem_cgroup_per_node *pn;
+
+	__mod_node_page_state(lruvec_pgdat(lruvec), idx, val);
+	if (mem_cgroup_disabled())
+		return;
+	pn = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
+	__mod_memcg_state(pn->memcg, idx, val);
+	__this_cpu_add(pn->lruvec_stat->count[idx], val);
+}
+
+static inline void mod_lruvec_state(struct lruvec *lruvec,
+				    enum node_stat_item idx, int val)
+{
+	struct mem_cgroup_per_node *pn;
+
+	mod_node_page_state(lruvec_pgdat(lruvec), idx, val);
+	if (mem_cgroup_disabled())
+		return;
+	pn = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
+	mod_memcg_state(pn->memcg, idx, val);
+	this_cpu_add(pn->lruvec_stat->count[idx], val);
+}
+
+static inline void __mod_lruvec_page_state(struct page *page,
+					   enum node_stat_item idx, int val)
+{
+	struct lruvec *lruvec;
+
+	lruvec = mem_cgroup_lruvec(page_pgdat(page), page->mem_cgroup);
+	__mod_lruvec_state(lruvec, idx, val);
+}
+
+static inline void mod_lruvec_page_state(struct page *page,
+					 enum node_stat_item idx, int val)
+{
+	struct lruvec *lruvec;
+
+	lruvec = mem_cgroup_lruvec(page_pgdat(page), page->mem_cgroup);
+	mod_lruvec_state(lruvec, idx, val);
 }
 
 unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
@@ -743,19 +803,21 @@ static inline unsigned long memcg_page_state(struct mem_cgroup *memcg,
 	return 0;
 }
 
-static inline void mod_memcg_state(struct mem_cgroup *memcg,
-				   enum memcg_stat_item idx,
-				   int nr)
+static inline void __mod_memcg_state(struct mem_cgroup *memcg,
+				     enum memcg_stat_item idx,
+				     int nr)
 {
 }
 
-static inline void inc_memcg_state(struct mem_cgroup *memcg,
-				   enum memcg_stat_item idx)
+static inline void mod_memcg_state(struct mem_cgroup *memcg,
+				   enum memcg_stat_item idx,
+				   int nr)
 {
 }
 
-static inline void dec_memcg_state(struct mem_cgroup *memcg,
-				   enum memcg_stat_item idx)
+static inline void __mod_memcg_page_state(struct page *page,
+					  enum memcg_stat_item idx,
+					  int nr)
 {
 }
 
@@ -765,14 +827,34 @@ static inline void mod_memcg_page_state(struct page *page,
 {
 }
 
-static inline void inc_memcg_page_state(struct page *page,
-					enum memcg_stat_item idx)
+static inline unsigned long lruvec_page_state(struct lruvec *lruvec,
+					      enum node_stat_item idx)
 {
+	return node_page_state(lruvec_pgdat(lruvec), idx);
 }
 
-static inline void dec_memcg_page_state(struct page *page,
-					enum memcg_stat_item idx)
+static inline void __mod_lruvec_state(struct lruvec *lruvec,
+				      enum node_stat_item idx, int val)
+{
+	__mod_node_page_state(lruvec_pgdat(lruvec), idx, val);
+}
+
+static inline void mod_lruvec_state(struct lruvec *lruvec,
+				    enum node_stat_item idx, int val)
+{
+	mod_node_page_state(lruvec_pgdat(lruvec), idx, val);
+}
+
+static inline void __mod_lruvec_page_state(struct page *page,
+					   enum node_stat_item idx, int val)
+{
+	__mod_node_page_state(page_pgdat(page), idx, val);
+}
+
+static inline void mod_lruvec_page_state(struct page *page,
+					 enum node_stat_item idx, int val)
 {
+	mod_node_page_state(page_pgdat(page), idx, val);
 }
 
 static inline
@@ -793,6 +875,102 @@ void mem_cgroup_count_vm_event(struct mm_struct *mm, enum vm_event_item idx)
 }
 #endif /* CONFIG_MEMCG */
 
+static inline void __inc_memcg_state(struct mem_cgroup *memcg,
+				     enum memcg_stat_item idx)
+{
+	__mod_memcg_state(memcg, idx, 1);
+}
+
+static inline void __dec_memcg_state(struct mem_cgroup *memcg,
+				     enum memcg_stat_item idx)
+{
+	__mod_memcg_state(memcg, idx, -1);
+}
+
+static inline void __inc_memcg_page_state(struct page *page,
+					  enum memcg_stat_item idx)
+{
+	__mod_memcg_page_state(page, idx, 1);
+}
+
+static inline void __dec_memcg_page_state(struct page *page,
+					  enum memcg_stat_item idx)
+{
+	__mod_memcg_page_state(page, idx, -1);
+}
+
+static inline void __inc_lruvec_state(struct lruvec *lruvec,
+				      enum node_stat_item idx)
+{
+	__mod_lruvec_state(lruvec, idx, 1);
+}
+
+static inline void __dec_lruvec_state(struct lruvec *lruvec,
+				      enum node_stat_item idx)
+{
+	__mod_lruvec_state(lruvec, idx, -1);
+}
+
+static inline void __inc_lruvec_page_state(struct page *page,
+					   enum node_stat_item idx)
+{
+	__mod_lruvec_page_state(page, idx, 1);
+}
+
+static inline void __dec_lruvec_page_state(struct page *page,
+					   enum node_stat_item idx)
+{
+	__mod_lruvec_page_state(page, idx, -1);
+}
+
+static inline void inc_memcg_state(struct mem_cgroup *memcg,
+				   enum memcg_stat_item idx)
+{
+	mod_memcg_state(memcg, idx, 1);
+}
+
+static inline void dec_memcg_state(struct mem_cgroup *memcg,
+				   enum memcg_stat_item idx)
+{
+	mod_memcg_state(memcg, idx, -1);
+}
+
+static inline void inc_memcg_page_state(struct page *page,
+					enum memcg_stat_item idx)
+{
+	mod_memcg_page_state(page, idx, 1);
+}
+
+static inline void dec_memcg_page_state(struct page *page,
+					enum memcg_stat_item idx)
+{
+	mod_memcg_page_state(page, idx, -1);
+}
+
+static inline void inc_lruvec_state(struct lruvec *lruvec,
+				    enum node_stat_item idx)
+{
+	mod_lruvec_state(lruvec, idx, 1);
+}
+
+static inline void dec_lruvec_state(struct lruvec *lruvec,
+				    enum node_stat_item idx)
+{
+	mod_lruvec_state(lruvec, idx, -1);
+}
+
+static inline void inc_lruvec_page_state(struct page *page,
+					 enum node_stat_item idx)
+{
+	mod_lruvec_page_state(page, idx, 1);
+}
+
+static inline void dec_lruvec_page_state(struct page *page,
+					 enum node_stat_item idx)
+{
+	mod_lruvec_page_state(page, idx, -1);
+}
+
 #ifdef CONFIG_CGROUP_WRITEBACK
 
 struct list_head *mem_cgroup_cgwb_list(struct mem_cgroup *memcg);
diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
index 613771909b6e..b3d85f30d424 100644
--- a/include/linux/vmstat.h
+++ b/include/linux/vmstat.h
@@ -3,7 +3,6 @@
 
 #include <linux/types.h>
 #include <linux/percpu.h>
-#include <linux/mm.h>
 #include <linux/mmzone.h>
 #include <linux/vm_event_item.h>
 #include <linux/atomic.h>
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 9c68a40c83e3..e37908606c0f 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4122,6 +4122,12 @@ static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
 	if (!pn)
 		return 1;
 
+	pn->lruvec_stat = alloc_percpu(struct lruvec_stat);
+	if (!pn->lruvec_stat) {
+		kfree(pn);
+		return 1;
+	}
+
 	lruvec_init(&pn->lruvec);
 	pn->usage_in_excess = 0;
 	pn->on_tree = false;
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 143c1c25d680..8989eada0ef7 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2433,8 +2433,7 @@ void account_page_dirtied(struct page *page, struct address_space *mapping)
 		inode_attach_wb(inode, page);
 		wb = inode_to_wb(inode);
 
-		inc_memcg_page_state(page, NR_FILE_DIRTY);
-		__inc_node_page_state(page, NR_FILE_DIRTY);
+		__inc_lruvec_page_state(page, NR_FILE_DIRTY);
 		__inc_zone_page_state(page, NR_ZONE_WRITE_PENDING);
 		__inc_node_page_state(page, NR_DIRTIED);
 		__inc_wb_stat(wb, WB_RECLAIMABLE);
@@ -2455,8 +2454,7 @@ void account_page_cleaned(struct page *page, struct address_space *mapping,
 			  struct bdi_writeback *wb)
 {
 	if (mapping_cap_account_dirty(mapping)) {
-		dec_memcg_page_state(page, NR_FILE_DIRTY);
-		dec_node_page_state(page, NR_FILE_DIRTY);
+		dec_lruvec_page_state(page, NR_FILE_DIRTY);
 		dec_zone_page_state(page, NR_ZONE_WRITE_PENDING);
 		dec_wb_stat(wb, WB_RECLAIMABLE);
 		task_io_account_cancelled_write(PAGE_SIZE);
@@ -2712,8 +2710,7 @@ int clear_page_dirty_for_io(struct page *page)
 		 */
 		wb = unlocked_inode_to_wb_begin(inode, &locked);
 		if (TestClearPageDirty(page)) {
-			dec_memcg_page_state(page, NR_FILE_DIRTY);
-			dec_node_page_state(page, NR_FILE_DIRTY);
+			dec_lruvec_page_state(page, NR_FILE_DIRTY);
 			dec_zone_page_state(page, NR_ZONE_WRITE_PENDING);
 			dec_wb_stat(wb, WB_RECLAIMABLE);
 			ret = 1;
@@ -2759,8 +2756,7 @@ int test_clear_page_writeback(struct page *page)
 		ret = TestClearPageWriteback(page);
 	}
 	if (ret) {
-		dec_memcg_page_state(page, NR_WRITEBACK);
-		dec_node_page_state(page, NR_WRITEBACK);
+		dec_lruvec_page_state(page, NR_WRITEBACK);
 		dec_zone_page_state(page, NR_ZONE_WRITE_PENDING);
 		inc_node_page_state(page, NR_WRITTEN);
 	}
@@ -2814,8 +2810,7 @@ int __test_set_page_writeback(struct page *page, bool keep_write)
 		ret = TestSetPageWriteback(page);
 	}
 	if (!ret) {
-		inc_memcg_page_state(page, NR_WRITEBACK);
-		inc_node_page_state(page, NR_WRITEBACK);
+		inc_lruvec_page_state(page, NR_WRITEBACK);
 		inc_zone_page_state(page, NR_ZONE_WRITE_PENDING);
 	}
 	unlock_page_memcg(page);
diff --git a/mm/rmap.c b/mm/rmap.c
index d405f0e0ee96..8ee842aa06ee 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1157,8 +1157,7 @@ void page_add_file_rmap(struct page *page, bool compound)
 		if (!atomic_inc_and_test(&page->_mapcount))
 			goto out;
 	}
-	__mod_node_page_state(page_pgdat(page), NR_FILE_MAPPED, nr);
-	mod_memcg_page_state(page, NR_FILE_MAPPED, nr);
+	__mod_lruvec_page_state(page, NR_FILE_MAPPED, nr);
 out:
 	unlock_page_memcg(page);
 }
@@ -1193,12 +1192,11 @@ static void page_remove_file_rmap(struct page *page, bool compound)
 	}
 
 	/*
-	 * We use the irq-unsafe __{inc|mod}_zone_page_state because
+	 * We use the irq-unsafe __{inc|mod}_lruvec_page_state because
 	 * these counters are not modified in interrupt context, and
 	 * pte lock(a spinlock) is held, which implies preemption disabled.
 	 */
-	__mod_node_page_state(page_pgdat(page), NR_FILE_MAPPED, -nr);
-	mod_memcg_page_state(page, NR_FILE_MAPPED, -nr);
+	__mod_lruvec_page_state(page, NR_FILE_MAPPED, -nr);
 
 	if (unlikely(PageMlocked(page)))
 		clear_page_mlock(page);
diff --git a/mm/workingset.c b/mm/workingset.c
index b8c9ab678479..7119cd745ace 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -288,12 +288,10 @@ bool workingset_refault(void *shadow)
 	 */
 	refault_distance = (refault - eviction) & EVICTION_MASK;
 
-	inc_node_state(pgdat, WORKINGSET_REFAULT);
-	inc_memcg_state(memcg, WORKINGSET_REFAULT);
+	inc_lruvec_state(lruvec, WORKINGSET_REFAULT);
 
 	if (refault_distance <= active_file) {
-		inc_node_state(pgdat, WORKINGSET_ACTIVATE);
-		inc_memcg_state(memcg, WORKINGSET_ACTIVATE);
+		inc_lruvec_state(lruvec, WORKINGSET_ACTIVATE);
 		rcu_read_unlock();
 		return true;
 	}
@@ -474,8 +472,7 @@ static enum lru_status shadow_lru_isolate(struct list_head *item,
 	}
 	if (WARN_ON_ONCE(node->exceptional))
 		goto out_invalid;
-	inc_node_state(page_pgdat(virt_to_page(node)), WORKINGSET_NODERECLAIM);
-	inc_memcg_page_state(virt_to_page(node), WORKINGSET_NODERECLAIM);
+	inc_lruvec_page_state(virt_to_page(node), WORKINGSET_NODERECLAIM);
 	__radix_tree_delete_node(&mapping->page_tree, node,
 				 workingset_update_node, mapping);
 
-- 
2.12.2

+					  enum memcg_stat_item idx)
+{
+	__mod_memcg_page_state(page, idx, -1);
+}
+
+static inline void __inc_lruvec_state(struct lruvec *lruvec,
+				      enum node_stat_item idx)
+{
+	__mod_lruvec_state(lruvec, idx, 1);
+}
+
+static inline void __dec_lruvec_state(struct lruvec *lruvec,
+				      enum node_stat_item idx)
+{
+	__mod_lruvec_state(lruvec, idx, -1);
+}
+
+static inline void __inc_lruvec_page_state(struct page *page,
+					   enum node_stat_item idx)
+{
+	__mod_lruvec_page_state(page, idx, 1);
+}
+
+static inline void __dec_lruvec_page_state(struct page *page,
+					   enum node_stat_item idx)
+{
+	__mod_lruvec_page_state(page, idx, -1);
+}
+
+static inline void inc_memcg_state(struct mem_cgroup *memcg,
+				   enum memcg_stat_item idx)
+{
+	mod_memcg_state(memcg, idx, 1);
+}
+
+static inline void dec_memcg_state(struct mem_cgroup *memcg,
+				   enum memcg_stat_item idx)
+{
+	mod_memcg_state(memcg, idx, -1);
+}
+
+static inline void inc_memcg_page_state(struct page *page,
+					enum memcg_stat_item idx)
+{
+	mod_memcg_page_state(page, idx, 1);
+}
+
+static inline void dec_memcg_page_state(struct page *page,
+					enum memcg_stat_item idx)
+{
+	mod_memcg_page_state(page, idx, -1);
+}
+
+static inline void inc_lruvec_state(struct lruvec *lruvec,
+				    enum node_stat_item idx)
+{
+	mod_lruvec_state(lruvec, idx, 1);
+}
+
+static inline void dec_lruvec_state(struct lruvec *lruvec,
+				    enum node_stat_item idx)
+{
+	mod_lruvec_state(lruvec, idx, -1);
+}
+
+static inline void inc_lruvec_page_state(struct page *page,
+					 enum node_stat_item idx)
+{
+	mod_lruvec_page_state(page, idx, 1);
+}
+
+static inline void dec_lruvec_page_state(struct page *page,
+					 enum node_stat_item idx)
+{
+	mod_lruvec_page_state(page, idx, -1);
+}
+
 #ifdef CONFIG_CGROUP_WRITEBACK
 
 struct list_head *mem_cgroup_cgwb_list(struct mem_cgroup *memcg);
diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
index 613771909b6e..b3d85f30d424 100644
--- a/include/linux/vmstat.h
+++ b/include/linux/vmstat.h
@@ -3,7 +3,6 @@
 
 #include <linux/types.h>
 #include <linux/percpu.h>
-#include <linux/mm.h>
 #include <linux/mmzone.h>
 #include <linux/vm_event_item.h>
 #include <linux/atomic.h>
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 9c68a40c83e3..e37908606c0f 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4122,6 +4122,12 @@ static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
 	if (!pn)
 		return 1;
 
+	pn->lruvec_stat = alloc_percpu(struct lruvec_stat);
+	if (!pn->lruvec_stat) {
+		kfree(pn);
+		return 1;
+	}
+
 	lruvec_init(&pn->lruvec);
 	pn->usage_in_excess = 0;
 	pn->on_tree = false;
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 143c1c25d680..8989eada0ef7 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2433,8 +2433,7 @@ void account_page_dirtied(struct page *page, struct address_space *mapping)
 		inode_attach_wb(inode, page);
 		wb = inode_to_wb(inode);
 
-		inc_memcg_page_state(page, NR_FILE_DIRTY);
-		__inc_node_page_state(page, NR_FILE_DIRTY);
+		__inc_lruvec_page_state(page, NR_FILE_DIRTY);
 		__inc_zone_page_state(page, NR_ZONE_WRITE_PENDING);
 		__inc_node_page_state(page, NR_DIRTIED);
 		__inc_wb_stat(wb, WB_RECLAIMABLE);
@@ -2455,8 +2454,7 @@ void account_page_cleaned(struct page *page, struct address_space *mapping,
 			  struct bdi_writeback *wb)
 {
 	if (mapping_cap_account_dirty(mapping)) {
-		dec_memcg_page_state(page, NR_FILE_DIRTY);
-		dec_node_page_state(page, NR_FILE_DIRTY);
+		dec_lruvec_page_state(page, NR_FILE_DIRTY);
 		dec_zone_page_state(page, NR_ZONE_WRITE_PENDING);
 		dec_wb_stat(wb, WB_RECLAIMABLE);
 		task_io_account_cancelled_write(PAGE_SIZE);
@@ -2712,8 +2710,7 @@ int clear_page_dirty_for_io(struct page *page)
 		 */
 		wb = unlocked_inode_to_wb_begin(inode, &locked);
 		if (TestClearPageDirty(page)) {
-			dec_memcg_page_state(page, NR_FILE_DIRTY);
-			dec_node_page_state(page, NR_FILE_DIRTY);
+			dec_lruvec_page_state(page, NR_FILE_DIRTY);
 			dec_zone_page_state(page, NR_ZONE_WRITE_PENDING);
 			dec_wb_stat(wb, WB_RECLAIMABLE);
 			ret = 1;
@@ -2759,8 +2756,7 @@ int test_clear_page_writeback(struct page *page)
 		ret = TestClearPageWriteback(page);
 	}
 	if (ret) {
-		dec_memcg_page_state(page, NR_WRITEBACK);
-		dec_node_page_state(page, NR_WRITEBACK);
+		dec_lruvec_page_state(page, NR_WRITEBACK);
 		dec_zone_page_state(page, NR_ZONE_WRITE_PENDING);
 		inc_node_page_state(page, NR_WRITTEN);
 	}
@@ -2814,8 +2810,7 @@ int __test_set_page_writeback(struct page *page, bool keep_write)
 		ret = TestSetPageWriteback(page);
 	}
 	if (!ret) {
-		inc_memcg_page_state(page, NR_WRITEBACK);
-		inc_node_page_state(page, NR_WRITEBACK);
+		inc_lruvec_page_state(page, NR_WRITEBACK);
 		inc_zone_page_state(page, NR_ZONE_WRITE_PENDING);
 	}
 	unlock_page_memcg(page);
diff --git a/mm/rmap.c b/mm/rmap.c
index d405f0e0ee96..8ee842aa06ee 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1157,8 +1157,7 @@ void page_add_file_rmap(struct page *page, bool compound)
 		if (!atomic_inc_and_test(&page->_mapcount))
 			goto out;
 	}
-	__mod_node_page_state(page_pgdat(page), NR_FILE_MAPPED, nr);
-	mod_memcg_page_state(page, NR_FILE_MAPPED, nr);
+	__mod_lruvec_page_state(page, NR_FILE_MAPPED, nr);
 out:
 	unlock_page_memcg(page);
 }
@@ -1193,12 +1192,11 @@ static void page_remove_file_rmap(struct page *page, bool compound)
 	}
 
 	/*
-	 * We use the irq-unsafe __{inc|mod}_zone_page_state because
+	 * We use the irq-unsafe __{inc|mod}_lruvec_page_state because
 	 * these counters are not modified in interrupt context, and
 	 * pte lock(a spinlock) is held, which implies preemption disabled.
 	 */
-	__mod_node_page_state(page_pgdat(page), NR_FILE_MAPPED, -nr);
-	mod_memcg_page_state(page, NR_FILE_MAPPED, -nr);
+	__mod_lruvec_page_state(page, NR_FILE_MAPPED, -nr);
 
 	if (unlikely(PageMlocked(page)))
 		clear_page_mlock(page);
diff --git a/mm/workingset.c b/mm/workingset.c
index b8c9ab678479..7119cd745ace 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -288,12 +288,10 @@ bool workingset_refault(void *shadow)
 	 */
 	refault_distance = (refault - eviction) & EVICTION_MASK;
 
-	inc_node_state(pgdat, WORKINGSET_REFAULT);
-	inc_memcg_state(memcg, WORKINGSET_REFAULT);
+	inc_lruvec_state(lruvec, WORKINGSET_REFAULT);
 
 	if (refault_distance <= active_file) {
-		inc_node_state(pgdat, WORKINGSET_ACTIVATE);
-		inc_memcg_state(memcg, WORKINGSET_ACTIVATE);
+		inc_lruvec_state(lruvec, WORKINGSET_ACTIVATE);
 		rcu_read_unlock();
 		return true;
 	}
@@ -474,8 +472,7 @@ static enum lru_status shadow_lru_isolate(struct list_head *item,
 	}
 	if (WARN_ON_ONCE(node->exceptional))
 		goto out_invalid;
-	inc_node_state(page_pgdat(virt_to_page(node)), WORKINGSET_NODERECLAIM);
-	inc_memcg_page_state(virt_to_page(node), WORKINGSET_NODERECLAIM);
+	inc_lruvec_page_state(virt_to_page(node), WORKINGSET_NODERECLAIM);
 	__radix_tree_delete_node(&mapping->page_tree, node,
 				 workingset_update_node, mapping);
 
-- 
2.12.2

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 6/6] mm: memcontrol: account slab stats per lruvec
  2017-05-30 18:17 ` Johannes Weiner
@ 2017-05-30 18:17   ` Johannes Weiner
  -1 siblings, 0 replies; 62+ messages in thread
From: Johannes Weiner @ 2017-05-30 18:17 UTC (permalink / raw)
  To: Josef Bacik
  Cc: Michal Hocko, Vladimir Davydov, Andrew Morton, Rik van Riel,
	linux-mm, cgroups, linux-kernel, kernel-team

Josef's redesign of the balancing between slab caches and the page
cache requires slab cache statistics at the lruvec level.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/slab.c | 12 ++++--------
 mm/slab.h | 18 +-----------------
 mm/slub.c |  4 ++--
 3 files changed, 7 insertions(+), 27 deletions(-)
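
The conversion in the hunks below, condensed into a sketch
(charge_slab_pages() is a hypothetical wrapper, not something this
patch adds):

	/*
	 * Before: slab pages were accounted against the node only,
	 * with a separate memcg update buried in memcg_charge_slab().
	 * After: a single call covers node, memcg and lruvec.
	 */
	static void charge_slab_pages(struct page *page, int order,
				      bool reclaimable)
	{
		mod_lruvec_page_state(page, reclaimable ?
				      NR_SLAB_RECLAIMABLE :
				      NR_SLAB_UNRECLAIMABLE,
				      1 << order);
	}

Uncharging is the same call with a negative count, as kmem_freepages()
and __free_slab() do below.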

diff --git a/mm/slab.c b/mm/slab.c
index b55853399559..908908aa8250 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1425,11 +1425,9 @@ static struct page *kmem_getpages(struct kmem_cache *cachep, gfp_t flags,
 
 	nr_pages = (1 << cachep->gfporder);
 	if (cachep->flags & SLAB_RECLAIM_ACCOUNT)
-		add_node_page_state(page_pgdat(page),
-			NR_SLAB_RECLAIMABLE, nr_pages);
+		mod_lruvec_page_state(page, NR_SLAB_RECLAIMABLE, nr_pages);
 	else
-		add_node_page_state(page_pgdat(page),
-			NR_SLAB_UNRECLAIMABLE, nr_pages);
+		mod_lruvec_page_state(page, NR_SLAB_UNRECLAIMABLE, nr_pages);
 
 	__SetPageSlab(page);
 	/* Record if ALLOC_NO_WATERMARKS was set when allocating the slab */
@@ -1459,11 +1457,9 @@ static void kmem_freepages(struct kmem_cache *cachep, struct page *page)
 	kmemcheck_free_shadow(page, order);
 
 	if (cachep->flags & SLAB_RECLAIM_ACCOUNT)
-		sub_node_page_state(page_pgdat(page),
-				NR_SLAB_RECLAIMABLE, nr_freed);
+		mod_lruvec_page_state(page, NR_SLAB_RECLAIMABLE, -nr_freed);
 	else
-		sub_node_page_state(page_pgdat(page),
-				NR_SLAB_UNRECLAIMABLE, nr_freed);
+		mod_lruvec_page_state(page, NR_SLAB_UNRECLAIMABLE, -nr_freed);
 
 	BUG_ON(!PageSlab(page));
 	__ClearPageSlabPfmemalloc(page);
diff --git a/mm/slab.h b/mm/slab.h
index 7b84e3839dfe..6885e1192ec5 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -274,22 +274,11 @@ static __always_inline int memcg_charge_slab(struct page *page,
 					     gfp_t gfp, int order,
 					     struct kmem_cache *s)
 {
-	int ret;
-
 	if (!memcg_kmem_enabled())
 		return 0;
 	if (is_root_cache(s))
 		return 0;
-
-	ret = memcg_kmem_charge_memcg(page, gfp, order, s->memcg_params.memcg);
-	if (ret)
-		return ret;
-
-	mod_memcg_page_state(page,
-			     (s->flags & SLAB_RECLAIM_ACCOUNT) ?
-			     NR_SLAB_RECLAIMABLE : NR_SLAB_UNRECLAIMABLE,
-			     1 << order);
-	return 0;
+	return memcg_kmem_charge_memcg(page, gfp, order, s->memcg_params.memcg);
 }
 
 static __always_inline void memcg_uncharge_slab(struct page *page, int order,
@@ -297,11 +286,6 @@ static __always_inline void memcg_uncharge_slab(struct page *page, int order,
 {
 	if (!memcg_kmem_enabled())
 		return;
-
-	mod_memcg_page_state(page,
-			     (s->flags & SLAB_RECLAIM_ACCOUNT) ?
-			     NR_SLAB_RECLAIMABLE : NR_SLAB_UNRECLAIMABLE,
-			     -(1 << order));
 	memcg_kmem_uncharge(page, order);
 }
 
diff --git a/mm/slub.c b/mm/slub.c
index 673e72698d9b..edaf102284e8 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1615,7 +1615,7 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
 	if (!page)
 		return NULL;
 
-	mod_node_page_state(page_pgdat(page),
+	mod_lruvec_page_state(page,
 		(s->flags & SLAB_RECLAIM_ACCOUNT) ?
 		NR_SLAB_RECLAIMABLE : NR_SLAB_UNRECLAIMABLE,
 		1 << oo_order(oo));
@@ -1655,7 +1655,7 @@ static void __free_slab(struct kmem_cache *s, struct page *page)
 
 	kmemcheck_free_shadow(page, compound_order(page));
 
-	mod_node_page_state(page_pgdat(page),
+	mod_lruvec_page_state(page,
 		(s->flags & SLAB_RECLAIM_ACCOUNT) ?
 		NR_SLAB_RECLAIMABLE : NR_SLAB_UNRECLAIMABLE,
 		-pages);
-- 
2.12.2

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* Re: [PATCH 1/6] mm: vmscan: delete unused pgdat_reclaimable_pages()
  2017-05-30 18:17   ` Johannes Weiner
@ 2017-05-30 21:50     ` Andrew Morton
  -1 siblings, 0 replies; 62+ messages in thread
From: Andrew Morton @ 2017-05-30 21:50 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Josef Bacik, Michal Hocko, Vladimir Davydov, Rik van Riel,
	linux-mm, cgroups, linux-kernel, kernel-team

On Tue, 30 May 2017 14:17:19 -0400 Johannes Weiner <hannes@cmpxchg.org> wrote:

> -extern unsigned long pgdat_reclaimable_pages(struct pglist_data *pgdat);

Josef's "mm: make kswapd try harder to keep active pages in cache"
added a new callsite.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 1/6] mm: vmscan: delete unused pgdat_reclaimable_pages()
  2017-05-30 21:50     ` Andrew Morton
@ 2017-05-30 22:02       ` Johannes Weiner
  -1 siblings, 0 replies; 62+ messages in thread
From: Johannes Weiner @ 2017-05-30 22:02 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Josef Bacik, Michal Hocko, Vladimir Davydov, Rik van Riel,
	linux-mm, cgroups, linux-kernel, kernel-team

On Tue, May 30, 2017 at 02:50:29PM -0700, Andrew Morton wrote:
> On Tue, 30 May 2017 14:17:19 -0400 Johannes Weiner <hannes@cmpxchg.org> wrote:
> 
> > -extern unsigned long pgdat_reclaimable_pages(struct pglist_data *pgdat);
> 
> Josef's "mm: make kswapd try harder to keep active pages in cache"
> added a new callsite.

Ah yes, I forgot you pulled that in. The next version of his patch
shouldn't need it anymore.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 2/6] mm: vmstat: move slab statistics from zone to node counters
  2017-05-30 18:17   ` Johannes Weiner
@ 2017-05-31  9:12     ` Heiko Carstens
  -1 siblings, 0 replies; 62+ messages in thread
From: Heiko Carstens @ 2017-05-31  9:12 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Josef Bacik, Michal Hocko, Vladimir Davydov, Andrew Morton,
	Rik van Riel, linux-mm, cgroups, linux-kernel, kernel-team,
	linux-s390

On Tue, May 30, 2017 at 02:17:20PM -0400, Johannes Weiner wrote:
> To re-implement slab cache vs. page cache balancing, we'll need the
> slab counters at the lruvec level, which, ever since lru reclaim was
> moved from the zone to the node, is the intersection of the node, not
> the zone, and the memcg.
> 
> We could retain the per-zone counters for when the page allocator
> dumps its memory information on failures, and have counters on both
> levels - which on all but NUMA node 0 is usually redundant. But let's
> keep it simple for now and just move them. If anybody complains we can
> restore the per-zone counters.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

This patch causes an early boot crash on s390 (linux-next as of today).
CONFIG_NUMA on/off doesn't make any difference. I haven't looked any
further into this yet; maybe you have an idea?

Kernel BUG at 00000000002b0362 [verbose debug info unavailable]
addressing exception: 0005 ilc:3 [#1] SMP
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 4.12.0-rc3-00153-gb6bc6724488a #16
Hardware name: IBM 2964 N96 702 (z/VM 6.4.0)
task: 0000000000d75d00 task.stack: 0000000000d60000
Krnl PSW : 0404200180000000 00000000002b0362 (mod_node_page_state+0x62/0x158)
           R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:0 CC:2 PM:0 RI:0 EA:3
Krnl GPRS: 0000000000000001 000000003d81f000 0000000000000000 0000000000000006
           0000000000000001 0000000000f29b52 0000000000000041 0000000000000000
           0000000000000007 0000000000000040 000000003fe81000 000003d100ffa000
           0000000000ee1cd0 0000000000979040 0000000000300abc 0000000000d63c90
Krnl Code: 00000000002b0350: e31003900004 lg %r1,912
           00000000002b0356: e320f0a80004 lg %r2,168(%r15)
          #00000000002b035c: e31120000090 llgc %r1,0(%r1,%r2)
          >00000000002b0362: b9060011  lgbr %r1,%r1
           00000000002b0366: e32003900004 lg %r2,912
           00000000002b036c: e3c280000090 llgc %r12,0(%r2,%r8)
           00000000002b0372: b90600ac  lgbr %r10,%r12
           00000000002b0376: b904002a  lgr %r2,%r10
Call Trace:
([<0000000000000000>]           (null))
 [<0000000000300abc>] new_slab+0x35c/0x628
 [<000000000030740c>] __kmem_cache_create+0x33c/0x638
 [<0000000000e99c0e>] create_boot_cache+0xae/0xe0
 [<0000000000e9e12c>] kmem_cache_init+0x5c/0x138
 [<0000000000e7999c>] start_kernel+0x24c/0x440
 [<0000000000100020>] _stext+0x20/0x80
Last Breaking-Event-Address:
 [<0000000000300ab6>] new_slab+0x356/0x628

Kernel panic - not syncing: Fatal exception: panic_on_oops

> diff --git a/drivers/base/node.c b/drivers/base/node.c
> index 5548f9686016..e57e06e6df4c 100644
> --- a/drivers/base/node.c
> +++ b/drivers/base/node.c
> @@ -129,11 +129,11 @@ static ssize_t node_read_meminfo(struct device *dev,
>  		       nid, K(node_page_state(pgdat, NR_UNSTABLE_NFS)),
>  		       nid, K(sum_zone_node_page_state(nid, NR_BOUNCE)),
>  		       nid, K(node_page_state(pgdat, NR_WRITEBACK_TEMP)),
> -		       nid, K(sum_zone_node_page_state(nid, NR_SLAB_RECLAIMABLE) +
> -				sum_zone_node_page_state(nid, NR_SLAB_UNRECLAIMABLE)),
> -		       nid, K(sum_zone_node_page_state(nid, NR_SLAB_RECLAIMABLE)),
> +		       nid, K(node_page_state(pgdat, NR_SLAB_RECLAIMABLE) +
> +			      node_page_state(pgdat, NR_SLAB_UNRECLAIMABLE)),
> +		       nid, K(node_page_state(pgdat, NR_SLAB_RECLAIMABLE)),
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> -		       nid, K(sum_zone_node_page_state(nid, NR_SLAB_UNRECLAIMABLE)),
> +		       nid, K(node_page_state(pgdat, NR_SLAB_UNRECLAIMABLE)),
>  		       nid, K(node_page_state(pgdat, NR_ANON_THPS) *
>  				       HPAGE_PMD_NR),
>  		       nid, K(node_page_state(pgdat, NR_SHMEM_THPS) *
> @@ -141,7 +141,7 @@ static ssize_t node_read_meminfo(struct device *dev,
>  		       nid, K(node_page_state(pgdat, NR_SHMEM_PMDMAPPED) *
>  				       HPAGE_PMD_NR));
>  #else
> -		       nid, K(sum_zone_node_page_state(nid, NR_SLAB_UNRECLAIMABLE)));
> +		       nid, K(node_page_state(pgdat, NR_SLAB_UNRECLAIMABLE)));
>  #endif
>  	n += hugetlb_report_node_meminfo(nid, buf + n);
>  	return n;
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index ebaccd4e7d8c..eacadee83964 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -125,8 +125,6 @@ enum zone_stat_item {
>  	NR_ZONE_UNEVICTABLE,
>  	NR_ZONE_WRITE_PENDING,	/* Count of dirty, writeback and unstable pages */
>  	NR_MLOCK,		/* mlock()ed pages found and moved off LRU */
> -	NR_SLAB_RECLAIMABLE,
> -	NR_SLAB_UNRECLAIMABLE,
>  	NR_PAGETABLE,		/* used for pagetables */
>  	NR_KERNEL_STACK_KB,	/* measured in KiB */
>  	/* Second 128 byte cacheline */
> @@ -152,6 +150,8 @@ enum node_stat_item {
>  	NR_INACTIVE_FILE,	/*  "     "     "   "       "         */
>  	NR_ACTIVE_FILE,		/*  "     "     "   "       "         */
>  	NR_UNEVICTABLE,		/*  "     "     "   "       "         */
> +	NR_SLAB_RECLAIMABLE,
> +	NR_SLAB_UNRECLAIMABLE,
>  	NR_ISOLATED_ANON,	/* Temporary isolated pages from anon lru */
>  	NR_ISOLATED_FILE,	/* Temporary isolated pages from file lru */
>  	WORKINGSET_REFAULT,
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index f9e450c6b6e4..5f89cfaddc4b 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4601,8 +4601,6 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
>  			" present:%lukB"
>  			" managed:%lukB"
>  			" mlocked:%lukB"
> -			" slab_reclaimable:%lukB"
> -			" slab_unreclaimable:%lukB"
>  			" kernel_stack:%lukB"
>  			" pagetables:%lukB"
>  			" bounce:%lukB"
> @@ -4624,8 +4622,6 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
>  			K(zone->present_pages),
>  			K(zone->managed_pages),
>  			K(zone_page_state(zone, NR_MLOCK)),
> -			K(zone_page_state(zone, NR_SLAB_RECLAIMABLE)),
> -			K(zone_page_state(zone, NR_SLAB_UNRECLAIMABLE)),
>  			zone_page_state(zone, NR_KERNEL_STACK_KB),
>  			K(zone_page_state(zone, NR_PAGETABLE)),
>  			K(zone_page_state(zone, NR_BOUNCE)),
> diff --git a/mm/slab.c b/mm/slab.c
> index 2a31ee3c5814..b55853399559 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -1425,10 +1425,10 @@ static struct page *kmem_getpages(struct kmem_cache *cachep, gfp_t flags,
>  
>  	nr_pages = (1 << cachep->gfporder);
>  	if (cachep->flags & SLAB_RECLAIM_ACCOUNT)
> -		add_zone_page_state(page_zone(page),
> +		add_node_page_state(page_pgdat(page),
>  			NR_SLAB_RECLAIMABLE, nr_pages);
>  	else
> -		add_zone_page_state(page_zone(page),
> +		add_node_page_state(page_pgdat(page),
>  			NR_SLAB_UNRECLAIMABLE, nr_pages);
>  
>  	__SetPageSlab(page);
> @@ -1459,10 +1459,10 @@ static void kmem_freepages(struct kmem_cache *cachep, struct page *page)
>  	kmemcheck_free_shadow(page, order);
>  
>  	if (cachep->flags & SLAB_RECLAIM_ACCOUNT)
> -		sub_zone_page_state(page_zone(page),
> +		sub_node_page_state(page_pgdat(page),
>  				NR_SLAB_RECLAIMABLE, nr_freed);
>  	else
> -		sub_zone_page_state(page_zone(page),
> +		sub_node_page_state(page_pgdat(page),
>  				NR_SLAB_UNRECLAIMABLE, nr_freed);
>  
>  	BUG_ON(!PageSlab(page));
> diff --git a/mm/slub.c b/mm/slub.c
> index 57e5156f02be..673e72698d9b 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1615,7 +1615,7 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
>  	if (!page)
>  		return NULL;
>  
> -	mod_zone_page_state(page_zone(page),
> +	mod_node_page_state(page_pgdat(page),
>  		(s->flags & SLAB_RECLAIM_ACCOUNT) ?
>  		NR_SLAB_RECLAIMABLE : NR_SLAB_UNRECLAIMABLE,
>  		1 << oo_order(oo));
> @@ -1655,7 +1655,7 @@ static void __free_slab(struct kmem_cache *s, struct page *page)
>  
>  	kmemcheck_free_shadow(page, compound_order(page));
>  
> -	mod_zone_page_state(page_zone(page),
> +	mod_node_page_state(page_pgdat(page),
>  		(s->flags & SLAB_RECLAIM_ACCOUNT) ?
>  		NR_SLAB_RECLAIMABLE : NR_SLAB_UNRECLAIMABLE,
>  		-pages);
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index c5f9d1673392..5d187ee618c0 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -3815,7 +3815,7 @@ int node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned int order)
>  	 * unmapped file backed pages.
>  	 */
>  	if (node_pagecache_reclaimable(pgdat) <= pgdat->min_unmapped_pages &&
> -	    sum_zone_node_page_state(pgdat->node_id, NR_SLAB_RECLAIMABLE) <= pgdat->min_slab_pages)
> +	    node_page_state(pgdat, NR_SLAB_RECLAIMABLE) <= pgdat->min_slab_pages)
>  		return NODE_RECLAIM_FULL;
>  
>  	/*
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 76f73670200a..a64f1c764f17 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -928,8 +928,6 @@ const char * const vmstat_text[] = {
>  	"nr_zone_unevictable",
>  	"nr_zone_write_pending",
>  	"nr_mlock",
> -	"nr_slab_reclaimable",
> -	"nr_slab_unreclaimable",
>  	"nr_page_table_pages",
>  	"nr_kernel_stack",
>  	"nr_bounce",
> @@ -952,6 +950,8 @@ const char * const vmstat_text[] = {
>  	"nr_inactive_file",
>  	"nr_active_file",
>  	"nr_unevictable",
> +	"nr_slab_reclaimable",
> +	"nr_slab_unreclaimable",
>  	"nr_isolated_anon",
>  	"nr_isolated_file",
>  	"workingset_refault",
> -- 
> 2.12.2
> 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 2/6] mm: vmstat: move slab statistics from zone to node counters
  2017-05-31  9:12     ` Heiko Carstens
@ 2017-05-31 11:39       ` Heiko Carstens
  -1 siblings, 0 replies; 62+ messages in thread
From: Heiko Carstens @ 2017-05-31 11:39 UTC (permalink / raw)
  To: Johannes Weiner, Josef Bacik, Michal Hocko, Vladimir Davydov,
	Andrew Morton, Rik van Riel, linux-mm, cgroups, linux-kernel,
	kernel-team, linux-s390

On Wed, May 31, 2017 at 11:12:56AM +0200, Heiko Carstens wrote:
> On Tue, May 30, 2017 at 02:17:20PM -0400, Johannes Weiner wrote:
> > To re-implement slab cache vs. page cache balancing, we'll need the
> > slab counters at the lruvec level, which, ever since lru reclaim was
> > moved from the zone to the node, is the intersection of the node, not
> > the zone, and the memcg.
> > 
> > We could retain the per-zone counters for when the page allocator
> > dumps its memory information on failures, and have counters on both
> > levels - which on all but NUMA node 0 is usually redundant. But let's
> > keep it simple for now and just move them. If anybody complains we can
> > restore the per-zone counters.
> > 
> > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> 
> This patch causes an early boot crash on s390 (linux-next as of today).
> CONFIG_NUMA on/off doesn't make any difference. I haven't looked any
> further into this yet, maybe you have an idea?
> 
> Kernel BUG at 00000000002b0362 [verbose debug info unavailable]
> addressing exception: 0005 ilc:3 [#1] SMP
> Modules linked in:
> CPU: 0 PID: 0 Comm: swapper Not tainted 4.12.0-rc3-00153-gb6bc6724488a #16
> Hardware name: IBM 2964 N96 702 (z/VM 6.4.0)
> task: 0000000000d75d00 task.stack: 0000000000d60000
> Krnl PSW : 0404200180000000 00000000002b0362 (mod_node_page_state+0x62/0x158)
>            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:0 CC:2 PM:0 RI:0 EA:3
> Krnl GPRS: 0000000000000001 000000003d81f000 0000000000000000 0000000000000006
>            0000000000000001 0000000000f29b52 0000000000000041 0000000000000000
>            0000000000000007 0000000000000040 000000003fe81000 000003d100ffa000
>            0000000000ee1cd0 0000000000979040 0000000000300abc 0000000000d63c90
> Krnl Code: 00000000002b0350: e31003900004 lg %r1,912
>            00000000002b0356: e320f0a80004 lg %r2,168(%r15)
>           #00000000002b035c: e31120000090 llgc %r1,0(%r1,%r2)
>           >00000000002b0362: b9060011  lgbr %r1,%r1
>            00000000002b0366: e32003900004 lg %r2,912
>            00000000002b036c: e3c280000090 llgc %r12,0(%r2,%r8)
>            00000000002b0372: b90600ac  lgbr %r10,%r12
>            00000000002b0376: b904002a  lgr %r2,%r10
> Call Trace:
> ([<0000000000000000>]           (null))
>  [<0000000000300abc>] new_slab+0x35c/0x628
>  [<000000000030740c>] __kmem_cache_create+0x33c/0x638
>  [<0000000000e99c0e>] create_boot_cache+0xae/0xe0
>  [<0000000000e9e12c>] kmem_cache_init+0x5c/0x138
>  [<0000000000e7999c>] start_kernel+0x24c/0x440
>  [<0000000000100020>] _stext+0x20/0x80
> Last Breaking-Event-Address:
>  [<0000000000300ab6>] new_slab+0x356/0x628

FWIW, it looks like your patch merely triggers a bug that was introduced
by a different change, one that somehow messes with the pages used to
set up the kernel page tables. I'll look into this.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 2/6] mm: vmstat: move slab statistics from zone to node counters
  2017-05-31 11:39       ` Heiko Carstens
@ 2017-05-31 17:11         ` Yury Norov
  -1 siblings, 0 replies; 62+ messages in thread
From: Yury Norov @ 2017-05-31 17:11 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: Johannes Weiner, Josef Bacik, Michal Hocko, Vladimir Davydov,
	Andrew Morton, Rik van Riel, linux-mm, cgroups, linux-kernel,
	kernel-team, linux-s390

On Wed, May 31, 2017 at 01:39:00PM +0200, Heiko Carstens wrote:
> On Wed, May 31, 2017 at 11:12:56AM +0200, Heiko Carstens wrote:
> > On Tue, May 30, 2017 at 02:17:20PM -0400, Johannes Weiner wrote:
> > > To re-implement slab cache vs. page cache balancing, we'll need the
> > > slab counters at the lruvec level, which, ever since lru reclaim was
> > > moved from the zone to the node, is the intersection of the node, not
> > > the zone, and the memcg.
> > > 
> > > We could retain the per-zone counters for when the page allocator
> > > dumps its memory information on failures, and have counters on both
> > > levels - which on all but NUMA node 0 is usually redundant. But let's
> > > keep it simple for now and just move them. If anybody complains we can
> > > restore the per-zone counters.
> > > 
> > > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> > 
> > This patch causes an early boot crash on s390 (linux-next as of today).
> > CONFIG_NUMA on/off doesn't make any difference. I haven't looked any
> > further into this yet, maybe you have an idea?

The same on arm64.

Yury

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 5/6] mm: memcontrol: per-lruvec stats infrastructure
  2017-05-30 18:17   ` Johannes Weiner
@ 2017-05-31 17:14     ` Johannes Weiner
  -1 siblings, 0 replies; 62+ messages in thread
From: Johannes Weiner @ 2017-05-31 17:14 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Josef Bacik, Michal Hocko, Vladimir Davydov, Rik van Riel,
	linux-mm, cgroups, linux-kernel, kernel-team

Andrew, the 0day tester found a crash with this patch when special pages
get faulted in: they're not charged to any cgroup, so we dereference NULL.

Can you include the following fix on top of this patch please? Thanks!

---

From 0ea9bdb1b425a6c943a65c02164d4ca51815fdc4 Mon Sep 17 00:00:00 2001
From: Johannes Weiner <hannes@cmpxchg.org>
Date: Wed, 31 May 2017 12:57:28 -0400
Subject: [PATCH] mm: memcontrol: per-lruvec stats infrastructure fix

Fix the following crash in the new cgroup stat keeping code:

Freeing unused kernel memory: 856K
Write protecting the kernel read-only data: 8192k
Freeing unused kernel memory: 1104K
Freeing unused kernel memory: 588K
page:ffffea000005d8c0 count:2 mapcount:1 mapping:          (null) index:0x0
flags: 0x800000000000801(locked|reserved)
raw: 0800000000000801 0000000000000000 0000000000000000 0000000200000000
raw: ffffea000005d8e0 ffffea000005d8e0 0000000000000000 0000000000000000
page dumped because: not cgrouped, will crash
BUG: unable to handle kernel NULL pointer dereference at 00000000000004d8
IP: page_add_file_rmap+0x56/0xf0
PGD 0
P4D 0
Oops: 0000 [#1] SMP
CPU: 0 PID: 1 Comm: init Not tainted 4.12.0-rc2-00065-g390160f076be-dirty #326
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-20170228_101828-anatol 04/01/2014
task: ffff88007d380000 task.stack: ffffc9000031c000
RIP: 0010:page_add_file_rmap+0x56/0xf0
RSP: 0000:ffffc9000031fd88 EFLAGS: 00010202
RAX: 0000000000000000 RBX: ffffea000005d8c0 RCX: 0000000000000006
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88007ffde000
RBP: ffffc9000031fd98 R08: 0000000000000003 R09: 0000000000000000
R10: ffffc9000031fd18 R11: 0000000000000000 R12: ffff88007ffdfab8
R13: ffffea000005d8c0 R14: ffff88007c76d508 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000004d8 CR3: 000000007c76c000 CR4: 00000000000006b0
Call Trace:
 alloc_set_pte+0xb5/0x2f0
 finish_fault+0x2b/0x50
 __handle_mm_fault+0x3e5/0xb90
 handle_mm_fault+0x284/0x340
 __do_page_fault+0x1fb/0x410
 do_page_fault+0xc/0x10
 page_fault+0x22/0x30

The fault here is on a special page, and such pages are never charged
to a cgroup. Fix this by assuming the root cgroup for uncharged pages.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 include/linux/memcontrol.h | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)
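
The fix boils down to the following fallback (a sketch of the pattern
applied in the hunk below):

	struct mem_cgroup *memcg;
	struct lruvec *lruvec;

	/* Special pages in the VM aren't charged to any cgroup, so
	 * page->mem_cgroup can be NULL; account them to the root
	 * cgroup rather than dereferencing NULL. */
	memcg = page->mem_cgroup ? : root_mem_cgroup;
	lruvec = mem_cgroup_lruvec(page_pgdat(page), memcg);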

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index a282eb2a6cc3..bea6f08e9e16 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -585,18 +585,26 @@ static inline void mod_lruvec_state(struct lruvec *lruvec,
 static inline void __mod_lruvec_page_state(struct page *page,
 					   enum node_stat_item idx, int val)
 {
+	struct mem_cgroup *memcg;
 	struct lruvec *lruvec;
 
-	lruvec = mem_cgroup_lruvec(page_pgdat(page), page->mem_cgroup);
+	/* Special pages in the VM aren't charged, use root */
+	memcg = page->mem_cgroup ? : root_mem_cgroup;
+
+	lruvec = mem_cgroup_lruvec(page_pgdat(page), memcg);
 	__mod_lruvec_state(lruvec, idx, val);
 }
 
 static inline void mod_lruvec_page_state(struct page *page,
 					 enum node_stat_item idx, int val)
 {
+	struct mem_cgroup *memcg;
 	struct lruvec *lruvec;
 
-	lruvec = mem_cgroup_lruvec(page_pgdat(page), page->mem_cgroup);
+	/* Special pages in the VM aren't charged, use root */
+	memcg = page->mem_cgroup ? : root_mem_cgroup;
+
+	lruvec = mem_cgroup_lruvec(page_pgdat(page), memcg);
 	mod_lruvec_state(lruvec, idx, val);
 }
 
-- 
2.13.0

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* Re: [PATCH 5/6] mm: memcontrol: per-lruvec stats infrastructure
  2017-05-31 17:14     ` Johannes Weiner
@ 2017-05-31 18:18       ` Andrew Morton
  -1 siblings, 0 replies; 62+ messages in thread
From: Andrew Morton @ 2017-05-31 18:18 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Josef Bacik, Michal Hocko, Vladimir Davydov, Rik van Riel,
	linux-mm, cgroups, linux-kernel, kernel-team, Tony Lindgren,
	Russell King, Yury Norov, Stephen Rothwell

On Wed, 31 May 2017 13:14:50 -0400 Johannes Weiner <hannes@cmpxchg.org> wrote:

> Andrew, the 0day tester found a crash with this when special pages get
> faulted. They're not charged to any cgroup and we'll deref NULL.
> 
> Can you include the following fix on top of this patch please? Thanks!

OK.  But this won't fix the init ordering crash which the arm folks are
seeing?

I'm wondering if we should ask Stephen to drop

mm-vmstat-move-slab-statistics-from-zone-to-node-counters.patch
mm-memcontrol-use-the-node-native-slab-memory-counters.patch
mm-memcontrol-use-generic-mod_memcg_page_state-for-kmem-pages.patch
mm-memcontrol-per-lruvec-stats-infrastructure.patch
mm-memcontrol-account-slab-stats-per-lruvec.patch

until that is sorted?

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 5/6] mm: memcontrol: per-lruvec stats infrastructure
  2017-05-31 18:18       ` Andrew Morton
@ 2017-05-31 19:02         ` Tony Lindgren
  -1 siblings, 0 replies; 62+ messages in thread
From: Tony Lindgren @ 2017-05-31 19:02 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Johannes Weiner, Josef Bacik, Michal Hocko, Vladimir Davydov,
	Rik van Riel, linux-mm, cgroups, linux-kernel, kernel-team,
	Russell King, Yury Norov, Stephen Rothwell

* Andrew Morton <akpm@linux-foundation.org> [170531 11:21]:
> On Wed, 31 May 2017 13:14:50 -0400 Johannes Weiner <hannes@cmpxchg.org> wrote:
> 
> > Andrew, the 0day tester found a crash with this when special pages get
> > faulted. They're not charged to any cgroup and we'll deref NULL.
> > 
> > Can you include the following fix on top of this patch please? Thanks!
> 
> OK.  But this won't fix the init ordering crash which the arm folks are
> seeing?

That's correct, the ordering crash is a separate problem.

> I'm wondering if we should ask Stephen to drop
> 
> mm-vmstat-move-slab-statistics-from-zone-to-node-counters.patch
> mm-memcontrol-use-the-node-native-slab-memory-counters.patch
> mm-memcontrol-use-generic-mod_memcg_page_state-for-kmem-pages.patch
> mm-memcontrol-per-lruvec-stats-infrastructure.patch
> mm-memcontrol-account-slab-stats-per-lruvec.patch
> 
> until that is sorted?

Seems like a good idea.

Regards,

Tony

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 5/6] mm: memcontrol: per-lruvec stats infrastructure
  2017-05-31 19:02         ` Tony Lindgren
@ 2017-05-31 22:03           ` Stephen Rothwell
  -1 siblings, 0 replies; 62+ messages in thread
From: Stephen Rothwell @ 2017-05-31 22:03 UTC (permalink / raw)
  To: Tony Lindgren
  Cc: Andrew Morton, Johannes Weiner, Josef Bacik, Michal Hocko,
	Vladimir Davydov, Rik van Riel, linux-mm, cgroups, linux-kernel,
	kernel-team, Russell King, Yury Norov

Hi Tony, Andrew,

On Wed, 31 May 2017 12:02:10 -0700 Tony Lindgren <tony@atomide.com> wrote:
>
> * Andrew Morton <akpm@linux-foundation.org> [170531 11:21]:
> > On Wed, 31 May 2017 13:14:50 -0400 Johannes Weiner <hannes@cmpxchg.org> wrote:
> >   
> > > Andrew, the 0day tester found a crash with this when special pages get
> > > faulted. They're not charged to any cgroup and we'll deref NULL.
> > > 
> > > Can you include the following fix on top of this patch please? Thanks!  
> > 
> > OK.  But this won't fix the init ordering crash which the arm folks are
> > seeing?  
> 
> That's correct, the ordering crash is a separate problem.
> 
> > I'm wondering if we should ask Stephen to drop
> > 
> > mm-vmstat-move-slab-statistics-from-zone-to-node-counters.patch
> > mm-memcontrol-use-the-node-native-slab-memory-counters.patch
> > mm-memcontrol-use-generic-mod_memcg_page_state-for-kmem-pages.patch
> > mm-memcontrol-per-lruvec-stats-infrastructure.patch
> > mm-memcontrol-account-slab-stats-per-lruvec.patch
> > 
> > until that is sorted?  
> 
> Seems like a good idea.

OK, I have removed them.

-- 
Cheers,
Stephen Rothwell

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 5/6] mm: memcontrol: per-lruvec stats infrastructure
  2017-05-31 18:18       ` Andrew Morton
@ 2017-06-01  1:44         ` Johannes Weiner
  -1 siblings, 0 replies; 62+ messages in thread
From: Johannes Weiner @ 2017-06-01  1:44 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Josef Bacik, Michal Hocko, Vladimir Davydov, Rik van Riel,
	linux-mm, cgroups, linux-kernel, kernel-team, Tony Lindgren,
	Russell King, Yury Norov, Stephen Rothwell

On Wed, May 31, 2017 at 11:18:21AM -0700, Andrew Morton wrote:
> On Wed, 31 May 2017 13:14:50 -0400 Johannes Weiner <hannes@cmpxchg.org> wrote:
> 
> > Andrew, the 0day tester found a crash with this when special pages get
> > faulted. They're not charged to any cgroup and we'll deref NULL.
> > 
> > Can you include the following fix on top of this patch please? Thanks!
> 
> OK.  But this won't fix the init ordering crash which the arm folks are
> seeing?
> 
> I'm wondering if we should ask Stephen to drop
> 
> mm-vmstat-move-slab-statistics-from-zone-to-node-counters.patch
> mm-memcontrol-use-the-node-native-slab-memory-counters.patch
> mm-memcontrol-use-generic-mod_memcg_page_state-for-kmem-pages.patch
> mm-memcontrol-per-lruvec-stats-infrastructure.patch
> mm-memcontrol-account-slab-stats-per-lruvec.patch

Sorry about the wreckage.

Dropping these makes sense to me for the time being.

I'll fix up the init ordering issue.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 2/6] mm: vmstat: move slab statistics from zone to node counters
  2017-05-31 17:11         ` Yury Norov
@ 2017-06-01 10:07           ` Michael Ellerman
  -1 siblings, 0 replies; 62+ messages in thread
From: Michael Ellerman @ 2017-06-01 10:07 UTC (permalink / raw)
  To: Yury Norov, Heiko Carstens
  Cc: Johannes Weiner, Josef Bacik, Michal Hocko, Vladimir Davydov,
	Andrew Morton, Rik van Riel, linux-mm, cgroups, linux-kernel,
	kernel-team, linux-s390

Yury Norov <ynorov@caviumnetworks.com> writes:

> On Wed, May 31, 2017 at 01:39:00PM +0200, Heiko Carstens wrote:
>> On Wed, May 31, 2017 at 11:12:56AM +0200, Heiko Carstens wrote:
>> > On Tue, May 30, 2017 at 02:17:20PM -0400, Johannes Weiner wrote:
>> > > To re-implement slab cache vs. page cache balancing, we'll need the
>> > > slab counters at the lruvec level, which, ever since lru reclaim was
>> > > moved from the zone to the node, is the intersection of the node, not
>> > > the zone, and the memcg.
>> > > 
>> > > We could retain the per-zone counters for when the page allocator
>> > > dumps its memory information on failures, and have counters on both
>> > > levels - which on all but NUMA node 0 is usually redundant. But let's
>> > > keep it simple for now and just move them. If anybody complains we can
>> > > restore the per-zone counters.
>> > > 
>> > > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
>> > 
>> > This patch causes an early boot crash on s390 (linux-next as of today).
>> > CONFIG_NUMA on/off doesn't make any difference. I haven't looked any
>> > further into this yet, maybe you have an idea?
>
> The same on arm64.

And powerpc.

cheers

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 3/6] mm: memcontrol: use the node-native slab memory counters
  2017-05-30 18:17   ` Johannes Weiner
@ 2017-06-03 17:39     ` Vladimir Davydov
  -1 siblings, 0 replies; 62+ messages in thread
From: Vladimir Davydov @ 2017-06-03 17:39 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Josef Bacik, Michal Hocko, Andrew Morton, Rik van Riel, linux-mm,
	cgroups, linux-kernel, kernel-team

On Tue, May 30, 2017 at 02:17:21PM -0400, Johannes Weiner wrote:
> Now that the slab counters are moved from the zone to the node level
> we can drop the private memcg node stats and use the official ones.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> ---
>  include/linux/memcontrol.h | 2 --
>  mm/memcontrol.c            | 8 ++++----
>  mm/slab.h                  | 4 ++--
>  3 files changed, 6 insertions(+), 8 deletions(-)

Not sure if moving slab stats from zone to node is such a good idea,
because they may be useful for identifying the reason for an OOM,
especially on 32-bit hosts, but provided the previous patch is
accepted, this one looks good to me.

Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 4/6] mm: memcontrol: use generic mod_memcg_page_state for kmem pages
  2017-05-30 18:17   ` Johannes Weiner
@ 2017-06-03 17:40     ` Vladimir Davydov
  -1 siblings, 0 replies; 62+ messages in thread
From: Vladimir Davydov @ 2017-06-03 17:40 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Josef Bacik, Michal Hocko, Andrew Morton, Rik van Riel, linux-mm,
	cgroups, linux-kernel, kernel-team

On Tue, May 30, 2017 at 02:17:22PM -0400, Johannes Weiner wrote:
> The kmem-specific functions do the same thing. Switch and drop.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> ---
>  include/linux/memcontrol.h | 17 -----------------
>  kernel/fork.c              |  8 ++++----
>  mm/slab.h                  | 16 ++++++++--------
>  3 files changed, 12 insertions(+), 29 deletions(-)

Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 5/6] mm: memcontrol: per-lruvec stats infrastructure
  2017-05-30 18:17   ` Johannes Weiner
@ 2017-06-03 17:50     ` Vladimir Davydov
  -1 siblings, 0 replies; 62+ messages in thread
From: Vladimir Davydov @ 2017-06-03 17:50 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Josef Bacik, Michal Hocko, Andrew Morton, Rik van Riel, linux-mm,
	cgroups, linux-kernel, kernel-team

On Tue, May 30, 2017 at 02:17:23PM -0400, Johannes Weiner wrote:
> lruvecs are at the intersection of the NUMA node and memcg, which is
> the scope for most paging activity.
> 
> Introduce a convenient accounting infrastructure that maintains
> statistics per node, per memcg, and the lruvec itself.
> 
> Then convert over accounting sites for statistics that are already
> tracked in both nodes and memcgs and can be easily switched.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> ---
>  include/linux/memcontrol.h | 238 +++++++++++++++++++++++++++++++++++++++------
>  include/linux/vmstat.h     |   1 -
>  mm/memcontrol.c            |   6 ++
>  mm/page-writeback.c        |  15 +--
>  mm/rmap.c                  |   8 +-
>  mm/workingset.c            |   9 +-
>  6 files changed, 225 insertions(+), 52 deletions(-)
> 
...
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 9c68a40c83e3..e37908606c0f 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -4122,6 +4122,12 @@ static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
>  	if (!pn)
>  		return 1;
>  
> +	pn->lruvec_stat = alloc_percpu(struct lruvec_stat);
> +	if (!pn->lruvec_stat) {
> +		kfree(pn);
> +		return 1;
> +	}
> +
>  	lruvec_init(&pn->lruvec);
>  	pn->usage_in_excess = 0;
>  	pn->on_tree = false;

I don't see the matching free_percpu() anywhere, forget to patch
free_mem_cgroup_per_node_info()?

Other than that and with the follow-up fix applied, this patch
is good IMO.

Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com>

^ permalink raw reply	[flat|nested] 62+ messages in thread
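
For reference, the missing teardown Vladimir is pointing at would look
roughly like the sketch below, mirroring the allocation quoted above;
Johannes posts an equivalent fix further down the thread:

/* Sketch: release the per-cpu lruvec_stat allocated in
 * alloc_mem_cgroup_per_node_info() before freeing the node info. */
static void free_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
{
	struct mem_cgroup_per_node *pn = memcg->nodeinfo[node];

	free_percpu(pn->lruvec_stat);
	kfree(pn);
}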

* Re: [PATCH 6/6] mm: memcontrol: account slab stats per lruvec
  2017-05-30 18:17   ` Johannes Weiner
@ 2017-06-03 17:54     ` Vladimir Davydov
  -1 siblings, 0 replies; 62+ messages in thread
From: Vladimir Davydov @ 2017-06-03 17:54 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Josef Bacik, Michal Hocko, Andrew Morton, Rik van Riel, linux-mm,
	cgroups, linux-kernel, kernel-team

On Tue, May 30, 2017 at 02:17:24PM -0400, Johannes Weiner wrote:
> Josef's redesign of the balancing between slab caches and the page
> cache requires slab cache statistics at the lruvec level.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> ---
>  mm/slab.c | 12 ++++--------
>  mm/slab.h | 18 +-----------------
>  mm/slub.c |  4 ++--
>  3 files changed, 7 insertions(+), 27 deletions(-)

Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [6/6] mm: memcontrol: account slab stats per lruvec
  2017-05-30 18:17   ` Johannes Weiner
@ 2017-06-05 16:52     ` Guenter Roeck
  -1 siblings, 0 replies; 62+ messages in thread
From: Guenter Roeck @ 2017-06-05 16:52 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Josef Bacik, Michal Hocko, Vladimir Davydov, Andrew Morton,
	Rik van Riel, linux-mm, cgroups, linux-kernel, kernel-team

On Tue, May 30, 2017 at 02:17:24PM -0400, Johannes Weiner wrote:
> Josef's redesign of the balancing between slab caches and the page
> cache requires slab cache statistics at the lruvec level.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com>

Presumably this is already known, but a remarkable number of crashes
in next-20170605 bisect to this patch.

Guenter

---
Qemu test results:
	total: 122 pass: 51 fail: 71
Failed tests:
	arm:vexpress-a9:vexpress_defconfig:vexpress-v2p-ca9
	arm:vexpress-a15:vexpress_defconfig:vexpress-v2p-ca15-tc1
	arm:kzm:imx_v6_v7_defconfig
	arm:sabrelite:imx_v6_v7_defconfig:imx6dl-sabrelite
	arm:beagle:multi_v7_defconfig:omap3-beagle
	arm:beaglexm:multi_v7_defconfig:omap3-beagle-xm
	arm:overo:multi_v7_defconfig:omap3-overo-tobi
	arm:sabrelite:multi_v7_defconfig:imx6dl-sabrelite
	arm:vexpress-a9:multi_v7_defconfig:vexpress-v2p-ca9
	arm:vexpress-a15:multi_v7_defconfig:vexpress-v2p-ca15-tc1
	arm:vexpress-a15-a7:multi_v7_defconfig:vexpress-v2p-ca15_a7
	arm:xilinx-zynq-a9:multi_v7_defconfig:zynq-zc702
	arm:xilinx-zynq-a9:multi_v7_defconfig:zynq-zc706
	arm:xilinx-zynq-a9:multi_v7_defconfig:zynq-zed
	arm:midway:multi_v7_defconfig:ecx-2000
	arm:smdkc210:multi_v7_defconfig:exynos4210-smdkv310
	arm:smdkc210:exynos_defconfig:exynos4210-smdkv310
	arm:beagle:omap2plus_defconfig:omap3-beagle
	arm:beaglexm:omap2plus_defconfig:omap3-beagle-xm
	arm:overo:omap2plus_defconfig:omap3-overo-tobi
	arm:realview-pb-a8:realview_defconfig:arm-realview-pba8
	arm:realview-pbx-a9:realview_defconfig:arm-realview-pbx-a9
	arm:realview-eb:realview_defconfig:arm-realview-eb
	arm:realview-eb-mpcore:realview_defconfig:arm-realview-eb-11mp-ctrevb
	arm64:virt:smp:defconfig
	arm64:xlnx-ep108:smp:defconfig:zynqmp-ep108
	arm64:virt:nosmp:defconfig
	arm64:xlnx-ep108:nosmp:defconfig:zynqmp-ep108
	mips:malta_defconfig:smp
	mipsel:24Kf:malta_defconfig:smp
	powerpc:mac99:nosmp:ppc_book3s_defconfig
	powerpc:g3beige:nosmp:ppc_book3s_defconfig
	powerpc:mac99:smp:ppc_book3s_defconfig
	powerpc:mpc8548cds:smpdev:85xx/mpc85xx_cds_defconfig
	powerpc:mac99:ppc64_book3s_defconfig:nosmp
	powerpc:mac99:ppc64_book3s_defconfig:smp4
	powerpc:pseries:pseries_defconfig
	powerpc:mpc8544ds:ppc64_e5500_defconfig:smp
	sparc32:SPARCClassic:smp:sparc32_defconfig
	sparc32:SPARCbook:smp:sparc32_defconfig
	sparc32:SS-4:smp:sparc32_defconfig
	sparc32:SS-5:smp:sparc32_defconfig
	sparc32:SS-10:smp:sparc32_defconfig
	sparc32:SS-20:smp:sparc32_defconfig
	sparc32:SS-600MP:smp:sparc32_defconfig
	sparc32:LX:smp:sparc32_defconfig
	sparc32:Voyager:smp:sparc32_defconfig
	x86:Broadwell:q35:x86_pc_defconfig
	x86:Skylake-Client:q35:x86_pc_defconfig
	x86:SandyBridge:q35:x86_pc_defconfig
	x86:Haswell:pc:x86_pc_defconfig
	x86:Nehalem:q35:x86_pc_defconfig
	x86:phenom:pc:x86_pc_defconfig
	x86:core2duo:q35:x86_pc_nosmp_defconfig
	x86:Conroe:isapc:x86_pc_nosmp_defconfig
	x86:Opteron_G1:pc:x86_pc_nosmp_defconfig
	x86:n270:isapc:x86_pc_nosmp_defconfig
	x86_64:q35:Broadwell-noTSX:x86_64_pc_defconfig
	x86_64:q35:IvyBridge:x86_64_pc_defconfig
	x86_64:q35:SandyBridge:x86_64_pc_defconfig
	x86_64:q35:Haswell:x86_64_pc_defconfig
	x86_64:pc:core2duo:x86_64_pc_defconfig
	x86_64:q35:Nehalem:x86_64_pc_defconfig
	x86_64:pc:phenom:x86_64_pc_defconfig
	x86_64:q35:Opteron_G1:x86_64_pc_defconfig
	x86_64:pc:Opteron_G4:x86_64_pc_nosmp_defconfig
	x86_64:q35:IvyBridge:x86_64_pc_nosmp_defconfig
	xtensa:dc232b:lx60:generic_kc705_defconfig
	xtensa:dc232b:kc705:generic_kc705_defconfig
	xtensa:dc233c:ml605:generic_kc705_defconfig
	xtensa:dc233c:kc705:generic_kc705_defconfig

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [6/6] mm: memcontrol: account slab stats per lruvec
  2017-06-05 16:52     ` Guenter Roeck
@ 2017-06-05 17:52       ` Johannes Weiner
  -1 siblings, 0 replies; 62+ messages in thread
From: Johannes Weiner @ 2017-06-05 17:52 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Josef Bacik, Michal Hocko, Vladimir Davydov, Andrew Morton,
	Rik van Riel, linux-mm, cgroups, linux-kernel, kernel-team

On Mon, Jun 05, 2017 at 09:52:03AM -0700, Guenter Roeck wrote:
> On Tue, May 30, 2017 at 02:17:24PM -0400, Johannes Weiner wrote:
> > Josef's redesign of the balancing between slab caches and the page
> > cache requires slab cache statistics at the lruvec level.
> > 
> > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> > Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com>
> 
> Presumably this is already known, but a remarkable number of crashes
> in next-20170605 bisect to this patch.

Thanks Guenter.

Can you test if the fix below resolves the problem?

---

From 47007dfcd7873cb93d11466a93b1f41f6a7a434f Mon Sep 17 00:00:00 2001
From: Johannes Weiner <hannes@cmpxchg.org>
Date: Sun, 4 Jun 2017 07:02:44 -0400
Subject: [PATCH] mm: memcontrol: per-lruvec stats infrastructure fix 2

Even with the previous fix routing !page->mem_cgroup stats to the root
cgroup, we still see crashes in certain configurations, because the
root is not yet initialized for the earliest possible accounting sites.

Don't track uncharged pages at all, not even in the root. This takes
care of early accounting as well as of special pages that are never
charged.

Because we still need to account at the pgdat level, we can no longer
implement the lruvec_page_state functions on top of the lruvec_state
ones. But that's okay. It was a little silly to look up the nodeinfo
and descend to the lruvec, only to container_of() back to the nodeinfo
where the lruvec_stat structure is sitting.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 include/linux/memcontrol.h | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index bea6f08e9e16..da9360885260 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -585,27 +585,27 @@ static inline void mod_lruvec_state(struct lruvec *lruvec,
 static inline void __mod_lruvec_page_state(struct page *page,
 					   enum node_stat_item idx, int val)
 {
-	struct mem_cgroup *memcg;
-	struct lruvec *lruvec;
-
-	/* Special pages in the VM aren't charged, use root */
-	memcg = page->mem_cgroup ? : root_mem_cgroup;
+	struct mem_cgroup_per_node *pn;
 
-	lruvec = mem_cgroup_lruvec(page_pgdat(page), memcg);
-	__mod_lruvec_state(lruvec, idx, val);
+	__mod_node_page_state(page_pgdat(page), idx, val);
+	if (mem_cgroup_disabled() || !page->mem_cgroup)
+		return;
+	__mod_memcg_state(page->mem_cgroup, idx, val);
+	pn = page->mem_cgroup->nodeinfo[page_to_nid(page)];
+	__this_cpu_add(pn->lruvec_stat->count[idx], val);
 }
 
 static inline void mod_lruvec_page_state(struct page *page,
 					 enum node_stat_item idx, int val)
 {
-	struct mem_cgroup *memcg;
-	struct lruvec *lruvec;
-
-	/* Special pages in the VM aren't charged, use root */
-	memcg = page->mem_cgroup ? : root_mem_cgroup;
+	struct mem_cgroup_per_node *pn;
 
-	lruvec = mem_cgroup_lruvec(page_pgdat(page), memcg);
-	mod_lruvec_state(lruvec, idx, val);
+	mod_node_page_state(page_pgdat(page), idx, val);
+	if (mem_cgroup_disabled() || !page->mem_cgroup)
+		return;
+	mod_memcg_state(page->mem_cgroup, idx, val);
+	pn = page->mem_cgroup->nodeinfo[page_to_nid(page)];
+	this_cpu_add(pn->lruvec_stat->count[idx], val);
 }
 
 unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
-- 
2.13.0

^ permalink raw reply related	[flat|nested] 62+ messages in thread
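
The detour the commit message calls silly becomes visible once the old
helpers are inlined: mem_cgroup_lruvec() descends from the per-node
info to its embedded lruvec, and the stat update then has to
container_of() straight back out. A condensed sketch of that round
trip (old_style_update() is a hypothetical name):

/* Sketch of the round trip the fix removes: descend from the
 * per-node info to its embedded lruvec, then climb right back. */
static void old_style_update(struct page *page, struct mem_cgroup *memcg,
			     enum node_stat_item idx, int val)
{
	struct mem_cgroup_per_node *pn;
	struct lruvec *lruvec;

	pn = memcg->nodeinfo[page_to_nid(page)];
	lruvec = &pn->lruvec;
	pn = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
	__this_cpu_add(pn->lruvec_stat->count[idx], val);
}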

* Re: [PATCH 5/6] mm: memcontrol: per-lruvec stats infrastructure
  2017-06-03 17:50     ` Vladimir Davydov
@ 2017-06-05 17:53       ` Johannes Weiner
  -1 siblings, 0 replies; 62+ messages in thread
From: Johannes Weiner @ 2017-06-05 17:53 UTC (permalink / raw)
  To: Vladimir Davydov
  Cc: Josef Bacik, Michal Hocko, Andrew Morton, Rik van Riel, linux-mm,
	cgroups, linux-kernel, kernel-team

On Sat, Jun 03, 2017 at 08:50:02PM +0300, Vladimir Davydov wrote:
> On Tue, May 30, 2017 at 02:17:23PM -0400, Johannes Weiner wrote:
> > lruvecs are at the intersection of the NUMA node and memcg, which is
> > the scope for most paging activity.
> > 
> > Introduce a convenient accounting infrastructure that maintains
> > statistics per node, per memcg, and the lruvec itself.
> > 
> > Then convert over accounting sites for statistics that are already
> > tracked in both nodes and memcgs and can be easily switched.
> > 
> > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> > ---
> >  include/linux/memcontrol.h | 238 +++++++++++++++++++++++++++++++++++++++------
> >  include/linux/vmstat.h     |   1 -
> >  mm/memcontrol.c            |   6 ++
> >  mm/page-writeback.c        |  15 +--
> >  mm/rmap.c                  |   8 +-
> >  mm/workingset.c            |   9 +-
> >  6 files changed, 225 insertions(+), 52 deletions(-)
> > 
> ...
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 9c68a40c83e3..e37908606c0f 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -4122,6 +4122,12 @@ static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
> >  	if (!pn)
> >  		return 1;
> >  
> > +	pn->lruvec_stat = alloc_percpu(struct lruvec_stat);
> > +	if (!pn->lruvec_stat) {
> > +		kfree(pn);
> > +		return 1;
> > +	}
> > +
> >  	lruvec_init(&pn->lruvec);
> >  	pn->usage_in_excess = 0;
> >  	pn->on_tree = false;
> 
> I don't see the matching free_percpu() anywhere, forget to patch
> free_mem_cgroup_per_node_info()?

Yes, I missed that.

---

From 4d09a522a2182acae4e36ded4211d05defd75b74 Mon Sep 17 00:00:00 2001
From: Johannes Weiner <hannes@cmpxchg.org>
Date: Mon, 5 Jun 2017 10:59:41 -0400
Subject: [PATCH] mm: memcontrol: per-lruvec stats infrastructure fix 3

As pointed out, there is a missing free_percpu() for the lruvec_stat
object in the memcg's per node info. Add this.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/memcontrol.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index e37908606c0f..093fe7e06e51 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4139,7 +4139,10 @@ static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
 
 static void free_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
 {
-	kfree(memcg->nodeinfo[node]);
+	struct mem_cgroup_per_node *pn = memcg->nodeinfo[node];
+
+	free_percpu(pn->lruvec_stat);
+	kfree(pn);
 }
 
 static void __mem_cgroup_free(struct mem_cgroup *memcg)
-- 
2.13.0

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* Re: [PATCH 2/6] mm: vmstat: move slab statistics from zone to node counters
  2017-06-01 10:07           ` Michael Ellerman
@ 2017-06-05 18:35             ` Johannes Weiner
  -1 siblings, 0 replies; 62+ messages in thread
From: Johannes Weiner @ 2017-06-05 18:35 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Yury Norov, Heiko Carstens, Josef Bacik, Michal Hocko,
	Vladimir Davydov, Andrew Morton, Rik van Riel, linux-mm, cgroups,
	linux-kernel, kernel-team, linux-s390

On Thu, Jun 01, 2017 at 08:07:28PM +1000, Michael Ellerman wrote:
> Yury Norov <ynorov@caviumnetworks.com> writes:
> 
> > On Wed, May 31, 2017 at 01:39:00PM +0200, Heiko Carstens wrote:
> >> On Wed, May 31, 2017 at 11:12:56AM +0200, Heiko Carstens wrote:
> >> > On Tue, May 30, 2017 at 02:17:20PM -0400, Johannes Weiner wrote:
> >> > > To re-implement slab cache vs. page cache balancing, we'll need the
> >> > > slab counters at the lruvec level, which, ever since lru reclaim was
> >> > > moved from the zone to the node, is the intersection of the node, not
> >> > > the zone, and the memcg.
> >> > > 
> >> > > We could retain the per-zone counters for when the page allocator
> >> > > dumps its memory information on failures, and have counters on both
> >> > > levels - which on all but NUMA node 0 is usually redundant. But let's
> >> > > keep it simple for now and just move them. If anybody complains we can
> >> > > restore the per-zone counters.
> >> > > 
> >> > > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> >> > 
> >> > This patch causes an early boot crash on s390 (linux-next as of today).
> >> > CONFIG_NUMA on/off doesn't make any difference. I haven't looked any
> >> > further into this yet, maybe you have an idea?
> >
> > The same on arm64.
> 
> And powerpc.

It looks like we need the following on top. I can't reproduce the
crash, but it's verifiable with WARN_ONs in the vmstat functions that
the nodestat array isn't properly initialized when slab bootstraps:
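
Such a check might look like the line below (a hypothetical sketch,
not part of the fix that follows), placed at the top of
__mod_node_page_state():

	/* Hypothetical debug aid: fires when a node stat update
	 * arrives before pgdat->per_cpu_nodestats is set up. */
	WARN_ON_ONCE(!pgdat->per_cpu_nodestats);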

---

From 89ed86b5b538d8debd3c29567d7e1d31257fa577 Mon Sep 17 00:00:00 2001
From: Johannes Weiner <hannes@cmpxchg.org>
Date: Mon, 5 Jun 2017 14:12:15 -0400
Subject: [PATCH] mm: vmstat: move slab statistics from zone to node counters
 fix

Unable to handle kernel paging request at virtual address 2e116007
pgd = c0004000
[2e116007] *pgd=00000000
Internal error: Oops: 5 [#1] SMP ARM
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 4.12.0-rc3-00153-gb6bc6724488a #200
Hardware name: Generic DRA74X (Flattened Device Tree)
task: c0d0adc0 task.stack: c0d00000
PC is at __mod_node_page_state+0x2c/0xc8
LR is at __per_cpu_offset+0x0/0x8
pc : [<c0271de8>]    lr : [<c0d07da4>]    psr: 600000d3
sp : c0d01eec  ip : 00000000  fp : c15782f4
r10: 00000000  r9 : c1591280  r8 : 00004000
r7 : 00000001  r6 : 00000006  r5 : 2e116000  r4 : 00000007
r3 : 00000007  r2 : 00000001  r1 : 00000006  r0 : c0dc27c0
Flags: nZCv  IRQs off  FIQs off  Mode SVC_32  ISA ARM  Segment none
Control: 10c5387d  Table: 8000406a  DAC: 00000051
Process swapper (pid: 0, stack limit = 0xc0d00218)
Stack: (0xc0d01eec to 0xc0d02000)
1ee0:                            600000d3 c0dc27c0 c0271efc 00000001 c0d58864
1f00: ef470000 00008000 00004000 c029fbb0 01000000 c1572b5c 00002000 00000000
1f20: 00000001 00000001 00008000 c029f584 00000000 c0d58864 00008000 00008000
1f40: 01008000 c0c23790 c15782f4 a00000d3 c0d58864 c02a0364 00000000 c0819388
1f60: c0d58864 000000c0 01000000 c1572a58 c0aa57a4 00000080 00002000 c0dca000
1f80: efffe980 c0c53a48 00000000 c0c23790 c1572a58 c0c59e48 c0c59de8 c1572b5c
1fa0: c0dca000 c0c257a4 00000000 ffffffff c0dca000 c0d07940 c0dca000 c0c00a9c
1fc0: ffffffff ffffffff 00000000 c0c00680 00000000 c0c53a48 c0dca214 c0d07958
1fe0: c0c53a44 c0d0caa4 8000406a 412fc0f2 00000000 8000807c 00000000 00000000
[<c0271de8>] (__mod_node_page_state) from [<c0271efc>] (mod_node_page_state+0x2c/0x4c)
[<c0271efc>] (mod_node_page_state) from [<c029fbb0>] (cache_alloc_refill+0x5b8/0x828)
[<c029fbb0>] (cache_alloc_refill) from [<c02a0364>] (kmem_cache_alloc+0x24c/0x2d0)
[<c02a0364>] (kmem_cache_alloc) from [<c0c23790>] (create_kmalloc_cache+0x20/0x8c)
[<c0c23790>] (create_kmalloc_cache) from [<c0c257a4>] (kmem_cache_init+0xac/0x11c)
[<c0c257a4>] (kmem_cache_init) from [<c0c00a9c>] (start_kernel+0x1b8/0x3c0)
[<c0c00a9c>] (start_kernel) from [<8000807c>] (0x8000807c)
Code: e79e5103 e28c3001 e0833001 e1a04003 (e19440d5)
---[ end trace 0000000000000000 ]---

The zone counters work earlier than the node counters because the
zones have special boot pagesets, whereas the nodes do not.

Add boot nodestats against which we account until the dynamic per-cpu
allocator is available.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/page_alloc.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5f89cfaddc4b..7f341f84b587 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5107,6 +5107,7 @@ static void build_zonelists(pg_data_t *pgdat)
  */
 static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch);
 static DEFINE_PER_CPU(struct per_cpu_pageset, boot_pageset);
+static DEFINE_PER_CPU(struct per_cpu_nodestat, boot_nodestats);
 static void setup_zone_pageset(struct zone *zone);
 
 /*
@@ -6010,6 +6011,8 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat)
 	spin_lock_init(&pgdat->lru_lock);
 	lruvec_init(node_lruvec(pgdat));
 
+	pgdat->per_cpu_nodestats = &boot_nodestats;
+
 	for (j = 0; j < MAX_NR_ZONES; j++) {
 		struct zone *zone = pgdat->node_zones + j;
 		unsigned long size, realsize, freesize, memmap_pages;
-- 
2.13.0

^ permalink raw reply related	[flat|nested] 62+ messages in thread
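
Why a static placeholder is sufficient: the node stat accessors reach
their storage only through pgdat->per_cpu_nodestats, so pointing that
at boot_nodestats gives the earliest callers valid per-cpu scratch
memory. A condensed sketch of the update path (not verbatim
mm/vmstat.c):

/* Sketch: every node stat update goes through the pgdat's
 * per_cpu_nodestats pointer, whether it still references the
 * static boot_nodestats or the real dynamic allocation. */
static void node_stat_add(struct pglist_data *pgdat,
			  enum node_stat_item item, long delta)
{
	struct per_cpu_nodestat __percpu *pcp = pgdat->per_cpu_nodestats;
	s8 __percpu *p = pcp->vm_node_stat_diff + item;

	__this_cpu_add(*p, delta);
}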

* Re: [PATCH 2/6] mm: vmstat: move slab statistics from zone to node counters
  2017-06-05 18:35             ` Johannes Weiner
@ 2017-06-05 21:38               ` Andrew Morton
  -1 siblings, 0 replies; 62+ messages in thread
From: Andrew Morton @ 2017-06-05 21:38 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Michael Ellerman, Yury Norov, Heiko Carstens, Josef Bacik,
	Michal Hocko, Vladimir Davydov, Rik van Riel, linux-mm, cgroups,
	linux-kernel, kernel-team, linux-s390

On Mon, 5 Jun 2017 14:35:11 -0400 Johannes Weiner <hannes@cmpxchg.org> wrote:

> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5107,6 +5107,7 @@ static void build_zonelists(pg_data_t *pgdat)
>   */
>  static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch);
>  static DEFINE_PER_CPU(struct per_cpu_pageset, boot_pageset);
> +static DEFINE_PER_CPU(struct per_cpu_nodestat, boot_nodestats);
>  static void setup_zone_pageset(struct zone *zone);

There's a few kb there.  It just sits evermore unused after boot?

^ permalink raw reply	[flat|nested] 62+ messages in thread
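
For scale: DEFINE_PER_CPU reserves one copy of the structure per
possible CPU, so the leftover is roughly the back-of-the-envelope
figure below (a sketch, not from the thread):

/* Rough post-boot footprint of the static boot_nodestats: one
 * per_cpu_nodestat (a threshold byte plus one s8 diff per node
 * stat item) for each possible CPU. */
size_t leftover = num_possible_cpus() * sizeof(struct per_cpu_nodestat);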

* Re: [PATCH 2/6] mm: vmstat: move slab statistics from zone to node counters
  2017-06-05 18:35             ` Johannes Weiner
@ 2017-06-06  4:31               ` Michael Ellerman
  -1 siblings, 0 replies; 62+ messages in thread
From: Michael Ellerman @ 2017-06-06  4:31 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Yury Norov, Heiko Carstens, Josef Bacik, Michal Hocko,
	Vladimir Davydov, Andrew Morton, Rik van Riel, linux-mm, cgroups,
	linux-kernel, kernel-team, linux-s390

Johannes Weiner <hannes@cmpxchg.org> writes:
> From 89ed86b5b538d8debd3c29567d7e1d31257fa577 Mon Sep 17 00:00:00 2001
> From: Johannes Weiner <hannes@cmpxchg.org>
> Date: Mon, 5 Jun 2017 14:12:15 -0400
> Subject: [PATCH] mm: vmstat: move slab statistics from zone to node counters
>  fix
>
> Unable to handle kernel paging request at virtual address 2e116007
> pgd = c0004000
> [2e116007] *pgd=00000000
> Internal error: Oops: 5 [#1] SMP ARM
> Modules linked in:
> CPU: 0 PID: 0 Comm: swapper Not tainted 4.12.0-rc3-00153-gb6bc6724488a #200
> Hardware name: Generic DRA74X (Flattened Device Tree)
> task: c0d0adc0 task.stack: c0d00000
> PC is at __mod_node_page_state+0x2c/0xc8
> LR is at __per_cpu_offset+0x0/0x8
> pc : [<c0271de8>]    lr : [<c0d07da4>]    psr: 600000d3
> sp : c0d01eec  ip : 00000000  fp : c15782f4
> r10: 00000000  r9 : c1591280  r8 : 00004000
> r7 : 00000001  r6 : 00000006  r5 : 2e116000  r4 : 00000007
> r3 : 00000007  r2 : 00000001  r1 : 00000006  r0 : c0dc27c0
> Flags: nZCv  IRQs off  FIQs off  Mode SVC_32  ISA ARM  Segment none
> Control: 10c5387d  Table: 8000406a  DAC: 00000051
> Process swapper (pid: 0, stack limit = 0xc0d00218)
> Stack: (0xc0d01eec to 0xc0d02000)
> 1ee0:                            600000d3 c0dc27c0 c0271efc 00000001 c0d58864
> 1f00: ef470000 00008000 00004000 c029fbb0 01000000 c1572b5c 00002000 00000000
> 1f20: 00000001 00000001 00008000 c029f584 00000000 c0d58864 00008000 00008000
> 1f40: 01008000 c0c23790 c15782f4 a00000d3 c0d58864 c02a0364 00000000 c0819388
> 1f60: c0d58864 000000c0 01000000 c1572a58 c0aa57a4 00000080 00002000 c0dca000
> 1f80: efffe980 c0c53a48 00000000 c0c23790 c1572a58 c0c59e48 c0c59de8 c1572b5c
> 1fa0: c0dca000 c0c257a4 00000000 ffffffff c0dca000 c0d07940 c0dca000 c0c00a9c
> 1fc0: ffffffff ffffffff 00000000 c0c00680 00000000 c0c53a48 c0dca214 c0d07958
> 1fe0: c0c53a44 c0d0caa4 8000406a 412fc0f2 00000000 8000807c 00000000 00000000
> [<c0271de8>] (__mod_node_page_state) from [<c0271efc>] (mod_node_page_state+0x2c/0x4c)
> [<c0271efc>] (mod_node_page_state) from [<c029fbb0>] (cache_alloc_refill+0x5b8/0x828)
> [<c029fbb0>] (cache_alloc_refill) from [<c02a0364>] (kmem_cache_alloc+0x24c/0x2d0)
> [<c02a0364>] (kmem_cache_alloc) from [<c0c23790>] (create_kmalloc_cache+0x20/0x8c)
> [<c0c23790>] (create_kmalloc_cache) from [<c0c257a4>] (kmem_cache_init+0xac/0x11c)
> [<c0c257a4>] (kmem_cache_init) from [<c0c00a9c>] (start_kernel+0x1b8/0x3c0)
> [<c0c00a9c>] (start_kernel) from [<8000807c>] (0x8000807c)
> Code: e79e5103 e28c3001 e0833001 e1a04003 (e19440d5)
> ---[ end trace 0000000000000000 ]---

Just to be clear that's not my call trace.

> The zone counters work earlier than the node counters because the
> zones have special boot pagesets, whereas the nodes do not.
>
> Add boot nodestats against which we account until the dynamic per-cpu
> allocator is available.

This isn't working for me. I applied it on top of next-20170605, but I
still get an oops:

  $ qemu-system-ppc64 -M pseries -m 1G  -kernel build/vmlinux -vga none -nographic
  SLOF **********************************************************************
  QEMU Starting
  ...
  Linux version 4.12.0-rc3-gcc-5.4.1-next-20170605-dirty (michael@ka3.ozlabs.ibm.com) (gcc version 5.4.1 20170214 (Custom 2af61cd06c9fd8f5) ) #352 SMP Tue Jun 6 14:09:57 AEST 2017
  ...
  PID hash table entries: 4096 (order: -1, 32768 bytes)
  Memory: 1014592K/1048576K available (9920K kernel code, 1536K rwdata, 2608K rodata, 832K init, 1420K bss, 33984K reserved, 0K cma-reserved)
  Unable to handle kernel paging request for data at address 0x00000338
  Faulting instruction address: 0xc0000000002cf338
  Oops: Kernel access of bad area, sig: 11 [#1]
  SMP NR_CPUS=2048 
  NUMA 
  pSeries
  Modules linked in:
  CPU: 0 PID: 0 Comm: swapper Not tainted 4.12.0-rc3-gcc-5.4.1-next-20170605-dirty #352
  task: c000000000d11080 task.stack: c000000000e24000
  NIP: c0000000002cf338 LR: c0000000002cf0dc CTR: 0000000000000000
  REGS: c000000000e279a0 TRAP: 0380   Not tainted  (4.12.0-rc3-gcc-5.4.1-next-20170605-dirty)
  MSR: 8000000002001033 <SF,VEC,ME,IR,DR,RI,LE>
    CR: 22482242  XER: 00000000
  CFAR: c0000000002cf6a0 SOFTE: 0 
  GPR00: c0000000002cf0dc c000000000e27c20 c000000000e28300 c00000003ffc6300 
  GPR04: c000000000e556f8 0000000000000000 000000003f120000 0000000000000000 
  GPR08: c000000000ed3058 0000000000000330 0000000000000000 ffffffffffffff80 
  GPR12: 0000000028402824 c00000000fd40000 0000000000000060 0000000000f540a8 
  GPR16: 0000000000f540d8 fffffffffffffffd 000000003dc54ee0 0000000000000014 
  GPR20: c000000000b90e60 c000000000b90e90 0000000000002000 0000000000000000 
  GPR24: 0000000000000401 0000000000000000 0000000000000001 c00000003e000000 
  GPR28: 0000000080010400 f0000000000f8000 0000000000000006 c000000000cb4270 
  NIP [c0000000002cf338] new_slab+0x338/0x770
  LR [c0000000002cf0dc] new_slab+0xdc/0x770
  Call Trace:
  [c000000000e27c20] [c0000000002cf0dc] new_slab+0xdc/0x770 (unreliable)
  [c000000000e27cf0] [c0000000002d6bb4] __kmem_cache_create+0x1a4/0x6a0
  [c000000000e27e00] [c000000000c73098] create_boot_cache+0x98/0xdc
  [c000000000e27e80] [c000000000c77608] kmem_cache_init+0x5c/0x160
  [c000000000e27f00] [c000000000c43ec8] start_kernel+0x290/0x51c
  [c000000000e27f90] [c00000000000b070] start_here_common+0x1c/0x4ac
  Instruction dump:
  419e0388 893d0007 3d02000b 3908ad58 79291f24 7c68482a 60000000 3d230001 
  e9299a42 39290066 79291f24 7d2a4a14 <eb890008> e93c0080 7fa34800 409e03b0 
  ---[ end trace 0000000000000000 ]---


cheers

^ permalink raw reply	[flat|nested] 62+ messages in thread
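
A note for readers tracing the two oopses above: both fire during
kmem_cache_init(), before setup_per_cpu_pageset() has run, so the very
first slab allocations bump the NR_SLAB_* node counters through a
per-cpu pointer that is not yet valid. A condensed sketch of the
dereference involved, adapted from mm/vmstat.c with the threshold and
counter-folding logic omitted:

    void __mod_node_page_state(struct pglist_data *pgdat,
                               enum node_stat_item item, long delta)
    {
            struct per_cpu_nodestat __percpu *pcp = pgdat->per_cpu_nodestats;
            s8 __percpu *p = pcp->vm_node_stat_diff + item;

            /* Without a boot-time nodestat area, pcp has no valid
             * target this early, so this per-cpu access faults, as
             * in the ARM and ppc64 traces above. */
            __this_cpu_add(*p, delta);
    }

The boot_nodestats hunk quoted earlier gives this path a valid static
target until the real per-node counters can be allocated.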

* Re: [PATCH 2/6] mm: vmstat: move slab statistics from zone to node counters
  2017-06-06  4:31               ` Michael Ellerman
@ 2017-06-06 11:15                 ` Michael Ellerman
  -1 siblings, 0 replies; 62+ messages in thread
From: Michael Ellerman @ 2017-06-06 11:15 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Yury Norov, Heiko Carstens, Josef Bacik, Michal Hocko,
	Vladimir Davydov, Andrew Morton, Rik van Riel, linux-mm, cgroups,
	linux-kernel, kernel-team, linux-s390

Michael Ellerman <mpe@ellerman.id.au> writes:

> Johannes Weiner <hannes@cmpxchg.org> writes:
>> From 89ed86b5b538d8debd3c29567d7e1d31257fa577 Mon Sep 17 00:00:00 2001
>> From: Johannes Weiner <hannes@cmpxchg.org>
>> Date: Mon, 5 Jun 2017 14:12:15 -0400
>> Subject: [PATCH] mm: vmstat: move slab statistics from zone to node counters
>>  fix
>>
>> Unable to handle kernel paging request at virtual address 2e116007
>> pgd = c0004000
>> [2e116007] *pgd=00000000
>> Internal error: Oops: 5 [#1] SMP ARM
>> Modules linked in:
>> CPU: 0 PID: 0 Comm: swapper Not tainted 4.12.0-rc3-00153-gb6bc6724488a #200
>> Hardware name: Generic DRA74X (Flattened Device Tree)
>> task: c0d0adc0 task.stack: c0d00000
>> PC is at __mod_node_page_state+0x2c/0xc8
>> LR is at __per_cpu_offset+0x0/0x8
>> pc : [<c0271de8>]    lr : [<c0d07da4>]    psr: 600000d3
>> sp : c0d01eec  ip : 00000000  fp : c15782f4
>> r10: 00000000  r9 : c1591280  r8 : 00004000
>> r7 : 00000001  r6 : 00000006  r5 : 2e116000  r4 : 00000007
>> r3 : 00000007  r2 : 00000001  r1 : 00000006  r0 : c0dc27c0
>> Flags: nZCv  IRQs off  FIQs off  Mode SVC_32  ISA ARM  Segment none
>> Control: 10c5387d  Table: 8000406a  DAC: 00000051
>> Process swapper (pid: 0, stack limit = 0xc0d00218)
>> Stack: (0xc0d01eec to 0xc0d02000)
>> 1ee0:                            600000d3 c0dc27c0 c0271efc 00000001 c0d58864
>> 1f00: ef470000 00008000 00004000 c029fbb0 01000000 c1572b5c 00002000 00000000
>> 1f20: 00000001 00000001 00008000 c029f584 00000000 c0d58864 00008000 00008000
>> 1f40: 01008000 c0c23790 c15782f4 a00000d3 c0d58864 c02a0364 00000000 c0819388
>> 1f60: c0d58864 000000c0 01000000 c1572a58 c0aa57a4 00000080 00002000 c0dca000
>> 1f80: efffe980 c0c53a48 00000000 c0c23790 c1572a58 c0c59e48 c0c59de8 c1572b5c
>> 1fa0: c0dca000 c0c257a4 00000000 ffffffff c0dca000 c0d07940 c0dca000 c0c00a9c
>> 1fc0: ffffffff ffffffff 00000000 c0c00680 00000000 c0c53a48 c0dca214 c0d07958
>> 1fe0: c0c53a44 c0d0caa4 8000406a 412fc0f2 00000000 8000807c 00000000 00000000
>> [<c0271de8>] (__mod_node_page_state) from [<c0271efc>] (mod_node_page_state+0x2c/0x4c)
>> [<c0271efc>] (mod_node_page_state) from [<c029fbb0>] (cache_alloc_refill+0x5b8/0x828)
>> [<c029fbb0>] (cache_alloc_refill) from [<c02a0364>] (kmem_cache_alloc+0x24c/0x2d0)
>> [<c02a0364>] (kmem_cache_alloc) from [<c0c23790>] (create_kmalloc_cache+0x20/0x8c)
>> [<c0c23790>] (create_kmalloc_cache) from [<c0c257a4>] (kmem_cache_init+0xac/0x11c)
>> [<c0c257a4>] (kmem_cache_init) from [<c0c00a9c>] (start_kernel+0x1b8/0x3c0)
>> [<c0c00a9c>] (start_kernel) from [<8000807c>] (0x8000807c)
>> Code: e79e5103 e28c3001 e0833001 e1a04003 (e19440d5)
>> ---[ end trace 0000000000000000 ]---
>
> Just to be clear that's not my call trace.
>
>> The zone counters work earlier than the node counters because the
>> zones have special boot pagesets, whereas the nodes do not.
>>
>> Add boot nodestats against which we account until the dynamic per-cpu
>> allocator is available.
>
> This isn't working for me. I applied it on top of next-20170605, but I
> still get an oops:

But today's linux-next is OK. So I must have missed a fix when testing
this in isolation.

commit d94b69d9a3f8139e6d5f5d03c197d8004de3905a
Author:     Johannes Weiner <hannes@cmpxchg.org>
AuthorDate: Tue Jun 6 09:19:50 2017 +1000
Commit:     Stephen Rothwell <sfr@canb.auug.org.au>
CommitDate: Tue Jun 6 09:19:50 2017 +1000

    mm: vmstat: move slab statistics from zone to node counters fix
    
    Unable to handle kernel paging request at virtual address 2e116007
    pgd = c0004000
    [2e116007] *pgd=00000000
    Internal error: Oops: 5 [#1] SMP ARM

...

Booted to userspace:

$ uname -a
Linux buildroot 4.12.0-rc4-gcc-5.4.1-00130-gd94b69d9a3f8 #354 SMP Tue Jun 6 20:44:42 AEST 2017 ppc64le GNU/Linux


cheers

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 2/6] mm: vmstat: move slab statistics from zone to node counters
  2017-06-06 11:15                 ` Michael Ellerman
@ 2017-06-06 14:33                   ` Johannes Weiner
  -1 siblings, 0 replies; 62+ messages in thread
From: Johannes Weiner @ 2017-06-06 14:33 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Yury Norov, Heiko Carstens, Josef Bacik, Michal Hocko,
	Vladimir Davydov, Andrew Morton, Rik van Riel, linux-mm, cgroups,
	linux-kernel, kernel-team, linux-s390

On Tue, Jun 06, 2017 at 09:15:48PM +1000, Michael Ellerman wrote:
> But today's linux-next is OK. So I must have missed a fix when testing
> this in isolation.
> 
> commit d94b69d9a3f8139e6d5f5d03c197d8004de3905a
> Author:     Johannes Weiner <hannes@cmpxchg.org>
> AuthorDate: Tue Jun 6 09:19:50 2017 +1000
> Commit:     Stephen Rothwell <sfr@canb.auug.org.au>
> CommitDate: Tue Jun 6 09:19:50 2017 +1000
> 
>     mm: vmstat: move slab statistics from zone to node counters fix
>     
>     Unable to handle kernel paging request at virtual address 2e116007
>     pgd = c0004000
>     [2e116007] *pgd=00000000
>     Internal error: Oops: 5 [#1] SMP ARM
> 
> ...
> 
> Booted to userspace:
> 
> $ uname -a
> Linux buildroot 4.12.0-rc4-gcc-5.4.1-00130-gd94b69d9a3f8 #354 SMP Tue Jun 6 20:44:42 AEST 2017 ppc64le GNU/Linux

Thanks for verifying!

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 2/6] mm: vmstat: move slab statistics from zone to node counters
  2017-06-05 21:38               ` Andrew Morton
@ 2017-06-07 16:20                 ` Johannes Weiner
  -1 siblings, 0 replies; 62+ messages in thread
From: Johannes Weiner @ 2017-06-07 16:20 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Michael Ellerman, Yury Norov, Heiko Carstens, Josef Bacik,
	Michal Hocko, Vladimir Davydov, Rik van Riel, linux-mm, cgroups,
	linux-kernel, kernel-team, linux-s390

On Mon, Jun 05, 2017 at 02:38:31PM -0700, Andrew Morton wrote:
> On Mon, 5 Jun 2017 14:35:11 -0400 Johannes Weiner <hannes@cmpxchg.org> wrote:
> 
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -5107,6 +5107,7 @@ static void build_zonelists(pg_data_t *pgdat)
> >   */
> >  static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch);
> >  static DEFINE_PER_CPU(struct per_cpu_pageset, boot_pageset);
> > +static DEFINE_PER_CPU(struct per_cpu_nodestat, boot_nodestats);
> >  static void setup_zone_pageset(struct zone *zone);
> 
> There's a few kb there.  It just sits evermore unused after boot?

It's not the greatest, but it's nothing new. All the node stats we
have now used to live in the zones, i.e. in the then-bigger
boot_pageset, before we moved them to the node level. This just
re-adds static boot-time space for them.

Of course, if somebody has an idea on how to elegantly reuse that
memory after boot, that'd be cool. But we've lived with that footprint
for the longest time, so I don't think it's a showstopper.

^ permalink raw reply	[flat|nested] 62+ messages in thread
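
The handoff described in the fix (account against a static boot area
until the dynamic per-cpu allocator is available) boils down to two
steps: point every node at boot_nodestats during early init, then
switch to dynamically allocated counters later. A sketch of that
pattern follows; both function names are illustrative stand-ins for
the real hook points in mm/page_alloc.c, not a quote of the final
patch:

    /* Early boot: every node shares the static area, so counter
     * updates from kmem_cache_init() and friends have a valid
     * target from the start. */
    static void __init early_node_init(struct pglist_data *pgdat)
    {
            pgdat->per_cpu_nodestats = &boot_nodestats;
    }

    /* Once the dynamic per-cpu allocator is up, each node gets its
     * own counters. The static boot area sits unused from here on,
     * which is the footprint discussed in this subthread. */
    void __init setup_per_cpu_nodestats(void)
    {
            struct pglist_data *pgdat;

            for_each_online_pgdat(pgdat)
                    pgdat->per_cpu_nodestats =
                            alloc_percpu(struct per_cpu_nodestat);
    }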

end of thread, other threads:[~2017-06-07 16:20 UTC | newest]

Thread overview: 62+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-30 18:17 [PATCH 0/6] mm: per-lruvec slab stats Johannes Weiner
2017-05-30 18:17 ` [PATCH 1/6] mm: vmscan: delete unused pgdat_reclaimable_pages() Johannes Weiner
2017-05-30 21:50   ` Andrew Morton
2017-05-30 22:02     ` Johannes Weiner
2017-05-30 18:17 ` [PATCH 2/6] mm: vmstat: move slab statistics from zone to node counters Johannes Weiner
2017-05-31  9:12   ` Heiko Carstens
2017-05-31 11:39     ` Heiko Carstens
2017-05-31 17:11       ` Yury Norov
2017-06-01 10:07         ` Michael Ellerman
2017-06-05 18:35           ` Johannes Weiner
2017-06-05 21:38             ` Andrew Morton
2017-06-07 16:20               ` Johannes Weiner
2017-06-06  4:31             ` Michael Ellerman
2017-06-06 11:15               ` Michael Ellerman
2017-06-06 14:33                 ` Johannes Weiner
2017-05-30 18:17 ` [PATCH 3/6] mm: memcontrol: use the node-native slab memory counters Johannes Weiner
2017-06-03 17:39   ` Vladimir Davydov
2017-05-30 18:17 ` [PATCH 4/6] mm: memcontrol: use generic mod_memcg_page_state for kmem pages Johannes Weiner
2017-06-03 17:40   ` Vladimir Davydov
2017-05-30 18:17 ` [PATCH 5/6] mm: memcontrol: per-lruvec stats infrastructure Johannes Weiner
2017-05-31 17:14   ` Johannes Weiner
2017-05-31 18:18     ` Andrew Morton
2017-05-31 19:02       ` Tony Lindgren
2017-05-31 22:03         ` Stephen Rothwell
2017-06-01  1:44       ` Johannes Weiner
2017-06-03 17:50   ` Vladimir Davydov
2017-06-05 17:53     ` Johannes Weiner
2017-05-30 18:17 ` [PATCH 6/6] mm: memcontrol: account slab stats per lruvec Johannes Weiner
2017-06-03 17:54   ` Vladimir Davydov
2017-06-05 16:52   ` [6/6] " Guenter Roeck
2017-06-05 17:52     ` Johannes Weiner
