[5/6] mm: memcontrol: per-lruvec stats infrastructure

Message ID 20170530181724.27197-6-hannes@cmpxchg.org
State New, archived
Series
  • mm: per-lruvec slab stats

Commit Message

Johannes Weiner May 30, 2017, 6:17 p.m. UTC
lruvecs are at the intersection of the NUMA node and memcg, which is
the scope for most paging activity.

Introduce a convenient accounting infrastructure that maintains
statistics per node, per memcg, and the lruvec itself.

Then convert over accounting sites for statistics that are already
tracked in both nodes and memcgs and can be easily switched.
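
For example, a call site that currently bumps the node and the memcg
counter separately collapses into a single lruvec call (sketch of the
mm/rmap.c conversion further down in this patch):

	/* before: two separate updates for the same event */
	__mod_node_page_state(page_pgdat(page), NR_FILE_MAPPED, nr);
	mod_memcg_page_state(page, NR_FILE_MAPPED, nr);

	/* after: one call updates the node, memcg and lruvec counters */
	__mod_lruvec_page_state(page, NR_FILE_MAPPED, nr);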

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 include/linux/memcontrol.h | 238 +++++++++++++++++++++++++++++++++++++++------
 include/linux/vmstat.h     |   1 -
 mm/memcontrol.c            |   6 ++
 mm/page-writeback.c        |  15 +--
 mm/rmap.c                  |   8 +-
 mm/workingset.c            |   9 +-
 6 files changed, 225 insertions(+), 52 deletions(-)

Comments

Johannes Weiner May 31, 2017, 5:14 p.m. UTC | #1
Andrew, the 0day tester found a crash with this when special pages get
faulted. They're not charged to any cgroup and we'll deref NULL.

Can you include the following fix on top of this patch please? Thanks!

---

From 0ea9bdb1b425a6c943a65c02164d4ca51815fdc4 Mon Sep 17 00:00:00 2001
From: Johannes Weiner <hannes@cmpxchg.org>
Date: Wed, 31 May 2017 12:57:28 -0400
Subject: [PATCH] mm: memcontrol: per-lruvec stats infrastructure fix

Fix the following crash in the new cgroup stat keeping code:

Freeing unused kernel memory: 856K
Write protecting the kernel read-only data: 8192k
Freeing unused kernel memory: 1104K
Freeing unused kernel memory: 588K
page:ffffea000005d8c0 count:2 mapcount:1 mapping:          (null) index:0x0
flags: 0x800000000000801(locked|reserved)
raw: 0800000000000801 0000000000000000 0000000000000000 0000000200000000
raw: ffffea000005d8e0 ffffea000005d8e0 0000000000000000 0000000000000000
page dumped because: not cgrouped, will crash
BUG: unable to handle kernel NULL pointer dereference at 00000000000004d8
IP: page_add_file_rmap+0x56/0xf0
PGD 0
P4D 0
Oops: 0000 [#1] SMP
CPU: 0 PID: 1 Comm: init Not tainted 4.12.0-rc2-00065-g390160f076be-dirty #326
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-20170228_101828-anatol 04/01/2014
task: ffff88007d380000 task.stack: ffffc9000031c000
RIP: 0010:page_add_file_rmap+0x56/0xf0
RSP: 0000:ffffc9000031fd88 EFLAGS: 00010202
RAX: 0000000000000000 RBX: ffffea000005d8c0 RCX: 0000000000000006
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88007ffde000
RBP: ffffc9000031fd98 R08: 0000000000000003 R09: 0000000000000000
R10: ffffc9000031fd18 R11: 0000000000000000 R12: ffff88007ffdfab8
R13: ffffea000005d8c0 R14: ffff88007c76d508 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000004d8 CR3: 000000007c76c000 CR4: 00000000000006b0
Call Trace:
 alloc_set_pte+0xb5/0x2f0
 finish_fault+0x2b/0x50
 __handle_mm_fault+0x3e5/0xb90
 handle_mm_fault+0x284/0x340
 __do_page_fault+0x1fb/0x410
 do_page_fault+0xc/0x10
 page_fault+0x22/0x30

This is a special page being faulted; such pages are never charged
to a cgroup. Assume the root cgroup for uncharged pages to fix this.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 include/linux/memcontrol.h | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index a282eb2a6cc3..bea6f08e9e16 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -585,18 +585,26 @@ static inline void mod_lruvec_state(struct lruvec *lruvec,
 static inline void __mod_lruvec_page_state(struct page *page,
 					   enum node_stat_item idx, int val)
 {
+	struct mem_cgroup *memcg;
 	struct lruvec *lruvec;
 
-	lruvec = mem_cgroup_lruvec(page_pgdat(page), page->mem_cgroup);
+	/* Special pages in the VM aren't charged, use root */
+	memcg = page->mem_cgroup ? : root_mem_cgroup;
+
+	lruvec = mem_cgroup_lruvec(page_pgdat(page), memcg);
 	__mod_lruvec_state(lruvec, idx, val);
 }
 
 static inline void mod_lruvec_page_state(struct page *page,
 					 enum node_stat_item idx, int val)
 {
+	struct mem_cgroup *memcg;
 	struct lruvec *lruvec;
 
-	lruvec = mem_cgroup_lruvec(page_pgdat(page), page->mem_cgroup);
+	/* Special pages in the VM aren't charged, use root */
+	memcg = page->mem_cgroup ? : root_mem_cgroup;
+
+	lruvec = mem_cgroup_lruvec(page_pgdat(page), memcg);
 	mod_lruvec_state(lruvec, idx, val);
 }
Andrew Morton May 31, 2017, 6:18 p.m. UTC | #2
On Wed, 31 May 2017 13:14:50 -0400 Johannes Weiner <hannes@cmpxchg.org> wrote:

> Andrew, the 0day tester found a crash with this when special pages get
> faulted. They're not charged to any cgroup and we'll deref NULL.
> 
> Can you include the following fix on top of this patch please? Thanks!

OK.  But this won't fix the init ordering crash which the arm folks are
seeing?

I'm wondering if we should ask Stephen to drop

mm-vmstat-move-slab-statistics-from-zone-to-node-counters.patch
mm-memcontrol-use-the-node-native-slab-memory-counters.patch
mm-memcontrol-use-generic-mod_memcg_page_state-for-kmem-pages.patch
mm-memcontrol-per-lruvec-stats-infrastructure.patch
mm-memcontrol-account-slab-stats-per-lruvec.patch

until that is sorted?
Tony Lindgren May 31, 2017, 7:02 p.m. UTC | #3
* Andrew Morton <akpm@linux-foundation.org> [170531 11:21]:
> On Wed, 31 May 2017 13:14:50 -0400 Johannes Weiner <hannes@cmpxchg.org> wrote:
> 
> > Andrew, the 0day tester found a crash with this when special pages get
> > faulted. They're not charged to any cgroup and we'll deref NULL.
> > 
> > Can you include the following fix on top of this patch please? Thanks!
> 
> OK.  But this won't fix the init ordering crash which the arm folks are
> seeing?

That's correct, the ordering crash is a separate problem.

> I'm wondering if we should ask Stephen to drop
> 
> mm-vmstat-move-slab-statistics-from-zone-to-node-counters.patch
> mm-memcontrol-use-the-node-native-slab-memory-counters.patch
> mm-memcontrol-use-generic-mod_memcg_page_state-for-kmem-pages.patch
> mm-memcontrol-per-lruvec-stats-infrastructure.patch
> mm-memcontrol-account-slab-stats-per-lruvec.patch
> 
> until that is sorted?

Seems like a good idea.

Regards,

Tony
Stephen Rothwell May 31, 2017, 10:03 p.m. UTC | #4
Hi Tony, Andrew,

On Wed, 31 May 2017 12:02:10 -0700 Tony Lindgren <tony@atomide.com> wrote:
>
> * Andrew Morton <akpm@linux-foundation.org> [170531 11:21]:
> > On Wed, 31 May 2017 13:14:50 -0400 Johannes Weiner <hannes@cmpxchg.org> wrote:
> >   
> > > Andrew, the 0day tester found a crash with this when special pages get
> > > faulted. They're not charged to any cgroup and we'll deref NULL.
> > > 
> > > Can you include the following fix on top of this patch please? Thanks!  
> > 
> > OK.  But this won't fix the init ordering crash which the arm folks are
> > seeing?  
> 
> That's correct, the ordering crash is a separate problem.
> 
> > I'm wondering if we should ask Stephen to drop
> > 
> > mm-vmstat-move-slab-statistics-from-zone-to-node-counters.patch
> > mm-memcontrol-use-the-node-native-slab-memory-counters.patch
> > mm-memcontrol-use-generic-mod_memcg_page_state-for-kmem-pages.patch
> > mm-memcontrol-per-lruvec-stats-infrastructure.patch
> > mm-memcontrol-account-slab-stats-per-lruvec.patch
> > 
> > until that is sorted?  
> 
> Seems like a good idea.

OK, I have removed them.
Johannes Weiner June 1, 2017, 1:44 a.m. UTC | #5
On Wed, May 31, 2017 at 11:18:21AM -0700, Andrew Morton wrote:
> On Wed, 31 May 2017 13:14:50 -0400 Johannes Weiner <hannes@cmpxchg.org> wrote:
> 
> > Andrew, the 0day tester found a crash with this when special pages get
> > faulted. They're not charged to any cgroup and we'll deref NULL.
> > 
> > Can you include the following fix on top of this patch please? Thanks!
> 
> OK.  But this won't fix the init ordering crash which the arm folks are
> seeing?
> 
> I'm wondering if we should ask Stephen to drop
> 
> mm-vmstat-move-slab-statistics-from-zone-to-node-counters.patch
> mm-memcontrol-use-the-node-native-slab-memory-counters.patch
> mm-memcontrol-use-generic-mod_memcg_page_state-for-kmem-pages.patch
> mm-memcontrol-per-lruvec-stats-infrastructure.patch
> mm-memcontrol-account-slab-stats-per-lruvec.patch

Sorry about the wreckage.

Dropping these makes sense to me for the time being.

I'll fix up the init ordering issue.
Vladimir Davydov June 3, 2017, 5:50 p.m. UTC | #6
On Tue, May 30, 2017 at 02:17:23PM -0400, Johannes Weiner wrote:
> lruvecs are at the intersection of the NUMA node and memcg, which is
> the scope for most paging activity.
> 
> Introduce a convenient accounting infrastructure that maintains
> statistics per node, per memcg, and the lruvec itself.
> 
> Then convert over accounting sites for statistics that are already
> tracked in both nodes and memcgs and can be easily switched.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> ---
>  include/linux/memcontrol.h | 238 +++++++++++++++++++++++++++++++++++++++------
>  include/linux/vmstat.h     |   1 -
>  mm/memcontrol.c            |   6 ++
>  mm/page-writeback.c        |  15 +--
>  mm/rmap.c                  |   8 +-
>  mm/workingset.c            |   9 +-
>  6 files changed, 225 insertions(+), 52 deletions(-)
> 
...
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 9c68a40c83e3..e37908606c0f 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -4122,6 +4122,12 @@ static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
>  	if (!pn)
>  		return 1;
>  
> +	pn->lruvec_stat = alloc_percpu(struct lruvec_stat);
> +	if (!pn->lruvec_stat) {
> +		kfree(pn);
> +		return 1;
> +	}
> +
>  	lruvec_init(&pn->lruvec);
>  	pn->usage_in_excess = 0;
>  	pn->on_tree = false;

I don't see the matching free_percpu() anywhere, forget to patch
free_mem_cgroup_per_node_info()?

Other than that and with the follow-up fix applied, this patch
is good IMO.

Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com>
Johannes Weiner June 5, 2017, 5:53 p.m. UTC | #7
On Sat, Jun 03, 2017 at 08:50:02PM +0300, Vladimir Davydov wrote:
> On Tue, May 30, 2017 at 02:17:23PM -0400, Johannes Weiner wrote:
> > lruvecs are at the intersection of the NUMA node and memcg, which is
> > the scope for most paging activity.
> > 
> > Introduce a convenient accounting infrastructure that maintains
> > statistics per node, per memcg, and the lruvec itself.
> > 
> > Then convert over accounting sites for statistics that are already
> > tracked in both nodes and memcgs and can be easily switched.
> > 
> > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> > ---
> >  include/linux/memcontrol.h | 238 +++++++++++++++++++++++++++++++++++++++------
> >  include/linux/vmstat.h     |   1 -
> >  mm/memcontrol.c            |   6 ++
> >  mm/page-writeback.c        |  15 +--
> >  mm/rmap.c                  |   8 +-
> >  mm/workingset.c            |   9 +-
> >  6 files changed, 225 insertions(+), 52 deletions(-)
> > 
> ...
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 9c68a40c83e3..e37908606c0f 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -4122,6 +4122,12 @@ static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
> >  	if (!pn)
> >  		return 1;
> >  
> > +	pn->lruvec_stat = alloc_percpu(struct lruvec_stat);
> > +	if (!pn->lruvec_stat) {
> > +		kfree(pn);
> > +		return 1;
> > +	}
> > +
> >  	lruvec_init(&pn->lruvec);
> >  	pn->usage_in_excess = 0;
> >  	pn->on_tree = false;
> 
> I don't see the matching free_percpu() anywhere, forget to patch
> free_mem_cgroup_per_node_info()?

Yes, I missed that.

---

From 4d09a522a2182acae4e36ded4211d05defd75b74 Mon Sep 17 00:00:00 2001
From: Johannes Weiner <hannes@cmpxchg.org>
Date: Mon, 5 Jun 2017 10:59:41 -0400
Subject: [PATCH] mm: memcontrol: per-lruvec stats infrastructure fix 3

As pointed out, there is a missing free_percpu() for the lruvec_stat
object in the memcg's per node info. Add this.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/memcontrol.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index e37908606c0f..093fe7e06e51 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4139,7 +4139,10 @@ static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
 
 static void free_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
 {
-	kfree(memcg->nodeinfo[node]);
+	struct mem_cgroup_per_node *pn = memcg->nodeinfo[node];
+
+	free_percpu(pn->lruvec_stat);
+	kfree(pn);
 }
 
 static void __mem_cgroup_free(struct mem_cgroup *memcg)

Patch

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 62139aff6033..a282eb2a6cc3 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -26,7 +26,8 @@ 
 #include <linux/page_counter.h>
 #include <linux/vmpressure.h>
 #include <linux/eventfd.h>
-#include <linux/mmzone.h>
+#include <linux/mm.h>
+#include <linux/vmstat.h>
 #include <linux/writeback.h>
 #include <linux/page-flags.h>
 
@@ -98,11 +99,16 @@  struct mem_cgroup_reclaim_iter {
 	unsigned int generation;
 };
 
+struct lruvec_stat {
+	long count[NR_VM_NODE_STAT_ITEMS];
+};
+
 /*
  * per-zone information in memory controller.
  */
 struct mem_cgroup_per_node {
 	struct lruvec		lruvec;
+	struct lruvec_stat __percpu *lruvec_stat;
 	unsigned long		lru_zone_size[MAX_NR_ZONES][NR_LRU_LISTS];
 
 	struct mem_cgroup_reclaim_iter	iter[DEF_PRIORITY + 1];
@@ -485,23 +491,18 @@  static inline unsigned long memcg_page_state(struct mem_cgroup *memcg,
 	return val;
 }
 
-static inline void mod_memcg_state(struct mem_cgroup *memcg,
-				   enum memcg_stat_item idx, int val)
+static inline void __mod_memcg_state(struct mem_cgroup *memcg,
+				     enum memcg_stat_item idx, int val)
 {
 	if (!mem_cgroup_disabled())
-		this_cpu_add(memcg->stat->count[idx], val);
-}
-
-static inline void inc_memcg_state(struct mem_cgroup *memcg,
-				   enum memcg_stat_item idx)
-{
-	mod_memcg_state(memcg, idx, 1);
+		__this_cpu_add(memcg->stat->count[idx], val);
 }
 
-static inline void dec_memcg_state(struct mem_cgroup *memcg,
-				   enum memcg_stat_item idx)
+static inline void mod_memcg_state(struct mem_cgroup *memcg,
+				   enum memcg_stat_item idx, int val)
 {
-	mod_memcg_state(memcg, idx, -1);
+	if (!mem_cgroup_disabled())
+		this_cpu_add(memcg->stat->count[idx], val);
 }
 
 /**
@@ -521,6 +522,13 @@  static inline void dec_memcg_state(struct mem_cgroup *memcg,
  *
  * Kernel pages are an exception to this, since they'll never move.
  */
+static inline void __mod_memcg_page_state(struct page *page,
+					  enum memcg_stat_item idx, int val)
+{
+	if (page->mem_cgroup)
+		__mod_memcg_state(page->mem_cgroup, idx, val);
+}
+
 static inline void mod_memcg_page_state(struct page *page,
 					enum memcg_stat_item idx, int val)
 {
@@ -528,16 +536,68 @@  static inline void mod_memcg_page_state(struct page *page,
 		mod_memcg_state(page->mem_cgroup, idx, val);
 }
 
-static inline void inc_memcg_page_state(struct page *page,
-					enum memcg_stat_item idx)
+static inline unsigned long lruvec_page_state(struct lruvec *lruvec,
+					      enum node_stat_item idx)
 {
-	mod_memcg_page_state(page, idx, 1);
+	struct mem_cgroup_per_node *pn;
+	long val = 0;
+	int cpu;
+
+	if (mem_cgroup_disabled())
+		return node_page_state(lruvec_pgdat(lruvec), idx);
+
+	pn = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
+	for_each_possible_cpu(cpu)
+		val += per_cpu(pn->lruvec_stat->count[idx], cpu);
+
+	if (val < 0)
+		val = 0;
+
+	return val;
 }
 
-static inline void dec_memcg_page_state(struct page *page,
-					enum memcg_stat_item idx)
+static inline void __mod_lruvec_state(struct lruvec *lruvec,
+				      enum node_stat_item idx, int val)
 {
-	mod_memcg_page_state(page, idx, -1);
+	struct mem_cgroup_per_node *pn;
+
+	__mod_node_page_state(lruvec_pgdat(lruvec), idx, val);
+	if (mem_cgroup_disabled())
+		return;
+	pn = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
+	__mod_memcg_state(pn->memcg, idx, val);
+	__this_cpu_add(pn->lruvec_stat->count[idx], val);
+}
+
+static inline void mod_lruvec_state(struct lruvec *lruvec,
+				    enum node_stat_item idx, int val)
+{
+	struct mem_cgroup_per_node *pn;
+
+	mod_node_page_state(lruvec_pgdat(lruvec), idx, val);
+	if (mem_cgroup_disabled())
+		return;
+	pn = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
+	mod_memcg_state(pn->memcg, idx, val);
+	this_cpu_add(pn->lruvec_stat->count[idx], val);
+}
+
+static inline void __mod_lruvec_page_state(struct page *page,
+					   enum node_stat_item idx, int val)
+{
+	struct lruvec *lruvec;
+
+	lruvec = mem_cgroup_lruvec(page_pgdat(page), page->mem_cgroup);
+	__mod_lruvec_state(lruvec, idx, val);
+}
+
+static inline void mod_lruvec_page_state(struct page *page,
+					 enum node_stat_item idx, int val)
+{
+	struct lruvec *lruvec;
+
+	lruvec = mem_cgroup_lruvec(page_pgdat(page), page->mem_cgroup);
+	mod_lruvec_state(lruvec, idx, val);
 }
 
 unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
@@ -743,19 +803,21 @@  static inline unsigned long memcg_page_state(struct mem_cgroup *memcg,
 	return 0;
 }
 
-static inline void mod_memcg_state(struct mem_cgroup *memcg,
-				   enum memcg_stat_item idx,
-				   int nr)
+static inline void __mod_memcg_state(struct mem_cgroup *memcg,
+				     enum memcg_stat_item idx,
+				     int nr)
 {
 }
 
-static inline void inc_memcg_state(struct mem_cgroup *memcg,
-				   enum memcg_stat_item idx)
+static inline void mod_memcg_state(struct mem_cgroup *memcg,
+				   enum memcg_stat_item idx,
+				   int nr)
 {
 }
 
-static inline void dec_memcg_state(struct mem_cgroup *memcg,
-				   enum memcg_stat_item idx)
+static inline void __mod_memcg_page_state(struct page *page,
+					  enum memcg_stat_item idx,
+					  int nr)
 {
 }
 
@@ -765,14 +827,34 @@  static inline void mod_memcg_page_state(struct page *page,
 {
 }
 
-static inline void inc_memcg_page_state(struct page *page,
-					enum memcg_stat_item idx)
+static inline unsigned long lruvec_page_state(struct lruvec *lruvec,
+					      enum node_stat_item idx)
 {
+	return node_page_state(lruvec_pgdat(lruvec), idx);
 }
 
-static inline void dec_memcg_page_state(struct page *page,
-					enum memcg_stat_item idx)
+static inline void __mod_lruvec_state(struct lruvec *lruvec,
+				      enum node_stat_item idx, int val)
+{
+	__mod_node_page_state(lruvec_pgdat(lruvec), idx, val);
+}
+
+static inline void mod_lruvec_state(struct lruvec *lruvec,
+				    enum node_stat_item idx, int val)
+{
+	mod_node_page_state(lruvec_pgdat(lruvec), idx, val);
+}
+
+static inline void __mod_lruvec_page_state(struct page *page,
+					   enum node_stat_item idx, int val)
+{
+	__mod_node_page_state(page_pgdat(page), idx, val);
+}
+
+static inline void mod_lruvec_page_state(struct page *page,
+					 enum node_stat_item idx, int val)
 {
+	mod_node_page_state(page_pgdat(page), idx, val);
 }
 
 static inline
@@ -793,6 +875,102 @@  void mem_cgroup_count_vm_event(struct mm_struct *mm, enum vm_event_item idx)
 }
 #endif /* CONFIG_MEMCG */
 
+static inline void __inc_memcg_state(struct mem_cgroup *memcg,
+				     enum memcg_stat_item idx)
+{
+	__mod_memcg_state(memcg, idx, 1);
+}
+
+static inline void __dec_memcg_state(struct mem_cgroup *memcg,
+				     enum memcg_stat_item idx)
+{
+	__mod_memcg_state(memcg, idx, -1);
+}
+
+static inline void __inc_memcg_page_state(struct page *page,
+					  enum memcg_stat_item idx)
+{
+	__mod_memcg_page_state(page, idx, 1);
+}
+
+static inline void __dec_memcg_page_state(struct page *page,
+					  enum memcg_stat_item idx)
+{
+	__mod_memcg_page_state(page, idx, -1);
+}
+
+static inline void __inc_lruvec_state(struct lruvec *lruvec,
+				      enum node_stat_item idx)
+{
+	__mod_lruvec_state(lruvec, idx, 1);
+}
+
+static inline void __dec_lruvec_state(struct lruvec *lruvec,
+				      enum node_stat_item idx)
+{
+	__mod_lruvec_state(lruvec, idx, -1);
+}
+
+static inline void __inc_lruvec_page_state(struct page *page,
+					   enum node_stat_item idx)
+{
+	__mod_lruvec_page_state(page, idx, 1);
+}
+
+static inline void __dec_lruvec_page_state(struct page *page,
+					   enum node_stat_item idx)
+{
+	__mod_lruvec_page_state(page, idx, -1);
+}
+
+static inline void inc_memcg_state(struct mem_cgroup *memcg,
+				   enum memcg_stat_item idx)
+{
+	mod_memcg_state(memcg, idx, 1);
+}
+
+static inline void dec_memcg_state(struct mem_cgroup *memcg,
+				   enum memcg_stat_item idx)
+{
+	mod_memcg_state(memcg, idx, -1);
+}
+
+static inline void inc_memcg_page_state(struct page *page,
+					enum memcg_stat_item idx)
+{
+	mod_memcg_page_state(page, idx, 1);
+}
+
+static inline void dec_memcg_page_state(struct page *page,
+					enum memcg_stat_item idx)
+{
+	mod_memcg_page_state(page, idx, -1);
+}
+
+static inline void inc_lruvec_state(struct lruvec *lruvec,
+				    enum node_stat_item idx)
+{
+	mod_lruvec_state(lruvec, idx, 1);
+}
+
+static inline void dec_lruvec_state(struct lruvec *lruvec,
+				    enum node_stat_item idx)
+{
+	mod_lruvec_state(lruvec, idx, -1);
+}
+
+static inline void inc_lruvec_page_state(struct page *page,
+					 enum node_stat_item idx)
+{
+	mod_lruvec_page_state(page, idx, 1);
+}
+
+static inline void dec_lruvec_page_state(struct page *page,
+					 enum node_stat_item idx)
+{
+	mod_lruvec_page_state(page, idx, -1);
+}
+
 #ifdef CONFIG_CGROUP_WRITEBACK
 
 struct list_head *mem_cgroup_cgwb_list(struct mem_cgroup *memcg);
diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
index 613771909b6e..b3d85f30d424 100644
--- a/include/linux/vmstat.h
+++ b/include/linux/vmstat.h
@@ -3,7 +3,6 @@ 
 
 #include <linux/types.h>
 #include <linux/percpu.h>
-#include <linux/mm.h>
 #include <linux/mmzone.h>
 #include <linux/vm_event_item.h>
 #include <linux/atomic.h>
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 9c68a40c83e3..e37908606c0f 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4122,6 +4122,12 @@  static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
 	if (!pn)
 		return 1;
 
+	pn->lruvec_stat = alloc_percpu(struct lruvec_stat);
+	if (!pn->lruvec_stat) {
+		kfree(pn);
+		return 1;
+	}
+
 	lruvec_init(&pn->lruvec);
 	pn->usage_in_excess = 0;
 	pn->on_tree = false;
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 143c1c25d680..8989eada0ef7 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2433,8 +2433,7 @@  void account_page_dirtied(struct page *page, struct address_space *mapping)
 		inode_attach_wb(inode, page);
 		wb = inode_to_wb(inode);
 
-		inc_memcg_page_state(page, NR_FILE_DIRTY);
-		__inc_node_page_state(page, NR_FILE_DIRTY);
+		__inc_lruvec_page_state(page, NR_FILE_DIRTY);
 		__inc_zone_page_state(page, NR_ZONE_WRITE_PENDING);
 		__inc_node_page_state(page, NR_DIRTIED);
 		__inc_wb_stat(wb, WB_RECLAIMABLE);
@@ -2455,8 +2454,7 @@  void account_page_cleaned(struct page *page, struct address_space *mapping,
 			  struct bdi_writeback *wb)
 {
 	if (mapping_cap_account_dirty(mapping)) {
-		dec_memcg_page_state(page, NR_FILE_DIRTY);
-		dec_node_page_state(page, NR_FILE_DIRTY);
+		dec_lruvec_page_state(page, NR_FILE_DIRTY);
 		dec_zone_page_state(page, NR_ZONE_WRITE_PENDING);
 		dec_wb_stat(wb, WB_RECLAIMABLE);
 		task_io_account_cancelled_write(PAGE_SIZE);
@@ -2712,8 +2710,7 @@  int clear_page_dirty_for_io(struct page *page)
 		 */
 		wb = unlocked_inode_to_wb_begin(inode, &locked);
 		if (TestClearPageDirty(page)) {
-			dec_memcg_page_state(page, NR_FILE_DIRTY);
-			dec_node_page_state(page, NR_FILE_DIRTY);
+			dec_lruvec_page_state(page, NR_FILE_DIRTY);
 			dec_zone_page_state(page, NR_ZONE_WRITE_PENDING);
 			dec_wb_stat(wb, WB_RECLAIMABLE);
 			ret = 1;
@@ -2759,8 +2756,7 @@  int test_clear_page_writeback(struct page *page)
 		ret = TestClearPageWriteback(page);
 	}
 	if (ret) {
-		dec_memcg_page_state(page, NR_WRITEBACK);
-		dec_node_page_state(page, NR_WRITEBACK);
+		dec_lruvec_page_state(page, NR_WRITEBACK);
 		dec_zone_page_state(page, NR_ZONE_WRITE_PENDING);
 		inc_node_page_state(page, NR_WRITTEN);
 	}
@@ -2814,8 +2810,7 @@  int __test_set_page_writeback(struct page *page, bool keep_write)
 		ret = TestSetPageWriteback(page);
 	}
 	if (!ret) {
-		inc_memcg_page_state(page, NR_WRITEBACK);
-		inc_node_page_state(page, NR_WRITEBACK);
+		inc_lruvec_page_state(page, NR_WRITEBACK);
 		inc_zone_page_state(page, NR_ZONE_WRITE_PENDING);
 	}
 	unlock_page_memcg(page);
diff --git a/mm/rmap.c b/mm/rmap.c
index d405f0e0ee96..8ee842aa06ee 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1157,8 +1157,7 @@  void page_add_file_rmap(struct page *page, bool compound)
 		if (!atomic_inc_and_test(&page->_mapcount))
 			goto out;
 	}
-	__mod_node_page_state(page_pgdat(page), NR_FILE_MAPPED, nr);
-	mod_memcg_page_state(page, NR_FILE_MAPPED, nr);
+	__mod_lruvec_page_state(page, NR_FILE_MAPPED, nr);
 out:
 	unlock_page_memcg(page);
 }
@@ -1193,12 +1192,11 @@  static void page_remove_file_rmap(struct page *page, bool compound)
 	}
 
 	/*
-	 * We use the irq-unsafe __{inc|mod}_zone_page_state because
+	 * We use the irq-unsafe __{inc|mod}_lruvec_page_state because
 	 * these counters are not modified in interrupt context, and
 	 * pte lock(a spinlock) is held, which implies preemption disabled.
 	 */
-	__mod_node_page_state(page_pgdat(page), NR_FILE_MAPPED, -nr);
-	mod_memcg_page_state(page, NR_FILE_MAPPED, -nr);
+	__mod_lruvec_page_state(page, NR_FILE_MAPPED, -nr);
 
 	if (unlikely(PageMlocked(page)))
 		clear_page_mlock(page);
diff --git a/mm/workingset.c b/mm/workingset.c
index b8c9ab678479..7119cd745ace 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -288,12 +288,10 @@  bool workingset_refault(void *shadow)
 	 */
 	refault_distance = (refault - eviction) & EVICTION_MASK;
 
-	inc_node_state(pgdat, WORKINGSET_REFAULT);
-	inc_memcg_state(memcg, WORKINGSET_REFAULT);
+	inc_lruvec_state(lruvec, WORKINGSET_REFAULT);
 
 	if (refault_distance <= active_file) {
-		inc_node_state(pgdat, WORKINGSET_ACTIVATE);
-		inc_memcg_state(memcg, WORKINGSET_ACTIVATE);
+		inc_lruvec_state(lruvec, WORKINGSET_ACTIVATE);
 		rcu_read_unlock();
 		return true;
 	}
@@ -474,8 +472,7 @@  static enum lru_status shadow_lru_isolate(struct list_head *item,
 	}
 	if (WARN_ON_ONCE(node->exceptional))
 		goto out_invalid;
-	inc_node_state(page_pgdat(virt_to_page(node)), WORKINGSET_NODERECLAIM);
-	inc_memcg_page_state(virt_to_page(node), WORKINGSET_NODERECLAIM);
+	inc_lruvec_page_state(virt_to_page(node), WORKINGSET_NODERECLAIM);
 	__radix_tree_delete_node(&mapping->page_tree, node,
 				 workingset_update_node, mapping);