All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sha Zhengju <handai.szj@gmail.com>
To: akpm@linux-foundation.org
Cc: mhocko@suse.cz, kamezawa.hiroyu@jp.fujitsu.com,
	gthelen@google.com, linux-mm@kvack.org, cgroups@vger.kernel.org,
	linux-fsdevel@vger.kernel.org,
	Sha Zhengju <handai.szj@taobao.com>
Subject: [PATCH 3/4] memcg: add per cgroup writeback pages accounting
Date: Thu, 22 Aug 2013 17:53:10 +0800	[thread overview]
Message-ID: <1377165190-24143-1-git-send-email-handai.szj@taobao.com> (raw)
In-Reply-To: <CAFj3OHXy5XkwhxKk=WNywp2pq__FD7BrSQwFkp+NZj15_k6BEQ@mail.gmail.com>

From: Sha Zhengju <handai.szj@taobao.com>

This patch is to add memcg routines to count writeback pages, later dirty pages will
also be accounted.

After Kame's commit 89c06bd5(memcg: use new logic for page stat accounting), we can
use 'struct page' flag to test page state instead of per page_cgroup flag. But memcg
has a feature to move a page from a cgroup to another one and may have race between
"move" and "page stat accounting". So in order to avoid the race we have designed a
new lock:

         mem_cgroup_begin_update_page_stat()
         modify page information        -->(a)
         mem_cgroup_update_page_stat()  -->(b)
         mem_cgroup_end_update_page_stat()

It requires both (a) and (b)(writeback pages accounting) to be pretected in
mem_cgroup_{begin/end}_update_page_stat(). It's full no-op for !CONFIG_MEMCG, almost
no-op if memcg is disabled (but compiled in), rcu read lock in the most cases
(no task is moving), and spin_lock_irqsave on top in the slow path.

There're two writeback interfaces to modify: test_{clear/set}_page_writeback().
And the lock order is:
	--> memcg->move_lock
	  --> mapping->tree_lock

Signed-off-by: Sha Zhengju <handai.szj@taobao.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: Greg Thelen <gthelen@google.com>
---
 include/linux/memcontrol.h |    1 +
 mm/memcontrol.c            |   30 +++++++++++++++++++++++-------
 mm/page-writeback.c        |   15 +++++++++++++++
 3 files changed, 39 insertions(+), 7 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 630bfa1..1878644 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -42,6 +42,7 @@ enum mem_cgroup_stat_index {
 	MEM_CGROUP_STAT_RSS,		/* # of pages charged as anon rss */
 	MEM_CGROUP_STAT_RSS_HUGE,	/* # of pages charged as anon huge */
 	MEM_CGROUP_STAT_FILE_MAPPED,	/* # of pages charged as file rss */
+	MEM_CGROUP_STAT_WRITEBACK,	/* # of pages under writeback */
 	MEM_CGROUP_STAT_SWAP,		/* # of pages, swapped out */
 	MEM_CGROUP_STAT_NSTATS,
 };
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 0a50871..869bfaf 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -89,6 +89,7 @@ static const char * const mem_cgroup_stat_names[] = {
 	"rss",
 	"rss_huge",
 	"mapped_file",
+	"writeback",
 	"swap",
 };
 
@@ -3675,6 +3676,20 @@ void mem_cgroup_split_huge_fixup(struct page *head)
 }
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
+static inline
+void mem_cgroup_move_account_page_stat(struct mem_cgroup *from,
+					struct mem_cgroup *to,
+					unsigned int nr_pages,
+					enum mem_cgroup_stat_index idx)
+{
+	/* Update stat data for mem_cgroup */
+	preempt_disable();
+	WARN_ON_ONCE(from->stat->count[idx] < nr_pages);
+	__this_cpu_add(from->stat->count[idx], -nr_pages);
+	__this_cpu_add(to->stat->count[idx], nr_pages);
+	preempt_enable();
+}
+
 /**
  * mem_cgroup_move_account - move account of the page
  * @page: the page
@@ -3720,13 +3735,14 @@ static int mem_cgroup_move_account(struct page *page,
 
 	move_lock_mem_cgroup(from, &flags);
 
-	if (!anon && page_mapped(page)) {
-		/* Update mapped_file data for mem_cgroup */
-		preempt_disable();
-		__this_cpu_dec(from->stat->count[MEM_CGROUP_STAT_FILE_MAPPED]);
-		__this_cpu_inc(to->stat->count[MEM_CGROUP_STAT_FILE_MAPPED]);
-		preempt_enable();
-	}
+	if (!anon && page_mapped(page))
+		mem_cgroup_move_account_page_stat(from, to, nr_pages,
+			MEM_CGROUP_STAT_FILE_MAPPED);
+
+	if (PageWriteback(page))
+		mem_cgroup_move_account_page_stat(from, to, nr_pages,
+			MEM_CGROUP_STAT_WRITEBACK);
+
 	mem_cgroup_charge_statistics(from, page, anon, -nr_pages);
 
 	/* caller should have done css_get */
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index ddc56fc..b6ecbc6 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1999,11 +1999,17 @@ EXPORT_SYMBOL(account_page_dirtied);
 
 /*
  * Helper function for set_page_writeback family.
+ *
+ * The caller must hold mem_cgroup_begin/end_update_page_stat() lock
+ * while calling this function.
+ * See test_set_page_writeback for example.
+ *
  * NOTE: Unlike account_page_dirtied this does not rely on being atomic
  * wrt interrupts.
  */
 void account_page_writeback(struct page *page)
 {
+	mem_cgroup_inc_page_stat(page, MEM_CGROUP_STAT_WRITEBACK);
 	inc_zone_page_state(page, NR_WRITEBACK);
 }
 EXPORT_SYMBOL(account_page_writeback);
@@ -2220,7 +2226,10 @@ int test_clear_page_writeback(struct page *page)
 {
 	struct address_space *mapping = page_mapping(page);
 	int ret;
+	bool locked;
+	unsigned long memcg_flags;
 
+	mem_cgroup_begin_update_page_stat(page, &locked, &memcg_flags);
 	if (mapping) {
 		struct backing_dev_info *bdi = mapping->backing_dev_info;
 		unsigned long flags;
@@ -2241,9 +2250,11 @@ int test_clear_page_writeback(struct page *page)
 		ret = TestClearPageWriteback(page);
 	}
 	if (ret) {
+		mem_cgroup_dec_page_stat(page, MEM_CGROUP_STAT_WRITEBACK);
 		dec_zone_page_state(page, NR_WRITEBACK);
 		inc_zone_page_state(page, NR_WRITTEN);
 	}
+	mem_cgroup_end_update_page_stat(page, &locked, &memcg_flags);
 	return ret;
 }
 
@@ -2251,7 +2262,10 @@ int test_set_page_writeback(struct page *page)
 {
 	struct address_space *mapping = page_mapping(page);
 	int ret;
+	bool locked;
+	unsigned long memcg_flags;
 
+	mem_cgroup_begin_update_page_stat(page, &locked, &memcg_flags);
 	if (mapping) {
 		struct backing_dev_info *bdi = mapping->backing_dev_info;
 		unsigned long flags;
@@ -2278,6 +2292,7 @@ int test_set_page_writeback(struct page *page)
 	}
 	if (!ret)
 		account_page_writeback(page);
+	mem_cgroup_end_update_page_stat(page, &locked, &memcg_flags);
 	return ret;
 
 }
-- 
1.7.9.5

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  parent reply	other threads:[~2013-08-22  9:53 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-08-01 11:43 [PATCH V5 0/8] Add memcg dirty/writeback page accounting Sha Zhengju
2013-08-01 11:43 ` Sha Zhengju
2013-08-01 11:44 ` [PATCH 1/8] memcg: remove MEMCG_NR_FILE_MAPPED Sha Zhengju
2013-08-01 11:51 ` [PATCH V5 2/8] fs/ceph: vfs __set_page_dirty_nobuffers interface instead of doing it inside filesystem Sha Zhengju
2013-08-01 15:19   ` Yan, Zheng
2013-08-01 18:27     ` Sage Weil
2013-08-02 10:04       ` Sha Zhengju
     [not found]         ` <CAFj3OHVXvtr5BDMrGatHZi7M9y+dh1ZKRMQZGjZmNBcg3pNQtw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-08-02 20:30           ` Sage Weil
2013-08-02 20:30             ` Sage Weil
2013-08-03  8:58             ` Sha Zhengju
2013-08-02  9:04     ` Sha Zhengju
2013-08-02  9:04       ` Sha Zhengju
2013-08-02 13:11       ` Yan, Zheng
2013-08-01 11:52 ` [PATCH V5 3/8] memcg: check for proper lock held in mem_cgroup_update_page_stat Sha Zhengju
2013-08-01 14:34   ` Michal Hocko
2013-08-01 14:34     ` Michal Hocko
2013-08-04 18:48   ` Greg Thelen
2013-08-04 18:48     ` Greg Thelen
2013-08-01 11:53 ` [PATCH V5 4/8] memcg: add per cgroup dirty pages accounting Sha Zhengju
2013-08-01 11:54 ` [PATCH V5 5/8] memcg: add per cgroup writeback " Sha Zhengju
2013-08-01 14:53   ` Michal Hocko
     [not found]     ` <20130801145302.GJ5198-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2013-08-03  9:25       ` Sha Zhengju
2013-08-03  9:25         ` Sha Zhengju
     [not found]         ` <CAFj3OHV-VCKJfe6bv4UMvv+uj4LELDXsieRZFJD06Yrdyy=XxA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-08-04 10:08           ` Michal Hocko
2013-08-04 10:08             ` Michal Hocko
2013-08-22  9:46             ` Fwd: " Sha Zhengju
2013-08-22  9:46               ` Sha Zhengju
2013-08-22  9:50               ` [PATCH 1/4] memcg: remove MEMCG_NR_FILE_MAPPED Sha Zhengju
2013-08-22  9:52               ` [PATCH 2/4] memcg: check for proper lock held in mem_cgroup_update_page_stat Sha Zhengju
2013-08-22  9:53               ` Sha Zhengju [this message]
2013-08-22 22:40                 ` [PATCH 3/4] memcg: add per cgroup writeback pages accounting Andrew Morton
2013-08-23 16:11                   ` Sha Zhengju
2013-08-23 16:11                     ` Sha Zhengju
2013-08-22  9:53               ` [PATCH 4/4] memcg: Document cgroup dirty/writeback memory statistics Sha Zhengju
2013-08-04 18:51   ` [PATCH V5 5/8] memcg: add per cgroup writeback pages accounting Greg Thelen
2013-08-04 18:51     ` Greg Thelen
2013-08-01 11:55 ` [PATCH V5 6/8] memcg: make nocpu_base available for non-hotplug Sha Zhengju
2013-08-01 12:00 ` [PATCH V5 7/8] memcg: don't account root memcg page stats if only root exists Sha Zhengju
2013-08-01 16:20   ` Johannes Weiner
2013-08-02  4:32     ` Sha Zhengju
2013-08-02  4:32       ` Sha Zhengju
2013-08-05 21:58       ` Johannes Weiner
2013-08-05 21:58         ` Johannes Weiner
2013-08-01 12:00 ` [PATCH V5 8/8] memcg: Document cgroup dirty/writeback memory statistics Sha Zhengju
2013-08-01 12:00   ` Sha Zhengju
2013-08-01 14:43 ` [PATCH V5 0/8] Add memcg dirty/writeback page accounting Michal Hocko
2013-08-03  9:30   ` Sha Zhengju

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1377165190-24143-1-git-send-email-handai.szj@taobao.com \
    --to=handai.szj@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=cgroups@vger.kernel.org \
    --cc=gthelen@google.com \
    --cc=handai.szj@taobao.com \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.