From: Li RongQing <lirongqing@baidu.com>
To: linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Cc: Alexander Viro <viro@zeniv.linux.org.uk>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@kernel.org>,
	Vladimir Davydov <vdavydov.dev@gmail.com>
Subject: [PATCH 2/2] fs/writeback: do memory cgroup related writeback first
Date: Wed,  1 Aug 2018 18:48:36 +0800
Message-ID: <1533120516-18279-2-git-send-email-lirongqing@baidu.com>
In-Reply-To: <1533120516-18279-1-git-send-email-lirongqing@baidu.com>

When a machine hosts hundreds of memory cgroups, each generating more
or fewer dirty pages, a single cgroup under heavy memory pressure that
keeps reclaiming dirty pages will trigger writeback on all cgroups.
This is inefficient:

1. if the memory usage of that cgroup has reached its limit, writing
back the other cgroups does nothing to relieve it;
2. the other cgroups could wait longer and merge more write requests.

So replace the full flush with a flush of only the writeback of the
memory cgroup whose task is reclaiming memory and triggering writeback;
if that covers too few dirty pages (no more than MIN_WB_DIRTY_PAGES),
fall back to a full flush. If the reclaiming task is in the root
cgroup, its writeback is the bdi's embedded one and a full flush would
just repeat the same work, so the dirty-page count is padded past the
threshold to skip the fallback.
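
Patch 1/2 adds the helper used here to read a wb's per-memcg dirty
page count. A minimal sketch of its assumed shape (the real
implementation is in patch 1/2 and may differ), built on the existing
mem_cgroup_wb_stats():

  #include <linux/backing-dev-defs.h>	/* struct bdi_writeback */
  #include <linux/memcontrol.h>		/* mem_cgroup_wb_stats() */

  /*
   * Assumed shape of the patch 1/2 helper: return the number of dirty
   * pages accounted to @wb's memory cgroup.
   */
  unsigned long mem_cgroup_wb_dirty_stats(struct bdi_writeback *wb)
  {
  	unsigned long filepages, headroom, dirty, writeback;

  	mem_cgroup_wb_stats(wb, &filepages, &headroom, &dirty, &writeback);
  	return dirty;
  }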

After this patch, write performance improves by about 5% in the
following setup:
  $mount -t cgroup none -o memory /cgroups/memory/
  $mkdir /cgroups/memory/x1
  $echo $$ > /cgroups/memory/x1/tasks
  $echo 100M > /cgroups/memory/x1/memory.limit_in_bytes
  $cd /cgroups/memory/
  $seq 10000|xargs  mkdir
  $fio -filename=/home/test1 -direct=0 -iodepth 1 -thread -rw=write -ioengine=libaio -bs=16k -size=20G
Before:
WRITE: io=20480MB, aggrb=779031KB/s, minb=779031KB/s, maxb=779031KB/s, mint=26920msec, maxt=26920msec
After:
WRITE: io=20480MB, aggrb=831708KB/s, minb=831708KB/s, maxb=831708KB/s, mint=25215msec, maxt=25215msec

This patch can also reduce I/O utilization in a case like the
following: a machine has two disks, one storing assorted logs and
expected to see little I/O pressure, the other storing hadoop data
and writing heavily to disk. In practice both disks show high I/O
utilization, because whenever hadoop reclaims memory it wakes up the
writeback of every memory cgroup.

Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
---
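Note for reviewers: wb_find_current(), used below, is the existing
lookup from include/linux/backing-dev.h. It returns the wb on @bdi
matching both the memcg and the blkcg of %current, must be called
under rcu_read_lock(), returns the bdi's embedded wb for the root
cgroup, and returns NULL when no matching wb exists, hence the NULL
check in the loop. Its declaration, for reference (existing kernel
API, not part of this patch):

  /* existing helper in include/linux/backing-dev.h */
  static inline struct bdi_writeback *
  wb_find_current(struct backing_dev_info *bdi);
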
 fs/fs-writeback.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 471d863958bc..475cada5d1cf 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -35,6 +35,11 @@
  */
 #define MIN_WRITEBACK_PAGES	(4096UL >> (PAGE_SHIFT - 10))
 
+/*
+ * Skip the full flush if the per-memcg writeback covers more dirty pages than this.
+ */
+#define MIN_WB_DIRTY_PAGES 64
+
 struct wb_completion {
 	atomic_t		cnt;
 };
@@ -2005,6 +2010,32 @@ void wakeup_flusher_threads(enum wb_reason reason)
 	if (blk_needs_flush_plug(current))
 		blk_schedule_flush_plug(current);
 
+#ifdef CONFIG_CGROUP_WRITEBACK
+	if (reason == WB_REASON_VMSCAN) {
+		unsigned long tmp, pdirty = 0;
+
+		rcu_read_lock();
+		list_for_each_entry_rcu(bdi, &bdi_list, bdi_list) {
+			struct bdi_writeback *wb = wb_find_current(bdi);
+
+			if (wb) {
+				tmp = mem_cgroup_wb_dirty_stats(wb);
+				if (tmp) {
+					pdirty += tmp;
+					wb_start_writeback(wb, reason);
+
+					if (wb == &bdi->wb)
+						pdirty += MIN_WB_DIRTY_PAGES;
+				}
+			}
+		}
+		rcu_read_unlock();
+
+		if (pdirty > MIN_WB_DIRTY_PAGES)
+			return;
+	}
+#endif
+
 	rcu_read_lock();
 	list_for_each_entry_rcu(bdi, &bdi_list, bdi_list)
 		__wakeup_flusher_threads_bdi(bdi, reason);
-- 
2.16.2
