linux-mm.kvack.org archive mirror
* [PATCH 1/3] writeback: refine trace event balance_dirty_pages
       [not found] <20220427093241.108281-1-yongmeixie@hotmail.com>
@ 2022-04-27  9:32 ` Xie Yongmei
  2022-04-27  9:32 ` [PATCH 2/3] writeback: per memcg dirty flush Xie Yongmei
  2022-04-27  9:32 ` [PATCH 3/3] writeback: specify writeback period and expire interval per memcg Xie Yongmei
  2 siblings, 0 replies; 5+ messages in thread
From: Xie Yongmei @ 2022-04-27  9:32 UTC
  To: Andrew Morton, linux-mm, linux-kernel, Alexander Viro, linux-fsdevel
  Cc: yongmeixie, Xie Yongmei

The patch set "writeback: cgroup writeback support" added writeback
support for cgroups. Since then, the writeback code has controlled dirty
pages in two domains, the global domain and the cgroup domain, which are
combined via pos_ratio since commit c2aa723a6093 ("writeback: implement
memcg writeback domain based throttling").

When either domain is over its freerun level of dirty pages, the task
enters the throttling code. It then computes a position ratio for each
domain and uses the smaller one as the factor that keeps the dirtying
rate in pace with the writeout speed.
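
For illustration only, the freerun check and the setpoint values reported
by this trace event can be modeled in user space as follows. This is a
sketch under stated assumptions, not kernel code: the sketch_* names are
hypothetical, all quantities are page counts, and the arithmetic follows
dirty_freerun_ceiling() and the TP_fast_assign of balance_dirty_pages.

```c
#include <stdbool.h>
#include <stddef.h>

/* Sketch only: hypothetical per-domain page counts. */
struct sketch_domain {
	unsigned long dirty, thresh, bg_thresh;
};

/* Same arithmetic as dirty_freerun_ceiling(): midpoint of the thresholds. */
static unsigned long sketch_freerun_ceiling(unsigned long thresh,
					    unsigned long bg_thresh)
{
	return (thresh + bg_thresh) / 2;
}

/*
 * A writer runs freely only while every domain is at or below its
 * freerun ceiling; the memcg domain is optional (NULL when absent).
 */
static bool sketch_needs_throttle(const struct sketch_domain *global,
				  const struct sketch_domain *memcg)
{
	if (global->dirty > sketch_freerun_ceiling(global->thresh,
						   global->bg_thresh))
		return true;
	return memcg && memcg->dirty > sketch_freerun_ceiling(memcg->thresh,
							      memcg->bg_thresh);
}

/* "setpoint" as computed for the trace event. */
static unsigned long sketch_setpoint(unsigned long limit, unsigned long freerun)
{
	return (limit + freerun) / 2;
}

/* "wb_setpoint": the wb's share of the setpoint, scaled by wb_thresh. */
static unsigned long sketch_wb_setpoint(unsigned long setpoint,
					unsigned long wb_thresh,
					unsigned long thresh)
{
	/* +1 avoids a division by zero when thresh is 0 */
	return setpoint * wb_thresh / (thresh + 1);
}
```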

Unfortunately, the trace event was not updated accordingly. It still
uses "bdi" as the prefix for the fields that are proportional to the
writeout speed (AKA feedback). Rename them to use the "wb" prefix.

No functional change.

Signed-off-by: Xie Yongmei <yongmeixie@hotmail.com>
---
 include/trace/events/writeback.h | 25 +++++++++++++------------
 1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/include/trace/events/writeback.h b/include/trace/events/writeback.h
index 86b2a82da546..0394f425f832 100644
--- a/include/trace/events/writeback.h
+++ b/include/trace/events/writeback.h
@@ -624,8 +624,8 @@ TRACE_EVENT(balance_dirty_pages,
 		 unsigned long thresh,
 		 unsigned long bg_thresh,
 		 unsigned long dirty,
-		 unsigned long bdi_thresh,
-		 unsigned long bdi_dirty,
+		 unsigned long wb_thresh,
+		 unsigned long wb_dirty,
 		 unsigned long dirty_ratelimit,
 		 unsigned long task_ratelimit,
 		 unsigned long dirtied,
@@ -633,7 +633,7 @@ TRACE_EVENT(balance_dirty_pages,
 		 long pause,
 		 unsigned long start_time),
 
-	TP_ARGS(wb, thresh, bg_thresh, dirty, bdi_thresh, bdi_dirty,
+	TP_ARGS(wb, thresh, bg_thresh, dirty, wb_thresh, wb_dirty,
 		dirty_ratelimit, task_ratelimit,
 		dirtied, period, pause, start_time),
 
@@ -642,8 +642,8 @@ TRACE_EVENT(balance_dirty_pages,
 		__field(unsigned long,	limit)
 		__field(unsigned long,	setpoint)
 		__field(unsigned long,	dirty)
-		__field(unsigned long,	bdi_setpoint)
-		__field(unsigned long,	bdi_dirty)
+		__field(unsigned long,	wb_setpoint)
+		__field(unsigned long,	wb_dirty)
 		__field(unsigned long,	dirty_ratelimit)
 		__field(unsigned long,	task_ratelimit)
 		__field(unsigned int,	dirtied)
@@ -663,9 +663,9 @@ TRACE_EVENT(balance_dirty_pages,
 		__entry->setpoint	= (global_wb_domain.dirty_limit +
 						freerun) / 2;
 		__entry->dirty		= dirty;
-		__entry->bdi_setpoint	= __entry->setpoint *
-						bdi_thresh / (thresh + 1);
-		__entry->bdi_dirty	= bdi_dirty;
+		__entry->wb_setpoint	= __entry->setpoint *
+						wb_thresh / (thresh + 1);
+		__entry->wb_dirty	= wb_dirty;
 		__entry->dirty_ratelimit = KBps(dirty_ratelimit);
 		__entry->task_ratelimit	= KBps(task_ratelimit);
 		__entry->dirtied	= dirtied;
@@ -681,16 +681,17 @@ TRACE_EVENT(balance_dirty_pages,
 
 	TP_printk("bdi %s: "
 		  "limit=%lu setpoint=%lu dirty=%lu "
-		  "bdi_setpoint=%lu bdi_dirty=%lu "
+		  "wb_setpoint=%lu wb_dirty=%lu "
 		  "dirty_ratelimit=%lu task_ratelimit=%lu "
 		  "dirtied=%u dirtied_pause=%u "
-		  "paused=%lu pause=%ld period=%lu think=%ld cgroup_ino=%lu",
+		  "paused=%lu pause=%ld period=%lu think=%ld "
+		  "cgroup_ino=%lu",
 		  __entry->bdi,
 		  __entry->limit,
 		  __entry->setpoint,
 		  __entry->dirty,
-		  __entry->bdi_setpoint,
-		  __entry->bdi_dirty,
+		  __entry->wb_setpoint,
+		  __entry->wb_dirty,
 		  __entry->dirty_ratelimit,
 		  __entry->task_ratelimit,
 		  __entry->dirtied,
-- 
2.27.0




* [PATCH 2/3] writeback: per memcg dirty flush
       [not found] <20220427093241.108281-1-yongmeixie@hotmail.com>
  2022-04-27  9:32 ` [PATCH 1/3] writeback: refine trace event balance_dirty_pages Xie Yongmei
@ 2022-04-27  9:32 ` Xie Yongmei
  2022-04-27 10:35   ` Michal Hocko
  2022-04-27  9:32 ` [PATCH 3/3] writeback: specify writeback period and expire interval per memcg Xie Yongmei
  2 siblings, 1 reply; 5+ messages in thread
From: Xie Yongmei @ 2022-04-27  9:32 UTC
  To: Andrew Morton, linux-mm, linux-kernel, Alexander Viro, linux-fsdevel
  Cc: yongmeixie, Xie Yongmei

Currently, dirty writeback is controlled globally. It can be tuned with
the following parameters in /proc/sys/vm/:
  - dirty_expire_centisecs: expire interval in centiseconds
  - dirty_writeback_centisecs: periodic writeback interval in centiseconds
  - dirty_background_bytes/dirty_background_ratio: async writeback
    threshold
  - dirty_bytes/dirty_ratio: sync writeback threshold
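
For reference, the ratio and byte knobs above end up as page thresholds
roughly as follows. This is a simplified user-space sketch of the
arithmetic in domain_dirty_limits() for the global domain; the page size
(assumed 4 KiB) and all sketch_* names are hypothetical, and the
memcg-specific conversion of the byte settings is left out.

```c
#include <stdbool.h>

#define SKETCH_PAGE_SIZE 4096UL /* assumption: 4 KiB pages */
#define SKETCH_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

struct sketch_limits {
	unsigned long thresh;    /* from dirty_bytes/dirty_ratio, in pages */
	unsigned long bg_thresh; /* from dirty_background_*, in pages */
};

static struct sketch_limits
sketch_domain_dirty_limits(unsigned long avail_pages,
			   unsigned long dirty_bytes, unsigned int dirty_ratio,
			   unsigned long bg_bytes, unsigned int bg_ratio,
			   bool rt_task, unsigned long domain_dirty_limit)
{
	/* Ratios are scaled to per-PAGE_SIZE units for extra precision. */
	unsigned long ratio = (dirty_ratio * SKETCH_PAGE_SIZE) / 100;
	unsigned long bg = (bg_ratio * SKETCH_PAGE_SIZE) / 100;
	struct sketch_limits l;

	/* A non-zero *_bytes setting is used instead of the ratio. */
	l.thresh = dirty_bytes ?
		SKETCH_DIV_ROUND_UP(dirty_bytes, SKETCH_PAGE_SIZE) :
		ratio * avail_pages / SKETCH_PAGE_SIZE;
	l.bg_thresh = bg_bytes ?
		SKETCH_DIV_ROUND_UP(bg_bytes, SKETCH_PAGE_SIZE) :
		bg * avail_pages / SKETCH_PAGE_SIZE;

	/* Background writeback must start below the throttling threshold. */
	if (l.bg_thresh >= l.thresh)
		l.bg_thresh = l.thresh / 2;

	/* Real-time tasks get some extra headroom. */
	if (rt_task) {
		l.bg_thresh += l.bg_thresh / 4 + domain_dirty_limit / 32;
		l.thresh += l.thresh / 4 + domain_dirty_limit / 32;
	}
	return l;
}
```

With 100000 available pages, dirty_ratio=20 and dirty_background_ratio=10,
this gives thresh=19995 and bg_thresh=9985 pages.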

Sometimes we would like to apply a special writeback policy to a user
application, especially to offline applications in co-location
scenarios.

This patch provides a per-memcg dirty flush policy, which users can set
through the memcg interface.
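
The per-memcg knobs use -1 to mean "fall back to the global sysctl".
A user-space sketch of that resolution, of the range check done by the
write handlers, and of the copy-on-create inheritance could look like
this; struct sketch_memcg and the sketch_* helpers are hypothetical
stand-ins for struct mem_cgroup, wb_dirty_ratio(),
mem_cgroup_dirty_ratio_write() and wb_memcg_inherit_from_parent().

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the fields added to struct mem_cgroup. */
struct sketch_memcg {
	bool is_root;
	int dirty_ratio; /* -1 means "use the global vm_dirty_ratio" */
};

/* Same validation as the write handler: only -1..100 is accepted. */
static int sketch_set_dirty_ratio(struct sketch_memcg *memcg, int val)
{
	if (val < -1 || val > 100)
		return -EINVAL;
	memcg->dirty_ratio = val;
	return 0;
}

/* Same fallback rules as wb_dirty_ratio(). */
static unsigned int sketch_effective_dirty_ratio(const struct sketch_memcg *memcg,
						 unsigned int global_ratio)
{
	if (!memcg || memcg->is_root || memcg->dirty_ratio < 0)
		return global_ratio;
	return (unsigned int)memcg->dirty_ratio;
}

/* A child copies its parent's value once, at creation time only. */
static void sketch_memcg_create(struct sketch_memcg *child,
				const struct sketch_memcg *parent)
{
	child->is_root = false;
	child->dirty_ratio = parent->dirty_ratio;
}
```

Because the value is copied only when the child is allocated, a later
change to a parent is not propagated to children that already exist.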

The writeback code already maintains two dimensions of dirty page
control in balance_dirty_pages():
   - gdtc for global control
   - mdtc for cgroup control

When the dirty pages are below the freerun level in both domains, it
leaves the check quickly. Otherwise, it computes the wb threshold (along
with the background threshold), taking the writeout bandwidth into
consideration, and computes the position ratio against wb_thresh for
both the global and the cgroup control. It then takes the smaller (i.e.
stricter) ratio as the factor that turns the wb's dirty_ratelimit into
the task ratelimit.
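
The selection of the stricter domain and its use as a scaling factor can
be sketched as follows. It mirrors the pos_ratio comparison and the
task_ratelimit computation in balance_dirty_pages(), where pos_ratio is a
fixed-point value with RATELIMIT_CALC_SHIFT (10) fractional bits;
struct sketch_dtc and the sketch_* helpers are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_RATELIMIT_CALC_SHIFT 10 /* pos_ratio of 1.0 == 1 << 10 */

/* Hypothetical reduction of struct dirty_throttle_control. */
struct sketch_dtc {
	unsigned long pos_ratio; /* fixed point, 1024 == 1.0 */
};

/* Pick the domain with the smaller (stricter) position ratio. */
static const struct sketch_dtc *
sketch_strictest(const struct sketch_dtc *gdtc, const struct sketch_dtc *mdtc)
{
	if (mdtc && mdtc->pos_ratio < gdtc->pos_ratio)
		return mdtc;
	return gdtc;
}

/* Scale the wb's dirty_ratelimit by the chosen pos_ratio. */
static unsigned long sketch_task_ratelimit(unsigned long dirty_ratelimit,
					   const struct sketch_dtc *sdtc)
{
	return (unsigned long)(((uint64_t)dirty_ratelimit * sdtc->pos_ratio) >>
			       SKETCH_RATELIMIT_CALC_SHIFT);
}
```

For example, a memcg pos_ratio of 512 (0.5) against a global pos_ratio
of 1024 (1.0) halves a dirty_ratelimit of 1000 to a task ratelimit of 500.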

So the writeback code can already enforce dirty limits from both the
global and the cgroup view, which means the framework works well for
controlling a cgroup's dirty limit.

This patch only provides an extra interface for memcg to tune writeback
behavior.

Signed-off-by: Xie Yongmei <yongmeixie@hotmail.com>
---
 include/linux/memcontrol.h |  22 ++++++
 init/Kconfig               |   7 ++
 mm/memcontrol.c            | 136 +++++++++++++++++++++++++++++++++++++
 mm/page-writeback.c        |  15 +++-
 4 files changed, 178 insertions(+), 2 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index a68dce3873fc..386fc9b70c95 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -344,6 +344,11 @@ struct mem_cgroup {
 	struct deferred_split deferred_split_queue;
 #endif
 
+#ifdef CONFIG_CGROUP_WRITEBACK_PARA
+	int dirty_background_ratio;
+	int dirty_ratio;
+#endif
+
 	struct mem_cgroup_per_node *nodeinfo[];
 };
 
@@ -1634,6 +1639,23 @@ static inline void mem_cgroup_flush_foreign(struct bdi_writeback *wb)
 
 #endif	/* CONFIG_CGROUP_WRITEBACK */
 
+#ifdef CONFIG_CGROUP_WRITEBACK_PARA
+unsigned int wb_dirty_background_ratio(struct bdi_writeback *wb);
+unsigned int wb_dirty_ratio(struct bdi_writeback *wb);
+#else
+static inline
+unsigned int wb_dirty_background_ratio(struct bdi_writeback *wb)
+{
+	return dirty_background_ratio;
+}
+
+static inline
+unsigned int wb_dirty_ratio(struct bdi_writeback *wb)
+{
+	return vm_dirty_ratio;
+}
+#endif
+
 struct sock;
 bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages,
 			     gfp_t gfp_mask);
diff --git a/init/Kconfig b/init/Kconfig
index ddcbefe535e9..0b8152000d6e 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -989,6 +989,13 @@ config CGROUP_WRITEBACK
 	depends on MEMCG && BLK_CGROUP
 	default y
 
+config CGROUP_WRITEBACK_PARA
+	bool "Enable per-memcg dirty flush parameters"
+	depends on CGROUP_WRITEBACK
+	default y
+	help
+	  This feature allows a cgroup to specify its own dirty writeback policy.
+
 menuconfig CGROUP_SCHED
 	bool "CPU controller"
 	default n
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index e8922bacfe2a..b1c1b150637a 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4822,6 +4822,112 @@ static int mem_cgroup_slab_show(struct seq_file *m, void *p)
 }
 #endif
 
+#ifdef CONFIG_CGROUP_WRITEBACK_PARA
+unsigned int wb_dirty_background_ratio(struct bdi_writeback *wb)
+{
+	struct mem_cgroup *memcg;
+
+	if (mem_cgroup_disabled() || !wb)
+		return dirty_background_ratio;
+
+	memcg = mem_cgroup_from_css(wb->memcg_css);
+	if (memcg == root_mem_cgroup || memcg->dirty_background_ratio < 0)
+		return dirty_background_ratio;
+
+	return memcg->dirty_background_ratio;
+}
+
+unsigned int wb_dirty_ratio(struct bdi_writeback *wb)
+{
+	struct mem_cgroup *memcg;
+
+	if (mem_cgroup_disabled() || !wb)
+		return vm_dirty_ratio;
+
+	memcg = mem_cgroup_from_css(wb->memcg_css);
+	if (memcg == root_mem_cgroup || memcg->dirty_ratio < 0)
+		return vm_dirty_ratio;
+
+	return memcg->dirty_ratio;
+}
+
+static void wb_memcg_inherit_from_parent(struct mem_cgroup *parent,
+					 struct mem_cgroup *memcg)
+{
+	memcg->dirty_background_ratio = parent->dirty_background_ratio;
+	memcg->dirty_ratio = parent->dirty_ratio;
+}
+
+static void wb_memcg_init(struct mem_cgroup *memcg)
+{
+	memcg->dirty_background_ratio = -1;
+	memcg->dirty_ratio = -1;
+}
+
+static int mem_cgroup_dirty_background_ratio_show(struct seq_file *m, void *v)
+{
+	struct mem_cgroup *memcg = mem_cgroup_from_css(seq_css(m));
+
+	seq_printf(m, "%d\n", memcg->dirty_background_ratio);
+	return 0;
+}
+
+static ssize_t
+mem_cgroup_dirty_background_ratio_write(struct kernfs_open_file *of,
+					char *buf, size_t nbytes,
+					loff_t off)
+{
+	struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of));
+	int ret, background_ratio;
+
+	buf = strstrip(buf);
+	ret = kstrtoint(buf, 0, &background_ratio);
+	if (ret)
+		return ret;
+
+	if (background_ratio < -1 || background_ratio > 100)
+		return -EINVAL;
+
+	memcg->dirty_background_ratio = background_ratio;
+	return nbytes;
+}
+
+static int mem_cgroup_dirty_ratio_show(struct seq_file *m, void *v)
+{
+	struct mem_cgroup *memcg = mem_cgroup_from_css(seq_css(m));
+
+	seq_printf(m, "%d\n", memcg->dirty_ratio);
+	return 0;
+}
+
+static ssize_t
+mem_cgroup_dirty_ratio_write(struct kernfs_open_file *of,
+			     char *buf, size_t nbytes, loff_t off)
+{
+	struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of));
+	int ret, dirty_ratio;
+
+	buf = strstrip(buf);
+	ret = kstrtoint(buf, 0, &dirty_ratio);
+	if (ret)
+		return ret;
+
+	if (dirty_ratio < -1 || dirty_ratio > 100)
+		return -EINVAL;
+
+	memcg->dirty_ratio = dirty_ratio;
+	return nbytes;
+}
+#else
+static void wb_memcg_inherit_from_parent(struct mem_cgroup *parent,
+					 struct mem_cgroup *memcg)
+{
+}
+
+static inline void wb_memcg_init(struct mem_cgroup *memcg)
+{
+}
+#endif
 static struct cftype mem_cgroup_legacy_files[] = {
 	{
 		.name = "usage_in_bytes",
@@ -4948,6 +5054,20 @@ static struct cftype mem_cgroup_legacy_files[] = {
 		.write = mem_cgroup_reset,
 		.read_u64 = mem_cgroup_read_u64,
 	},
+#ifdef CONFIG_CGROUP_WRITEBACK_PARA
+	{
+		.name = "dirty_background_ratio",
+		.flags = CFTYPE_NOT_ON_ROOT,
+		.seq_show = mem_cgroup_dirty_background_ratio_show,
+		.write = mem_cgroup_dirty_background_ratio_write,
+	},
+	{
+		.name = "dirty_ratio",
+		.flags = CFTYPE_NOT_ON_ROOT,
+		.seq_show = mem_cgroup_dirty_ratio_show,
+		.write = mem_cgroup_dirty_ratio_write,
+	},
+#endif
 	{ },	/* terminate */
 };
 
@@ -5151,11 +5271,13 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
 		page_counter_init(&memcg->swap, &parent->swap);
 		page_counter_init(&memcg->kmem, &parent->kmem);
 		page_counter_init(&memcg->tcpmem, &parent->tcpmem);
+		wb_memcg_inherit_from_parent(parent, memcg);
 	} else {
 		page_counter_init(&memcg->memory, NULL);
 		page_counter_init(&memcg->swap, NULL);
 		page_counter_init(&memcg->kmem, NULL);
 		page_counter_init(&memcg->tcpmem, NULL);
+		wb_memcg_init(memcg);
 
 		root_mem_cgroup = memcg;
 		return &memcg->css;
@@ -6414,6 +6536,20 @@ static struct cftype memory_files[] = {
 		.seq_show = memory_oom_group_show,
 		.write = memory_oom_group_write,
 	},
+#ifdef CONFIG_CGROUP_WRITEBACK_PARA
+	{
+		.name = "dirty_background_ratio",
+		.flags = CFTYPE_NOT_ON_ROOT,
+		.seq_show = mem_cgroup_dirty_background_ratio_show,
+		.write = mem_cgroup_dirty_background_ratio_write,
+	},
+	{
+		.name = "dirty_ratio",
+		.flags = CFTYPE_NOT_ON_ROOT,
+		.seq_show = mem_cgroup_dirty_ratio_show,
+		.write = mem_cgroup_dirty_ratio_write,
+	},
+#endif
 	{ }	/* terminate */
 };
 
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 7e2da284e427..cec2ef032927 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -395,12 +395,23 @@ static void domain_dirty_limits(struct dirty_throttle_control *dtc)
 		 * per-PAGE_SIZE, they can be obtained by dividing bytes by
 		 * number of pages.
 		 */
+#ifdef CONFIG_CGROUP_WRITEBACK_PARA
+		ratio = (wb_dirty_ratio(dtc->wb) * PAGE_SIZE) / 100;
+		bg_ratio = (wb_dirty_background_ratio(dtc->wb) * PAGE_SIZE) / 100;
+		if (!ratio && bytes)
+			ratio = min(DIV_ROUND_UP(bytes, global_avail),
+				    PAGE_SIZE);
+		if (!bg_ratio && bg_bytes)
+			bg_ratio = min(DIV_ROUND_UP(bg_bytes, global_avail),
+				       PAGE_SIZE);
+#else
 		if (bytes)
 			ratio = min(DIV_ROUND_UP(bytes, global_avail),
 				    PAGE_SIZE);
 		if (bg_bytes)
 			bg_ratio = min(DIV_ROUND_UP(bg_bytes, global_avail),
 				       PAGE_SIZE);
+#endif
 		bytes = bg_bytes = 0;
 	}
 
@@ -418,8 +429,8 @@ static void domain_dirty_limits(struct dirty_throttle_control *dtc)
 		bg_thresh = thresh / 2;
 	tsk = current;
 	if (rt_task(tsk)) {
-		bg_thresh += bg_thresh / 4 + global_wb_domain.dirty_limit / 32;
-		thresh += thresh / 4 + global_wb_domain.dirty_limit / 32;
+		bg_thresh += bg_thresh / 4 + dtc_dom(dtc)->dirty_limit / 32;
+		thresh += thresh / 4 + dtc_dom(dtc)->dirty_limit / 32;
 	}
 	dtc->thresh = thresh;
 	dtc->bg_thresh = bg_thresh;
-- 
2.27.0




* [PATCH 3/3] writeback: specify writeback period and expire interval per memcg
       [not found] <20220427093241.108281-1-yongmeixie@hotmail.com>
  2022-04-27  9:32 ` [PATCH 1/3] writeback: refine trace event balance_dirty_pages Xie Yongmei
  2022-04-27  9:32 ` [PATCH 2/3] writeback: per memcg dirty flush Xie Yongmei
@ 2022-04-27  9:32 ` Xie Yongmei
  2022-04-27 10:36   ` Michal Hocko
  2 siblings, 1 reply; 5+ messages in thread
From: Xie Yongmei @ 2022-04-27  9:32 UTC
  To: Andrew Morton, linux-mm, linux-kernel, Alexander Viro, linux-fsdevel
  Cc: yongmeixie, Xie Yongmei

dirty_writeback_interval: period between periodic writeback wakeups
dirty_expire_interval: age after which dirty data is considered expired

This patch provides per-memcg settings for these writeback intervals.

Dirty writeback can be triggered in the following ways:
  - mark_inode_dirty: the first time pages of an inode are dirtied, it
		schedules the wb_workfn callback to run one wakeup period
		later.
  - wb_workfn: if there is more writeback work to do, it schedules
		wb_workfn to run again one wakeup period later.
  - external events: kswapd finds dirty pages piling up at the end of
		the inactive list, or the laptop mode timer fires.
  - buffered write context: balance_dirty_pages tries to wake up
		background writeback once dirty pages exceed the freerun
		level.
  - sync context: sync (fs sync) starts writeback immediately.

No matter how writeback is triggered, wb_workfn is the single callback
that performs the flushing. In particular, wb_check_old_data_flush
handles periodic writeback and decides which dirty pages must be written
back because they are too old.
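
A simplified model of the periodic (kupdate-style) path may help here:
wb_check_old_data_flush() returns early when the writeback interval is 0
or has not yet elapsed since wb->last_old_flush, and wb_writeback() then
only picks inodes dirtied before "now minus the expire interval". The
sketch assumes HZ=250, models jiffies as a plain counter and inodes as an
array, and counts inodes rather than pages; every sketch_* name is
hypothetical. With this patch, the two centisecond values would come from
wb_dirty_writeback_interval() and wb_dirty_expire_interval().

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_HZ 250UL /* assumption: 250 jiffies per second */

/* centisecs -> msecs -> jiffies, rounding up like msecs_to_jiffies(). */
static unsigned long sketch_cs_to_jiffies(unsigned int cs)
{
	unsigned long ms = (unsigned long)cs * 10;

	return (ms * SKETCH_HZ + 999) / 1000;
}

/* Wraparound-safe comparison in the style of time_before(). */
static bool sketch_time_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

struct sketch_inode {
	unsigned long dirtied_when; /* jiffies when first dirtied */
};

/*
 * Returns how many inodes are old enough to be written back, or 0 when
 * periodic writeback is disabled or not yet due.
 */
static size_t sketch_old_data_flush(unsigned long now,
				    unsigned long *last_old_flush,
				    unsigned int writeback_cs,
				    unsigned int expire_cs,
				    const struct sketch_inode *inodes, size_t n)
{
	unsigned long dirtied_before;
	size_t i, expired = 0;

	if (!writeback_cs) /* 0 disables periodic writeback */
		return 0;
	if (sketch_time_before(now, *last_old_flush +
				    sketch_cs_to_jiffies(writeback_cs)))
		return 0;
	*last_old_flush = now;

	/* kupdate work only covers inodes dirtied at or before this point. */
	dirtied_before = now - sketch_cs_to_jiffies(expire_cs);
	for (i = 0; i < n; i++)
		if (!sketch_time_before(dirtied_before, inodes[i].dirtied_when))
			expired++;
	return expired;
}
```

With this patch, each wb reads the intervals of its own memcg, so the
rescheduling in wb_workfn, the timeout in wb_wakeup_delayed() and the
kupdate cutoff all follow the per-memcg values.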

Signed-off-by: Xie Yongmei <yongmeixie@hotmail.com>
---
 fs/fs-writeback.c          |  11 ++--
 include/linux/memcontrol.h |  16 ++++++
 mm/backing-dev.c           |   4 +-
 mm/memcontrol.c            | 114 +++++++++++++++++++++++++++++++++++++
 4 files changed, 140 insertions(+), 5 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 591fe9cf1659..f59e4709ec39 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1980,6 +1980,7 @@ static long wb_writeback(struct bdi_writeback *wb,
 	struct inode *inode;
 	long progress;
 	struct blk_plug plug;
+	unsigned int dirty_expire = wb_dirty_expire_interval(wb);
 
 	blk_start_plug(&plug);
 	spin_lock(&wb->list_lock);
@@ -2015,7 +2016,7 @@ static long wb_writeback(struct bdi_writeback *wb,
 		 */
 		if (work->for_kupdate) {
 			dirtied_before = jiffies -
-				msecs_to_jiffies(dirty_expire_interval * 10);
+				msecs_to_jiffies(dirty_expire * 10);
 		} else if (work->for_background)
 			dirtied_before = jiffies;
 
@@ -2101,15 +2102,16 @@ static long wb_check_old_data_flush(struct bdi_writeback *wb)
 {
 	unsigned long expired;
 	long nr_pages;
+	unsigned int writeback_interval = wb_dirty_writeback_interval(wb);
 
 	/*
 	 * When set to zero, disable periodic writeback
 	 */
-	if (!dirty_writeback_interval)
+	if (!writeback_interval)
 		return 0;
 
 	expired = wb->last_old_flush +
-			msecs_to_jiffies(dirty_writeback_interval * 10);
+			msecs_to_jiffies(writeback_interval * 10);
 	if (time_before(jiffies, expired))
 		return 0;
 
@@ -2194,6 +2196,7 @@ void wb_workfn(struct work_struct *work)
 	struct bdi_writeback *wb = container_of(to_delayed_work(work),
 						struct bdi_writeback, dwork);
 	long pages_written;
+	unsigned int writeback_interval = wb_dirty_writeback_interval(wb);
 
 	set_worker_desc("flush-%s", bdi_dev_name(wb->bdi));
 
@@ -2222,7 +2225,7 @@ void wb_workfn(struct work_struct *work)
 
 	if (!list_empty(&wb->work_list))
 		wb_wakeup(wb);
-	else if (wb_has_dirty_io(wb) && dirty_writeback_interval)
+	else if (wb_has_dirty_io(wb) && writeback_interval)
 		wb_wakeup_delayed(wb);
 }
 
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 386fc9b70c95..c1dc88bb8f80 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -347,6 +347,8 @@ struct mem_cgroup {
 #ifdef CONFIG_CGROUP_WRITEBACK_PARA
 	int dirty_background_ratio;
 	int dirty_ratio;
+	int dirty_writeback_interval;
+	int dirty_expire_interval;
 #endif
 
 	struct mem_cgroup_per_node *nodeinfo[];
@@ -1642,6 +1644,8 @@ static inline void mem_cgroup_flush_foreign(struct bdi_writeback *wb)
 #ifdef CONFIG_CGROUP_WRITEBACK_PARA
 unsigned int wb_dirty_background_ratio(struct bdi_writeback *wb);
 unsigned int wb_dirty_ratio(struct bdi_writeback *wb);
+unsigned int wb_dirty_writeback_interval(struct bdi_writeback *wb);
+unsigned int wb_dirty_expire_interval(struct bdi_writeback *wb);
 #else
 static inline
 unsigned int wb_dirty_background_ratio(struct bdi_writeback *wb)
@@ -1654,6 +1658,18 @@ unsigned int wb_dirty_ratio(struct bdi_writeback *wb)
 {
 	return vm_dirty_ratio;
 }
+
+static inline
+unsigned int wb_dirty_writeback_interval(struct bdi_writeback *wb)
+{
+	return dirty_writeback_interval;
+}
+
+static inline
+unsigned int wb_dirty_expire_interval(struct bdi_writeback *wb)
+{
+	return dirty_expire_interval;
+}
 #endif
 
 struct sock;
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 7176af65b103..685558362ad8 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -15,6 +15,7 @@
 #include <linux/writeback.h>
 #include <linux/device.h>
 #include <trace/events/writeback.h>
+#include <linux/memcontrol.h>
 
 struct backing_dev_info noop_backing_dev_info;
 EXPORT_SYMBOL_GPL(noop_backing_dev_info);
@@ -264,8 +265,9 @@ subsys_initcall(default_bdi_init);
 void wb_wakeup_delayed(struct bdi_writeback *wb)
 {
 	unsigned long timeout;
+	unsigned int dirty_interval = wb_dirty_writeback_interval(wb);
 
-	timeout = msecs_to_jiffies(dirty_writeback_interval * 10);
+	timeout = msecs_to_jiffies(dirty_interval * 10);
 	spin_lock_bh(&wb->work_lock);
 	if (test_bit(WB_registered, &wb->state))
 		queue_delayed_work(bdi_wq, &wb->dwork, timeout);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index b1c1b150637a..c392aec22e2e 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4851,17 +4851,49 @@ unsigned int wb_dirty_ratio(struct bdi_writeback *wb)
 	return memcg->dirty_ratio;
 }
 
+unsigned int wb_dirty_writeback_interval(struct bdi_writeback *wb)
+{
+	struct mem_cgroup *memcg;
+
+	if (mem_cgroup_disabled() || !wb)
+		return dirty_writeback_interval;
+
+	memcg = mem_cgroup_from_css(wb->memcg_css);
+	if (memcg == root_mem_cgroup || memcg->dirty_writeback_interval < 0)
+		return dirty_writeback_interval;
+
+	return memcg->dirty_writeback_interval;
+}
+
+unsigned int wb_dirty_expire_interval(struct bdi_writeback *wb)
+{
+	struct mem_cgroup *memcg;
+
+	if (mem_cgroup_disabled() || !wb)
+		return dirty_expire_interval;
+
+	memcg = mem_cgroup_from_css(wb->memcg_css);
+	if (memcg == root_mem_cgroup || memcg->dirty_expire_interval < 0)
+		return dirty_expire_interval;
+
+	return memcg->dirty_expire_interval;
+}
+
 static void wb_memcg_inherit_from_parent(struct mem_cgroup *parent,
 					 struct mem_cgroup *memcg)
 {
 	memcg->dirty_background_ratio = parent->dirty_background_ratio;
 	memcg->dirty_ratio = parent->dirty_ratio;
+	memcg->dirty_writeback_interval = parent->dirty_writeback_interval;
+	memcg->dirty_expire_interval = parent->dirty_expire_interval;
 }
 
 static void wb_memcg_init(struct mem_cgroup *memcg)
 {
 	memcg->dirty_background_ratio = -1;
 	memcg->dirty_ratio = -1;
+	memcg->dirty_writeback_interval = -1;
+	memcg->dirty_expire_interval = -1;
 }
 
 static int mem_cgroup_dirty_background_ratio_show(struct seq_file *m, void *v)
@@ -4918,6 +4950,64 @@ mem_cgroup_dirty_ratio_write(struct kernfs_open_file *of,
 	memcg->dirty_ratio = dirty_ratio;
 	return nbytes;
 }
+
+static int mem_cgroup_dirty_writeback_interval_show(struct seq_file *m, void *v)
+{
+	struct mem_cgroup *memcg = mem_cgroup_from_css(seq_css(m));
+
+	seq_printf(m, "%d\n", memcg->dirty_writeback_interval);
+	return 0;
+}
+
+static ssize_t
+mem_cgroup_dirty_writeback_interval_write(struct kernfs_open_file *of,
+					  char *buf, size_t nbytes,
+					  loff_t off)
+{
+	struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of));
+	int ret, writeback_interval;
+
+	buf = strstrip(buf);
+	ret = kstrtoint(buf, 0, &writeback_interval);
+	if (ret)
+		return ret;
+
+	if (writeback_interval < -1)
+		return -EINVAL;
+
+	if (memcg->dirty_writeback_interval != writeback_interval) {
+		memcg->dirty_writeback_interval = writeback_interval;
+		wakeup_flusher_threads(WB_REASON_PERIODIC);
+	}
+	return nbytes;
+}
+
+static int mem_cgroup_dirty_expire_interval_show(struct seq_file *m, void *v)
+{
+	struct mem_cgroup *memcg = mem_cgroup_from_css(seq_css(m));
+
+	seq_printf(m, "%d\n", memcg->dirty_expire_interval);
+	return 0;
+}
+
+static ssize_t
+mem_cgroup_dirty_expire_interval_write(struct kernfs_open_file *of,
+				       char *buf, size_t nbytes, loff_t off)
+{
+	struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of));
+	int ret, expire_interval;
+
+	buf = strstrip(buf);
+	ret = kstrtoint(buf, 0, &expire_interval);
+	if (ret)
+		return ret;
+
+	if (expire_interval < -1)
+		return -EINVAL;
+
+	memcg->dirty_expire_interval = expire_interval;
+	return nbytes;
+}
 #else
 static void wb_memcg_inherit_from_parent(struct mem_cgroup *parent,
 					 struct mem_cgroup *memcg)
@@ -5067,6 +5157,18 @@ static struct cftype mem_cgroup_legacy_files[] = {
 		.seq_show = mem_cgroup_dirty_ratio_show,
 		.write = mem_cgroup_dirty_ratio_write,
 	},
+	{
+		.name = "dirty_writeback_interval_centisecs",
+		.flags = CFTYPE_NOT_ON_ROOT,
+		.seq_show = mem_cgroup_dirty_writeback_interval_show,
+		.write = mem_cgroup_dirty_writeback_interval_write,
+	},
+	{
+		.name = "dirty_expire_interval_centisecs",
+		.flags = CFTYPE_NOT_ON_ROOT,
+		.seq_show = mem_cgroup_dirty_expire_interval_show,
+		.write = mem_cgroup_dirty_expire_interval_write,
+	},
 #endif
 	{ },	/* terminate */
 };
@@ -6549,6 +6651,18 @@ static struct cftype memory_files[] = {
 		.seq_show = mem_cgroup_dirty_ratio_show,
 		.write = mem_cgroup_dirty_ratio_write,
 	},
+	{
+		.name = "dirty_writeback_interval_centisecs",
+		.flags = CFTYPE_NOT_ON_ROOT,
+		.seq_show = mem_cgroup_dirty_writeback_interval_show,
+		.write = mem_cgroup_dirty_writeback_interval_write,
+	},
+	{
+		.name = "dirty_expire_interval_centisecs",
+		.flags = CFTYPE_NOT_ON_ROOT,
+		.seq_show = mem_cgroup_dirty_expire_interval_show,
+		.write = mem_cgroup_dirty_expire_interval_write,
+	},
 #endif
 	{ }	/* terminate */
 };
-- 
2.27.0




* Re: [PATCH 2/3] writeback: per memcg dirty flush
  2022-04-27  9:32 ` [PATCH 2/3] writeback: per memcg dirty flush Xie Yongmei
@ 2022-04-27 10:35   ` Michal Hocko
  0 siblings, 0 replies; 5+ messages in thread
From: Michal Hocko @ 2022-04-27 10:35 UTC
  To: Xie Yongmei
  Cc: Andrew Morton, linux-mm, linux-kernel, Alexander Viro,
	linux-fsdevel, yongmeixie, Johannes Weiner, Roman Gushchin,
	Shakeel Butt, Tejun Heo, linux-api

[CC memcg maintainers and Tejun who has been quite active in the area
as well. Also linux-api ML added - please add this list whenever you are
suggesting user visible API]

On Wed 27-04-22 05:32:40, Xie Yongmei wrote:
> Currently, dirty writeback is controlled globally. It can be tuned with
> the following parameters in /proc/sys/vm/:
>   - dirty_expire_centisecs: expire interval in centiseconds
>   - dirty_writeback_centisecs: periodic writeback interval in centiseconds
>   - dirty_background_bytes/dirty_background_ratio: async writeback
>     threshold
>   - dirty_bytes/dirty_ratio: sync writeback threshold
> 
> Sometimes we would like to apply a special writeback policy to a user
> application, especially to offline applications in co-location
> scenarios.
> 
> This patch provides a per-memcg dirty flush policy, which users can set
> through the memcg interface.
> 
> The writeback code already maintains two dimensions of dirty page
> control in balance_dirty_pages():
>    - gdtc for global control
>    - mdtc for cgroup control
> 
> When the dirty pages are below the freerun level in both domains, it
> leaves the check quickly. Otherwise, it computes the wb threshold (along
> with the background threshold), taking the writeout bandwidth into
> consideration, and computes the position ratio against wb_thresh for
> both the global and the cgroup control. It then takes the smaller (i.e.
> stricter) ratio as the factor that turns the wb's dirty_ratelimit into
> the task ratelimit.
> 
> So the writeback code can already enforce dirty limits from both the
> global and the cgroup view, which means the framework works well for
> controlling a cgroup's dirty limit.
> 
> This patch only provides an extra interface for memcg to tune writeback
> behavior.
> 
> Signed-off-by: Xie Yongmei <yongmeixie@hotmail.com>
> ---
>  include/linux/memcontrol.h |  22 ++++++
>  init/Kconfig               |   7 ++
>  mm/memcontrol.c            | 136 +++++++++++++++++++++++++++++++++++++
>  mm/page-writeback.c        |  15 +++-
>  4 files changed, 178 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index a68dce3873fc..386fc9b70c95 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -344,6 +344,11 @@ struct mem_cgroup {
>  	struct deferred_split deferred_split_queue;
>  #endif
>  
> +#ifdef CONFIG_CGROUP_WRITEBACK_PARA
> +	int dirty_background_ratio;
> +	int dirty_ratio;
> +#endif
> +
>  	struct mem_cgroup_per_node *nodeinfo[];
>  };
>  
> @@ -1634,6 +1639,23 @@ static inline void mem_cgroup_flush_foreign(struct bdi_writeback *wb)
>  
>  #endif	/* CONFIG_CGROUP_WRITEBACK */
>  
> +#ifdef CONFIG_CGROUP_WRITEBACK_PARA
> +unsigned int wb_dirty_background_ratio(struct bdi_writeback *wb);
> +unsigned int wb_dirty_ratio(struct bdi_writeback *wb);
> +#else
> +static inline
> +unsigned int wb_dirty_background_ratio(struct bdi_writeback *wb)
> +{
> +	return dirty_background_ratio;
> +}
> +
> +static inline
> +unsigned int wb_dirty_ratio(struct bdi_writeback *wb)
> +{
> +	return vm_dirty_ratio;
> +}
> +#endif
> +
>  struct sock;
>  bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages,
>  			     gfp_t gfp_mask);
> diff --git a/init/Kconfig b/init/Kconfig
> index ddcbefe535e9..0b8152000d6e 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -989,6 +989,13 @@ config CGROUP_WRITEBACK
>  	depends on MEMCG && BLK_CGROUP
>  	default y
>  
> +config CGROUP_WRITEBACK_PARA
> +	bool "Enable per-memcg dirty flush parameters"
> +	depends on CGROUP_WRITEBACK
> +	default y
> +	help
> +	  This feature allows a cgroup to specify its own dirty writeback policy.
> +
>  menuconfig CGROUP_SCHED
>  	bool "CPU controller"
>  	default n
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index e8922bacfe2a..b1c1b150637a 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -4822,6 +4822,112 @@ static int mem_cgroup_slab_show(struct seq_file *m, void *p)
>  }
>  #endif
>  
> +#ifdef CONFIG_CGROUP_WRITEBACK_PARA
> +unsigned int wb_dirty_background_ratio(struct bdi_writeback *wb)
> +{
> +	struct mem_cgroup *memcg;
> +
> +	if (mem_cgroup_disabled() || !wb)
> +		return dirty_background_ratio;
> +
> +	memcg = mem_cgroup_from_css(wb->memcg_css);
> +	if (memcg == root_mem_cgroup || memcg->dirty_background_ratio < 0)
> +		return dirty_background_ratio;
> +
> +	return memcg->dirty_background_ratio;
> +}
> +
> +unsigned int wb_dirty_ratio(struct bdi_writeback *wb)
> +{
> +	struct mem_cgroup *memcg;
> +
> +	if (mem_cgroup_disabled() || !wb)
> +		return vm_dirty_ratio;
> +
> +	memcg = mem_cgroup_from_css(wb->memcg_css);
> +	if (memcg == root_mem_cgroup || memcg->dirty_ratio < 0)
> +		return vm_dirty_ratio;
> +
> +	return memcg->dirty_ratio;
> +}
> +
> +static void wb_memcg_inherit_from_parent(struct mem_cgroup *parent,
> +					 struct mem_cgroup *memcg)
> +{
> +	memcg->dirty_background_ratio = parent->dirty_background_ratio;
> +	memcg->dirty_ratio = parent->dirty_ratio;
> +}
> +
> +static void wb_memcg_init(struct mem_cgroup *memcg)
> +{
> +	memcg->dirty_background_ratio = -1;
> +	memcg->dirty_ratio = -1;
> +}
> +
> +static int mem_cgroup_dirty_background_ratio_show(struct seq_file *m, void *v)
> +{
> +	struct mem_cgroup *memcg = mem_cgroup_from_css(seq_css(m));
> +
> +	seq_printf(m, "%d\n", memcg->dirty_background_ratio);
> +	return 0;
> +}
> +
> +static ssize_t
> +mem_cgroup_dirty_background_ratio_write(struct kernfs_open_file *of,
> +					char *buf, size_t nbytes,
> +					loff_t off)
> +{
> +	struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of));
> +	int ret, background_ratio;
> +
> +	buf = strstrip(buf);
> +	ret = kstrtoint(buf, 0, &background_ratio);
> +	if (ret)
> +		return ret;
> +
> +	if (background_ratio < -1 || background_ratio > 100)
> +		return -EINVAL;
> +
> +	memcg->dirty_background_ratio = background_ratio;
> +	return nbytes;
> +}
> +
> +static int mem_cgroup_dirty_ratio_show(struct seq_file *m, void *v)
> +{
> +	struct mem_cgroup *memcg = mem_cgroup_from_css(seq_css(m));
> +
> +	seq_printf(m, "%d\n", memcg->dirty_ratio);
> +	return 0;
> +}
> +
> +static ssize_t
> +mem_cgroup_dirty_ratio_write(struct kernfs_open_file *of,
> +			     char *buf, size_t nbytes, loff_t off)
> +{
> +	struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of));
> +	int ret, dirty_ratio;
> +
> +	buf = strstrip(buf);
> +	ret = kstrtoint(buf, 0, &dirty_ratio);
> +	if (ret)
> +		return ret;
> +
> +	if (dirty_ratio < -1 || dirty_ratio > 100)
> +		return -EINVAL;
> +
> +	memcg->dirty_ratio = dirty_ratio;
> +	return nbytes;
> +}
> +#else
> +static void wb_memcg_inherit_from_parent(struct mem_cgroup *parent,
> +					 struct mem_cgroup *memcg)
> +{
> +}
> +
> +static inline void wb_memcg_init(struct mem_cgroup *memcg)
> +{
> +}
> +#endif
>  static struct cftype mem_cgroup_legacy_files[] = {
>  	{
>  		.name = "usage_in_bytes",
> @@ -4948,6 +5054,20 @@ static struct cftype mem_cgroup_legacy_files[] = {
>  		.write = mem_cgroup_reset,
>  		.read_u64 = mem_cgroup_read_u64,
>  	},
> +#ifdef CONFIG_CGROUP_WRITEBACK_PARA
> +	{
> +		.name = "dirty_background_ratio",
> +		.flags = CFTYPE_NOT_ON_ROOT,
> +		.seq_show = mem_cgroup_dirty_background_ratio_show,
> +		.write = mem_cgroup_dirty_background_ratio_write,
> +	},
> +	{
> +		.name = "dirty_ratio",
> +		.flags = CFTYPE_NOT_ON_ROOT,
> +		.seq_show = mem_cgroup_dirty_ratio_show,
> +		.write = mem_cgroup_dirty_ratio_write,
> +	},
> +#endif
>  	{ },	/* terminate */
>  };
>  
> @@ -5151,11 +5271,13 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
>  		page_counter_init(&memcg->swap, &parent->swap);
>  		page_counter_init(&memcg->kmem, &parent->kmem);
>  		page_counter_init(&memcg->tcpmem, &parent->tcpmem);
> +		wb_memcg_inherit_from_parent(parent, memcg);
>  	} else {
>  		page_counter_init(&memcg->memory, NULL);
>  		page_counter_init(&memcg->swap, NULL);
>  		page_counter_init(&memcg->kmem, NULL);
>  		page_counter_init(&memcg->tcpmem, NULL);
> +		wb_memcg_init(memcg);
>  
>  		root_mem_cgroup = memcg;
>  		return &memcg->css;
> @@ -6414,6 +6536,20 @@ static struct cftype memory_files[] = {
>  		.seq_show = memory_oom_group_show,
>  		.write = memory_oom_group_write,
>  	},
> +#ifdef CONFIG_CGROUP_WRITEBACK_PARA
> +	{
> +		.name = "dirty_background_ratio",
> +		.flags = CFTYPE_NOT_ON_ROOT,
> +		.seq_show = mem_cgroup_dirty_background_ratio_show,
> +		.write = mem_cgroup_dirty_background_ratio_write,
> +	},
> +	{
> +		.name = "dirty_ratio",
> +		.flags = CFTYPE_NOT_ON_ROOT,
> +		.seq_show = mem_cgroup_dirty_ratio_show,
> +		.write = mem_cgroup_dirty_ratio_write,
> +	},
> +#endif
>  	{ }	/* terminate */
>  };
>  
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index 7e2da284e427..cec2ef032927 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -395,12 +395,23 @@ static void domain_dirty_limits(struct dirty_throttle_control *dtc)
>  		 * per-PAGE_SIZE, they can be obtained by dividing bytes by
>  		 * number of pages.
>  		 */
> +#ifdef CONFIG_CGROUP_WRITEBACK_PARA
> +		ratio = (wb_dirty_ratio(dtc->wb) * PAGE_SIZE) / 100;
> +		bg_ratio = (wb_dirty_background_ratio(dtc->wb) * PAGE_SIZE) / 100;
> +		if (!ratio && bytes)
> +			ratio = min(DIV_ROUND_UP(bytes, global_avail),
> +				    PAGE_SIZE);
> +		if (!bg_ratio && bg_bytes)
> +			bg_ratio = min(DIV_ROUND_UP(bg_bytes, global_avail),
> +				       PAGE_SIZE);
> +#else
>  		if (bytes)
>  			ratio = min(DIV_ROUND_UP(bytes, global_avail),
>  				    PAGE_SIZE);
>  		if (bg_bytes)
>  			bg_ratio = min(DIV_ROUND_UP(bg_bytes, global_avail),
>  				       PAGE_SIZE);
> +#endif
>  		bytes = bg_bytes = 0;
>  	}
>  
> @@ -418,8 +429,8 @@ static void domain_dirty_limits(struct dirty_throttle_control *dtc)
>  		bg_thresh = thresh / 2;
>  	tsk = current;
>  	if (rt_task(tsk)) {
> -		bg_thresh += bg_thresh / 4 + global_wb_domain.dirty_limit / 32;
> -		thresh += thresh / 4 + global_wb_domain.dirty_limit / 32;
> +		bg_thresh += bg_thresh / 4 + dtc_dom(dtc)->dirty_limit / 32;
> +		thresh += thresh / 4 + dtc_dom(dtc)->dirty_limit / 32;
>  	}
>  	dtc->thresh = thresh;
>  	dtc->bg_thresh = bg_thresh;
> -- 
> 2.27.0

-- 
Michal Hocko
SUSE Labs

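A minimal userspace sketch of how the memcg branch of domain_dirty_limits() in patch 2 turns a per-memcg dirty_ratio into a page threshold. It rests on several assumptions. PAGE_SIZE is taken to be 4096. wb_dirty_ratio() is only partly quoted above, so the sketch assumes it falls back to vm_dirty_ratio for -1 in the same way wb_dirty_writeback_interval() does in patch 3. vm_dirty_ratio = 20 is an illustrative value. resolve_ratio() and memcg_thresh() are hypothetical names, not kernel APIs.

```c
/* Assumed values and helpers for illustration only. */
#define PAGE_SIZE 4096UL
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Hypothetical stand-in for the global vm.dirty_ratio sysctl (percent). */
static int vm_dirty_ratio = 20;

/* Assumed -1 "use the global value" fallback of wb_dirty_ratio(). */
static unsigned int resolve_ratio(int memcg_ratio)
{
	return memcg_ratio < 0 ? (unsigned int)vm_dirty_ratio
			       : (unsigned int)memcg_ratio;
}

/*
 * Threshold in pages for a memcg domain. The percentage is scaled to a
 * per-PAGE_SIZE fraction; the byte limit (scaled against globally
 * available pages) is used only when that fraction comes out as zero.
 */
static unsigned long memcg_thresh(int memcg_ratio, unsigned long bytes,
				  unsigned long global_avail,
				  unsigned long available_memory)
{
	unsigned long ratio = (resolve_ratio(memcg_ratio) * PAGE_SIZE) / 100;

	if (!ratio && bytes)
		ratio = MIN(DIV_ROUND_UP(bytes, global_avail), PAGE_SIZE);

	return (ratio * available_memory) / PAGE_SIZE;
}
```

With memcg_ratio = 10 and 1000 available pages, the threshold is 99 pages, because the per-PAGE_SIZE fraction 409/4096 truncates slightly below 10%. The byte limit only matters when the ratio resolves to 0. That happens for a memcg left at -1 once vm.dirty_bytes is set, because writing vm.dirty_bytes zeroes vm_dirty_ratio.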

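The write handlers quoted above strip the buffer, parse it with kstrtoint(buf, 0, ...) and reject anything outside -1..100, where -1 means "use the global sysctl". The interval handlers in patch 3 use the same check with no upper bound. Below is a userspace approximation of that validation, using strtol() in place of kstrtoint(). parse_knob() and its wrappers are hypothetical names, and the error codes only approximate the kernel's: -ERANGE on overflow, -EINVAL on malformed or out-of-range input.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Approximates kstrtoint(buf, 0, &val) followed by a [min, max] check.
 * Base 0 accepts decimal, 0x-prefixed hex and 0-prefixed octal.
 */
static int parse_knob(const char *buf, long min, long max, int *out)
{
	char *end;
	long v;

	errno = 0;
	v = strtol(buf, &end, 0);
	if (end == buf)
		return -EINVAL;			/* no digits at all */
	if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
		return -ERANGE;			/* does not fit in an int */
	while (isspace((unsigned char)*end))
		end++;				/* tolerate a trailing "\n" from echo */
	if (*end != '\0')
		return -EINVAL;			/* trailing garbage */
	if (v < min || v > max)
		return -EINVAL;			/* outside the allowed range */
	*out = (int)v;
	return 0;
}

/* dirty_ratio / dirty_background_ratio: -1 (inherit) or 0..100 percent. */
static int parse_dirty_ratio(const char *buf, int *out)
{
	return parse_knob(buf, -1, 100, out);
}

/* Interval knobs: -1 (inherit) or any non-negative centisecond value. */
static int parse_interval(const char *buf, int *out)
{
	return parse_knob(buf, -1, INT_MAX, out);
}
```

Because base 0 honours prefixes, writing "0x0a" to the proposed memory.dirty_ratio file would be accepted as 10.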

* Re: [PATCH 3/3] writeback: specify writeback period and expire interval per memcg
  2022-04-27  9:32 ` [PATCH 3/3] writeback: specify writeback period and expire interval per memcg Xie Yongmei
@ 2022-04-27 10:36   ` Michal Hocko
  0 siblings, 0 replies; 5+ messages in thread
From: Michal Hocko @ 2022-04-27 10:36 UTC
  To: Xie Yongmei
  Cc: Andrew Morton, linux-mm, linux-kernel, Alexander Viro,
	linux-fsdevel, yongmeixie, Johannes Weiner, Roman Gushchin,
	Shakeel Butt, Tejun Heo, linux-api

[updated CC list as per previous patch in the thread]

On Wed 27-04-22 05:32:41, Xie Yongmei wrote:
> dirty_writeback_interval: dirty wakeup period
> dirty_expire_interval: expire period
> 
> This patch provides per-memcg settings for these two writeback intervals.
> 
> Dirty writeback can be triggered in the following ways:
>   - mark_inode_dirty: the first time an inode's pages are dirtied, it
> 		schedules the wb_workfn callback one wakeup period later.
>   - wb_workfn: if more writeback work is queued, it wakes up again
> 		immediately; otherwise, if dirty inodes remain, it re-arms
> 		itself one wakeup period later.
>   - external events: kswapd finds dirty pages piled up at the end of
> 		the inactive list, or the laptop mode timer fires.
>   - buffered write context: balance_dirty_pages wakes up background
> 		writeback once dirty pages exceed the freerun level.
>   - sync context: sync (fs sync) starts writeback immediately.
> 
> No matter how writeback is triggered, wb_workfn is the single callback
> that performs the flushing. Within it, wb_check_old_data_flush handles
> periodic writeback, whose scope is limited to dirty pages that have
> become too old.
> 
> Signed-off-by: Xie Yongmei <yongmeixie@hotmail.com>
> ---
>  fs/fs-writeback.c          |  11 ++--
>  include/linux/memcontrol.h |  16 ++++++
>  mm/backing-dev.c           |   4 +-
>  mm/memcontrol.c            | 114 +++++++++++++++++++++++++++++++++++++
>  4 files changed, 140 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 591fe9cf1659..f59e4709ec39 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -1980,6 +1980,7 @@ static long wb_writeback(struct bdi_writeback *wb,
>  	struct inode *inode;
>  	long progress;
>  	struct blk_plug plug;
> +	unsigned int dirty_expire = wb_dirty_expire_interval(wb);
>  
>  	blk_start_plug(&plug);
>  	spin_lock(&wb->list_lock);
> @@ -2015,7 +2016,7 @@ static long wb_writeback(struct bdi_writeback *wb,
>  		 */
>  		if (work->for_kupdate) {
>  			dirtied_before = jiffies -
> -				msecs_to_jiffies(dirty_expire_interval * 10);
> +				msecs_to_jiffies(dirty_expire * 10);
>  		} else if (work->for_background)
>  			dirtied_before = jiffies;
>  
> @@ -2101,15 +2102,16 @@ static long wb_check_old_data_flush(struct bdi_writeback *wb)
>  {
>  	unsigned long expired;
>  	long nr_pages;
> +	unsigned int writeback_interval = wb_dirty_writeback_interval(wb);
>  
>  	/*
>  	 * When set to zero, disable periodic writeback
>  	 */
> -	if (!dirty_writeback_interval)
> +	if (!writeback_interval)
>  		return 0;
>  
>  	expired = wb->last_old_flush +
> -			msecs_to_jiffies(dirty_writeback_interval * 10);
> +			msecs_to_jiffies(writeback_interval * 10);
>  	if (time_before(jiffies, expired))
>  		return 0;
>  
> @@ -2194,6 +2196,7 @@ void wb_workfn(struct work_struct *work)
>  	struct bdi_writeback *wb = container_of(to_delayed_work(work),
>  						struct bdi_writeback, dwork);
>  	long pages_written;
> +	unsigned int writeback_interval = wb_dirty_writeback_interval(wb);
>  
>  	set_worker_desc("flush-%s", bdi_dev_name(wb->bdi));
>  
> @@ -2222,7 +2225,7 @@ void wb_workfn(struct work_struct *work)
>  
>  	if (!list_empty(&wb->work_list))
>  		wb_wakeup(wb);
> -	else if (wb_has_dirty_io(wb) && dirty_writeback_interval)
> +	else if (wb_has_dirty_io(wb) && writeback_interval)
>  		wb_wakeup_delayed(wb);
>  }
>  
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 386fc9b70c95..c1dc88bb8f80 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -347,6 +347,8 @@ struct mem_cgroup {
>  #ifdef CONFIG_CGROUP_WRITEBACK_PARA
>  	int dirty_background_ratio;
>  	int dirty_ratio;
> +	int dirty_writeback_interval;
> +	int dirty_expire_interval;
>  #endif
>  
>  	struct mem_cgroup_per_node *nodeinfo[];
> @@ -1642,6 +1644,8 @@ static inline void mem_cgroup_flush_foreign(struct bdi_writeback *wb)
>  #ifdef CONFIG_CGROUP_WRITEBACK_PARA
>  unsigned int wb_dirty_background_ratio(struct bdi_writeback *wb);
>  unsigned int wb_dirty_ratio(struct bdi_writeback *wb);
> +unsigned int wb_dirty_writeback_interval(struct bdi_writeback *wb);
> +unsigned int wb_dirty_expire_interval(struct bdi_writeback *wb);
>  #else
>  static inline
>  unsigned int wb_dirty_background_ratio(struct bdi_writeback *wb)
> @@ -1654,6 +1658,18 @@ unsigned int wb_dirty_ratio(struct bdi_writeback *wb)
>  {
>  	return vm_dirty_ratio;
>  }
> +
> +static inline
> +unsigned int wb_dirty_writeback_interval(struct bdi_writeback *wb)
> +{
> +	return dirty_writeback_interval;
> +}
> +
> +static inline
> +unsigned int wb_dirty_expire_interval(struct bdi_writeback *wb)
> +{
> +	return dirty_expire_interval;
> +}
>  #endif
>  
>  struct sock;
> diff --git a/mm/backing-dev.c b/mm/backing-dev.c
> index 7176af65b103..685558362ad8 100644
> --- a/mm/backing-dev.c
> +++ b/mm/backing-dev.c
> @@ -15,6 +15,7 @@
>  #include <linux/writeback.h>
>  #include <linux/device.h>
>  #include <trace/events/writeback.h>
> +#include <linux/memcontrol.h>
>  
>  struct backing_dev_info noop_backing_dev_info;
>  EXPORT_SYMBOL_GPL(noop_backing_dev_info);
> @@ -264,8 +265,9 @@ subsys_initcall(default_bdi_init);
>  void wb_wakeup_delayed(struct bdi_writeback *wb)
>  {
>  	unsigned long timeout;
> +	unsigned int dirty_interval = wb_dirty_writeback_interval(wb);
>  
> -	timeout = msecs_to_jiffies(dirty_writeback_interval * 10);
> +	timeout = msecs_to_jiffies(dirty_interval * 10);
>  	spin_lock_bh(&wb->work_lock);
>  	if (test_bit(WB_registered, &wb->state))
>  		queue_delayed_work(bdi_wq, &wb->dwork, timeout);
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index b1c1b150637a..c392aec22e2e 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -4851,17 +4851,49 @@ unsigned int wb_dirty_ratio(struct bdi_writeback *wb)
>  	return memcg->dirty_ratio;
>  }
>  
> +unsigned int wb_dirty_writeback_interval(struct bdi_writeback *wb)
> +{
> +	struct mem_cgroup *memcg;
> +
> +	if (mem_cgroup_disabled() || !wb)
> +		return dirty_writeback_interval;
> +
> +	memcg = mem_cgroup_from_css(wb->memcg_css);
> +	if (memcg == root_mem_cgroup || memcg->dirty_writeback_interval < 0)
> +		return dirty_writeback_interval;
> +
> +	return memcg->dirty_writeback_interval;
> +}
> +
> +unsigned int wb_dirty_expire_interval(struct bdi_writeback *wb)
> +{
> +	struct mem_cgroup *memcg;
> +
> +	if (mem_cgroup_disabled() || !wb)
> +		return dirty_expire_interval;
> +
> +	memcg = mem_cgroup_from_css(wb->memcg_css);
> +	if (memcg == root_mem_cgroup || memcg->dirty_expire_interval < 0)
> +		return dirty_expire_interval;
> +
> +	return memcg->dirty_expire_interval;
> +}
> +
>  static void wb_memcg_inherit_from_parent(struct mem_cgroup *parent,
>  					 struct mem_cgroup *memcg)
>  {
>  	memcg->dirty_background_ratio = parent->dirty_background_ratio;
>  	memcg->dirty_ratio = parent->dirty_ratio;
> +	memcg->dirty_writeback_interval = parent->dirty_writeback_interval;
> +	memcg->dirty_expire_interval = parent->dirty_expire_interval;
>  }
>  
>  static void wb_memcg_init(struct mem_cgroup *memcg)
>  {
>  	memcg->dirty_background_ratio = -1;
>  	memcg->dirty_ratio = -1;
> +	memcg->dirty_writeback_interval = -1;
> +	memcg->dirty_expire_interval = -1;
>  }
>  
>  static int mem_cgroup_dirty_background_ratio_show(struct seq_file *m, void *v)
> @@ -4918,6 +4950,64 @@ mem_cgroup_dirty_ratio_write(struct kernfs_open_file *of,
>  	memcg->dirty_ratio = dirty_ratio;
>  	return nbytes;
>  }
> +
> +static int mem_cgroup_dirty_writeback_interval_show(struct seq_file *m, void *v)
> +{
> +	struct mem_cgroup *memcg = mem_cgroup_from_css(seq_css(m));
> +
> +	seq_printf(m, "%d\n", memcg->dirty_writeback_interval);
> +	return 0;
> +}
> +
> +static ssize_t
> +mem_cgroup_dirty_writeback_interval_write(struct kernfs_open_file *of,
> +					  char *buf, size_t nbytes,
> +					  loff_t off)
> +{
> +	struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of));
> +	int ret, writeback_interval;
> +
> +	buf = strstrip(buf);
> +	ret = kstrtoint(buf, 0, &writeback_interval);
> +	if (ret)
> +		return ret;
> +
> +	if (writeback_interval < -1)
> +		return -EINVAL;
> +
> +	if (memcg->dirty_writeback_interval != writeback_interval) {
> +		memcg->dirty_writeback_interval = writeback_interval;
> +		wakeup_flusher_threads(WB_REASON_PERIODIC);
> +	}
> +	return nbytes;
> +}
> +
> +static int mem_cgroup_dirty_expire_interval_show(struct seq_file *m, void *v)
> +{
> +	struct mem_cgroup *memcg = mem_cgroup_from_css(seq_css(m));
> +
> +	seq_printf(m, "%d\n", memcg->dirty_expire_interval);
> +	return 0;
> +}
> +
> +static ssize_t
> +mem_cgroup_dirty_expire_interval_write(struct kernfs_open_file *of,
> +				       char *buf, size_t nbytes, loff_t off)
> +{
> +	struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of));
> +	int ret, expire_interval;
> +
> +	buf = strstrip(buf);
> +	ret = kstrtoint(buf, 0, &expire_interval);
> +	if (ret)
> +		return ret;
> +
> +	if (expire_interval < -1)
> +		return -EINVAL;
> +
> +	memcg->dirty_expire_interval = expire_interval;
> +	return nbytes;
> +}
>  #else
>  static void wb_memcg_inherit_from_parent(struct mem_cgroup *parent,
>  					 struct mem_cgroup *memcg)
> @@ -5067,6 +5157,18 @@ static struct cftype mem_cgroup_legacy_files[] = {
>  		.seq_show = mem_cgroup_dirty_ratio_show,
>  		.write = mem_cgroup_dirty_ratio_write,
>  	},
> +	{
> +		.name = "dirty_writeback_interval_centisecs",
> +		.flags = CFTYPE_NOT_ON_ROOT,
> +		.seq_show = mem_cgroup_dirty_writeback_interval_show,
> +		.write = mem_cgroup_dirty_writeback_interval_write,
> +	},
> +	{
> +		.name = "dirty_expire_interval_centisecs",
> +		.flags = CFTYPE_NOT_ON_ROOT,
> +		.seq_show = mem_cgroup_dirty_expire_interval_show,
> +		.write = mem_cgroup_dirty_expire_interval_write,
> +	},
>  #endif
>  	{ },	/* terminate */
>  };
> @@ -6549,6 +6651,18 @@ static struct cftype memory_files[] = {
>  		.seq_show = mem_cgroup_dirty_ratio_show,
>  		.write = mem_cgroup_dirty_ratio_write,
>  	},
> +	{
> +		.name = "dirty_writeback_interval_centisecs",
> +		.flags = CFTYPE_NOT_ON_ROOT,
> +		.seq_show = mem_cgroup_dirty_writeback_interval_show,
> +		.write = mem_cgroup_dirty_writeback_interval_write,
> +	},
> +	{
> +		.name = "dirty_expire_interval_centisecs",
> +		.flags = CFTYPE_NOT_ON_ROOT,
> +		.seq_show = mem_cgroup_dirty_expire_interval_show,
> +		.write = mem_cgroup_dirty_expire_interval_write,
> +	},
>  #endif
>  	{ }	/* terminate */
>  };
> -- 
> 2.27.0

-- 
Michal Hocko
SUSE Labs

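A minimal sketch of how patch 3's per-memcg intervals feed the two time checks in fs-writeback.c. These are the kupdate cutoff computed in wb_writeback() and the expiry test in wb_check_old_data_flush(). The sketch assumes HZ=250 and models msecs_to_jiffies() only for an HZ that divides 1000. The global defaults of 500 and 3000 centiseconds match the kernel's dirty_writeback_interval and dirty_expire_interval defaults. The is_root flag stands in for the memcg == root_mem_cgroup test, and all function names are hypothetical.

```c
#include <stdbool.h>

/* Assumed configuration for illustration: HZ=250, so one jiffy is 4 ms. */
#define HZ 250UL
#define MSEC_PER_SEC 1000UL
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Global sysctl defaults, in centiseconds (5 s and 30 s). */
static unsigned int dirty_writeback_interval = 5 * 100;
static unsigned int dirty_expire_interval = 30 * 100;

/* Simplified msecs_to_jiffies() for an HZ that divides 1000: rounds up. */
static unsigned long msecs_to_jiffies_model(unsigned long ms)
{
	return DIV_ROUND_UP(ms, MSEC_PER_SEC / HZ);
}

/* Same fallback rule as wb_dirty_{writeback,expire}_interval(). */
static unsigned int effective_interval(bool is_root, int memcg_value,
				       unsigned int global_value)
{
	if (is_root || memcg_value < 0)
		return global_value;
	return (unsigned int)memcg_value;
}

/* wb_writeback() kupdate cutoff: inodes dirtied before this are "old". */
static unsigned long kupdate_dirtied_before(unsigned long now, bool is_root,
					    int memcg_expire)
{
	unsigned int expire = effective_interval(is_root, memcg_expire,
						 dirty_expire_interval);

	/* centiseconds * 10 = milliseconds, as in the patched code */
	return now - msecs_to_jiffies_model((unsigned long)expire * 10);
}

/* wb_check_old_data_flush(): is a periodic flush due at @now? */
static bool periodic_flush_due(unsigned long now, unsigned long last_old_flush,
			       bool is_root, int memcg_interval)
{
	unsigned int interval = effective_interval(is_root, memcg_interval,
						   dirty_writeback_interval);
	unsigned long expired;

	if (!interval)		/* 0 disables periodic writeback */
		return false;

	expired = last_old_flush +
		  msecs_to_jiffies_model((unsigned long)interval * 10);
	/* !time_before(now, expired), written out to be wraparound-safe */
	return (long)(now - expired) >= 0;
}
```

A memcg left at -1, or the root memcg, behaves exactly like the global sysctls. Writing 0 to dirty_writeback_interval_centisecs disables periodic flushing for that memcg's wbs, just as 0 does globally. Because wb_workfn() and wb_wakeup_delayed() now read the interval per wb, the re-arm period can differ between cgroups that share one bdi.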


Thread overview: 5+ messages
     [not found] <20220427093241.108281-1-yongmeixie@hotmail.com>
2022-04-27  9:32 ` [PATCH 1/3] writeback: refine trace event balance_dirty_pages Xie Yongmei
2022-04-27  9:32 ` [PATCH 2/3] writeback: per memcg dirty flush Xie Yongmei
2022-04-27 10:35   ` Michal Hocko
2022-04-27  9:32 ` [PATCH 3/3] writeback: specify writeback period and expire interval per memcg Xie Yongmei
2022-04-27 10:36   ` Michal Hocko
