linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/6] buffered write IO controller in balance_dirty_pages()
@ 2012-03-28 12:13 Fengguang Wu
  2012-03-28 12:13 ` [PATCH 1/6] blk-cgroup: move blk-cgroup.h in include/linux/blk-cgroup.h Fengguang Wu
                   ` (9 more replies)
  0 siblings, 10 replies; 18+ messages in thread
From: Fengguang Wu @ 2012-03-28 12:13 UTC (permalink / raw)
  To: Linux Memory Management List
  Cc: Vivek Goyal, Suresh Jayaraman, Andrea Righi, Jeff Moyer,
	linux-fsdevel, Fengguang Wu, LKML


Here is one possible solution to "buffered write IO controller", based on Linux
v3.3

git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux.git  buffered-write-io-controller

Features:
- support blkio.weight
- support blkio.throttle.buffered_write_bps

Possibilities:
- it's trivial to support per-bdi .weight or .buffered_write_bps

Pros:
1) simple
2) virtually no space/time overheads
3) independent of the block layer and IO schedulers, hence
3.1) supports all filesystems/storages, eg. NFS/pNFS, CIFS, sshfs, ...
3.2) supports all IO schedulers. One may use noop for SSDs, inside virtual machines, over iSCSI, etc.

Cons:
1) does not try to smooth bursty IO submission in the flusher thread (*)
2) does not support IOPS-based throttling
3) introduces semantic differences to blkio.weight, which will
   - work by "bandwidth" for buffered writes
   - work by "device time" for direct IO

(*) Maybe not a big concern, since the bursts are limited to 500ms: if one dd
is throttled to 50% of the disk bandwidth, the flusher thread will wake up
every 1 second, keep the disk busy for 500ms and then go idle for 500ms; if
throttled to 10% of the disk bandwidth, it will wake up every 5 seconds, keep
busy for 500ms and stay idle for 4.5s.
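
For reference, the arithmetic behind those numbers (an illustrative userspace
snippet, not kernel code):

	#include <stdio.h>

	int main(void)
	{
		double fractions[] = { 0.5, 0.1 };	/* 50% and 10% of disk bandwidth */
		int i;

		for (i = 0; i < 2; i++) {
			/* the flusher writes in ~500ms slices, so the wakeup
			 * period stretches to roughly 0.5s / throttle fraction */
			double period = 0.5 / fractions[i];

			printf("throttled to %2.0f%%: wake every %.1fs, busy 0.5s, idle %.1fs\n",
			       fractions[i] * 100, period, period - 0.5);
		}
		return 0;
	}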

The test results included in the last patch look pretty good despite the
simple implementation.
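
For completeness, here is roughly how the two knobs can be driven from
userspace (a hypothetical sketch: the /cgroup/buffered_write path and the
2MiB/s value mirror the test script in the last patch, while the weight value
100 is only an example):

	#include <stdio.h>

	/* write a string to a cgroup control file */
	static int write_str(const char *path, const char *val)
	{
		FILE *f = fopen(path, "w");

		if (!f)
			return -1;
		fprintf(f, "%s\n", val);
		return fclose(f);
	}

	int main(void)
	{
		/* lower the proportional weight from the default 500 to 100 */
		write_str("/cgroup/buffered_write/blkio.weight", "100");
		/* cap buffered writes of this cgroup at 2 MiB/s */
		write_str("/cgroup/buffered_write/blkio.throttle.buffered_write_bps",
			  "2097152");
		return 0;
	}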

 [PATCH 1/6] blk-cgroup: move blk-cgroup.h in include/linux/blk-cgroup.h
 [PATCH 2/6] blk-cgroup: account dirtied pages
 [PATCH 3/6] blk-cgroup: buffered write IO controller - bandwidth weight
 [PATCH 4/6] blk-cgroup: buffered write IO controller - bandwidth limit
 [PATCH 5/6] blk-cgroup: buffered write IO controller - bandwidth limit interface
 [PATCH 6/6] blk-cgroup: buffered write IO controller - debug trace

The changeset is dominated by the blk-cgroup.h move.
The core changes (to page-writeback.c) are merely 77 lines.

 block/blk-cgroup.c               |   27 +
 block/blk-cgroup.h               |  364 --------------------------
 block/blk-throttle.c             |    2 
 block/cfq.h                      |    2 
 include/linux/blk-cgroup.h       |  396 +++++++++++++++++++++++++++++
 include/trace/events/writeback.h |   34 ++
 mm/page-writeback.c              |   77 +++++
 7 files changed, 530 insertions(+), 372 deletions(-)

Thanks,
Fengguang


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/6] blk-cgroup: move blk-cgroup.h in include/linux/blk-cgroup.h
  2012-03-28 12:13 [PATCH 0/6] buffered write IO controller in balance_dirty_pages() Fengguang Wu
@ 2012-03-28 12:13 ` Fengguang Wu
  2012-03-28 12:13 ` [PATCH 2/6] blk-cgroup: account dirtied pages Fengguang Wu
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 18+ messages in thread
From: Fengguang Wu @ 2012-03-28 12:13 UTC (permalink / raw)
  To: Linux Memory Management List
  Cc: Vivek Goyal, Andrea Righi, Wu Fengguang, Suresh Jayaraman,
	Andrea Righi, Jeff Moyer, linux-fsdevel, LKML

[-- Attachment #1: blk-cgroup-move-blk-cgroup.h-in-include-linux-blk-cgroup.h.patch --]
[-- Type: text/plain, Size: 25771 bytes --]

From: Andrea Righi <arighi@develer.com>

Move blk-cgroup.h in include/linux for generic usage.

Signed-off-by: Andrea Righi <arighi@develer.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 block/blk-cgroup.c         |    2 
 block/blk-cgroup.h         |  364 -----------------------------------
 block/blk-throttle.c       |    2 
 block/cfq.h                |    2 
 include/linux/blk-cgroup.h |  364 +++++++++++++++++++++++++++++++++++
 5 files changed, 367 insertions(+), 367 deletions(-)
 delete mode 100644 block/blk-cgroup.h
 create mode 100644 include/linux/blk-cgroup.h

--- linux-next.orig/block/blk-cgroup.c	2012-03-28 12:11:27.982345310 +0800
+++ linux-next/block/blk-cgroup.c	2012-03-28 12:53:10.086293964 +0800
@@ -17,7 +17,7 @@
 #include <linux/err.h>
 #include <linux/blkdev.h>
 #include <linux/slab.h>
-#include "blk-cgroup.h"
+#include <linux/blk-cgroup.h>
 #include <linux/genhd.h>
 
 #define MAX_KEY_LEN 100
--- linux-next.orig/block/blk-throttle.c	2012-03-28 12:11:27.982345310 +0800
+++ linux-next/block/blk-throttle.c	2012-03-28 12:11:31.066345249 +0800
@@ -9,7 +9,7 @@
 #include <linux/blkdev.h>
 #include <linux/bio.h>
 #include <linux/blktrace_api.h>
-#include "blk-cgroup.h"
+#include <linux/blk-cgroup.h>
 #include "blk.h"
 
 /* Max dispatch from a group in 1 round */
--- linux-next.orig/block/cfq.h	2012-03-28 12:11:27.986345310 +0800
+++ linux-next/block/cfq.h	2012-03-28 12:11:31.066345249 +0800
@@ -1,6 +1,6 @@
 #ifndef _CFQ_H
 #define _CFQ_H
-#include "blk-cgroup.h"
+#include <linux/blk-cgroup.h>
 
 #ifdef CONFIG_CFQ_GROUP_IOSCHED
 static inline void cfq_blkiocg_update_io_add_stats(struct blkio_group *blkg,
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-next/include/linux/blk-cgroup.h	2012-03-28 12:53:24.566293666 +0800
@@ -0,0 +1,364 @@
+#ifndef _BLK_CGROUP_H
+#define _BLK_CGROUP_H
+/*
+ * Common Block IO controller cgroup interface
+ *
+ * Based on ideas and code from CFQ, CFS and BFQ:
+ * Copyright (C) 2003 Jens Axboe <axboe@kernel.dk>
+ *
+ * Copyright (C) 2008 Fabio Checconi <fabio@gandalf.sssup.it>
+ *		      Paolo Valente <paolo.valente@unimore.it>
+ *
+ * Copyright (C) 2009 Vivek Goyal <vgoyal@redhat.com>
+ * 	              Nauman Rafique <nauman@google.com>
+ */
+
+#include <linux/cgroup.h>
+#include <linux/u64_stats_sync.h>
+
+enum blkio_policy_id {
+	BLKIO_POLICY_PROP = 0,		/* Proportional Bandwidth division */
+	BLKIO_POLICY_THROTL,		/* Throttling */
+};
+
+/* Max limits for throttle policy */
+#define THROTL_IOPS_MAX		UINT_MAX
+
+#if defined(CONFIG_BLK_CGROUP) || defined(CONFIG_BLK_CGROUP_MODULE)
+
+#ifndef CONFIG_BLK_CGROUP
+/* When blk-cgroup is a module, its subsys_id isn't a compile-time constant */
+extern struct cgroup_subsys blkio_subsys;
+#define blkio_subsys_id blkio_subsys.subsys_id
+#endif
+
+enum stat_type {
+	/* Total time spent (in ns) between request dispatch to the driver and
+	 * request completion for IOs doen by this cgroup. This may not be
+	 * accurate when NCQ is turned on. */
+	BLKIO_STAT_SERVICE_TIME = 0,
+	/* Total time spent waiting in scheduler queue in ns */
+	BLKIO_STAT_WAIT_TIME,
+	/* Number of IOs queued up */
+	BLKIO_STAT_QUEUED,
+	/* All the single valued stats go below this */
+	BLKIO_STAT_TIME,
+#ifdef CONFIG_DEBUG_BLK_CGROUP
+	/* Time not charged to this cgroup */
+	BLKIO_STAT_UNACCOUNTED_TIME,
+	BLKIO_STAT_AVG_QUEUE_SIZE,
+	BLKIO_STAT_IDLE_TIME,
+	BLKIO_STAT_EMPTY_TIME,
+	BLKIO_STAT_GROUP_WAIT_TIME,
+	BLKIO_STAT_DEQUEUE
+#endif
+};
+
+/* Per cpu stats */
+enum stat_type_cpu {
+	BLKIO_STAT_CPU_SECTORS,
+	/* Total bytes transferred */
+	BLKIO_STAT_CPU_SERVICE_BYTES,
+	/* Total IOs serviced, post merge */
+	BLKIO_STAT_CPU_SERVICED,
+	/* Number of IOs merged */
+	BLKIO_STAT_CPU_MERGED,
+	BLKIO_STAT_CPU_NR
+};
+
+enum stat_sub_type {
+	BLKIO_STAT_READ = 0,
+	BLKIO_STAT_WRITE,
+	BLKIO_STAT_SYNC,
+	BLKIO_STAT_ASYNC,
+	BLKIO_STAT_TOTAL
+};
+
+/* blkg state flags */
+enum blkg_state_flags {
+	BLKG_waiting = 0,
+	BLKG_idling,
+	BLKG_empty,
+};
+
+/* cgroup files owned by proportional weight policy */
+enum blkcg_file_name_prop {
+	BLKIO_PROP_weight = 1,
+	BLKIO_PROP_weight_device,
+	BLKIO_PROP_io_service_bytes,
+	BLKIO_PROP_io_serviced,
+	BLKIO_PROP_time,
+	BLKIO_PROP_sectors,
+	BLKIO_PROP_unaccounted_time,
+	BLKIO_PROP_io_service_time,
+	BLKIO_PROP_io_wait_time,
+	BLKIO_PROP_io_merged,
+	BLKIO_PROP_io_queued,
+	BLKIO_PROP_avg_queue_size,
+	BLKIO_PROP_group_wait_time,
+	BLKIO_PROP_idle_time,
+	BLKIO_PROP_empty_time,
+	BLKIO_PROP_dequeue,
+};
+
+/* cgroup files owned by throttle policy */
+enum blkcg_file_name_throtl {
+	BLKIO_THROTL_read_bps_device,
+	BLKIO_THROTL_write_bps_device,
+	BLKIO_THROTL_read_iops_device,
+	BLKIO_THROTL_write_iops_device,
+	BLKIO_THROTL_io_service_bytes,
+	BLKIO_THROTL_io_serviced,
+};
+
+struct blkio_cgroup {
+	struct cgroup_subsys_state css;
+	unsigned int weight;
+	spinlock_t lock;
+	struct hlist_head blkg_list;
+	struct list_head policy_list; /* list of blkio_policy_node */
+};
+
+struct blkio_group_stats {
+	/* total disk time and nr sectors dispatched by this group */
+	uint64_t time;
+	uint64_t stat_arr[BLKIO_STAT_QUEUED + 1][BLKIO_STAT_TOTAL];
+#ifdef CONFIG_DEBUG_BLK_CGROUP
+	/* Time not charged to this cgroup */
+	uint64_t unaccounted_time;
+
+	/* Sum of number of IOs queued across all samples */
+	uint64_t avg_queue_size_sum;
+	/* Count of samples taken for average */
+	uint64_t avg_queue_size_samples;
+	/* How many times this group has been removed from service tree */
+	unsigned long dequeue;
+
+	/* Total time spent waiting for it to be assigned a timeslice. */
+	uint64_t group_wait_time;
+	uint64_t start_group_wait_time;
+
+	/* Time spent idling for this blkio_group */
+	uint64_t idle_time;
+	uint64_t start_idle_time;
+	/*
+	 * Total time when we have requests queued and do not contain the
+	 * current active queue.
+	 */
+	uint64_t empty_time;
+	uint64_t start_empty_time;
+	uint16_t flags;
+#endif
+};
+
+/* Per cpu blkio group stats */
+struct blkio_group_stats_cpu {
+	uint64_t sectors;
+	uint64_t stat_arr_cpu[BLKIO_STAT_CPU_NR][BLKIO_STAT_TOTAL];
+	struct u64_stats_sync syncp;
+};
+
+struct blkio_group {
+	/* An rcu protected unique identifier for the group */
+	void *key;
+	struct hlist_node blkcg_node;
+	unsigned short blkcg_id;
+	/* Store cgroup path */
+	char path[128];
+	/* The device MKDEV(major, minor), this group has been created for */
+	dev_t dev;
+	/* policy which owns this blk group */
+	enum blkio_policy_id plid;
+
+	/* Need to serialize the stats in the case of reset/update */
+	spinlock_t stats_lock;
+	struct blkio_group_stats stats;
+	/* Per cpu stats pointer */
+	struct blkio_group_stats_cpu __percpu *stats_cpu;
+};
+
+struct blkio_policy_node {
+	struct list_head node;
+	dev_t dev;
+	/* This node belongs to max bw policy or porportional weight policy */
+	enum blkio_policy_id plid;
+	/* cgroup file to which this rule belongs to */
+	int fileid;
+
+	union {
+		unsigned int weight;
+		/*
+		 * Rate read/write in terms of bytes per second
+		 * Whether this rate represents read or write is determined
+		 * by file type "fileid".
+		 */
+		u64 bps;
+		unsigned int iops;
+	} val;
+};
+
+extern unsigned int blkcg_get_weight(struct blkio_cgroup *blkcg,
+				     dev_t dev);
+extern uint64_t blkcg_get_read_bps(struct blkio_cgroup *blkcg,
+				     dev_t dev);
+extern uint64_t blkcg_get_write_bps(struct blkio_cgroup *blkcg,
+				     dev_t dev);
+extern unsigned int blkcg_get_read_iops(struct blkio_cgroup *blkcg,
+				     dev_t dev);
+extern unsigned int blkcg_get_write_iops(struct blkio_cgroup *blkcg,
+				     dev_t dev);
+
+typedef void (blkio_unlink_group_fn) (void *key, struct blkio_group *blkg);
+
+typedef void (blkio_update_group_weight_fn) (void *key,
+			struct blkio_group *blkg, unsigned int weight);
+typedef void (blkio_update_group_read_bps_fn) (void * key,
+			struct blkio_group *blkg, u64 read_bps);
+typedef void (blkio_update_group_write_bps_fn) (void *key,
+			struct blkio_group *blkg, u64 write_bps);
+typedef void (blkio_update_group_read_iops_fn) (void *key,
+			struct blkio_group *blkg, unsigned int read_iops);
+typedef void (blkio_update_group_write_iops_fn) (void *key,
+			struct blkio_group *blkg, unsigned int write_iops);
+
+struct blkio_policy_ops {
+	blkio_unlink_group_fn *blkio_unlink_group_fn;
+	blkio_update_group_weight_fn *blkio_update_group_weight_fn;
+	blkio_update_group_read_bps_fn *blkio_update_group_read_bps_fn;
+	blkio_update_group_write_bps_fn *blkio_update_group_write_bps_fn;
+	blkio_update_group_read_iops_fn *blkio_update_group_read_iops_fn;
+	blkio_update_group_write_iops_fn *blkio_update_group_write_iops_fn;
+};
+
+struct blkio_policy_type {
+	struct list_head list;
+	struct blkio_policy_ops ops;
+	enum blkio_policy_id plid;
+};
+
+/* Blkio controller policy registration */
+extern void blkio_policy_register(struct blkio_policy_type *);
+extern void blkio_policy_unregister(struct blkio_policy_type *);
+
+static inline char *blkg_path(struct blkio_group *blkg)
+{
+	return blkg->path;
+}
+
+#else
+
+struct blkio_group {
+};
+
+struct blkio_policy_type {
+};
+
+static inline void blkio_policy_register(struct blkio_policy_type *blkiop) { }
+static inline void blkio_policy_unregister(struct blkio_policy_type *blkiop) { }
+
+static inline char *blkg_path(struct blkio_group *blkg) { return NULL; }
+
+#endif
+
+#define BLKIO_WEIGHT_MIN	10
+#define BLKIO_WEIGHT_MAX	1000
+#define BLKIO_WEIGHT_DEFAULT	500
+
+#ifdef CONFIG_DEBUG_BLK_CGROUP
+void blkiocg_update_avg_queue_size_stats(struct blkio_group *blkg);
+void blkiocg_update_dequeue_stats(struct blkio_group *blkg,
+				unsigned long dequeue);
+void blkiocg_update_set_idle_time_stats(struct blkio_group *blkg);
+void blkiocg_update_idle_time_stats(struct blkio_group *blkg);
+void blkiocg_set_start_empty_time(struct blkio_group *blkg);
+
+#define BLKG_FLAG_FNS(name)						\
+static inline void blkio_mark_blkg_##name(				\
+		struct blkio_group_stats *stats)			\
+{									\
+	stats->flags |= (1 << BLKG_##name);				\
+}									\
+static inline void blkio_clear_blkg_##name(				\
+		struct blkio_group_stats *stats)			\
+{									\
+	stats->flags &= ~(1 << BLKG_##name);				\
+}									\
+static inline int blkio_blkg_##name(struct blkio_group_stats *stats)	\
+{									\
+	return (stats->flags & (1 << BLKG_##name)) != 0;		\
+}									\
+
+BLKG_FLAG_FNS(waiting)
+BLKG_FLAG_FNS(idling)
+BLKG_FLAG_FNS(empty)
+#undef BLKG_FLAG_FNS
+#else
+static inline void blkiocg_update_avg_queue_size_stats(
+						struct blkio_group *blkg) {}
+static inline void blkiocg_update_dequeue_stats(struct blkio_group *blkg,
+						unsigned long dequeue) {}
+static inline void blkiocg_update_set_idle_time_stats(struct blkio_group *blkg)
+{}
+static inline void blkiocg_update_idle_time_stats(struct blkio_group *blkg) {}
+static inline void blkiocg_set_start_empty_time(struct blkio_group *blkg) {}
+#endif
+
+#if defined(CONFIG_BLK_CGROUP) || defined(CONFIG_BLK_CGROUP_MODULE)
+extern struct blkio_cgroup blkio_root_cgroup;
+extern struct blkio_cgroup *cgroup_to_blkio_cgroup(struct cgroup *cgroup);
+extern struct blkio_cgroup *task_blkio_cgroup(struct task_struct *tsk);
+extern void blkiocg_add_blkio_group(struct blkio_cgroup *blkcg,
+	struct blkio_group *blkg, void *key, dev_t dev,
+	enum blkio_policy_id plid);
+extern int blkio_alloc_blkg_stats(struct blkio_group *blkg);
+extern int blkiocg_del_blkio_group(struct blkio_group *blkg);
+extern struct blkio_group *blkiocg_lookup_group(struct blkio_cgroup *blkcg,
+						void *key);
+void blkiocg_update_timeslice_used(struct blkio_group *blkg,
+					unsigned long time,
+					unsigned long unaccounted_time);
+void blkiocg_update_dispatch_stats(struct blkio_group *blkg, uint64_t bytes,
+						bool direction, bool sync);
+void blkiocg_update_completion_stats(struct blkio_group *blkg,
+	uint64_t start_time, uint64_t io_start_time, bool direction, bool sync);
+void blkiocg_update_io_merged_stats(struct blkio_group *blkg, bool direction,
+					bool sync);
+void blkiocg_update_io_add_stats(struct blkio_group *blkg,
+		struct blkio_group *curr_blkg, bool direction, bool sync);
+void blkiocg_update_io_remove_stats(struct blkio_group *blkg,
+					bool direction, bool sync);
+#else
+struct cgroup;
+static inline struct blkio_cgroup *
+cgroup_to_blkio_cgroup(struct cgroup *cgroup) { return NULL; }
+static inline struct blkio_cgroup *
+task_blkio_cgroup(struct task_struct *tsk) { return NULL; }
+
+static inline void blkiocg_add_blkio_group(struct blkio_cgroup *blkcg,
+		struct blkio_group *blkg, void *key, dev_t dev,
+		enum blkio_policy_id plid) {}
+
+static inline int blkio_alloc_blkg_stats(struct blkio_group *blkg) { return 0; }
+
+static inline int
+blkiocg_del_blkio_group(struct blkio_group *blkg) { return 0; }
+
+static inline struct blkio_group *
+blkiocg_lookup_group(struct blkio_cgroup *blkcg, void *key) { return NULL; }
+static inline void blkiocg_update_timeslice_used(struct blkio_group *blkg,
+						unsigned long time,
+						unsigned long unaccounted_time)
+{}
+static inline void blkiocg_update_dispatch_stats(struct blkio_group *blkg,
+				uint64_t bytes, bool direction, bool sync) {}
+static inline void blkiocg_update_completion_stats(struct blkio_group *blkg,
+		uint64_t start_time, uint64_t io_start_time, bool direction,
+		bool sync) {}
+static inline void blkiocg_update_io_merged_stats(struct blkio_group *blkg,
+						bool direction, bool sync) {}
+static inline void blkiocg_update_io_add_stats(struct blkio_group *blkg,
+		struct blkio_group *curr_blkg, bool direction, bool sync) {}
+static inline void blkiocg_update_io_remove_stats(struct blkio_group *blkg,
+						bool direction, bool sync) {}
+#endif
+#endif /* _BLK_CGROUP_H */
--- linux-next.orig/block/blk-cgroup.h	2012-03-28 12:11:27.982345310 +0800
+++ /dev/null	1970-01-01 00:00:00.000000000 +0000
@@ -1,364 +0,0 @@
-#ifndef _BLK_CGROUP_H
-#define _BLK_CGROUP_H
-/*
- * Common Block IO controller cgroup interface
- *
- * Based on ideas and code from CFQ, CFS and BFQ:
- * Copyright (C) 2003 Jens Axboe <axboe@kernel.dk>
- *
- * Copyright (C) 2008 Fabio Checconi <fabio@gandalf.sssup.it>
- *		      Paolo Valente <paolo.valente@unimore.it>
- *
- * Copyright (C) 2009 Vivek Goyal <vgoyal@redhat.com>
- * 	              Nauman Rafique <nauman@google.com>
- */
-
-#include <linux/cgroup.h>
-#include <linux/u64_stats_sync.h>
-
-enum blkio_policy_id {
-	BLKIO_POLICY_PROP = 0,		/* Proportional Bandwidth division */
-	BLKIO_POLICY_THROTL,		/* Throttling */
-};
-
-/* Max limits for throttle policy */
-#define THROTL_IOPS_MAX		UINT_MAX
-
-#if defined(CONFIG_BLK_CGROUP) || defined(CONFIG_BLK_CGROUP_MODULE)
-
-#ifndef CONFIG_BLK_CGROUP
-/* When blk-cgroup is a module, its subsys_id isn't a compile-time constant */
-extern struct cgroup_subsys blkio_subsys;
-#define blkio_subsys_id blkio_subsys.subsys_id
-#endif
-
-enum stat_type {
-	/* Total time spent (in ns) between request dispatch to the driver and
-	 * request completion for IOs doen by this cgroup. This may not be
-	 * accurate when NCQ is turned on. */
-	BLKIO_STAT_SERVICE_TIME = 0,
-	/* Total time spent waiting in scheduler queue in ns */
-	BLKIO_STAT_WAIT_TIME,
-	/* Number of IOs queued up */
-	BLKIO_STAT_QUEUED,
-	/* All the single valued stats go below this */
-	BLKIO_STAT_TIME,
-#ifdef CONFIG_DEBUG_BLK_CGROUP
-	/* Time not charged to this cgroup */
-	BLKIO_STAT_UNACCOUNTED_TIME,
-	BLKIO_STAT_AVG_QUEUE_SIZE,
-	BLKIO_STAT_IDLE_TIME,
-	BLKIO_STAT_EMPTY_TIME,
-	BLKIO_STAT_GROUP_WAIT_TIME,
-	BLKIO_STAT_DEQUEUE
-#endif
-};
-
-/* Per cpu stats */
-enum stat_type_cpu {
-	BLKIO_STAT_CPU_SECTORS,
-	/* Total bytes transferred */
-	BLKIO_STAT_CPU_SERVICE_BYTES,
-	/* Total IOs serviced, post merge */
-	BLKIO_STAT_CPU_SERVICED,
-	/* Number of IOs merged */
-	BLKIO_STAT_CPU_MERGED,
-	BLKIO_STAT_CPU_NR
-};
-
-enum stat_sub_type {
-	BLKIO_STAT_READ = 0,
-	BLKIO_STAT_WRITE,
-	BLKIO_STAT_SYNC,
-	BLKIO_STAT_ASYNC,
-	BLKIO_STAT_TOTAL
-};
-
-/* blkg state flags */
-enum blkg_state_flags {
-	BLKG_waiting = 0,
-	BLKG_idling,
-	BLKG_empty,
-};
-
-/* cgroup files owned by proportional weight policy */
-enum blkcg_file_name_prop {
-	BLKIO_PROP_weight = 1,
-	BLKIO_PROP_weight_device,
-	BLKIO_PROP_io_service_bytes,
-	BLKIO_PROP_io_serviced,
-	BLKIO_PROP_time,
-	BLKIO_PROP_sectors,
-	BLKIO_PROP_unaccounted_time,
-	BLKIO_PROP_io_service_time,
-	BLKIO_PROP_io_wait_time,
-	BLKIO_PROP_io_merged,
-	BLKIO_PROP_io_queued,
-	BLKIO_PROP_avg_queue_size,
-	BLKIO_PROP_group_wait_time,
-	BLKIO_PROP_idle_time,
-	BLKIO_PROP_empty_time,
-	BLKIO_PROP_dequeue,
-};
-
-/* cgroup files owned by throttle policy */
-enum blkcg_file_name_throtl {
-	BLKIO_THROTL_read_bps_device,
-	BLKIO_THROTL_write_bps_device,
-	BLKIO_THROTL_read_iops_device,
-	BLKIO_THROTL_write_iops_device,
-	BLKIO_THROTL_io_service_bytes,
-	BLKIO_THROTL_io_serviced,
-};
-
-struct blkio_cgroup {
-	struct cgroup_subsys_state css;
-	unsigned int weight;
-	spinlock_t lock;
-	struct hlist_head blkg_list;
-	struct list_head policy_list; /* list of blkio_policy_node */
-};
-
-struct blkio_group_stats {
-	/* total disk time and nr sectors dispatched by this group */
-	uint64_t time;
-	uint64_t stat_arr[BLKIO_STAT_QUEUED + 1][BLKIO_STAT_TOTAL];
-#ifdef CONFIG_DEBUG_BLK_CGROUP
-	/* Time not charged to this cgroup */
-	uint64_t unaccounted_time;
-
-	/* Sum of number of IOs queued across all samples */
-	uint64_t avg_queue_size_sum;
-	/* Count of samples taken for average */
-	uint64_t avg_queue_size_samples;
-	/* How many times this group has been removed from service tree */
-	unsigned long dequeue;
-
-	/* Total time spent waiting for it to be assigned a timeslice. */
-	uint64_t group_wait_time;
-	uint64_t start_group_wait_time;
-
-	/* Time spent idling for this blkio_group */
-	uint64_t idle_time;
-	uint64_t start_idle_time;
-	/*
-	 * Total time when we have requests queued and do not contain the
-	 * current active queue.
-	 */
-	uint64_t empty_time;
-	uint64_t start_empty_time;
-	uint16_t flags;
-#endif
-};
-
-/* Per cpu blkio group stats */
-struct blkio_group_stats_cpu {
-	uint64_t sectors;
-	uint64_t stat_arr_cpu[BLKIO_STAT_CPU_NR][BLKIO_STAT_TOTAL];
-	struct u64_stats_sync syncp;
-};
-
-struct blkio_group {
-	/* An rcu protected unique identifier for the group */
-	void *key;
-	struct hlist_node blkcg_node;
-	unsigned short blkcg_id;
-	/* Store cgroup path */
-	char path[128];
-	/* The device MKDEV(major, minor), this group has been created for */
-	dev_t dev;
-	/* policy which owns this blk group */
-	enum blkio_policy_id plid;
-
-	/* Need to serialize the stats in the case of reset/update */
-	spinlock_t stats_lock;
-	struct blkio_group_stats stats;
-	/* Per cpu stats pointer */
-	struct blkio_group_stats_cpu __percpu *stats_cpu;
-};
-
-struct blkio_policy_node {
-	struct list_head node;
-	dev_t dev;
-	/* This node belongs to max bw policy or porportional weight policy */
-	enum blkio_policy_id plid;
-	/* cgroup file to which this rule belongs to */
-	int fileid;
-
-	union {
-		unsigned int weight;
-		/*
-		 * Rate read/write in terms of bytes per second
-		 * Whether this rate represents read or write is determined
-		 * by file type "fileid".
-		 */
-		u64 bps;
-		unsigned int iops;
-	} val;
-};
-
-extern unsigned int blkcg_get_weight(struct blkio_cgroup *blkcg,
-				     dev_t dev);
-extern uint64_t blkcg_get_read_bps(struct blkio_cgroup *blkcg,
-				     dev_t dev);
-extern uint64_t blkcg_get_write_bps(struct blkio_cgroup *blkcg,
-				     dev_t dev);
-extern unsigned int blkcg_get_read_iops(struct blkio_cgroup *blkcg,
-				     dev_t dev);
-extern unsigned int blkcg_get_write_iops(struct blkio_cgroup *blkcg,
-				     dev_t dev);
-
-typedef void (blkio_unlink_group_fn) (void *key, struct blkio_group *blkg);
-
-typedef void (blkio_update_group_weight_fn) (void *key,
-			struct blkio_group *blkg, unsigned int weight);
-typedef void (blkio_update_group_read_bps_fn) (void * key,
-			struct blkio_group *blkg, u64 read_bps);
-typedef void (blkio_update_group_write_bps_fn) (void *key,
-			struct blkio_group *blkg, u64 write_bps);
-typedef void (blkio_update_group_read_iops_fn) (void *key,
-			struct blkio_group *blkg, unsigned int read_iops);
-typedef void (blkio_update_group_write_iops_fn) (void *key,
-			struct blkio_group *blkg, unsigned int write_iops);
-
-struct blkio_policy_ops {
-	blkio_unlink_group_fn *blkio_unlink_group_fn;
-	blkio_update_group_weight_fn *blkio_update_group_weight_fn;
-	blkio_update_group_read_bps_fn *blkio_update_group_read_bps_fn;
-	blkio_update_group_write_bps_fn *blkio_update_group_write_bps_fn;
-	blkio_update_group_read_iops_fn *blkio_update_group_read_iops_fn;
-	blkio_update_group_write_iops_fn *blkio_update_group_write_iops_fn;
-};
-
-struct blkio_policy_type {
-	struct list_head list;
-	struct blkio_policy_ops ops;
-	enum blkio_policy_id plid;
-};
-
-/* Blkio controller policy registration */
-extern void blkio_policy_register(struct blkio_policy_type *);
-extern void blkio_policy_unregister(struct blkio_policy_type *);
-
-static inline char *blkg_path(struct blkio_group *blkg)
-{
-	return blkg->path;
-}
-
-#else
-
-struct blkio_group {
-};
-
-struct blkio_policy_type {
-};
-
-static inline void blkio_policy_register(struct blkio_policy_type *blkiop) { }
-static inline void blkio_policy_unregister(struct blkio_policy_type *blkiop) { }
-
-static inline char *blkg_path(struct blkio_group *blkg) { return NULL; }
-
-#endif
-
-#define BLKIO_WEIGHT_MIN	10
-#define BLKIO_WEIGHT_MAX	1000
-#define BLKIO_WEIGHT_DEFAULT	500
-
-#ifdef CONFIG_DEBUG_BLK_CGROUP
-void blkiocg_update_avg_queue_size_stats(struct blkio_group *blkg);
-void blkiocg_update_dequeue_stats(struct blkio_group *blkg,
-				unsigned long dequeue);
-void blkiocg_update_set_idle_time_stats(struct blkio_group *blkg);
-void blkiocg_update_idle_time_stats(struct blkio_group *blkg);
-void blkiocg_set_start_empty_time(struct blkio_group *blkg);
-
-#define BLKG_FLAG_FNS(name)						\
-static inline void blkio_mark_blkg_##name(				\
-		struct blkio_group_stats *stats)			\
-{									\
-	stats->flags |= (1 << BLKG_##name);				\
-}									\
-static inline void blkio_clear_blkg_##name(				\
-		struct blkio_group_stats *stats)			\
-{									\
-	stats->flags &= ~(1 << BLKG_##name);				\
-}									\
-static inline int blkio_blkg_##name(struct blkio_group_stats *stats)	\
-{									\
-	return (stats->flags & (1 << BLKG_##name)) != 0;		\
-}									\
-
-BLKG_FLAG_FNS(waiting)
-BLKG_FLAG_FNS(idling)
-BLKG_FLAG_FNS(empty)
-#undef BLKG_FLAG_FNS
-#else
-static inline void blkiocg_update_avg_queue_size_stats(
-						struct blkio_group *blkg) {}
-static inline void blkiocg_update_dequeue_stats(struct blkio_group *blkg,
-						unsigned long dequeue) {}
-static inline void blkiocg_update_set_idle_time_stats(struct blkio_group *blkg)
-{}
-static inline void blkiocg_update_idle_time_stats(struct blkio_group *blkg) {}
-static inline void blkiocg_set_start_empty_time(struct blkio_group *blkg) {}
-#endif
-
-#if defined(CONFIG_BLK_CGROUP) || defined(CONFIG_BLK_CGROUP_MODULE)
-extern struct blkio_cgroup blkio_root_cgroup;
-extern struct blkio_cgroup *cgroup_to_blkio_cgroup(struct cgroup *cgroup);
-extern struct blkio_cgroup *task_blkio_cgroup(struct task_struct *tsk);
-extern void blkiocg_add_blkio_group(struct blkio_cgroup *blkcg,
-	struct blkio_group *blkg, void *key, dev_t dev,
-	enum blkio_policy_id plid);
-extern int blkio_alloc_blkg_stats(struct blkio_group *blkg);
-extern int blkiocg_del_blkio_group(struct blkio_group *blkg);
-extern struct blkio_group *blkiocg_lookup_group(struct blkio_cgroup *blkcg,
-						void *key);
-void blkiocg_update_timeslice_used(struct blkio_group *blkg,
-					unsigned long time,
-					unsigned long unaccounted_time);
-void blkiocg_update_dispatch_stats(struct blkio_group *blkg, uint64_t bytes,
-						bool direction, bool sync);
-void blkiocg_update_completion_stats(struct blkio_group *blkg,
-	uint64_t start_time, uint64_t io_start_time, bool direction, bool sync);
-void blkiocg_update_io_merged_stats(struct blkio_group *blkg, bool direction,
-					bool sync);
-void blkiocg_update_io_add_stats(struct blkio_group *blkg,
-		struct blkio_group *curr_blkg, bool direction, bool sync);
-void blkiocg_update_io_remove_stats(struct blkio_group *blkg,
-					bool direction, bool sync);
-#else
-struct cgroup;
-static inline struct blkio_cgroup *
-cgroup_to_blkio_cgroup(struct cgroup *cgroup) { return NULL; }
-static inline struct blkio_cgroup *
-task_blkio_cgroup(struct task_struct *tsk) { return NULL; }
-
-static inline void blkiocg_add_blkio_group(struct blkio_cgroup *blkcg,
-		struct blkio_group *blkg, void *key, dev_t dev,
-		enum blkio_policy_id plid) {}
-
-static inline int blkio_alloc_blkg_stats(struct blkio_group *blkg) { return 0; }
-
-static inline int
-blkiocg_del_blkio_group(struct blkio_group *blkg) { return 0; }
-
-static inline struct blkio_group *
-blkiocg_lookup_group(struct blkio_cgroup *blkcg, void *key) { return NULL; }
-static inline void blkiocg_update_timeslice_used(struct blkio_group *blkg,
-						unsigned long time,
-						unsigned long unaccounted_time)
-{}
-static inline void blkiocg_update_dispatch_stats(struct blkio_group *blkg,
-				uint64_t bytes, bool direction, bool sync) {}
-static inline void blkiocg_update_completion_stats(struct blkio_group *blkg,
-		uint64_t start_time, uint64_t io_start_time, bool direction,
-		bool sync) {}
-static inline void blkiocg_update_io_merged_stats(struct blkio_group *blkg,
-						bool direction, bool sync) {}
-static inline void blkiocg_update_io_add_stats(struct blkio_group *blkg,
-		struct blkio_group *curr_blkg, bool direction, bool sync) {}
-static inline void blkiocg_update_io_remove_stats(struct blkio_group *blkg,
-						bool direction, bool sync) {}
-#endif
-#endif /* _BLK_CGROUP_H */



^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 2/6] blk-cgroup: account dirtied pages
  2012-03-28 12:13 [PATCH 0/6] buffered write IO controller in balance_dirty_pages() Fengguang Wu
  2012-03-28 12:13 ` [PATCH 1/6] blk-cgroup: move blk-cgroup.h in include/linux/blk-cgroup.h Fengguang Wu
@ 2012-03-28 12:13 ` Fengguang Wu
  2012-03-28 12:13 ` [PATCH 3/6] blk-cgroup: buffered write IO controller - bandwidth weight Fengguang Wu
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 18+ messages in thread
From: Fengguang Wu @ 2012-03-28 12:13 UTC (permalink / raw)
  To: Linux Memory Management List
  Cc: Vivek Goyal, Wu Fengguang, Suresh Jayaraman, Andrea Righi,
	Jeff Moyer, linux-fsdevel, LKML

[-- Attachment #1: blk-cgroup-nr-dirtied.patch --]
[-- Type: text/plain, Size: 2043 bytes --]


Account the pages dirtied in each blkio cgroup with a per-cgroup percpu
counter (blkcg->nr_dirtied), updated from account_page_dirtied().

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 block/blk-cgroup.c         |    4 ++++
 include/linux/blk-cgroup.h |    1 +
 mm/page-writeback.c        |    6 ++++++
 3 files changed, 11 insertions(+)

--- linux-next.orig/block/blk-cgroup.c	2012-03-28 14:55:47.522142976 +0800
+++ linux-next/block/blk-cgroup.c	2012-03-28 15:39:46.722088815 +0800
@@ -1594,6 +1594,7 @@ static void blkiocg_destroy(struct cgrou
 
 	free_css_id(&blkio_subsys, &blkcg->css);
 	rcu_read_unlock();
+	percpu_counter_destroy(&blkcg->nr_dirtied);
 	if (blkcg != &blkio_root_cgroup)
 		kfree(blkcg);
 }
@@ -1619,6 +1620,9 @@ done:
 	INIT_HLIST_HEAD(&blkcg->blkg_list);
 
 	INIT_LIST_HEAD(&blkcg->policy_list);
+
+	percpu_counter_init(&blkcg->nr_dirtied, 0);
+
 	return &blkcg->css;
 }
 
--- linux-next.orig/include/linux/blk-cgroup.h	2012-03-28 14:55:47.530142977 +0800
+++ linux-next/include/linux/blk-cgroup.h	2012-03-28 15:40:27.754087973 +0800
@@ -117,6 +117,7 @@ struct blkio_cgroup {
 	spinlock_t lock;
 	struct hlist_head blkg_list;
 	struct list_head policy_list; /* list of blkio_policy_node */
+	struct percpu_counter nr_dirtied;
 };
 
 struct blkio_group_stats {
--- linux-next.orig/mm/page-writeback.c	2012-03-28 14:55:47.510142976 +0800
+++ linux-next/mm/page-writeback.c	2012-03-28 15:40:39.366087735 +0800
@@ -34,6 +34,7 @@
 #include <linux/syscalls.h>
 #include <linux/buffer_head.h> /* __set_page_dirty_buffers */
 #include <linux/pagevec.h>
+#include <linux/blk-cgroup.h>
 #include <trace/events/writeback.h>
 
 /*
@@ -1933,6 +1934,11 @@ int __set_page_dirty_no_writeback(struct
 void account_page_dirtied(struct page *page, struct address_space *mapping)
 {
 	if (mapping_cap_account_dirty(mapping)) {
+#ifdef CONFIG_BLK_DEV_THROTTLING
+		struct blkio_cgroup *blkcg = task_blkio_cgroup(current);
+		if (blkcg)
+			__percpu_counter_add(&blkcg->nr_dirtied, 1, BDI_STAT_BATCH);
+#endif
 		__inc_zone_page_state(page, NR_FILE_DIRTY);
 		__inc_zone_page_state(page, NR_DIRTIED);
 		__inc_bdi_stat(mapping->backing_dev_info, BDI_RECLAIMABLE);



^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 3/6] blk-cgroup: buffered write IO controller - bandwidth weight
  2012-03-28 12:13 [PATCH 0/6] buffered write IO controller in balance_dirty_pages() Fengguang Wu
  2012-03-28 12:13 ` [PATCH 1/6] blk-cgroup: move blk-cgroup.h in include/linux/blk-cgroup.h Fengguang Wu
  2012-03-28 12:13 ` [PATCH 2/6] blk-cgroup: account dirtied pages Fengguang Wu
@ 2012-03-28 12:13 ` Fengguang Wu
  2012-03-28 12:13 ` [PATCH 4/6] blk-cgroup: buffered write IO controller - bandwidth limit Fengguang Wu
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 18+ messages in thread
From: Fengguang Wu @ 2012-03-28 12:13 UTC (permalink / raw)
  To: Linux Memory Management List
  Cc: Vivek Goyal, Fengguang Wu, Suresh Jayaraman, Andrea Righi,
	Jeff Moyer, linux-fsdevel, LKML

[-- Attachment #1: writeback-io-controller-weight.patch --]
[-- Type: text/plain, Size: 3223 bytes --]

blkcg->weight can be trivially supported for buffered writes by directly
scaling task_ratelimit in balance_dirty_pages().

However, the semantics are not quite the same as for direct IO.

- for direct IO, weight is normally applied to disk time
- for buffered writes, weight is applied to dirty rate

Notes about the (balanced_dirty_ratelimit > write_bw) check removal:

When there is only one dd running and its weight is set to
BLKIO_WEIGHT_MIN=10, bdi->dirty_ratelimit will end up balancing around

	write_bw * BLKIO_WEIGHT_DEFAULT / BLKIO_WEIGHT_MIN 
	= write_bw * 50

So the limit would now have to be raised to (write_bw * 50) to cover the
above extreme case, which is too large to be useful for normal cases. So just
remove the check.
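
To put a number on that (an illustrative calculation only, assuming a
hypothetical 100MB/s disk; the macros are the ones defined in blk-cgroup.h):

	#include <stdio.h>

	#define BLKIO_WEIGHT_MIN	10
	#define BLKIO_WEIGHT_DEFAULT	500

	int main(void)
	{
		unsigned long write_bw = 100;	/* hypothetical disk bandwidth, MB/s */
		/* the value bdi->dirty_ratelimit must balance around for one
		 * weight-10 dd to be allowed to use the full disk bandwidth */
		unsigned long balanced = write_bw * BLKIO_WEIGHT_DEFAULT / BLKIO_WEIGHT_MIN;

		printf("write_bw=%luMB/s => balanced dirty_ratelimit ~ %luMB/s (%lux)\n",
		       write_bw, balanced, balanced / write_bw);
		return 0;
	}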

Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
---
 include/linux/blk-cgroup.h |   18 ++++++++++++++----
 mm/page-writeback.c        |   10 +++++-----
 2 files changed, 19 insertions(+), 9 deletions(-)

--- linux-next.orig/include/linux/blk-cgroup.h	2012-03-28 13:42:26.686233288 +0800
+++ linux-next/include/linux/blk-cgroup.h	2012-03-28 14:25:16.150180560 +0800
@@ -21,6 +21,10 @@ enum blkio_policy_id {
 	BLKIO_POLICY_THROTL,		/* Throttling */
 };
 
+#define BLKIO_WEIGHT_MIN	10
+#define BLKIO_WEIGHT_MAX	1000
+#define BLKIO_WEIGHT_DEFAULT	500
+
 /* Max limits for throttle policy */
 #define THROTL_IOPS_MAX		UINT_MAX
 
@@ -209,6 +213,11 @@ extern unsigned int blkcg_get_read_iops(
 extern unsigned int blkcg_get_write_iops(struct blkio_cgroup *blkcg,
 				     dev_t dev);
 
+static inline unsigned int blkcg_weight(struct blkio_cgroup *blkcg)
+{
+	return blkcg->weight;
+}
+
 typedef void (blkio_unlink_group_fn) (void *key, struct blkio_group *blkg);
 
 typedef void (blkio_update_group_weight_fn) (void *key,
@@ -259,11 +268,12 @@ static inline void blkio_policy_unregist
 
 static inline char *blkg_path(struct blkio_group *blkg) { return NULL; }
 
-#endif
+static inline unsigned int blkcg_weight(struct blkio_cgroup *blkcg)
+{
+	return BLKIO_WEIGHT_DEFAULT;
+}
 
-#define BLKIO_WEIGHT_MIN	10
-#define BLKIO_WEIGHT_MAX	1000
-#define BLKIO_WEIGHT_DEFAULT	500
+#endif
 
 #ifdef CONFIG_DEBUG_BLK_CGROUP
 void blkiocg_update_avg_queue_size_stats(struct blkio_group *blkg);
--- linux-next.orig/mm/page-writeback.c	2012-03-28 13:42:26.678233289 +0800
+++ linux-next/mm/page-writeback.c	2012-03-28 14:26:02.694179605 +0800
@@ -905,11 +905,6 @@ static void bdi_update_dirty_ratelimit(s
 	 */
 	balanced_dirty_ratelimit = div_u64((u64)task_ratelimit * write_bw,
 					   dirty_rate | 1);
-	/*
-	 * balanced_dirty_ratelimit ~= (write_bw / N) <= write_bw
-	 */
-	if (unlikely(balanced_dirty_ratelimit > write_bw))
-		balanced_dirty_ratelimit = write_bw;
 
 	/*
 	 * We could safely do this and return immediately:
@@ -1263,6 +1258,11 @@ static void balance_dirty_pages(struct a
 					       bdi_thresh, bdi_dirty);
 		task_ratelimit = ((u64)dirty_ratelimit * pos_ratio) >>
 							RATELIMIT_CALC_SHIFT;
+
+		if (blkcg_weight(blkcg) != BLKIO_WEIGHT_DEFAULT)
+			task_ratelimit = (u64)task_ratelimit *
+				blkcg_weight(blkcg) / BLKIO_WEIGHT_DEFAULT;
+
 		max_pause = bdi_max_pause(bdi, bdi_dirty);
 		min_pause = bdi_min_pause(bdi, max_pause,
 					  task_ratelimit, dirty_ratelimit,



^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 4/6] blk-cgroup: buffered write IO controller - bandwidth limit
  2012-03-28 12:13 [PATCH 0/6] buffered write IO controller in balance_dirty_pages() Fengguang Wu
                   ` (2 preceding siblings ...)
  2012-03-28 12:13 ` [PATCH 3/6] blk-cgroup: buffered write IO controller - bandwidth weight Fengguang Wu
@ 2012-03-28 12:13 ` Fengguang Wu
  2012-03-28 12:13 ` [PATCH 5/6] blk-cgroup: buffered write IO controller - bandwidth limit interface Fengguang Wu
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 18+ messages in thread
From: Fengguang Wu @ 2012-03-28 12:13 UTC (permalink / raw)
  To: Linux Memory Management List
  Cc: Vivek Goyal, Andrea Righi, Wu Fengguang, Suresh Jayaraman,
	Andrea Righi, Jeff Moyer, linux-fsdevel, LKML

[-- Attachment #1: writeback-io-controller.patch --]
[-- Type: text/plain, Size: 4642 bytes --]

A bare per-cgroup buffered write IO controller.

Basically, when there are N dd tasks running in the blkcg,
blkcg->dirty_ratelimit will be balanced around

	blkcg->buffered_write_bps / N

and each blkcg task will be throttled under

	blkcg->dirty_ratelimit

or, when there are other dirtier tasks in the system, under

	min(blkcg->dirty_ratelimit, bdi->dirty_ratelimit)
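
As a sanity check of that claim, here is a minimal userspace model of the
balancing loop (illustrative only: consistent units of pages/s, invented
starting values, and none of the fixed-point details of the kernel code
below):

	#include <stdio.h>

	int main(void)
	{
		double target = 512;		/* buffered_write_bps in pages/s, i.e. 2MiB/s */
		double ratelimit = 1024;	/* initial blkcg->dirty_ratelimit, pages/s */
		int n_tasks = 2;		/* number of dd tasks in the blkcg */
		int i;

		for (i = 0; i < 20; i++) {
			/* all N tasks dirty at the limit, so the group dirties at: */
			double dirty_rate = n_tasks * ratelimit;
			/* scale towards the rate that would meet the target bps */
			double balanced = ratelimit * target / (dirty_rate + 1);

			/* follow the balanced value half way, as the patch does */
			ratelimit = (ratelimit + balanced) / 2 + 1;
		}
		printf("dirty_ratelimit converges to ~%.0f pages/s (target/N = %.0f)\n",
		       ratelimit, target / n_tasks);
		return 0;
	}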

CC: Vivek Goyal <vgoyal@redhat.com>
CC: Andrea Righi <arighi@develer.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 include/linux/blk-cgroup.h |   20 +++++++++++
 mm/page-writeback.c        |   59 +++++++++++++++++++++++++++++++++++
 2 files changed, 79 insertions(+)

--- linux-next.orig/mm/page-writeback.c	2012-03-28 15:36:16.414093131 +0800
+++ linux-next/mm/page-writeback.c	2012-03-28 15:40:25.446088022 +0800
@@ -1145,6 +1145,54 @@ static long bdi_min_pause(struct backing
 	return pages >= DIRTY_POLL_THRESH ? 1 + t / 2 : t;
 }
 
+#ifdef CONFIG_BLK_DEV_THROTTLING
+static void blkcg_update_dirty_ratelimit(struct blkio_cgroup *blkcg,
+					 unsigned long dirtied,
+					 unsigned long elapsed)
+{
+	unsigned long long bps = blkcg_buffered_write_bps(blkcg);
+	unsigned long long ratelimit;
+	unsigned long dirty_rate;
+
+	dirty_rate = (dirtied - blkcg->dirtied_stamp) * HZ;
+	dirty_rate /= elapsed;
+
+	ratelimit = blkcg->dirty_ratelimit;
+	ratelimit *= div_u64(bps, dirty_rate + 1);
+	ratelimit = min(ratelimit, bps);
+	ratelimit >>= PAGE_SHIFT;
+
+	blkcg->dirty_ratelimit = (blkcg->dirty_ratelimit + ratelimit) / 2 + 1;
+}
+
+void blkcg_update_bandwidth(struct blkio_cgroup *blkcg)
+{
+	unsigned long now = jiffies;
+	unsigned long dirtied;
+	unsigned long elapsed;
+
+	if (!blkcg)
+		return;
+	if (!spin_trylock(&blkcg->lock))
+		return;
+
+	elapsed = now - blkcg->bw_time_stamp;
+	dirtied = percpu_counter_read(&blkcg->nr_dirtied);
+
+	if (elapsed > MAX_PAUSE * 2)
+		goto snapshot;
+	if (elapsed <= MAX_PAUSE)
+		goto unlock;
+
+	blkcg_update_dirty_ratelimit(blkcg, dirtied, elapsed);
+snapshot:
+	blkcg->dirtied_stamp = dirtied;
+	blkcg->bw_time_stamp = now;
+unlock:
+	spin_unlock(&blkcg->lock);
+}
+#endif
+
 /*
  * balance_dirty_pages() must be called by processes which are generating dirty
  * data.  It looks at the number of dirty pages in the machine and will force
@@ -1174,6 +1222,7 @@ static void balance_dirty_pages(struct a
 	unsigned long pos_ratio;
 	struct backing_dev_info *bdi = mapping->backing_dev_info;
 	unsigned long start_time = jiffies;
+	struct blkio_cgroup *blkcg = task_blkio_cgroup(current);
 
 	for (;;) {
 		unsigned long now = jiffies;
@@ -1198,6 +1247,8 @@ static void balance_dirty_pages(struct a
 		freerun = dirty_freerun_ceiling(dirty_thresh,
 						background_thresh);
 		if (nr_dirty <= freerun) {
+			if (blkcg_buffered_write_bps(blkcg))
+				goto blkcg_bps;
 			current->dirty_paused_when = now;
 			current->nr_dirtied = 0;
 			current->nr_dirtied_pause =
@@ -1263,6 +1314,14 @@ static void balance_dirty_pages(struct a
 			task_ratelimit = (u64)task_ratelimit *
 				blkcg_weight(blkcg) / BLKIO_WEIGHT_DEFAULT;
 
+		if (blkcg_buffered_write_bps(blkcg) &&
+		    task_ratelimit > blkcg_dirty_ratelimit(blkcg)) {
+blkcg_bps:
+			blkcg_update_bandwidth(blkcg);
+			dirty_ratelimit = blkcg_dirty_ratelimit(blkcg);
+			task_ratelimit = dirty_ratelimit;
+		}
+
 		max_pause = bdi_max_pause(bdi, bdi_dirty);
 		min_pause = bdi_min_pause(bdi, max_pause,
 					  task_ratelimit, dirty_ratelimit,
--- linux-next.orig/include/linux/blk-cgroup.h	2012-03-28 15:36:16.414093131 +0800
+++ linux-next/include/linux/blk-cgroup.h	2012-03-28 15:39:46.730088815 +0800
@@ -122,6 +122,10 @@ struct blkio_cgroup {
 	struct hlist_head blkg_list;
 	struct list_head policy_list; /* list of blkio_policy_node */
 	struct percpu_counter nr_dirtied;
+	unsigned long bw_time_stamp;
+	unsigned long dirtied_stamp;
+	unsigned long dirty_ratelimit;
+	unsigned long long buffered_write_bps;
 };
 
 struct blkio_group_stats {
@@ -217,6 +221,14 @@ static inline unsigned int blkcg_weight(
 {
 	return blkcg->weight;
 }
+static inline uint64_t blkcg_buffered_write_bps(struct blkio_cgroup *blkcg)
+{
+	return blkcg->buffered_write_bps;
+}
+static inline unsigned long blkcg_dirty_ratelimit(struct blkio_cgroup *blkcg)
+{
+	return blkcg->dirty_ratelimit;
+}
 
 typedef void (blkio_unlink_group_fn) (void *key, struct blkio_group *blkg);
 
@@ -272,6 +284,14 @@ static inline unsigned int blkcg_weight(
 {
 	return BLKIO_WEIGHT_DEFAULT;
 }
+static inline uint64_t blkcg_buffered_write_bps(struct blkio_cgroup *blkcg)
+{
+	return 0;
+}
+static inline unsigned long blkcg_dirty_ratelimit(struct blkio_cgroup *blkcg)
+{
+	return 0;
+}
 
 #endif
 



^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 5/6] blk-cgroup: buffered write IO controller - bandwidth limit interface
  2012-03-28 12:13 [PATCH 0/6] buffered write IO controller in balance_dirty_pages() Fengguang Wu
                   ` (3 preceding siblings ...)
  2012-03-28 12:13 ` [PATCH 4/6] blk-cgroup: buffered write IO controller - bandwidth limit Fengguang Wu
@ 2012-03-28 12:13 ` Fengguang Wu
  2012-03-28 12:13 ` [PATCH 6/6] blk-cgroup: buffered write IO controller - debug trace Fengguang Wu
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 18+ messages in thread
From: Fengguang Wu @ 2012-03-28 12:13 UTC (permalink / raw)
  To: Linux Memory Management List
  Cc: Vivek Goyal, Wu Fengguang, Suresh Jayaraman, Andrea Righi,
	Jeff Moyer, linux-fsdevel, LKML

[-- Attachment #1: writeback-io-controller-interface.patch --]
[-- Type: text/plain, Size: 1799 bytes --]

Add blkio controller interface "throttle.buffered_write_bps".

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 block/blk-cgroup.c         |   21 +++++++++++++++++++++
 include/linux/blk-cgroup.h |    1 +
 2 files changed, 22 insertions(+)

--- linux-next.orig/block/blk-cgroup.c	2012-03-28 15:36:16.402093131 +0800
+++ linux-next/block/blk-cgroup.c	2012-03-28 15:36:44.974092545 +0800
@@ -1355,6 +1355,12 @@ static u64 blkiocg_file_read_u64 (struct
 			return (u64)blkcg->weight;
 		}
 		break;
+	case BLKIO_POLICY_THROTL:
+		switch (name) {
+		case BLKIO_THROTL_buffered_write_bps:
+			return (u64)blkcg->buffered_write_bps;
+		}
+		break;
 	default:
 		BUG();
 	}
@@ -1377,6 +1383,13 @@ blkiocg_file_write_u64(struct cgroup *cg
 			return blkio_weight_write(blkcg, val);
 		}
 		break;
+	case BLKIO_POLICY_THROTL:
+		switch (name) {
+		case BLKIO_THROTL_buffered_write_bps:
+			blkcg->buffered_write_bps = val;
+			return 0;
+		}
+		break;
 	default:
 		BUG();
 	}
@@ -1500,6 +1513,14 @@ struct cftype blkio_files[] = {
 				BLKIO_THROTL_io_serviced),
 		.read_map = blkiocg_file_read_map,
 	},
+	{
+		.name = "throttle.buffered_write_bps",
+		.private = BLKIOFILE_PRIVATE(BLKIO_POLICY_THROTL,
+				BLKIO_THROTL_buffered_write_bps),
+		.read_u64 = blkiocg_file_read_u64,
+		.write_u64 = blkiocg_file_write_u64,
+		.max_write_len = 256,
+	},
 #endif /* CONFIG_BLK_DEV_THROTTLING */
 
 #ifdef CONFIG_DEBUG_BLK_CGROUP
--- linux-next.orig/include/linux/blk-cgroup.h	2012-03-28 15:36:16.426093131 +0800
+++ linux-next/include/linux/blk-cgroup.h	2012-03-28 15:36:44.974092545 +0800
@@ -113,6 +113,7 @@ enum blkcg_file_name_throtl {
 	BLKIO_THROTL_write_iops_device,
 	BLKIO_THROTL_io_service_bytes,
 	BLKIO_THROTL_io_serviced,
+	BLKIO_THROTL_buffered_write_bps,
 };
 
 struct blkio_cgroup {



^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 6/6] blk-cgroup: buffered write IO controller - debug trace
  2012-03-28 12:13 [PATCH 0/6] buffered write IO controller in balance_dirty_pages() Fengguang Wu
                   ` (4 preceding siblings ...)
  2012-03-28 12:13 ` [PATCH 5/6] blk-cgroup: buffered write IO controller - bandwidth limit interface Fengguang Wu
@ 2012-03-28 12:13 ` Fengguang Wu
  2012-03-28 21:10 ` [PATCH 0/6] buffered write IO controller in balance_dirty_pages() Vivek Goyal
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 18+ messages in thread
From: Fengguang Wu @ 2012-03-28 12:13 UTC (permalink / raw)
  To: Linux Memory Management List
  Cc: Vivek Goyal, Wu Fengguang, Suresh Jayaraman, Andrea Righi,
	Jeff Moyer, linux-fsdevel, LKML

[-- Attachment #1: writeback-io-controller-trace.patch --]
[-- Type: text/plain, Size: 5568 bytes --]

test-blkio-cgroup.sh

	#!/bin/sh

	mount /dev/sda7 /fs

	echo 1 > /debug/tracing/events/writeback/balance_dirty_pages/enable
	echo 1 > /debug/tracing/events/writeback/blkcg_dirty_ratelimit/enable

	rmdir /cgroup/buffered_write
	mkdir /cgroup/buffered_write
	echo $$ > /cgroup/buffered_write/tasks
	echo $((2<<20)) > /cgroup/buffered_write/blkio.throttle.buffered_write_bps

	dd if=/dev/zero of=/fs/zero1 bs=1M count=100 &
	dd if=/dev/zero of=/fs/zero2 bs=1M count=100 &

run 1:
	104857600 bytes (105 MB) copied, 97.8103 s, 1.1 MB/s
	104857600 bytes (105 MB) copied, 97.9835 s, 1.1 MB/s
run 2:
	104857600 bytes (105 MB) copied, 98.5704 s, 1.1 MB/s
	104857600 bytes (105 MB) copied, 98.6268 s, 1.1 MB/s

average bps:	100MiB / 98.248s = 1.02MiB/s

run 1 trace:
              dd-3485  [000] ....   658.737063: blkcg_dirty_ratelimit: kbps=2048 dirty_rate=1932 dirty_ratelimit=1064 balanced_dirty_ratelimit=1088
              dd-3485  [000] ....   658.976945: blkcg_dirty_ratelimit: kbps=2048 dirty_rate=2000 dirty_ratelimit=1076 balanced_dirty_ratelimit=1084
              dd-3485  [000] ....   659.212830: blkcg_dirty_ratelimit: kbps=2048 dirty_rate=2440 dirty_ratelimit=992 balanced_dirty_ratelimit=900
              dd-3485  [002] ....   659.470651: blkcg_dirty_ratelimit: kbps=2048 dirty_rate=1860 dirty_ratelimit=1044 balanced_dirty_ratelimit=1088
              dd-3485  [002] ....   659.714535: blkcg_dirty_ratelimit: kbps=2048 dirty_rate=2360 dirty_ratelimit=976 balanced_dirty_ratelimit=904
              dd-3485  [002] ....   659.976381: blkcg_dirty_ratelimit: kbps=2048 dirty_rate=1832 dirty_ratelimit=1036 balanced_dirty_ratelimit=1088
              dd-3485  [000] ....   660.222254: blkcg_dirty_ratelimit: kbps=2048 dirty_rate=2340 dirty_ratelimit=972 balanced_dirty_ratelimit=904
              dd-3485  [000] ....   660.484089: blkcg_dirty_ratelimit: kbps=2048 dirty_rate=1464 dirty_ratelimit=1164 balanced_dirty_ratelimit=1352
              dd-3485  [000] ....   660.701984: blkcg_dirty_ratelimit: kbps=2048 dirty_rate=2640 dirty_ratelimit=1036 balanced_dirty_ratelimit=900
              dd-3485  [000] ....   660.947856: blkcg_dirty_ratelimit: kbps=2048 dirty_rate=1948 dirty_ratelimit=1064 balanced_dirty_ratelimit=1084
              dd-3485  [000] ....   661.187727: blkcg_dirty_ratelimit: kbps=2048 dirty_rate=2000 dirty_ratelimit=1076 balanced_dirty_ratelimit=1084
              dd-3485  [000] ....   661.423572: blkcg_dirty_ratelimit: kbps=2048 dirty_rate=2440 dirty_ratelimit=992 balanced_dirty_ratelimit=900
              dd-3485  [000] ....   661.681431: blkcg_dirty_ratelimit: kbps=2048 dirty_rate=2232 dirty_ratelimit=952 balanced_dirty_ratelimit=908
              dd-3485  [002] ....   661.949290: blkcg_dirty_ratelimit: kbps=2048 dirty_rate=1432 dirty_ratelimit=1156 balanced_dirty_ratelimit=1356
              dd-3485  [002] ....   662.169176: blkcg_dirty_ratelimit: kbps=2048 dirty_rate=2616 dirty_ratelimit=1032 balanced_dirty_ratelimit=900
              dd-3485  [000] ....   662.417016: blkcg_dirty_ratelimit: kbps=2048 dirty_rate=2320 dirty_ratelimit=972 balanced_dirty_ratelimit=908
              dd-3485  [000] ....   662.678903: blkcg_dirty_ratelimit: kbps=2048 dirty_rate=1464 dirty_ratelimit=1164 balanced_dirty_ratelimit=1352
              dd-3485  [000] ....   662.896764: blkcg_dirty_ratelimit: kbps=2048 dirty_rate=2640 dirty_ratelimit=1036 balanced_dirty_ratelimit=900
              dd-3485  [002] ....   663.142644: blkcg_dirty_ratelimit: kbps=2048 dirty_rate=2340 dirty_ratelimit=972 balanced_dirty_ratelimit=904

It looks good enough as a proposal. It could be made more accurate if necessary.

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 include/trace/events/writeback.h |   34 +++++++++++++++++++++++++++++
 mm/page-writeback.c              |    2 +
 2 files changed, 36 insertions(+)

--- linux-next.orig/mm/page-writeback.c	2012-03-28 15:36:16.426093131 +0800
+++ linux-next/mm/page-writeback.c	2012-03-28 15:36:47.906092485 +0800
@@ -1163,6 +1163,8 @@ static void blkcg_update_dirty_ratelimit
 	ratelimit >>= PAGE_SHIFT;
 
 	blkcg->dirty_ratelimit = (blkcg->dirty_ratelimit + ratelimit) / 2 + 1;
+	trace_blkcg_dirty_ratelimit(bps, dirty_rate,
+				    blkcg->dirty_ratelimit, ratelimit);
 }
 
 void blkcg_update_bandwidth(struct blkio_cgroup *blkcg)
--- linux-next.orig/include/trace/events/writeback.h	2012-03-28 14:25:16.026180561 +0800
+++ linux-next/include/trace/events/writeback.h	2012-03-28 15:36:47.906092485 +0800
@@ -249,6 +249,40 @@ TRACE_EVENT(global_dirty_state,
 
 #define KBps(x)			((x) << (PAGE_SHIFT - 10))
 
+TRACE_EVENT(blkcg_dirty_ratelimit,
+
+	TP_PROTO(unsigned long bps,
+		 unsigned long dirty_rate,
+		 unsigned long dirty_ratelimit,
+		 unsigned long balanced_dirty_ratelimit),
+
+	TP_ARGS(bps, dirty_rate, dirty_ratelimit, balanced_dirty_ratelimit),
+
+	TP_STRUCT__entry(
+		__field(unsigned long,	kbps)
+		__field(unsigned long,	dirty_rate)
+		__field(unsigned long,	dirty_ratelimit)
+		__field(unsigned long,	balanced_dirty_ratelimit)
+	),
+
+	TP_fast_assign(
+		__entry->kbps = bps >> 10;
+		__entry->dirty_rate = KBps(dirty_rate);
+		__entry->dirty_ratelimit = KBps(dirty_ratelimit);
+		__entry->balanced_dirty_ratelimit =
+					  KBps(balanced_dirty_ratelimit);
+	),
+
+	TP_printk("kbps=%lu dirty_rate=%lu "
+		  "dirty_ratelimit=%lu "
+		  "balanced_dirty_ratelimit=%lu",
+		  __entry->kbps,
+		  __entry->dirty_rate,
+		  __entry->dirty_ratelimit,
+		  __entry->balanced_dirty_ratelimit
+	)
+);
+
 TRACE_EVENT(bdi_dirty_ratelimit,
 
 	TP_PROTO(struct backing_dev_info *bdi,



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/6] buffered write IO controller in balance_dirty_pages()
  2012-03-28 12:13 [PATCH 0/6] buffered write IO controller in balance_dirty_pages() Fengguang Wu
                   ` (5 preceding siblings ...)
  2012-03-28 12:13 ` [PATCH 6/6] blk-cgroup: buffered write IO controller - debug trace Fengguang Wu
@ 2012-03-28 21:10 ` Vivek Goyal
  2012-03-28 22:35   ` Fengguang Wu
  2012-03-29  2:48   ` Suresh Jayaraman
  2012-03-29  0:34 ` KAMEZAWA Hiroyuki
                   ` (2 subsequent siblings)
  9 siblings, 2 replies; 18+ messages in thread
From: Vivek Goyal @ 2012-03-28 21:10 UTC (permalink / raw)
  To: Fengguang Wu
  Cc: Linux Memory Management List, Suresh Jayaraman, Andrea Righi,
	Jeff Moyer, linux-fsdevel, LKML

On Wed, Mar 28, 2012 at 08:13:08PM +0800, Fengguang Wu wrote:
> 
> Here is one possible solution to "buffered write IO controller", based on Linux
> v3.3
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux.git  buffered-write-io-controller
> 
> Features:
> - support blkio.weight

So this does proportional write bandwidth division on bdi for buffered
writes?

> - support blkio.throttle.buffered_write_bps

Is this an absolute limit system-wide or per bdi?

[..]
> The test results included in the last patch look pretty good despite the
> simple implementation.
> 
>  [PATCH 1/6] blk-cgroup: move blk-cgroup.h in include/linux/blk-cgroup.h
>  [PATCH 2/6] blk-cgroup: account dirtied pages
>  [PATCH 3/6] blk-cgroup: buffered write IO controller - bandwidth weight
>  [PATCH 4/6] blk-cgroup: buffered write IO controller - bandwidth limit
>  [PATCH 5/6] blk-cgroup: buffered write IO controller - bandwidth limit interface
>  [PATCH 6/6] blk-cgroup: buffered write IO controller - debug trace
> 

Hi Fengguang,

Only patch 0 and patch 4 have shown up in my mailbox. The same seems to be
the case for lkml. I am wondering what happened to the rest of the patches.

Will understand the patches better once I have the full set.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/6] buffered write IO controller in balance_dirty_pages()
  2012-03-28 21:10 ` [PATCH 0/6] buffered write IO controller in balance_dirty_pages() Vivek Goyal
@ 2012-03-28 22:35   ` Fengguang Wu
  2012-03-29  2:48   ` Suresh Jayaraman
  1 sibling, 0 replies; 18+ messages in thread
From: Fengguang Wu @ 2012-03-28 22:35 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Linux Memory Management List, Suresh Jayaraman, Andrea Righi,
	Jeff Moyer, linux-fsdevel, LKML

Hi Vivek,

On Wed, Mar 28, 2012 at 05:10:18PM -0400, Vivek Goyal wrote:
> On Wed, Mar 28, 2012 at 08:13:08PM +0800, Fengguang Wu wrote:
> > 
> > Here is one possible solution to "buffered write IO controller", based on Linux
> > v3.3
> > 
> > git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux.git  buffered-write-io-controller
> > 
> > Features:
> > - support blkio.weight
> 
> So this does proportional write bandwidth division on bdi for buffered
> writes?

Right. That is done in patch 3, costing only 3 lines in balance_dirty_pages().

> > - support blkio.throttle.buffered_write_bps
> 
> > Is this an absolute limit system-wide or per bdi?

It's a per-blkcg absolute limit. It can be extended to per-blkcg-per-bdi
limits w/o changing the basic algorithms. We only need to change the
interface and vectorize the variables:
        struct percpu_counter nr_dirtied;
        unsigned long bw_time_stamp;
        unsigned long dirtied_stamp;
        unsigned long dirty_ratelimit;
        unsigned long long buffered_write_bps;
and add a "bdi" parameter to relevant functions.

> [..]
> > The test results included in the last patch look pretty good despite the
> > simple implementation.
> > 
> >  [PATCH 1/6] blk-cgroup: move blk-cgroup.h in include/linux/blk-cgroup.h
> >  [PATCH 2/6] blk-cgroup: account dirtied pages
> >  [PATCH 3/6] blk-cgroup: buffered write IO controller - bandwidth weight
> >  [PATCH 4/6] blk-cgroup: buffered write IO controller - bandwidth limit
> >  [PATCH 5/6] blk-cgroup: buffered write IO controller - bandwidth limit interface
> >  [PATCH 6/6] blk-cgroup: buffered write IO controller - debug trace
> > 
> 
> Hi Fengguang,
> 
> Only patch 0 and patch 4 have shown up in my mailbox. The same seems to be
> the case for lkml. I am wondering what happened to the rest of the patches.

Sorry, I shut down my laptop before all the emails were sent out.

> Will understand the patches better once I have the full set.

OK, thanks!

Fengguang

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/6] buffered write IO controller in balance_dirty_pages()
  2012-03-28 12:13 [PATCH 0/6] buffered write IO controller in balance_dirty_pages() Fengguang Wu
                   ` (6 preceding siblings ...)
  2012-03-28 21:10 ` [PATCH 0/6] buffered write IO controller in balance_dirty_pages() Vivek Goyal
@ 2012-03-29  0:34 ` KAMEZAWA Hiroyuki
  2012-03-29  1:22   ` Fengguang Wu
  2012-04-01  4:16 ` Suresh Jayaraman
  2012-04-01 20:56 ` Vivek Goyal
  9 siblings, 1 reply; 18+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-03-29  0:34 UTC (permalink / raw)
  To: Fengguang Wu
  Cc: Linux Memory Management List, Vivek Goyal, Suresh Jayaraman,
	Andrea Righi, Jeff Moyer, linux-fsdevel, LKML

(2012/03/28 21:13), Fengguang Wu wrote:

> Here is one possible solution to "buffered write IO controller", based on Linux
> v3.3
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux.git  buffered-write-io-controller
> 
> Features:
> - support blkio.weight
> - support blkio.throttle.buffered_write_bps
> 
> Possibilities:
> - it's trivial to support per-bdi .weight or .buffered_write_bps
> 
> Pros:
> 1) simple
> 2) virtually no space/time overheads
> 3) independent of the block layer and IO schedulers, hence
> 3.1) supports all filesystems/storages, eg. NFS/pNFS, CIFS, sshfs, ...
> 3.2) supports all IO schedulers. One may use noop for SSDs, inside virtual machines, over iSCSI, etc.
> 
> Cons:
> 1) don't try to smooth bursty IO submission in the flusher thread (*)
> 2) don't support IOPS based throttling
> 3) introduces semantic differences to blkio.weight, which will be
>    - working by "bandwidth" for buffered writes
>    - working by "device time" for direct IO
> 
> (*) Maybe not a big concern, since the bursties are limited to 500ms: if one dd
> is throttled to 50% disk bandwidth, the flusher thread will be waking up on
> every 1 second, keep the disk busy for 500ms and then go idle for 500ms; if
> throttled to 10% disk bandwidth, the flusher thread will wake up on every 5s,
> keep busy for 500ms and stay idle for 4.5s.
> 
> The test results included in the last patch look pretty good in despite of the
> simple implementation.
> 

yes, seems very good.

>  [PATCH 1/6] blk-cgroup: move blk-cgroup.h in include/linux/blk-cgroup.h
>  [PATCH 2/6] blk-cgroup: account dirtied pages
>  [PATCH 3/6] blk-cgroup: buffered write IO controller - bandwidth weight
>  [PATCH 4/6] blk-cgroup: buffered write IO controller - bandwidth limit
>  [PATCH 5/6] blk-cgroup: buffered write IO controller - bandwidth limit interface
>  [PATCH 6/6] blk-cgroup: buffered write IO controller - debug trace
> 
> The changeset is dominated by the blk-cgroup.h move.
> The core changes (to page-writeback.c) are merely 77 lines.
> 
>  block/blk-cgroup.c               |   27 +
>  block/blk-cgroup.h               |  364 --------------------------
>  block/blk-throttle.c             |    2 
>  block/cfq.h                      |    2 
>  include/linux/blk-cgroup.h       |  396 +++++++++++++++++++++++++++++
>  include/trace/events/writeback.h |   34 ++
>  mm/page-writeback.c              |   77 +++++
>  7 files changed, 530 insertions(+), 372 deletions(-)
> 


Thank you very much. I like this simple implementation.
I have 3 questions..

- Do you have any plan to enhance this to support hierarchical accounting ?
- Can we get wait-time-for-dirty-pages summary per blkio cgroup ?
- Can we get status (dirty/sec) per blkio cgroup ?

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/6] buffered write IO controller in balance_dirty_pages()
  2012-03-29  0:34 ` KAMEZAWA Hiroyuki
@ 2012-03-29  1:22   ` Fengguang Wu
  0 siblings, 0 replies; 18+ messages in thread
From: Fengguang Wu @ 2012-03-29  1:22 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Linux Memory Management List, Vivek Goyal, Suresh Jayaraman,
	Andrea Righi, Jeff Moyer, linux-fsdevel, LKML

On Thu, Mar 29, 2012 at 09:34:04AM +0900, KAMEZAWA Hiroyuki wrote:
> (2012/03/28 21:13), Fengguang Wu wrote:
> 
> > Here is one possible solution to "buffered write IO controller", based on Linux
> > v3.3
> > 
> > git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux.git  buffered-write-io-controller
> > 
> > Features:
> > - support blkio.weight
> > - support blkio.throttle.buffered_write_bps
> > 
> > Possibilities:
> > - it's trivial to support per-bdi .weight or .buffered_write_bps
> > 
> > Pros:
> > 1) simple
> > 2) virtually no space/time overheads
> > 3) independent of the block layer and IO schedulers, hence
> > 3.1) supports all filesystems/storages, eg. NFS/pNFS, CIFS, sshfs, ...
> > 3.2) supports all IO schedulers. One may use noop for SSDs, inside virtual machines, over iSCSI, etc.
> > 
> > Cons:
> > 1) don't try to smooth bursty IO submission in the flusher thread (*)
> > 2) don't support IOPS based throttling
> > 3) introduces semantic differences to blkio.weight, which will be
> >    - working by "bandwidth" for buffered writes
> >    - working by "device time" for direct IO
> > 
> > (*) Maybe not a big concern, since the bursties are limited to 500ms: if one dd
> > is throttled to 50% disk bandwidth, the flusher thread will be waking up on
> > every 1 second, keep the disk busy for 500ms and then go idle for 500ms; if
> > throttled to 10% disk bandwidth, the flusher thread will wake up on every 5s,
> > keep busy for 500ms and stay idle for 4.5s.
> > 
> > The test results included in the last patch look pretty good in despite of the
> > simple implementation.
> > 
> 
> yes, seems very good.
> 
> >  [PATCH 1/6] blk-cgroup: move blk-cgroup.h in include/linux/blk-cgroup.h
> >  [PATCH 2/6] blk-cgroup: account dirtied pages
> >  [PATCH 3/6] blk-cgroup: buffered write IO controller - bandwidth weight
> >  [PATCH 4/6] blk-cgroup: buffered write IO controller - bandwidth limit
> >  [PATCH 5/6] blk-cgroup: buffered write IO controller - bandwidth limit interface
> >  [PATCH 6/6] blk-cgroup: buffered write IO controller - debug trace
> > 
> > The changeset is dominated by the blk-cgroup.h move.
> > The core changes (to page-writeback.c) are merely 77 lines.
> > 
> >  block/blk-cgroup.c               |   27 +
> >  block/blk-cgroup.h               |  364 --------------------------
> >  block/blk-throttle.c             |    2 
> >  block/cfq.h                      |    2 
> >  include/linux/blk-cgroup.h       |  396 +++++++++++++++++++++++++++++
> >  include/trace/events/writeback.h |   34 ++
> >  mm/page-writeback.c              |   77 +++++
> >  7 files changed, 530 insertions(+), 372 deletions(-)
> > 
> 
> 
> Thank you very much. I like this simple implementation.

Thank you :)

> I have 3 questions..
> 
> - Do you have any plan to enhance this to support hierarchical accounting ?

Given a hierarchy A/B/C, when throttling a task in C:

- blkio.weight is relatively simple, just scale task_ratelimit by

        C.weight * B.weight * A.weight / BLKIO_WEIGHT_DEFAULT^3

*Optionally*, if really deep hierarchies come into heavy use, we may cache
the above value inside memcg C to avoid repeated runtime overhead.

- blkio.throttle.buffered_write_bps can be enforced by limiting
  task_ratelimit to

        min(C.dirty_throttle, B.dirty_throttle, A.dirty_throttle, bdi.dirty_throttle)

*Optionally*, to avoid the repeated runtime overhead of walking the
hierarchy, we may also cache the above value (minus the bdi one) inside
memcg C, taking advantage of the fact that the *.dirty_throttle values
are all updated at 200ms intervals.

The dirty count needs some special care:
- in account_page_dirtied(), increase the dirty count of the task's *directly* attached cgroup
- in blkcg_update_bandwidth(), which runs every 200ms, compute A's
  hierarchical dirty count as
          A.total_dirtied = A.nr_dirtied + B.nr_dirtied + C.nr_dirtied
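
A rough, untested sketch of the weight scaling and bps limiting described
above (blkcg_parent() is a hypothetical helper here; blkcg_weight() and
blkcg_dirty_ratelimit() are the accessors from this patchset):

        /* scale a task's ratelimit by the weights along the path A/B/C */
        static unsigned long blkcg_hier_ratelimit(struct blkio_cgroup *blkcg,
                                                  unsigned long task_ratelimit)
        {
                u64 limit = task_ratelimit;

                for (; blkcg; blkcg = blkcg_parent(blkcg)) {
                        limit *= blkcg_weight(blkcg);
                        do_div(limit, BLKIO_WEIGHT_DEFAULT);
                }
                return limit;
        }

        /* buffered_write_bps: honour the tightest limit along the hierarchy */
        static unsigned long blkcg_hier_dirty_throttle(struct blkio_cgroup *blkcg,
                                                       unsigned long bdi_throttle)
        {
                unsigned long rate = bdi_throttle;

                for (; blkcg; blkcg = blkcg_parent(blkcg))
                        rate = min(rate, blkcg_dirty_ratelimit(blkcg));
                return rate;
        }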

> - Can we get wait-time-for-dirty-pages summary per blkio cgroup ?

Sure it's possible. We may export min/max/avg/stddev summaries of the
wait time.

> - Can we get status (dirty/sec) per blkio cgroup ?

It would be trivial to do, too.

For now, the above stats can be derived from the blkcg_dirty_ratelimit
and balance_dirty_pages trace events.

Thanks,
Fengguang

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/6] buffered write IO controller in balance_dirty_pages()
  2012-03-28 21:10 ` [PATCH 0/6] buffered write IO controller in balance_dirty_pages() Vivek Goyal
  2012-03-28 22:35   ` Fengguang Wu
@ 2012-03-29  2:48   ` Suresh Jayaraman
  1 sibling, 0 replies; 18+ messages in thread
From: Suresh Jayaraman @ 2012-03-29  2:48 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Fengguang Wu, Linux Memory Management List, Andrea Righi,
	Jeff Moyer, linux-fsdevel, LKML

On 03/29/2012 02:40 AM, Vivek Goyal wrote:
> On Wed, Mar 28, 2012 at 08:13:08PM +0800, Fengguang Wu wrote:
>>
>> Here is one possible solution to "buffered write IO controller", based on Linux
>> v3.3
>>
>> git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux.git  buffered-write-io-controller
>>
>> Features:
>> - support blkio.weight
> 
> So this does proportional write bandwidth division on bdi for buffered
> writes?

yes.

> 
>> - support blkio.throttle.buffered_write_bps
> 
> This is absolute limit systemwide or per bdi?

It's system-wide, and Fengguang thinks that per-bdi limits could be
implemented trivially.

> [..]
>> The test results included in the last patch look pretty good in despite of the
>> simple implementation.
>>
>>  [PATCH 1/6] blk-cgroup: move blk-cgroup.h in include/linux/blk-cgroup.h
>>  [PATCH 2/6] blk-cgroup: account dirtied pages
>>  [PATCH 3/6] blk-cgroup: buffered write IO controller - bandwidth weight
>>  [PATCH 4/6] blk-cgroup: buffered write IO controller - bandwidth limit
>>  [PATCH 5/6] blk-cgroup: buffered write IO controller - bandwidth limit interface
>>  [PATCH 6/6] blk-cgroup: buffered write IO controller - debug trace
>>
> 
> Hi Fengguang,
> 
> Only patch 0 and patch 4 have shown up in my mail box. Same seems to be
> the case for lkml. I am wondering what happened to rest of the patches.

Same here, but the rest of the patches showed up much later. In any
case you can access the full set from here:

http://git.kernel.org/?p=linux/kernel/git/wfg/linux.git;a=shortlog;h=refs/heads/buffered-write-io-controller


> Will understand the patches better once I have the full set.
> 



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/6] buffered write IO controller in balance_dirty_pages()
  2012-03-28 12:13 [PATCH 0/6] buffered write IO controller in balance_dirty_pages() Fengguang Wu
                   ` (7 preceding siblings ...)
  2012-03-29  0:34 ` KAMEZAWA Hiroyuki
@ 2012-04-01  4:16 ` Suresh Jayaraman
  2012-04-01  8:30   ` Fengguang Wu
  2012-04-01 20:56 ` Vivek Goyal
  9 siblings, 1 reply; 18+ messages in thread
From: Suresh Jayaraman @ 2012-04-01  4:16 UTC (permalink / raw)
  To: Fengguang Wu
  Cc: Linux Memory Management List, Vivek Goyal, Andrea Righi,
	Jeff Moyer, linux-fsdevel, LKML

On 03/28/2012 05:43 PM, Fengguang Wu wrote:
> Here is one possible solution to "buffered write IO controller", based on Linux
> v3.3
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux.git  buffered-write-io-controller
> 

The implementation looks unbelievably simple. I ran a few tests
(throttling) and found it to work well in general.

> Features:
> - support blkio.weight
> - support blkio.throttle.buffered_write_bps
> 
> Possibilities:
> - it's trivial to support per-bdi .weight or .buffered_write_bps
> 
> Pros:
> 1) simple
> 2) virtually no space/time overheads
> 3) independent of the block layer and IO schedulers, hence
> 3.1) supports all filesystems/storages, eg. NFS/pNFS, CIFS, sshfs, ...
> 3.2) supports all IO schedulers. One may use noop for SSDs, inside virtual machines, over iSCSI, etc.
> 
> Cons:
> 1) don't try to smooth bursty IO submission in the flusher thread (*)
> 2) don't support IOPS based throttling
> 3) introduces semantic differences to blkio.weight, which will be
>    - working by "bandwidth" for buffered writes
>    - working by "device time" for direct IO

There is a chance that this semantic difference might confuse users.

> (*) Maybe not a big concern, since the bursties are limited to 500ms: if one dd
> is throttled to 50% disk bandwidth, the flusher thread will be waking up on
> every 1 second, keep the disk busy for 500ms and then go idle for 500ms; if
> throttled to 10% disk bandwidth, the flusher thread will wake up on every 5s,
> keep busy for 500ms and stay idle for 4.5s.
> 
> The test results included in the last patch look pretty good in despite of the
> simple implementation.
> 
>  [PATCH 1/6] blk-cgroup: move blk-cgroup.h in include/linux/blk-cgroup.h
>  [PATCH 2/6] blk-cgroup: account dirtied pages
>  [PATCH 3/6] blk-cgroup: buffered write IO controller - bandwidth weight
>  [PATCH 4/6] blk-cgroup: buffered write IO controller - bandwidth limit
>  [PATCH 5/6] blk-cgroup: buffered write IO controller - bandwidth limit interface
>  [PATCH 6/6] blk-cgroup: buffered write IO controller - debug trace
> 

How about a BOF on this topic during LSF/MM as there seems to be enough
interest?


Thanks
Suresh


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/6] buffered write IO controller in balance_dirty_pages()
  2012-04-01  4:16 ` Suresh Jayaraman
@ 2012-04-01  8:30   ` Fengguang Wu
  0 siblings, 0 replies; 18+ messages in thread
From: Fengguang Wu @ 2012-04-01  8:30 UTC (permalink / raw)
  To: Suresh Jayaraman
  Cc: Linux Memory Management List, Vivek Goyal, Andrea Righi,
	Jeff Moyer, linux-fsdevel, LKML

On Sun, Apr 01, 2012 at 09:46:06AM +0530, Suresh Jayaraman wrote:
> On 03/28/2012 05:43 PM, Fengguang Wu wrote:
> > Here is one possible solution to "buffered write IO controller", based on Linux
> > v3.3
> > 
> > git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux.git  buffered-write-io-controller
> > 
> 
> The implementation looks unbelievably simple. I ran a few tests
> (throttling) and I found it working well generally.

Thanks for testing it out :)

> > Features:
> > - support blkio.weight
> > - support blkio.throttle.buffered_write_bps
> > 
> > Possibilities:
> > - it's trivial to support per-bdi .weight or .buffered_write_bps
> > 
> > Pros:
> > 1) simple
> > 2) virtually no space/time overheads
> > 3) independent of the block layer and IO schedulers, hence
> > 3.1) supports all filesystems/storages, eg. NFS/pNFS, CIFS, sshfs, ...
> > 3.2) supports all IO schedulers. One may use noop for SSDs, inside virtual machines, over iSCSI, etc.
> > 
> > Cons:
> > 1) don't try to smooth bursty IO submission in the flusher thread (*)
> > 2) don't support IOPS based throttling
> > 3) introduces semantic differences to blkio.weight, which will be
> >    - working by "bandwidth" for buffered writes
> >    - working by "device time" for direct IO
> 
> There is a chance that this semantic difference might confuse users.

Yeah.

> > (*) Maybe not a big concern, since the bursties are limited to 500ms: if one dd
> > is throttled to 50% disk bandwidth, the flusher thread will be waking up on
> > every 1 second, keep the disk busy for 500ms and then go idle for 500ms; if
> > throttled to 10% disk bandwidth, the flusher thread will wake up on every 5s,
> > keep busy for 500ms and stay idle for 4.5s.
> > 
> > The test results included in the last patch look pretty good in despite of the
> > simple implementation.
> > 
> >  [PATCH 1/6] blk-cgroup: move blk-cgroup.h in include/linux/blk-cgroup.h
> >  [PATCH 2/6] blk-cgroup: account dirtied pages
> >  [PATCH 3/6] blk-cgroup: buffered write IO controller - bandwidth weight
> >  [PATCH 4/6] blk-cgroup: buffered write IO controller - bandwidth limit
> >  [PATCH 5/6] blk-cgroup: buffered write IO controller - bandwidth limit interface
> >  [PATCH 6/6] blk-cgroup: buffered write IO controller - debug trace
> > 
> 
> How about a BOF on this topic during LSF/MM as there seems to be enough
> interest?

Sure. I'll talk briefly about the block IO cgroup in the writeback
session. I'm open to more focused technical discussions at some later
time if necessary.

Thanks,
Fengguang

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/6] buffered write IO controller in balance_dirty_pages()
  2012-03-28 12:13 [PATCH 0/6] buffered write IO controller in balance_dirty_pages() Fengguang Wu
                   ` (8 preceding siblings ...)
  2012-04-01  4:16 ` Suresh Jayaraman
@ 2012-04-01 20:56 ` Vivek Goyal
  2012-04-03  8:00   ` Fengguang Wu
  9 siblings, 1 reply; 18+ messages in thread
From: Vivek Goyal @ 2012-04-01 20:56 UTC (permalink / raw)
  To: Fengguang Wu
  Cc: Linux Memory Management List, Suresh Jayaraman, Andrea Righi,
	Jeff Moyer, linux-fsdevel, LKML

On Wed, Mar 28, 2012 at 08:13:08PM +0800, Fengguang Wu wrote:
> 
> Here is one possible solution to "buffered write IO controller", based on Linux
> v3.3
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux.git  buffered-write-io-controller
> 
> Features:
> - support blkio.weight
> - support blkio.throttle.buffered_write_bps

Introducing a separate knob for buffered writes makes sense. It is
different from the throttling done at the block layer.

> 
> Possibilities:
> - it's trivial to support per-bdi .weight or .buffered_write_bps
> 
> Pros:
> 1) simple
> 2) virtually no space/time overheads
> 3) independent of the block layer and IO schedulers, hence
> 3.1) supports all filesystems/storages, eg. NFS/pNFS, CIFS, sshfs, ...
> 3.2) supports all IO schedulers. One may use noop for SSDs, inside virtual machines, over iSCSI, etc.
> 
> Cons:
> 1) don't try to smooth bursty IO submission in the flusher thread (*)

Yes, this is a core limitation of throttling while writing to cache. I think
at one point we had agreed that the IO scheduler in general should be able to
handle the burstiness caused by WRITES. CFQ does it well; deadline not so much.

> 2) don't support IOPS based throttling

If need be, you could still support it, couldn't you? It would just
require more code in the buffered write controller to keep track of the
number of operations per second and throttle the task if the IOPS limit is
crossed. So it does not sound like a limitation of the design, just a
limitation of the current set of patches?

> 3) introduces semantic differences to blkio.weight, which will be
>    - working by "bandwidth" for buffered writes
>    - working by "device time" for direct IO

I think blkio.weight can be thought of as a system-wide weight of a cgroup,
and more than one entity/subsystem should be able to make use of it and
differentiate IO in its own way. CFQ can decide to do proportional
time division, and the buffered write controller should be able to use the
same weight and do write bandwidth differentiation. I think that is better
than introducing another buffered write controller tunable for weight.

Personally, I am not too worried about this point. We can document and
explain it well.


> 
> (*) Maybe not a big concern, since the bursties are limited to 500ms: if one dd
> is throttled to 50% disk bandwidth, the flusher thread will be waking up on
> every 1 second, keep the disk busy for 500ms and then go idle for 500ms; if
> throttled to 10% disk bandwidth, the flusher thread will wake up on every 5s,
> keep busy for 500ms and stay idle for 4.5s.
> 
> The test results included in the last patch look pretty good in despite of the
> simple implementation.

Can you give more details about the test results? Did you test only
throttling, or did you test write speed differentiation based on weight too?

Thanks
Vivek

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/6] buffered write IO controller in balance_dirty_pages()
  2012-04-01 20:56 ` Vivek Goyal
@ 2012-04-03  8:00   ` Fengguang Wu
  2012-04-03 14:53     ` Vivek Goyal
  0 siblings, 1 reply; 18+ messages in thread
From: Fengguang Wu @ 2012-04-03  8:00 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Linux Memory Management List, Suresh Jayaraman, Andrea Righi,
	Jeff Moyer, linux-fsdevel, LKML, Tejun Heo, Jan Kara,
	KAMEZAWA Hiroyuki, Linux NFS Mailing List

[-- Attachment #1: Type: text/plain, Size: 16571 bytes --]

Hi Vivek,

On Sun, Apr 01, 2012 at 04:56:47PM -0400, Vivek Goyal wrote:
> On Wed, Mar 28, 2012 at 08:13:08PM +0800, Fengguang Wu wrote:
> > 
> > Here is one possible solution to "buffered write IO controller", based on Linux
> > v3.3
> > 
> > git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux.git  buffered-write-io-controller
> > 
> > Features:
> > - support blkio.weight
> > - support blkio.throttle.buffered_write_bps
> 
> Introducing separate knob for buffered write makes sense. It is different
> throttling done at block layer.

Yeah thanks.

> > Possibilities:
> > - it's trivial to support per-bdi .weight or .buffered_write_bps
> > 
> > Pros:
> > 1) simple
> > 2) virtually no space/time overheads
> > 3) independent of the block layer and IO schedulers, hence
> > 3.1) supports all filesystems/storages, eg. NFS/pNFS, CIFS, sshfs, ...
> > 3.2) supports all IO schedulers. One may use noop for SSDs, inside virtual machines, over iSCSI, etc.
> > 
> > Cons:
> > 1) don't try to smooth bursty IO submission in the flusher thread (*)
> 
> Yes, this is a core limitation of throttling while writing to cache. I think
> once we had agreed that IO scheduler in general should be able to handle
> burstiness caused by WRITES. CFQ does it well. deadline not so much.

Yes I still remember that. It's better for the general kernel to
handle bursty writes just right, rather than to rely on IO controllers
for good interactive read performance.

> > 2) don't support IOPS based throttling
> 
> If need be then you can still support it. Isn't it? Just that it will
> require more code in buffered write controller to keep track of number
> of operations per second and throttle task if IOPS limit is crossed. So
> it does not sound like a limitation of design but just limitation of
> current set of patches?

Sure. By adding some IOPS or "disk time" accounting, more IO metrics
can be supported.
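
For illustration only (not in the current patches; the nr_write_ops and
ops_stamp fields and the buffered_write_iops knob below are made up, and
what exactly counts as one buffered "op" is still an open question), the
IOPS side could piggyback on the same 200ms sampling that
blkcg_update_bandwidth() already does:

        /*
         * Hypothetical: scale the bps budget down when the cgroup exceeds
         * its buffered write IOPS limit over the last sampling interval.
         * nr_write_ops would be bumped wherever a write op is accounted.
         */
        static unsigned long blkcg_apply_iops_limit(struct blkio_cgroup *blkcg,
                                                    unsigned long bps,
                                                    unsigned long elapsed)
        {
                unsigned long ops = percpu_counter_read(&blkcg->nr_write_ops);
                unsigned long iops = (ops - blkcg->ops_stamp) * HZ / elapsed;

                if (blkcg->buffered_write_iops &&
                    iops > blkcg->buffered_write_iops)
                        bps = div_u64((u64)bps * blkcg->buffered_write_iops,
                                      iops | 1);

                blkcg->ops_stamp = ops;
                return bps;
        }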

> > 3) introduces semantic differences to blkio.weight, which will be
> >    - working by "bandwidth" for buffered writes
> >    - working by "device time" for direct IO
> 
> I think blkio.weight can be thought of a system wide weight of a cgroup
> and more than one entity/subsystem should be able to make use of it and
> differentiate between IO in its own way. CFQ can decide to do proportional
> time division, and buffered write controller should be able to use the
> same weight and do write bandwidth differentiation. I think it is better
> than introducing another buffered write controller tunable for weight.
> 
> Personally, I am not too worried about this point. We can document and
> explain it well.

Agreed. The throttling may work in *either* bps, IOPS or disk time
modes. In each mode blkio.weight is naturally tied to the
corresponding IO metrics.

> > (*) Maybe not a big concern, since the bursties are limited to 500ms: if one dd
> > is throttled to 50% disk bandwidth, the flusher thread will be waking up on
> > every 1 second, keep the disk busy for 500ms and then go idle for 500ms; if
> > throttled to 10% disk bandwidth, the flusher thread will wake up on every 5s,
> > keep busy for 500ms and stay idle for 4.5s.
> > 
> > The test results included in the last patch look pretty good in despite of the
> > simple implementation.
> 
> Can you give more details about test results. Did you test throttling or you
> tested write speed differentation based on weight too.

Patch 6/6 shows simple test results for bps based throttling.

Since then I've improved the patches to work in a more "contained" way
when blkio.throttle.buffered_write_bps is not set.

The old behavior is: if blkcg A contains 1 dd task and blkcg B contains
10 dd tasks and they have equal weight, B will get 10 times the bandwidth
of A.

With the updated core bits below, A and B will get an equal share of
write bandwidth. The basic idea is to use

        bdi->dirty_ratelimit * blkio.weight

as the throttling bps value if blkio.throttle.buffered_write_bps
is not specified by the user.

Test results are "pretty good looking" :-) The attached graphs
illustrate the nice attributes of accuracy, fairness and smoothness
for the following tests.

- bps throttling (1 cp + 2 dd, throttled to 4MB/s and 2MB/s)

        mount /dev/sda5 /fs

        echo > /debug/tracing/trace
        echo 1 > /debug/tracing/events/writeback/balance_dirty_pages/enable
        echo 1 > /debug/tracing/events/writeback/bdi_dirty_ratelimit/enable
        echo 1 > /debug/tracing/events/writeback/task_io/enable

        cat /debug/tracing/trace_pipe | bzip2 > trace.bz2 &

        # put the shell (and hence the cp below) into the "cp" cgroup,
        # throttled to 4MB/s of buffered writes
        rmdir /cgroup/cp
        mkdir /cgroup/cp
        echo $$ > /cgroup/cp/tasks
        echo $((4<<20)) > /cgroup/cp/blkio.throttle.buffered_write_bps

        cp /dev/zero /fs/zero &

        echo $$ > /cgroup/tasks

        if true; then
        # the two dd tasks below run in the "dd" cgroup, throttled to 2MB/s
        rmdir /cgroup/dd
        mkdir /cgroup/dd
        echo $$ > /cgroup/dd/tasks
        echo $((2<<20)) > /cgroup/dd/blkio.throttle.buffered_write_bps

        dd if=/dev/zero of=/fs/zero1 bs=64k &
        dd if=/dev/zero of=/fs/zero2 bs=64k &

        fi

        echo $$ > /cgroup/tasks

        sleep 100
        killall dd
        killall cp
        killall cat

- bps proportional (1 cp + 2 dd, with equal weight)

        mount /dev/sda5 /fs

        echo > /debug/tracing/trace
        echo 1 > /debug/tracing/events/writeback/balance_dirty_pages/enable
        echo 1 > /debug/tracing/events/writeback/bdi_dirty_ratelimit/enable
        echo 1 > /debug/tracing/events/writeback/task_io/enable

        cat /debug/tracing/trace_pipe | bzip2 > trace.bz2 &

        rmdir /cgroup/cp
        mkdir /cgroup/cp
        echo $$ > /cgroup/cp/tasks

        cp /dev/zero /fs/zero &

        rmdir /cgroup/dd
        mkdir /cgroup/dd
        echo $$ > /cgroup/dd/tasks

        dd if=/dev/zero of=/fs/zero1 bs=64k &
        dd if=/dev/zero of=/fs/zero2 bs=64k &

        echo $$ > /cgroup/tasks

        sleep 100
        killall dd
        killall cp
        killall cat

- bps proportional (1 cp + 2 dd, with weights 500 and 1000)

Thanks,
Fengguang
---

PS. Below are the new core changes to the dirty throttling code, supporting
two major block IO controller features with only 74 lines of new code.

It still needs more comments and cleanups, so please don't review it too
closely yet. It refactors the code so that the blkcg dirty_ratelimit update
shares the existing bdi_update_dirty_ratelimit(); however, it turns out that
not many lines are actually shared, so I might revert to a standalone
blkcg_update_dirty_ratelimit() scheme in the next post.

 mm/page-writeback.c |  146 +++++++++++++++++++++++++++++++-----------
 1 file changed, 110 insertions(+), 36 deletions(-)

--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -34,6 +34,7 @@
 #include <linux/syscalls.h>
 #include <linux/buffer_head.h> /* __set_page_dirty_buffers */
 #include <linux/pagevec.h>
+#include <linux/blk-cgroup.h>
 #include <trace/events/writeback.h>
 
 /*
@@ -836,35 +837,28 @@ static void global_update_bandwidth(unsigned long thresh,
  * Normal bdi tasks will be curbed at or below it in long term.
  * Obviously it should be around (write_bw / N) when there are N dd tasks.
  */
-static void bdi_update_dirty_ratelimit(struct backing_dev_info *bdi,
+static void bdi_update_dirty_ratelimit(unsigned int blkcg_id,
+				       struct backing_dev_info *bdi,
+				       unsigned long *pdirty_ratelimit,
+				       unsigned long pos_ratio,
+				       unsigned long write_bw,
 				       unsigned long thresh,
 				       unsigned long bg_thresh,
 				       unsigned long dirty,
-				       unsigned long bdi_thresh,
-				       unsigned long bdi_dirty,
-				       unsigned long dirtied,
-				       unsigned long elapsed)
+				       unsigned long dirty_rate)
 {
 	unsigned long freerun = dirty_freerun_ceiling(thresh, bg_thresh);
 	unsigned long limit = hard_dirty_limit(thresh);
 	unsigned long setpoint = (freerun + limit) / 2;
-	unsigned long write_bw = bdi->avg_write_bandwidth;
-	unsigned long dirty_ratelimit = bdi->dirty_ratelimit;
-	unsigned long dirty_rate;
+	unsigned long dirty_ratelimit = *pdirty_ratelimit;
 	unsigned long task_ratelimit;
 	unsigned long balanced_dirty_ratelimit;
-	unsigned long pos_ratio;
 	unsigned long step;
 	unsigned long x;
 
-	/*
-	 * The dirty rate will match the writeout rate in long term, except
-	 * when dirty pages are truncated by userspace or re-dirtied by FS.
-	 */
-	dirty_rate = (dirtied - bdi->dirtied_stamp) * HZ / elapsed;
+	if (!blkcg_id && dirty < freerun)
+		return;
 
-	pos_ratio = bdi_position_ratio(bdi, thresh, bg_thresh, dirty,
-				       bdi_thresh, bdi_dirty);
 	/*
 	 * task_ratelimit reflects each dd's dirty rate for the past 200ms.
 	 */
@@ -904,11 +898,6 @@ static void bdi_update_dirty_ratelimit(struct backing_dev_info *bdi,
 	 */
 	balanced_dirty_ratelimit = div_u64((u64)task_ratelimit * write_bw,
 					   dirty_rate | 1);
-	/*
-	 * balanced_dirty_ratelimit ~= (write_bw / N) <= write_bw
-	 */
-	if (unlikely(balanced_dirty_ratelimit > write_bw))
-		balanced_dirty_ratelimit = write_bw;
 
 	/*
 	 * We could safely do this and return immediately:
@@ -927,6 +916,11 @@ static void bdi_update_dirty_ratelimit(struct backing_dev_info *bdi,
 	 * which reflects the direction and size of dirty position error.
 	 */
 
+	if (blkcg_id) {
+		dirty_ratelimit = (dirty_ratelimit + balanced_dirty_ratelimit) / 2;
+		goto out;
+	}
+
 	/*
 	 * dirty_ratelimit will follow balanced_dirty_ratelimit iff
 	 * task_ratelimit is on the same side of dirty_ratelimit, too.
@@ -946,13 +940,11 @@ static void bdi_update_dirty_ratelimit(struct backing_dev_info *bdi,
 	 */
 	step = 0;
 	if (dirty < setpoint) {
-		x = min(bdi->balanced_dirty_ratelimit,
-			 min(balanced_dirty_ratelimit, task_ratelimit));
+		x = min(balanced_dirty_ratelimit, task_ratelimit);
 		if (dirty_ratelimit < x)
 			step = x - dirty_ratelimit;
 	} else {
-		x = max(bdi->balanced_dirty_ratelimit,
-			 max(balanced_dirty_ratelimit, task_ratelimit));
+		x = max(balanced_dirty_ratelimit, task_ratelimit);
 		if (dirty_ratelimit > x)
 			step = dirty_ratelimit - x;
 	}
@@ -973,10 +965,12 @@ static void bdi_update_dirty_ratelimit(struct backing_dev_info *bdi,
 	else
 		dirty_ratelimit -= step;
 
-	bdi->dirty_ratelimit = max(dirty_ratelimit, 1UL);
-	bdi->balanced_dirty_ratelimit = balanced_dirty_ratelimit;
+out:
+	*pdirty_ratelimit = max(dirty_ratelimit, 1UL);
 
-	trace_bdi_dirty_ratelimit(bdi, dirty_rate, task_ratelimit);
+	trace_bdi_dirty_ratelimit(bdi, write_bw, dirty_rate, dirty_ratelimit,
+				  task_ratelimit, balanced_dirty_ratelimit,
+				  blkcg_id);
 }
 
 void __bdi_update_bandwidth(struct backing_dev_info *bdi,
@@ -985,12 +979,14 @@ void __bdi_update_bandwidth(struct backing_dev_info *bdi,
 			    unsigned long dirty,
 			    unsigned long bdi_thresh,
 			    unsigned long bdi_dirty,
+			    unsigned long pos_ratio,
 			    unsigned long start_time)
 {
 	unsigned long now = jiffies;
 	unsigned long elapsed = now - bdi->bw_time_stamp;
 	unsigned long dirtied;
 	unsigned long written;
+	unsigned long dirty_rate;
 
 	/*
 	 * rate-limit, only update once every 200ms.
@@ -1010,9 +1006,18 @@ void __bdi_update_bandwidth(struct backing_dev_info *bdi,
 
 	if (thresh) {
 		global_update_bandwidth(thresh, dirty, now);
-		bdi_update_dirty_ratelimit(bdi, thresh, bg_thresh, dirty,
-					   bdi_thresh, bdi_dirty,
-					   dirtied, elapsed);
+		/*
+		 * The dirty rate will match the writeout rate in long term,
+		 * except when dirty pages are truncated by userspace or
+		 * re-dirtied by FS.
+		 */
+		dirty_rate = (dirtied - bdi->dirtied_stamp) * HZ / elapsed;
+		bdi_update_dirty_ratelimit(0, bdi,
+					   &bdi->dirty_ratelimit,
+					   pos_ratio,
+					   bdi->avg_write_bandwidth,
+					   thresh, bg_thresh, dirty,
+					   dirty_rate);
 	}
 	bdi_update_write_bandwidth(bdi, elapsed, written);
 
@@ -1028,13 +1033,14 @@ static void bdi_update_bandwidth(struct backing_dev_info *bdi,
 				 unsigned long dirty,
 				 unsigned long bdi_thresh,
 				 unsigned long bdi_dirty,
+				 unsigned long pos_ratio,
 				 unsigned long start_time)
 {
 	if (time_is_after_eq_jiffies(bdi->bw_time_stamp + BANDWIDTH_INTERVAL))
 		return;
 	spin_lock(&bdi->wb.list_lock);
 	__bdi_update_bandwidth(bdi, thresh, bg_thresh, dirty,
-			       bdi_thresh, bdi_dirty, start_time);
+			       bdi_thresh, bdi_dirty, pos_ratio, start_time);
 	spin_unlock(&bdi->wb.list_lock);
 }
 
@@ -1149,6 +1155,51 @@ static long bdi_min_pause(struct backing_dev_info *bdi,
 	return pages >= DIRTY_POLL_THRESH ? 1 + t / 2 : t;
 }
 
+static void blkcg_update_bandwidth(struct blkio_cgroup *blkcg,
+				   struct backing_dev_info *bdi,
+				   unsigned long pos_ratio)
+{
+#ifdef CONFIG_BLK_CGROUP
+	unsigned long now = jiffies;
+	unsigned long dirtied;
+	unsigned long elapsed;
+	unsigned long dirty_rate;
+	unsigned long bps = blkcg_buffered_write_bps(blkcg) >>
+							PAGE_CACHE_SHIFT;
+
+	if (!blkcg)
+		return;
+	if (!spin_trylock(&blkcg->lock))
+		return;
+
+	elapsed = now - blkcg->bw_time_stamp;
+	if (elapsed <= MAX_PAUSE)
+		goto unlock;
+
+	dirtied = percpu_counter_read(&blkcg->nr_dirtied);
+
+	if (elapsed > MAX_PAUSE * 2)
+		goto snapshot;
+
+	if (!bps)
+		bps = (u64)bdi->dirty_ratelimit * blkcg_weight(blkcg) /
+							BLKIO_WEIGHT_DEFAULT;
+	else
+		pos_ratio = 1 << RATELIMIT_CALC_SHIFT;
+
+	dirty_rate = (dirtied - blkcg->dirtied_stamp) * HZ / elapsed;
+	blkcg->dirty_rate = (blkcg->dirty_rate * 7 + dirty_rate) / 8;
+	bdi_update_dirty_ratelimit(1, bdi, &blkcg->dirty_ratelimit,
+				   pos_ratio, bps, 0, 0, 0,
+				   blkcg->dirty_rate);
+snapshot:
+	blkcg->dirtied_stamp = dirtied;
+	blkcg->bw_time_stamp = now;
+unlock:
+	spin_unlock(&blkcg->lock);
+#endif
+}
+
 /*
  * balance_dirty_pages() must be called by processes which are generating dirty
  * data.  It looks at the number of dirty pages in the machine and will force
@@ -1178,6 +1229,7 @@ static void balance_dirty_pages(struct address_space *mapping,
 	unsigned long pos_ratio;
 	struct backing_dev_info *bdi = mapping->backing_dev_info;
 	unsigned long start_time = jiffies;
+	struct blkio_cgroup *blkcg = task_blkio_cgroup(current);
 
 	for (;;) {
 		unsigned long now = jiffies;
@@ -1202,6 +1254,8 @@ static void balance_dirty_pages(struct address_space *mapping,
 		freerun = dirty_freerun_ceiling(dirty_thresh,
 						background_thresh);
 		if (nr_dirty <= freerun) {
+			if (blkcg && blkcg_buffered_write_bps(blkcg))
+				goto always_throttle;
 			current->dirty_paused_when = now;
 			current->nr_dirtied = 0;
 			current->nr_dirtied_pause =
@@ -1212,6 +1266,7 @@ static void balance_dirty_pages(struct address_space *mapping,
 		if (unlikely(!writeback_in_progress(bdi)))
 			bdi_start_background_writeback(bdi);
 
+always_throttle:
 		/*
 		 * bdi_thresh is not treated as some limiting factor as
 		 * dirty_thresh, due to reasons
@@ -1252,16 +1307,30 @@ static void balance_dirty_pages(struct address_space *mapping,
 		if (dirty_exceeded && !bdi->dirty_exceeded)
 			bdi->dirty_exceeded = 1;
 
+		pos_ratio = bdi_position_ratio(bdi, dirty_thresh,
+					       background_thresh, nr_dirty,
+					       bdi_thresh, bdi_dirty);
+
 		bdi_update_bandwidth(bdi, dirty_thresh, background_thresh,
 				     nr_dirty, bdi_thresh, bdi_dirty,
-				     start_time);
+				     pos_ratio, start_time);
 
 		dirty_ratelimit = bdi->dirty_ratelimit;
-		pos_ratio = bdi_position_ratio(bdi, dirty_thresh,
-					       background_thresh, nr_dirty,
-					       bdi_thresh, bdi_dirty);
+		if (blkcg) {
+			blkcg_update_bandwidth(blkcg, bdi, pos_ratio);
+			if (!blkcg_buffered_write_bps(blkcg))
+				dirty_ratelimit = blkcg_dirty_ratelimit(blkcg);
+		}
+
 		task_ratelimit = ((u64)dirty_ratelimit * pos_ratio) >>
 							RATELIMIT_CALC_SHIFT;
+
+		if (blkcg && blkcg_buffered_write_bps(blkcg) &&
+		    task_ratelimit > blkcg_dirty_ratelimit(blkcg)) {
+			task_ratelimit = blkcg_dirty_ratelimit(blkcg);
+			dirty_ratelimit = task_ratelimit;
+		}
+
 		max_pause = bdi_max_pause(bdi, bdi_dirty);
 		min_pause = bdi_min_pause(bdi, max_pause,
 					  task_ratelimit, dirty_ratelimit,
@@ -1933,6 +2002,11 @@ int __set_page_dirty_no_writeback(struct page *page)
 void account_page_dirtied(struct page *page, struct address_space *mapping)
 {
 	if (mapping_cap_account_dirty(mapping)) {
+#ifdef CONFIG_BLK_DEV_THROTTLING
+		struct blkio_cgroup *blkcg = task_blkio_cgroup(current);
+		if (blkcg)
+			__percpu_counter_add(&blkcg->nr_dirtied, 1, BDI_STAT_BATCH);
+#endif
 		__inc_zone_page_state(page, NR_FILE_DIRTY);
 		__inc_zone_page_state(page, NR_DIRTIED);
 		__inc_bdi_stat(mapping->backing_dev_info, BDI_RECLAIMABLE);

[-- Attachment #2: balance_dirty_pages-task-bw.png --]
[-- Type: image/png, Size: 47920 bytes --]

[-- Attachment #3: balance_dirty_pages-task-bw.png --]
[-- Type: image/png, Size: 61408 bytes --]

[-- Attachment #4: balance_dirty_pages-task-bw.png --]
[-- Type: image/png, Size: 58603 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/6] buffered write IO controller in balance_dirty_pages()
  2012-04-03  8:00   ` Fengguang Wu
@ 2012-04-03 14:53     ` Vivek Goyal
  2012-04-03 23:32       ` Fengguang Wu
  0 siblings, 1 reply; 18+ messages in thread
From: Vivek Goyal @ 2012-04-03 14:53 UTC (permalink / raw)
  To: Fengguang Wu
  Cc: Linux Memory Management List, Suresh Jayaraman, Andrea Righi,
	Jeff Moyer, linux-fsdevel, LKML, Tejun Heo, Jan Kara,
	KAMEZAWA Hiroyuki, Linux NFS Mailing List, Jens Axboe

On Tue, Apr 03, 2012 at 01:00:14AM -0700, Fengguang Wu wrote:

[CC Jens]

[..]
> > I think blkio.weight can be thought of a system wide weight of a cgroup
> > and more than one entity/subsystem should be able to make use of it and
> > differentiate between IO in its own way. CFQ can decide to do proportional
> > time division, and buffered write controller should be able to use the
> > same weight and do write bandwidth differentiation. I think it is better
> > than introducing another buffered write controller tunable for weight.
> > 
> > Personally, I am not too worried about this point. We can document and
> > explain it well.
> 
> Agreed. The throttling may work in *either* bps, IOPS or disk time
> modes. In each mode blkio.weight is naturally tied to the
> corresponding IO metrics.

Well, Tejun does not like the idea of sharing config variables among
different policies. So I guess you will have to come up with your
own configuration variables as desired. As each policy will have its
own configuration and stats, prefixing the variable/stat name with the
policy name will help identify it. Not sure what's a good name for
the buffered write policy.

Maybe:

blkio.dirty.weight
blkio.dirty.bps
blkio.buffered_write.* or
blkio.buf_write* or
blkio.dirty_rate.* or

[..]
> 
> Patch 6/6 shows simple test results for bps based throttling.
> 
> Since then I've improved the patches to work in a more "contained" way
> when blkio.throttle.buffered_write_bps is not set.
> 
> The old behavior is, if blkcg A contains 1 dd and blkcg B contains 10
> dd tasks and they have equal weight, B will get 10 times bandwidth
> than A.
> 
> With the below updated core bits, A and B will get equal share of
> write bandwidth. The basic idea is to use

Yes, this new behavior makes more sense. Two equal-weight groups get
equal bandwidth irrespective of the number of tasks in each cgroup.

[..]
> Test results are "pretty good looking" :-) The attached graphs
> illustrates nice attributes of accuracy, fairness and smoothness
> for the following tests.

Indeed. These results are pretty cool. It is hard to believe that the lines
are so smooth, and the lines for two of the tasks overlap each other so
closely that it is not obvious at first that they are overlapping and
dirtying equal amounts of memory. I had to take a second look to figure
that out.

Just the results for the third graph (weights 500 and 1000 respectively) are
not perfect. Ideally all 3 tasks should have dirtied the same amount of
memory. But I think achieving perfection here might not be easy, and maybe
not many people will care.

Given that you are doing a reasonable job of providing service
differentiation between buffered writers, I am wondering if you should
look at the ioprio of writers within a cgroup and provide service
differentiation among those too. CFQ has separate queues, but it loses
the context information by the time the IO is submitted, so you might be
able to do a much better job. Anyway, this is a possible future
enhancement and not necessarily related to this patchset.

Also, we are controlling the rate of dirtying memory. I am again
wondering whether these configuration knobs should be part of the memory
controller and not the block controller. Think of the NFS case: there is no
block device or block layer involved, but we would still control the rate of
dirtying memory. So some control in the memory controller might make
sense, and the following kinds of knobs might make sense there.

memcg.dirty_weight or memcg.dirty.weight
memcg.dirty_bps or memcg.dirty.write_bps

It's just that we control not the *absolute amount* of memory but the *rate*
of writing to memory, and I think that makes it somewhat confusing and
gives the impression that it should be part of the block IO controller.

I am kind of split on this (leaning slightly towards the memory
controller), so I am raising the question so that others can weigh in with
their thoughts on what makes more sense here.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/6] buffered write IO controller in balance_dirty_pages()
  2012-04-03 14:53     ` Vivek Goyal
@ 2012-04-03 23:32       ` Fengguang Wu
  0 siblings, 0 replies; 18+ messages in thread
From: Fengguang Wu @ 2012-04-03 23:32 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Linux Memory Management List, Suresh Jayaraman, Andrea Righi,
	Jeff Moyer, linux-fsdevel, LKML, Tejun Heo, Jan Kara,
	KAMEZAWA Hiroyuki, Linux NFS Mailing List, Jens Axboe

On Tue, Apr 03, 2012 at 10:53:01AM -0400, Vivek Goyal wrote:
> On Tue, Apr 03, 2012 at 01:00:14AM -0700, Fengguang Wu wrote:
> 
> [CC Jens]
> 
> [..]
> > > I think blkio.weight can be thought of a system wide weight of a cgroup
> > > and more than one entity/subsystem should be able to make use of it and
> > > differentiate between IO in its own way. CFQ can decide to do proportional
> > > time division, and buffered write controller should be able to use the
> > > same weight and do write bandwidth differentiation. I think it is better
> > > than introducing another buffered write controller tunable for weight.
> > > 
> > > Personally, I am not too worried about this point. We can document and
> > > explain it well.
> > 
> > Agreed. The throttling may work in *either* bps, IOPS or disk time
> > modes. In each mode blkio.weight is naturally tied to the
> > corresponding IO metrics.
> 
> Well, Tejun does not like the idea of sharing config variables among
> different policies. So I guess you shall have to come up with your
> own configurations variables as desired. As each policy will have its
> own configuration and stats, prefixing the vairable/stat name with
> policy name will help identify it. Not sure what's a good name for
> buffered write policy.
> 
> May be
> 
> blkio.dirty.weight
> blkio.dirty.bps
> blkio.buffered_write.* or
> blkio.buf_write* or
> blkio.dirty_rate.* or

OK. dirty.* or buffered_write.*, whatever looks more user friendly will be fine.

> [..]
> > 
> > Patch 6/6 shows simple test results for bps based throttling.
> > 
> > Since then I've improved the patches to work in a more "contained" way
> > when blkio.throttle.buffered_write_bps is not set.
> > 
> > The old behavior is, if blkcg A contains 1 dd and blkcg B contains 10
> > dd tasks and they have equal weight, B will get 10 times bandwidth
> > than A.
> > 
> > With the below updated core bits, A and B will get equal share of
> > write bandwidth. The basic idea is to use
> 
> Yes, this new behavior makes more sense. Two equal weight groups get
> equal bandwidth irrpesctive of number of tasks in cgroup.

Yeah, Andrew Morton reminded me of this during the writeback talk at
Google :) Fortunately the current dirty throttling algorithm can
handle it easily. What's more, hierarchical cgroups can be supported
by simply using the parent's blkcg->dirty_ratelimit as the throttling
bps for the child.

> [..]
> > Test results are "pretty good looking" :-) The attached graphs
> > illustrates nice attributes of accuracy, fairness and smoothness
> > for the following tests.
> 
> Indeed. These results are pretty cool. It is hard to belive that lines
> are so smooth and lines for two tasks are overlapping each other such 
> that it is not obivious initially that they are overlapping and dirtying
> equal amount of memory. I had to take a second look to figure that out.

Thanks for noticing this! :)

> Just that results for third graph (weight 500 and 1000 respectively) are
> not perfect. I think Ideally all the 3 tasks should have dirtied same
> amount of memory.

Yeah, but note that it's not the fault of the throttling algorithm.

The unfairness is created in the very first ~0.1s, when dirty
pages are far under the dirty limits and the dd tasks are not
throttled at all. Since the first task manages to start 0.1s earlier
than the other two tasks, it dirties at full (memory write)
speed, which creates the gap.

Once the dirty throttling mechanism comes into play, you can see that
the lines for the three tasks grow fairly at the same speed/slope.

> But I think achieving perfection here might not be easy and may be
> not many people will care.

The formula itself looks simple; however, it does require some
debugging/tuning effort to make it behave well in various
situations.

> Given the fact that you are doing a reasonable job of providing service
> differentiation between buffered writers, I am wondering if you should
> look at the ioprio of writers with-in cgroup and provide service
> differentiation among those too. CFQ has separate queues but it loses
> the context information by the time IO is submitted. So you might be
> able to do a much better job. Anyway, this is a possible future
> enhancement and not necessarily related to this patchset.

Good point. It seems applicable to the general dirty throttling
(not relying on cgroups). It would mainly be a question of how to map
the priority classes/values to each task's throttling weight (or bps).
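
Just to illustrate the idea (this is not part of the patches, and the
mapping below is arbitrary), one could translate a task's ioprio into a
dirty-throttling weight along these lines, using the IOPRIO_* helpers from
linux/ioprio.h:

        static unsigned int ioprio_to_dirty_weight(unsigned short ioprio)
        {
                int class = IOPRIO_PRIO_CLASS(ioprio);
                int level = IOPRIO_PRIO_DATA(ioprio);   /* 0 (high) .. 7 (low) */

                switch (class) {
                case IOPRIO_CLASS_RT:
                        return BLKIO_WEIGHT_DEFAULT * 2;
                case IOPRIO_CLASS_IDLE:
                        return BLKIO_WEIGHT_DEFAULT / 8;
                case IOPRIO_CLASS_BE:
                default:
                        /* BE level 4 maps to the default weight */
                        return BLKIO_WEIGHT_DEFAULT * (8 - level) / 4;
                }
        }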

> Also, we are controlling the rate of dirtying the memory. I am again 
> wondering whether these configuration knobs should be part of memory
> controller and not block controller. Think of NFS case. There is no
> block device or block layer involved but we will control the rate of
> dirtying memory. So some control in memory controller might make
> sense. And following kind of knobs might make sense there.
> 
> memcg.dirty_weight or memcg.dirty.weight
> memcg.dirty_bps or memcg.dirty.write_bps
> 
> Just that we control not the *absolute amount* of memory but *rate* of
> writing to memory and I think that makes it somewhat confusing and
> gives the impression that it should be part of block IO controller.

There is the future prospect of a "buffered+direct write bps" interface.
Considering this, I'm a little inclined towards the blkio.* interfaces,
despite the fact that they are currently tightly tied to the block layer :)

> I am kind of split on this (rather little inclined towards memory
> controller), so I am raising the question and others can weigh in with
> their thoughts on what makes more sense here.

Yeah, we definitely need more input on the "interface" stuff.

Thanks,
Fengguang

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2012-04-03 23:37 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-03-28 12:13 [PATCH 0/6] buffered write IO controller in balance_dirty_pages() Fengguang Wu
2012-03-28 12:13 ` [PATCH 1/6] blk-cgroup: move blk-cgroup.h in include/linux/blk-cgroup.h Fengguang Wu
2012-03-28 12:13 ` [PATCH 2/6] blk-cgroup: account dirtied pages Fengguang Wu
2012-03-28 12:13 ` [PATCH 3/6] blk-cgroup: buffered write IO controller - bandwidth weight Fengguang Wu
2012-03-28 12:13 ` [PATCH 4/6] blk-cgroup: buffered write IO controller - bandwidth limit Fengguang Wu
2012-03-28 12:13 ` [PATCH 5/6] blk-cgroup: buffered write IO controller - bandwidth limit interface Fengguang Wu
2012-03-28 12:13 ` [PATCH 6/6] blk-cgroup: buffered write IO controller - debug trace Fengguang Wu
2012-03-28 21:10 ` [PATCH 0/6] buffered write IO controller in balance_dirty_pages() Vivek Goyal
2012-03-28 22:35   ` Fengguang Wu
2012-03-29  2:48   ` Suresh Jayaraman
2012-03-29  0:34 ` KAMEZAWA Hiroyuki
2012-03-29  1:22   ` Fengguang Wu
2012-04-01  4:16 ` Suresh Jayaraman
2012-04-01  8:30   ` Fengguang Wu
2012-04-01 20:56 ` Vivek Goyal
2012-04-03  8:00   ` Fengguang Wu
2012-04-03 14:53     ` Vivek Goyal
2012-04-03 23:32       ` Fengguang Wu
