* [patch v2 1/8]block: fiops ioscheduler core
2012-01-30 7:02 [patch v2 0/8]block: An IOPS based ioscheduler Shaohua Li
From: Shaohua Li @ 2012-01-30 7:02 UTC (permalink / raw)
To: axboe
Cc: linux-kernel, vgoyal, david, jack, zhu.yanhai, namhyung.kim, shaohua.li
[-- Attachment #1: fiops-iosch.patch --]
[-- Type: text/plain, Size: 16453 bytes --]
FIOPS (Fair IOPS) is an IOPS-based I/O scheduler, so it only targets
drives without an I/O seek penalty. It is quite similar to CFQ, but the
dispatch decision is made according to IOPS instead of time slices.
The algorithm is simple. The scheduler keeps a service tree, and each task
lives in the tree. The key into the tree is called vios (virtual I/O). Every
request has a vios, which is calculated according to its ioprio, request size
and so on. A task's vios is the sum of the vios of all requests it dispatches.
FIOPS always selects the task with the minimum vios in the service tree and
lets that task dispatch a request. The dispatched request's vios is then added
to the task's vios and the task is repositioned in the service tree.
Unlike CFQ, FIOPS doesn't have separate sync/async queues, because with
IO-less writeback a task usually dispatches either sync or async requests,
not both. Biasing read or write requests can still be done with the
read/write scale.
One issue: if the workload iodepth is lower than the drive's queue_depth,
the IOPS share of a task might not strictly follow its priority, request
size and so on. In that case the drive is actually idle. Solving the
problem would require idling the drive, which hurts performance. I believe
CFQ isn't completely fair between tasks in such a case either.
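The accounting described above can be sketched in plain C. This is a
simplified user-space model, not the kernel code: `struct task_vios`,
`select_min_vios` and `charge` are illustrative names, and the patch keeps
tasks in an rbtree keyed by vios rather than scanning an array.

```c
#include <stdint.h>

#define VIOS_SCALE_SHIFT 10
#define VIOS_SCALE (1 << VIOS_SCALE_SHIFT)

/* Simplified per-task state: vios accumulated over dispatched requests. */
struct task_vios {
	uint64_t vios;
};

/* Pick the task with minimum vios (the service tree's leftmost node in
 * the real scheduler). */
int select_min_vios(const struct task_vios *tasks, int n)
{
	int i, min = 0;

	for (i = 1; i < n; i++)
		if (tasks[i].vios < tasks[min].vios)
			min = i;
	return min;
}

/* Charge a dispatched request's cost to the task; with an array this is
 * just the updated key, while the patch re-inserts the rbtree node. */
void charge(struct task_vios *t, uint64_t cost)
{
	t->vios += cost;
}
```

After a task dispatches, its vios grows and another task becomes the
minimum, which is what makes the dispatch order approximate fair IOPS.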
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
---
block/Kconfig.iosched | 12 +
block/Makefile | 1
block/fiops-iosched.c | 556 ++++++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 569 insertions(+)
Index: linux/block/Makefile
===================================================================
--- linux.orig/block/Makefile 2012-01-30 13:48:43.000000000 +0800
+++ linux/block/Makefile 2012-01-30 14:34:27.000000000 +0800
@@ -15,6 +15,7 @@ obj-$(CONFIG_BLK_DEV_THROTTLING) += blk-
obj-$(CONFIG_IOSCHED_NOOP) += noop-iosched.o
obj-$(CONFIG_IOSCHED_DEADLINE) += deadline-iosched.o
obj-$(CONFIG_IOSCHED_CFQ) += cfq-iosched.o
+obj-$(CONFIG_IOSCHED_FIOPS) += fiops-iosched.o
obj-$(CONFIG_BLOCK_COMPAT) += compat_ioctl.o
obj-$(CONFIG_BLK_DEV_INTEGRITY) += blk-integrity.o
Index: linux/block/fiops-iosched.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux/block/fiops-iosched.c 2012-01-30 14:34:27.000000000 +0800
@@ -0,0 +1,556 @@
+/*
+ * IOPS based IO scheduler. Based on CFQ.
+ * Copyright (C) 2003 Jens Axboe <axboe@kernel.dk>
+ * Shaohua Li <shli@kernel.org>
+ */
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/blkdev.h>
+#include <linux/elevator.h>
+#include <linux/jiffies.h>
+#include <linux/rbtree.h>
+#include <linux/ioprio.h>
+#include "blk.h"
+
+#define VIOS_SCALE_SHIFT 10
+#define VIOS_SCALE (1 << VIOS_SCALE_SHIFT)
+
+struct fiops_rb_root {
+ struct rb_root rb;
+ struct rb_node *left;
+ unsigned count;
+
+ u64 min_vios;
+};
+#define FIOPS_RB_ROOT (struct fiops_rb_root) { .rb = RB_ROOT}
+
+struct fiops_data {
+ struct request_queue *queue;
+
+ struct fiops_rb_root service_tree;
+
+ unsigned int busy_queues;
+ unsigned int in_flight[2];
+
+ struct work_struct unplug_work;
+};
+
+struct fiops_ioc {
+ struct io_cq icq;
+
+ unsigned int flags;
+ struct fiops_data *fiopsd;
+ struct rb_node rb_node;
+ u64 vios; /* key in service_tree */
+ struct fiops_rb_root *service_tree;
+
+ unsigned int in_flight;
+
+ struct rb_root sort_list;
+ struct list_head fifo;
+
+ pid_t pid;
+};
+
+#define ioc_service_tree(ioc) (&((ioc)->fiopsd->service_tree))
+#define RQ_CIC(rq) icq_to_cic((rq)->elv.icq)
+
+enum ioc_state_flags {
+ FIOPS_IOC_FLAG_on_rr = 0, /* on round-robin busy list */
+};
+
+#define FIOPS_IOC_FNS(name) \
+static inline void fiops_mark_ioc_##name(struct fiops_ioc *ioc) \
+{ \
+ ioc->flags |= (1 << FIOPS_IOC_FLAG_##name); \
+} \
+static inline void fiops_clear_ioc_##name(struct fiops_ioc *ioc) \
+{ \
+ ioc->flags &= ~(1 << FIOPS_IOC_FLAG_##name); \
+} \
+static inline int fiops_ioc_##name(const struct fiops_ioc *ioc) \
+{ \
+ return ((ioc)->flags & (1 << FIOPS_IOC_FLAG_##name)) != 0; \
+}
+
+FIOPS_IOC_FNS(on_rr);
+#undef FIOPS_IOC_FNS
+
+static inline struct fiops_ioc *icq_to_cic(struct io_cq *icq)
+{
+ /* cic->icq is the first member, %NULL will convert to %NULL */
+ return container_of(icq, struct fiops_ioc, icq);
+}
+
+static inline struct fiops_ioc *fiops_cic_lookup(struct fiops_data *fiopsd,
+ struct io_context *ioc)
+{
+ if (ioc)
+ return icq_to_cic(ioc_lookup_icq(ioc, fiopsd->queue));
+ return NULL;
+}
+
+/*
+ * The below is leftmost cache rbtree addon
+ */
+static struct fiops_ioc *fiops_rb_first(struct fiops_rb_root *root)
+{
+ /* Service tree is empty */
+ if (!root->count)
+ return NULL;
+
+ if (!root->left)
+ root->left = rb_first(&root->rb);
+
+ if (root->left)
+ return rb_entry(root->left, struct fiops_ioc, rb_node);
+
+ return NULL;
+}
+
+static void rb_erase_init(struct rb_node *n, struct rb_root *root)
+{
+ rb_erase(n, root);
+ RB_CLEAR_NODE(n);
+}
+
+static void fiops_rb_erase(struct rb_node *n, struct fiops_rb_root *root)
+{
+ if (root->left == n)
+ root->left = NULL;
+ rb_erase_init(n, &root->rb);
+ --root->count;
+}
+
+static inline u64 max_vios(u64 min_vios, u64 vios)
+{
+ s64 delta = (s64)(vios - min_vios);
+ if (delta > 0)
+ min_vios = vios;
+
+ return min_vios;
+}
+
+static void fiops_update_min_vios(struct fiops_rb_root *service_tree)
+{
+ struct fiops_ioc *ioc;
+
+ ioc = fiops_rb_first(service_tree);
+ if (!ioc)
+ return;
+ service_tree->min_vios = max_vios(service_tree->min_vios, ioc->vios);
+}
+
+/*
+ * The fiopsd->service_trees holds all pending fiops_ioc's that have
+ * requests waiting to be processed. It is sorted in the order that
+ * we will service the queues.
+ */
+static void fiops_service_tree_add(struct fiops_data *fiopsd,
+ struct fiops_ioc *ioc)
+{
+ struct rb_node **p, *parent;
+ struct fiops_ioc *__ioc;
+ struct fiops_rb_root *service_tree = ioc_service_tree(ioc);
+ u64 vios;
+ int left;
+
+ /* New added IOC */
+ if (RB_EMPTY_NODE(&ioc->rb_node))
+ vios = max_vios(service_tree->min_vios, ioc->vios);
+ else {
+ vios = ioc->vios;
+ /* ioc->service_tree might not equal to service_tree */
+ fiops_rb_erase(&ioc->rb_node, ioc->service_tree);
+ ioc->service_tree = NULL;
+ }
+
+ left = 1;
+ parent = NULL;
+ ioc->service_tree = service_tree;
+ p = &service_tree->rb.rb_node;
+ while (*p) {
+ struct rb_node **n;
+
+ parent = *p;
+ __ioc = rb_entry(parent, struct fiops_ioc, rb_node);
+
+ /*
+ * sort by key, that represents service time.
+ */
+ if (vios < __ioc->vios)
+ n = &(*p)->rb_left;
+ else {
+ n = &(*p)->rb_right;
+ left = 0;
+ }
+
+ p = n;
+ }
+
+ if (left)
+ service_tree->left = &ioc->rb_node;
+
+ ioc->vios = vios;
+ rb_link_node(&ioc->rb_node, parent, p);
+ rb_insert_color(&ioc->rb_node, &service_tree->rb);
+ service_tree->count++;
+
+ fiops_update_min_vios(service_tree);
+}
+
+/*
+ * Update ioc's position in the service tree.
+ */
+static void fiops_resort_rr_list(struct fiops_data *fiopsd,
+ struct fiops_ioc *ioc)
+{
+ /*
+ * Resorting requires the ioc to be on the RR list already.
+ */
+ if (fiops_ioc_on_rr(ioc))
+ fiops_service_tree_add(fiopsd, ioc);
+}
+
+/*
+ * add to busy list of queues for service, trying to be fair in ordering
+ * the pending list according to last request service
+ */
+static void fiops_add_ioc_rr(struct fiops_data *fiopsd, struct fiops_ioc *ioc)
+{
+ BUG_ON(fiops_ioc_on_rr(ioc));
+ fiops_mark_ioc_on_rr(ioc);
+
+ fiopsd->busy_queues++;
+
+ fiops_resort_rr_list(fiopsd, ioc);
+}
+
+/*
+ * Called when the ioc no longer has requests pending, remove it from
+ * the service tree.
+ */
+static void fiops_del_ioc_rr(struct fiops_data *fiopsd, struct fiops_ioc *ioc)
+{
+ BUG_ON(!fiops_ioc_on_rr(ioc));
+ fiops_clear_ioc_on_rr(ioc);
+
+ if (!RB_EMPTY_NODE(&ioc->rb_node)) {
+ fiops_rb_erase(&ioc->rb_node, ioc->service_tree);
+ ioc->service_tree = NULL;
+ }
+
+ BUG_ON(!fiopsd->busy_queues);
+ fiopsd->busy_queues--;
+}
+
+/*
+ * rb tree support functions
+ */
+static void fiops_del_rq_rb(struct request *rq)
+{
+ struct fiops_ioc *ioc = RQ_CIC(rq);
+
+ elv_rb_del(&ioc->sort_list, rq);
+}
+
+static void fiops_add_rq_rb(struct request *rq)
+{
+ struct fiops_ioc *ioc = RQ_CIC(rq);
+ struct fiops_data *fiopsd = ioc->fiopsd;
+
+ elv_rb_add(&ioc->sort_list, rq);
+
+ if (!fiops_ioc_on_rr(ioc))
+ fiops_add_ioc_rr(fiopsd, ioc);
+}
+
+static void fiops_reposition_rq_rb(struct fiops_ioc *ioc, struct request *rq)
+{
+ elv_rb_del(&ioc->sort_list, rq);
+ fiops_add_rq_rb(rq);
+}
+
+static void fiops_remove_request(struct request *rq)
+{
+ list_del_init(&rq->queuelist);
+ fiops_del_rq_rb(rq);
+}
+
+static u64 fiops_scaled_vios(struct fiops_data *fiopsd,
+ struct fiops_ioc *ioc, struct request *rq)
+{
+ return VIOS_SCALE;
+}
+
+/* return vios dispatched */
+static u64 fiops_dispatch_request(struct fiops_data *fiopsd,
+ struct fiops_ioc *ioc)
+{
+ struct request *rq;
+ struct request_queue *q = fiopsd->queue;
+
+ rq = rq_entry_fifo(ioc->fifo.next);
+
+ fiops_remove_request(rq);
+ elv_dispatch_add_tail(q, rq);
+
+ fiopsd->in_flight[rq_is_sync(rq)]++;
+ ioc->in_flight++;
+
+ return fiops_scaled_vios(fiopsd, ioc, rq);
+}
+
+static int fiops_forced_dispatch(struct fiops_data *fiopsd)
+{
+ struct fiops_ioc *ioc;
+ int dispatched = 0;
+
+ while ((ioc = fiops_rb_first(&fiopsd->service_tree)) != NULL) {
+ while (!list_empty(&ioc->fifo)) {
+ fiops_dispatch_request(fiopsd, ioc);
+ dispatched++;
+ }
+ if (fiops_ioc_on_rr(ioc))
+ fiops_del_ioc_rr(fiopsd, ioc);
+ }
+ return dispatched;
+}
+
+static struct fiops_ioc *fiops_select_ioc(struct fiops_data *fiopsd)
+{
+ struct fiops_ioc *ioc;
+
+ if (RB_EMPTY_ROOT(&fiopsd->service_tree.rb))
+ return NULL;
+ ioc = fiops_rb_first(&fiopsd->service_tree);
+ return ioc;
+}
+
+static void fiops_charge_vios(struct fiops_data *fiopsd,
+ struct fiops_ioc *ioc, u64 vios)
+{
+ struct fiops_rb_root *service_tree = ioc->service_tree;
+ ioc->vios += vios;
+
+ if (RB_EMPTY_ROOT(&ioc->sort_list))
+ fiops_del_ioc_rr(fiopsd, ioc);
+ else
+ fiops_resort_rr_list(fiopsd, ioc);
+
+ fiops_update_min_vios(service_tree);
+}
+
+static int fiops_dispatch_requests(struct request_queue *q, int force)
+{
+ struct fiops_data *fiopsd = q->elevator->elevator_data;
+ struct fiops_ioc *ioc;
+ u64 vios;
+
+ if (unlikely(force))
+ return fiops_forced_dispatch(fiopsd);
+
+ ioc = fiops_select_ioc(fiopsd);
+ if (!ioc)
+ return 0;
+
+ vios = fiops_dispatch_request(fiopsd, ioc);
+
+ fiops_charge_vios(fiopsd, ioc, vios);
+ return 1;
+}
+
+static void fiops_insert_request(struct request_queue *q, struct request *rq)
+{
+ struct fiops_ioc *ioc = RQ_CIC(rq);
+
+ list_add_tail(&rq->queuelist, &ioc->fifo);
+
+ fiops_add_rq_rb(rq);
+}
+
+/*
+ * scheduler run of queue, if there are requests pending and no one in the
+ * driver that will restart queueing
+ */
+static inline void fiops_schedule_dispatch(struct fiops_data *fiopsd)
+{
+ if (fiopsd->busy_queues)
+ kblockd_schedule_work(fiopsd->queue, &fiopsd->unplug_work);
+}
+
+static void fiops_completed_request(struct request_queue *q, struct request *rq)
+{
+ struct fiops_data *fiopsd = q->elevator->elevator_data;
+ struct fiops_ioc *ioc = RQ_CIC(rq);
+
+ fiopsd->in_flight[rq_is_sync(rq)]--;
+ ioc->in_flight--;
+
+ if (fiopsd->in_flight[0] + fiopsd->in_flight[1] == 0)
+ fiops_schedule_dispatch(fiopsd);
+}
+
+static struct request *
+fiops_find_rq_fmerge(struct fiops_data *fiopsd, struct bio *bio)
+{
+ struct task_struct *tsk = current;
+ struct fiops_ioc *cic;
+
+ cic = fiops_cic_lookup(fiopsd, tsk->io_context);
+
+ if (cic) {
+ sector_t sector = bio->bi_sector + bio_sectors(bio);
+
+ return elv_rb_find(&cic->sort_list, sector);
+ }
+
+ return NULL;
+}
+
+static int fiops_merge(struct request_queue *q, struct request **req,
+ struct bio *bio)
+{
+ struct fiops_data *fiopsd = q->elevator->elevator_data;
+ struct request *__rq;
+
+ __rq = fiops_find_rq_fmerge(fiopsd, bio);
+ if (__rq && elv_rq_merge_ok(__rq, bio)) {
+ *req = __rq;
+ return ELEVATOR_FRONT_MERGE;
+ }
+
+ return ELEVATOR_NO_MERGE;
+}
+
+static void fiops_merged_request(struct request_queue *q, struct request *req,
+ int type)
+{
+ if (type == ELEVATOR_FRONT_MERGE) {
+ struct fiops_ioc *ioc = RQ_CIC(req);
+
+ fiops_reposition_rq_rb(ioc, req);
+ }
+}
+
+static void
+fiops_merged_requests(struct request_queue *q, struct request *rq,
+ struct request *next)
+{
+ struct fiops_ioc *ioc = RQ_CIC(rq);
+ struct fiops_data *fiopsd = q->elevator->elevator_data;
+
+ fiops_remove_request(next);
+
+ ioc = RQ_CIC(next);
+ /*
+ * all requests of this task are merged to other tasks, delete it
+ * from the service tree.
+ */
+ if (fiops_ioc_on_rr(ioc) && RB_EMPTY_ROOT(&ioc->sort_list))
+ fiops_del_ioc_rr(fiopsd, ioc);
+}
+
+static int fiops_allow_merge(struct request_queue *q, struct request *rq,
+ struct bio *bio)
+{
+ struct fiops_data *fiopsd = q->elevator->elevator_data;
+ struct fiops_ioc *cic;
+
+ /*
+ * Lookup the ioc that this bio will be queued with. Allow
+ * merge only if rq is queued there.
+ */
+ cic = fiops_cic_lookup(fiopsd, current->io_context);
+
+ return cic == RQ_CIC(rq);
+}
+
+static void fiops_exit_queue(struct elevator_queue *e)
+{
+ struct fiops_data *fiopsd = e->elevator_data;
+
+ cancel_work_sync(&fiopsd->unplug_work);
+
+ kfree(fiopsd);
+}
+
+static void fiops_kick_queue(struct work_struct *work)
+{
+ struct fiops_data *fiopsd =
+ container_of(work, struct fiops_data, unplug_work);
+ struct request_queue *q = fiopsd->queue;
+
+ spin_lock_irq(q->queue_lock);
+ __blk_run_queue(q);
+ spin_unlock_irq(q->queue_lock);
+}
+
+static void *fiops_init_queue(struct request_queue *q)
+{
+ struct fiops_data *fiopsd;
+
+ fiopsd = kzalloc_node(sizeof(*fiopsd), GFP_KERNEL, q->node);
+ if (!fiopsd)
+ return NULL;
+
+ fiopsd->queue = q;
+
+ fiopsd->service_tree = FIOPS_RB_ROOT;
+
+ INIT_WORK(&fiopsd->unplug_work, fiops_kick_queue);
+
+ return fiopsd;
+}
+
+static void fiops_init_icq(struct io_cq *icq)
+{
+ struct fiops_data *fiopsd = icq->q->elevator->elevator_data;
+ struct fiops_ioc *ioc = icq_to_cic(icq);
+
+ RB_CLEAR_NODE(&ioc->rb_node);
+ INIT_LIST_HEAD(&ioc->fifo);
+ ioc->sort_list = RB_ROOT;
+
+ ioc->fiopsd = fiopsd;
+
+ ioc->pid = current->pid;
+}
+
+static struct elevator_type iosched_fiops = {
+ .ops = {
+ .elevator_merge_fn = fiops_merge,
+ .elevator_merged_fn = fiops_merged_request,
+ .elevator_merge_req_fn = fiops_merged_requests,
+ .elevator_allow_merge_fn = fiops_allow_merge,
+ .elevator_dispatch_fn = fiops_dispatch_requests,
+ .elevator_add_req_fn = fiops_insert_request,
+ .elevator_completed_req_fn = fiops_completed_request,
+ .elevator_former_req_fn = elv_rb_former_request,
+ .elevator_latter_req_fn = elv_rb_latter_request,
+ .elevator_init_icq_fn = fiops_init_icq,
+ .elevator_init_fn = fiops_init_queue,
+ .elevator_exit_fn = fiops_exit_queue,
+ },
+ .icq_size = sizeof(struct fiops_ioc),
+ .icq_align = __alignof__(struct fiops_ioc),
+ .elevator_name = "fiops",
+ .elevator_owner = THIS_MODULE,
+};
+
+static int __init fiops_init(void)
+{
+ return elv_register(&iosched_fiops);
+}
+
+static void __exit fiops_exit(void)
+{
+ elv_unregister(&iosched_fiops);
+}
+
+module_init(fiops_init);
+module_exit(fiops_exit);
+
+MODULE_AUTHOR("Jens Axboe, Shaohua Li <shli@kernel.org>");
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("IOPS based IO scheduler");
Index: linux/block/Kconfig.iosched
===================================================================
--- linux.orig/block/Kconfig.iosched 2012-01-30 13:48:43.000000000 +0800
+++ linux/block/Kconfig.iosched 2012-01-30 14:34:27.000000000 +0800
@@ -43,6 +43,14 @@ config CFQ_GROUP_IOSCHED
---help---
Enable group IO scheduling in CFQ.
+config IOSCHED_FIOPS
+ tristate "IOPS based I/O scheduler"
+ default y
+ ---help---
+ This is an IOPS based I/O scheduler. It will try to distribute
+ IOPS equally among all processes in the system. It's mainly for
+ Flash based storage.
+
choice
prompt "Default I/O scheduler"
default DEFAULT_CFQ
@@ -56,6 +64,9 @@ choice
config DEFAULT_CFQ
bool "CFQ" if IOSCHED_CFQ=y
+ config DEFAULT_FIOPS
+ bool "FIOPS" if IOSCHED_FIOPS=y
+
config DEFAULT_NOOP
bool "No-op"
@@ -65,6 +76,7 @@ config DEFAULT_IOSCHED
string
default "deadline" if DEFAULT_DEADLINE
default "cfq" if DEFAULT_CFQ
+ default "fiops" if DEFAULT_FIOPS
default "noop" if DEFAULT_NOOP
endmenu
* [patch v2 2/8]block: fiops read/write request scale
From: Shaohua Li @ 2012-01-30 7:02 UTC (permalink / raw)
To: axboe
Cc: linux-kernel, vgoyal, david, jack, zhu.yanhai, namhyung.kim, shaohua.li
[-- Attachment #1: fiops-rw-scale.patch --]
[-- Type: text/plain, Size: 3721 bytes --]
The read and write speeds of flash-based storage usually differ. For
example, on my SSD the maximum throughput of reads is about 3 times that
of writes. Add a scale to differentiate reads and writes, and add tunables
so the user can assign a different scale for each.
By default the scale is 1:1, which makes it a no-op.
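The scaling rule can be checked with a small user-space model of the
patch's `fiops_scaled_vios()` (illustrative only; the kernel version reads
the scales out of `struct fiops_data`):

```c
#include <stdint.h>

#define VIOS_SCALE (1 << 10)

/* Cost of one request after the read/write scale: a write is charged
 * write_scale/read_scale times the base cost, mirroring the patch. */
uint64_t scaled_vios(int is_write, unsigned int read_scale,
		     unsigned int write_scale)
{
	uint64_t vios = VIOS_SCALE;

	if (is_write)
		vios = vios * write_scale / read_scale;
	return vios;
}
```

With the default 1:1 scale every request costs VIOS_SCALE; with
read_scale=1 and write_scale=3 a write costs as much as three reads,
matching a drive whose read throughput is about 3x its write throughput.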
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
---
block/fiops-iosched.c | 71 +++++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 70 insertions(+), 1 deletion(-)
Index: linux/block/fiops-iosched.c
===================================================================
--- linux.orig/block/fiops-iosched.c 2012-01-18 14:33:32.000000000 +0800
+++ linux/block/fiops-iosched.c 2012-01-18 14:33:51.000000000 +0800
@@ -15,6 +15,9 @@
#define VIOS_SCALE_SHIFT 10
#define VIOS_SCALE (1 << VIOS_SCALE_SHIFT)
+#define VIOS_READ_SCALE (1)
+#define VIOS_WRITE_SCALE (1)
+
struct fiops_rb_root {
struct rb_root rb;
struct rb_node *left;
@@ -33,6 +36,9 @@ struct fiops_data {
unsigned int in_flight[2];
struct work_struct unplug_work;
+
+ unsigned int read_scale;
+ unsigned int write_scale;
};
struct fiops_ioc {
@@ -280,7 +286,12 @@ static void fiops_remove_request(struct
static u64 fiops_scaled_vios(struct fiops_data *fiopsd,
struct fiops_ioc *ioc, struct request *rq)
{
- return VIOS_SCALE;
+ int vios = VIOS_SCALE;
+
+ if (rq_data_dir(rq) == WRITE)
+ vios = vios * fiopsd->write_scale / fiopsd->read_scale;
+
+ return vios;
}
/* return vios dispatched */
@@ -500,6 +511,9 @@ static void *fiops_init_queue(struct req
INIT_WORK(&fiopsd->unplug_work, fiops_kick_queue);
+ fiopsd->read_scale = VIOS_READ_SCALE;
+ fiopsd->write_scale = VIOS_WRITE_SCALE;
+
return fiopsd;
}
@@ -517,6 +531,60 @@ static void fiops_init_icq(struct io_cq
ioc->pid = current->pid;
}
+/*
+ * sysfs parts below -->
+ */
+static ssize_t
+fiops_var_show(unsigned int var, char *page)
+{
+ return sprintf(page, "%d\n", var);
+}
+
+static ssize_t
+fiops_var_store(unsigned int *var, const char *page, size_t count)
+{
+ char *p = (char *) page;
+
+ *var = simple_strtoul(p, &p, 10);
+ return count;
+}
+
+#define SHOW_FUNCTION(__FUNC, __VAR) \
+static ssize_t __FUNC(struct elevator_queue *e, char *page) \
+{ \
+ struct fiops_data *fiopsd = e->elevator_data; \
+ return fiops_var_show(__VAR, (page)); \
+}
+SHOW_FUNCTION(fiops_read_scale_show, fiopsd->read_scale);
+SHOW_FUNCTION(fiops_write_scale_show, fiopsd->write_scale);
+#undef SHOW_FUNCTION
+
+#define STORE_FUNCTION(__FUNC, __PTR, MIN, MAX) \
+static ssize_t __FUNC(struct elevator_queue *e, const char *page, size_t count) \
+{ \
+ struct fiops_data *fiopsd = e->elevator_data; \
+ unsigned int __data; \
+ int ret = fiops_var_store(&__data, (page), count); \
+ if (__data < (MIN)) \
+ __data = (MIN); \
+ else if (__data > (MAX)) \
+ __data = (MAX); \
+ *(__PTR) = __data; \
+ return ret; \
+}
+STORE_FUNCTION(fiops_read_scale_store, &fiopsd->read_scale, 1, 100);
+STORE_FUNCTION(fiops_write_scale_store, &fiopsd->write_scale, 1, 100);
+#undef STORE_FUNCTION
+
+#define FIOPS_ATTR(name) \
+ __ATTR(name, S_IRUGO|S_IWUSR, fiops_##name##_show, fiops_##name##_store)
+
+static struct elv_fs_entry fiops_attrs[] = {
+ FIOPS_ATTR(read_scale),
+ FIOPS_ATTR(write_scale),
+ __ATTR_NULL
+};
+
static struct elevator_type iosched_fiops = {
.ops = {
.elevator_merge_fn = fiops_merge,
@@ -534,6 +602,7 @@ static struct elevator_type iosched_fiop
},
.icq_size = sizeof(struct fiops_ioc),
.icq_align = __alignof__(struct fiops_ioc),
+ .elevator_attrs = fiops_attrs,
.elevator_name = "fiops",
.elevator_owner = THIS_MODULE,
};
* [patch v2 3/8]block: fiops sync/async scale
From: Shaohua Li @ 2012-01-30 7:02 UTC (permalink / raw)
To: axboe
Cc: linux-kernel, vgoyal, david, jack, zhu.yanhai, namhyung.kim, shaohua.li
[-- Attachment #1: fiops-sync-async-scale.patch --]
[-- Type: text/plain, Size: 2611 bytes --]
Give the sync workload 2.5 times the share of the async workload; this
matches CFQ.
Note this is different from the read/write scale. We have 3 types of
requests:
1. read
2. sync write
3. async write
CFQ doesn't differentiate between types 1 and 2, but the request costs of
1 and 2 usually differ for flash-based storage. So we have both sync/async
and read/write scales here.
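With both scales in place, a request's cost is the base cost multiplied by
each applicable ratio. A user-space sketch with the patch's defaults
(read:write 1:1, sync:async 2:5); names are illustrative:

```c
#include <stdint.h>

#define VIOS_SCALE (1 << 10)

/* Apply the read/write and sync/async scales in sequence, as
 * fiops_scaled_vios() does after this patch. */
uint64_t scaled_vios(int is_write, int is_sync,
		     unsigned int read_scale, unsigned int write_scale,
		     unsigned int sync_scale, unsigned int async_scale)
{
	uint64_t vios = VIOS_SCALE;

	if (is_write)
		vios = vios * write_scale / read_scale;
	if (!is_sync)
		vios = vios * async_scale / sync_scale;
	return vios;
}
```

With the defaults, an async write is charged 5/2 of a sync request's cost,
so sync tasks accumulate vios more slowly and get 2.5 times the share.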
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
---
block/fiops-iosched.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)
Index: linux/block/fiops-iosched.c
===================================================================
--- linux.orig/block/fiops-iosched.c 2012-01-18 14:33:51.000000000 +0800
+++ linux/block/fiops-iosched.c 2012-01-18 14:33:59.000000000 +0800
@@ -17,6 +17,8 @@
#define VIOS_READ_SCALE (1)
#define VIOS_WRITE_SCALE (1)
+#define VIOS_SYNC_SCALE (2)
+#define VIOS_ASYNC_SCALE (5)
struct fiops_rb_root {
struct rb_root rb;
@@ -39,6 +41,8 @@ struct fiops_data {
unsigned int read_scale;
unsigned int write_scale;
+ unsigned int sync_scale;
+ unsigned int async_scale;
};
struct fiops_ioc {
@@ -291,6 +295,9 @@ static u64 fiops_scaled_vios(struct fiop
if (rq_data_dir(rq) == WRITE)
vios = vios * fiopsd->write_scale / fiopsd->read_scale;
+ if (!rq_is_sync(rq))
+ vios = vios * fiopsd->async_scale / fiopsd->sync_scale;
+
return vios;
}
@@ -513,6 +520,8 @@ static void *fiops_init_queue(struct req
fiopsd->read_scale = VIOS_READ_SCALE;
fiopsd->write_scale = VIOS_WRITE_SCALE;
+ fiopsd->sync_scale = VIOS_SYNC_SCALE;
+ fiopsd->async_scale = VIOS_ASYNC_SCALE;
return fiopsd;
}
@@ -557,6 +566,8 @@ static ssize_t __FUNC(struct elevator_qu
}
SHOW_FUNCTION(fiops_read_scale_show, fiopsd->read_scale);
SHOW_FUNCTION(fiops_write_scale_show, fiopsd->write_scale);
+SHOW_FUNCTION(fiops_sync_scale_show, fiopsd->sync_scale);
+SHOW_FUNCTION(fiops_async_scale_show, fiopsd->async_scale);
#undef SHOW_FUNCTION
#define STORE_FUNCTION(__FUNC, __PTR, MIN, MAX) \
@@ -574,6 +585,8 @@ static ssize_t __FUNC(struct elevator_qu
}
STORE_FUNCTION(fiops_read_scale_store, &fiopsd->read_scale, 1, 100);
STORE_FUNCTION(fiops_write_scale_store, &fiopsd->write_scale, 1, 100);
+STORE_FUNCTION(fiops_sync_scale_store, &fiopsd->sync_scale, 1, 100);
+STORE_FUNCTION(fiops_async_scale_store, &fiopsd->async_scale, 1, 100);
#undef STORE_FUNCTION
#define FIOPS_ATTR(name) \
@@ -582,6 +595,8 @@ STORE_FUNCTION(fiops_write_scale_store,
static struct elv_fs_entry fiops_attrs[] = {
FIOPS_ATTR(read_scale),
FIOPS_ATTR(write_scale),
+ FIOPS_ATTR(sync_scale),
+ FIOPS_ATTR(async_scale),
__ATTR_NULL
};
* [patch v2 4/8]block: fiops add ioprio support
From: Shaohua Li @ 2012-01-30 7:02 UTC (permalink / raw)
To: axboe
Cc: linux-kernel, vgoyal, david, jack, zhu.yanhai, namhyung.kim, shaohua.li
[-- Attachment #1: fiops-priority-support.patch --]
[-- Type: text/plain, Size: 5562 bytes --]
Add CFQ-like ioprio support. Priority N gets 20% more share than priority
N+1, which matches CFQ.
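The adjustment in the patch is vios += vios * (ioprio - IOPRIO_NORM) /
VIOS_PRIO_SCALE, with IOPRIO_NORM = 4 and VIOS_PRIO_SCALE = 5, so each
priority step changes the charged cost by about 20%. A user-space sketch,
written with explicitly signed arithmetic for clarity:

```c
#include <stdint.h>

#define IOPRIO_NORM	4
#define VIOS_PRIO_SCALE	5

/* Priority adjustment of the charged cost: a numerically lower ioprio
 * (higher priority) is charged about 20% less per step, a higher ioprio
 * about 20% more, so adjacent priorities differ by ~20% in share. */
uint64_t prio_vios(uint64_t vios, int ioprio)
{
	int64_t delta = (int64_t)vios * (ioprio - IOPRIO_NORM) / VIOS_PRIO_SCALE;

	return (uint64_t)((int64_t)vios + delta);
}
```

A task at ioprio 3 is charged less per request than one at ioprio 4, so
it accumulates vios more slowly and is selected more often.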
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
---
block/fiops-iosched.c | 103 ++++++++++++++++++++++++++++++++++++++++++++------
1 file changed, 92 insertions(+), 11 deletions(-)
Index: linux/block/fiops-iosched.c
===================================================================
--- linux.orig/block/fiops-iosched.c 2012-01-18 14:42:18.000000000 +0800
+++ linux/block/fiops-iosched.c 2012-01-18 14:43:51.000000000 +0800
@@ -20,6 +20,8 @@
#define VIOS_SYNC_SCALE (2)
#define VIOS_ASYNC_SCALE (5)
+#define VIOS_PRIO_SCALE (5)
+
struct fiops_rb_root {
struct rb_root rb;
struct rb_node *left;
@@ -29,10 +31,17 @@ struct fiops_rb_root {
};
#define FIOPS_RB_ROOT (struct fiops_rb_root) { .rb = RB_ROOT}
+enum wl_prio_t {
+ IDLE_WORKLOAD = 0,
+ BE_WORKLOAD = 1,
+ RT_WORKLOAD = 2,
+ FIOPS_PRIO_NR,
+};
+
struct fiops_data {
struct request_queue *queue;
- struct fiops_rb_root service_tree;
+ struct fiops_rb_root service_tree[FIOPS_PRIO_NR];
unsigned int busy_queues;
unsigned int in_flight[2];
@@ -60,13 +69,16 @@ struct fiops_ioc {
struct list_head fifo;
pid_t pid;
+ unsigned short ioprio;
+ enum wl_prio_t wl_type;
};
-#define ioc_service_tree(ioc) (&((ioc)->fiopsd->service_tree))
+#define ioc_service_tree(ioc) (&((ioc)->fiopsd->service_tree[(ioc)->wl_type]))
#define RQ_CIC(rq) icq_to_cic((rq)->elv.icq)
enum ioc_state_flags {
FIOPS_IOC_FLAG_on_rr = 0, /* on round-robin busy list */
+ FIOPS_IOC_FLAG_prio_changed, /* task priority has changed */
};
#define FIOPS_IOC_FNS(name) \
@@ -84,8 +96,18 @@ static inline int fiops_ioc_##name(const
}
FIOPS_IOC_FNS(on_rr);
+FIOPS_IOC_FNS(prio_changed);
#undef FIOPS_IOC_FNS
+enum wl_prio_t fiops_wl_type(short prio_class)
+{
+ if (prio_class == IOPRIO_CLASS_RT)
+ return RT_WORKLOAD;
+ if (prio_class == IOPRIO_CLASS_BE)
+ return BE_WORKLOAD;
+ return IDLE_WORKLOAD;
+}
+
static inline struct fiops_ioc *icq_to_cic(struct io_cq *icq)
{
/* cic->icq is the first member, %NULL will convert to %NULL */
@@ -298,6 +320,8 @@ static u64 fiops_scaled_vios(struct fiop
if (!rq_is_sync(rq))
vios = vios * fiopsd->async_scale / fiopsd->sync_scale;
+ vios += vios * (ioc->ioprio - IOPRIO_NORM) / VIOS_PRIO_SCALE;
+
return vios;
}
@@ -323,14 +347,19 @@ static int fiops_forced_dispatch(struct
{
struct fiops_ioc *ioc;
int dispatched = 0;
+ int i;
- while ((ioc = fiops_rb_first(&fiopsd->service_tree)) != NULL) {
- while (!list_empty(&ioc->fifo)) {
- fiops_dispatch_request(fiopsd, ioc);
- dispatched++;
+ for (i = RT_WORKLOAD; i >= IDLE_WORKLOAD; i--) {
+ while (!RB_EMPTY_ROOT(&fiopsd->service_tree[i].rb)) {
+ ioc = fiops_rb_first(&fiopsd->service_tree[i]);
+
+ while (!list_empty(&ioc->fifo)) {
+ fiops_dispatch_request(fiopsd, ioc);
+ dispatched++;
+ }
+ if (fiops_ioc_on_rr(ioc))
+ fiops_del_ioc_rr(fiopsd, ioc);
}
- if (fiops_ioc_on_rr(ioc))
- fiops_del_ioc_rr(fiopsd, ioc);
}
return dispatched;
}
@@ -338,10 +367,20 @@ static int fiops_forced_dispatch(struct
static struct fiops_ioc *fiops_select_ioc(struct fiops_data *fiopsd)
{
struct fiops_ioc *ioc;
+ struct fiops_rb_root *service_tree = NULL;
+ int i;
+
+ for (i = RT_WORKLOAD; i >= IDLE_WORKLOAD; i--) {
+ if (!RB_EMPTY_ROOT(&fiopsd->service_tree[i].rb)) {
+ service_tree = &fiopsd->service_tree[i];
+ break;
+ }
+ }
- if (RB_EMPTY_ROOT(&fiopsd->service_tree.rb))
+ if (!service_tree)
return NULL;
- ioc = fiops_rb_first(&fiopsd->service_tree);
+
+ ioc = fiops_rb_first(service_tree);
return ioc;
}
@@ -378,10 +417,49 @@ static int fiops_dispatch_requests(struc
return 1;
}
+static void fiops_init_prio_data(struct fiops_ioc *cic)
+{
+ struct task_struct *tsk = current;
+ struct io_context *ioc = cic->icq.ioc;
+ int ioprio_class;
+
+ if (!fiops_ioc_prio_changed(cic))
+ return;
+
+ ioprio_class = IOPRIO_PRIO_CLASS(ioc->ioprio);
+ switch (ioprio_class) {
+ default:
+ printk(KERN_ERR "fiops: bad prio %x\n", ioprio_class);
+ case IOPRIO_CLASS_NONE:
+ /*
+ * no prio set, inherit CPU scheduling settings
+ */
+ cic->ioprio = task_nice_ioprio(tsk);
+ cic->wl_type = fiops_wl_type(task_nice_ioclass(tsk));
+ break;
+ case IOPRIO_CLASS_RT:
+ cic->ioprio = task_ioprio(ioc);
+ cic->wl_type = fiops_wl_type(IOPRIO_CLASS_RT);
+ break;
+ case IOPRIO_CLASS_BE:
+ cic->ioprio = task_ioprio(ioc);
+ cic->wl_type = fiops_wl_type(IOPRIO_CLASS_BE);
+ break;
+ case IOPRIO_CLASS_IDLE:
+ cic->wl_type = fiops_wl_type(IOPRIO_CLASS_IDLE);
+ cic->ioprio = 7;
+ break;
+ }
+
+ fiops_clear_ioc_prio_changed(cic);
+}
+
static void fiops_insert_request(struct request_queue *q, struct request *rq)
{
struct fiops_ioc *ioc = RQ_CIC(rq);
+ fiops_init_prio_data(ioc);
+
list_add_tail(&rq->queuelist, &ioc->fifo);
fiops_add_rq_rb(rq);
@@ -507,6 +585,7 @@ static void fiops_kick_queue(struct work
static void *fiops_init_queue(struct request_queue *q)
{
struct fiops_data *fiopsd;
+ int i;
fiopsd = kzalloc_node(sizeof(*fiopsd), GFP_KERNEL, q->node);
if (!fiopsd)
@@ -514,7 +593,8 @@ static void *fiops_init_queue(struct req
fiopsd->queue = q;
- fiopsd->service_tree = FIOPS_RB_ROOT;
+ for (i = IDLE_WORKLOAD; i <= RT_WORKLOAD; i++)
+ fiopsd->service_tree[i] = FIOPS_RB_ROOT;
INIT_WORK(&fiopsd->unplug_work, fiops_kick_queue);
@@ -538,6 +618,7 @@ static void fiops_init_icq(struct io_cq
ioc->fiopsd = fiopsd;
ioc->pid = current->pid;
+ fiops_mark_ioc_prio_changed(ioc);
}
/*
* [patch v2 5/8]block: fiops preserve vios key for deep queue depth workload
From: Shaohua Li @ 2012-01-30 7:02 UTC (permalink / raw)
To: axboe
Cc: linux-kernel, vgoyal, david, jack, zhu.yanhai, namhyung.kim, shaohua.li
[-- Attachment #1: fiops-vios-key-preserve.patch --]
[-- Type: text/plain, Size: 1134 bytes --]
If a task has requests in flight, we preserve its vios key even when it is
newly added to the service tree, so it does not lose its share. This works
for tasks driving a big queue depth. For a single-depth task, there is no
way to preserve its vios key.
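The key-selection rule the patch implements can be stated as a small pure
function (illustrative; in the patch it is the branch at the top of
fiops_service_tree_add()):

```c
#include <stdint.h>

/* Choose the re-insertion key for a task entering the service tree: a
 * task with requests still in flight keeps its accumulated vios, while
 * an idle newcomer is bumped up to the tree's min_vios so it cannot
 * claim service for the time it was absent. */
uint64_t requeue_vios(uint64_t task_vios, uint64_t min_vios,
		      unsigned int in_flight)
{
	if (in_flight > 0)
		return task_vios;
	return task_vios > min_vios ? task_vios : min_vios;
}
```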
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
---
block/fiops-iosched.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
Index: linux/block/fiops-iosched.c
===================================================================
--- linux.orig/block/fiops-iosched.c 2012-01-18 14:43:51.000000000 +0800
+++ linux/block/fiops-iosched.c 2012-01-18 14:44:25.000000000 +0800
@@ -188,9 +188,12 @@ static void fiops_service_tree_add(struc
int left;
/* New added IOC */
- if (RB_EMPTY_NODE(&ioc->rb_node))
- vios = max_vios(service_tree->min_vios, ioc->vios);
- else {
+ if (RB_EMPTY_NODE(&ioc->rb_node)) {
+ if (ioc->in_flight > 0)
+ vios = ioc->vios;
+ else
+ vios = max_vios(service_tree->min_vios, ioc->vios);
+ } else {
vios = ioc->vios;
/* ioc->service_tree might not equal to service_tree */
fiops_rb_erase(&ioc->rb_node, ioc->service_tree);
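The key choice above can be sketched as a small userspace C function (a simplified stand-in; `ioc_sim`, `on_tree` and `service_tree_key` are illustrative names, with `on_tree` playing the role of `!RB_EMPTY_NODE()`):

```c
#include <assert.h>

typedef unsigned long long u64;

struct ioc_sim {
	u64 vios;	/* accumulated virtual I/O charge */
	int in_flight;	/* requests this task already has on the drive */
	int on_tree;	/* nonzero iff the ioc is already in the rb-tree */
};

static u64 max_vios(u64 a, u64 b) { return a > b ? a : b; }

/* Mirrors the patched fiops_service_tree_add() key choice. */
static u64 service_tree_key(const struct ioc_sim *ioc, u64 tree_min_vios)
{
	if (!ioc->on_tree) {
		/* Newly added, but with requests still in flight:
		 * keep the task's own key so it keeps its share. */
		if (ioc->in_flight > 0)
			return ioc->vios;
		/* Truly idle task: bump to the tree minimum as before. */
		return max_vios(tree_min_vios, ioc->vios);
	}
	return ioc->vios;
}
```

A deep-queue task that re-enters the tree with `in_flight > 0` is keyed at its own (smaller) vios instead of being pushed up to the tree minimum.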
* [patch v2 6/8]block: fiops bias sync workload
2012-01-30 7:02 [patch v2 0/8]block: An IOPS based ioscheduler Shaohua Li
` (4 preceding siblings ...)
2012-01-30 7:02 ` [patch v2 5/8]block: fiops preserve vios key for deep queue depth workload Shaohua Li
@ 2012-01-30 7:02 ` Shaohua Li
2012-01-30 7:02 ` [patch v2 7/8]block: fiops add some trace information Shaohua Li
` (3 subsequent siblings)
9 siblings, 0 replies; 13+ messages in thread
From: Shaohua Li @ 2012-01-30 7:02 UTC (permalink / raw)
To: axboe
Cc: linux-kernel, vgoyal, david, jack, zhu.yanhai, namhyung.kim, shaohua.li
[-- Attachment #1: fiops-limit-async-workload.patch --]
[-- Type: text/plain, Size: 1321 bytes --]
If sync requests are in flight, delay the async workload. Otherwise the async
workload (usually very deep iodepth) would use up the whole queue depth and
later sync requests would be delayed for a long time. The idea is from CFQ.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
---
block/fiops-iosched.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
Index: linux/block/fiops-iosched.c
===================================================================
--- linux.orig/block/fiops-iosched.c 2012-01-19 11:21:01.000000000 +0800
+++ linux/block/fiops-iosched.c 2012-01-21 10:30:46.000000000 +0800
@@ -372,6 +372,7 @@ static struct fiops_ioc *fiops_select_io
struct fiops_ioc *ioc;
struct fiops_rb_root *service_tree = NULL;
int i;
+ struct request *rq;
for (i = RT_WORKLOAD; i >= IDLE_WORKLOAD; i--) {
if (!RB_EMPTY_ROOT(&fiopsd->service_tree[i].rb)) {
@@ -384,6 +385,17 @@ static struct fiops_ioc *fiops_select_io
return NULL;
ioc = fiops_rb_first(service_tree);
+
+ rq = rq_entry_fifo(ioc->fifo.next);
+ /*
+ * we are the only async task and sync requests are in flight, delay a
+ * moment. If there are other tasks coming, sync tasks have no chance
+ * to be starved, don't delay
+ */
+ if (!rq_is_sync(rq) && fiopsd->in_flight[1] != 0 &&
+ service_tree->count == 1)
+ return NULL;
+
return ioc;
}
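The postpone condition added above can be sketched as a standalone predicate (a userspace sketch; `postpone_async`, `in_flight_sync` and `tree_count` are illustrative names for `fiopsd->in_flight[1]` and `service_tree->count`):

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the check added to fiops_select_ioc(): delay dispatch only
 * when the lone queued task is async and sync requests are already on
 * the drive. With more tasks queued, sync work will get its turn via
 * normal vios ordering, so there is no need to delay. */
static bool postpone_async(bool rq_is_sync, int in_flight_sync, int tree_count)
{
	return !rq_is_sync && in_flight_sync != 0 && tree_count == 1;
}
```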
* [patch v2 7/8]block: fiops add some trace information
2012-01-30 7:02 [patch v2 0/8]block: An IOPS based ioscheduler Shaohua Li
` (5 preceding siblings ...)
2012-01-30 7:02 ` [patch v2 6/8]block: fiops bias sync workload Shaohua Li
@ 2012-01-30 7:02 ` Shaohua Li
2012-01-30 7:02 ` [patch v2 8/8]block: fiops sync preempts async Shaohua Li
` (2 subsequent siblings)
9 siblings, 0 replies; 13+ messages in thread
From: Shaohua Li @ 2012-01-30 7:02 UTC (permalink / raw)
To: axboe
Cc: linux-kernel, vgoyal, david, jack, zhu.yanhai, namhyung.kim, shaohua.li
[-- Attachment #1: fiops-tracing.patch --]
[-- Type: text/plain, Size: 2300 bytes --]
Add some trace information, which is helpful for debugging.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
---
block/fiops-iosched.c | 19 ++++++++++++++++++-
1 file changed, 18 insertions(+), 1 deletion(-)
Index: linux/block/fiops-iosched.c
===================================================================
--- linux.orig/block/fiops-iosched.c 2012-01-21 10:30:46.000000000 +0800
+++ linux/block/fiops-iosched.c 2012-01-21 10:35:04.000000000 +0800
@@ -10,6 +10,7 @@
#include <linux/jiffies.h>
#include <linux/rbtree.h>
#include <linux/ioprio.h>
+#include <linux/blktrace_api.h>
#include "blk.h"
#define VIOS_SCALE_SHIFT 10
@@ -99,6 +100,11 @@ FIOPS_IOC_FNS(on_rr);
FIOPS_IOC_FNS(prio_changed);
#undef FIOPS_IOC_FNS
+#define fiops_log_ioc(fiopsd, ioc, fmt, args...) \
+ blk_add_trace_msg((fiopsd)->queue, "ioc%d " fmt, (ioc)->pid, ##args)
+#define fiops_log(fiopsd, fmt, args...) \
+ blk_add_trace_msg((fiopsd)->queue, "fiops " fmt, ##args)
+
enum wl_prio_t fiops_wl_type(short prio_class)
{
if (prio_class == IOPRIO_CLASS_RT)
@@ -200,6 +206,8 @@ static void fiops_service_tree_add(struc
ioc->service_tree = NULL;
}
+ fiops_log_ioc(fiopsd, ioc, "service tree add, vios %lld", vios);
+
left = 1;
parent = NULL;
ioc->service_tree = service_tree;
@@ -393,8 +401,12 @@ static struct fiops_ioc *fiops_select_io
* to be starved, don't delay
*/
if (!rq_is_sync(rq) && fiopsd->in_flight[1] != 0 &&
- service_tree->count == 1)
+ service_tree->count == 1) {
+ fiops_log_ioc(fiopsd, ioc,
+ "postpone async, in_flight async %d sync %d",
+ fiopsd->in_flight[0], fiopsd->in_flight[1]);
return NULL;
+ }
return ioc;
}
@@ -405,6 +417,8 @@ static void fiops_charge_vios(struct fio
struct fiops_rb_root *service_tree = ioc->service_tree;
ioc->vios += vios;
+ fiops_log_ioc(fiopsd, ioc, "charge vios %lld, new vios %lld", vios, ioc->vios);
+
if (RB_EMPTY_ROOT(&ioc->sort_list))
fiops_del_ioc_rr(fiopsd, ioc);
else
@@ -498,6 +512,9 @@ static void fiops_completed_request(stru
fiopsd->in_flight[rq_is_sync(rq)]--;
ioc->in_flight--;
+ fiops_log_ioc(fiopsd, ioc, "in_flight %d, busy queues %d",
+ ioc->in_flight, fiopsd->busy_queues);
+
if (fiopsd->in_flight[0] + fiopsd->in_flight[1] == 0)
fiops_schedule_dispatch(fiopsd);
}
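The two macros follow a common kernel tracing pattern; a userspace analogue, with `sprintf` into a buffer standing in for `blk_add_trace_msg()` (the `sim_*` names and `ioc_sim` struct are illustrative):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Simplified stand-in for struct fiops_ioc. */
struct ioc_sim { int pid; };

/* Userspace analogue of fiops_log_ioc()/fiops_log(): prefix every
 * message with either the ioc's pid or a "fiops" tag, then pass the
 * caller's format string and variadic args through. */
#define sim_log_ioc(buf, ioc, fmt, args...) \
	sprintf(buf, "ioc%d " fmt, (ioc)->pid, ##args)
#define sim_log(buf, fmt, args...) \
	sprintf(buf, "fiops " fmt, ##args)
```

The `##args` GNU extension lets the macros be called with no variadic arguments at all, exactly as the in-kernel versions are.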
* [patch v2 8/8]block: fiops sync preempts async
2012-01-30 7:02 [patch v2 0/8]block: An IOPS based ioscheduler Shaohua Li
` (6 preceding siblings ...)
2012-01-30 7:02 ` [patch v2 7/8]block: fiops add some trace information Shaohua Li
@ 2012-01-30 7:02 ` Shaohua Li
2012-01-30 11:53 ` Hillf Danton
2012-01-30 7:09 ` [patch v2 0/8]block: An IOPS based ioscheduler Shaohua Li
2012-01-30 15:30 ` Vivek Goyal
9 siblings, 1 reply; 13+ messages in thread
From: Shaohua Li @ 2012-01-30 7:02 UTC (permalink / raw)
To: axboe
Cc: linux-kernel, vgoyal, david, jack, zhu.yanhai, namhyung.kim, shaohua.li
[-- Attachment #1: fiops-preempt-async.patch --]
[-- Type: text/plain, Size: 963 bytes --]
Debug patch only.
This implements CFQ-like sync-preempts-async behavior. But as in CFQ, this
can starve async requests.
---
block/fiops-iosched.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)
Index: linux/block/fiops-iosched.c
===================================================================
--- linux.orig/block/fiops-iosched.c 2012-01-21 10:35:04.000000000 +0800
+++ linux/block/fiops-iosched.c 2012-01-21 10:48:52.000000000 +0800
@@ -408,6 +408,21 @@ static struct fiops_ioc *fiops_select_io
return NULL;
}
+ /* Let sync request preempt async queue */
+ if (!rq_is_sync(rq) && service_tree->count > 1) {
+ struct rb_node *tmp = rb_next(&ioc->rb_node);
+ struct fiops_ioc *sync_ioc = NULL;
+ while (tmp) {
+ sync_ioc = rb_entry(tmp, struct fiops_ioc, rb_node);
+ rq = rq_entry_fifo(sync_ioc->fifo.next);
+ if (rq_is_sync(rq))
+ break;
+ tmp = rb_next(&sync_ioc->rb_node);
+ }
+ if (sync_ioc)
+ ioc = sync_ioc;
+ }
+
return ioc;
}
* Re: [patch v2 8/8]block: fiops sync preempts async
2012-01-30 7:02 ` [patch v2 8/8]block: fiops sync preempts async Shaohua Li
@ 2012-01-30 11:53 ` Hillf Danton
0 siblings, 0 replies; 13+ messages in thread
From: Hillf Danton @ 2012-01-30 11:53 UTC (permalink / raw)
To: Shaohua Li
Cc: axboe, linux-kernel, vgoyal, david, jack, zhu.yanhai, namhyung.kim
[-- Attachment #1: Type: text/plain; charset=UTF-8, Size: 1684 bytes --]
Hello Shaohua
On Mon, Jan 30, 2012 at 3:02 PM, Shaohua Li <shaohua.li@intel.com> wrote:
> Debug patch only.
>
> This implements CFQ-like sync-preempts-async behavior. But as in CFQ, this
> can starve async requests.
>
> ---
>  block/fiops-iosched.c |   15 +++++++++++++++
>  1 file changed, 15 insertions(+)
>
> Index: linux/block/fiops-iosched.c
> ===================================================================
> --- linux.orig/block/fiops-iosched.c   2012-01-21 10:35:04.000000000 +0800
> +++ linux/block/fiops-iosched.c 2012-01-21 10:48:52.000000000 +0800
> @@ -408,6 +408,21 @@ static struct fiops_ioc *fiops_select_io
>                return NULL;
>        }
>
> +       /* Let sync request preempt async queue */
> +       if (!rq_is_sync(rq) && service_tree->count > 1) {
> +               struct rb_node *tmp = rb_next(&ioc->rb_node);
> +               struct fiops_ioc *sync_ioc = NULL;
> +               while (tmp) {
> +                       sync_ioc = rb_entry(tmp, struct fiops_ioc, rb_node);
> +                       rq = rq_entry_fifo(sync_ioc->fifo.next);
> +                       if (rq_is_sync(rq))
> +                               break;
> +                       tmp = rb_next(&sync_ioc->rb_node);

sync_ioc = NULL; yes?

Hillf

> +               }
> +               if (sync_ioc)
> +                       ioc = sync_ioc;
> +       }
> +
>        return ioc;
> }
* Re: [patch v2 0/8]block: An IOPS based ioscheduler
2012-01-30 7:02 [patch v2 0/8]block: An IOPS based ioscheduler Shaohua Li
` (7 preceding siblings ...)
2012-01-30 7:02 ` [patch v2 8/8]block: fiops sync preempts async Shaohua Li
@ 2012-01-30 7:09 ` Shaohua Li
2012-01-30 15:30 ` Vivek Goyal
9 siblings, 0 replies; 13+ messages in thread
From: Shaohua Li @ 2012-01-30 7:09 UTC (permalink / raw)
To: axboe; +Cc: linux-kernel, vgoyal, david, jack, zhu.yanhai, namhyung.kim
[-- Attachment #1: Type: text/plain, Size: 2677 bytes --]
On Mon, 2012-01-30 at 15:02 +0800, Shaohua Li wrote:
> An IOPS based I/O scheduler
>
> Flash based storage has some characteristics that differ from rotating disks:
> 1. No I/O seek.
> 2. Read and write I/O costs are usually very different.
> 3. The time a request takes depends on its size.
> 4. High throughput and IOPS, low latency.
>
> The CFQ iosched does well for rotating disks, for example fair dispatching
> and idling for sequential reads. It also has optimizations for flash based
> storage (for item 1 above), but overall it's not designed for flash based
> storage. It's a slice based algorithm. Since flash based storage request cost
> is very low, and drives with big queue_depth are quite popular now, which
> makes dispatching cost even lower, CFQ's jiffy-based slice accounting doesn't
> work well. CFQ also doesn't consider items 2 & 3 above.
>
> FIOPS (Fair IOPS) ioscheduler tries to fix these gaps. It's IOPS based, so it
> only targets drives without I/O seek. It's quite similar to CFQ, but the
> dispatch decision is made according to IOPS instead of time slices.
>
> To illustrate the design goals, let's compare Noop and CFQ:
> Noop: best throughput; No fairness and high latency for sync.
> CFQ: lower throughput in some cases; fairness and low latency for sync.
> CFQ throughput is sometimes low because it doesn't drive a deep queue depth.
> FIOPS adopts some of CFQ's merits, for example fairness and biasing the sync
> workload. And it will be faster than CFQ in general.
>
> Note: if workload iodepth is low, there is no way to maintain fairness
> without sacrificing performance; the same is true for CFQ. In such cases,
> FIOPS chooses not to lose performance, because flash based storage is usually
> very fast and expensive, so performance is more important.
>
> The algorithm is simple. The drive has a service tree, and each task lives in
> the tree. The key into the tree is called vios (virtual I/O). Every request
> has a vios, calculated according to its ioprio, request size and so on. A
> task's vios is the sum of the vios of all requests it dispatches. FIOPS
> always selects the task with the minimum vios in the service tree and lets it
> dispatch a request. The dispatched request's vios is then added to the task's
> vios and the task is repositioned in the service tree.
>
> Benchmarks results:
> The SSD I'm using: max throughput 250MB/s read, 80MB/s write; max IOPS for 4k
> requests 40k read, 20k write.
> Latency and fairness tests were done on a desktop with one SSD and the kernel
> parameter mem=1G. I'll compare noop, cfq and fiops under these workloads. The
> test script and results are attached.
Attached are the fio scripts I used for the latency and fairness tests, and
the resulting data.
[-- Attachment #2: fiops-data.tgz --]
[-- Type: application/x-compressed-tar, Size: 14188 bytes --]
* Re: [patch v2 0/8]block: An IOPS based ioscheduler
2012-01-30 7:02 [patch v2 0/8]block: An IOPS based ioscheduler Shaohua Li
` (8 preceding siblings ...)
2012-01-30 7:09 ` [patch v2 0/8]block: An IOPS based ioscheduler Shaohua Li
@ 2012-01-30 15:30 ` Vivek Goyal
2012-01-31 0:50 ` Shaohua Li
9 siblings, 1 reply; 13+ messages in thread
From: Vivek Goyal @ 2012-01-30 15:30 UTC (permalink / raw)
To: Shaohua Li; +Cc: axboe, linux-kernel, david, jack, zhu.yanhai, namhyung.kim
On Mon, Jan 30, 2012 at 03:02:13PM +0800, Shaohua Li wrote:
> An IOPS based I/O scheduler
>
> Flash based storage has some characteristics that differ from rotating disks:
> 1. No I/O seek.
> 2. Read and write I/O costs are usually very different.
> 3. The time a request takes depends on its size.
> 4. High throughput and IOPS, low latency.
Hi Shaohua,
Last time we agreed that you would try to extend CFQ's iops mode to take care
of this case. I was wondering if that idea is now off the table?
Also, what's the real workload where this is going to benefit us? I had
struggled to run something that constantly drove deep queue depths to get
fairness without idling.
Thanks
Vivek
* Re: [patch v2 0/8]block: An IOPS based ioscheduler
2012-01-30 15:30 ` Vivek Goyal
@ 2012-01-31 0:50 ` Shaohua Li
0 siblings, 0 replies; 13+ messages in thread
From: Shaohua Li @ 2012-01-31 0:50 UTC (permalink / raw)
To: Vivek Goyal; +Cc: axboe, linux-kernel, david, jack, zhu.yanhai, namhyung.kim
On Mon, 2012-01-30 at 10:30 -0500, Vivek Goyal wrote:
> On Mon, Jan 30, 2012 at 03:02:13PM +0800, Shaohua Li wrote:
> > An IOPS based I/O scheduler
> >
> > Flash based storage has some different characteristics against rotate disk.
> > 1. no I/O seek.
> > 2. read and write I/O cost usually is much different.
> > 3. Time which a request takes depends on request size.
> > 4. High throughput and IOPS, low latency.
>
> Hi Shaohua,
>
> Last time we agreed that you will try to extend CFQ iops mode to take care
> of this case. I was wondering that if that idea is out of the window?
I felt the merge would add complexity and mess up the code, even with some
functions wrapped as you suggested. Another thing is that I want to avoid
CFQ's complexity, for example allocating a queue and sharing it between tasks.
cfq_set_request is one place where the queue_lock gets contended; keeping the
code simple avoids such contention.
> Also what's the real workload where this is going to benefit us. I had
> struggled to run something which drove constantly deep queue depths to
> get the fairness without idling.
To be honest, I don't have a real workload. Our test environment runs some
micro benchmarks, which have their problems. I thought we all agreed CFQ has
some limitations; hopefully somebody with a real workload can take a look.
Thanks,
Shaohua