* [PATCH 0/3 v5][RFC] ext3/4: enhance fsync performance when using CFQ
@ 2010-06-22 21:34 Jeff Moyer
  2010-06-22 21:35 ` [PATCH 1/3] block: Implement a blk_yield function to voluntarily give up the I/O scheduler Jeff Moyer
                   ` (5 more replies)
  0 siblings, 6 replies; 24+ messages in thread
From: Jeff Moyer @ 2010-06-22 21:34 UTC (permalink / raw)
  To: axboe, vgoyal; +Cc: linux-kernel, linux-ext4

Hi,

Running iozone with the fsync flag, or fs_mark, the performance of CFQ is
far worse than that of deadline for enterprise class storage when dealing
with file sizes of 8MB or less.  I used the following command line as a
representative test case:

  fs_mark -S 1 -D 10000 -N 100000 -d /mnt/test/fs_mark -s 65536 -t 1 -w 4096 -F

When run using the deadline I/O scheduler, an average of the first 5 numbers
will give you 448.4 files / second.  CFQ will yield only 106.7.  With
this patch series applied (and the two patches I sent yesterday), CFQ now
achieves 462.5 files / second.

This patch set is still an RFC.  I'd like to make it perform better when
there is a competing sequential reader present.  For now, I've addressed
the concerns voiced about the previous posting.

Review and testing would be greatly appreciated.

Thanks!
Jeff

---

New from the last round:

- removed the think time calculation I added for the sync-noidle service tree
- replaced above with a suggestion from Vivek to only guard against currently
  active sequential readers when determining if we can preempt the sync-noidle
  service tree.
- bug fixes

Overall, I think it's simpler now, thanks to the suggestions from Jens and
Vivek.

[PATCH 1/3] block: Implement a blk_yield function to voluntarily give up the I/O scheduler.
[PATCH 2/3] jbd: yield the device queue when waiting for commits
[PATCH 3/3] jbd2: yield the device queue when waiting for journal commits

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 1/3] block: Implement a blk_yield function to voluntarily give up the I/O scheduler.
  2010-06-22 21:34 [PATCH 0/3 v5][RFC] ext3/4: enhance fsync performance when using CFQ Jeff Moyer
@ 2010-06-22 21:35 ` Jeff Moyer
  2010-06-23  5:04   ` Andrew Morton
  2010-06-24  0:46   ` Vivek Goyal
  2010-06-22 21:35 ` [PATCH 2/3] jbd: yield the device queue when waiting for commits Jeff Moyer
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 24+ messages in thread
From: Jeff Moyer @ 2010-06-22 21:35 UTC (permalink / raw)
  To: axboe, vgoyal; +Cc: linux-kernel, linux-ext4, Jeff Moyer

This patch implements a blk_yield to allow a process to voluntarily
give up its I/O scheduler time slice.  This is desirable for those processes
which know that they will be blocked on I/O from another process, such as
the file system journal thread.  Following patches will put calls to blk_yield
into jbd and jbd2.

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
---
 block/blk-core.c         |   13 +++++
 block/blk-settings.c     |    6 ++
 block/cfq-iosched.c      |  123 +++++++++++++++++++++++++++++++++++++++++++++-
 block/elevator.c         |    8 +++
 include/linux/blkdev.h   |    4 ++
 include/linux/elevator.h |    3 +
 6 files changed, 155 insertions(+), 2 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index f84cce4..b9afbba 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -324,6 +324,18 @@ void blk_unplug(struct request_queue *q)
 }
 EXPORT_SYMBOL(blk_unplug);
 
+void generic_yield_iosched(struct request_queue *q, struct task_struct *tsk)
+{
+	elv_yield(q, tsk);
+}
+
+void blk_yield(struct request_queue *q, struct task_struct *tsk)
+{
+	if (q->yield_fn)
+		q->yield_fn(q, tsk);
+}
+EXPORT_SYMBOL(blk_yield);
+
 /**
  * blk_start_queue - restart a previously stopped queue
  * @q:    The &struct request_queue in question
@@ -609,6 +621,7 @@ blk_init_allocated_queue_node(struct request_queue *q, request_fn_proc *rfn,
 	q->request_fn		= rfn;
 	q->prep_rq_fn		= NULL;
 	q->unplug_fn		= generic_unplug_device;
+	q->yield_fn		= generic_yield_iosched;
 	q->queue_flags		= QUEUE_FLAG_DEFAULT;
 	q->queue_lock		= lock;
 
diff --git a/block/blk-settings.c b/block/blk-settings.c
index f5ed5a1..fe548c9 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -171,6 +171,12 @@ void blk_queue_make_request(struct request_queue *q, make_request_fn *mfn)
 }
 EXPORT_SYMBOL(blk_queue_make_request);
 
+void blk_queue_yield(struct request_queue *q, yield_fn *yield)
+{
+	q->yield_fn = yield;
+}
+EXPORT_SYMBOL_GPL(blk_queue_yield);
+
 /**
  * blk_queue_bounce_limit - set bounce buffer limit for queue
  * @q: the request queue for the device
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index dab836e..a9922b9 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -87,9 +87,12 @@ struct cfq_rb_root {
 	unsigned total_weight;
 	u64 min_vdisktime;
 	struct rb_node *active;
+	unsigned long last_expiry;
+	pid_t last_pid;
 };
 #define CFQ_RB_ROOT	(struct cfq_rb_root) { .rb = RB_ROOT, .left = NULL, \
-			.count = 0, .min_vdisktime = 0, }
+			.count = 0, .min_vdisktime = 0, .last_expiry = 0UL, \
+			.last_pid = (pid_t)-1, }
 
 /*
  * Per process-grouping structure
@@ -147,6 +150,7 @@ struct cfq_queue {
 	struct cfq_queue *new_cfqq;
 	struct cfq_group *cfqg;
 	struct cfq_group *orig_cfqg;
+	struct cfq_io_context *yield_to;
 };
 
 /*
@@ -318,6 +322,7 @@ enum cfqq_state_flags {
 	CFQ_CFQQ_FLAG_split_coop,	/* shared cfqq will be splitted */
 	CFQ_CFQQ_FLAG_deep,		/* sync cfqq experienced large depth */
 	CFQ_CFQQ_FLAG_wait_busy,	/* Waiting for next request */
+	CFQ_CFQQ_FLAG_yield,		/* Allow another cfqq to run */
 };
 
 #define CFQ_CFQQ_FNS(name)						\
@@ -347,6 +352,7 @@ CFQ_CFQQ_FNS(coop);
 CFQ_CFQQ_FNS(split_coop);
 CFQ_CFQQ_FNS(deep);
 CFQ_CFQQ_FNS(wait_busy);
+CFQ_CFQQ_FNS(yield);
 #undef CFQ_CFQQ_FNS
 
 #ifdef CONFIG_CFQ_GROUP_IOSCHED
@@ -1614,6 +1620,15 @@ __cfq_slice_expired(struct cfq_data *cfqd, struct cfq_queue *cfqq,
 	cfq_clear_cfqq_wait_request(cfqq);
 	cfq_clear_cfqq_wait_busy(cfqq);
 
+	if (!cfq_cfqq_yield(cfqq)) {
+		struct cfq_rb_root *st;
+		st = service_tree_for(cfqq->cfqg,
+				      cfqq_prio(cfqq), cfqq_type(cfqq));
+		st->last_expiry = jiffies;
+		st->last_pid = cfqq->pid;
+	}
+	cfq_clear_cfqq_yield(cfqq);
+
 	/*
 	 * If this cfqq is shared between multiple processes, check to
 	 * make sure that those processes are still issuing I/Os within
@@ -2118,7 +2133,7 @@ static void choose_service_tree(struct cfq_data *cfqd, struct cfq_group *cfqg)
 		slice = max(slice, 2 * cfqd->cfq_slice_idle);
 
 	slice = max_t(unsigned, slice, CFQ_MIN_TT);
-	cfq_log(cfqd, "workload slice:%d", slice);
+	cfq_log(cfqd, "workload:%d slice:%d", cfqd->serving_type, slice);
 	cfqd->workload_expires = jiffies + slice;
 	cfqd->noidle_tree_requires_idle = false;
 }
@@ -2153,6 +2168,36 @@ static void cfq_choose_cfqg(struct cfq_data *cfqd)
 	choose_service_tree(cfqd, cfqg);
 }
 
+static int cfq_should_yield_now(struct cfq_queue *cfqq,
+				struct cfq_queue **yield_to)
+{
+	struct cfq_queue *new_cfqq;
+
+	new_cfqq = cic_to_cfqq(cfqq->yield_to, 1);
+
+	/*
+	 * If the queue we're yielding to is in a different cgroup,
+	 * just expire our own time slice.
+	 */
+	if (new_cfqq->cfqg != cfqq->cfqg) {
+		*yield_to = NULL;
+		return 1;
+	}
+
+	/*
+	 * If the new queue has pending I/O, then switch to it
+	 * immediately.  Otherwise, see if we can idle until it is
+	 * ready to preempt us.
+	 */
+	if (!RB_EMPTY_ROOT(&new_cfqq->sort_list)) {
+		*yield_to = new_cfqq;
+		return 1;
+	}
+
+	*yield_to = NULL;
+	return 0;
+}
+
 /*
  * Select a queue for service. If we have a current active queue,
  * check whether to continue servicing it, or retrieve and set a new one.
@@ -2187,6 +2232,10 @@ static struct cfq_queue *cfq_select_queue(struct cfq_data *cfqd)
 		 * have been idling all along on this queue and it should be
 		 * ok to wait for this request to complete.
 		 */
+		if (cfq_cfqq_yield(cfqq) &&
+		    cfq_should_yield_now(cfqq, &new_cfqq))
+			goto expire;
+
 		if (cfqq->cfqg->nr_cfqq == 1 && RB_EMPTY_ROOT(&cfqq->sort_list)
 		    && cfqq->dispatched && cfq_should_idle(cfqd, cfqq)) {
 			cfqq = NULL;
@@ -2215,6 +2264,9 @@ static struct cfq_queue *cfq_select_queue(struct cfq_data *cfqd)
 		goto expire;
 	}
 
+	if (cfq_cfqq_yield(cfqq) && cfq_should_yield_now(cfqq, &new_cfqq))
+		goto expire;
+
 	/*
 	 * No requests pending. If the active queue still has requests in
 	 * flight or is idling for a new request, allow either of these
@@ -2241,6 +2293,65 @@ keep_queue:
 	return cfqq;
 }
 
+static inline int expiry_data_valid(struct cfq_rb_root *service_tree)
+{
+	return (service_tree->last_pid != (pid_t)-1 &&
+		service_tree->last_expiry != 0UL);
+}
+
+static void cfq_yield(struct request_queue *q, struct task_struct *tsk)
+{
+	struct cfq_data *cfqd = q->elevator->elevator_data;
+	struct cfq_io_context *cic, *new_cic;
+	struct cfq_queue *cfqq;
+
+	cic = cfq_cic_lookup(cfqd, current->io_context);
+	if (!cic)
+		return;
+
+	task_lock(tsk);
+	new_cic = cfq_cic_lookup(cfqd, tsk->io_context);
+	atomic_long_inc(&tsk->io_context->refcount);
+	task_unlock(tsk);
+	if (!new_cic)
+		goto out_dec;
+
+	spin_lock_irq(q->queue_lock);
+
+	cfqq = cic_to_cfqq(cic, 1);
+	if (!cfqq)
+		goto out_unlock;
+
+	/*
+	 * If we are currently servicing the SYNC_NOIDLE_WORKLOAD, and we
+	 * are idling on the last queue in that workload, *and* there are no
+	 * potential dependent readers running currently, then go ahead and
+	 * yield the queue.
+	 */
+	if (cfqd->active_queue == cfqq &&
+	    cfqd->serving_type == SYNC_NOIDLE_WORKLOAD) {
+		/*
+		 * If there's been no I/O from another process in the idle
+		 * slice time, then there is by definition no dependent
+		 * read going on for this service tree.
+		 */
+		if (expiry_data_valid(cfqq->service_tree) &&
+		    time_before(cfqq->service_tree->last_expiry +
+				cfq_slice_idle, jiffies) &&
+		    cfqq->service_tree->last_pid != cfqq->pid)
+			goto out_unlock;
+	}
+
+	cfq_log_cfqq(cfqd, cfqq, "yielding queue to %d", tsk->pid);
+	cfqq->yield_to = new_cic;
+	cfq_mark_cfqq_yield(cfqq);
+
+out_unlock:
+	spin_unlock_irq(q->queue_lock);
+out_dec:
+	put_io_context(tsk->io_context);
+}
+
 static int __cfq_forced_dispatch_cfqq(struct cfq_queue *cfqq)
 {
 	int dispatched = 0;
@@ -3123,6 +3234,13 @@ cfq_should_preempt(struct cfq_data *cfqd, struct cfq_queue *new_cfqq,
 	if (!cfqq)
 		return false;
 
+	/*
+	 * If the active queue yielded its timeslice to this queue, let
+	 * it preempt.
+	 */
+	if (cfq_cfqq_yield(cfqq) && RQ_CIC(rq) == cfqq->yield_to)
+		return true;
+
 	if (cfq_class_idle(new_cfqq))
 		return false;
 
@@ -3973,6 +4091,7 @@ static struct elevator_type iosched_cfq = {
 		.elevator_deactivate_req_fn =	cfq_deactivate_request,
 		.elevator_queue_empty_fn =	cfq_queue_empty,
 		.elevator_completed_req_fn =	cfq_completed_request,
+		.elevator_yield_fn =		cfq_yield,
 		.elevator_former_req_fn =	elv_rb_former_request,
 		.elevator_latter_req_fn =	elv_rb_latter_request,
 		.elevator_set_req_fn =		cfq_set_request,
diff --git a/block/elevator.c b/block/elevator.c
index 923a913..5e33297 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -866,6 +866,14 @@ void elv_completed_request(struct request_queue *q, struct request *rq)
 	}
 }
 
+void elv_yield(struct request_queue *q, struct task_struct *tsk)
+{
+	struct elevator_queue *e = q->elevator;
+
+	if (e && e->ops->elevator_yield_fn)
+		e->ops->elevator_yield_fn(q, tsk);
+}
+
 #define to_elv(atr) container_of((atr), struct elv_fs_entry, attr)
 
 static ssize_t
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 09a8402..8d073c0 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -263,6 +263,7 @@ struct request_pm_state
 
 typedef void (request_fn_proc) (struct request_queue *q);
 typedef int (make_request_fn) (struct request_queue *q, struct bio *bio);
+typedef void (yield_fn) (struct request_queue *q, struct task_struct *tsk);
 typedef int (prep_rq_fn) (struct request_queue *, struct request *);
 typedef void (unplug_fn) (struct request_queue *);
 
@@ -345,6 +346,7 @@ struct request_queue
 
 	request_fn_proc		*request_fn;
 	make_request_fn		*make_request_fn;
+	yield_fn		*yield_fn;
 	prep_rq_fn		*prep_rq_fn;
 	unplug_fn		*unplug_fn;
 	merge_bvec_fn		*merge_bvec_fn;
@@ -837,6 +839,7 @@ extern int blk_execute_rq(struct request_queue *, struct gendisk *,
 extern void blk_execute_rq_nowait(struct request_queue *, struct gendisk *,
 				  struct request *, int, rq_end_io_fn *);
 extern void blk_unplug(struct request_queue *q);
+extern void blk_yield(struct request_queue *q, struct task_struct *tsk);
 
 static inline struct request_queue *bdev_get_queue(struct block_device *bdev)
 {
@@ -929,6 +932,7 @@ extern struct request_queue *blk_init_allocated_queue(struct request_queue *,
 						      request_fn_proc *, spinlock_t *);
 extern void blk_cleanup_queue(struct request_queue *);
 extern void blk_queue_make_request(struct request_queue *, make_request_fn *);
+extern void blk_queue_yield(struct request_queue *, yield_fn *);
 extern void blk_queue_bounce_limit(struct request_queue *, u64);
 extern void blk_queue_max_hw_sectors(struct request_queue *, unsigned int);
 extern void blk_queue_max_segments(struct request_queue *, unsigned short);
diff --git a/include/linux/elevator.h b/include/linux/elevator.h
index 2c958f4..a68b5b1 100644
--- a/include/linux/elevator.h
+++ b/include/linux/elevator.h
@@ -23,6 +23,7 @@ typedef void (elevator_add_req_fn) (struct request_queue *, struct request *);
 typedef int (elevator_queue_empty_fn) (struct request_queue *);
 typedef struct request *(elevator_request_list_fn) (struct request_queue *, struct request *);
 typedef void (elevator_completed_req_fn) (struct request_queue *, struct request *);
+typedef void (elevator_yield_fn) (struct request_queue *, struct task_struct *tsk);
 typedef int (elevator_may_queue_fn) (struct request_queue *, int);
 
 typedef int (elevator_set_req_fn) (struct request_queue *, struct request *, gfp_t);
@@ -48,6 +49,7 @@ struct elevator_ops
 
 	elevator_queue_empty_fn *elevator_queue_empty_fn;
 	elevator_completed_req_fn *elevator_completed_req_fn;
+	elevator_yield_fn *elevator_yield_fn;
 
 	elevator_request_list_fn *elevator_former_req_fn;
 	elevator_request_list_fn *elevator_latter_req_fn;
@@ -111,6 +113,7 @@ extern void elv_bio_merged(struct request_queue *q, struct request *,
 				struct bio *);
 extern void elv_requeue_request(struct request_queue *, struct request *);
 extern int elv_queue_empty(struct request_queue *);
+extern void elv_yield(struct request_queue *, struct task_struct *);
 extern struct request *elv_former_request(struct request_queue *, struct request *);
 extern struct request *elv_latter_request(struct request_queue *, struct request *);
 extern int elv_register_queue(struct request_queue *q);
-- 
1.6.5.2



* [PATCH 2/3] jbd: yield the device queue when waiting for commits
  2010-06-22 21:34 [PATCH 0/3 v5][RFC] ext3/4: enhance fsync performance when using CFQ Jeff Moyer
  2010-06-22 21:35 ` [PATCH 1/3] block: Implement a blk_yield function to voluntarily give up the I/O scheduler Jeff Moyer
@ 2010-06-22 21:35 ` Jeff Moyer
  2010-06-22 21:35 ` [PATCH 3/3] jbd2: yield the device queue when waiting for journal commits Jeff Moyer
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 24+ messages in thread
From: Jeff Moyer @ 2010-06-22 21:35 UTC (permalink / raw)
  To: axboe, vgoyal; +Cc: linux-kernel, linux-ext4, Jeff Moyer

This patch gets CFQ back in line with deadline for iozone runs, especially
those testing small files + fsync timings.

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
---
 fs/jbd/journal.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/fs/jbd/journal.c b/fs/jbd/journal.c
index 93d1e47..9b6cf4c 100644
--- a/fs/jbd/journal.c
+++ b/fs/jbd/journal.c
@@ -36,6 +36,7 @@
 #include <linux/poison.h>
 #include <linux/proc_fs.h>
 #include <linux/debugfs.h>
+#include <linux/blkdev.h>
 
 #include <asm/uaccess.h>
 #include <asm/page.h>
@@ -549,6 +550,11 @@ int log_wait_commit(journal_t *journal, tid_t tid)
 	while (tid_gt(tid, journal->j_commit_sequence)) {
 		jbd_debug(1, "JBD: want %d, j_commit_sequence=%d\n",
 				  tid, journal->j_commit_sequence);
+		/*
+		 * Give up our I/O scheduler time slice to allow the journal
+		 * thread to issue I/O.
+		 */
+		blk_yield(journal->j_dev->bd_disk->queue, journal->j_task);
 		wake_up(&journal->j_wait_commit);
 		spin_unlock(&journal->j_state_lock);
 		wait_event(journal->j_wait_done_commit,
-- 
1.6.5.2



* [PATCH 3/3] jbd2: yield the device queue when waiting for journal commits
  2010-06-22 21:34 [PATCH 0/3 v5][RFC] ext3/4: enhance fsync performance when using CFQ Jeff Moyer
  2010-06-22 21:35 ` [PATCH 1/3] block: Implement a blk_yield function to voluntarily give up the I/O scheduler Jeff Moyer
  2010-06-22 21:35 ` [PATCH 2/3] jbd: yield the device queue when waiting for commits Jeff Moyer
@ 2010-06-22 21:35 ` Jeff Moyer
  2010-06-22 22:13 ` [PATCH 0/3 v5][RFC] ext3/4: enhance fsync performance when using CFQ Joel Becker
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 24+ messages in thread
From: Jeff Moyer @ 2010-06-22 21:35 UTC (permalink / raw)
  To: axboe, vgoyal; +Cc: linux-kernel, linux-ext4, Jeff Moyer

This patch gets CFQ back in line with deadline for iozone runs, especially
those testing small files + fsync timings.

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
---
 fs/jbd2/journal.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index bc2ff59..aba4754 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -41,6 +41,7 @@
 #include <linux/hash.h>
 #include <linux/log2.h>
 #include <linux/vmalloc.h>
+#include <linux/blkdev.h>
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/jbd2.h>
@@ -580,6 +581,11 @@ int jbd2_log_wait_commit(journal_t *journal, tid_t tid)
 	while (tid_gt(tid, journal->j_commit_sequence)) {
 		jbd_debug(1, "JBD: want %d, j_commit_sequence=%d\n",
 				  tid, journal->j_commit_sequence);
+		/*
+		 * Give up our I/O scheduler time slice to allow the journal
+		 * thread to issue I/O.
+		 */
+		blk_yield(journal->j_dev->bd_disk->queue, journal->j_task);
 		wake_up(&journal->j_wait_commit);
 		spin_unlock(&journal->j_state_lock);
 		wait_event(journal->j_wait_done_commit,
-- 
1.6.5.2



* Re: [PATCH 0/3 v5][RFC] ext3/4: enhance fsync performance when using CFQ
  2010-06-22 21:34 [PATCH 0/3 v5][RFC] ext3/4: enhance fsync performance when using CFQ Jeff Moyer
                   ` (2 preceding siblings ...)
  2010-06-22 21:35 ` [PATCH 3/3] jbd2: yield the device queue when waiting for journal commits Jeff Moyer
@ 2010-06-22 22:13 ` Joel Becker
  2010-06-23  9:20 ` Christoph Hellwig
  2010-06-23  9:30 ` Tao Ma
  5 siblings, 0 replies; 24+ messages in thread
From: Joel Becker @ 2010-06-22 22:13 UTC (permalink / raw)
  To: Jeff Moyer; +Cc: axboe, vgoyal, linux-kernel, linux-ext4

On Tue, Jun 22, 2010 at 05:34:59PM -0400, Jeff Moyer wrote:
> Running iozone with the fsync flag, or fs_mark, the performance of CFQ is
> far worse than that of deadline for enterprise class storage when dealing
> with file sizes of 8MB or less.  I used the following command line as a
> representative test case:
> 
>   fs_mark -S 1 -D 10000 -N 100000 -d /mnt/test/fs_mark -s 65536 -t 1 -w 4096 -F

	I'd be interested in how ocfs2 does, because we use jbd2 too.

Joel

-- 

Life's Little Instruction Book #139

	"Never deprive someone of hope; it might be all they have."

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127


* Re: [PATCH 1/3] block: Implement a blk_yield function to voluntarily give up the I/O scheduler.
  2010-06-22 21:35 ` [PATCH 1/3] block: Implement a blk_yield function to voluntarily give up the I/O scheduler Jeff Moyer
@ 2010-06-23  5:04   ` Andrew Morton
  2010-06-23 14:50     ` Jeff Moyer
  2010-06-24  0:46   ` Vivek Goyal
  1 sibling, 1 reply; 24+ messages in thread
From: Andrew Morton @ 2010-06-23  5:04 UTC (permalink / raw)
  To: Jeff Moyer; +Cc: axboe, vgoyal, linux-kernel, linux-ext4

On Tue, 22 Jun 2010 17:35:00 -0400 Jeff Moyer <jmoyer@redhat.com> wrote:

> This patch implements a blk_yield to allow a process to voluntarily
> give up its I/O scheduler time slice.  This is desirable for those processes
> which know that they will be blocked on I/O from another process, such as
> the file system journal thread.  Following patches will put calls to blk_yield
> into jbd and jbd2.
> 

I'm looking through this patch series trying to find the
analysis/description of the cause for this (bad) performance problem. 
But all I'm seeing is implementation stuff :(  It's hard to review code
with your eyes shut.


> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -324,6 +324,18 @@ void blk_unplug(struct request_queue *q)
>  }
>  EXPORT_SYMBOL(blk_unplug);
>  
> +void generic_yield_iosched(struct request_queue *q, struct task_struct *tsk)
> +{
> +	elv_yield(q, tsk);
> +}

static?

>
> ...
>
> +void blk_queue_yield(struct request_queue *q, yield_fn *yield)
> +{
> +	q->yield_fn = yield;
> +}
> +EXPORT_SYMBOL_GPL(blk_queue_yield);

There's a tradition in the block layer of using truly awful identifiers
for functions-which-set-things.  But there's no good reason for
retaining that tradition.  blk_queue_yield_set(), perhaps.

(what name would you give an accessor which _reads_ q->yield_fn?  yup,
"blk_queue_yield()".  doh).

>  /**
>   * blk_queue_bounce_limit - set bounce buffer limit for queue
>   * @q: the request queue for the device
> diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
> index dab836e..a9922b9 100644
> --- a/block/cfq-iosched.c
> +++ b/block/cfq-iosched.c
> @@ -87,9 +87,12 @@ struct cfq_rb_root {
>  	unsigned total_weight;
>  	u64 min_vdisktime;
>  	struct rb_node *active;
> +	unsigned long last_expiry;
> +	pid_t last_pid;

These fields are pretty fundamental to understanding the
implementation.  Some nice descriptions would be nice.

>  };
>  #define CFQ_RB_ROOT	(struct cfq_rb_root) { .rb = RB_ROOT, .left = NULL, \
> -			.count = 0, .min_vdisktime = 0, }
> +			.count = 0, .min_vdisktime = 0, .last_expiry = 0UL, \
> +			.last_pid = (pid_t)-1, }

May as well leave the 0 and NULL fields unmentioned (ie: don't do
crappy stuff because the old code did crappy stuff!)

>  /*
>   * Per process-grouping structure
> @@ -147,6 +150,7 @@ struct cfq_queue {
>  	struct cfq_queue *new_cfqq;
>  	struct cfq_group *cfqg;
>  	struct cfq_group *orig_cfqg;
> +	struct cfq_io_context *yield_to;
>  };
>  
>  /*
>
> ...
>
> +static int cfq_should_yield_now(struct cfq_queue *cfqq,
> +				struct cfq_queue **yield_to)

The bool-returning function could return a bool type.

> +{
> +	struct cfq_queue *new_cfqq;
> +
> +	new_cfqq = cic_to_cfqq(cfqq->yield_to, 1);
> +
> +	/*
> +	 * If the queue we're yielding to is in a different cgroup,
> +	 * just expire our own time slice.
> +	 */
> +	if (new_cfqq->cfqg != cfqq->cfqg) {
> +		*yield_to = NULL;
> +		return 1;
> +	}
> +
> +	/*
> +	 * If the new queue has pending I/O, then switch to it
> +	 * immediately.  Otherwise, see if we can idle until it is
> +	 * ready to preempt us.
> +	 */
> +	if (!RB_EMPTY_ROOT(&new_cfqq->sort_list)) {
> +		*yield_to = new_cfqq;
> +		return 1;
> +	}
> +
> +	*yield_to = NULL;
> +	return 0;
> +}
> +
>  /*
>   * Select a queue for service. If we have a current active queue,
>   * check whether to continue servicing it, or retrieve and set a new one.
>
> ...
>
> +static inline int expiry_data_valid(struct cfq_rb_root *service_tree)
> +{
> +	return (service_tree->last_pid != (pid_t)-1 &&
> +		service_tree->last_expiry != 0UL);
> +}

The compiler will inline this.

> +static void cfq_yield(struct request_queue *q, struct task_struct *tsk)

-ENODESCRIPTION

> +{
> +	struct cfq_data *cfqd = q->elevator->elevator_data;
> +	struct cfq_io_context *cic, *new_cic;
> +	struct cfq_queue *cfqq;
> +
> +	cic = cfq_cic_lookup(cfqd, current->io_context);
> +	if (!cic)
> +		return;
> +
> +	task_lock(tsk);
> +	new_cic = cfq_cic_lookup(cfqd, tsk->io_context);
> +	atomic_long_inc(&tsk->io_context->refcount);

How do we know tsk has an io_context?  Use get_io_context() and check
its result?

> +	task_unlock(tsk);
> +	if (!new_cic)
> +		goto out_dec;
> +
> +	spin_lock_irq(q->queue_lock);
> +
> +	cfqq = cic_to_cfqq(cic, 1);
> +	if (!cfqq)
> +		goto out_unlock;
> +
> +	/*
> +	 * If we are currently servicing the SYNC_NOIDLE_WORKLOAD, and we
> +	 * are idling on the last queue in that workload, *and* there are no
> +	 * potential dependent readers running currently, then go ahead and
> +	 * yield the queue.
> +	 */

Comment explains the code, but doesn't explain the *reason* for the code.

> +	if (cfqd->active_queue == cfqq &&
> +	    cfqd->serving_type == SYNC_NOIDLE_WORKLOAD) {
> +		/*
> +		 * If there's been no I/O from another process in the idle
> +		 * slice time, then there is by definition no dependent
> +		 * read going on for this service tree.
> +		 */
> +		if (expiry_data_valid(cfqq->service_tree) &&
> +		    time_before(cfqq->service_tree->last_expiry +
> +				cfq_slice_idle, jiffies) &&
> +		    cfqq->service_tree->last_pid != cfqq->pid)
> +			goto out_unlock;
> +	}
> +
> +	cfq_log_cfqq(cfqd, cfqq, "yielding queue to %d", tsk->pid);
> +	cfqq->yield_to = new_cic;
> +	cfq_mark_cfqq_yield(cfqq);
> +
> +out_unlock:
> +	spin_unlock_irq(q->queue_lock);
> +out_dec:
> +	put_io_context(tsk->io_context);
> +}
> +
>  static int __cfq_forced_dispatch_cfqq(struct cfq_queue *cfqq)
>  {
>  	int dispatched = 0;
>
> ...
>
> --- a/block/elevator.c
> +++ b/block/elevator.c
> @@ -866,6 +866,14 @@ void elv_completed_request(struct request_queue *q, struct request *rq)
>  	}
>  }
>  
> +void elv_yield(struct request_queue *q, struct task_struct *tsk)
> +{
> +	struct elevator_queue *e = q->elevator;
> +
> +	if (e && e->ops->elevator_yield_fn)
> +		e->ops->elevator_yield_fn(q, tsk);
> +}

Again, no documentation.  How are other programmers to know when, why
and how they should use this?

>
> ...
>



* Re: [PATCH 0/3 v5][RFC] ext3/4: enhance fsync performance when using CFQ
  2010-06-22 21:34 [PATCH 0/3 v5][RFC] ext3/4: enhance fsync performance when using CFQ Jeff Moyer
                   ` (3 preceding siblings ...)
  2010-06-22 22:13 ` [PATCH 0/3 v5][RFC] ext3/4: enhance fsync performance when using CFQ Joel Becker
@ 2010-06-23  9:20 ` Christoph Hellwig
  2010-06-23 13:03   ` Jeff Moyer
  2010-06-23  9:30 ` Tao Ma
  5 siblings, 1 reply; 24+ messages in thread
From: Christoph Hellwig @ 2010-06-23  9:20 UTC (permalink / raw)
  To: Jeff Moyer; +Cc: axboe, vgoyal, linux-kernel, linux-ext4

On Tue, Jun 22, 2010 at 05:34:59PM -0400, Jeff Moyer wrote:
> Hi,
> 
> Running iozone with the fsync flag, or fs_mark, the performance of CFQ is
> far worse than that of deadline for enterprise class storage when dealing
> with file sizes of 8MB or less.  I used the following command line as a
> representative test case:
> 
>   fs_mark -S 1 -D 10000 -N 100000 -d /mnt/test/fs_mark -s 65536 -t 1 -w 4096 -F
> 
> When run using the deadline I/O scheduler, an average of the first 5 numbers
> will give you 448.4 files / second.  CFQ will yield only 106.7.  With
> this patch series applied (and the two patches I sent yesterday), CFQ now
> achieves 462.5 files / second.
> 
> This patch set is still an RFC.  I'd like to make it perform better when
> there is a competing sequential reader present.  For now, I've addressed
> the concerns voiced about the previous posting.

What happened to the initial idea of just using the BIO_RW_META flag
for log writes?  In the end log writes are the most important writes you
have in a journaled filesystem, and they should not be subject to any
kind of queue idling logic or other interruption.  Log I/O is usually
very little (unless you use old XFS code with a worst-case directory
manipulation workload), and very latency sensitive.


* Re: [PATCH 0/3 v5][RFC] ext3/4: enhance fsync performance when using CFQ
  2010-06-22 21:34 [PATCH 0/3 v5][RFC] ext3/4: enhance fsync performance when using CFQ Jeff Moyer
                   ` (4 preceding siblings ...)
  2010-06-23  9:20 ` Christoph Hellwig
@ 2010-06-23  9:30 ` Tao Ma
  2010-06-23 13:06   ` Jeff Moyer
  2010-06-24  5:54   ` Tao Ma
  5 siblings, 2 replies; 24+ messages in thread
From: Tao Ma @ 2010-06-23  9:30 UTC (permalink / raw)
  To: Jeff Moyer; +Cc: axboe, vgoyal, linux-kernel, linux-ext4

Hi Jeff,

On 06/23/2010 05:34 AM, Jeff Moyer wrote:
> Hi,
>
> Running iozone with the fsync flag, or fs_mark, the performance of CFQ is
> far worse than that of deadline for enterprise class storage when dealing
> with file sizes of 8MB or less.  I used the following command line as a
> representative test case:
>
>    fs_mark -S 1 -D 10000 -N 100000 -d /mnt/test/fs_mark -s 65536 -t 1 -w 4096 -F
>
> When run using the deadline I/O scheduler, an average of the first 5 numbers
> will give you 448.4 files / second.  CFQ will yield only 106.7.  With
> this patch series applied (and the two patches I sent yesterday), CFQ now
> achieves 462.5 files / second.
Which two patches? Could you paste the link or the subject? Just want to 
make my test env like yours. ;)
As Joel mentioned in another mail, ocfs2 also uses jbd/jbd2, so I'd like 
to give it a try and give you some feedback about the test.

Regards,
Tao
>
> This patch set is still an RFC.  I'd like to make it perform better when
> there is a competing sequential reader present.  For now, I've addressed
> the concerns voiced about the previous posting.
>
> Review and testing would be greatly appreciated.
>
> Thanks!
> Jeff
>
> ---
>
> New from the last round:
>
> - removed the think time calculation I added for the sync-noidle service tree
> - replaced above with a suggestion from Vivek to only guard against currently
>    active sequential readers when determining if we can preempt the sync-noidle
>    service tree.
> - bug fixes
>
> Over all, I think it's simpler now thanks to the suggestions from Jens and
> Vivek.
>
> [PATCH 1/3] block: Implement a blk_yield function to voluntarily give up the I/O scheduler.
> [PATCH 2/3] jbd: yield the device queue when waiting for commits
> [PATCH 3/3] jbd2: yield the device queue when waiting for journal commits
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>


* Re: [PATCH 0/3 v5][RFC] ext3/4: enhance fsync performance when using CFQ
  2010-06-23  9:20 ` Christoph Hellwig
@ 2010-06-23 13:03   ` Jeff Moyer
  0 siblings, 0 replies; 24+ messages in thread
From: Jeff Moyer @ 2010-06-23 13:03 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: axboe, vgoyal, linux-kernel, linux-ext4

Christoph Hellwig <hch@infradead.org> writes:

> On Tue, Jun 22, 2010 at 05:34:59PM -0400, Jeff Moyer wrote:
>> Hi,
>> 
>> Running iozone with the fsync flag, or fs_mark, the performance of CFQ is
>> far worse than that of deadline for enterprise class storage when dealing
>> with file sizes of 8MB or less.  I used the following command line as a
>> representative test case:
>> 
>>   fs_mark -S 1 -D 10000 -N 100000 -d /mnt/test/fs_mark -s 65536 -t 1 -w 4096 -F
>> 
>> When run using the deadline I/O scheduler, an average of the first 5 numbers
>> will give you 448.4 files / second.  CFQ will yield only 106.7.  With
>> this patch series applied (and the two patches I sent yesterday), CFQ now
>> achieves 462.5 files / second.
>> 
>> This patch set is still an RFC.  I'd like to make it perform better when
>> there is a competing sequential reader present.  For now, I've addressed
>> the concerns voiced about the previous posting.
>
> What happened to the initial idea of just using the BIO_RW_META flag
> for log writes?  In the end log writes are the most important writes you
> have in a journaled filesystem, and they should not be subject to any
> kind of queue idling logic or other interruption.  Log I/O is usually
> very small (unless you use old XFS code with a worst-case directory
> manipulation workload), and very latency sensitive.

Vivek showed that starting firefox in the presence of a process doing
fsyncs (using the RQ_META approach) took twice as long as without the
patch:
  http://lkml.org/lkml/2010/4/6/276

Cheers,
Jeff

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/3 v5][RFC] ext3/4: enhance fsync performance when using CFQ
  2010-06-23  9:30 ` Tao Ma
@ 2010-06-23 13:06   ` Jeff Moyer
  2010-06-24  5:54   ` Tao Ma
  1 sibling, 0 replies; 24+ messages in thread
From: Jeff Moyer @ 2010-06-23 13:06 UTC (permalink / raw)
  To: Tao Ma; +Cc: axboe, vgoyal, linux-kernel, linux-ext4

Tao Ma <tao.ma@oracle.com> writes:

> Hi Jeff,
>
> On 06/23/2010 05:34 AM, Jeff Moyer wrote:
>> Hi,
>>
>> Running iozone with the fsync flag, or fs_mark, the performance of CFQ is
>> far worse than that of deadline for enterprise class storage when dealing
>> with file sizes of 8MB or less.  I used the following command line as a
>> representative test case:
>>
>>    fs_mark -S 1 -D 10000 -N 100000 -d /mnt/test/fs_mark -s 65536 -t 1 -w 4096 -F
>>
>> When run using the deadline I/O scheduler, an average of the first 5 numbers
>> will give you 448.4 files / second.  CFQ will yield only 106.7.  With
>> this patch series applied (and the two patches I sent yesterday), CFQ now
>> achieves 462.5 files / second.
> which 2 patches? Could you paste the link or the subject? Just want to
> make my test env like yours. ;)
> As Joel mentioned in another mail, ocfs2 also use jbd/jbd2, so I'd
> like to give it a try and give you some feedback about the test.

http://lkml.org/lkml/2010/6/21/307:

[PATCH 1/2] cfq: always return false from should_idle if slice_idle is
set to zero
[PATCH 2/2] cfq: allow dispatching of both sync and async I/O together

Thanks in advance for the testing!

-Jeff

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/3] block: Implement a blk_yield function to voluntarily give up the I/O scheduler.
  2010-06-23  5:04   ` Andrew Morton
@ 2010-06-23 14:50     ` Jeff Moyer
  0 siblings, 0 replies; 24+ messages in thread
From: Jeff Moyer @ 2010-06-23 14:50 UTC (permalink / raw)
  To: Andrew Morton; +Cc: axboe, vgoyal, linux-kernel, linux-ext4

Andrew Morton <akpm@linux-foundation.org> writes:

> On Tue, 22 Jun 2010 17:35:00 -0400 Jeff Moyer <jmoyer@redhat.com> wrote:
>
>> This patch implements a blk_yield to allow a process to voluntarily
>> give up its I/O scheduler time slice.  This is desirable for those processes
>> which know that they will be blocked on I/O from another process, such as
>> the file system journal thread.  Following patches will put calls to blk_yield
>> into jbd and jbd2.
>> 
>
> I'm looking through this patch series trying to find the
> analysis/description of the cause for this (bad) performance problem. 
> But all I'm seeing is implementation stuff :(  It's hard to review code
> with your eyes shut.

Sorry about that, Andrew.  The problem case is (for example) iozone when
run with small file sizes (up to 8MB) configured to fsync after each
file is written.  Because the iozone process is issuing synchronous
writes, it is put onto CFQ's SYNC service tree.  The significance of
this is that CFQ will idle for up to 8ms waiting for requests on such
queues.  So, what happens is that the iozone process will issue, say,
64KB worth of write I/O.  That I/O will just land in the page cache.
Then the iozone process does an fsync, which forces those I/Os to disk
as synchronous writes: the file system's fsync method is invoked, and,
for ext3/4, it calls log_start_commit followed by log_wait_commit.
Because those synchronous writes were forced out in the context of the
iozone process, CFQ will now idle on iozone's cfqq waiting for more I/O.
However, iozone's progress is now gated by the journal thread.

So, I tried two approaches to solving the problem.  The first, which
Christoph brought up again in this thread, was to simply mark all
journal I/O as BIO_RW_META, which would cause the iozone process' cfqq
to be preempted when the journal issued its I/O.  However, Vivek pointed
out that this was bad for interactive performance.

The second approach, of which this is the fourth iteration, was to allow
the file system to explicitly tell the I/O scheduler that it is waiting
on I/O from another process.

Does that help?  Let me know if you have any more questions, and thanks
a ton for looking at this, Andrew.  I appreciate it.

The comments I've elided from my response make perfect sense, so I'll
address them in the next posting.

>>  };
>>  #define CFQ_RB_ROOT	(struct cfq_rb_root) { .rb = RB_ROOT, .left = NULL, \
>> -			.count = 0, .min_vdisktime = 0, }
>> +			.count = 0, .min_vdisktime = 0, .last_expiry = 0UL, \
>> +			.last_pid = (pid_t)-1, }
>
> May as well leave the 0 and NULL fields unmentioned (ie: don't do
> crappy stuff because the old code did crappy stuff!)

I don't actually understand why you take issue with this.

>> +{
>> +	struct cfq_data *cfqd = q->elevator->elevator_data;
>> +	struct cfq_io_context *cic, *new_cic;
>> +	struct cfq_queue *cfqq;
>> +
>> +	cic = cfq_cic_lookup(cfqd, current->io_context);
>> +	if (!cic)
>> +		return;
>> +
>> +	task_lock(tsk);
>> +	new_cic = cfq_cic_lookup(cfqd, tsk->io_context);
>> +	atomic_long_inc(&tsk->io_context->refcount);
>
> How do we know tsk has an io_context?  Use get_io_context() and check
> its result?

I'll fix that up.  It works now only by luck (and the fact that there's
a good chance the journal thread has an i/o context).

>> +	task_unlock(tsk);
>> +	if (!new_cic)
>> +		goto out_dec;
>> +
>> +	spin_lock_irq(q->queue_lock);
>> +
>> +	cfqq = cic_to_cfqq(cic, 1);
>> +	if (!cfqq)
>> +		goto out_unlock;
>> +
>> +	/*
>> +	 * If we are currently servicing the SYNC_NOIDLE_WORKLOAD, and we
>> +	 * are idling on the last queue in that workload, *and* there are no
>> +	 * potential dependent readers running currently, then go ahead and
>> +	 * yield the queue.
>> +	 */
>
> Comment explains the code, but doesn't explain the *reason* for the code.

Actually, it explains more than just what the code does.  It would be
difficult for one to divine that the code actually only really cares
about breaking up a currently running dependent reader.  I'll see if I
can make that more clear.

Cheers,
Jeff

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/3] block: Implement a blk_yield function to voluntarily give up the I/O scheduler.
  2010-06-22 21:35 ` [PATCH 1/3] block: Implement a blk_yield function to voluntarily give up the I/O scheduler Jeff Moyer
  2010-06-23  5:04   ` Andrew Morton
@ 2010-06-24  0:46   ` Vivek Goyal
  2010-06-25 16:51     ` Jeff Moyer
  1 sibling, 1 reply; 24+ messages in thread
From: Vivek Goyal @ 2010-06-24  0:46 UTC (permalink / raw)
  To: Jeff Moyer; +Cc: axboe, linux-kernel, linux-ext4

On Tue, Jun 22, 2010 at 05:35:00PM -0400, Jeff Moyer wrote:

[..]
> @@ -1614,6 +1620,15 @@ __cfq_slice_expired(struct cfq_data *cfqd, struct cfq_queue *cfqq,
>  	cfq_clear_cfqq_wait_request(cfqq);
>  	cfq_clear_cfqq_wait_busy(cfqq);
>  
> +	if (!cfq_cfqq_yield(cfqq)) {
> +		struct cfq_rb_root *st;
> +		st = service_tree_for(cfqq->cfqg,
> +				      cfqq_prio(cfqq), cfqq_type(cfqq));
> +		st->last_expiry = jiffies;
> +		st->last_pid = cfqq->pid;
> +	}
> +	cfq_clear_cfqq_yield(cfqq);

Jeff, I think cfqq is still on service tree at this point of time. If yes,
then we can simply use cfqq->service_tree, instead of calling
service_tree_for().

No clearing of cfqq->yield_to field?

[..]
>  /*
>   * Select a queue for service. If we have a current active queue,
>   * check whether to continue servicing it, or retrieve and set a new one.
> @@ -2187,6 +2232,10 @@ static struct cfq_queue *cfq_select_queue(struct cfq_data *cfqd)
>  		 * have been idling all along on this queue and it should be
>  		 * ok to wait for this request to complete.
>  		 */
> +		if (cfq_cfqq_yield(cfqq) &&
> +		    cfq_should_yield_now(cfqq, &new_cfqq))
> +			goto expire;
> +

I think we can get rid of this condition here and move the yield check
outside the if condition above. That if condition waits for a request from
this queue to complete and for the queue to get busy before slice expiry.
If we have decided to yield the queue, there is no point in waiting for
the next request to make the queue busy.

>  		if (cfqq->cfqg->nr_cfqq == 1 && RB_EMPTY_ROOT(&cfqq->sort_list)
>  		    && cfqq->dispatched && cfq_should_idle(cfqd, cfqq)) {
>  			cfqq = NULL;
> @@ -2215,6 +2264,9 @@ static struct cfq_queue *cfq_select_queue(struct cfq_data *cfqd)
>  		goto expire;
>  	}
>  
> +	if (cfq_cfqq_yield(cfqq) && cfq_should_yield_now(cfqq, &new_cfqq))
> +		goto expire;
> +

We can move this check up.
 
[..]
> +static void cfq_yield(struct request_queue *q, struct task_struct *tsk)
> +{
> +	struct cfq_data *cfqd = q->elevator->elevator_data;
> +	struct cfq_io_context *cic, *new_cic;
> +	struct cfq_queue *cfqq;
> +
> +	cic = cfq_cic_lookup(cfqd, current->io_context);
> +	if (!cic)
> +		return;
> +
> +	task_lock(tsk);
> +	new_cic = cfq_cic_lookup(cfqd, tsk->io_context);
> +	atomic_long_inc(&tsk->io_context->refcount);
> +	task_unlock(tsk);
> +	if (!new_cic)
> +		goto out_dec;
> +
> +	spin_lock_irq(q->queue_lock);
> +
> +	cfqq = cic_to_cfqq(cic, 1);
> +	if (!cfqq)
> +		goto out_unlock;
> +
> +	/*
> +	 * If we are currently servicing the SYNC_NOIDLE_WORKLOAD, and we
> +	 * are idling on the last queue in that workload, *and* there are no
> +	 * potential dependent readers running currently, then go ahead and
> +	 * yield the queue.
> +	 */
> +	if (cfqd->active_queue == cfqq &&
> +	    cfqd->serving_type == SYNC_NOIDLE_WORKLOAD) {
> +		/*
> +		 * If there's been no I/O from another process in the idle
> +		 * slice time, then there is by definition no dependent
> +		 * read going on for this service tree.
> +		 */
> +		if (expiry_data_valid(cfqq->service_tree) &&
> +		    time_before(cfqq->service_tree->last_expiry +
> +				cfq_slice_idle, jiffies) &&
> +		    cfqq->service_tree->last_pid != cfqq->pid)
> +			goto out_unlock;
> +	}
> +
> +	cfq_log_cfqq(cfqd, cfqq, "yielding queue to %d", tsk->pid);
> +	cfqq->yield_to = new_cic;

We are stashing away a pointer to cic without taking reference?

> +	cfq_mark_cfqq_yield(cfqq);
> +
> +out_unlock:
> +	spin_unlock_irq(q->queue_lock);
> +out_dec:
> +	put_io_context(tsk->io_context);
> +}
> +
>  static int __cfq_forced_dispatch_cfqq(struct cfq_queue *cfqq)
>  {
>  	int dispatched = 0;
> @@ -3123,6 +3234,13 @@ cfq_should_preempt(struct cfq_data *cfqd, struct cfq_queue *new_cfqq,
>  	if (!cfqq)
>  		return false;
>  
> +	/*
> +	 * If the active queue yielded its timeslice to this queue, let
> +	 * it preempt.
> +	 */
> +	if (cfq_cfqq_yield(cfqq) && RQ_CIC(rq) == cfqq->yield_to)
> +		return true;
> +

I think we need to check again: if we are on the sync-noidle workload, then
allow preemption only if no dependent read is currently in progress;
otherwise the sync-noidle service tree loses its share.

This version looks much simpler than previous one and is much easier
to understand. I will do some testing on friday and provide you feedback.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/3 v5][RFC] ext3/4: enhance fsync performance when using CFQ
  2010-06-23  9:30 ` Tao Ma
  2010-06-23 13:06   ` Jeff Moyer
@ 2010-06-24  5:54   ` Tao Ma
  2010-06-24 14:56     ` Jeff Moyer
  2010-06-27 13:48     ` Jeff Moyer
  1 sibling, 2 replies; 24+ messages in thread
From: Tao Ma @ 2010-06-24  5:54 UTC (permalink / raw)
  To: Tao Ma
  Cc: Jeff Moyer, axboe, vgoyal, linux-kernel, linux-ext4, Joel Becker,
	Sunil Mushran, ocfs2-devel

[-- Attachment #1: Type: text/plain, Size: 1180 bytes --]

Hi Jeff,

On 06/23/2010 05:30 PM, Tao Ma wrote:
> Hi Jeff,
>
> On 06/23/2010 05:34 AM, Jeff Moyer wrote:
>> Hi,
>>
>> Running iozone with the fsync flag, or fs_mark, the performance of CFQ is
>> far worse than that of deadline for enterprise class storage when dealing
>> with file sizes of 8MB or less. I used the following command line as a
>> representative test case:
>>
>> fs_mark -S 1 -D 10000 -N 100000 -d /mnt/test/fs_mark -s 65536 -t 1 -w
>> 4096 -F
>>
>> When run using the deadline I/O scheduler, an average of the first 5
>> numbers
>> will give you 448.4 files / second. CFQ will yield only 106.7. With
>> this patch series applied (and the two patches I sent yesterday), CFQ now
>> achieves 462.5 files / second.
> which 2 patches? Could you paste the link or the subject? Just want to
> make my test env like yours. ;)
> As Joel mentioned in another mail, ocfs2 also use jbd/jbd2, so I'd like
> to give it a try and give you some feedback about the test.
I am sorry to say that the patch makes jbd2 lock up when I tested 
fs_mark using ocfs2.
I have attached the log from my netconsole server. After I reverted the 
patch [3/3], the box worked again.

Regards,
Tao

[-- Attachment #2: lockup.log --]
[-- Type: text/x-log, Size: 4429 bytes --]

 BUG: soft lockup - CPU#0 stuck for 61s! [jbd2/sda11-15:5456] 
 Modules linked in: ocfs2 jbd2 ocfs2_nodemanager configfs ocfs2_stackglue netconsole autofs4 hidp rfcomm l2cap crc16 bluetooth rfkill sunrpc ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi dm_mirror dm_region_hash dm_log dm_multipath dm_mod video output sbs sbshc battery acpi_memhotplug ac lp sg dcdbas sr_mod cdrom option usb_wwan usbserial serio_raw rtc_cmos rtc_core parport_pc parport rtc_lib snd_hda_codec_analog tpm_tis tpm tpm_bios button snd_hda_intel snd_hda_codec snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq e1000 tg3 snd_seq_device libphy i2c_i801 snd_pcm_oss snd_mixer_oss i2c_core snd_pcm snd_timer snd soundcore snd_page_alloc shpchp pcspkr ata_piix libata sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd [last unloaded: microcode]
  
 CPU 0 
 Modules linked in: ocfs2 jbd2 ocfs2_nodemanager configfs ocfs2_stackglue netconsole autofs4 hidp rfcomm l2cap crc16 bluetooth rfkill sunrpc ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi dm_mirror dm_region_hash dm_log dm_multipath dm_mod video output sbs sbshc battery acpi_memhotplug ac lp sg dcdbas sr_mod cdrom option usb_wwan usbserial serio_raw rtc_cmos rtc_core parport_pc parport rtc_lib snd_hda_codec_analog tpm_tis tpm tpm_bios button snd_hda_intel snd_hda_codec snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq e1000 tg3 snd_seq_device libphy i2c_i801 snd_pcm_oss snd_mixer_oss i2c_core snd_pcm snd_timer snd soundcore snd_page_alloc shpchp pcspkr ata_piix libata sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd [last unloaded: microcode]
  
  
 Pid: 5456, comm: jbd2/sda11-15 Not tainted 2.6.35-rc3+ #4 0MM599/OptiPlex 745                  
 RIP: 0010:[<ffffffff822fcfe7>] 
  [<ffffffff822fcfe7>] _raw_spin_lock+0xe/0x15 
 RSP: 0018:ffff88012780de78  EFLAGS: 00000297 
 RAX: 0000000000001d1c RBX: ffff88012fbb4000 RCX: 0000000000000000 
 RDX: 0000000000000000 RSI: ffff88012fbd9650 RDI: ffff88012fbb4024 
 RBP: ffffffff820031ce R08: ffff88012780c000 R09: 0000000000000000 
 R10: ffff88012fbd8fa8 R11: 0000000000000000 R12: ffff88013a545880 
 R13: ffffffff8202d2b5 R14: 0000000000000000 R15: ffff880001e13640 
 FS:  0000000000000000(0000) GS:ffff880001e00000(0000) knlGS:0000000000000000 
 CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b 
 CR2: 0000000000000000 CR3: 0000000127853000 CR4: 00000000000006f0 
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 
 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 
 Process jbd2/sda11-15 (pid: 5456, threadinfo ffff88012780c000, task ffff88012fbd8f60) 
 Stack: 
  ffffffffa027b955 0000000000000000 ffff88012fbd8f60 ffffffff8204de27 
  ffff88012780de98 ffff88012780de98 ffff88012fbfbae8 0000000000000292 
  ffff88012780def8 ffff88012fbb4000 ffff88012fbfbae0 ffffffffa027b7fa 
  
 Call Trace: 
  [<ffffffffa027b955>] ? kjournald2+0x15b/0x1cf [jbd2] 
  [<ffffffff8204de27>] ? autoremove_wake_function+0x0/0x2e 
  [<ffffffffa027b7fa>] ? kjournald2+0x0/0x1cf [jbd2] 
  [<ffffffff8204db33>] ? kthread+0x79/0x81 
  [<ffffffff82003614>] ? kernel_thread_helper+0x4/0x10 
  [<ffffffff8204daba>] ? kthread+0x0/0x81 
  [<ffffffff82003610>] ? kernel_thread_helper+0x0/0x10 
 Code: e0 8d 90 00 01 00 00 75 05 3e 66 0f b1 17 0f 94 c2 0f b6 c2 85 c0 0f 95 c0 0f b6 c0 c3 b8 00 01 00 00 3e 66 0f c1 07 38 e0 74 06 f3> 90 8a 07 eb f6 c3 9c 58 fa ba 00 01 00 00 3e 66 0f c1 17 38 
  
 Call Trace: 
  [<ffffffffa027b955>] ? kjournald2+0x15b/0x1cf [jbd2] 
  [<ffffffff8204de27>] ? autoremove_wake_function+0x0/0x2e 
  [<ffffffffa027b7fa>] ? kjournald2+0x0/0x1cf [jbd2] 
  [<ffffffff8204db33>] ? kthread+0x79/0x81 
  [<ffffffff82003614>] ? kernel_thread_helper+0x4/0x10 
  [<ffffffff8204daba>] ? kthread+0x0/0x81 
  [<ffffffff82003610>] ? kernel_thread_helper+0x0/0x10 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/3 v5][RFC] ext3/4: enhance fsync performance when using CFQ
  2010-06-24  5:54   ` Tao Ma
@ 2010-06-24 14:56     ` Jeff Moyer
  2010-06-27 13:48     ` Jeff Moyer
  1 sibling, 0 replies; 24+ messages in thread
From: Jeff Moyer @ 2010-06-24 14:56 UTC (permalink / raw)
  To: Tao Ma
  Cc: axboe, vgoyal, linux-kernel, linux-ext4, Joel Becker,
	Sunil Mushran, ocfs2-devel

Tao Ma <tao.ma@oracle.com> writes:

> Hi Jeff,
>
[...]
> I am sorry to say that the patch makes jbd2 lock up when I tested
> fs_mark using ocfs2.
> I have attached the log from my netconsole server. After I reverted
> the patch [3/3], the box worked again.

Thanks for the report, Tao, I'll try to reproduce it here and get back
to you.

Cheers,
Jeff

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/3] block: Implement a blk_yield function to voluntarily give up the I/O scheduler.
  2010-06-24  0:46   ` Vivek Goyal
@ 2010-06-25 16:51     ` Jeff Moyer
  2010-06-25 18:55       ` Jens Axboe
  2010-06-25 20:02       ` Vivek Goyal
  0 siblings, 2 replies; 24+ messages in thread
From: Jeff Moyer @ 2010-06-25 16:51 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: axboe, linux-kernel, linux-ext4

Vivek Goyal <vgoyal@redhat.com> writes:

> On Tue, Jun 22, 2010 at 05:35:00PM -0400, Jeff Moyer wrote:
>
> [..]
>> @@ -1614,6 +1620,15 @@ __cfq_slice_expired(struct cfq_data *cfqd, struct cfq_queue *cfqq,
>>  	cfq_clear_cfqq_wait_request(cfqq);
>>  	cfq_clear_cfqq_wait_busy(cfqq);
>>  
>> +	if (!cfq_cfqq_yield(cfqq)) {
>> +		struct cfq_rb_root *st;
>> +		st = service_tree_for(cfqq->cfqg,
>> +				      cfqq_prio(cfqq), cfqq_type(cfqq));
>> +		st->last_expiry = jiffies;
>> +		st->last_pid = cfqq->pid;
>> +	}
>> +	cfq_clear_cfqq_yield(cfqq);
>
> Jeff, I think cfqq is still on service tree at this point of time. If yes,
> then we can simply use cfqq->service_tree, instead of calling
> service_tree_for().

Yup.

> No clearing of cfqq->yield_to field?

Nope.  Again, it's not required, but if you really want me to, I'll add
it.

> [..]
>>  /*
>>   * Select a queue for service. If we have a current active queue,
>>   * check whether to continue servicing it, or retrieve and set a new one.
>> @@ -2187,6 +2232,10 @@ static struct cfq_queue *cfq_select_queue(struct cfq_data *cfqd)
>>  		 * have been idling all along on this queue and it should be
>>  		 * ok to wait for this request to complete.
>>  		 */
>> +		if (cfq_cfqq_yield(cfqq) &&
>> +		    cfq_should_yield_now(cfqq, &new_cfqq))
>> +			goto expire;
>> +
>
> I think we can get rid of this condition here and move the yield check
> outside the if condition above. That if condition waits for a request from
> this queue to complete and for the queue to get busy before slice expiry.
> If we have decided to yield the queue, there is no point in waiting for
> the next request to make the queue busy.

Yeah, this is a vestige of the older code layout.  Thanks, this cleans
things up nicely.

>> +	cfq_log_cfqq(cfqd, cfqq, "yielding queue to %d", tsk->pid);
>> +	cfqq->yield_to = new_cic;
>
> We are stashing away a pointer to cic without taking reference?

There is no reference counting on the cic.

>> @@ -3123,6 +3234,13 @@ cfq_should_preempt(struct cfq_data *cfqd, struct cfq_queue *new_cfqq,
>>  	if (!cfqq)
>>  		return false;
>>  
>> +	/*
>> +	 * If the active queue yielded its timeslice to this queue, let
>> +	 * it preempt.
>> +	 */
>> +	if (cfq_cfqq_yield(cfqq) && RQ_CIC(rq) == cfqq->yield_to)
>> +		return true;
>> +
>
> I think we need to check again: if we are on the sync-noidle workload, then
> allow preemption only if no dependent read is currently in progress;
> otherwise the sync-noidle service tree loses its share.

I think you mean don't yield if there is a dependent reader.  Yeah,
makes sense.

> This version looks much simpler than previous one and is much easier
> to understand. I will do some testing on friday and provide you feedback.

Great, thanks again for the review!

Cheers,
Jeff

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/3] block: Implement a blk_yield function to voluntarily give up the I/O scheduler.
  2010-06-25 16:51     ` Jeff Moyer
@ 2010-06-25 18:55       ` Jens Axboe
  2010-06-25 19:57         ` Jeff Moyer
  2010-06-25 20:02       ` Vivek Goyal
  1 sibling, 1 reply; 24+ messages in thread
From: Jens Axboe @ 2010-06-25 18:55 UTC (permalink / raw)
  To: Jeff Moyer; +Cc: Vivek Goyal, linux-kernel, linux-ext4

On 25/06/10 18.51, Jeff Moyer wrote:
>>> +	cfq_log_cfqq(cfqd, cfqq, "yielding queue to %d", tsk->pid);
>>> +	cfqq->yield_to = new_cic;
>>
>> We are stashing away a pointer to cic without taking reference?
> 
> There is no reference counting on the cic.

Not on the cic itself, but on the io context it belongs to. So you
need to grab a reference to that, if you are stowing a reference
to the cic.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/3] block: Implement a blk_yield function to voluntarily give up the I/O scheduler.
  2010-06-25 18:55       ` Jens Axboe
@ 2010-06-25 19:57         ` Jeff Moyer
  0 siblings, 0 replies; 24+ messages in thread
From: Jeff Moyer @ 2010-06-25 19:57 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Vivek Goyal, linux-kernel, linux-ext4

Jens Axboe <axboe@kernel.dk> writes:

> On 25/06/10 18.51, Jeff Moyer wrote:
>>>> +	cfq_log_cfqq(cfqd, cfqq, "yielding queue to %d", tsk->pid);
>>>> +	cfqq->yield_to = new_cic;
>>>
>>> We are stashing away a pointer to cic without taking reference?
>> 
>> There is no reference counting on the cic.
>
> Not on the cic itself, but on the io context it belongs to. So you
> need to grab a reference to that, if you are stowing a reference
> to the cic.

OK, easy enough.  Thanks!

Jeff

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/3] block: Implement a blk_yield function to voluntarily give up the I/O scheduler.
  2010-06-25 16:51     ` Jeff Moyer
  2010-06-25 18:55       ` Jens Axboe
@ 2010-06-25 20:02       ` Vivek Goyal
  1 sibling, 0 replies; 24+ messages in thread
From: Vivek Goyal @ 2010-06-25 20:02 UTC (permalink / raw)
  To: Jeff Moyer; +Cc: axboe, linux-kernel, linux-ext4

On Fri, Jun 25, 2010 at 12:51:58PM -0400, Jeff Moyer wrote:
> Vivek Goyal <vgoyal@redhat.com> writes:
> 
> > On Tue, Jun 22, 2010 at 05:35:00PM -0400, Jeff Moyer wrote:
> >
> > [..]
> >> @@ -1614,6 +1620,15 @@ __cfq_slice_expired(struct cfq_data *cfqd, struct cfq_queue *cfqq,
> >>  	cfq_clear_cfqq_wait_request(cfqq);
> >>  	cfq_clear_cfqq_wait_busy(cfqq);
> >>  
> >> +	if (!cfq_cfqq_yield(cfqq)) {
> >> +		struct cfq_rb_root *st;
> >> +		st = service_tree_for(cfqq->cfqg,
> >> +				      cfqq_prio(cfqq), cfqq_type(cfqq));
> >> +		st->last_expiry = jiffies;
> >> +		st->last_pid = cfqq->pid;
> >> +	}
> >> +	cfq_clear_cfqq_yield(cfqq);
> >
> > Jeff, I think cfqq is still on service tree at this point of time. If yes,
> > then we can simply use cfqq->service_tree, instead of calling
> > service_tree_for().
> 
> Yup.
> 
> > No clearing of cfqq->yield_to field?
> 
> Nope.  Again, it's not required, but if you really want me to, I'll add
> it.

I think clearing up is better as it leaves no scope for confusion.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/3 v5][RFC] ext3/4: enhance fsync performance when using CFQ
  2010-06-24  5:54   ` Tao Ma
  2010-06-24 14:56     ` Jeff Moyer
@ 2010-06-27 13:48     ` Jeff Moyer
  2010-06-28  6:41       ` Tao Ma
  1 sibling, 1 reply; 24+ messages in thread
From: Jeff Moyer @ 2010-06-27 13:48 UTC (permalink / raw)
  To: Tao Ma
  Cc: axboe, vgoyal, linux-kernel, linux-ext4, Joel Becker,
	Sunil Mushran, ocfs2-devel

Tao Ma <tao.ma@oracle.com> writes:

> Hi Jeff,
>
> On 06/23/2010 05:30 PM, Tao Ma wrote:
>> Hi Jeff,
>>
>> On 06/23/2010 05:34 AM, Jeff Moyer wrote:
>>> Hi,
>>>
>>> Running iozone with the fsync flag, or fs_mark, the performance of CFQ is
>>> far worse than that of deadline for enterprise class storage when dealing
>>> with file sizes of 8MB or less. I used the following command line as a
>>> representative test case:
>>>
>>> fs_mark -S 1 -D 10000 -N 100000 -d /mnt/test/fs_mark -s 65536 -t 1 -w
>>> 4096 -F
>>>
>>> When run using the deadline I/O scheduler, an average of the first 5
>>> numbers
>>> will give you 448.4 files / second. CFQ will yield only 106.7. With
>>> this patch series applied (and the two patches I sent yesterday), CFQ now
>>> achieves 462.5 files / second.
>> which 2 patches? Could you paste the link or the subject? Just want to
>> make my test env like yours. ;)
>> As Joel mentioned in another mail, ocfs2 also use jbd/jbd2, so I'd like
>> to give it a try and give you some feedback about the test.
> I am sorry to say that the patch makes jbd2 lock up when I tested
> fs_mark using ocfs2.
> I have attached the log from my netconsole server. After I reverted
> the patch [3/3], the box worked again.

I can't reproduce this, unfortunately.  Also, when building with the
.config you sent me, the disassembly doesn't line up with the stack
trace you posted.

I'm not sure why yielding the queue would cause a deadlock.  The only
explanation I can come up with is that I/O is not being issued.  I'm
assuming that no other I/O will be completed to the file system in
question.  Is that right?  Could you send along the output from sysrq-t?

Cheers,
Jeff

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/3 v5][RFC] ext3/4: enhance fsync performance when using CFQ
  2010-06-27 13:48     ` Jeff Moyer
@ 2010-06-28  6:41       ` Tao Ma
  2010-06-28 13:58         ` Jeff Moyer
  2010-06-29 14:56         ` Jeff Moyer
  0 siblings, 2 replies; 24+ messages in thread
From: Tao Ma @ 2010-06-28  6:41 UTC (permalink / raw)
  To: Jeff Moyer
  Cc: axboe, vgoyal, linux-kernel, linux-ext4, Joel Becker,
	Sunil Mushran, ocfs2-devel

[-- Attachment #1: Type: text/plain, Size: 2226 bytes --]

Hi Jeff,

On 06/27/2010 09:48 PM, Jeff Moyer wrote:
> Tao Ma<tao.ma@oracle.com>  writes:
>
>> Hi Jeff,
>>
>> On 06/23/2010 05:30 PM, Tao Ma wrote:
>>> Hi Jeff,
>>>
>>> On 06/23/2010 05:34 AM, Jeff Moyer wrote:
>>>> Hi,
>>>>
>>>> Running iozone with the fsync flag, or fs_mark, the performance of CFQ is
>>>> far worse than that of deadline for enterprise class storage when dealing
>>>> with file sizes of 8MB or less. I used the following command line as a
>>>> representative test case:
>>>>
>>>> fs_mark -S 1 -D 10000 -N 100000 -d /mnt/test/fs_mark -s 65536 -t 1 -w
>>>> 4096 -F
>>>>
>>>> When run using the deadline I/O scheduler, an average of the first 5
>>>> numbers
>>>> will give you 448.4 files / second. CFQ will yield only 106.7. With
>>>> this patch series applied (and the two patches I sent yesterday), CFQ now
>>>> achieves 462.5 files / second.
>>> which 2 patches? Could you paste the link or the subject? Just want to
>>> make my test env like yours. ;)
>>> As Joel mentioned in another mail, ocfs2 also use jbd/jbd2, so I'd like
>>> to give it a try and give you some feedback about the test.
>> I am sorry to say that the patch makes jbd2 lock up when I tested
>> fs_mark using ocfs2.
>> I have attached the log from my netconsole server. After I reverted
>> the patch [3/3], the box worked again.
>
> I can't reproduce this, unfortunately.  Also, when building with the
> .config you sent me, the disassembly doesn't line up with the stack
> trace you posted.
>
> I'm not sure why yielding the queue would cause a deadlock.  The only
> explanation I can come up with is that I/O is not being issued.  I'm
> assuming that no other I/O will be completed to the file system in
> question.  Is that right?  Could you send along the output from sysrq-t?
yes, I just mounted it and began the test, so there should be no 
outstanding I/O. So do you need me to set up another disk for the test?
I have attached the sysrq output in sysrq.log. Please check.

btw, I also met with a NULL pointer dereference in cfq_yield. I have 
attached the null.log also. This seems to be related to the previous 
deadlock and happens when I try to remount the same volume after reboot 
and ocfs2 tries to do some recovery.

Regards,
Tao

[-- Attachment #2: sysrq.log --]
[-- Type: text/x-log, Size: 33613 bytes --]

 SysRq : 
 Show State 
   task                        PC stack   pid father 
 init          R  running task        0     1      0 0x00000000 
  ffff88013badb510 0000000000000082 ffffffff8271b020 0000000001e0ef88 
  0000000000000286 ffffffff82050937 0000000000000286 ffffffff820507e7 
  0000000000000286 ffff88013badb510 00000000004c4b3f ffff88013baddad8 
  
 Call Trace: 
  [<ffffffff82050937>] ? __hrtimer_start_range_ns+0x12f/0x142 
  [<ffffffff820507e7>] ? hrtimer_try_to_cancel+0x90/0x9b 
  [<ffffffff822fc5c2>] ? schedule_hrtimeout_range_clock+0xc4/0xeb 
  [<ffffffff820503ee>] ? hrtimer_wakeup+0x0/0x22 
  [<ffffffff820cfbc1>] ? poll_schedule_timeout+0x43/0x60 
  [<ffffffff820d053c>] ? do_select+0x46c/0x4ed 
  [<ffffffff820d0867>] ? __pollwait+0x0/0xd9 
  [<ffffffff820d0940>] ? pollwake+0x0/0x54 
  [<ffffffff820339ce>] ? __wake_up+0x30/0x44 
  [<ffffffffa00245fa>] ? journal_stop+0x20f/0x222 [jbd] 
  [<ffffffff821698f2>] ? number+0x121/0x223 
  [<ffffffffa0043d2e>] ? __ext3_journal_stop+0x1f/0x3d [ext3] 
  [<ffffffffa003dd48>] ? ext3_writeback_write_end+0x94/0xb7 [ext3] 
  [<ffffffff8216b769>] ? vsnprintf+0x3e4/0x421 
  [<ffffffff820d217d>] ? __d_lookup+0xb7/0xf6 
  [<ffffffff820ca799>] ? do_lookup+0x89/0x1e2 
  [<ffffffff820d164b>] ? dput+0x27/0x116 
  [<ffffffff820ccc4c>] ? link_path_walk+0x90a/0x930 
  [<ffffffff820d079e>] ? core_sys_select+0x1e1/0x2aa 
  [<ffffffff820cdc25>] ? user_path_at+0x52/0x79 
  [<ffffffff820c5d05>] ? cp_new_stat+0xea/0x101 
  [<ffffffff8203b7e7>] ? timespec_add_safe+0x37/0x66 
  [<ffffffff820d0bda>] ? sys_select+0x9a/0xc3 
  [<ffffffff8200286b>] ? system_call_fastpath+0x16/0x1b 
 kthreadd      S 0000000000000000     0     2      0 0x00000000 
  ffff88013badae20 0000000000000046 ffff88013747b550 00000000ffffffff
  ffffffff82003610 0000000000000010 0000000000000202 0000000000000000
  0000000000000018 ffff88013badae20 ffff880128015ac8 0000000000000000
 Call Trace: 
  [<ffffffff82003610>] ? kernel_thread_helper+0x0/0x10 
  [<ffffffff8204dcbf>] ? kthreadd+0x72/0xeb 
  [<ffffffff82003614>] ? kernel_thread_helper+0x4/0x10 
  [<ffffffff8204dc4d>] ? kthreadd+0x0/0xeb 
  [<ffffffff82003610>] ? kernel_thread_helper+0x0/0x10 
 ksoftirqd/0   S 0000000000000000     0     3      2 0x00000000 
  ffff88013bada730 0000000000000046 ffff88013bae8e60 0000000000013640
  ffff88013bada730 0000000000000000 ffffffff8272bbe0 ffffffff822fb8de
  ffff88013badb510 0000000000000000 ffffffff82791c00 ffff88013baddd88
 Call Trace: 
  [<ffffffff822fb8de>] ? schedule+0x616/0x6c4 
  [<ffffffff8203c2a6>] ? run_ksoftirqd+0x0/0x116 
  [<ffffffff8203c2ed>] ? run_ksoftirqd+0x47/0x116 
  [<ffffffff8204db33>] ? kthread+0x79/0x81 
  [<ffffffff82003614>] ? kernel_thread_helper+0x4/0x10 
  [<ffffffff8204daba>] ? kthread+0x0/0x81 
  [<ffffffff82003610>] ? kernel_thread_helper+0x0/0x10 
 migration/0   S 0000000000000000     0     4      2 0x00000000 
  ffff88013bada040 0000000000000046 ffff88013764cea0 000000008205d592
  000000033badb510 ffff88013abbfe68 ffffffff820651c6 ffff88013abbfe70
  0000000000000282 ffff880001e0f000 ffffffff820651c6 ffff88013abbfed8
 Call Trace: 
  [<ffffffff820651c6>] ? stop_machine_cpu_stop+0x0/0x8f 
  [<ffffffff820651c6>] ? stop_machine_cpu_stop+0x0/0x8f 
  [<ffffffff82065696>] ? cpu_stopper_thread+0x119/0x13f 
  [<ffffffff8202d2b5>] ? finish_task_switch+0x33/0x80 
  [<ffffffff822fb8de>] ? schedule+0x616/0x6c4 
  [<ffffffff8206557d>] ? cpu_stopper_thread+0x0/0x13f 
  [<ffffffff8204db33>] ? kthread+0x79/0x81 
  [<ffffffff82003614>] ? kernel_thread_helper+0x4/0x10 
  [<ffffffff8204daba>] ? kthread+0x0/0x81 
  [<ffffffff82003610>] ? kernel_thread_helper+0x0/0x10 
 watchdog/0    R  running task        0     5      2 0x00000000 
  ffff88013bae9550 0000000000000046 ffffffff8271b020 0000000000000086
  01ff88013bae9550 ffffffff82051a4d 0000000000000286 0000000000000286
  000000000000f070 ffff88013bae9550 0000000000000000 ffff88013baddd98
 Call Trace: 
  [<ffffffff82051a4d>] ? sched_clock_local+0x9/0x6c 
  [<ffffffff8206f29a>] ? watchdog+0x0/0x9e 
  [<ffffffff8206f2e9>] ? watchdog+0x4f/0x9e 
  [<ffffffff8204db33>] ? kthread+0x79/0x81 
  [<ffffffff82003614>] ? kernel_thread_helper+0x4/0x10 
  [<ffffffff8204daba>] ? kthread+0x0/0x81 
  [<ffffffff82003610>] ? kernel_thread_helper+0x0/0x10 
 events/0      R  running task        0     6      2 0x00000000 
  ffff88013bae8e60 0000000000000046 ffff88013b032ab0 0000000082042340
  ffff88013baf1e90 ffff880001e16318 0000000000000001 ffffffff8204dff0
  0000000000000246 ffff88013baf1ef8 ffffffff82747420 ffff880001e16300
 Call Trace: 
  [<ffffffff8204dff0>] ? prepare_to_wait+0x35/0x6d 
  [<ffffffff821dc54f>] ? console_callback+0x0/0xf5 
  [<ffffffff8204af28>] ? worker_thread+0x93/0x1e4 
  [<ffffffff8204de27>] ? autoremove_wake_function+0x0/0x2e 
  [<ffffffff8204ae95>] ? worker_thread+0x0/0x1e4 
  [<ffffffff8204db33>] ? kthread+0x79/0x81 
  [<ffffffff82003614>] ? kernel_thread_helper+0x4/0x10 
  [<ffffffff8204daba>] ? kthread+0x0/0x81 
  [<ffffffff82003610>] ? kernel_thread_helper+0x0/0x10 
 khelper       S ffffffff82049cae     0     7      2 0x00000000 
  ffff88013bae8770 0000000000000046 ffff88013747b550 0000000000000000
  ffff88013baf3e90 ffff880001e16398 0000000000000001 ffffffff8204dff0
  ffff880001e16380 ffff88013baf3ef8 ffff88012f9872c0 ffff880001e16380
 Call Trace: 
  [<ffffffff8204dff0>] ? prepare_to_wait+0x35/0x6d 
  [<ffffffff82049cae>] ? __call_usermodehelper+0x0/0x6f 
  [<ffffffff8204af28>] ? worker_thread+0x93/0x1e4 
  [<ffffffff8204de27>] ? autoremove_wake_function+0x0/0x2e 
  [<ffffffff8204ae95>] ? worker_thread+0x0/0x1e4 
  [<ffffffff8204db33>] ? kthread+0x79/0x81 
  [<ffffffff82003614>] ? kernel_thread_helper+0x4/0x10 
  [<ffffffff8204daba>] ? kthread+0x0/0x81 
  [<ffffffff82003610>] ? kernel_thread_helper+0x0/0x10 
 async/mgr     S 0000000000000000     0    45      2 0x00000000 
  ffff88013bb62970 0000000000000046 ffff8801374be9b0 000000008202e0e1
  ffffffff8272bbe0 000000008202d2b5 0000000000000003 0000000000013640
  0000000000000282 0000000000000001 0000000000000001 ffff88013badddb8
 Call Trace: 
  [<ffffffff82052a50>] ? async_manager_thread+0x0/0xe8 
  [<ffffffff82052b12>] ? async_manager_thread+0xc2/0xe8 
  [<ffffffff8202e0f3>] ? default_wake_function+0x0/0x9 
  [<ffffffff82052a50>] ? async_manager_thread+0x0/0xe8 
  [<ffffffff8204db33>] ? kthread+0x79/0x81 
  [<ffffffff82003614>] ? kernel_thread_helper+0x4/0x10 
  [<ffffffff8204daba>] ? kthread+0x0/0x81 
  [<ffffffff82003610>] ? kernel_thread_helper+0x0/0x10 
 sync_supers   R  running task        0   153      2 0x00000000 
  ffff88013b048040 0000000000000046 ffffffff8271b020 0000000000000000
  ffffffff8272bbe0 ffffffff822fb8de ffff88013badb510 0000000082027cc9
  000000033b093610 ffff88013b095ef8 0000000000000000 ffff88013baddda8
 Call Trace: 
  [<ffffffff822fb8de>] ? schedule+0x616/0x6c4 
  [<ffffffff820a1e15>] ? bdi_sync_supers+0x0/0x54 
  [<ffffffff820a1e54>] ? bdi_sync_supers+0x3f/0x54 
  [<ffffffff820a1e15>] ? bdi_sync_supers+0x0/0x54 
  [<ffffffff8204db33>] ? kthread+0x79/0x81 
  [<ffffffff82003614>] ? kernel_thread_helper+0x4/0x10 
  [<ffffffff8204daba>] ? kthread+0x0/0x81 
  [<ffffffff82003610>] ? kernel_thread_helper+0x0/0x10 
 bdi-default   R  running task        0   155      2 0x00000000 
  ffff88013b048e20 0000000000000046 ffffffff8271b020 00000000ffffffff
  ffff88013b08ddf0 00000000fffda363 0000000000000286 ffffffff820420c2
  0000000000000286 ffffffff82894dc0 ffffffff82894dc0 00000000fffdb6eb
 Call Trace: 
  [<ffffffff820420c2>] ? try_to_del_timer_sync+0xa4/0xaf 
  [<ffffffff822fbe0d>] ? schedule_timeout+0x1ba/0x1e0 
  [<ffffffff82042139>] ? process_timeout+0x0/0x5 
  [<ffffffff820a246f>] ? bdi_forker_task+0x160/0x295 
  [<ffffffff822fb8de>] ? schedule+0x616/0x6c4 
  [<ffffffff820a230f>] ? bdi_forker_task+0x0/0x295 
  [<ffffffff8204db33>] ? kthread+0x79/0x81 
  [<ffffffff82003614>] ? kernel_thread_helper+0x4/0x10 
  [<ffffffff8204daba>] ? kthread+0x0/0x81 
  [<ffffffff82003610>] ? kernel_thread_helper+0x0/0x10 
 kblockd/0     S ffffffff8216160d     0   156      2 0x00000000 
  ffff88013b049510 0000000000000046 ffff88013747b550 000000003a1caa20
  ffff88013b081e90 ffff880001e16498 0000000000000001 ffffffff8204dff0
  ffff88013a1caa20 ffff88013b081ef8 ffff88013a2a5b48 ffff880001e16480
 Call Trace: 
  [<ffffffff8204dff0>] ? prepare_to_wait+0x35/0x6d 
  [<ffffffff8216160d>] ? cfq_kick_queue+0x0/0x3a 
  [<ffffffff8204af28>] ? worker_thread+0x93/0x1e4 
  [<ffffffff8204de27>] ? autoremove_wake_function+0x0/0x2e 
  [<ffffffff8204ae95>] ? worker_thread+0x0/0x1e4 
  [<ffffffff8204db33>] ? kthread+0x79/0x81 
  [<ffffffff82003614>] ? kernel_thread_helper+0x4/0x10 
  [<ffffffff8204daba>] ? kthread+0x0/0x81 
  [<ffffffff82003610>] ? kernel_thread_helper+0x0/0x10 
 kacpid        S ffffffff82198b60     0   158      2 0x00000000 
  ffff88013b03caf0 0000000000000046 ffff88013b03d1e0 000000008202cfe7
  ffff88013bbcfe90 ffff880001e16518 0000000000000001 ffffffff8204dff0
  ffffffff8272bbe0 ffff88013bbcfef8 ffff88013bbb6620 ffff880001e16500
 Call Trace: 
  [<ffffffff8204dff0>] ? prepare_to_wait+0x35/0x6d 
  [<ffffffff82198b60>] ? bind_to_cpu0+0x0/0x22 
  [<ffffffff8204af28>] ? worker_thread+0x93/0x1e4 
  [<ffffffff8204de27>] ? autoremove_wake_function+0x0/0x2e 
  [<ffffffff8204ae95>] ? worker_thread+0x0/0x1e4 
  [<ffffffff8204db33>] ? kthread+0x79/0x81 
  [<ffffffff82003614>] ? kernel_thread_helper+0x4/0x10 
  [<ffffffff8204daba>] ? kthread+0x0/0x81 
  [<ffffffff82003610>] ? kernel_thread_helper+0x0/0x10 
 kacpi_notify  S ffffffff82198b60     0   159      2 0x00000000 
  ffff88013b03d1e0 0000000000000046 ffff88013b03d8d0 000000008202cfe7
  ffff88013b021e90 ffff880001e16598 0000000000000001 ffffffff8204dff0
  ffffffff8272bbe0 ffff88013b021ef8 ffff88013bbb6620 ffff880001e16580
 Call Trace: 
  [<ffffffff8204dff0>] ? prepare_to_wait+0x35/0x6d 
  [<ffffffff82198b60>] ? bind_to_cpu0+0x0/0x22 
  [<ffffffff8204af28>] ? worker_thread+0x93/0x1e4 
  [<ffffffff8204de27>] ? autoremove_wake_function+0x0/0x2e 
  [<ffffffff8204ae95>] ? worker_thread+0x0/0x1e4 
  [<ffffffff8204db33>] ? kthread+0x79/0x81 
  [<ffffffff82003614>] ? kernel_thread_helper+0x4/0x10 
  [<ffffffff8204daba>] ? kthread+0x0/0x81 
  [<ffffffff82003610>] ? kernel_thread_helper+0x0/0x10 
 kacpi_hotplug S ffffffff82198b60     0   160      2 0x00000000 
  ffff88013b03d8d0 0000000000000046 ffff88013badb510 000000008202cfe7
  ffff88013b023e90 ffff880001e16618 0000000000000001 ffffffff8204dff0
  ffffffff8272bbe0 ffff88013b023ef8 ffff88013bbb6620 ffff880001e16600
 Call Trace: 
  [<ffffffff8204dff0>] ? prepare_to_wait+0x35/0x6d 
  [<ffffffff82198b60>] ? bind_to_cpu0+0x0/0x22 
  [<ffffffff8204af28>] ? worker_thread+0x93/0x1e4 
  [<ffffffff8204de27>] ? autoremove_wake_function+0x0/0x2e 
  [<ffffffff8204ae95>] ? worker_thread+0x0/0x1e4 
  [<ffffffff8204db33>] ? kthread+0x79/0x81 
  [<ffffffff82003614>] ? kernel_thread_helper+0x4/0x10 
  [<ffffffff8204daba>] ? kthread+0x0/0x81 
  [<ffffffff82003610>] ? kernel_thread_helper+0x0/0x10 
 khubd         S 0000000000000001     0   245      2 0x00000000 
  ffff88013b052770 0000000000000046 ffff88013bb63060 0000000000000004
  ffff88013bbd9e70 ffffffff82752610 0000000000000001 ffffffff8204dff0
  ffff88013a107230 ffffffff82752600 0000000000000000 ffff88013a107200
 Call Trace: 
  [<ffffffff8204dff0>] ? prepare_to_wait+0x35/0x6d 
  [<ffffffff8221b6c0>] ? hub_thread+0xcee/0xddc 
  [<ffffffff8202d2b5>] ? finish_task_switch+0x33/0x80 
  [<ffffffff8204de27>] ? autoremove_wake_function+0x0/0x2e 
  [<ffffffff8221a9d2>] ? hub_thread+0x0/0xddc 
  [<ffffffff8204db33>] ? kthread+0x79/0x81 
  [<ffffffff82003614>] ? kernel_thread_helper+0x4/0x10 
  [<ffffffff8204daba>] ? kthread+0x0/0x81 
  [<ffffffff82003610>] ? kernel_thread_helper+0x0/0x10 
 kseriod       S ffff88013b0ab5c0     0   248      2 0x00000000 
  ffff88013b088100 0000000000000046 ffff88013bb63060 00000000821f3499
  ffff88013b0b1e90 ffffffff82753c80 0000000000000001 ffffffff8204dff0
 Call Trace: 
  [<ffffffff8222cc37>] ? serio_thread+0x0/0x2c5 
  0000000000000000 
  ffffffff820420c2
  ffffffff82894dc0
  [<ffffffff82042139>] ? process_timeout+0x0/0x5 
  [<ffffffff82003614>] ? kernel_thread_helper+0x4/0x10 
 kswapd0       S
  ffffffff8204dff0
  [<ffffffff8204dff0>] ? prepare_to_wait+0x35/0x6d 
  [<ffffffff8209b521>] ? kswapd+0x0/0x591 
  [<ffffffff8204db33>] ? kthread+0x79/0x81 
     0   305      2 0x00000000 
  
  [<ffffffff8204ae95>] ? worker_thread+0x0/0x1e4 
  [<ffffffff82003614>] ? kernel_thread_helper+0x4/0x10 
  0000000000000001
 Call Trace: 
  [<ffffffff82003614>] ? kernel_thread_helper+0x4/0x10 
  0000000000000000
  [<ffffffff8204dff0>] ? prepare_to_wait+0x35/0x6d 
  [<ffffffff8204db33>] ? kthread+0x79/0x81 
  
  [<ffffffff8204ae95>] ? worker_thread+0x0/0x1e4 
  ffff880001e1ab98
  [<ffffffff8204af28>] ? worker_thread+0x93/0x1e4 
  0000000000000000 
  
  [<ffffffff8204af28>] ? worker_thread+0x93/0x1e4 
  [<ffffffff8204de27>] ? autoremove_wake_function+0x0/0x2e 
  ffff88013bb4cfe0
 Call Trace: 
  [<ffffffff82003614>] ? kernel_thread_helper+0x4/0x10 
  ffff88013a2a5010
  [<ffffffffa0064d38>] ? scsi_error_handler+0x0/0x576 [scsi_mod] 
  [<ffffffff8204db33>] ? kthread+0x79/0x81 
  0000000000000293
  ffff88013a342800
  [<ffffffffa0064d38>] ? scsi_error_handler+0x0/0x576 [scsi_mod] 
  ffff88013b053550
  ffff88013a342000
  [<ffffffffa0064d38>] ? scsi_error_handler+0x0/0x576 [scsi_mod] 
  [<ffffffff82003610>] ? kernel_thread_helper+0x0/0x10 
  000000003a383e18
  00000000fffdb6c4
  [<ffffffff820a224d>] ? bdi_start_fn+0x0/0xc2 
  [<ffffffff82003614>] ? kernel_thread_helper+0x4/0x10 
  0000000000000046
  0000000000000000
  [<ffffffff822fbe0d>] ? schedule_timeout+0x1ba/0x1e0 
  [<ffffffff8204de16>] ? wake_up_bit+0x11/0x22 
  [<ffffffff82003610>] ? kernel_thread_helper+0x0/0x10 
  0000000000000400
  ffffffff82894dc0
  [<ffffffff820a224d>] ? bdi_start_fn+0x0/0xc2 
  [<ffffffff8204db33>] ? kthread+0x79/0x81 
  ffff88013b0887f0
  0000000000000046
 Call Trace: 
  [<ffffffff8204de16>] ? wake_up_bit+0x11/0x22 
  [<ffffffff82003614>] ? kernel_thread_helper+0x4/0x10 
     0   555      2 0x00000000 
  0000000000000286
  [<ffffffff82042139>] ? process_timeout+0x0/0x5 
  [<ffffffff8204db33>] ? kthread+0x79/0x81 
  [<ffffffff82003610>] ? kernel_thread_helper+0x0/0x10 
  ffff88013baf9590
  
  [<ffffffff820a224d>] ? bdi_start_fn+0x0/0xc2 
  [<ffffffff82003614>] ? kernel_thread_helper+0x4/0x10 
 flush-1:6     R
  0000000000000000
  [<ffffffff822fbe0d>] ? schedule_timeout+0x1ba/0x1e0 
  [<ffffffff8204de16>] ? wake_up_bit+0x11/0x22 
  [<ffffffff82003610>] ? kernel_thread_helper+0x0/0x10 
  0000000000000046
  00000000fffdb6cf
  [<ffffffff820a224d>] ? bdi_start_fn+0x0/0xc2 
  [<ffffffff820a22b0>] ? bdi_start_fn+0x63/0xc2 
  [<ffffffff82003614>] ? kernel_thread_helper+0x4/0x10 
  [<ffffffff8204daba>] ? kthread+0x0/0x81 
  [<ffffffff82003610>] ? kernel_thread_helper+0x0/0x10 
 flush-1:8     R
   running task    
  000000003a3cfe18
  
  [<ffffffff822fbe0d>] ? schedule_timeout+0x1ba/0x1e0 
  [<ffffffff820a22b0>] ? bdi_start_fn+0x63/0xc2 
 flush-1:9     R
  0000000000000000
  [<ffffffff822fbe0d>] ? schedule_timeout+0x1ba/0x1e0 
  [<ffffffff820dc772>] ? bdi_writeback_task+0x79/0x104 
  [<ffffffff82003614>] ? kernel_thread_helper+0x4/0x10 
  000000003a3d5e18
  00000000fffdb6d6
  [<ffffffff822fbe0d>] ? schedule_timeout+0x1ba/0x1e0 
  [<ffffffff820a22b0>] ? bdi_start_fn+0x63/0xc2 
  [<ffffffff82003610>] ? kernel_thread_helper+0x0/0x10 
  0000000000000000
  00000000fffdb6d6
  [<ffffffff820a224d>] ? bdi_start_fn+0x0/0xc2 
  [<ffffffff82003614>] ? kernel_thread_helper+0x4/0x10 
  0000000000000046
  0000000000000000
 Call Trace: 
  [<ffffffff8204de16>] ? wake_up_bit+0x11/0x22 
  [<ffffffff82003610>] ? kernel_thread_helper+0x0/0x10 
  00000000fffda355
  
  [<ffffffff820a22b0>] ? bdi_start_fn+0x63/0xc2 
   running task    
  
  [<ffffffff820dc772>] ? bdi_writeback_task+0x79/0x104 
   running task    
  [<ffffffff822fbe0d>] ? schedule_timeout+0x1ba/0x1e0 
  [<ffffffff82003610>] ? kernel_thread_helper+0x0/0x10 
  ffff88013a3ee590
  [<ffffffff82003610>] ? kernel_thread_helper+0x0/0x10 
  ffff88013b01c340
  [<ffffffff82003610>] ? kernel_thread_helper+0x0/0x10 
  0000000000000000
  [<ffffffff820d0940>] ? pollwake+0x0/0x54 
  [<ffffffff82133275>] ? cap_file_permission+0x0/0x3 
  [<ffffffff822ffed3>] ? do_page_fault+0x324/0x346 
     0   638      2 0x00000000 
  
 Call Trace: 
  [<ffffffff822fbe0d>] ? schedule_timeout+0x1ba/0x1e0 
  [<ffffffff82042139>] ? process_timeout+0x0/0x5 
  [<ffffffff8204db33>] ? kthread+0x79/0x81 
  ffff88013b027160
  ffff88013abe37c0
  [<ffffffff8204ae95>] ? worker_thread+0x0/0x1e4 
  [<ffffffff82003614>] ? kernel_thread_helper+0x4/0x10 
  0000000000000046
  
 Call Trace: 
  [<ffffffff8204ae95>] ? worker_thread+0x0/0x1e4 
  [<ffffffff82003610>] ? kernel_thread_helper+0x0/0x10 
  ffff88013747ae60
  ffff880001e1b380
  [<ffffffff8204af28>] ? worker_thread+0x93/0x1e4 
  [<ffffffff82003614>] ? kernel_thread_helper+0x4/0x10 
  [<ffffffff8204daba>] ? kthread+0x0/0x81 
  ffffffff8204dff0
  [<ffffffff8204dff0>] ? prepare_to_wait+0x35/0x6d 
  [<ffffffff8204db33>] ? kthread+0x79/0x81 
  [<ffffffff8204daba>] ? kthread+0x0/0x81 
  
 Call Trace: 
  [<ffffffffa002907d>] ? kjournald+0x0/0x1f4 [jbd] 
  0000000000000000 
  
  
  [<ffffffff8204de27>] ? autoremove_wake_function+0x0/0x2e 
  [<ffffffff8204daba>] ? kthread+0x0/0x81 
  
  
  [<ffffffffa0398550>] ? process_req+0x0/0x12d [ib_addr] 
  [<ffffffff82003614>] ? kernel_thread_helper+0x4/0x10 
  0000000000000046
  0000000000000001
 Call Trace: 
  [<ffffffff8204ae95>] ? worker_thread+0x0/0x1e4 
  0000000000000000 
  
  
  [<ffffffff8204af28>] ? worker_thread+0x93/0x1e4 
  [<ffffffff82003610>] ? kernel_thread_helper+0x0/0x10 
  
  
  [<ffffffff8204dff0>] ? prepare_to_wait+0x35/0x6d 
  [<ffffffff8204db33>] ? kthread+0x79/0x81 
     0  2512      2 0x00000000 
  
 Call Trace: 
  [<ffffffff8204de27>] ? autoremove_wake_function+0x0/0x2e 
  [<ffffffff82003610>] ? kernel_thread_helper+0x0/0x10 
  0000000000000286
 Call Trace: 
  [<ffffffff820507e7>] ? hrtimer_try_to_cancel+0x90/0x9b 
  [<ffffffff8200286b>] ? system_call_fastpath+0x16/0x1b 
  ffffffff82050937
 Call Trace: 
  [<ffffffff82050937>] ? __hrtimer_start_range_ns+0x12f/0x142 
  [<ffffffff820507e7>] ? hrtimer_try_to_cancel+0x90/0x9b 
  [<ffffffff822fc5c2>] ? schedule_hrtimeout_range_clock+0xc4/0xeb 
  [<ffffffff820503ee>] ? hrtimer_wakeup+0x0/0x22 
  [<ffffffff820cfbc1>] ? poll_schedule_timeout+0x43/0x60 
  [<ffffffff820d0940>] ? pollwake+0x0/0x54 
  [<ffffffff8216b8b0>] ? snprintf+0x44/0x4c 
  [<ffffffff821072bd>] ? proc_flush_task+0x130/0x25a 
  [<ffffffff82002807>] ? system_call_after_swapgs+0x17/0x65 
  
  
  [<ffffffff820507e7>] ? hrtimer_try_to_cancel+0x90/0x9b 
  [<ffffffff820d053c>] ? do_select+0x46c/0x4ed 
  [<ffffffff820d0867>] ? __pollwait+0x0/0xd9 
  [<ffffffff822642d5>] ? sock_aio_write+0xf5/0x10d 
  [<ffffffff8200286b>] ? system_call_fastpath+0x16/0x1b 
  ffffffff8289a010
  [<ffffffff82058213>] ? futex_wait_queue_me+0xb6/0xd1 
  [<ffffffff8209167e>] ? __generic_file_aio_write+0x338/0x3a2 
  [<ffffffff820e01b3>] ? vfs_statfs+0x5b/0x76 
  0000000000000086
  [<ffffffff820507e7>] ? hrtimer_try_to_cancel+0x90/0x9b 
  [<ffffffff820d0940>] ? pollwake+0x0/0x54 
  [<ffffffff820c2986>] ? do_sync_read+0xab/0xeb 
  ffff88013764cea0
 Call Trace: 
  [<ffffffff8204dd48>] ? bit_waitqueue+0x10/0xa0 
  [<ffffffff821524a5>] ? elv_insert+0x10e/0x1c0 
  [<ffffffff82160575>] ? cfq_close_cooperator+0xd3/0x163 
  [<ffffffff822fb8de>] ? schedule+0x616/0x6c4 
     0  3158      1 0x00000080 
  
  [<ffffffff8210d843>] ? kmsg_read+0x44/0x4e 
  [<ffffffff820c342d>] ? sys_read+0x45/0x6e 
  ffff88013bb035d0
  
  [<ffffffff82299263>] ? ip_cork_release+0x2e/0x3b 
  [<ffffffff820cff0e>] ? do_sys_poll+0x2ac/0x37e 
  [<ffffffff82264670>] ? sock_sendmsg+0xc0/0xdb 
  [<ffffffff820d0029>] ? sys_poll+0x49/0xa9 
  
  
  [<ffffffff822fc53a>] ? schedule_hrtimeout_range_clock+0x3c/0xeb 
  [<ffffffff820cfbc1>] ? poll_schedule_timeout+0x43/0x60 
  [<ffffffff820d0940>] ? pollwake+0x0/0x54 
  [<ffffffff822d8f16>] ? unix_dgram_sendmsg+0x3eb/0x473 
  [<ffffffff820d079e>] ? core_sys_select+0x1e1/0x2aa 
  [<ffffffff82266086>] ? sys_sendto+0xf3/0x127 
  [<ffffffff820d0bda>] ? sys_select+0x9a/0xc3 
  [<ffffffff8200286b>] ? system_call_fastpath+0x16/0x1b 
 rpciod/0      S
  0000000000000000 
  ffff88012f42be90
 Call Trace: 
  [<ffffffff8204ae95>] ? worker_thread+0x0/0x1e4 
  [<ffffffff82003614>] ? kernel_thread_helper+0x4/0x10 
  ffffffff8271b020
  ffffffff82894dc0
  [<ffffffff82042139>] ? process_timeout+0x0/0x5 
  [<ffffffff8202e0f3>] ? default_wake_function+0x0/0x9 
  ffff88013764cea0
  
  [<ffffffff82031caa>] ? check_preempt_wakeup+0xed/0x14b 
  [<ffffffff820d0940>] ? pollwake+0x0/0x54 
  [<ffffffff820d0940>] ? pollwake+0x0/0x54 
  [<ffffffff8200286b>] ? system_call_fastpath+0x16/0x1b 
  0000000000000286
  [<ffffffff822fc53a>] ? schedule_hrtimeout_range_clock+0x3c/0xeb 
  [<ffffffff8226a8fa>] ? skb_queue_tail+0x17/0x3e 
  [<ffffffff822634e4>] ? sock_ioctl+0x207/0x215 
  
  [<ffffffff822fbc74>] ? schedule_timeout+0x21/0x1e0 
  [<ffffffff82091f4a>] ? filemap_fault+0xb6/0x310 
     0  3402      2 0x00000080 
  
  [<ffffffff8204db33>] ? kthread+0x79/0x81 
  ffff88012faafe98
  [<ffffffffa023f268>] ? rfcomm_run+0x190/0x1198 [rfcomm] 
  [<ffffffff82003614>] ? kernel_thread_helper+0x4/0x10 
  
  [<ffffffff820cfbc1>] ? poll_schedule_timeout+0x43/0x60 
  [<ffffffff82054c34>] ? ktime_get_ts+0x68/0xb1 
  [<ffffffff820d079e>] ? core_sys_select+0x1e1/0x2aa 
     0  3472      1 0x00000080 
  
  [<ffffffff820503ee>] ? hrtimer_wakeup+0x0/0x22 
     0  3476      1 0x00000080 
  
  
  [<ffffffff820cff0e>] ? do_sys_poll+0x2ac/0x37e 
  [<ffffffff820d0940>] ? pollwake+0x0/0x54 
  [<ffffffff820d0940>] ? pollwake+0x0/0x54 
  [<ffffffff822d8982>] ? unix_wait_for_peer+0x9e/0xaa 
  [<ffffffff8204de27>] ? autoremove_wake_function+0x0/0x2e 
  [<ffffffff8226a8fa>] ? skb_queue_tail+0x17/0x3e 
  [<ffffffff822d8f16>] ? unix_dgram_sendmsg+0x3eb/0x473 
  [<ffffffff820a5a3e>] ? handle_mm_fault+0x3bb/0x7b3 
  [<ffffffff8200286b>] ? system_call_fastpath+0x16/0x1b 
  00007f92897f2010
 Call Trace: 
  [<ffffffff820451e1>] ? __dequeue_signal+0x15/0x11b 
  [<ffffffff820c3ba4>] ? fput+0x1b0/0x1e1 
  ffff88013baf8ea0
  ffff88012fadbb88
  [<ffffffff82058324>] ? futex_wait+0xf6/0x230 
  [<ffffffff820598ac>] ? do_futex+0xa4/0xb1c 
  [<ffffffff820a5dd7>] ? handle_mm_fault+0x754/0x7b3 
     0  3504      1 0x00000080 
  
 Call Trace: 
  [<ffffffff820503ee>] ? hrtimer_wakeup+0x0/0x22 
  [<ffffffff8202518b>] ? flush_tlb_page+0x4f/0x6e 
  [<ffffffff8205a43f>] ? sys_futex+0x11b/0x139 
  000000003abe30c0
  
  [<ffffffff8202d2d0>] ? finish_task_switch+0x4e/0x80 
  [<ffffffff820d0867>] ? __pollwait+0x0/0xd9 
  [<ffffffff82095445>] ? get_page_from_freelist+0x425/0x577 
  [<ffffffff820a3c2a>] ? __do_fault+0x3a4/0x3e5 
  [<ffffffff822ffed3>] ? do_page_fault+0x324/0x346 
  ffff88013baf8ea0
  0000000000000000
  [<ffffffff82074a5c>] ? delayacct_end+0x74/0x7f 
  [<ffffffff820cfbc1>] ? poll_schedule_timeout+0x43/0x60 
  [<ffffffff820598d2>] ? do_futex+0xca/0xb1c 
  [<ffffffff820c1708>] ? __dentry_open+0x18d/0x278 
  [<ffffffff8202d2b5>] ? finish_task_switch+0x33/0x80 
  [<ffffffff8200286b>] ? system_call_fastpath+0x16/0x1b 
  000280da00000010
 Call Trace: 
  [<ffffffff820d0867>] ? __pollwait+0x0/0xd9 
  [<ffffffff82090947>] ? find_get_page+0x18/0x78 
  [<ffffffff820a5a3e>] ? handle_mm_fault+0x3bb/0x7b3 
     0  3544      1 0x00000080 
  
  [<ffffffff822fd150>] ? _raw_spin_lock_bh+0x9/0x1f 
  [<ffffffff8229f0d8>] ? inet_csk_accept+0xb6/0x205 
  [<ffffffff8206bf98>] ? audit_syscall_entry+0x1bd/0x1e8 
  
  00000000004c4b3f
  [<ffffffff822fc5c2>] ? schedule_hrtimeout_range_clock+0xc4/0xeb 
  [<ffffffff820d0867>] ? __pollwait+0x0/0xd9 
  [<ffffffff8209583e>] ? __alloc_pages_nodemask+0x104/0x568 
  [<ffffffff820d079e>] ? core_sys_select+0x1e1/0x2aa 
  [<ffffffff8200286b>] ? system_call_fastpath+0x16/0x1b 
  ffff88013a08a8f0
  
 Call Trace: 
  [<ffffffff822fc53a>] ? schedule_hrtimeout_range_clock+0x3c/0xeb 
  [<ffffffffa003c0d0>] ? ext3_get_blocks_handle+0x9b/0x8df [ext3] 
  [<ffffffff82090947>] ? find_get_page+0x18/0x78 
  [<ffffffff820cfbc1>] ? poll_schedule_timeout+0x43/0x60 
  [<ffffffff820d0867>] ? __pollwait+0x0/0xd9 
  [<ffffffffa003dd48>] ? ext3_writeback_write_end+0x94/0xb7 [ext3] 
  [<ffffffff820d079e>] ? core_sys_select+0x1e1/0x2aa 
  [<ffffffff8202511d>] ? flush_tlb_mm+0x51/0x70 
     0  3615      1 0x00000080 
  
  [<ffffffff820cfbc1>] ? poll_schedule_timeout+0x43/0x60 
  [<ffffffffa003dd48>] ? ext3_writeback_write_end+0x94/0xb7 [ext3] 
  [<ffffffff8200286b>] ? system_call_fastpath+0x16/0x1b 
  ffff88012fb99ad8
  [<ffffffff822fc53a>] ? schedule_hrtimeout_range_clock+0x3c/0xeb 
  [<ffffffff8226a8fa>] ? skb_queue_tail+0x17/0x3e 
     0  3657      1 0x00000080 
 Call Trace: 
  [<ffffffff820d0867>] ? __pollwait+0x0/0xd9 
  [<ffffffff820d910d>] ? seq_printf+0x67/0x8f 
  ffff88013a08b6d0
 Call Trace: 
  0000000000000286
  [<ffffffff82095445>] ? get_page_from_freelist+0x425/0x577 
  [<ffffffff820d0867>] ? __pollwait+0x0/0xd9 
  [<ffffffff8204de27>] ? autoremove_wake_function+0x0/0x2e 
  [<ffffffff82090947>] ? find_get_page+0x18/0x78 
  [<ffffffff822ffed3>] ? do_page_fault+0x324/0x346 
  ffffffff82050937
  [<ffffffff822fc66f>] ? do_nanosleep+0x73/0xa7 
  [<ffffffff8200286b>] ? system_call_fastpath+0x16/0x1b 
  ffffffff82050937
 Call Trace: 
  [<ffffffff820cfbc1>] ? poll_schedule_timeout+0x43/0x60 
  [<ffffffff8204de16>] ? wake_up_bit+0x11/0x22 
  [<ffffffffa0040503>] ? ext3_find_entry+0x4eb/0x586 [ext3] 
  [<ffffffff820d079e>] ? core_sys_select+0x1e1/0x2aa 
 atd           S
  ffffffff822ffed3
  
  [<ffffffff82050937>] ? __hrtimer_start_range_ns+0x12f/0x142 
  [<ffffffff822ffed3>] ? do_page_fault+0x324/0x346 
  [<ffffffff822fc66f>] ? do_nanosleep+0x73/0xa7 
  [<ffffffff820509ea>] ? hrtimer_nanosleep+0x86/0xf1 
  [<ffffffff820503ee>] ? hrtimer_wakeup+0x0/0x22 
  [<ffffffff82050aab>] ? sys_nanosleep+0x56/0x6c 
  0000000001e0ef88
  
  [<ffffffff82050937>] ? __hrtimer_start_range_ns+0x12f/0x142 
  [<ffffffff820cff0e>] ? do_sys_poll+0x2ac/0x37e 
  [<ffffffff820598d2>] ? do_futex+0xca/0xb1c 
  [<ffffffff820a5a3e>] ? handle_mm_fault+0x3bb/0x7b3 
  [<ffffffff820d0029>] ? sys_poll+0x49/0xa9 
  0000000001e0ef88
  
  [<ffffffff822fc5c2>] ? schedule_hrtimeout_range_clock+0xc4/0xeb 
  [<ffffffff820cff0e>] ? do_sys_poll+0x2ac/0x37e 
  
  [<ffffffff820d0940>] ? pollwake+0x0/0x54 
  [<ffffffff8200286b>] ? system_call_fastpath+0x16/0x1b 
  0000000000000082
 Call Trace: 
  [<ffffffff822643ee>] ? sock_aio_read+0x101/0x119 
  [<ffffffff8216ca4e>] ? __strncpy_from_user+0x28/0x41 
     0  3818      1 0x00000080 
  0000000000000003
  [<ffffffff822fc53a>] ? schedule_hrtimeout_range_clock+0x3c/0xeb 
  [<ffffffff820cfbc1>] ? poll_schedule_timeout+0x43/0x60 
  [<ffffffff820d0940>] ? pollwake+0x0/0x54 
  [<ffffffff820d0940>] ? pollwake+0x0/0x54 
  [<ffffffff82164448>] ? cpumask_any_but+0x27/0x33 
  [<ffffffff8200286b>] ? system_call_fastpath+0x16/0x1b 
  0000000000000001
  0000000000000003
  [<ffffffff822fc53a>] ? schedule_hrtimeout_range_clock+0x3c/0xeb 
  [<ffffffff820d0867>] ? __pollwait+0x0/0xd9 
  [<ffffffff82164448>] ? cpumask_any_but+0x27/0x33 
  [<ffffffff820a5dd7>] ? handle_mm_fault+0x754/0x7b3 
  [<ffffffff820c2de6>] ? do_readv_writev+0x182/0x197 
  [<ffffffff820d0029>] ? sys_poll+0x49/0xa9 
  
 Call Trace: 
  [<ffffffff8223449a>] ? evdev_read+0xc7/0x200 
 hald-addon-ke S
  
  [<ffffffff8204dff0>] ? prepare_to_wait+0x35/0x6d 
  [<ffffffff8200286b>] ? system_call_fastpath+0x16/0x1b 
  ffff88013b032ab0
  ffff88013a5fa000
  [<ffffffff8204de27>] ? autoremove_wake_function+0x0/0x2e 
   running task    
  
  
  [<ffffffff820503ee>] ? hrtimer_wakeup+0x0/0x22 
     0  3890      1 0x00000080 
  
  
  [<ffffffff820507e7>] ? hrtimer_try_to_cancel+0x90/0x9b 
  [<ffffffff822fc5c2>] ? schedule_hrtimeout_range_clock+0xc4/0xeb 
  [<ffffffff820503ee>] ? hrtimer_wakeup+0x0/0x22 
  [<ffffffff820cfbc1>] ? poll_schedule_timeout+0x43/0x60 
  [<ffffffff820cff0e>] ? do_sys_poll+0x2ac/0x37e 
  [<ffffffff820a0f11>] ? zone_statistics+0x3a/0x5b 
  [<ffffffff820a5a3e>] ? handle_mm_fault+0x3bb/0x7b3 
  [<ffffffff820d0029>] ? sys_poll+0x49/0xa9 
 Call Trace: 
  [<ffffffff820585e1>] ? futex_wake+0xf2/0x104 
  [<ffffffff820d6dc9>] ? mntput_no_expire+0x1c/0x97 
  [<ffffffff8200286b>] ? system_call_fastpath+0x16/0x1b 
  0000000000000286
  [<ffffffff820d0867>] ? __pollwait+0x0/0xd9 
  [<ffffffff820a3c2a>] ? __do_fault+0x3a4/0x3e5 
  [<ffffffff820d0029>] ? sys_poll+0x49/0xa9 
  
  [<ffffffff820d0867>] ? __pollwait+0x0/0xd9 
  [<ffffffff82090947>] ? find_get_page+0x18/0x78 
  [<ffffffff820d0bda>] ? sys_select+0x9a/0xc3 
  
  [<ffffffff8204df83>] ? prepare_to_wait_exclusive+0x35/0x6d 
  [<ffffffff822d9652>] ? unix_accept+0x54/0x101 
  
  
  [<ffffffff820503ee>] ? hrtimer_wakeup+0x0/0x22 
  [<ffffffff820d0940>] ? pollwake+0x0/0x54 
  [<ffffffff82091f4a>] ? filemap_fault+0xb6/0x310 
  [<ffffffff8203b7e7>] ? timespec_add_safe+0x37/0x66 
  ffff88013a475060
  ffff88012f6c7eb8
  [<ffffffff8204de27>] ? autoremove_wake_function+0x0/0x2e 
  [<ffffffff820ce623>] ? sys_fcntl+0x2f4/0x48d 
  
 Call Trace: 
  [<ffffffff8216c84d>] ? __put_user_4+0x1d/0x30 
 saslauthd     S
  ffffffff8204dff0
  [<ffffffff8204dff0>] ? prepare_to_wait+0x35/0x6d 
  [<ffffffff8200286b>] ? system_call_fastpath+0x16/0x1b 
  
  
  [<ffffffff8204dff0>] ? prepare_to_wait+0x35/0x6d 
  [<ffffffff820f2070>] ? fcntl_setlk+0x1a6/0x25f 
  [<ffffffff8204de27>] ? autoremove_wake_function+0x0/0x2e 
  [<ffffffff8216c84d>] ? __put_user_4+0x1d/0x30 
  [<ffffffff820ce623>] ? sys_fcntl+0x2f4/0x48d 
  [<ffffffff8200286b>] ? system_call_fastpath+0x16/0x1b 
  
  
  [<ffffffff82050937>] ? __hrtimer_start_range_ns+0x12f/0x142 
  [<ffffffff82050aab>] ? sys_nanosleep+0x56/0x6c 
  ffff88013a08a200
  ffff88013bb58240
  [<ffffffff822ffed3>] ? do_page_fault+0x324/0x346 
  [<ffffffff8200286b>] ? system_call_fastpath+0x16/0x1b 
  0000001300000000
  [<ffffffff820a3c2a>] ? __do_fault+0x3a4/0x3e5 
  [<ffffffff821ce324>] ? n_tty_read+0x3e9/0x6b2 
  [<ffffffff821caf36>] ? tty_read+0x6f/0xbb 
     0  4071      1 0x00000080 
  ffff8801374da800
  [<ffffffff822fbc74>] ? schedule_timeout+0x21/0x1e0 
  [<ffffffff821cfe10>] ? tty_ldisc_ref_wait+0x10/0x91 
  [<ffffffff8200286b>] ? system_call_fastpath+0x16/0x1b 
  00000000821db81b
  ffff88012f758800
  [<ffffffff8204ac92>] ? flush_work+0xe/0x7e 
  [<ffffffff821caf36>] ? tty_read+0x6f/0xbb 
  [<ffffffff820c342d>] ? sys_read+0x45/0x6e 
  
  
  [<ffffffff820420c2>] ? try_to_del_timer_sync+0xa4/0xaf 
  [<ffffffff821cfe10>] ? tty_ldisc_ref_wait+0x10/0x91 
  [<ffffffff8200286b>] ? system_call_fastpath+0x16/0x1b 
  0000001300000000
 Call Trace: 
  [<ffffffff8204ac92>] ? flush_work+0xe/0x7e 
  [<ffffffff821cfe10>] ? tty_ldisc_ref_wait+0x10/0x91 
     0  4085   4069 0x00000080 
  
  [<ffffffff82039518>] ? session_of_pgrp+0xe/0x37 
  [<ffffffff8203a215>] ? do_wait+0x1ba/0x20e 
     0  4134      2 0x00000080 
  
  [<ffffffffa04954dc>] ? ocfs2_truncate_log_worker+0x0/0x140 [ocfs2] 
  [<ffffffff8204de27>] ? autoremove_wake_function+0x0/0x2e 
  [<ffffffff82003610>] ? kernel_thread_helper+0x0/0x10 
 o2quot/0      S
 Call Trace: 
  [<ffffffff8204ae95>] ? worker_thread+0x0/0x1e4 
 jbd2/sda11-15 R  running task
  [<ffffffff8204de27>] ? autoremove_wake_function+0x0/0x2e 
  ffff88013bb6f0a0 0000000000000000 ffffffffa00bac05
  [<ffffffff8203a2f9>] ? sys_wait4+0x90/0xab 
  [<ffffffff82034d46>] ? __cond_resched+0x1d/0x26 
  [<ffffffff822ffd62>] ? do_page_fault+0x1b3/0x346 
  [<ffffffff82029bf1>] ? update_curr+0x77/0xe4 
  [<ffffffffa005fcde>] ? scsi_done+0x0/0x41 [scsi_mod] 
  [<ffffffff82030fa5>] ? enqueue_task_fair+0x213/0x23b 
  [<ffffffffa02fbdb3>] ? jbd2_journal_start+0x90/0xbf [jbd2] 
  [<ffffffff820911fb>] ? filemap_write_and_wait_range+0x3b/0x4a 
   .sysctl_sched_features                   : 15471 
   .nr_uninterruptible            : 0 
   .cpu_load[1]                   : 55559 
   .sched_goidle                  : 11896 
   .max_vruntime                  : 19789.211500 
   .nr_spread_over                : 9 
   .rt_throttled                  : 0 
 ---------------------------------------------------------------------------------------------------------- 
     19786.211500         0.102040    152126.157713
     19786.211500         0.184139    152631.345425
     19786.211500         0.048046    147560.772980
     19786.211500         0.051120    147561.204512
     19786.211500         0.047465    147561.197120
 last message repeated 3 times
     19786.211500         0.036472    147561.895980
          iscsid  2520     19786.211500       494   110 
     19786.211500         0.536960    117158.277565
     19786.211500         0.015188    120124.852286
          python  3549     19786.211500        24   120 
         sendmail  3657     19786.211500        27   120 
     19786.211500        18.205067    106358.280463
     19786.211500         1.547554    105330.629658
     19786.211500         1.231707    105331.170248
   jbd2/sda11-15  4137     89173.901514         2   120 

[-- Attachment #3: null.log --]
[-- Type: text/x-log, Size: 4153 bytes --]

 BUG: unable to handle kernel NULL pointer dereference at (null) 
 IP: [<ffffffff82161537>] cfq_yield+0x5f/0x135 
 PGD 1342ff067 PUD 1342ee067 PMD 0 
 Oops: 0002 [#1] SMP 
 last sysfs file: /sys/devices/pci0000:00/0000:00:1e.0/0000:04:02.0/irq 
 CPU 0 
 Modules linked in: ocfs2 jbd2 ocfs2_nodemanager configfs ocfs2_stackglue netconsole autofs4 hidp rfcomm l2cap crc16 bluetooth rfkill sunrpc ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi dm_mirror dm_region_hash dm_log dm_multipath dm_mod video output sbs sbshc battery acpi_memhotplug ac lp sg dcdbas sr_mod cdrom option usb_wwan usbserial serio_raw parport_pc parport rtc_cmos rtc_core rtc_lib snd_hda_codec_analog tpm_tis tpm tpm_bios button snd_hda_intel snd_hda_codec snd_seq_dummy tg3 snd_seq_oss snd_seq_midi_event snd_seq e1000 snd_seq_device libphy snd_pcm_oss snd_mixer_oss i2c_i801 i2c_core snd_pcm snd_timer snd soundcore snd_page_alloc shpchp pcspkr ata_piix libata sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd [last unloaded: microcode] 
 Pid: 4130, comm: ocfs2_wq Not tainted 2.6.35-rc3+ #5 0MM599/OptiPlex 745                  
 RIP: 0010:[<ffffffff82161537>]  [<ffffffff82161537>] cfq_yield+0x5f/0x135 
 RSP: 0018:ffff880123061c60  EFLAGS: 00010246 
 RAX: 0000000000000000 RBX: ffff88012c2b5ea8 RCX: ffff88012c3a30d0 
 RDX: ffff8801255953d8 RSI: 0000000000000000 RDI: ffff88013a2a3800 
 RBP: ffff88012c3ba770 R08: ffff88012c3a30b8 R09: ffff880001e136a0 
 R10: ffff88012c3bb598 R11: ffff88012c3a30c8 R12: ffff88013a2a3800 
 R13: ffff88013a1d0a20 R14: 0000000000000000 R15: 0000000000000000 
 FS:  0000000000000000(0000) GS:ffff880001e00000(0000) knlGS:0000000000000000 
 CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b 
 CR2: 0000000000000000 CR3: 00000001342da000 CR4: 00000000000006f0 
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 
 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 
 Process ocfs2_wq (pid: 4130, threadinfo ffff880123060000, task ffff88013ab38240) 
 Stack: 
  ffff88012c3a3000 ffff88012c3a3024 ffff88013414e580 ffff8801255953d8 
  0000000000000003 ffffffffa030e260 ffff88012c3a3000 0000000000000003 
  ffff8801255953d8 0000000000000003 0000000000000282 ffffffffa030db11 
 Call Trace: 
  [<ffffffffa030e260>] ? jbd2_log_wait_commit+0x3c/0x10e [jbd2] 
  [<ffffffffa030db11>] ? __jbd2_log_start_commit+0x2c/0x33 [jbd2] 
  [<ffffffffa030856b>] ? jbd2_journal_stop+0x1f7/0x21f [jbd2] 
  [<ffffffffa0308db3>] ? jbd2_journal_start+0x90/0xbf [jbd2] 
  [<ffffffffa04cd8f8>] ? ocfs2_commit_trans+0x23/0xb1 [ocfs2] 
  [<ffffffffa04d2212>] ? ocfs2_complete_local_alloc_recovery+0x2fa/0x3a1 [ocfs2] 
  [<ffffffff82029bf1>] ? update_curr+0x77/0xe4 
  [<ffffffffa04cf681>] ? ocfs2_complete_recovery+0x0/0xab2 [ocfs2] 
  [<ffffffffa04cf887>] ? ocfs2_complete_recovery+0x206/0xab2 [ocfs2] 
  [<ffffffff822fb8de>] ? schedule+0x616/0x6c4 
  [<ffffffff8204dff0>] ? prepare_to_wait+0x35/0x6d 
  [<ffffffffa04cf681>] ? ocfs2_complete_recovery+0x0/0xab2 [ocfs2] 
  [<ffffffff8204afdf>] ? worker_thread+0x14a/0x1e4 
  [<ffffffff8204de27>] ? autoremove_wake_function+0x0/0x2e 
  [<ffffffff8204ae95>] ? worker_thread+0x0/0x1e4 
  [<ffffffff8204db33>] ? kthread+0x79/0x81 
  [<ffffffff82003614>] ? kernel_thread_helper+0x4/0x10 
  [<ffffffff8204daba>] ? kthread+0x0/0x81 
  [<ffffffff82003610>] ? kernel_thread_helper+0x0/0x10 
 Code: 0f 84 f2 00 00 00 48 8d bd a0 05 00 00 e8 ab ba 19 00 48 8b b5 00 06 00 00 4c 89 e7 e8 69 eb ff ff 49 89 c6 48 8b 85 00 06 00 00 <f0> 48 ff 00 fe 85 a0 05 00 00 4d 85 f6 0f 84 a6 00 00 00 49 8b 
 RIP  [<ffffffff82161537>] cfq_yield+0x5f/0x135 
  RSP <ffff880123061c60> 
 CR2: 0000000000000000 
 ---[ end trace 374ddf0f57161b27 ]--- 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/3 v5][RFC] ext3/4: enhance fsync performance when using CFQ
  2010-06-28  6:41       ` Tao Ma
@ 2010-06-28 13:58         ` Jeff Moyer
  2010-06-28 23:16           ` Tao Ma
  2010-06-29 14:56         ` Jeff Moyer
  1 sibling, 1 reply; 24+ messages in thread
From: Jeff Moyer @ 2010-06-28 13:58 UTC (permalink / raw)
  To: Tao Ma
  Cc: axboe, vgoyal, linux-kernel, linux-ext4, Joel Becker,
	Sunil Mushran, ocfs2-devel

Tao Ma <tao.ma@oracle.com> writes:

> btw, I also met with a NULL pointer dereference in cfq_yield. I have
> attached the null.log as well. This seems to be related to the previous
> deadlock and happens when I try to remount the same volume after a
> reboot and ocfs2 tries to do some recovery.

Since I can't reproduce your kernel binary even with your .config, could
you send me the disassembly of the cfq_yield function from your vmlinux
binary?
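One way to produce such a dump is to slice the function out of objdump's output. A hedged sketch (file names are illustrative; against a real build you would feed objdump the actual vmlinux, here a tiny synthetic listing stands in so the extraction itself can be exercised):

```shell
# Normally: objdump -d vmlinux > listing.txt
cat > listing.txt <<'EOF'
ffffffff82161000 <other_fn>:
ffffffff82161000:	c3	retq

ffffffff821614d8 <cfq_yield>:
ffffffff821614d8:	41 56	push   %r14
ffffffff8216160c:	c3	retq

ffffffff82161700 <next_fn>:
ffffffff82161700:	c3	retq
EOF
# Print from the "<cfq_yield>:" header up to the blank line that ends
# the function (objdump separates functions with blank lines).
awk '/<cfq_yield>:/ {p=1} p && /^$/ {exit} p {print}' listing.txt > cfq_yield.txt
wc -l < cfq_yield.txt
```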

Thanks!
Jeff

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/3 v5][RFC] ext3/4: enhance fsync performance when using CFQ
  2010-06-28 13:58         ` Jeff Moyer
@ 2010-06-28 23:16           ` Tao Ma
  0 siblings, 0 replies; 24+ messages in thread
From: Tao Ma @ 2010-06-28 23:16 UTC (permalink / raw)
  To: Jeff Moyer; +Cc: axboe, vgoyal, linux-kernel, linux-ext4, ocfs2-devel

[-- Attachment #1: Type: text/plain, Size: 632 bytes --]

Jeff Moyer wrote:
> Tao Ma <tao.ma@oracle.com> writes:
>
>   
>> btw, I also met with a NULL pointer dereference in cfq_yield. I have
>> attached the null.log as well. This seems to be related to the previous
>> deadlock and happens when I try to remount the same volume after a
>> reboot and ocfs2 tries to do some recovery.
>>     
>
> Since I can't reproduce your kernel binary even with your .config, could
> you send me the disassembly of the cfq_yield function from your vmlinux
> binary?
>   
No problem, I have attached it.
btw, if you have any debug patch, I am happy to run it to make the 
problem clearer to you.

Regards,
Tao

[-- Attachment #2: cfq_yield.txt --]
[-- Type: text/plain, Size: 5069 bytes --]

ffffffff821614d8 <cfq_yield>:
ffffffff821614d8:	41 56                	push   %r14
ffffffff821614da:	41 55                	push   %r13
ffffffff821614dc:	49 89 fd             	mov    %rdi,%r13
ffffffff821614df:	41 54                	push   %r12
ffffffff821614e1:	55                   	push   %rbp
ffffffff821614e2:	48 89 f5             	mov    %rsi,%rbp
ffffffff821614e5:	53                   	push   %rbx
ffffffff821614e6:	48 8b 47 18          	mov    0x18(%rdi),%rax
ffffffff821614ea:	4c 8b 60 08          	mov    0x8(%rax),%r12
ffffffff821614ee:	65 48 8b 04 25 40 b5 	mov    %gs:0xb540,%rax
ffffffff821614f5:	00 00 
ffffffff821614f7:	48 8b b0 00 06 00 00 	mov    0x600(%rax),%rsi
ffffffff821614fe:	4c 89 e7             	mov    %r12,%rdi
ffffffff82161501:	e8 90 eb ff ff       	callq  ffffffff82160096 <cfq_cic_lookup>
ffffffff82161506:	48 85 c0             	test   %rax,%rax
ffffffff82161509:	48 89 c3             	mov    %rax,%rbx
ffffffff8216150c:	0f 84 f2 00 00 00    	je     ffffffff82161604 <cfq_yield+0x12c>
ffffffff82161512:	48 8d bd a0 05 00 00 	lea    0x5a0(%rbp),%rdi
ffffffff82161519:	e8 ab ba 19 00       	callq  ffffffff822fcfc9 <_raw_spin_lock>
ffffffff8216151e:	48 8b b5 00 06 00 00 	mov    0x600(%rbp),%rsi
ffffffff82161525:	4c 89 e7             	mov    %r12,%rdi
ffffffff82161528:	e8 69 eb ff ff       	callq  ffffffff82160096 <cfq_cic_lookup>
ffffffff8216152d:	49 89 c6             	mov    %rax,%r14
ffffffff82161530:	48 8b 85 00 06 00 00 	mov    0x600(%rbp),%rax
ffffffff82161537:	f0 48 ff 00          	lock incq (%rax)
ffffffff8216153b:	fe 85 a0 05 00 00    	incb   0x5a0(%rbp)
ffffffff82161541:	4d 85 f6             	test   %r14,%r14
ffffffff82161544:	0f 84 a6 00 00 00    	je     ffffffff821615f0 <cfq_yield+0x118>
ffffffff8216154a:	49 8b bd 48 03 00 00 	mov    0x348(%r13),%rdi
ffffffff82161551:	e8 a0 ba 19 00       	callq  ffffffff822fcff6 <_raw_spin_lock_irq>
ffffffff82161556:	48 8b 5b 10          	mov    0x10(%rbx),%rbx
ffffffff8216155a:	48 85 db             	test   %rbx,%rbx
ffffffff8216155d:	0f 84 83 00 00 00    	je     ffffffff821615e6 <cfq_yield+0x10e>
ffffffff82161563:	49 39 9c 24 68 03 00 	cmp    %rbx,0x368(%r12)
ffffffff8216156a:	00 
ffffffff8216156b:	75 41                	jne    ffffffff821615ae <cfq_yield+0xd6>
ffffffff8216156d:	41 83 bc 24 54 02 00 	cmpl   $0x1,0x254(%r12)
ffffffff82161574:	00 01 
ffffffff82161576:	75 36                	jne    ffffffff821615ae <cfq_yield+0xd6>
ffffffff82161578:	48 8b 83 d0 00 00 00 	mov    0xd0(%rbx),%rax
ffffffff8216157f:	8b 70 38             	mov    0x38(%rax),%esi
ffffffff82161582:	83 fe ff             	cmp    $0xffffffffffffffff,%esi
ffffffff82161585:	74 27                	je     ffffffff821615ae <cfq_yield+0xd6>
ffffffff82161587:	48 8b 48 30          	mov    0x30(%rax),%rcx
ffffffff8216158b:	48 85 c9             	test   %rcx,%rcx
ffffffff8216158e:	74 1e                	je     ffffffff821615ae <cfq_yield+0xd6>
ffffffff82161590:	48 63 05 4d f2 5d 00 	movslq 6156877(%rip),%rax        # ffffffff827407e4 <cfq_slice_idle>
ffffffff82161597:	48 8b 15 e2 42 63 00 	mov    6505186(%rip),%rdx        # ffffffff82795880 <jiffies>
ffffffff8216159e:	48 01 c8             	add    %rcx,%rax
ffffffff821615a1:	48 39 d0             	cmp    %rdx,%rax
ffffffff821615a4:	79 08                	jns    ffffffff821615ae <cfq_yield+0xd6>
ffffffff821615a6:	3b b3 c0 00 00 00    	cmp    0xc0(%rbx),%esi
ffffffff821615ac:	75 38                	jne    ffffffff821615e6 <cfq_yield+0x10e>
ffffffff821615ae:	49 8b 04 24          	mov    (%r12),%rax
ffffffff821615b2:	48 8b b8 80 04 00 00 	mov    0x480(%rax),%rdi
ffffffff821615b9:	48 85 ff             	test   %rdi,%rdi
ffffffff821615bc:	74 1a                	je     ffffffff821615d8 <cfq_yield+0x100>
ffffffff821615be:	8b 8d 68 02 00 00    	mov    0x268(%rbp),%ecx
ffffffff821615c4:	8b 93 c0 00 00 00    	mov    0xc0(%rbx),%edx
ffffffff821615ca:	48 c7 c6 1d 8c 55 82 	mov    $0xffffffff82558c1d,%rsi
ffffffff821615d1:	31 c0                	xor    %eax,%eax
ffffffff821615d3:	e8 a1 07 f2 ff       	callq  ffffffff82081d79 <__trace_note_message>
ffffffff821615d8:	81 4b 04 00 20 00 00 	orl    $0x2000,0x4(%rbx)
ffffffff821615df:	4c 89 b3 f0 00 00 00 	mov    %r14,0xf0(%rbx)
ffffffff821615e6:	49 8b 85 48 03 00 00 	mov    0x348(%r13),%rax
ffffffff821615ed:	fe 00                	incb   (%rax)
ffffffff821615ef:	fb                   	sti    
ffffffff821615f0:	5b                   	pop    %rbx
ffffffff821615f1:	48 8b bd 00 06 00 00 	mov    0x600(%rbp),%rdi
ffffffff821615f8:	5d                   	pop    %rbp
ffffffff821615f9:	41 5c                	pop    %r12
ffffffff821615fb:	41 5d                	pop    %r13
ffffffff821615fd:	41 5e                	pop    %r14
ffffffff821615ff:	e9 51 8b ff ff       	jmpq   ffffffff8215a155 <put_io_context>
ffffffff82161604:	5b                   	pop    %rbx
ffffffff82161605:	5d                   	pop    %rbp
ffffffff82161606:	41 5c                	pop    %r12
ffffffff82161608:	41 5d                	pop    %r13
ffffffff8216160a:	41 5e                	pop    %r14
ffffffff8216160c:	c3                   	retq   

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/3 v5][RFC] ext3/4: enhance fsync performance when using CFQ
  2010-06-28  6:41       ` Tao Ma
  2010-06-28 13:58         ` Jeff Moyer
@ 2010-06-29 14:56         ` Jeff Moyer
  2010-06-30  0:31           ` Tao Ma
  1 sibling, 1 reply; 24+ messages in thread
From: Jeff Moyer @ 2010-06-29 14:56 UTC (permalink / raw)
  To: Tao Ma
  Cc: axboe, vgoyal, linux-kernel, linux-ext4, Joel Becker,
	Sunil Mushran, ocfs2-devel

Tao Ma <tao.ma@oracle.com> writes:

> Hi Jeff,
>
> On 06/27/2010 09:48 PM, Jeff Moyer wrote:
>> Tao Ma<tao.ma@oracle.com>  writes:
>>> I am sorry to say that the patch makes jbd2 lock up when I tested
>>> fs_mark using ocfs2.
>>> I have attached the log from my netconsole server. After I reverted
>>> the patch [3/3], the box works again.
>>
>> I can't reproduce this, unfortunately.  Also, when building with the
>> .config you sent me, the disassembly doesn't line up with the stack
>> trace you posted.
>>
>> I'm not sure why yielding the queue would cause a deadlock.  The only
>> explanation I can come up with is that I/O is not being issued.  I'm
>> assuming that no other I/O will be completed to the file system in
>> question.  Is that right?  Could you send along the output from sysrq-t?
> Yes, I just mounted it and began the test, so there should be no
> outstanding I/O. Do you need me to set up another disk for testing?
> I have attached the sysrq output in sysrq.log; please check.

Well, if it doesn't take long to reproduce, then it might be helpful to
see a blktrace of the run.  However, it might also just be worth waiting
for the next version of the patch to see if that fixes your issue.

> btw, I also met with a NULL pointer dereference in cfq_yield. I have
> attached the null.log as well. This seems to be related to the previous
> deadlock and happens when I try to remount the same volume after a
> reboot and ocfs2 tries to do some recovery.

 Pid: 4130, comm: ocfs2_wq Not tainted 2.6.35-rc3+ #5 0MM599/OptiPlex 745                  
 RIP: 0010:[<ffffffff82161537>] 
  [<ffffffff82161537>] cfq_yield+0x5f/0x135 
 RSP: 0018:ffff880123061c60  EFLAGS: 00010246 
 RAX: 0000000000000000 RBX: ffff88012c2b5ea8 RCX: ffff88012c3a30d0 

ffffffff82161528:	e8 69 eb ff ff       	callq  ffffffff82160096 <cfq_cic_lookup>
ffffffff8216152d:	49 89 c6             	mov    %rax,%r14
ffffffff82161530:	48 8b 85 00 06 00 00 	mov    0x600(%rbp),%rax
ffffffff82161537:	f0 48 ff 00          	lock incq (%rax)

I'm pretty sure that's a NULL pointer deref of the tsk->io_context that
was passed into the yield function.  I've since fixed that, so your
recovery code should be safe in the newest version (which I've not yet
posted).

Cheers,
Jeff

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/3 v5][RFC] ext3/4: enhance fsync performance when using CFQ
  2010-06-29 14:56         ` Jeff Moyer
@ 2010-06-30  0:31           ` Tao Ma
  0 siblings, 0 replies; 24+ messages in thread
From: Tao Ma @ 2010-06-30  0:31 UTC (permalink / raw)
  To: Jeff Moyer
  Cc: axboe, vgoyal, linux-kernel, linux-ext4, Joel Becker,
	Sunil Mushran, ocfs2-devel

Hi Jeff,

On 06/29/2010 10:56 PM, Jeff Moyer wrote:
> Tao Ma<tao.ma@oracle.com>  writes:
>
>> Hi Jeff,
>>
>> On 06/27/2010 09:48 PM, Jeff Moyer wrote:
>>> Tao Ma<tao.ma@oracle.com>   writes:
>>>> I am sorry to say that the patch makes jbd2 lock up when I tested
>>>> fs_mark using ocfs2.
>>>> I have attached the log from my netconsole server. After I reverted
>>>> the patch [3/3], the box works again.
>>>
>>> I can't reproduce this, unfortunately.  Also, when building with the
>>> .config you sent me, the disassembly doesn't line up with the stack
>>> trace you posted.
>>>
>>> I'm not sure why yielding the queue would cause a deadlock.  The only
>>> explanation I can come up with is that I/O is not being issued.  I'm
>>> assuming that no other I/O will be completed to the file system in
>>> question.  Is that right?  Could you send along the output from sysrq-t?
>> Yes, I just mounted it and began the test, so there should be no
>> outstanding I/O. Do you need me to set up another disk for testing?
>> I have attached the sysrq output in sysrq.log; please check.
>
> Well, if it doesn't take long to reproduce, then it might be helpful to
> see a blktrace of the run.  However, it might also just be worth waiting
> for the next version of the patch to see if that fixes your issue.
>
>> btw, I also met with a NULL pointer dereference in cfq_yield. I have
>> attached the null.log as well. This seems to be related to the previous
>> deadlock and happens when I try to remount the same volume after a
>> reboot and ocfs2 tries to do some recovery.
>
>   Pid: 4130, comm: ocfs2_wq Not tainted 2.6.35-rc3+ #5 0MM599/OptiPlex 745
>   RIP: 0010:[<ffffffff82161537>]
>    [<ffffffff82161537>] cfq_yield+0x5f/0x135
>   RSP: 0018:ffff880123061c60  EFLAGS: 00010246
>   RAX: 0000000000000000 RBX: ffff88012c2b5ea8 RCX: ffff88012c3a30d0
>
> ffffffff82161528:	e8 69 eb ff ff       	callq  ffffffff82160096<cfq_cic_lookup>
> ffffffff8216152d:	49 89 c6             	mov    %rax,%r14
> ffffffff82161530:	48 8b 85 00 06 00 00 	mov    0x600(%rbp),%rax
> ffffffff82161537:	f0 48 ff 00          	lock incq (%rax)
>
> I'm pretty sure that's a NULL pointer deref of the tsk->io_context that
> was passed into the yield function.  I've since fixed that, so your
> recovery code should be safe in the newest version (which I've not yet
> posted).
OK, so could you please cc me when the new patches are out? It would be 
easier for me to track. Thanks.

Regards,
Tao

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2010-06-30  0:32 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-06-22 21:34 [PATCH 0/3 v5][RFC] ext3/4: enhance fsync performance when using CFQ Jeff Moyer
2010-06-22 21:35 ` [PATCH 1/3] block: Implement a blk_yield function to voluntarily give up the I/O scheduler Jeff Moyer
2010-06-23  5:04   ` Andrew Morton
2010-06-23 14:50     ` Jeff Moyer
2010-06-24  0:46   ` Vivek Goyal
2010-06-25 16:51     ` Jeff Moyer
2010-06-25 18:55       ` Jens Axboe
2010-06-25 19:57         ` Jeff Moyer
2010-06-25 20:02       ` Vivek Goyal
2010-06-22 21:35 ` [PATCH 2/3] jbd: yield the device queue when waiting for commits Jeff Moyer
2010-06-22 21:35 ` [PATCH 3/3] jbd2: yield the device queue when waiting for journal commits Jeff Moyer
2010-06-22 22:13 ` [PATCH 0/3 v5][RFC] ext3/4: enhance fsync performance when using CFQ Joel Becker
2010-06-23  9:20 ` Christoph Hellwig
2010-06-23 13:03   ` Jeff Moyer
2010-06-23  9:30 ` Tao Ma
2010-06-23 13:06   ` Jeff Moyer
2010-06-24  5:54   ` Tao Ma
2010-06-24 14:56     ` Jeff Moyer
2010-06-27 13:48     ` Jeff Moyer
2010-06-28  6:41       ` Tao Ma
2010-06-28 13:58         ` Jeff Moyer
2010-06-28 23:16           ` Tao Ma
2010-06-29 14:56         ` Jeff Moyer
2010-06-30  0:31           ` Tao Ma

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).