* [PATCH 0/3 v5][RFC] ext3/4: enhance fsync performance when using CFQ
@ 2010-06-22 21:34 Jeff Moyer
  2010-06-22 21:35 ` [PATCH 1/3] block: Implement a blk_yield function to voluntarily give up the I/O scheduler Jeff Moyer
                   ` (5 more replies)
  0 siblings, 6 replies; 24+ messages in thread
From: Jeff Moyer @ 2010-06-22 21:34 UTC (permalink / raw)
  To: axboe, vgoyal; +Cc: linux-kernel, linux-ext4

Hi,

When running iozone with the fsync flag, or fs_mark, CFQ performs far
worse than deadline on enterprise class storage when dealing with file
sizes of 8MB or less.  I used the following command line as a
representative test case:

  fs_mark -S 1 -D 10000 -N 100000 -d /mnt/test/fs_mark -s 65536 -t 1 -w 4096 -F

When run using the deadline I/O scheduler, an average of the first 5
numbers will give you 448.4 files / second.  CFQ will yield only 106.7.
With this patch series applied (and the two patches I sent yesterday),
CFQ now achieves 462.5 files / second.

This patch set is still an RFC.  I'd like to make it perform better
when there is a competing sequential reader present.  For now, I've
addressed the concerns voiced about the previous posting.

Review and testing would be greatly appreciated.

Thanks!
Jeff

---

New from the last round:
- removed the think time calculation I added for the sync-noidle
  service tree
- replaced above with a suggestion from Vivek to only guard against
  currently active sequential readers when determining if we can
  preempt the sync-noidle service tree.
- bug fixes

Overall, I think it's simpler now thanks to the suggestions from Jens
and Vivek.

[PATCH 1/3] block: Implement a blk_yield function to voluntarily give up the I/O scheduler.
[PATCH 2/3] jbd: yield the device queue when waiting for commits
[PATCH 3/3] jbd2: yield the device queue when waiting for journal commits

^ permalink raw reply	[flat|nested] 24+ messages in thread
* [PATCH 1/3] block: Implement a blk_yield function to voluntarily give up the I/O scheduler. 2010-06-22 21:34 [PATCH 0/3 v5][RFC] ext3/4: enhance fsync performance when using CFQ Jeff Moyer @ 2010-06-22 21:35 ` Jeff Moyer 2010-06-23 5:04 ` Andrew Morton 2010-06-24 0:46 ` Vivek Goyal 2010-06-22 21:35 ` [PATCH 2/3] jbd: yield the device queue when waiting for commits Jeff Moyer ` (4 subsequent siblings) 5 siblings, 2 replies; 24+ messages in thread From: Jeff Moyer @ 2010-06-22 21:35 UTC (permalink / raw) To: axboe, vgoyal; +Cc: linux-kernel, linux-ext4, Jeff Moyer This patch implements a blk_yield to allow a process to voluntarily give up its I/O scheduler time slice. This is desirable for those processes which know that they will be blocked on I/O from another process, such as the file system journal thread. Following patches will put calls to blk_yield into jbd and jbd2. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> --- block/blk-core.c | 13 +++++ block/blk-settings.c | 6 ++ block/cfq-iosched.c | 123 +++++++++++++++++++++++++++++++++++++++++++++- block/elevator.c | 8 +++ include/linux/blkdev.h | 4 ++ include/linux/elevator.h | 3 + 6 files changed, 155 insertions(+), 2 deletions(-) diff --git a/block/blk-core.c b/block/blk-core.c index f84cce4..b9afbba 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -324,6 +324,18 @@ void blk_unplug(struct request_queue *q) } EXPORT_SYMBOL(blk_unplug); +void generic_yield_iosched(struct request_queue *q, struct task_struct *tsk) +{ + elv_yield(q, tsk); +} + +void blk_yield(struct request_queue *q, struct task_struct *tsk) +{ + if (q->yield_fn) + q->yield_fn(q, tsk); +} +EXPORT_SYMBOL(blk_yield); + /** * blk_start_queue - restart a previously stopped queue * @q: The &struct request_queue in question @@ -609,6 +621,7 @@ blk_init_allocated_queue_node(struct request_queue *q, request_fn_proc *rfn, q->request_fn = rfn; q->prep_rq_fn = NULL; q->unplug_fn = generic_unplug_device; + q->yield_fn = generic_yield_iosched; 
q->queue_flags = QUEUE_FLAG_DEFAULT; q->queue_lock = lock; diff --git a/block/blk-settings.c b/block/blk-settings.c index f5ed5a1..fe548c9 100644 --- a/block/blk-settings.c +++ b/block/blk-settings.c @@ -171,6 +171,12 @@ void blk_queue_make_request(struct request_queue *q, make_request_fn *mfn) } EXPORT_SYMBOL(blk_queue_make_request); +void blk_queue_yield(struct request_queue *q, yield_fn *yield) +{ + q->yield_fn = yield; +} +EXPORT_SYMBOL_GPL(blk_queue_yield); + /** * blk_queue_bounce_limit - set bounce buffer limit for queue * @q: the request queue for the device diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c index dab836e..a9922b9 100644 --- a/block/cfq-iosched.c +++ b/block/cfq-iosched.c @@ -87,9 +87,12 @@ struct cfq_rb_root { unsigned total_weight; u64 min_vdisktime; struct rb_node *active; + unsigned long last_expiry; + pid_t last_pid; }; #define CFQ_RB_ROOT (struct cfq_rb_root) { .rb = RB_ROOT, .left = NULL, \ - .count = 0, .min_vdisktime = 0, } + .count = 0, .min_vdisktime = 0, .last_expiry = 0UL, \ + .last_pid = (pid_t)-1, } /* * Per process-grouping structure @@ -147,6 +150,7 @@ struct cfq_queue { struct cfq_queue *new_cfqq; struct cfq_group *cfqg; struct cfq_group *orig_cfqg; + struct cfq_io_context *yield_to; }; /* @@ -318,6 +322,7 @@ enum cfqq_state_flags { CFQ_CFQQ_FLAG_split_coop, /* shared cfqq will be splitted */ CFQ_CFQQ_FLAG_deep, /* sync cfqq experienced large depth */ CFQ_CFQQ_FLAG_wait_busy, /* Waiting for next request */ + CFQ_CFQQ_FLAG_yield, /* Allow another cfqq to run */ }; #define CFQ_CFQQ_FNS(name) \ @@ -347,6 +352,7 @@ CFQ_CFQQ_FNS(coop); CFQ_CFQQ_FNS(split_coop); CFQ_CFQQ_FNS(deep); CFQ_CFQQ_FNS(wait_busy); +CFQ_CFQQ_FNS(yield); #undef CFQ_CFQQ_FNS #ifdef CONFIG_CFQ_GROUP_IOSCHED @@ -1614,6 +1620,15 @@ __cfq_slice_expired(struct cfq_data *cfqd, struct cfq_queue *cfqq, cfq_clear_cfqq_wait_request(cfqq); cfq_clear_cfqq_wait_busy(cfqq); + if (!cfq_cfqq_yield(cfqq)) { + struct cfq_rb_root *st; + st = service_tree_for(cfqq->cfqg, 
+ cfqq_prio(cfqq), cfqq_type(cfqq)); + st->last_expiry = jiffies; + st->last_pid = cfqq->pid; + } + cfq_clear_cfqq_yield(cfqq); + /* * If this cfqq is shared between multiple processes, check to * make sure that those processes are still issuing I/Os within @@ -2118,7 +2133,7 @@ static void choose_service_tree(struct cfq_data *cfqd, struct cfq_group *cfqg) slice = max(slice, 2 * cfqd->cfq_slice_idle); slice = max_t(unsigned, slice, CFQ_MIN_TT); - cfq_log(cfqd, "workload slice:%d", slice); + cfq_log(cfqd, "workload:%d slice:%d", cfqd->serving_type, slice); cfqd->workload_expires = jiffies + slice; cfqd->noidle_tree_requires_idle = false; } @@ -2153,6 +2168,36 @@ static void cfq_choose_cfqg(struct cfq_data *cfqd) choose_service_tree(cfqd, cfqg); } +static int cfq_should_yield_now(struct cfq_queue *cfqq, + struct cfq_queue **yield_to) +{ + struct cfq_queue *new_cfqq; + + new_cfqq = cic_to_cfqq(cfqq->yield_to, 1); + + /* + * If the queue we're yielding to is in a different cgroup, + * just expire our own time slice. + */ + if (new_cfqq->cfqg != cfqq->cfqg) { + *yield_to = NULL; + return 1; + } + + /* + * If the new queue has pending I/O, then switch to it + * immediately. Otherwise, see if we can idle until it is + * ready to preempt us. + */ + if (!RB_EMPTY_ROOT(&new_cfqq->sort_list)) { + *yield_to = new_cfqq; + return 1; + } + + *yield_to = NULL; + return 0; +} + /* * Select a queue for service. If we have a current active queue, * check whether to continue servicing it, or retrieve and set a new one. @@ -2187,6 +2232,10 @@ static struct cfq_queue *cfq_select_queue(struct cfq_data *cfqd) * have been idling all along on this queue and it should be * ok to wait for this request to complete. 
*/ + if (cfq_cfqq_yield(cfqq) && + cfq_should_yield_now(cfqq, &new_cfqq)) + goto expire; + if (cfqq->cfqg->nr_cfqq == 1 && RB_EMPTY_ROOT(&cfqq->sort_list) && cfqq->dispatched && cfq_should_idle(cfqd, cfqq)) { cfqq = NULL; @@ -2215,6 +2264,9 @@ static struct cfq_queue *cfq_select_queue(struct cfq_data *cfqd) goto expire; } + if (cfq_cfqq_yield(cfqq) && cfq_should_yield_now(cfqq, &new_cfqq)) + goto expire; + /* * No requests pending. If the active queue still has requests in * flight or is idling for a new request, allow either of these @@ -2241,6 +2293,65 @@ keep_queue: return cfqq; } +static inline int expiry_data_valid(struct cfq_rb_root *service_tree) +{ + return (service_tree->last_pid != (pid_t)-1 && + service_tree->last_expiry != 0UL); +} + +static void cfq_yield(struct request_queue *q, struct task_struct *tsk) +{ + struct cfq_data *cfqd = q->elevator->elevator_data; + struct cfq_io_context *cic, *new_cic; + struct cfq_queue *cfqq; + + cic = cfq_cic_lookup(cfqd, current->io_context); + if (!cic) + return; + + task_lock(tsk); + new_cic = cfq_cic_lookup(cfqd, tsk->io_context); + atomic_long_inc(&tsk->io_context->refcount); + task_unlock(tsk); + if (!new_cic) + goto out_dec; + + spin_lock_irq(q->queue_lock); + + cfqq = cic_to_cfqq(cic, 1); + if (!cfqq) + goto out_unlock; + + /* + * If we are currently servicing the SYNC_NOIDLE_WORKLOAD, and we + * are idling on the last queue in that workload, *and* there are no + * potential dependent readers running currently, then go ahead and + * yield the queue. + */ + if (cfqd->active_queue == cfqq && + cfqd->serving_type == SYNC_NOIDLE_WORKLOAD) { + /* + * If there's been no I/O from another process in the idle + * slice time, then there is by definition no dependent + * read going on for this service tree. 
+ */ + if (expiry_data_valid(cfqq->service_tree) && + time_before(cfqq->service_tree->last_expiry + + cfq_slice_idle, jiffies) && + cfqq->service_tree->last_pid != cfqq->pid) + goto out_unlock; + } + + cfq_log_cfqq(cfqd, cfqq, "yielding queue to %d", tsk->pid); + cfqq->yield_to = new_cic; + cfq_mark_cfqq_yield(cfqq); + +out_unlock: + spin_unlock_irq(q->queue_lock); +out_dec: + put_io_context(tsk->io_context); +} + static int __cfq_forced_dispatch_cfqq(struct cfq_queue *cfqq) { int dispatched = 0; @@ -3123,6 +3234,13 @@ cfq_should_preempt(struct cfq_data *cfqd, struct cfq_queue *new_cfqq, if (!cfqq) return false; + /* + * If the active queue yielded its timeslice to this queue, let + * it preempt. + */ + if (cfq_cfqq_yield(cfqq) && RQ_CIC(rq) == cfqq->yield_to) + return true; + if (cfq_class_idle(new_cfqq)) return false; @@ -3973,6 +4091,7 @@ static struct elevator_type iosched_cfq = { .elevator_deactivate_req_fn = cfq_deactivate_request, .elevator_queue_empty_fn = cfq_queue_empty, .elevator_completed_req_fn = cfq_completed_request, + .elevator_yield_fn = cfq_yield, .elevator_former_req_fn = elv_rb_former_request, .elevator_latter_req_fn = elv_rb_latter_request, .elevator_set_req_fn = cfq_set_request, diff --git a/block/elevator.c b/block/elevator.c index 923a913..5e33297 100644 --- a/block/elevator.c +++ b/block/elevator.c @@ -866,6 +866,14 @@ void elv_completed_request(struct request_queue *q, struct request *rq) } } +void elv_yield(struct request_queue *q, struct task_struct *tsk) +{ + struct elevator_queue *e = q->elevator; + + if (e && e->ops->elevator_yield_fn) + e->ops->elevator_yield_fn(q, tsk); +} + #define to_elv(atr) container_of((atr), struct elv_fs_entry, attr) static ssize_t diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 09a8402..8d073c0 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -263,6 +263,7 @@ struct request_pm_state typedef void (request_fn_proc) (struct request_queue *q); typedef int (make_request_fn) 
(struct request_queue *q, struct bio *bio); +typedef void (yield_fn) (struct request_queue *q, struct task_struct *tsk); typedef int (prep_rq_fn) (struct request_queue *, struct request *); typedef void (unplug_fn) (struct request_queue *); @@ -345,6 +346,7 @@ struct request_queue request_fn_proc *request_fn; make_request_fn *make_request_fn; + yield_fn *yield_fn; prep_rq_fn *prep_rq_fn; unplug_fn *unplug_fn; merge_bvec_fn *merge_bvec_fn; @@ -837,6 +839,7 @@ extern int blk_execute_rq(struct request_queue *, struct gendisk *, extern void blk_execute_rq_nowait(struct request_queue *, struct gendisk *, struct request *, int, rq_end_io_fn *); extern void blk_unplug(struct request_queue *q); +extern void blk_yield(struct request_queue *q, struct task_struct *tsk); static inline struct request_queue *bdev_get_queue(struct block_device *bdev) { @@ -929,6 +932,7 @@ extern struct request_queue *blk_init_allocated_queue(struct request_queue *, request_fn_proc *, spinlock_t *); extern void blk_cleanup_queue(struct request_queue *); extern void blk_queue_make_request(struct request_queue *, make_request_fn *); +extern void blk_queue_yield(struct request_queue *, yield_fn *); extern void blk_queue_bounce_limit(struct request_queue *, u64); extern void blk_queue_max_hw_sectors(struct request_queue *, unsigned int); extern void blk_queue_max_segments(struct request_queue *, unsigned short); diff --git a/include/linux/elevator.h b/include/linux/elevator.h index 2c958f4..a68b5b1 100644 --- a/include/linux/elevator.h +++ b/include/linux/elevator.h @@ -23,6 +23,7 @@ typedef void (elevator_add_req_fn) (struct request_queue *, struct request *); typedef int (elevator_queue_empty_fn) (struct request_queue *); typedef struct request *(elevator_request_list_fn) (struct request_queue *, struct request *); typedef void (elevator_completed_req_fn) (struct request_queue *, struct request *); +typedef void (elevator_yield_fn) (struct request_queue *, struct task_struct *tsk); typedef int 
(elevator_may_queue_fn) (struct request_queue *, int); typedef int (elevator_set_req_fn) (struct request_queue *, struct request *, gfp_t); @@ -48,6 +49,7 @@ struct elevator_ops elevator_queue_empty_fn *elevator_queue_empty_fn; elevator_completed_req_fn *elevator_completed_req_fn; + elevator_yield_fn *elevator_yield_fn; elevator_request_list_fn *elevator_former_req_fn; elevator_request_list_fn *elevator_latter_req_fn; @@ -111,6 +113,7 @@ extern void elv_bio_merged(struct request_queue *q, struct request *, struct bio *); extern void elv_requeue_request(struct request_queue *, struct request *); extern int elv_queue_empty(struct request_queue *); +extern void elv_yield(struct request_queue *, struct task_struct *); extern struct request *elv_former_request(struct request_queue *, struct request *); extern struct request *elv_latter_request(struct request_queue *, struct request *); extern int elv_register_queue(struct request_queue *q); -- 1.6.5.2 ^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [PATCH 1/3] block: Implement a blk_yield function to voluntarily give up the I/O scheduler. 2010-06-22 21:35 ` [PATCH 1/3] block: Implement a blk_yield function to voluntarily give up the I/O scheduler Jeff Moyer @ 2010-06-23 5:04 ` Andrew Morton 2010-06-23 14:50 ` Jeff Moyer 2010-06-24 0:46 ` Vivek Goyal 1 sibling, 1 reply; 24+ messages in thread From: Andrew Morton @ 2010-06-23 5:04 UTC (permalink / raw) To: Jeff Moyer; +Cc: axboe, vgoyal, linux-kernel, linux-ext4 On Tue, 22 Jun 2010 17:35:00 -0400 Jeff Moyer <jmoyer@redhat.com> wrote: > This patch implements a blk_yield to allow a process to voluntarily > give up its I/O scheduler time slice. This is desirable for those processes > which know that they will be blocked on I/O from another process, such as > the file system journal thread. Following patches will put calls to blk_yield > into jbd and jbd2. > I'm looking through this patch series trying to find the analysis/description of the cause for this (bad) performance problem. But all I'm seeing is implementation stuff :( It's hard to review code with your eyes shut. > --- a/block/blk-core.c > +++ b/block/blk-core.c > @@ -324,6 +324,18 @@ void blk_unplug(struct request_queue *q) > } > EXPORT_SYMBOL(blk_unplug); > > +void generic_yield_iosched(struct request_queue *q, struct task_struct *tsk) > +{ > + elv_yield(q, tsk); > +} static? > > ... > > +void blk_queue_yield(struct request_queue *q, yield_fn *yield) > +{ > + q->yield_fn = yield; > +} > +EXPORT_SYMBOL_GPL(blk_queue_yield); There's a tradition in the block layer of using truly awful identifiers for functions-which-set-things. But there's no good reason for retaining that tradition. blk_queue_yield_set(), perhaps. (what name would you give an accessor which _reads_ q->yield_fn? yup, "blk_queue_yield()". doh). 
> /** > * blk_queue_bounce_limit - set bounce buffer limit for queue > * @q: the request queue for the device > diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c > index dab836e..a9922b9 100644 > --- a/block/cfq-iosched.c > +++ b/block/cfq-iosched.c > @@ -87,9 +87,12 @@ struct cfq_rb_root { > unsigned total_weight; > u64 min_vdisktime; > struct rb_node *active; > + unsigned long last_expiry; > + pid_t last_pid; These fields are pretty fundamental to understanding the implementation. Some nice descriptions would be nice. > }; > #define CFQ_RB_ROOT (struct cfq_rb_root) { .rb = RB_ROOT, .left = NULL, \ > - .count = 0, .min_vdisktime = 0, } > + .count = 0, .min_vdisktime = 0, .last_expiry = 0UL, \ > + .last_pid = (pid_t)-1, } May as well leave the 0 and NULL fields unmentioned (ie: don't do crappy stuff because the old code did crappy stuff!) > /* > * Per process-grouping structure > @@ -147,6 +150,7 @@ struct cfq_queue { > struct cfq_queue *new_cfqq; > struct cfq_group *cfqg; > struct cfq_group *orig_cfqg; > + struct cfq_io_context *yield_to; > }; > > /* > > ... > > +static int cfq_should_yield_now(struct cfq_queue *cfqq, > + struct cfq_queue **yield_to) The bool-returning function could return a bool type. > +{ > + struct cfq_queue *new_cfqq; > + > + new_cfqq = cic_to_cfqq(cfqq->yield_to, 1); > + > + /* > + * If the queue we're yielding to is in a different cgroup, > + * just expire our own time slice. > + */ > + if (new_cfqq->cfqg != cfqq->cfqg) { > + *yield_to = NULL; > + return 1; > + } > + > + /* > + * If the new queue has pending I/O, then switch to it > + * immediately. Otherwise, see if we can idle until it is > + * ready to preempt us. > + */ > + if (!RB_EMPTY_ROOT(&new_cfqq->sort_list)) { > + *yield_to = new_cfqq; > + return 1; > + } > + > + *yield_to = NULL; > + return 0; > +} > + > /* > * Select a queue for service. If we have a current active queue, > * check whether to continue servicing it, or retrieve and set a new one. > > ... 
> > +static inline int expiry_data_valid(struct cfq_rb_root *service_tree) > +{ > + return (service_tree->last_pid != (pid_t)-1 && > + service_tree->last_expiry != 0UL); > +} The compiler will inline this. > +static void cfq_yield(struct request_queue *q, struct task_struct *tsk) -ENODESCRIPTION > +{ > + struct cfq_data *cfqd = q->elevator->elevator_data; > + struct cfq_io_context *cic, *new_cic; > + struct cfq_queue *cfqq; > + > + cic = cfq_cic_lookup(cfqd, current->io_context); > + if (!cic) > + return; > + > + task_lock(tsk); > + new_cic = cfq_cic_lookup(cfqd, tsk->io_context); > + atomic_long_inc(&tsk->io_context->refcount); How do we know tsk has an io_context? Use get_io_context() and check its result? > + task_unlock(tsk); > + if (!new_cic) > + goto out_dec; > + > + spin_lock_irq(q->queue_lock); > + > + cfqq = cic_to_cfqq(cic, 1); > + if (!cfqq) > + goto out_unlock; > + > + /* > + * If we are currently servicing the SYNC_NOIDLE_WORKLOAD, and we > + * are idling on the last queue in that workload, *and* there are no > + * potential dependent readers running currently, then go ahead and > + * yield the queue. > + */ Comment explains the code, but doesn't explain the *reason* for the code. > + if (cfqd->active_queue == cfqq && > + cfqd->serving_type == SYNC_NOIDLE_WORKLOAD) { > + /* > + * If there's been no I/O from another process in the idle > + * slice time, then there is by definition no dependent > + * read going on for this service tree. 
> + */ > + if (expiry_data_valid(cfqq->service_tree) && > + time_before(cfqq->service_tree->last_expiry + > + cfq_slice_idle, jiffies) && > + cfqq->service_tree->last_pid != cfqq->pid) > + goto out_unlock; > + } > + > + cfq_log_cfqq(cfqd, cfqq, "yielding queue to %d", tsk->pid); > + cfqq->yield_to = new_cic; > + cfq_mark_cfqq_yield(cfqq); > + > +out_unlock: > + spin_unlock_irq(q->queue_lock); > +out_dec: > + put_io_context(tsk->io_context); > +} > + > static int __cfq_forced_dispatch_cfqq(struct cfq_queue *cfqq) > { > int dispatched = 0; > > ... > > --- a/block/elevator.c > +++ b/block/elevator.c > @@ -866,6 +866,14 @@ void elv_completed_request(struct request_queue *q, struct request *rq) > } > } > > +void elv_yield(struct request_queue *q, struct task_struct *tsk) > +{ > + struct elevator_queue *e = q->elevator; > + > + if (e && e->ops->elevator_yield_fn) > + e->ops->elevator_yield_fn(q, tsk); > +} Again, no documentation. How are other programmers to know when, why and how they should use this? > > ... > ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 1/3] block: Implement a blk_yield function to voluntarily give up the I/O scheduler.
  2010-06-23  5:04 ` Andrew Morton
@ 2010-06-23 14:50   ` Jeff Moyer
  0 siblings, 0 replies; 24+ messages in thread
From: Jeff Moyer @ 2010-06-23 14:50 UTC (permalink / raw)
  To: Andrew Morton; +Cc: axboe, vgoyal, linux-kernel, linux-ext4

Andrew Morton <akpm@linux-foundation.org> writes:

> On Tue, 22 Jun 2010 17:35:00 -0400 Jeff Moyer <jmoyer@redhat.com> wrote:
>
>> This patch implements a blk_yield to allow a process to voluntarily
>> give up its I/O scheduler time slice. This is desirable for those processes
>> which know that they will be blocked on I/O from another process, such as
>> the file system journal thread. Following patches will put calls to blk_yield
>> into jbd and jbd2.
>>
>
> I'm looking through this patch series trying to find the
> analysis/description of the cause for this (bad) performance problem.
> But all I'm seeing is implementation stuff :(  It's hard to review code
> with your eyes shut.

Sorry about that, Andrew.

The problem case is (for example) iozone when run with small file sizes
(up to 8MB) configured to fsync after each file is written.  Because the
iozone process is issuing synchronous writes, it is put onto CFQ's SYNC
service tree.  The significance of this is that CFQ will idle for up to
8ms waiting for requests on such queues.

So, what happens is that the iozone process will issue, say, 64KB worth
of write I/O.  That I/O will just land in the page cache.  Then, the
iozone process does an fsync which forces those I/Os to disk as
synchronous writes.  Then, the file system's fsync method is invoked,
and for ext3/4, it calls log_start_commit followed by log_wait_commit.
Because those synchronous writes were forced out in the context of the
iozone process, CFQ will now idle on iozone's cfqq waiting for more I/O.
However, iozone's progress is now gated by the journal thread.

So, I tried two approaches to solving the problem.  The first, which
Christoph brought up again in this thread, was to simply mark all
journal I/O as BIO_RW_META, which would cause the iozone process' cfqq
to be preempted when the journal issued its I/O.  However, Vivek pointed
out that this was bad for interactive performance.

The second approach, of which this is the fourth iteration, was to
allow the file system to explicitly tell the I/O scheduler that it is
waiting on I/O from another process.

Does that help?  Let me know if you have any more questions, and thanks
a ton for looking at this, Andrew.  I appreciate it.

The comments I've elided from my response make perfect sense, so I'll
address them in the next posting.

>> };
>> #define CFQ_RB_ROOT	(struct cfq_rb_root) { .rb = RB_ROOT, .left = NULL, \
>> -			.count = 0, .min_vdisktime = 0, }
>> +			.count = 0, .min_vdisktime = 0, .last_expiry = 0UL, \
>> +			.last_pid = (pid_t)-1, }
>
> May as well leave the 0 and NULL fields unmentioned (ie: don't do
> crappy stuff because the old code did crappy stuff!)

I don't actually understand why you take issue with this.

>> +{
>> +	struct cfq_data *cfqd = q->elevator->elevator_data;
>> +	struct cfq_io_context *cic, *new_cic;
>> +	struct cfq_queue *cfqq;
>> +
>> +	cic = cfq_cic_lookup(cfqd, current->io_context);
>> +	if (!cic)
>> +		return;
>> +
>> +	task_lock(tsk);
>> +	new_cic = cfq_cic_lookup(cfqd, tsk->io_context);
>> +	atomic_long_inc(&tsk->io_context->refcount);
>
> How do we know tsk has an io_context?  Use get_io_context() and check
> its result?

I'll fix that up.  It works now only by luck (and the fact that there's
a good chance the journal thread has an i/o context).

>> +	task_unlock(tsk);
>> +	if (!new_cic)
>> +		goto out_dec;
>> +
>> +	spin_lock_irq(q->queue_lock);
>> +
>> +	cfqq = cic_to_cfqq(cic, 1);
>> +	if (!cfqq)
>> +		goto out_unlock;
>> +
>> +	/*
>> +	 * If we are currently servicing the SYNC_NOIDLE_WORKLOAD, and we
>> +	 * are idling on the last queue in that workload, *and* there are no
>> +	 * potential dependent readers running currently, then go ahead and
>> +	 * yield the queue.
>> +	 */
>
> Comment explains the code, but doesn't explain the *reason* for the code.

Actually, it explains more than just what the code does.  It would be
difficult for one to divine that the code actually only really cares
about breaking up a currently running dependent reader.  I'll see if I
can make that more clear.

Cheers,
Jeff

^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: [PATCH 1/3] block: Implement a blk_yield function to voluntarily give up the I/O scheduler. 2010-06-22 21:35 ` [PATCH 1/3] block: Implement a blk_yield function to voluntarily give up the I/O scheduler Jeff Moyer 2010-06-23 5:04 ` Andrew Morton @ 2010-06-24 0:46 ` Vivek Goyal 2010-06-25 16:51 ` Jeff Moyer 1 sibling, 1 reply; 24+ messages in thread From: Vivek Goyal @ 2010-06-24 0:46 UTC (permalink / raw) To: Jeff Moyer; +Cc: axboe, linux-kernel, linux-ext4 On Tue, Jun 22, 2010 at 05:35:00PM -0400, Jeff Moyer wrote: [..] > @@ -1614,6 +1620,15 @@ __cfq_slice_expired(struct cfq_data *cfqd, struct cfq_queue *cfqq, > cfq_clear_cfqq_wait_request(cfqq); > cfq_clear_cfqq_wait_busy(cfqq); > > + if (!cfq_cfqq_yield(cfqq)) { > + struct cfq_rb_root *st; > + st = service_tree_for(cfqq->cfqg, > + cfqq_prio(cfqq), cfqq_type(cfqq)); > + st->last_expiry = jiffies; > + st->last_pid = cfqq->pid; > + } > + cfq_clear_cfqq_yield(cfqq); Jeff, I think cfqq is still on service tree at this point of time. If yes, then we can simply use cfqq->service_tree, instead of calling service_tree_for(). No clearing of cfqq->yield_to field? [..] > /* > * Select a queue for service. If we have a current active queue, > * check whether to continue servicing it, or retrieve and set a new one. > @@ -2187,6 +2232,10 @@ static struct cfq_queue *cfq_select_queue(struct cfq_data *cfqd) > * have been idling all along on this queue and it should be > * ok to wait for this request to complete. > */ > + if (cfq_cfqq_yield(cfqq) && > + cfq_should_yield_now(cfqq, &new_cfqq)) > + goto expire; > + I think we can get rid of this condition here and move the yield check above outside above if condition. This if condition waits for request to complete from this queue and waits for queue to get busy before slice expiry. If we have decided to yield the queue, there is no point in waiting for next request for queue to get busy. 
> if (cfqq->cfqg->nr_cfqq == 1 && RB_EMPTY_ROOT(&cfqq->sort_list) > && cfqq->dispatched && cfq_should_idle(cfqd, cfqq)) { > cfqq = NULL; > @@ -2215,6 +2264,9 @@ static struct cfq_queue *cfq_select_queue(struct cfq_data *cfqd) > goto expire; > } > > + if (cfq_cfqq_yield(cfqq) && cfq_should_yield_now(cfqq, &new_cfqq)) > + goto expire; > + We can move this check up. [..] > +static void cfq_yield(struct request_queue *q, struct task_struct *tsk) > +{ > + struct cfq_data *cfqd = q->elevator->elevator_data; > + struct cfq_io_context *cic, *new_cic; > + struct cfq_queue *cfqq; > + > + cic = cfq_cic_lookup(cfqd, current->io_context); > + if (!cic) > + return; > + > + task_lock(tsk); > + new_cic = cfq_cic_lookup(cfqd, tsk->io_context); > + atomic_long_inc(&tsk->io_context->refcount); > + task_unlock(tsk); > + if (!new_cic) > + goto out_dec; > + > + spin_lock_irq(q->queue_lock); > + > + cfqq = cic_to_cfqq(cic, 1); > + if (!cfqq) > + goto out_unlock; > + > + /* > + * If we are currently servicing the SYNC_NOIDLE_WORKLOAD, and we > + * are idling on the last queue in that workload, *and* there are no > + * potential dependent readers running currently, then go ahead and > + * yield the queue. > + */ > + if (cfqd->active_queue == cfqq && > + cfqd->serving_type == SYNC_NOIDLE_WORKLOAD) { > + /* > + * If there's been no I/O from another process in the idle > + * slice time, then there is by definition no dependent > + * read going on for this service tree. > + */ > + if (expiry_data_valid(cfqq->service_tree) && > + time_before(cfqq->service_tree->last_expiry + > + cfq_slice_idle, jiffies) && > + cfqq->service_tree->last_pid != cfqq->pid) > + goto out_unlock; > + } > + > + cfq_log_cfqq(cfqd, cfqq, "yielding queue to %d", tsk->pid); > + cfqq->yield_to = new_cic; We are stashing away a pointer to cic without taking reference? 
> + cfq_mark_cfqq_yield(cfqq); > + > +out_unlock: > + spin_unlock_irq(q->queue_lock); > +out_dec: > + put_io_context(tsk->io_context); > +} > + > static int __cfq_forced_dispatch_cfqq(struct cfq_queue *cfqq) > { > int dispatched = 0; > @@ -3123,6 +3234,13 @@ cfq_should_preempt(struct cfq_data *cfqd, struct cfq_queue *new_cfqq, > if (!cfqq) > return false; > > + /* > + * If the active queue yielded its timeslice to this queue, let > + * it preempt. > + */ > + if (cfq_cfqq_yield(cfqq) && RQ_CIC(rq) == cfqq->yield_to) > + return true; > + I think we need to again if if we are sync-noidle workload then allow preemption only if no dependent read is currently on, otherwise sync-noidle service tree loses share. This version looks much simpler than previous one and is much easier to understand. I will do some testing on friday and provide you feedback. Thanks Vivek ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 1/3] block: Implement a blk_yield function to voluntarily give up the I/O scheduler. 2010-06-24 0:46 ` Vivek Goyal @ 2010-06-25 16:51 ` Jeff Moyer 2010-06-25 18:55 ` Jens Axboe 2010-06-25 20:02 ` Vivek Goyal 0 siblings, 2 replies; 24+ messages in thread From: Jeff Moyer @ 2010-06-25 16:51 UTC (permalink / raw) To: Vivek Goyal; +Cc: axboe, linux-kernel, linux-ext4 Vivek Goyal <vgoyal@redhat.com> writes: > On Tue, Jun 22, 2010 at 05:35:00PM -0400, Jeff Moyer wrote: > > [..] >> @@ -1614,6 +1620,15 @@ __cfq_slice_expired(struct cfq_data *cfqd, struct cfq_queue *cfqq, >> cfq_clear_cfqq_wait_request(cfqq); >> cfq_clear_cfqq_wait_busy(cfqq); >> >> + if (!cfq_cfqq_yield(cfqq)) { >> + struct cfq_rb_root *st; >> + st = service_tree_for(cfqq->cfqg, >> + cfqq_prio(cfqq), cfqq_type(cfqq)); >> + st->last_expiry = jiffies; >> + st->last_pid = cfqq->pid; >> + } >> + cfq_clear_cfqq_yield(cfqq); > > Jeff, I think cfqq is still on service tree at this point of time. If yes, > then we can simply use cfqq->service_tree, instead of calling > service_tree_for(). Yup. > No clearing of cfqq->yield_to field? Nope. Again, it's not required, but if you really want me to, I'll add it. > [..] >> /* >> * Select a queue for service. If we have a current active queue, >> * check whether to continue servicing it, or retrieve and set a new one. >> @@ -2187,6 +2232,10 @@ static struct cfq_queue *cfq_select_queue(struct cfq_data *cfqd) >> * have been idling all along on this queue and it should be >> * ok to wait for this request to complete. >> */ >> + if (cfq_cfqq_yield(cfqq) && >> + cfq_should_yield_now(cfqq, &new_cfqq)) >> + goto expire; >> + > > I think we can get rid of this condition here and move the yield check > above outside above if condition. This if condition waits for request to > complete from this queue and waits for queue to get busy before slice > expiry. If we have decided to yield the queue, there is no point in > waiting for next request for queue to get busy. 
Yeah, this is a vestige of the older code layout. Thanks, this cleans things up nicely. >> + cfq_log_cfqq(cfqd, cfqq, "yielding queue to %d", tsk->pid); >> + cfqq->yield_to = new_cic; > > We are stashing away a pointer to cic without taking reference? There is no reference counting on the cic. >> @@ -3123,6 +3234,13 @@ cfq_should_preempt(struct cfq_data *cfqd, struct cfq_queue *new_cfqq, >> if (!cfqq) >> return false; >> >> + /* >> + * If the active queue yielded its timeslice to this queue, let >> + * it preempt. >> + */ >> + if (cfq_cfqq_yield(cfqq) && RQ_CIC(rq) == cfqq->yield_to) >> + return true; >> + > > I think we need to again if if we are sync-noidle workload then allow > preemption only if no dependent read is currently on, otherwise > sync-noidle service tree loses share. I think you mean don't yield if there is a dependent reader. Yeah, makes sense. > This version looks much simpler than previous one and is much easier > to understand. I will do some testing on friday and provide you feedback. Great, thanks again for the review! Cheers, Jeff ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 1/3] block: Implement a blk_yield function to voluntarily give up the I/O scheduler. 2010-06-25 16:51 ` Jeff Moyer @ 2010-06-25 18:55 ` Jens Axboe 2010-06-25 19:57 ` Jeff Moyer 2010-06-25 20:02 ` Vivek Goyal 1 sibling, 1 reply; 24+ messages in thread From: Jens Axboe @ 2010-06-25 18:55 UTC (permalink / raw) To: Jeff Moyer; +Cc: Vivek Goyal, linux-kernel, linux-ext4 On 25/06/10 18.51, Jeff Moyer wrote: >>> + cfq_log_cfqq(cfqd, cfqq, "yielding queue to %d", tsk->pid); >>> + cfqq->yield_to = new_cic; >> >> We are stashing away a pointer to cic without taking reference? > > There is no reference counting on the cic. Not on the cic itself, but on the io context it belongs to. So you need to grab a reference to that, if you are stowing a reference to the cic. -- Jens Axboe ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 1/3] block: Implement a blk_yield function to voluntarily give up the I/O scheduler. 2010-06-25 18:55 ` Jens Axboe @ 2010-06-25 19:57 ` Jeff Moyer 0 siblings, 0 replies; 24+ messages in thread From: Jeff Moyer @ 2010-06-25 19:57 UTC (permalink / raw) To: Jens Axboe; +Cc: Vivek Goyal, linux-kernel, linux-ext4 Jens Axboe <axboe@kernel.dk> writes: > On 25/06/10 18.51, Jeff Moyer wrote: >>>> + cfq_log_cfqq(cfqd, cfqq, "yielding queue to %d", tsk->pid); >>>> + cfqq->yield_to = new_cic; >>> >>> We are stashing away a pointer to cic without taking reference? >> >> There is no reference counting on the cic. > > Not on the cic itself, but on the io context it belongs to. So you > need to grab a reference to that, if you are stowing a reference > to the cic. OK, easy enough. Thanks! Jeff ^ permalink raw reply [flat|nested] 24+ messages in thread
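In code, Jens' suggestion amounts to pinning the io_context behind the cic when the pointer is stashed, and dropping that reference once the yield is cleared. A sketch, assuming the 2.6.35-era layout in which cic->ioc->refcount is an atomic_long_t and put_io_context() drops a reference; the exact helpers used in the respin may differ:

```c
	/* yield path: hold the io_context before stashing the cic pointer */
	cfq_log_cfqq(cfqd, cfqq, "yielding queue to %d", tsk->pid);
	atomic_long_inc(&new_cic->ioc->refcount);
	cfqq->yield_to = new_cic;

	/* teardown path: release the reference when the yield is cleared */
	if (cfqq->yield_to) {
		put_io_context(cfqq->yield_to->ioc);
		cfqq->yield_to = NULL;
	}
```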
* Re: [PATCH 1/3] block: Implement a blk_yield function to voluntarily give up the I/O scheduler. 2010-06-25 16:51 ` Jeff Moyer 2010-06-25 18:55 ` Jens Axboe @ 2010-06-25 20:02 ` Vivek Goyal 1 sibling, 0 replies; 24+ messages in thread From: Vivek Goyal @ 2010-06-25 20:02 UTC (permalink / raw) To: Jeff Moyer; +Cc: axboe, linux-kernel, linux-ext4 On Fri, Jun 25, 2010 at 12:51:58PM -0400, Jeff Moyer wrote: > Vivek Goyal <vgoyal@redhat.com> writes: > > > On Tue, Jun 22, 2010 at 05:35:00PM -0400, Jeff Moyer wrote: > > > > [..] > >> @@ -1614,6 +1620,15 @@ __cfq_slice_expired(struct cfq_data *cfqd, struct cfq_queue *cfqq, > >> cfq_clear_cfqq_wait_request(cfqq); > >> cfq_clear_cfqq_wait_busy(cfqq); > >> > >> + if (!cfq_cfqq_yield(cfqq)) { > >> + struct cfq_rb_root *st; > >> + st = service_tree_for(cfqq->cfqg, > >> + cfqq_prio(cfqq), cfqq_type(cfqq)); > >> + st->last_expiry = jiffies; > >> + st->last_pid = cfqq->pid; > >> + } > >> + cfq_clear_cfqq_yield(cfqq); > > > > Jeff, I think cfqq is still on service tree at this point of time. If yes, > > then we can simply use cfqq->service_tree, instead of calling > > service_tree_for(). > > Yup. > > > No clearing of cfqq->yield_to field? > > Nope. Again, it's not required, but if you really want me to, I'll add > it. I think clearing up is better as it leaves no scope for confusion. Thanks Vivek ^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH 2/3] jbd: yield the device queue when waiting for commits 2010-06-22 21:34 [PATCH 0/3 v5][RFC] ext3/4: enhance fsync performance when using CFQ Jeff Moyer 2010-06-22 21:35 ` [PATCH 1/3] block: Implement a blk_yield function to voluntarily give up the I/O scheduler Jeff Moyer @ 2010-06-22 21:35 ` Jeff Moyer 2010-06-22 21:35 ` [PATCH 3/3] jbd2: yield the device queue when waiting for journal commits Jeff Moyer ` (3 subsequent siblings) 5 siblings, 0 replies; 24+ messages in thread From: Jeff Moyer @ 2010-06-22 21:35 UTC (permalink / raw) To: axboe, vgoyal; +Cc: linux-kernel, linux-ext4, Jeff Moyer This patch gets CFQ back in line with deadline for iozone runs, especially those testing small files + fsync timings. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> --- fs/jbd/journal.c | 6 ++++++ 1 files changed, 6 insertions(+), 0 deletions(-) diff --git a/fs/jbd/journal.c b/fs/jbd/journal.c index 93d1e47..9b6cf4c 100644 --- a/fs/jbd/journal.c +++ b/fs/jbd/journal.c @@ -36,6 +36,7 @@ #include <linux/poison.h> #include <linux/proc_fs.h> #include <linux/debugfs.h> +#include <linux/blkdev.h> #include <asm/uaccess.h> #include <asm/page.h> @@ -549,6 +550,11 @@ int log_wait_commit(journal_t *journal, tid_t tid) while (tid_gt(tid, journal->j_commit_sequence)) { jbd_debug(1, "JBD: want %d, j_commit_sequence=%d\n", tid, journal->j_commit_sequence); + /* + * Give up our I/O scheduler time slice to allow the journal + * thread to issue I/O. + */ + blk_yield(journal->j_dev->bd_disk->queue, journal->j_task); wake_up(&journal->j_wait_commit); spin_unlock(&journal->j_state_lock); wait_event(journal->j_wait_done_commit, -- 1.6.5.2 ^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH 3/3] jbd2: yield the device queue when waiting for journal commits 2010-06-22 21:34 [PATCH 0/3 v5][RFC] ext3/4: enhance fsync performance when using CFQ Jeff Moyer 2010-06-22 21:35 ` [PATCH 1/3] block: Implement a blk_yield function to voluntarily give up the I/O scheduler Jeff Moyer 2010-06-22 21:35 ` [PATCH 2/3] jbd: yield the device queue when waiting for commits Jeff Moyer @ 2010-06-22 21:35 ` Jeff Moyer 2010-06-22 22:13 ` [PATCH 0/3 v5][RFC] ext3/4: enhance fsync performance when using CFQ Joel Becker ` (2 subsequent siblings) 5 siblings, 0 replies; 24+ messages in thread From: Jeff Moyer @ 2010-06-22 21:35 UTC (permalink / raw) To: axboe, vgoyal; +Cc: linux-kernel, linux-ext4, Jeff Moyer This patch gets CFQ back in line with deadline for iozone runs, especially those testing small files + fsync timings. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> --- fs/jbd2/journal.c | 6 ++++++ 1 files changed, 6 insertions(+), 0 deletions(-) diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c index bc2ff59..aba4754 100644 --- a/fs/jbd2/journal.c +++ b/fs/jbd2/journal.c @@ -41,6 +41,7 @@ #include <linux/hash.h> #include <linux/log2.h> #include <linux/vmalloc.h> +#include <linux/blkdev.h> #define CREATE_TRACE_POINTS #include <trace/events/jbd2.h> @@ -580,6 +581,11 @@ int jbd2_log_wait_commit(journal_t *journal, tid_t tid) while (tid_gt(tid, journal->j_commit_sequence)) { jbd_debug(1, "JBD: want %d, j_commit_sequence=%d\n", tid, journal->j_commit_sequence); + /* + * Give up our I/O scheduler time slice to allow the journal + * thread to issue I/O. + */ + blk_yield(journal->j_dev->bd_disk->queue, journal->j_task); wake_up(&journal->j_wait_commit); spin_unlock(&journal->j_state_lock); wait_event(journal->j_wait_done_commit, -- 1.6.5.2 ^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [PATCH 0/3 v5][RFC] ext3/4: enhance fsync performance when using CFQ 2010-06-22 21:34 [PATCH 0/3 v5][RFC] ext3/4: enhance fsync performance when using CFQ Jeff Moyer ` (2 preceding siblings ...) 2010-06-22 21:35 ` [PATCH 3/3] jbd2: yield the device queue when waiting for journal commits Jeff Moyer @ 2010-06-22 22:13 ` Joel Becker 2010-06-23 9:20 ` Christoph Hellwig 2010-06-23 9:30 ` Tao Ma 5 siblings, 0 replies; 24+ messages in thread From: Joel Becker @ 2010-06-22 22:13 UTC (permalink / raw) To: Jeff Moyer; +Cc: axboe, vgoyal, linux-kernel, linux-ext4 On Tue, Jun 22, 2010 at 05:34:59PM -0400, Jeff Moyer wrote: > Running iozone with the fsync flag, or fs_mark, the performance of CFQ is > far worse than that of deadline for enterprise class storage when dealing > with file sizes of 8MB or less. I used the following command line as a > representative test case: > > fs_mark -S 1 -D 10000 -N 100000 -d /mnt/test/fs_mark -s 65536 -t 1 -w 4096 -F I'd be interested in how ocfs2 does, because we use jbd2 too. Joel -- Life's Little Instruction Book #139 "Never deprive someone of hope; it might be all they have." Joel Becker Principal Software Developer Oracle E-mail: joel.becker@oracle.com Phone: (650) 506-8127 ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 0/3 v5][RFC] ext3/4: enhance fsync performance when using CFQ 2010-06-22 21:34 [PATCH 0/3 v5][RFC] ext3/4: enhance fsync performance when using CFQ Jeff Moyer ` (3 preceding siblings ...) 2010-06-22 22:13 ` [PATCH 0/3 v5][RFC] ext3/4: enhance fsync performance when using CFQ Joel Becker @ 2010-06-23 9:20 ` Christoph Hellwig 2010-06-23 13:03 ` Jeff Moyer 2010-06-23 9:30 ` Tao Ma 5 siblings, 1 reply; 24+ messages in thread From: Christoph Hellwig @ 2010-06-23 9:20 UTC (permalink / raw) To: Jeff Moyer; +Cc: axboe, vgoyal, linux-kernel, linux-ext4 On Tue, Jun 22, 2010 at 05:34:59PM -0400, Jeff Moyer wrote: > Hi, > > Running iozone with the fsync flag, or fs_mark, the performance of CFQ is > far worse than that of deadline for enterprise class storage when dealing > with file sizes of 8MB or less. I used the following command line as a > representative test case: > > fs_mark -S 1 -D 10000 -N 100000 -d /mnt/test/fs_mark -s 65536 -t 1 -w 4096 -F > > When run using the deadline I/O scheduler, an average of the first 5 numbers > will give you 448.4 files / second. CFQ will yield only 106.7. With > this patch series applied (and the two patches I sent yesterday), CFQ now > achieves 462.5 files / second. > > This patch set is still an RFC. I'd like to make it perform better when > there is a competing sequential reader present. For now, I've addressed > the concerns voiced about the previous posting. What happened to the initial idea of just using the BIO_RW_META flag for log writes? In the end log writes are the most important writes you have in a journaled filesystem, and they should not be subject to any kind of queue idling logic or other interruption. Log I/O is usually very little (unless you use old XFS code with a worst-case directory manipulation workload), and very latency sensitive. ^ permalink raw reply [flat|nested] 24+ messages in thread
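For context, the alternative Christoph refers to is tagging journal writes as metadata so the I/O scheduler treats them preferentially, roughly along these lines. This is a sketch using the 2.6.3x-era BIO_RW_META flag and jbd's buffer-head submission path; the flag spellings changed in later kernels, so treat it as illustrative only:

```c
	/* mark a journal commit-block write as synchronous metadata I/O */
	int write_op = WRITE_SYNC_PLUG | (1 << BIO_RW_META);

	submit_bh(write_op, bh);
```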
* Re: [PATCH 0/3 v5][RFC] ext3/4: enhance fsync performance when using CFQ 2010-06-23 9:20 ` Christoph Hellwig @ 2010-06-23 13:03 ` Jeff Moyer 0 siblings, 0 replies; 24+ messages in thread From: Jeff Moyer @ 2010-06-23 13:03 UTC (permalink / raw) To: Christoph Hellwig; +Cc: axboe, vgoyal, linux-kernel, linux-ext4 Christoph Hellwig <hch@infradead.org> writes: > On Tue, Jun 22, 2010 at 05:34:59PM -0400, Jeff Moyer wrote: >> Hi, >> >> Running iozone with the fsync flag, or fs_mark, the performance of CFQ is >> far worse than that of deadline for enterprise class storage when dealing >> with file sizes of 8MB or less. I used the following command line as a >> representative test case: >> >> fs_mark -S 1 -D 10000 -N 100000 -d /mnt/test/fs_mark -s 65536 -t 1 -w 4096 -F >> >> When run using the deadline I/O scheduler, an average of the first 5 numbers >> will give you 448.4 files / second. CFQ will yield only 106.7. With >> this patch series applied (and the two patches I sent yesterday), CFQ now >> achieves 462.5 files / second. >> >> This patch set is still an RFC. I'd like to make it perform better when >> there is a competing sequential reader present. For now, I've addressed >> the concerns voiced about the previous posting. > > What happened to the initial idea of just using the BIO_RW_META flag > for log writes? In the end log writes are the most important writes you > have in a journaled filesystem, and they should not be effect to any > kind of queue idling logic or other interruption. Log I/O is usually > very little (unless you use old XFS code with a worst-case directory > manipulation workload), and very latency sensitive. Vivek showed that starting firefox in the presence of a processing doing fsyncs (using the RQ_META approach) took twice as long as without the patch: http://lkml.org/lkml/2010/4/6/276 Cheers, Jeff ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 0/3 v5][RFC] ext3/4: enhance fsync performance when using CFQ 2010-06-22 21:34 [PATCH 0/3 v5][RFC] ext3/4: enhance fsync performance when using CFQ Jeff Moyer ` (4 preceding siblings ...) 2010-06-23 9:20 ` Christoph Hellwig @ 2010-06-23 9:30 ` Tao Ma 2010-06-23 13:06 ` Jeff Moyer 2010-06-24 5:54 ` Tao Ma 5 siblings, 2 replies; 24+ messages in thread From: Tao Ma @ 2010-06-23 9:30 UTC (permalink / raw) To: Jeff Moyer; +Cc: axboe, vgoyal, linux-kernel, linux-ext4 Hi Jeff, On 06/23/2010 05:34 AM, Jeff Moyer wrote: > Hi, > > Running iozone with the fsync flag, or fs_mark, the performance of CFQ is > far worse than that of deadline for enterprise class storage when dealing > with file sizes of 8MB or less. I used the following command line as a > representative test case: > > fs_mark -S 1 -D 10000 -N 100000 -d /mnt/test/fs_mark -s 65536 -t 1 -w 4096 -F > > When run using the deadline I/O scheduler, an average of the first 5 numbers > will give you 448.4 files / second. CFQ will yield only 106.7. With > this patch series applied (and the two patches I sent yesterday), CFQ now > achieves 462.5 files / second. which 2 patches? Could you paste the link or the subject? Just want to make my test env like yours. ;) As Joel mentioned in another mail, ocfs2 also use jbd/jbd2, so I'd like to give it a try and give you some feedback about the test. Regards, Tao > > This patch set is still an RFC. I'd like to make it perform better when > there is a competing sequential reader present. For now, I've addressed > the concerns voiced about the previous posting. > > Review and testing would be greatly appreciated. > > Thanks! > Jeff > > --- > > New from the last round: > > - removed the think time calculation I added for the sync-noidle service tree > - replaced above with a suggestion from Vivek to only guard against currently > active sequential readers when determining if we can preempt the sync-noidle > service tree. 
> - bug fixes > > Over all, I think it's simpler now thanks to the suggestions from Jens and > Vivek. > > [PATCH 1/3] block: Implement a blk_yield function to voluntarily give up the I/O scheduler. > [PATCH 2/3] jbd: yield the device queue when waiting for commits > [PATCH 3/3] jbd2: yield the device queue when waiting for journal commits > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 0/3 v5][RFC] ext3/4: enhance fsync performance when using CFQ 2010-06-23 9:30 ` Tao Ma @ 2010-06-23 13:06 ` Jeff Moyer 2010-06-24 5:54 ` Tao Ma 1 sibling, 0 replies; 24+ messages in thread From: Jeff Moyer @ 2010-06-23 13:06 UTC (permalink / raw) To: Tao Ma; +Cc: axboe, vgoyal, linux-kernel, linux-ext4 Tao Ma <tao.ma@oracle.com> writes: > Hi Jeff, > > On 06/23/2010 05:34 AM, Jeff Moyer wrote: >> Hi, >> >> Running iozone with the fsync flag, or fs_mark, the performance of CFQ is >> far worse than that of deadline for enterprise class storage when dealing >> with file sizes of 8MB or less. I used the following command line as a >> representative test case: >> >> fs_mark -S 1 -D 10000 -N 100000 -d /mnt/test/fs_mark -s 65536 -t 1 -w 4096 -F >> >> When run using the deadline I/O scheduler, an average of the first 5 numbers >> will give you 448.4 files / second. CFQ will yield only 106.7. With >> this patch series applied (and the two patches I sent yesterday), CFQ now >> achieves 462.5 files / second. > which 2 patches? Could you paste the link or the subject? Just want to > make my test env like yours. ;) > As Joel mentioned in another mail, ocfs2 also use jbd/jbd2, so I'd > like to give it a try and give you some feedback about the test. http://lkml.org/lkml/2010/6/21/307: [PATCH 1/2] cfq: always return false from should_idle if slice_idle is set to zero [PATCH 2/2] cfq: allow dispatching of both sync and async I/O together Thanks in advance for the testing! -Jeff ^ permalink raw reply [flat|nested] 24+ messages in thread
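The first of those two prerequisite patches is essentially an early exit in CFQ's idling decision. Based on the patch subject rather than the exact posted diff, it amounts to:

```c
static bool cfq_should_idle(struct cfq_data *cfqd, struct cfq_queue *cfqq)
{
	/* with idling disabled via the slice_idle tunable, never idle */
	if (!cfqd->cfq_slice_idle)
		return false;

	/* ... remaining checks unchanged ... */
}
```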
* Re: [PATCH 0/3 v5][RFC] ext3/4: enhance fsync performance when using CFQ 2010-06-23 9:30 ` Tao Ma 2010-06-23 13:06 ` Jeff Moyer @ 2010-06-24 5:54 ` Tao Ma 2010-06-24 14:56 ` Jeff Moyer 2010-06-27 13:48 ` Jeff Moyer 1 sibling, 2 replies; 24+ messages in thread From: Tao Ma @ 2010-06-24 5:54 UTC (permalink / raw) To: Tao Ma Cc: Jeff Moyer, axboe, vgoyal, linux-kernel, linux-ext4, Joel Becker, Sunil Mushran, ocfs2-devel [-- Attachment #1: Type: text/plain, Size: 1180 bytes --] Hi Jeff, On 06/23/2010 05:30 PM, Tao Ma wrote: > Hi Jeff, > > On 06/23/2010 05:34 AM, Jeff Moyer wrote: >> Hi, >> >> Running iozone with the fsync flag, or fs_mark, the performance of CFQ is >> far worse than that of deadline for enterprise class storage when dealing >> with file sizes of 8MB or less. I used the following command line as a >> representative test case: >> >> fs_mark -S 1 -D 10000 -N 100000 -d /mnt/test/fs_mark -s 65536 -t 1 -w >> 4096 -F >> >> When run using the deadline I/O scheduler, an average of the first 5 >> numbers >> will give you 448.4 files / second. CFQ will yield only 106.7. With >> this patch series applied (and the two patches I sent yesterday), CFQ now >> achieves 462.5 files / second. > which 2 patches? Could you paste the link or the subject? Just want to > make my test env like yours. ;) > As Joel mentioned in another mail, ocfs2 also use jbd/jbd2, so I'd like > to give it a try and give you some feedback about the test. I am sorry to say that the patch makes jbd2 lock up when I tested fs_mark using ocfs2. I have attached the log from my netconsole server. After I reverted the patch [3/3], the box works again. Regards, Tao [-- Attachment #2: lockup.log --] [-- Type: text/x-log, Size: 4429 bytes --] BUG: soft lockup - CPU#0 stuck for 61s!
[jbd2/sda11-15:5456] Modules linked in: ocfs2 jbd2 ocfs2_nodemanager configfs ocfs2_stackglue netconsole autofs4 hidp rfcomm l2cap crc16 bluetooth rfkill sunrpc ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi dm_mirror dm_region_hash dm_log dm_multipath dm_mod video output sbs sbshc battery acpi_memhotplug ac lp sg dcdbas sr_mod cdrom option usb_wwan usbserial serio_raw rtc_cmos rtc_core parport_pc parport rtc_lib snd_hda_codec_analog tpm_tis tpm tpm_bios button snd_hda_intel snd_hda_codec snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq e1000 tg3 snd_seq_device libphy i2c_i801 snd_pcm_oss snd_mixer_oss i2c_core snd_pcm snd_timer snd soundcore snd_page_alloc shpchp pcspkr ata_piix libata sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd [last unloaded: microcode] CPU 0 Modules linked in: ocfs2 jbd2 ocfs2_nodemanager configfs ocfs2_stackglue netconsole autofs4 hidp rfcomm l2cap crc16 bluetooth rfkill sunrpc ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi dm_mirror dm_region_hash dm_log dm_multipath dm_mod video output sbs sbshc battery acpi_memhotplug ac lp sg dcdbas sr_mod cdrom option usb_wwan usbserial serio_raw rtc_cmos rtc_core parport_pc parport rtc_lib snd_hda_codec_analog tpm_tis tpm tpm_bios button snd_hda_intel snd_hda_codec snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq e1000 tg3 snd_seq_device libphy i2c_i801 snd_pcm_oss snd_mixer_oss i2c_core snd_pcm snd_timer snd soundcore snd_page_alloc shpchp pcspkr ata_piix libata sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd [last unloaded: microcode] Pid: 5456, comm: jbd2/sda11-15 Not tainted 2.6.35-rc3+ #4 0MM599/OptiPlex 745 RIP: 0010:[<ffffffff822fcfe7>] [<ffffffff822fcfe7>] _raw_spin_lock+0xe/0x15 RSP: 0018:ffff88012780de78 EFLAGS: 00000297 RAX: 0000000000001d1c RBX: ffff88012fbb4000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff88012fbd9650 RDI: 
ffff88012fbb4024 RBP: ffffffff820031ce R08: ffff88012780c000 R09: 0000000000000000 R10: ffff88012fbd8fa8 R11: 0000000000000000 R12: ffff88013a545880 R13: ffffffff8202d2b5 R14: 0000000000000000 R15: ffff880001e13640 FS: 0000000000000000(0000) GS:ffff880001e00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000000 CR3: 0000000127853000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process jbd2/sda11-15 (pid: 5456, threadinfo ffff88012780c000, task ffff88012fbd8f60) Stack: ffffffffa027b955 0000000000000000 ffff88012fbd8f60 ffffffff8204de27 ffff88012780de98 ffff88012780de98 ffff88012fbfbae8 0000000000000292 ffff88012780def8 ffff88012fbb4000 ffff88012fbfbae0 ffffffffa027b7fa Call Trace: [<ffffffffa027b955>] ? kjournald2+0x15b/0x1cf [jbd2] [<ffffffff8204de27>] ? autoremove_wake_function+0x0/0x2e [<ffffffffa027b7fa>] ? kjournald2+0x0/0x1cf [jbd2] [<ffffffff8204db33>] ? kthread+0x79/0x81 [<ffffffff82003614>] ? kernel_thread_helper+0x4/0x10 [<ffffffff8204daba>] ? kthread+0x0/0x81 [<ffffffff82003610>] ? kernel_thread_helper+0x0/0x10 Code: e0 8d 90 00 01 00 00 75 05 3e 66 0f b1 17 0f 94 c2 0f b6 c2 85 c0 0f 95 c0 0f b6 c0 c3 b8 00 01 00 00 3e 66 0f c1 07 38 e0 74 06 f3> 90 8a 07 eb f6 c3 9c 58 fa ba 00 01 00 00 3e 66 0f c1 17 38 Call Trace: [<ffffffffa027b955>] ? kjournald2+0x15b/0x1cf [jbd2] [<ffffffff8204de27>] ? autoremove_wake_function+0x0/0x2e [<ffffffffa027b7fa>] ? kjournald2+0x0/0x1cf [jbd2] [<ffffffff8204db33>] ? kthread+0x79/0x81 [<ffffffff82003614>] ? kernel_thread_helper+0x4/0x10 [<ffffffff8204daba>] ? kthread+0x0/0x81 [<ffffffff82003610>] ? kernel_thread_helper+0x0/0x10 ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 0/3 v5][RFC] ext3/4: enhance fsync performance when using CFQ 2010-06-24 5:54 ` Tao Ma @ 2010-06-24 14:56 ` Jeff Moyer 2010-06-27 13:48 ` Jeff Moyer 1 sibling, 0 replies; 24+ messages in thread From: Jeff Moyer @ 2010-06-24 14:56 UTC (permalink / raw) To: Tao Ma Cc: axboe, vgoyal, linux-kernel, linux-ext4, Joel Becker, Sunil Mushran, ocfs2-devel Tao Ma <tao.ma@oracle.com> writes: > Hi Jeff, > [...] > I am sorry to say that the patch make jbd2 locked up when I tested > fs_mark using ocfs2. > I have attached the log from my netconsole server. After I reverted > the patch [3/3], the box works again. Thanks for the report, Tao, I'll try to reproduce it here and get back to you. Cheers, Jeff ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 0/3 v5][RFC] ext3/4: enhance fsync performance when using CFQ 2010-06-24 5:54 ` Tao Ma 2010-06-24 14:56 ` Jeff Moyer @ 2010-06-27 13:48 ` Jeff Moyer 2010-06-28 6:41 ` Tao Ma 1 sibling, 1 reply; 24+ messages in thread From: Jeff Moyer @ 2010-06-27 13:48 UTC (permalink / raw) To: Tao Ma Cc: axboe, vgoyal, linux-kernel, linux-ext4, Joel Becker, Sunil Mushran, ocfs2-devel Tao Ma <tao.ma@oracle.com> writes: > Hi Jeff, > > On 06/23/2010 05:30 PM, Tao Ma wrote: >> Hi Jeff, >> >> On 06/23/2010 05:34 AM, Jeff Moyer wrote: >>> Hi, >>> >>> Running iozone with the fsync flag, or fs_mark, the performance of CFQ is >>> far worse than that of deadline for enterprise class storage when dealing >>> with file sizes of 8MB or less. I used the following command line as a >>> representative test case: >>> >>> fs_mark -S 1 -D 10000 -N 100000 -d /mnt/test/fs_mark -s 65536 -t 1 -w >>> 4096 -F >>> >>> When run using the deadline I/O scheduler, an average of the first 5 >>> numbers >>> will give you 448.4 files / second. CFQ will yield only 106.7. With >>> this patch series applied (and the two patches I sent yesterday), CFQ now >>> achieves 462.5 files / second. >> which 2 patches? Could you paste the link or the subject? Just want to >> make my test env like yours. ;) >> As Joel mentioned in another mail, ocfs2 also use jbd/jbd2, so I'd like >> to give it a try and give you some feedback about the test. > I am sorry to say that the patch make jbd2 locked up when I tested > fs_mark using ocfs2. > I have attached the log from my netconsole server. After I reverted > the patch [3/3], the box works again. I can't reproduce this, unfortunately. Also, when building with the .config you sent me, the disassembly doesn't line up with the stack trace you posted. I'm not sure why yielding the queue would cause a deadlock. The only explanation I can come up with is that I/O is not being issued. I'm assuming that no other I/O will be completed to the file system in question. 
Is that right? Could you send along the output from sysrq-t? Cheers, Jeff ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 0/3 v5][RFC] ext3/4: enhance fsync performance when using CFQ 2010-06-27 13:48 ` Jeff Moyer @ 2010-06-28 6:41 ` Tao Ma 2010-06-28 13:58 ` Jeff Moyer 2010-06-29 14:56 ` Jeff Moyer 0 siblings, 2 replies; 24+ messages in thread From: Tao Ma @ 2010-06-28 6:41 UTC (permalink / raw) To: Jeff Moyer Cc: axboe, vgoyal, linux-kernel, linux-ext4, Joel Becker, Sunil Mushran, ocfs2-devel [-- Attachment #1: Type: text/plain, Size: 2226 bytes --] Hi Jeff, On 06/27/2010 09:48 PM, Jeff Moyer wrote: > Tao Ma<tao.ma@oracle.com> writes: > >> Hi Jeff, >> >> On 06/23/2010 05:30 PM, Tao Ma wrote: >>> Hi Jeff, >>> >>> On 06/23/2010 05:34 AM, Jeff Moyer wrote: >>>> Hi, >>>> >>>> Running iozone with the fsync flag, or fs_mark, the performance of CFQ is >>>> far worse than that of deadline for enterprise class storage when dealing >>>> with file sizes of 8MB or less. I used the following command line as a >>>> representative test case: >>>> >>>> fs_mark -S 1 -D 10000 -N 100000 -d /mnt/test/fs_mark -s 65536 -t 1 -w >>>> 4096 -F >>>> >>>> When run using the deadline I/O scheduler, an average of the first 5 >>>> numbers >>>> will give you 448.4 files / second. CFQ will yield only 106.7. With >>>> this patch series applied (and the two patches I sent yesterday), CFQ now >>>> achieves 462.5 files / second. >>> which 2 patches? Could you paste the link or the subject? Just want to >>> make my test env like yours. ;) >>> As Joel mentioned in another mail, ocfs2 also use jbd/jbd2, so I'd like >>> to give it a try and give you some feedback about the test. >> I am sorry to say that the patch make jbd2 locked up when I tested >> fs_mark using ocfs2. >> I have attached the log from my netconsole server. After I reverted >> the patch [3/3], the box works again. > > I can't reproduce this, unfortunately. Also, when building with the > .config you sent me, the disassembly doesn't line up with the stack > trace you posted. > > I'm not sure why yielding the queue would cause a deadlock. 
The only > explanation I can come up with is that I/O is not being issued. I'm > assuming that no other I/O will be completed to the file system in > question. Is that right? Could you send along the output from sysrq-t? Yes, I just mounted it and began the test, so there should be no outstanding I/O. So do you need me to set up another disk for the test? I have attached the sysrq output in sysrq.log. Please check. BTW, I also hit a NULL pointer dereference in cfq_yield. I have attached the null.log also. This seems to be related to the previous deadlock and happens when I try to remount the same volume after reboot and ocfs2 tries to do some recovery. Regards, Tao [-- Attachment #2: sysrq.log --] [-- Type: text/x-log, Size: 33613 bytes --] SysRq : Show State task PC stack pid father init R running task 0 1 0 0x00000000 ffff88013badb510 0000000000000082 ffffffff8271b020 0000000001e0ef88 0000000000000286 ffffffff82050937 0000000000000286 ffffffff820507e7 0000000000000286 ffff88013badb510 00000000004c4b3f ffff88013baddad8 Call Trace: [<ffffffff82050937>] ? __hrtimer_start_range_ns+0x12f/0x142 [<ffffffff820507e7>] ? hrtimer_try_to_cancel+0x90/0x9b [<ffffffff822fc5c2>] ? schedule_hrtimeout_range_clock+0xc4/0xeb [<ffffffff820503ee>] ? hrtimer_wakeup+0x0/0x22 [<ffffffff820cfbc1>] ? poll_schedule_timeout+0x43/0x60 [<ffffffff820d053c>] ? do_select+0x46c/0x4ed [<ffffffff820d0867>] ? __pollwait+0x0/0xd9 [<ffffffff820d0940>] ? pollwake+0x0/0x54 [<ffffffff820339ce>] ? __wake_up+0x30/0x44 [<ffffffffa00245fa>] ? journal_stop+0x20f/0x222 [jbd] [<ffffffff821698f2>] ? number+0x121/0x223 [<ffffffffa0043d2e>] ? __ext3_journal_stop+0x1f/0x3d [ext3] [<ffffffffa003dd48>] ? ext3_writeback_write_end+0x94/0xb7 [ext3] [<ffffffff8216b769>] ? vsnprintf+0x3e4/0x421 [<ffffffff820d217d>] ? __d_lookup+0xb7/0xf6 [<ffffffff820ca799>] ? do_lookup+0x89/0x1e2 [<ffffffff820d164b>] ? dput+0x27/0x116 [<ffffffff820ccc4c>] ? link_path_walk+0x90a/0x930 [<ffffffff820d079e>] ? 
core_sys_select+0x1e1/0x2aa [<ffffffff820cdc25>] ? user_path_at+0x52/0x79 [<ffffffff820c5d05>] ? cp_new_stat+0xea/0x101 [<ffffffff8203b7e7>] ? timespec_add_safe+0x37/0x66 [<ffffffff820d0bda>] ? sys_select+0x9a/0xc3 [<ffffffff8200286b>] ? system_call_fastpath+0x16/0x1b kthreadd S 0000000000000000 0 2 0 0x00000000 ffff88013badae20 0000000000000046 ffff88013747b550 00000000ffffffff ffffffff82003610 0000000000000010 0000000000000202 0000000000000000 0000000000000018 ffff88013badae20 ffff880128015ac8 0000000000000000 Call Trace: [<ffffffff82003610>] ? kernel_thread_helper+0x0/0x10 [<ffffffff8204dcbf>] ? kthreadd+0x72/0xeb [<ffffffff82003614>] ? kernel_thread_helper+0x4/0x10 [<ffffffff8204dc4d>] ? kthreadd+0x0/0xeb [<ffffffff82003610>] ? kernel_thread_helper+0x0/0x10 ksoftirqd/0 S 0000000000000000 0 3 2 0x00000000 ffff88013bada730 0000000000000046 ffff88013bae8e60 0000000000013640 ffff88013bada730 0000000000000000 ffffffff8272bbe0 ffffffff822fb8de ffff88013badb510 0000000000000000 ffffffff82791c00 ffff88013baddd88 Call Trace: [<ffffffff822fb8de>] ? schedule+0x616/0x6c4 [<ffffffff8203c2a6>] ? run_ksoftirqd+0x0/0x116 [<ffffffff8203c2ed>] ? run_ksoftirqd+0x47/0x116 [<ffffffff8204db33>] ? kthread+0x79/0x81 [<ffffffff82003614>] ? kernel_thread_helper+0x4/0x10 [<ffffffff8204daba>] ? kthread+0x0/0x81 [<ffffffff82003610>] ? kernel_thread_helper+0x0/0x10 migration/0 S 0000000000000000 0 4 2 0x00000000 ffff88013bada040 0000000000000046 ffff88013764cea0 000000008205d592 000000033badb510 ffff88013abbfe68 ffffffff820651c6 ffff88013abbfe70 0000000000000282 ffff880001e0f000 ffffffff820651c6 ffff88013abbfed8 Call Trace: [<ffffffff820651c6>] ? stop_machine_cpu_stop+0x0/0x8f [<ffffffff820651c6>] ? stop_machine_cpu_stop+0x0/0x8f [<ffffffff82065696>] ? cpu_stopper_thread+0x119/0x13f [<ffffffff8202d2b5>] ? finish_task_switch+0x33/0x80 [<ffffffff822fb8de>] ? schedule+0x616/0x6c4 [<ffffffff8206557d>] ? cpu_stopper_thread+0x0/0x13f [<ffffffff8204db33>] ? kthread+0x79/0x81 [<ffffffff82003614>] ? 
kernel_thread_helper+0x4/0x10 [<ffffffff8204daba>] ? kthread+0x0/0x81 [<ffffffff82003610>] ? kernel_thread_helper+0x0/0x10 watchdog/0 R running task 0 5 2 0x00000000 ffff88013bae9550 0000000000000046 ffffffff8271b020 0000000000000086 01ff88013bae9550 ffffffff82051a4d 0000000000000286 0000000000000286 000000000000f070 ffff88013bae9550 0000000000000000 ffff88013baddd98 Call Trace: [<ffffffff82051a4d>] ? sched_clock_local+0x9/0x6c [<ffffffff8206f29a>] ? watchdog+0x0/0x9e [<ffffffff8206f2e9>] ? watchdog+0x4f/0x9e [<ffffffff8204db33>] ? kthread+0x79/0x81 [<ffffffff82003614>] ? kernel_thread_helper+0x4/0x10 [<ffffffff8204daba>] ? kthread+0x0/0x81 [<ffffffff82003610>] ? kernel_thread_helper+0x0/0x10 events/0 R running task 0 6 2 0x00000000 ffff88013bae8e60 0000000000000046 ffff88013b032ab0 0000000082042340 ffff88013baf1e90 ffff880001e16318 0000000000000001 ffffffff8204dff0 0000000000000246 ffff88013baf1ef8 ffffffff82747420 ffff880001e16300 Call Trace: [<ffffffff8204dff0>] ? prepare_to_wait+0x35/0x6d [<ffffffff821dc54f>] ? console_callback+0x0/0xf5 [<ffffffff8204af28>] ? worker_thread+0x93/0x1e4 [<ffffffff8204de27>] ? autoremove_wake_function+0x0/0x2e [<ffffffff8204ae95>] ? worker_thread+0x0/0x1e4 [<ffffffff8204db33>] ? kthread+0x79/0x81 [<ffffffff82003614>] ? kernel_thread_helper+0x4/0x10 [<ffffffff8204daba>] ? kthread+0x0/0x81 [<ffffffff82003610>] ? kernel_thread_helper+0x0/0x10 khelper S ffffffff82049cae 0 7 2 0x00000000 ffff88013bae8770 0000000000000046 ffff88013747b550 0000000000000000 ffff88013baf3e90 ffff880001e16398 0000000000000001 ffffffff8204dff0 ffff880001e16380 ffff88013baf3ef8 ffff88012f9872c0 ffff880001e16380 Call Trace: [<ffffffff8204dff0>] ? prepare_to_wait+0x35/0x6d [<ffffffff82049cae>] ? __call_usermodehelper+0x0/0x6f [<ffffffff8204af28>] ? worker_thread+0x93/0x1e4 [<ffffffff8204de27>] ? autoremove_wake_function+0x0/0x2e [<ffffffff8204ae95>] ? worker_thread+0x0/0x1e4 [<ffffffff8204db33>] ? kthread+0x79/0x81 [<ffffffff82003614>] ? 
kernel_thread_helper+0x4/0x10 [<ffffffff8204daba>] ? kthread+0x0/0x81 [<ffffffff82003610>] ? kernel_thread_helper+0x0/0x10 async/mgr S 0000000000000000 0 45 2 0x00000000 ffff88013bb62970 0000000000000046 ffff8801374be9b0 000000008202e0e1 ffffffff8272bbe0 000000008202d2b5 0000000000000003 0000000000013640 0000000000000282 0000000000000001 0000000000000001 ffff88013badddb8 Call Trace: [<ffffffff82052a50>] ? async_manager_thread+0x0/0xe8 [<ffffffff82052b12>] ? async_manager_thread+0xc2/0xe8 [<ffffffff8202e0f3>] ? default_wake_function+0x0/0x9 [<ffffffff82052a50>] ? async_manager_thread+0x0/0xe8 [<ffffffff8204db33>] ? kthread+0x79/0x81 [<ffffffff82003614>] ? kernel_thread_helper+0x4/0x10 [<ffffffff8204daba>] ? kthread+0x0/0x81 [<ffffffff82003610>] ? kernel_thread_helper+0x0/0x10 sync_supers R running task 0 153 2 0x00000000 ffff88013b048040 0000000000000046 ffffffff8271b020 0000000000000000 ffffffff8272bbe0 ffffffff822fb8de ffff88013badb510 0000000082027cc9 000000033b093610 ffff88013b095ef8 0000000000000000 ffff88013baddda8 Call Trace: [<ffffffff822fb8de>] ? schedule+0x616/0x6c4 [<ffffffff820a1e15>] ? bdi_sync_supers+0x0/0x54 [<ffffffff820a1e54>] ? bdi_sync_supers+0x3f/0x54 [<ffffffff820a1e15>] ? bdi_sync_supers+0x0/0x54 [<ffffffff8204db33>] ? kthread+0x79/0x81 [<ffffffff82003614>] ? kernel_thread_helper+0x4/0x10 [<ffffffff8204daba>] ? kthread+0x0/0x81 [<ffffffff82003610>] ? kernel_thread_helper+0x0/0x10 bdi-default R running task 0 155 2 0x00000000 ffff88013b048e20 0000000000000046 ffffffff8271b020 00000000ffffffff ffff88013b08ddf0 00000000fffda363 0000000000000286 ffffffff820420c2 0000000000000286 ffffffff82894dc0 ffffffff82894dc0 00000000fffdb6eb Call Trace: [<ffffffff820420c2>] ? try_to_del_timer_sync+0xa4/0xaf [<ffffffff822fbe0d>] ? schedule_timeout+0x1ba/0x1e0 [<ffffffff82042139>] ? process_timeout+0x0/0x5 [<ffffffff820a246f>] ? bdi_forker_task+0x160/0x295 [<ffffffff822fb8de>] ? schedule+0x616/0x6c4 [<ffffffff820a230f>] ? 
bdi_forker_task+0x0/0x295 [<ffffffff8204db33>] ? kthread+0x79/0x81 [<ffffffff82003614>] ? kernel_thread_helper+0x4/0x10 [<ffffffff8204daba>] ? kthread+0x0/0x81 [<ffffffff82003610>] ? kernel_thread_helper+0x0/0x10 kblockd/0 S ffffffff8216160d 0 156 2 0x00000000 ffff88013b049510 0000000000000046 ffff88013747b550 000000003a1caa20 ffff88013b081e90 ffff880001e16498 0000000000000001 ffffffff8204dff0 ffff88013a1caa20 ffff88013b081ef8 ffff88013a2a5b48 ffff880001e16480 Call Trace: [<ffffffff8204dff0>] ? prepare_to_wait+0x35/0x6d [<ffffffff8216160d>] ? cfq_kick_queue+0x0/0x3a [<ffffffff8204af28>] ? worker_thread+0x93/0x1e4 [<ffffffff8204de27>] ? autoremove_wake_function+0x0/0x2e [<ffffffff8204ae95>] ? worker_thread+0x0/0x1e4 [<ffffffff8204db33>] ? kthread+0x79/0x81 [<ffffffff82003614>] ? kernel_thread_helper+0x4/0x10 [<ffffffff8204daba>] ? kthread+0x0/0x81 [<ffffffff82003610>] ? kernel_thread_helper+0x0/0x10 kacpid S ffffffff82198b60 0 158 2 0x00000000 ffff88013b03caf0 0000000000000046 ffff88013b03d1e0 000000008202cfe7 ffff88013bbcfe90 ffff880001e16518 0000000000000001 ffffffff8204dff0 ffffffff8272bbe0 ffff88013bbcfef8 ffff88013bbb6620 ffff880001e16500 Call Trace: [<ffffffff8204dff0>] ? prepare_to_wait+0x35/0x6d [<ffffffff82198b60>] ? bind_to_cpu0+0x0/0x22 [<ffffffff8204af28>] ? worker_thread+0x93/0x1e4 [<ffffffff8204de27>] ? autoremove_wake_function+0x0/0x2e [<ffffffff8204ae95>] ? worker_thread+0x0/0x1e4 [<ffffffff8204db33>] ? kthread+0x79/0x81 [<ffffffff82003614>] ? kernel_thread_helper+0x4/0x10 [<ffffffff8204daba>] ? kthread+0x0/0x81 [<ffffffff82003610>] ? kernel_thread_helper+0x0/0x10 kacpi_notify S ffffffff82198b60 0 159 2 0x00000000 ffff88013b03d1e0 0000000000000046 ffff88013b03d8d0 000000008202cfe7 ffff88013b021e90 ffff880001e16598 0000000000000001 ffffffff8204dff0 ffffffff8272bbe0 ffff88013b021ef8 ffff88013bbb6620 ffff880001e16580 Call Trace: [<ffffffff8204dff0>] ? prepare_to_wait+0x35/0x6d [<ffffffff82198b60>] ? bind_to_cpu0+0x0/0x22 [<ffffffff8204af28>] ? 
worker_thread+0x93/0x1e4 [<ffffffff8204de27>] ? autoremove_wake_function+0x0/0x2e [<ffffffff8204ae95>] ? worker_thread+0x0/0x1e4 [<ffffffff8204db33>] ? kthread+0x79/0x81 [<ffffffff82003614>] ? kernel_thread_helper+0x4/0x10 [<ffffffff8204daba>] ? kthread+0x0/0x81 [<ffffffff82003610>] ? kernel_thread_helper+0x0/0x10 kacpi_hotplug S ffffffff82198b60 0 160 2 0x00000000 ffff88013b03d8d0 0000000000000046 ffff88013badb510 000000008202cfe7 ffff88013b023e90 ffff880001e16618 0000000000000001 ffffffff8204dff0 ffffffff8272bbe0 ffff88013b023ef8 ffff88013bbb6620 ffff880001e16600 Call Trace: [<ffffffff8204dff0>] ? prepare_to_wait+0x35/0x6d [<ffffffff82198b60>] ? bind_to_cpu0+0x0/0x22 [<ffffffff8204af28>] ? worker_thread+0x93/0x1e4 [<ffffffff8204de27>] ? autoremove_wake_function+0x0/0x2e [<ffffffff8204ae95>] ? worker_thread+0x0/0x1e4 [<ffffffff8204db33>] ? kthread+0x79/0x81 [<ffffffff82003614>] ? kernel_thread_helper+0x4/0x10 [<ffffffff8204daba>] ? kthread+0x0/0x81 [<ffffffff82003610>] ? kernel_thread_helper+0x0/0x10 khubd S 0000000000000001 0 245 2 0x00000000 ffff88013b052770 0000000000000046 ffff88013bb63060 0000000000000004 ffff88013bbd9e70 ffffffff82752610 0000000000000001 ffffffff8204dff0 ffff88013a107230 ffffffff82752600 0000000000000000 ffff88013a107200 Call Trace: [<ffffffff8204dff0>] ? prepare_to_wait+0x35/0x6d [<ffffffff8221b6c0>] ? hub_thread+0xcee/0xddc [<ffffffff8202d2b5>] ? finish_task_switch+0x33/0x80 [<ffffffff8204de27>] ? autoremove_wake_function+0x0/0x2e [<ffffffff8221a9d2>] ? hub_thread+0x0/0xddc [<ffffffff8204db33>] ? kthread+0x79/0x81 [<ffffffff82003614>] ? kernel_thread_helper+0x4/0x10 [<ffffffff8204daba>] ? kthread+0x0/0x81 [<ffffffff82003610>] ? kernel_thread_helper+0x0/0x10 kseriod S ffff88013b0ab5c0 0 248 2 0x00000000 ffff88013b088100 0000000000000046 ffff88013bb63060 00000000821f3499 ffff88013b0b1e90 ffffffff82753c80 0000000000000001 ffffffff8204dff0 Call Trace: [<ffffffff8222cc37>] ? 
serio_thread+0x0/0x2c5

[The remainder of the sysrq-t dump is garbled in the archive: stack fragments for kswapd0, the SCSI error handler, the flush-* writeback threads, kjournald, rpciod/0, and assorted user tasks (iscsid, sendmail, python, atd, hald-addon-keyboard, saslauthd, the ocfs2 worker threads, and jbd2/sda11-15) are interleaved beyond recovery, followed by partial scheduler debug statistics.]

[-- Attachment #3: null.log --]
[-- Type: text/x-log, Size: 4153 bytes --]

BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff82161537>] cfq_yield+0x5f/0x135
PGD 1342ff067 PUD 1342ee067 PMD 0
Oops: 0002 [#1] SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:1e.0/0000:04:02.0/irq
CPU 0
Modules linked in: ocfs2 jbd2 ocfs2_nodemanager configfs ocfs2_stackglue netconsole autofs4 hidp rfcomm l2cap crc16 bluetooth rfkill sunrpc ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi dm_mirror dm_region_hash dm_log dm_multipath dm_mod video output sbs sbshc battery
acpi_memhotplug ac lp sg dcdbas sr_mod cdrom option usb_wwan usbserial serio_raw parport_pc parport rtc_cmos rtc_core rtc_lib snd_hda_codec_analog tpm_tis tpm tpm_bios button snd_hda_intel snd_hda_codec snd_seq_dummy tg3 snd_seq_oss snd_seq_midi_event snd_seq e1000 snd_seq_device libphy snd_pcm_oss snd_mixer_oss i2c_i801 i2c_core snd_pcm snd_timer snd soundcore snd_page_alloc shpchp pcspkr ata_piix libata sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd [last unloaded: microcode]
Pid: 4130, comm: ocfs2_wq Not tainted 2.6.35-rc3+ #5 0MM599/OptiPlex 745
RIP: 0010:[<ffffffff82161537>]  [<ffffffff82161537>] cfq_yield+0x5f/0x135
RSP: 0018:ffff880123061c60  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88012c2b5ea8 RCX: ffff88012c3a30d0
RDX: ffff8801255953d8 RSI: 0000000000000000 RDI: ffff88013a2a3800
RBP: ffff88012c3ba770 R08: ffff88012c3a30b8 R09: ffff880001e136a0
R10: ffff88012c3bb598 R11: ffff88012c3a30c8 R12: ffff88013a2a3800
R13: ffff88013a1d0a20 R14: 0000000000000000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff880001e00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 00000001342da000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process ocfs2_wq (pid: 4130, threadinfo ffff880123060000, task ffff88013ab38240)
Stack:
 ffff88012c3a3000 ffff88012c3a3024 ffff88013414e580 ffff8801255953d8
 0000000000000003 ffffffffa030e260 ffff88012c3a3000 0000000000000003
 ffff8801255953d8 0000000000000003 0000000000000282 ffffffffa030db11
Call Trace:
 [<ffffffffa030e260>] ? jbd2_log_wait_commit+0x3c/0x10e [jbd2]
 [<ffffffffa030db11>] ? __jbd2_log_start_commit+0x2c/0x33 [jbd2]
 [<ffffffffa030856b>] ? jbd2_journal_stop+0x1f7/0x21f [jbd2]
 [<ffffffffa0308db3>] ? jbd2_journal_start+0x90/0xbf [jbd2]
 [<ffffffffa04cd8f8>] ? ocfs2_commit_trans+0x23/0xb1 [ocfs2]
 [<ffffffffa04d2212>] ? ocfs2_complete_local_alloc_recovery+0x2fa/0x3a1 [ocfs2]
 [<ffffffff82029bf1>] ? update_curr+0x77/0xe4
 [<ffffffffa04cf681>] ? ocfs2_complete_recovery+0x0/0xab2 [ocfs2]
 [<ffffffffa04cf887>] ? ocfs2_complete_recovery+0x206/0xab2 [ocfs2]
 [<ffffffff822fb8de>] ? schedule+0x616/0x6c4
 [<ffffffff8204dff0>] ? prepare_to_wait+0x35/0x6d
 [<ffffffffa04cf681>] ? ocfs2_complete_recovery+0x0/0xab2 [ocfs2]
 [<ffffffff8204afdf>] ? worker_thread+0x14a/0x1e4
 [<ffffffff8204de27>] ? autoremove_wake_function+0x0/0x2e
 [<ffffffff8204ae95>] ? worker_thread+0x0/0x1e4
 [<ffffffff8204db33>] ? kthread+0x79/0x81
 [<ffffffff82003614>] ? kernel_thread_helper+0x4/0x10
 [<ffffffff8204daba>] ? kthread+0x0/0x81
 [<ffffffff82003610>] ? kernel_thread_helper+0x0/0x10
Code: 0f 84 f2 00 00 00 48 8d bd a0 05 00 00 e8 ab ba 19 00 48 8b b5 00 06 00 00 4c 89 e7 e8 69 eb ff ff 49 89 c6 48 8b 85 00 06 00 00 <f0> 48 ff 00 fe 85 a0 05 00 00 4d 85 f6 0f 84 a6 00 00 00 49 8b
RIP  [<ffffffff82161537>] cfq_yield+0x5f/0x135
 RSP <ffff880123061c60>
CR2: 0000000000000000
---[ end trace 374ddf0f57161b27 ]---

^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: [PATCH 0/3 v5][RFC] ext3/4: enhance fsync performance when using CFQ
  2010-06-28  6:41         ` Tao Ma
@ 2010-06-28 13:58           ` Jeff Moyer
  2010-06-28 23:16             ` Tao Ma
  2010-06-29 14:56             ` Jeff Moyer
  1 sibling, 1 reply; 24+ messages in thread
From: Jeff Moyer @ 2010-06-28 13:58 UTC (permalink / raw)
To: Tao Ma
Cc: axboe, vgoyal, linux-kernel, linux-ext4, Joel Becker, Sunil Mushran,
	ocfs2-devel

Tao Ma <tao.ma@oracle.com> writes:

> btw, I also met with a NULL pointer dereference in cfq_yield. I have
> attached the null.log also. This seems to be related to the previous
> deadlock and happens when I try to remount the same volume after
> reboot and ocfs2 tries to do some recovery.

Since I can't reproduce your kernel binary even with your .config, could
you send me the disassembly of the cfq_yield function from your vmlinux
binary?

Thanks!
Jeff

^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: [PATCH 0/3 v5][RFC] ext3/4: enhance fsync performance when using CFQ
  2010-06-28 13:58           ` Jeff Moyer
@ 2010-06-28 23:16             ` Tao Ma
  0 siblings, 0 replies; 24+ messages in thread
From: Tao Ma @ 2010-06-28 23:16 UTC (permalink / raw)
To: Jeff Moyer; +Cc: axboe, vgoyal, linux-kernel, linux-ext4, ocfs2-devel

[-- Attachment #1: Type: text/plain, Size: 632 bytes --]

Jeff Moyer wrote:
> Tao Ma <tao.ma@oracle.com> writes:
>
>> btw, I also met with a NULL pointer dereference in cfq_yield. I have
>> attached the null.log also. This seems to be related to the previous
>> deadlock and happens when I try to remount the same volume after
>> reboot and ocfs2 tries to do some recovery.
>
> Since I can't reproduce your kernel binary even with your .config, could
> you send me the disassembly of the cfq_yield function from your vmlinux
> binary?

No problem. I have attached it. btw, if you have any debug patch, I am
happy to run it to make the problem more clear to you.

Regards,
Tao

[-- Attachment #2: cfq_yield.txt --]
[-- Type: text/plain, Size: 5069 bytes --]

ffffffff821614d8 <cfq_yield>:
ffffffff821614d8:	41 56                	push   %r14
ffffffff821614da:	41 55                	push   %r13
ffffffff821614dc:	49 89 fd             	mov    %rdi,%r13
ffffffff821614df:	41 54                	push   %r12
ffffffff821614e1:	55                   	push   %rbp
ffffffff821614e2:	48 89 f5             	mov    %rsi,%rbp
ffffffff821614e5:	53                   	push   %rbx
ffffffff821614e6:	48 8b 47 18          	mov    0x18(%rdi),%rax
ffffffff821614ea:	4c 8b 60 08          	mov    0x8(%rax),%r12
ffffffff821614ee:	65 48 8b 04 25 40 b5 	mov    %gs:0xb540,%rax
ffffffff821614f5:	00 00
ffffffff821614f7:	48 8b b0 00 06 00 00 	mov    0x600(%rax),%rsi
ffffffff821614fe:	4c 89 e7             	mov    %r12,%rdi
ffffffff82161501:	e8 90 eb ff ff       	callq  ffffffff82160096 <cfq_cic_lookup>
ffffffff82161506:	48 85 c0             	test   %rax,%rax
ffffffff82161509:	48 89 c3             	mov    %rax,%rbx
ffffffff8216150c:	0f 84 f2 00 00 00    	je     ffffffff82161604 <cfq_yield+0x12c>
ffffffff82161512:	48 8d bd a0 05 00 00 	lea    0x5a0(%rbp),%rdi
ffffffff82161519:	e8 ab ba 19 00       	callq  ffffffff822fcfc9 <_raw_spin_lock>
ffffffff8216151e:	48 8b b5 00 06 00 00 	mov    0x600(%rbp),%rsi
ffffffff82161525:	4c 89 e7             	mov    %r12,%rdi
ffffffff82161528:	e8 69 eb ff ff       	callq  ffffffff82160096 <cfq_cic_lookup>
ffffffff8216152d:	49 89 c6             	mov    %rax,%r14
ffffffff82161530:	48 8b 85 00 06 00 00 	mov    0x600(%rbp),%rax
ffffffff82161537:	f0 48 ff 00          	lock incq (%rax)
ffffffff8216153b:	fe 85 a0 05 00 00    	incb   0x5a0(%rbp)
ffffffff82161541:	4d 85 f6             	test   %r14,%r14
ffffffff82161544:	0f 84 a6 00 00 00    	je     ffffffff821615f0 <cfq_yield+0x118>
ffffffff8216154a:	49 8b bd 48 03 00 00 	mov    0x348(%r13),%rdi
ffffffff82161551:	e8 a0 ba 19 00       	callq  ffffffff822fcff6 <_raw_spin_lock_irq>
ffffffff82161556:	48 8b 5b 10          	mov    0x10(%rbx),%rbx
ffffffff8216155a:	48 85 db             	test   %rbx,%rbx
ffffffff8216155d:	0f 84 83 00 00 00    	je     ffffffff821615e6 <cfq_yield+0x10e>
ffffffff82161563:	49 39 9c 24 68 03 00 	cmp    %rbx,0x368(%r12)
ffffffff8216156a:	00
ffffffff8216156b:	75 41                	jne    ffffffff821615ae <cfq_yield+0xd6>
ffffffff8216156d:	41 83 bc 24 54 02 00 	cmpl   $0x1,0x254(%r12)
ffffffff82161574:	00 01
ffffffff82161576:	75 36                	jne    ffffffff821615ae <cfq_yield+0xd6>
ffffffff82161578:	48 8b 83 d0 00 00 00 	mov    0xd0(%rbx),%rax
ffffffff8216157f:	8b 70 38             	mov    0x38(%rax),%esi
ffffffff82161582:	83 fe ff             	cmp    $0xffffffffffffffff,%esi
ffffffff82161585:	74 27                	je     ffffffff821615ae <cfq_yield+0xd6>
ffffffff82161587:	48 8b 48 30          	mov    0x30(%rax),%rcx
ffffffff8216158b:	48 85 c9             	test   %rcx,%rcx
ffffffff8216158e:	74 1e                	je     ffffffff821615ae <cfq_yield+0xd6>
ffffffff82161590:	48 63 05 4d f2 5d 00 	movslq 6156877(%rip),%rax        # ffffffff827407e4 <cfq_slice_idle>
ffffffff82161597:	48 8b 15 e2 42 63 00 	mov    6505186(%rip),%rdx        # ffffffff82795880 <jiffies>
ffffffff8216159e:	48 01 c8             	add    %rcx,%rax
ffffffff821615a1:	48 39 d0             	cmp    %rdx,%rax
ffffffff821615a4:	79 08                	jns    ffffffff821615ae <cfq_yield+0xd6>
ffffffff821615a6:	3b b3 c0 00 00 00    	cmp    0xc0(%rbx),%esi
ffffffff821615ac:	75 38                	jne    ffffffff821615e6 <cfq_yield+0x10e>
ffffffff821615ae:	49 8b 04 24          	mov    (%r12),%rax
ffffffff821615b2:	48 8b b8 80 04 00 00 	mov    0x480(%rax),%rdi
ffffffff821615b9:	48 85 ff             	test   %rdi,%rdi
ffffffff821615bc:	74 1a                	je     ffffffff821615d8 <cfq_yield+0x100>
ffffffff821615be:	8b 8d 68 02 00 00    	mov    0x268(%rbp),%ecx
ffffffff821615c4:	8b 93 c0 00 00 00    	mov    0xc0(%rbx),%edx
ffffffff821615ca:	48 c7 c6 1d 8c 55 82 	mov    $0xffffffff82558c1d,%rsi
ffffffff821615d1:	31 c0                	xor    %eax,%eax
ffffffff821615d3:	e8 a1 07 f2 ff       	callq  ffffffff82081d79 <__trace_note_message>
ffffffff821615d8:	81 4b 04 00 20 00 00 	orl    $0x2000,0x4(%rbx)
ffffffff821615df:	4c 89 b3 f0 00 00 00 	mov    %r14,0xf0(%rbx)
ffffffff821615e6:	49 8b 85 48 03 00 00 	mov    0x348(%r13),%rax
ffffffff821615ed:	fe 00                	incb   (%rax)
ffffffff821615ef:	fb                   	sti
ffffffff821615f0:	5b                   	pop    %rbx
ffffffff821615f1:	48 8b bd 00 06 00 00 	mov    0x600(%rbp),%rdi
ffffffff821615f8:	5d                   	pop    %rbp
ffffffff821615f9:	41 5c                	pop    %r12
ffffffff821615fb:	41 5d                	pop    %r13
ffffffff821615fd:	41 5e                	pop    %r14
ffffffff821615ff:	e9 51 8b ff ff       	jmpq   ffffffff8215a155 <put_io_context>
ffffffff82161604:	5b                   	pop    %rbx
ffffffff82161605:	5d                   	pop    %rbp
ffffffff82161606:	41 5c                	pop    %r12
ffffffff82161608:	41 5d                	pop    %r13
ffffffff8216160a:	41 5e                	pop    %r14
ffffffff8216160c:	c3                   	retq

^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: [PATCH 0/3 v5][RFC] ext3/4: enhance fsync performance when using CFQ
  2010-06-28  6:41         ` Tao Ma
  2010-06-28 13:58           ` Jeff Moyer
@ 2010-06-29 14:56           ` Jeff Moyer
  2010-06-30  0:31             ` Tao Ma
  1 sibling, 1 reply; 24+ messages in thread
From: Jeff Moyer @ 2010-06-29 14:56 UTC (permalink / raw)
  To: Tao Ma
  Cc: axboe, vgoyal, linux-kernel, linux-ext4, Joel Becker,
	Sunil Mushran, ocfs2-devel

Tao Ma <tao.ma@oracle.com> writes:

> Hi Jeff,
>
> On 06/27/2010 09:48 PM, Jeff Moyer wrote:
>> Tao Ma <tao.ma@oracle.com> writes:
>>> I am sorry to say that the patch makes jbd2 lock up when I tested
>>> fs_mark using ocfs2.
>>> I have attached the log from my netconsole server. After I reverted
>>> the patch [3/3], the box works again.
>>
>> I can't reproduce this, unfortunately. Also, when building with the
>> .config you sent me, the disassembly doesn't line up with the stack
>> trace you posted.
>>
>> I'm not sure why yielding the queue would cause a deadlock. The only
>> explanation I can come up with is that I/O is not being issued. I'm
>> assuming that no other I/O will be completed to the file system in
>> question. Is that right? Could you send along the output from sysrq-t?
> Yes, I just mounted it and began the test, so there should be no
> outstanding I/O. So do you need me to set up another disk for the test?
> I have attached the sysrq output in sysrq.log. Please check.

Well, if it doesn't take long to reproduce, then it might be helpful to
see a blktrace of the run. However, it might also just be worth waiting
for the next version of the patch to see if that fixes your issue.

> btw, I also met with a NULL pointer dereference in cfq_yield. I have
> attached the null.log also. This seems to be related to the previous
> deadlock and happens when I try to remount the same volume after
> reboot and ocfs2 tries to do some recovery.
Pid: 4130, comm: ocfs2_wq Not tainted 2.6.35-rc3+ #5 0MM599/OptiPlex 745
RIP: 0010:[<ffffffff82161537>]  [<ffffffff82161537>] cfq_yield+0x5f/0x135
RSP: 0018:ffff880123061c60  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88012c2b5ea8 RCX: ffff88012c3a30d0

ffffffff82161528:	e8 69 eb ff ff       	callq  ffffffff82160096 <cfq_cic_lookup>
ffffffff8216152d:	49 89 c6             	mov    %rax,%r14
ffffffff82161530:	48 8b 85 00 06 00 00 	mov    0x600(%rbp),%rax
ffffffff82161537:	f0 48 ff 00          	lock incq (%rax)

I'm pretty sure that's a NULL pointer deref of the tsk->io_context that
was passed into the yield function. I've since fixed that, so your
recovery code should be safe in the newest version (which I've not yet
posted).

Cheers,
Jeff

^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: [PATCH 0/3 v5][RFC] ext3/4: enhance fsync performance when using CFQ
  2010-06-29 14:56           ` Jeff Moyer
@ 2010-06-30  0:31             ` Tao Ma
  0 siblings, 0 replies; 24+ messages in thread
From: Tao Ma @ 2010-06-30  0:31 UTC (permalink / raw)
  To: Jeff Moyer
  Cc: axboe, vgoyal, linux-kernel, linux-ext4, Joel Becker,
	Sunil Mushran, ocfs2-devel

Hi Jeff,

On 06/29/2010 10:56 PM, Jeff Moyer wrote:
> Tao Ma <tao.ma@oracle.com> writes:
>
>> Hi Jeff,
>>
>> On 06/27/2010 09:48 PM, Jeff Moyer wrote:
>>> Tao Ma <tao.ma@oracle.com> writes:
>>>> I am sorry to say that the patch makes jbd2 lock up when I tested
>>>> fs_mark using ocfs2.
>>>> I have attached the log from my netconsole server. After I reverted
>>>> the patch [3/3], the box works again.
>>>
>>> I can't reproduce this, unfortunately. Also, when building with the
>>> .config you sent me, the disassembly doesn't line up with the stack
>>> trace you posted.
>>>
>>> I'm not sure why yielding the queue would cause a deadlock. The only
>>> explanation I can come up with is that I/O is not being issued. I'm
>>> assuming that no other I/O will be completed to the file system in
>>> question. Is that right? Could you send along the output from sysrq-t?
>> Yes, I just mounted it and began the test, so there should be no
>> outstanding I/O. So do you need me to set up another disk for the test?
>> I have attached the sysrq output in sysrq.log. Please check.
>
> Well, if it doesn't take long to reproduce, then it might be helpful to
> see a blktrace of the run. However, it might also just be worth waiting
> for the next version of the patch to see if that fixes your issue.
>
>> btw, I also met with a NULL pointer dereference in cfq_yield. I have
>> attached the null.log also. This seems to be related to the previous
>> deadlock and happens when I try to remount the same volume after
>> reboot and ocfs2 tries to do some recovery.
>
> Pid: 4130, comm: ocfs2_wq Not tainted 2.6.35-rc3+ #5 0MM599/OptiPlex 745
> RIP: 0010:[<ffffffff82161537>]  [<ffffffff82161537>] cfq_yield+0x5f/0x135
> RSP: 0018:ffff880123061c60  EFLAGS: 00010246
> RAX: 0000000000000000 RBX: ffff88012c2b5ea8 RCX: ffff88012c3a30d0
>
> ffffffff82161528:	e8 69 eb ff ff       	callq  ffffffff82160096 <cfq_cic_lookup>
> ffffffff8216152d:	49 89 c6             	mov    %rax,%r14
> ffffffff82161530:	48 8b 85 00 06 00 00 	mov    0x600(%rbp),%rax
> ffffffff82161537:	f0 48 ff 00          	lock incq (%rax)
>
> I'm pretty sure that's a NULL pointer deref of the tsk->io_context that
> was passed into the yield function. I've since fixed that, so your
> recovery code should be safe in the newest version (which I've not yet
> posted).
ok, so could you please cc me when the new patches are out? It would be
easier for me to track them. Thanks.

Regards,
Tao

^ permalink raw reply	[flat|nested] 24+ messages in thread
end of thread, other threads:[~2010-06-30  0:32 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-06-22 21:34 [PATCH 0/3 v5][RFC] ext3/4: enhance fsync performance when using CFQ Jeff Moyer
2010-06-22 21:35 ` [PATCH 1/3] block: Implement a blk_yield function to voluntarily give up the I/O scheduler Jeff Moyer
2010-06-23  5:04   ` Andrew Morton
2010-06-23 14:50     ` Jeff Moyer
2010-06-24  0:46   ` Vivek Goyal
2010-06-25 16:51     ` Jeff Moyer
2010-06-25 18:55       ` Jens Axboe
2010-06-25 19:57         ` Jeff Moyer
2010-06-25 20:02           ` Vivek Goyal
2010-06-22 21:35 ` [PATCH 2/3] jbd: yield the device queue when waiting for commits Jeff Moyer
2010-06-22 21:35 ` [PATCH 3/3] jbd2: yield the device queue when waiting for journal commits Jeff Moyer
2010-06-22 22:13 ` [PATCH 0/3 v5][RFC] ext3/4: enhance fsync performance when using CFQ Joel Becker
2010-06-23  9:20 ` Christoph Hellwig
2010-06-23 13:03   ` Jeff Moyer
2010-06-23  9:30 ` Tao Ma
2010-06-23 13:06   ` Jeff Moyer
2010-06-24  5:54     ` Tao Ma
2010-06-24 14:56       ` Jeff Moyer
2010-06-27 13:48       ` Jeff Moyer
2010-06-28  6:41         ` Tao Ma
2010-06-28 13:58           ` Jeff Moyer
2010-06-28 23:16             ` Tao Ma
2010-06-29 14:56           ` Jeff Moyer
2010-06-30  0:31             ` Tao Ma