From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <4D83639A.3060407@fusionio.com>
Date: Fri, 18 Mar 2011 14:52:26 +0100
From: Jens Axboe <JAxboe@fusionio.com>
MIME-Version: 1.0
To: Shaohua Li
CC: Vivek Goyal, "linux-kernel@vger.kernel.org", "hch@infradead.org",
	"jmoyer@redhat.com"
Subject: Re: [PATCH 04/10] block: initial patch for on-stack per-task plugging
References: <1295659049-2688-1-git-send-email-jaxboe@fusionio.com>
	<1295659049-2688-5-git-send-email-jaxboe@fusionio.com>
	<20110316173148.GC13562@redhat.com>
	<1300323609.2337.117.camel@sli10-conroe>
	<4D81D7D5.7000806@fusionio.com>
	<1300430212.2337.141.camel@sli10-conroe>
	<4D83560E.8000002@fusionio.com>
In-Reply-To: <4D83560E.8000002@fusionio.com>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Mailing-List: linux-kernel@vger.kernel.org

On 2011-03-18 13:54, Jens Axboe wrote:
> On 2011-03-18 07:36, Shaohua Li wrote:
>> On Thu, 2011-03-17 at 17:43 +0800, Jens Axboe wrote:
>>> On 2011-03-17 02:00, Shaohua Li wrote:
>>>> On Thu, 2011-03-17 at 01:31 +0800, Vivek Goyal wrote:
>>>>> On Wed, Mar 16, 2011 at 04:18:30PM +0800, Shaohua Li wrote:
>>>>>> 2011/1/22 Jens Axboe :
>>>>>>> Signed-off-by: Jens Axboe
>>>>>>> ---
>>>>>>>  block/blk-core.c          |  357 ++++++++++++++++++++++++++++++++------------
>>>>>>>  block/elevator.c          |    6 +-
>>>>>>>  include/linux/blk_types.h |    2 +
>>>>>>>  include/linux/blkdev.h    |   30 ++++
>>>>>>>  include/linux/elevator.h  |    1 +
>>>>>>>  include/linux/sched.h     |    6 +
>>>>>>>  kernel/exit.c             |    1 +
>>>>>>>  kernel/fork.c             |    3 +
>>>>>>>  kernel/sched.c            |   11 ++-
>>>>>>>  9 files changed, 317 insertions(+), 100 deletions(-)
>>>>>>>
>>>>>>> diff --git a/block/blk-core.c b/block/blk-core.c
>>>>>>> index 960f12c..42dbfcc 100644
>>>>>>> --- a/block/blk-core.c
>>>>>>> +++ b/block/blk-core.c
>>>>>>> @@ -27,6 +27,7 @@
>>>>>>>  #include
>>>>>>>  #include
>>>>>>>  #include
>>>>>>> +#include
>>>>>>>
>>>>>>>  #define CREATE_TRACE_POINTS
>>>>>>>  #include
>>>>>>>
>>>>>>> @@ -213,7 +214,7 @@ static void blk_delay_work(struct work_struct *work)
>>>>>>>
>>>>>>>  	q = container_of(work, struct request_queue, delay_work.work);
>>>>>>>  	spin_lock_irq(q->queue_lock);
>>>>>>> -	q->request_fn(q);
>>>>>>> +	__blk_run_queue(q);
>>>>>>>  	spin_unlock_irq(q->queue_lock);
>>>>>>>  }
>>>>>> Hi Jens,
>>>>>> I have some questions about the per-task plugging. The request list
>>>>>> is per-task, and each task delivers its requests only when it
>>>>>> flushes the plug or schedules out. But when one cpu delivers
>>>>>> requests to the global queue, the other cpus don't know about it.
>>>>>> This seems to be a problem. For example:
>>>>>> 1. get_request_wait() can only flush the current task's request
>>>>>> list; other cpus/tasks might still have a lot of requests which
>>>>>> aren't sent to the request_queue.
>>>>>
>>>>> But these requests will be sent to the request queue very soon, as
>>>>> soon as the task is either scheduled out or explicitly flushes the
>>>>> plug? So we might wait a bit longer, but that might not matter in
>>>>> general, I guess.
>>>> Yes, I understand there is just a bit of delay. I don't know how
>>>> severe it is, but it could still be a problem, especially for fast
>>>> storage or random I/O. My current tests show a slight regression
>>>> (3% or so) with Jens's for-2.6.39/core branch. I'm still checking
>>>> whether it's caused by the per-task plug, but the per-task plug is
>>>> highly suspect.
>>>
>>> To check this particular case, you can always just bump the request
>>> limit. What test is showing a slowdown?
>> This is a simple multi-threaded sequential read. The issue looks
>> merge-related (not verified yet): merging drops by about 60% with the
>> stack plug, going by the fio-reported data. From the trace, without
>> the stack plug, requests from different threads get merged; with it,
>> such merging is impossible because flush_plug doesn't check for
>> merges. I think we need to add that back.
>
> What we could try is to have the plug flush insert use
> ELEVATOR_INSERT_SORT_MERGE and have it look up potential backmerges.
>
> Here's a quick hack that does that; I have not tested it at all.

Gave it a quick test spin; as suspected, it had a few issues. This one
seems to work. Can you toss it through that workload and see if it
fares better?
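For readers following along, the path this exercises is the on-stack
plug the series hangs off current->plug: requests sit on the plug list
until the task flushes it or schedules out. Roughly, from a submitter's
point of view, it looks like the sketch below. This is illustrative
only, not code from the series; submit_batch() is a made-up helper,
while struct blk_plug, blk_start_plug()/blk_finish_plug() (added by the
series) and submit_bio() are the real interfaces.

#include <linux/blkdev.h>
#include <linux/bio.h>

/* Hypothetical helper: issue a batch of reads under one on-stack plug. */
static void submit_batch(struct bio **bios, int nr)
{
	struct blk_plug plug;
	int i;

	blk_start_plug(&plug);			/* plug attached to current->plug */
	for (i = 0; i < nr; i++)
		submit_bio(READ, bios[i]);	/* queued on the plug list */
	blk_finish_plug(&plug);			/* flush_plug_list() -> request_queue */
}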
diff --git a/block/blk-core.c b/block/blk-core.c
index e1fcf7a..e1b29e7 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -55,7 +55,7 @@ struct kmem_cache *blk_requestq_cachep;
  */
 static struct workqueue_struct *kblockd_workqueue;
 
-static void drive_stat_acct(struct request *rq, int new_io)
+void drive_stat_acct(struct request *rq, int new_io)
 {
 	struct hd_struct *part;
 	int rw = rq_data_dir(rq);
@@ -2685,7 +2685,7 @@ static void flush_plug_list(struct blk_plug *plug)
 		/*
 		 * rq is already accounted, so use raw insert
 		 */
-		__elv_add_request(q, rq, ELEVATOR_INSERT_SORT);
+		__elv_add_request(q, rq, ELEVATOR_INSERT_SORT_MERGE);
 	}
 
 	if (q) {
diff --git a/block/blk-merge.c b/block/blk-merge.c
index ea85e20..27a7926 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -429,12 +429,14 @@ static int attempt_merge(struct request_queue *q, struct request *req,
 
 	req->__data_len += blk_rq_bytes(next);
 
-	elv_merge_requests(q, req, next);
+	if (next->cmd_flags & REQ_SORTED) {
+		elv_merge_requests(q, req, next);
 
-	/*
-	 * 'next' is going away, so update stats accordingly
-	 */
-	blk_account_io_merge(next);
+		/*
+		 * 'next' is going away, so update stats accordingly
+		 */
+		blk_account_io_merge(next);
+	}
 
 	req->ioprio = ioprio_best(req->ioprio, next->ioprio);
 	if (blk_rq_cpu_valid(next))
@@ -465,3 +467,15 @@ int attempt_front_merge(struct request_queue *q, struct request *rq)
 
 	return 0;
 }
+
+int blk_attempt_req_merge(struct request_queue *q, struct request *rq,
+			  struct request *next)
+{
+	int ret;
+
+	ret = attempt_merge(q, rq, next);
+	if (ret)
+		drive_stat_acct(rq, 0);
+
+	return ret;
+}
diff --git a/block/blk.h b/block/blk.h
index 49d21af..5b8ecbf 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -103,6 +103,9 @@ int ll_front_merge_fn(struct request_queue *q, struct request *req,
 		      struct bio *bio);
 int attempt_back_merge(struct request_queue *q, struct request *rq);
 int attempt_front_merge(struct request_queue *q, struct request *rq);
+int blk_attempt_req_merge(struct request_queue *q, struct request *rq,
+			  struct request *next);
+void drive_stat_acct(struct request *rq, int new_io);
 void blk_recalc_rq_segments(struct request *rq);
 void blk_rq_set_mixed_merge(struct request *rq);
diff --git a/block/elevator.c b/block/elevator.c
index 542ce82..88bdf81 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -521,6 +521,33 @@ int elv_merge(struct request_queue *q, struct request **req, struct bio *bio)
 	return ELEVATOR_NO_MERGE;
 }
 
+/*
+ * Returns true if we merged, false otherwise
+ */
+static bool elv_attempt_insert_merge(struct request_queue *q,
+				     struct request *rq)
+{
+	struct request *__rq;
+
+	if (blk_queue_nomerges(q) || blk_queue_noxmerges(q))
+		return false;
+
+	/*
+	 * First try one-hit cache.
+	 */
+	if (q->last_merge && blk_attempt_req_merge(q, q->last_merge, rq))
+		return true;
+
+	/*
+	 * See if our hash lookup can find a potential backmerge.
+	 */
+	__rq = elv_rqhash_find(q, blk_rq_pos(rq));
+	if (__rq && blk_attempt_req_merge(q, __rq, rq))
+		return true;
+
+	return false;
+}
+
 void elv_merged_request(struct request_queue *q, struct request *rq, int type)
 {
 	struct elevator_queue *e = q->elevator;
@@ -647,6 +674,9 @@ void elv_insert(struct request_queue *q, struct request *rq, int where)
 		__blk_run_queue(q, false);
 		break;
 
+	case ELEVATOR_INSERT_SORT_MERGE:
+		if (elv_attempt_insert_merge(q, rq))
+			break;
 	case ELEVATOR_INSERT_SORT:
 		BUG_ON(rq->cmd_type != REQ_TYPE_FS &&
 		       !(rq->cmd_flags & REQ_DISCARD));
diff --git a/include/linux/elevator.h b/include/linux/elevator.h
index ec6f72b..d93efcc 100644
--- a/include/linux/elevator.h
+++ b/include/linux/elevator.h
@@ -166,6 +166,7 @@ extern struct request *elv_rb_find(struct rb_root *, sector_t);
 #define ELEVATOR_INSERT_SORT		3
 #define ELEVATOR_INSERT_REQUEUE		4
 #define ELEVATOR_INSERT_FLUSH		5
+#define ELEVATOR_INSERT_SORT_MERGE	6
 
 /*
  * return values from elevator_may_queue_fn
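A note on the mechanics, as I read the patch: ELEVATOR_INSERT_SORT_MERGE
first tries to merge the incoming request with one already queued (the
q->last_merge one-hit cache first, then an elv_rqhash_find() lookup for
a queued request ending at this request's start sector) and only falls
through to the plain ELEVATOR_INSERT_SORT path when no merge is found.
The REQ_SORTED test in attempt_merge() appears to be there because a
plugged request being flushed was never elevator-inserted, so the io
scheduler merge hooks and blk_account_io_merge() should not run for it;
the drive_stat_acct(rq, 0) call in blk_attempt_req_merge() does the
merge accounting instead. In the workload above, that should let a
sequential request flushed by one thread backmerge into the adjacent
request another thread has already flushed.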
-- 
Jens Axboe