Message-ID: <582c6c13-9bed-f4b8-ddfa-55071d32b4cc@opensource.wdc.com>
Date: Wed, 15 Jun 2022 08:40:08 +0900
Subject: Re: [PATCH 5/5] block/mq-deadline: Remove zone locking
To: Bart Van Assche, Jens Axboe
Cc: linux-block@vger.kernel.org, Christoph Hellwig, Damien Le Moal,
 "Martin K. Petersen", Khazhy Kumykov, Jaegeuk Kim
References: <20220614174943.611369-1-bvanassche@acm.org>
 <20220614174943.611369-6-bvanassche@acm.org>
From: Damien Le Moal
Organization: Western Digital Research
In-Reply-To: <20220614174943.611369-6-bvanassche@acm.org>

On 6/15/22 02:49, Bart Van Assche wrote:
> Measurements have shown that limiting the queue depth to one has a
> significant negative performance impact on zoned UFS devices. Hence this
> patch that removes zone locking from the mq-deadline scheduler. This
> patch is based on the following assumptions:
> - Applications submit write requests to sequential write required zones
>   in order.
> - If such write requests get reordered by the software or hardware queue
>   mechanism, nr_hw_queues * nr_requests - 1 retries are sufficient to
>   reorder the write requests.

As mentioned in my previous reply for patch 2, AHCI will not behave like
that at all. Retrying will be useless most of the time because the
adapter sends commands to the drive randomly, picked from the set of
commands that are marked ready in the ready register.

So no. I am opposed to unconditionally removing zone write locking. If a
UFS LLD can deliver commands in order, then use a queue flag to say
"zone write locking not needed" to disable it for that device class,
along the lines sketched below.
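Roughly what I have in mind, as a completely untested sketch
(QUEUE_FLAG_NO_ZONE_WLOCK is a name made up here for illustration, no
such flag exists today):

/*
 * LLD side: set once at initialization time by a driver that
 * guarantees in-order command delivery (e.g. the UFS host driver).
 * QUEUE_FLAG_NO_ZONE_WLOCK is hypothetical.
 */
blk_queue_flag_set(QUEUE_FLAG_NO_ZONE_WLOCK, q);

/*
 * mq-deadline side: keep the zone_lock code but gate it on the flag,
 * instead of deleting it for everyone.
 */
static inline bool dd_use_zone_write_lock(struct request_queue *q)
{
	return blk_queue_is_zoned(q) &&
	       !test_bit(QUEUE_FLAG_NO_ZONE_WLOCK, &q->queue_flags);
}

That way SATA and SAS keep the current, known-correct behavior and UFS
gets its queue depth back without betting correctness on retries.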
There probably are some SAS HBAs that could benefit from this too, but I
have seen so many reordering bugs with these (e.g. requeue at the tail
of a write that got a TSF) that I would not want to remove zone write
locking for them either. And I also do not want to start getting 10
calls a day from customers complaining about very bad write performance
due to all these retries, which are likely going to end up being slower
than writing at QD=1 per zone.

> - It happens infrequently that zoned write requests are reordered by the
>   block layer.

What makes you say that? It only takes 2 contexts trying to dispatch
commands to different queues, or a write process being rescheduled to a
different CPU/queue.

> - Either no I/O scheduler is used or an I/O scheduler is used that
>   submits write requests per zone in LBA order.

And unfortunately, as the AHCI example shows, having the scheduler
dispatch requests in LBA order is not enough.

> 
> DD_BE_PRIO is selected for sequential writes to preserve the LBA order.

So if the application wanted the writes to have the RT policy, so that
these commands get the high-priority bit set on SATA disks, that will
not be honored. No to that too.
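To make that concrete: an application that does something like the
following today (a minimal userspace sketch), with the drive's
ncq_prio_enable attribute set, gets its writes issued as NCQ high
priority commands; with this patch, mq-deadline silently demotes such
sequential writes to DD_BE_PRIO and the priority hint is lost:

#include <unistd.h>
#include <sys/syscall.h>
#include <linux/ioprio.h>

/*
 * Put the calling thread's I/O in the RT class (level 0) so that,
 * with ncq_prio_enable set, libata issues the resulting commands
 * with the NCQ high priority bit.
 */
static int set_rt_ioprio(void)
{
	return syscall(SYS_ioprio_set, IOPRIO_WHO_PROCESS, 0,
		       IOPRIO_PRIO_VALUE(IOPRIO_CLASS_RT, 0));
}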
> 
> See also commit 5700f69178e9 ("mq-deadline: Introduce zone locking
> support").
> 
> Cc: Damien Le Moal
> Signed-off-by: Bart Van Assche
> ---
>  block/mq-deadline.c | 74 ++++-----------------------------------------
>  1 file changed, 6 insertions(+), 68 deletions(-)
> 
> diff --git a/block/mq-deadline.c b/block/mq-deadline.c
> index 6ed602b2f80a..e168fc9a980a 100644
> --- a/block/mq-deadline.c
> +++ b/block/mq-deadline.c
> @@ -104,7 +104,6 @@ struct deadline_data {
>  	int prio_aging_expire;
>  
>  	spinlock_t lock;
> -	spinlock_t zone_lock;
>  };
>  
>  /* Maps an I/O priority class to a deadline scheduler priority. */
> @@ -285,30 +284,10 @@ static struct request *
>  deadline_fifo_request(struct deadline_data *dd, struct dd_per_prio *per_prio,
>  		      enum dd_data_dir data_dir)
>  {
> -	struct request *rq;
> -	unsigned long flags;
> -
>  	if (list_empty(&per_prio->fifo_list[data_dir]))
>  		return NULL;
>  
> -	rq = rq_entry_fifo(per_prio->fifo_list[data_dir].next);
> -	if (data_dir == DD_READ || !blk_queue_is_zoned(rq->q))
> -		return rq;
> -
> -	/*
> -	 * Look for a write request that can be dispatched, that is one with
> -	 * an unlocked target zone.
> -	 */
> -	spin_lock_irqsave(&dd->zone_lock, flags);
> -	list_for_each_entry(rq, &per_prio->fifo_list[DD_WRITE], queuelist) {
> -		if (blk_req_can_dispatch_to_zone(rq))
> -			goto out;
> -	}
> -	rq = NULL;
> -out:
> -	spin_unlock_irqrestore(&dd->zone_lock, flags);
> -
> -	return rq;
> +	return rq_entry_fifo(per_prio->fifo_list[data_dir].next);
>  }
>  
>  /*
> @@ -319,29 +298,7 @@ static struct request *
>  deadline_next_request(struct deadline_data *dd, struct dd_per_prio *per_prio,
>  		      enum dd_data_dir data_dir)
>  {
> -	struct request *rq;
> -	unsigned long flags;
> -
> -	rq = per_prio->next_rq[data_dir];
> -	if (!rq)
> -		return NULL;
> -
> -	if (data_dir == DD_READ || !blk_queue_is_zoned(rq->q))
> -		return rq;
> -
> -	/*
> -	 * Look for a write request that can be dispatched, that is one with
> -	 * an unlocked target zone.
> -	 */
> -	spin_lock_irqsave(&dd->zone_lock, flags);
> -	while (rq) {
> -		if (blk_req_can_dispatch_to_zone(rq))
> -			break;
> -		rq = deadline_latter_request(rq);
> -	}
> -	spin_unlock_irqrestore(&dd->zone_lock, flags);
> -
> -	return rq;
> +	return per_prio->next_rq[data_dir];
>  }
>  
>  /*
> @@ -467,10 +424,6 @@ static struct request *__dd_dispatch_request(struct deadline_data *dd,
>  	ioprio_class = dd_rq_ioclass(rq);
>  	prio = ioprio_class_to_prio[ioprio_class];
>  	dd->per_prio[prio].stats.dispatched++;
> -	/*
> -	 * If the request needs its target zone locked, do it.
> -	 */
> -	blk_req_zone_write_lock(rq);
>  	rq->rq_flags |= RQF_STARTED;
>  	return rq;
>  }
> @@ -640,7 +593,6 @@ static int dd_init_sched(struct request_queue *q, struct elevator_type *e)
>  	dd->fifo_batch = fifo_batch;
>  	dd->prio_aging_expire = prio_aging_expire;
>  	spin_lock_init(&dd->lock);
> -	spin_lock_init(&dd->zone_lock);
>  
>  	q->elevator = eq;
>  	return 0;
> @@ -716,17 +668,13 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
>  	u8 ioprio_class = IOPRIO_PRIO_CLASS(ioprio);
>  	struct dd_per_prio *per_prio;
>  	enum dd_prio prio;
> +	bool seq_write = blk_rq_is_seq_write(rq);
>  	LIST_HEAD(free);
>  
>  	lockdep_assert_held(&dd->lock);
>  
> -	/*
> -	 * This may be a requeue of a write request that has locked its
> -	 * target zone. If it is the case, this releases the zone lock.
> -	 */
> -	blk_req_zone_write_unlock(rq);
> -
> -	prio = ioprio_class_to_prio[ioprio_class];
> +	prio = seq_write ? DD_BE_PRIO :
> +		ioprio_class_to_prio[ioprio_class];
>  	per_prio = &dd->per_prio[prio];
>  	if (!rq->elv.priv[0]) {
>  		per_prio->stats.inserted++;
> @@ -740,7 +688,7 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
>  
>  	trace_block_rq_insert(rq);
>  
> -	if (at_head) {
> +	if (at_head && !seq_write) {
>  		list_add(&rq->queuelist, &per_prio->dispatch);
>  		rq->fifo_time = jiffies;
>  	} else {
> @@ -819,16 +767,6 @@ static void dd_finish_request(struct request *rq)
>  		return;
>  
>  	atomic_inc(&per_prio->stats.completed);
> -
> -	if (blk_queue_is_zoned(q)) {
> -		unsigned long flags;
> -
> -		spin_lock_irqsave(&dd->zone_lock, flags);
> -		blk_req_zone_write_unlock(rq);
> -		if (!list_empty(&per_prio->fifo_list[DD_WRITE]))
> -			blk_mq_sched_mark_restart_hctx(rq->mq_hctx);
> -		spin_unlock_irqrestore(&dd->zone_lock, flags);
> -	}
>  }
>  
>  static bool dd_has_work_for_prio(struct dd_per_prio *per_prio)

-- 
Damien Le Moal
Western Digital Research