From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 12 Oct 2022 08:42:10 +0900
Subject: Re: [PATCH V3 1/8] block, bfq: split sync bfq_queues on a per-actuator basis
From: Damien Le Moal
Organization: Western Digital Research
To: Paolo VALENTE
Cc: Jens Axboe, linux-block, linux-kernel, Jan Kara, Andrea Righi,
 Glen Valante, Arie van der Hoeven, Rory Chen, Gabriele Felici
References: <20221004094010.80090-1-paolo.valente@linaro.org>
 <20221004094010.80090-2-paolo.valente@linaro.org>
 <3985B13F-86C5-4020-8A44-9823AE1C3FCE@unimore.it>
In-Reply-To: <3985B13F-86C5-4020-8A44-9823AE1C3FCE@unimore.it>
X-Mailing-List: linux-kernel@vger.kernel.org

On 10/12/22 00:45, Paolo VALENTE wrote:
> 
> 
>> On 5 Oct 2022, at 01:04, Damien Le Moal wrote:
>>
>> On 10/4/22 18:40, Paolo Valente wrote:
>>> Multi-actuator drives appear as a single device to the I/O subsystem [1].
>>
>> Not necessarily. The multi-LUN SCSI model will show up as multiple
>> drives with one actuator each.
>>
> 
> Right. In that case, IIUC, each LUN appears as a separate device (in
> /dev/) and allows one of the actuators to be controlled separately.
> So each such device has only one range in the independent access range
> data structure, and the extension in these patches is simply inactive.

Yes. My comment was all about not being so general with how
multi-actuator drives are seen from the host. Your sentence should be
rephrased to something like: "Single LUN multi-actuator SCSI drives and
all multi-actuator SATA drives appear as a single device to the I/O
subsystem." No confusion possible this way :)

> 
> That said, I'll try to address all your suggestions for this and the
> other patches, and send a new version of this series.
> 
> Thank you,
> Paolo
> 
>>> Yet they address commands to different actuators internally, as a
>>> function of Logical Block Addressing (LBAs). A given sector is
>>> reachable by only one of the actuators. For example, Seagate's Serial
>>> Advanced Technology Attachment (SATA) version contains two actuators
>>> and maps the lower half of the SATA LBA space to the lower actuator
>>> and the upper half to the upper actuator.
>>>
>>> Evidently, to fully utilize actuators, no actuator must be left idle
>>> or underutilized while there is pending I/O for it. The block layer
>>> must somehow control the load of each actuator individually. This
>>> commit lays the ground for allowing BFQ to provide such a per-actuator
>>> control.
>>>
>>> BFQ associates an I/O-request sync bfq_queue with each process doing
>>> synchronous I/O, or with a group of processes, in case of queue
>>> merging. Then BFQ serves one bfq_queue at a time. While in service, a
>>> bfq_queue is emptied in request-position order. Yet the same process,
>>> or group of processes, may generate I/O for different actuators. In
>>> this case, different streams of I/O (each for a different actuator)
>>> get all inserted into the same sync bfq_queue. So there is basically
>>> no individual control on when each stream is served, i.e., on when the
>>> I/O requests of the stream are picked from the bfq_queue and
>>> dispatched to the drive.
>>>
>>> This commit enables BFQ to control the service of each actuator
>>> individually for synchronous I/O, by simply splitting each sync
>>> bfq_queue into N queues, one for each actuator. In other words, a sync
>>> bfq_queue is now associated to a pair (process, actuator). As a
>>> consequence of this split, the per-queue proportional-share policy
>>> implemented by BFQ will guarantee that the sync I/O generated for each
>>> actuator, by each process, receives its fair share of service.
>>>
>>> This is just a preparatory patch. If the I/O of the same process
>>> happens to be sent to different queues, then each of these queues may
>>> undergo queue merging. To handle this event, the bfq_io_cq data
>>> structure must be properly extended. In addition, stable merging must
>>> be disabled to avoid loss of control on individual actuators. Finally,
>>> also async queues must be split. These issues are described in detail
>>> and addressed in next commits. As for this commit, although multiple
>>> per-process bfq_queues are provided, the I/O of each process or group
>>> of processes is still sent to only one queue, regardless of the
>>> actuator the I/O is for. The forwarding to distinct bfq_queues will be
>>> enabled after addressing the above issues.
>>>
>>> [1] https://www.linaro.org/blog/budget-fair-queueing-bfq-linux-io-scheduler-optimizations-for-multi-actuator-sata-hard-drives/
>>>
>>> Signed-off-by: Gabriele Felici
>>> Signed-off-by: Paolo Valente
>>> ---
>>>  block/bfq-cgroup.c  |  95 +++++++++++++++++--------------
>>>  block/bfq-iosched.c | 135 +++++++++++++++++++++++++++-----------------
>>>  block/bfq-iosched.h |  38 +++++++++----
>>>  3 files changed, 164 insertions(+), 104 deletions(-)
>>>
>>> diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
>>> index 30b15a9a47c4..a745dd9d658e 100644
>>> --- a/block/bfq-cgroup.c
>>> +++ b/block/bfq-cgroup.c
>>> @@ -705,6 +705,48 @@ void bfq_bfqq_move(struct bfq_data *bfqd, struct bfq_queue *bfqq,
>>>          bfq_put_queue(bfqq);
>>>  }
>>>
>>> +static void bfq_sync_bfqq_move(struct bfq_data *bfqd,
>>> +                               struct bfq_queue *sync_bfqq,
>>> +                               struct bfq_io_cq *bic,
>>> +                               struct bfq_group *bfqg,
>>> +                               unsigned int act_idx)
>>> +{
>>> +        if (!sync_bfqq->new_bfqq && !bfq_bfqq_coop(sync_bfqq)) {
>>> +                /* We are the only user of this bfqq, just move it */
>>> +                if (sync_bfqq->entity.sched_data != &bfqg->sched_data)
>>> +                        bfq_bfqq_move(bfqd, sync_bfqq, bfqg);
>>> +        } else {
>>> +                struct bfq_queue *bfqq;
>>> +
>>> +                /*
>>> +                 * The queue was merged to a different queue. Check
>>> +                 * that the merge chain still belongs to the same
>>> +                 * cgroup.
>>> +                 */
>>> +                for (bfqq = sync_bfqq; bfqq; bfqq = bfqq->new_bfqq)
>>> +                        if (bfqq->entity.sched_data !=
>>> +                            &bfqg->sched_data)
>>> +                                break;
>>> +                if (bfqq) {
>>> +                        /*
>>> +                         * Some queue changed cgroup so the merge is
>>> +                         * not valid anymore. We cannot easily just
>>> +                         * cancel the merge (by clearing new_bfqq) as
>>> +                         * there may be other processes using this
>>> +                         * queue and holding refs to all queues below
>>> +                         * sync_bfqq->new_bfqq. Similarly if the merge
>>> +                         * already happened, we need to detach from
>>> +                         * bfqq now so that we cannot merge bio to a
>>> +                         * request from the old cgroup.
>>> +                         */
>>> +                        bfq_put_cooperator(sync_bfqq);
>>> +                        bfq_release_process_ref(bfqd, sync_bfqq);
>>> +                        bic_set_bfqq(bic, NULL, 1, act_idx);
>>> +                }
>>> +        }
>>> +}
>>> +
>>> +
>>>  /**
>>>   * __bfq_bic_change_cgroup - move @bic to @bfqg.
>>>   * @bfqd: the queue descriptor.
>>> @@ -719,53 +761,24 @@ static void *__bfq_bic_change_cgroup(struct bfq_data *bfqd,
>>>                                       struct bfq_io_cq *bic,
>>>                                       struct bfq_group *bfqg)
>>>  {
>>> -        struct bfq_queue *async_bfqq = bic_to_bfqq(bic, 0);
>>> -        struct bfq_queue *sync_bfqq = bic_to_bfqq(bic, 1);
>>>          struct bfq_entity *entity;
>>> +        unsigned int act_idx;
>>>
>>> -        if (async_bfqq) {
>>> -                entity = &async_bfqq->entity;
>>> -
>>> -                if (entity->sched_data != &bfqg->sched_data) {
>>> -                        bic_set_bfqq(bic, NULL, 0);
>>> -                        bfq_release_process_ref(bfqd, async_bfqq);
>>> -                }
>>> -        }
>>> +        for (act_idx = 0; act_idx < BFQ_NUM_ACTUATORS; act_idx++) {
>>
>> Why loop over all BFQ_NUM_ACTUATORS actuators even though this patch
>> itself is not enough to support multiple actuators?
>> You then have patch 5 changing this macro to BFQ_MAX_ACTUATORS and then
>> patch 6 finally introducing a nr_ia_range bfq field to indicate the
>> effective number of actuators.
>>
>> Why not:
>> 1) introduce BFQ_MAX_ACTUATORS in this patch and define the bfqq field
>> using it
>> 2) introduce a nr_actuators field defaulting to 1 for now and use that
>> as the upper bound for the actuator loops
>>
>> That would be 100% consistent with the current code (no change in
>> practice) and avoid all the code churn you have in the following patches.
>>
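To make (1) and (2) above concrete, here is a rough, untested sketch;
the BFQ_MAX_ACTUATORS value and the exact field placement are only
illustrative:

#define BFQ_MAX_ACTUATORS	8	/* compile-time bound, value TBD */

struct bfq_data {
	/* ...existing fields... */

	/*
	 * Effective number of actuators of the drive. Defaults to 1
	 * until the multi-actuator probing added later in the series
	 * fills it in.
	 */
	unsigned int nr_actuators;
};

struct bfq_io_cq {
	/* ...existing fields... */

	/* sized with the compile-time maximum */
	struct bfq_queue *bfqq[2][BFQ_MAX_ACTUATORS];
};

	/* all per-actuator loops then use the runtime bound */
	for (act_idx = 0; act_idx < bfqd->nr_actuators; act_idx++) {
		/* per-actuator work */
	}

With nr_actuators == 1, the loops degenerate to the current single-queue
behavior, so this patch would introduce no functional change.
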
>>> +                struct bfq_queue *async_bfqq = bic_to_bfqq(bic, 0, act_idx);
>>> +                struct bfq_queue *sync_bfqq = bic_to_bfqq(bic, 1, act_idx);
>>>
>>> -        if (sync_bfqq) {
>>> -                if (!sync_bfqq->new_bfqq && !bfq_bfqq_coop(sync_bfqq)) {
>>> -                        /* We are the only user of this bfqq, just move it */
>>> -                        if (sync_bfqq->entity.sched_data != &bfqg->sched_data)
>>> -                                bfq_bfqq_move(bfqd, sync_bfqq, bfqg);
>>> -                } else {
>>> -                        struct bfq_queue *bfqq;
>>> +                if (async_bfqq) {
>>> +                        entity = &async_bfqq->entity;
>>>
>>> -                        /*
>>> -                         * The queue was merged to a different queue. Check
>>> -                         * that the merge chain still belongs to the same
>>> -                         * cgroup.
>>> -                         */
>>> -                        for (bfqq = sync_bfqq; bfqq; bfqq = bfqq->new_bfqq)
>>> -                                if (bfqq->entity.sched_data !=
>>> -                                    &bfqg->sched_data)
>>> -                                        break;
>>> -                        if (bfqq) {
>>> -                                /*
>>> -                                 * Some queue changed cgroup so the merge is
>>> -                                 * not valid anymore. We cannot easily just
>>> -                                 * cancel the merge (by clearing new_bfqq) as
>>> -                                 * there may be other processes using this
>>> -                                 * queue and holding refs to all queues below
>>> -                                 * sync_bfqq->new_bfqq. Similarly if the merge
>>> -                                 * already happened, we need to detach from
>>> -                                 * bfqq now so that we cannot merge bio to a
>>> -                                 * request from the old cgroup.
>>> -                                 */
>>> -                                bfq_put_cooperator(sync_bfqq);
>>> -                                bfq_release_process_ref(bfqd, sync_bfqq);
>>> -                                bic_set_bfqq(bic, NULL, 1);
>>> +                        if (entity->sched_data != &bfqg->sched_data) {
>>> +                                bic_set_bfqq(bic, NULL, 0, act_idx);
>>> +                                bfq_release_process_ref(bfqd, async_bfqq);
>>>                          }
>>>                  }
>>> +
>>> +                if (sync_bfqq)
>>> +                        bfq_sync_bfqq_move(bfqd, sync_bfqq, bic, bfqg, act_idx);
>>>          }
>>>
>>>          return bfqg;
>>> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
>>> index c740b41fe0a4..c2485b599d87 100644
>>> --- a/block/bfq-iosched.c
>>> +++ b/block/bfq-iosched.c
>>> @@ -377,14 +377,19 @@ static const unsigned long bfq_late_stable_merging = 600;
>>>  #define RQ_BIC(rq)  ((struct bfq_io_cq *)((rq)->elv.priv[0]))
>>>  #define RQ_BFQQ(rq) ((rq)->elv.priv[1])
>>>
>>> -struct bfq_queue *bic_to_bfqq(struct bfq_io_cq *bic, bool is_sync)
>>> +struct bfq_queue *bic_to_bfqq(struct bfq_io_cq *bic,
>>> +                              bool is_sync,
>>> +                              unsigned int actuator_idx)
>>>  {
>>> -        return bic->bfqq[is_sync];
>>> +        return bic->bfqq[is_sync][actuator_idx];
>>>  }
>>>
>>>  static void bfq_put_stable_ref(struct bfq_queue *bfqq);
>>>
>>> -void bic_set_bfqq(struct bfq_io_cq *bic, struct bfq_queue *bfqq, bool is_sync)
>>> +void bic_set_bfqq(struct bfq_io_cq *bic,
>>> +                  struct bfq_queue *bfqq,
>>> +                  bool is_sync,
>>> +                  unsigned int actuator_idx)
>>>  {
>>>          /*
>>>           * If bfqq != NULL, then a non-stable queue merge between
>>> @@ -399,7 +404,7 @@ void bic_set_bfqq(struct bfq_io_cq *bic, struct bfq_queue *bfqq, bool is_sync)
>>>           * we cancel the stable merge if
>>>           * bic->stable_merge_bfqq == bfqq.
>>>           */
>>> -        bic->bfqq[is_sync] = bfqq;
>>> +        bic->bfqq[is_sync][actuator_idx] = bfqq;
>>>
>>>          if (bfqq && bic->stable_merge_bfqq == bfqq) {
>>>                  /*
>>> @@ -672,9 +677,9 @@ static void bfq_limit_depth(blk_opf_t opf, struct blk_mq_alloc_data *data)
>>>  {
>>>          struct bfq_data *bfqd = data->q->elevator->elevator_data;
>>>          struct bfq_io_cq *bic = bfq_bic_lookup(data->q);
>>> -        struct bfq_queue *bfqq = bic ? bic_to_bfqq(bic, op_is_sync(opf)) : NULL;
>>>          int depth;
>>>          unsigned limit = data->q->nr_requests;
>>> +        unsigned int act_idx;
>>>
>>>          /* Sync reads have full depth available */
>>>          if (op_is_sync(opf) && !op_is_write(opf)) {
>>>                  depth = 0;
>>> @@ -684,14 +689,21 @@ static void bfq_limit_depth(blk_opf_t opf, struct blk_mq_alloc_data *data)
>>>                  limit = (limit * depth) >> bfqd->full_depth_shift;
>>>          }
>>>
>>> -        /*
>>> -         * Does queue (or any parent entity) exceed number of requests that
>>> -         * should be available to it? Heavily limit depth so that it cannot
>>> -         * consume more available requests and thus starve other entities.
>>> -         */
>>> -        if (bfqq && bfqq_request_over_limit(bfqq, limit))
>>> -                depth = 1;
>>> +        for (act_idx = 0; act_idx < BFQ_NUM_ACTUATORS; act_idx++) {
>>> +                struct bfq_queue *bfqq =
>>> +                        bic ? bic_to_bfqq(bic, op_is_sync(opf), act_idx) : NULL;
>>>
>>> +                /*
>>> +                 * Does queue (or any parent entity) exceed number of
>>> +                 * requests that should be available to it? Heavily
>>> +                 * limit depth so that it cannot consume more
>>> +                 * available requests and thus starve other entities.
>>> +                 */
>>> +                if (bfqq && bfqq_request_over_limit(bfqq, limit)) {
>>> +                        depth = 1;
>>> +                        break;
>>> +                }
>>> +        }
>>>          bfq_log(bfqd, "[%s] wr_busy %d sync %d depth %u",
>>>                  __func__, bfqd->wr_busy_queues, op_is_sync(opf), depth);
>>>          if (depth)
>>> @@ -2142,7 +2154,7 @@ static void bfq_check_waker(struct bfq_data *bfqd, struct bfq_queue *bfqq,
>>>           * We reset waker detection logic also if too much time has passed
>>>           * since the first detection. If wakeups are rare, pointless idling
>>>           * doesn't hurt throughput that much. The condition below makes sure
>>> -         * we do not uselessly idle blocking waker in more than 1/64 cases. 
>>> +         * we do not uselessly idle blocking waker in more than 1/64 cases.
>>>           */
>>>          if (bfqd->last_completed_rq_bfqq !=
>>>              bfqq->tentative_waker_bfqq ||
>>> @@ -2454,6 +2466,16 @@ static void bfq_remove_request(struct request_queue *q,
>>>
>>>  }
>>>
>>> +/* get the index of the actuator that will serve bio */
>>> +static unsigned int bfq_actuator_index(struct bfq_data *bfqd, struct bio *bio)
>>> +{
>>> +        /*
>>> +         * Multi-actuator support not complete yet, so always return 0
>>> +         * for the moment.
>>> +         */
>>> +        return 0;
>>> +}
>>> +
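Side note: I assume that, once the rest of the series wires this up,
this helper will walk the queue's independent access ranges to map the
bio's start sector to an actuator, along these lines (untested sketch,
not what this patch does yet):

static unsigned int bfq_actuator_index(struct bfq_data *bfqd, struct bio *bio)
{
	struct blk_independent_access_ranges *iars = bfqd->queue->ia_ranges;
	sector_t sec = bio->bi_iter.bi_sector;
	unsigned int i;

	/* No access ranges reported: single-actuator drive */
	if (!iars)
		return 0;

	/* Pick the range (actuator) containing the bio start sector */
	for (i = 0; i < iars->nr_ia_ranges; i++) {
		struct blk_independent_access_range *iar = &iars->ia_range[i];

		if (sec >= iar->sector && sec < iar->sector + iar->nr_sectors)
			return i;
	}
	return 0;
}
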
>>>  static bool bfq_bio_merge(struct request_queue *q, struct bio *bio,
>>>                  unsigned int nr_segs)
>>>  {
>>> @@ -2478,7 +2500,8 @@ static bool bfq_bio_merge(struct request_queue *q, struct bio *bio,
>>>                   */
>>>                  bfq_bic_update_cgroup(bic, bio);
>>>
>>> -                bfqd->bio_bfqq = bic_to_bfqq(bic, op_is_sync(bio->bi_opf));
>>> +                bfqd->bio_bfqq = bic_to_bfqq(bic, op_is_sync(bio->bi_opf),
>>> +                                             bfq_actuator_index(bfqd, bio));
>>>          } else {
>>>                  bfqd->bio_bfqq = NULL;
>>>          }
>>> @@ -3174,7 +3197,7 @@ bfq_merge_bfqqs(struct bfq_data *bfqd, struct bfq_io_cq *bic,
>>>          /*
>>>           * Merge queues (that is, let bic redirect its requests to new_bfqq)
>>>           */
>>> -        bic_set_bfqq(bic, new_bfqq, 1);
>>> +        bic_set_bfqq(bic, new_bfqq, 1, bfqq->actuator_idx);
>>>          bfq_mark_bfqq_coop(new_bfqq);
>>>          /*
>>>           * new_bfqq now belongs to at least two bics (it is a shared queue):
>>> @@ -4808,11 +4831,12 @@ static struct bfq_queue *bfq_select_queue(struct bfq_data *bfqd)
>>>           */
>>>          if (bfq_bfqq_wait_request(bfqq) ||
>>>              (bfqq->dispatched != 0 && bfq_better_to_idle(bfqq))) {
>>> +                unsigned int act_idx = bfqq->actuator_idx;
>>>                  struct bfq_queue *async_bfqq =
>>> -                        bfqq->bic && bfqq->bic->bfqq[0] &&
>>> -                        bfq_bfqq_busy(bfqq->bic->bfqq[0]) &&
>>> -                        bfqq->bic->bfqq[0]->next_rq ?
>>> -                        bfqq->bic->bfqq[0] : NULL;
>>> +                        bfqq->bic && bfqq->bic->bfqq[0][act_idx] &&
>>> +                        bfq_bfqq_busy(bfqq->bic->bfqq[0][act_idx]) &&
>>> +                        bfqq->bic->bfqq[0][act_idx]->next_rq ?
>>> +                        bfqq->bic->bfqq[0][act_idx] : NULL;
>>>                  struct bfq_queue *blocked_bfqq =
>>>                          !hlist_empty(&bfqq->woken_list) ?
>>>                          container_of(bfqq->woken_list.first,
>>> @@ -4904,7 +4928,7 @@ static struct bfq_queue *bfq_select_queue(struct bfq_data *bfqd)
>>>              icq_to_bic(async_bfqq->next_rq->elv.icq) == bfqq->bic &&
>>>              bfq_serv_to_charge(async_bfqq->next_rq, async_bfqq) <=
>>>              bfq_bfqq_budget_left(async_bfqq))
>>> -                bfqq = bfqq->bic->bfqq[0];
>>> +                bfqq = bfqq->bic->bfqq[0][act_idx];
>>>          else if (bfqq->waker_bfqq &&
>>>                   bfq_bfqq_busy(bfqq->waker_bfqq) &&
>>>                   bfqq->waker_bfqq->next_rq &&
>>> @@ -5367,49 +5391,47 @@ static void bfq_exit_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq)
>>>          bfq_release_process_ref(bfqd, bfqq);
>>>  }
>>>
>>> -static void bfq_exit_icq_bfqq(struct bfq_io_cq *bic, bool is_sync)
>>> +static void bfq_exit_icq_bfqq(struct bfq_io_cq *bic,
>>> +                              bool is_sync,
>>> +                              unsigned int actuator_idx)
>>>  {
>>> -        struct bfq_queue *bfqq = bic_to_bfqq(bic, is_sync);
>>> +        struct bfq_queue *bfqq = bic_to_bfqq(bic, is_sync, actuator_idx);
>>>          struct bfq_data *bfqd;
>>>
>>>          if (bfqq)
>>>                  bfqd = bfqq->bfqd; /* NULL if scheduler already exited */
>>>
>>>          if (bfqq && bfqd) {
>>> -                unsigned long flags;
>>> -
>>> -                spin_lock_irqsave(&bfqd->lock, flags);
>>>                  bfqq->bic = NULL;
>>>                  bfq_exit_bfqq(bfqd, bfqq);
>>> -                bic_set_bfqq(bic, NULL, is_sync);
>>> -                spin_unlock_irqrestore(&bfqd->lock, flags);
>>> +                bic_set_bfqq(bic, NULL, is_sync, actuator_idx);
>>>          }
>>>  }
>>>
>>>  static void bfq_exit_icq(struct io_cq *icq)
>>>  {
>>>          struct bfq_io_cq *bic = icq_to_bic(icq);
>>> +        struct bfq_data *bfqd = bic_to_bfqd(bic);
>>> +        unsigned long flags;
>>> +        unsigned int act_idx;
>>>
>>> -        if (bic->stable_merge_bfqq) {
>>> -                struct bfq_data *bfqd = bic->stable_merge_bfqq->bfqd;
>>> -
>>> -                /*
>>> -                 * bfqd is NULL if scheduler already exited, and in
>>> -                 * that case this is the last time bfqq is accessed.
>>> -                 */
>>> -                if (bfqd) {
>>> -                        unsigned long flags;
>>> +        /*
>>> +         * bfqd is NULL if scheduler already exited, and in that case
>>> +         * this is the last time these queues are accessed.
>>> +         */
>>> +        if (bfqd)
>>> +                spin_lock_irqsave(&bfqd->lock, flags);
>>>
>>> -                        spin_lock_irqsave(&bfqd->lock, flags);
>>> -                        bfq_put_stable_ref(bic->stable_merge_bfqq);
>>> -                        spin_unlock_irqrestore(&bfqd->lock, flags);
>>> -                } else {
>>> +        for (act_idx = 0; act_idx < BFQ_NUM_ACTUATORS; act_idx++) {
>>> +                if (bic->stable_merge_bfqq)
>>>                          bfq_put_stable_ref(bic->stable_merge_bfqq);
>>> -                }
>>> +
>>> +                bfq_exit_icq_bfqq(bic, true, act_idx);
>>> +                bfq_exit_icq_bfqq(bic, false, act_idx);
>>>          }
>>>
>>> -        bfq_exit_icq_bfqq(bic, true);
>>> -        bfq_exit_icq_bfqq(bic, false);
>>> +        if (bfqd)
>>> +                spin_unlock_irqrestore(&bfqd->lock, flags);
>>>  }
>>>
>>>  /*
>>> @@ -5486,23 +5508,25 @@ static void bfq_check_ioprio_change(struct bfq_io_cq *bic, struct bio *bio)
>>>
>>>          bic->ioprio = ioprio;
>>>
>>> -        bfqq = bic_to_bfqq(bic, false);
>>> +        bfqq = bic_to_bfqq(bic, false, bfq_actuator_index(bfqd, bio));
>>>          if (bfqq) {
>>>                  bfq_release_process_ref(bfqd, bfqq);
>>>                  bfqq = bfq_get_queue(bfqd, bio, false, bic, true);
>>> -                bic_set_bfqq(bic, bfqq, false);
>>> +                bic_set_bfqq(bic, bfqq, false, bfq_actuator_index(bfqd, bio));
>>>          }
>>>
>>> -        bfqq = bic_to_bfqq(bic, true);
>>> +        bfqq = bic_to_bfqq(bic, true, bfq_actuator_index(bfqd, bio));
>>>          if (bfqq)
>>>                  bfq_set_next_ioprio_data(bfqq, bic);
>>>  }
>>>
>>>  static void bfq_init_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq,
>>> -                          struct bfq_io_cq *bic, pid_t pid, int is_sync)
>>> +                          struct bfq_io_cq *bic, pid_t pid, int is_sync,
>>> +                          unsigned int act_idx)
>>>  {
>>>          u64 now_ns = ktime_get_ns();
>>>
>>> +        bfqq->actuator_idx = act_idx;
>>>          RB_CLEAR_NODE(&bfqq->entity.rb_node);
>>>          INIT_LIST_HEAD(&bfqq->fifo);
>>>          INIT_HLIST_NODE(&bfqq->burst_list_node);
>>> @@ -5741,6 +5765,7 @@ static struct bfq_queue *bfq_get_queue(struct bfq_data *bfqd,
>>>          struct bfq_group *bfqg;
>>>
>>>          bfqg = bfq_bio_bfqg(bfqd, bio);
>>> +
>>>          if (!is_sync) {
>>>                  async_bfqq = bfq_async_queue_prio(bfqd, bfqg, ioprio_class,
>>>                                                    ioprio);
>>> @@ -5755,7 +5780,7 @@ static struct bfq_queue *bfq_get_queue(struct bfq_data *bfqd,
>>>
>>>          if (bfqq) {
>>>                  bfq_init_bfqq(bfqd, bfqq, bic, current->pid,
>>> -                              is_sync);
>>> +                              is_sync, bfq_actuator_index(bfqd, bio));
>>>                  bfq_init_entity(&bfqq->entity, bfqg);
>>>                  bfq_log_bfqq(bfqd, bfqq, "allocated");
>>>          } else {
>>> @@ -6070,7 +6095,8 @@ static bool __bfq_insert_request(struct bfq_data *bfqd, struct request *rq)
>>>                   * then complete the merge and redirect it to
>>>                   * new_bfqq.
>>>                   */
>>> -                if (bic_to_bfqq(RQ_BIC(rq), 1) == bfqq)
>>> +                if (bic_to_bfqq(RQ_BIC(rq), 1,
>>> +                                bfq_actuator_index(bfqd, rq->bio)) == bfqq)
>>>                          bfq_merge_bfqqs(bfqd, RQ_BIC(rq),
>>>                                          bfqq, new_bfqq);
>>>
>>> @@ -6624,7 +6650,7 @@ bfq_split_bfqq(struct bfq_io_cq *bic, struct bfq_queue *bfqq)
>>>                  return bfqq;
>>>          }
>>>
>>> -        bic_set_bfqq(bic, NULL, 1);
>>> +        bic_set_bfqq(bic, NULL, 1, bfqq->actuator_idx);
>>>
>>>          bfq_put_cooperator(bfqq);
>>>
>>> @@ -6638,7 +6664,8 @@ static struct bfq_queue *bfq_get_bfqq_handle_split(struct bfq_data *bfqd,
>>>                                            bool split, bool is_sync,
>>>                                            bool *new_queue)
>>>  {
>>> -        struct bfq_queue *bfqq = bic_to_bfqq(bic, is_sync);
>>> +        unsigned int act_idx = bfq_actuator_index(bfqd, bio);
>>> +        struct bfq_queue *bfqq = bic_to_bfqq(bic, is_sync, act_idx);
>>>
>>>          if (likely(bfqq && bfqq != &bfqd->oom_bfqq))
>>>                  return bfqq;
>>> @@ -6650,7 +6677,7 @@ static struct bfq_queue *bfq_get_bfqq_handle_split(struct bfq_data *bfqd,
>>>                  bfq_put_queue(bfqq);
>>>          bfqq = bfq_get_queue(bfqd, bio, is_sync, bic, split);
>>>
>>> -        bic_set_bfqq(bic, bfqq, is_sync);
>>> +        bic_set_bfqq(bic, bfqq, is_sync, act_idx);
>>>          if (split && is_sync) {
>>>                  if ((bic->was_in_burst_list && bfqd->large_burst) ||
>>>                      bic->saved_in_large_burst)
>>> @@ -7092,8 +7119,10 @@ static int bfq_init_queue(struct request_queue *q, struct elevator_type *e)
>>>           * Our fallback bfqq if bfq_find_alloc_queue() runs into OOM issues.
>>>           * Grab a permanent reference to it, so that the normal code flow
>>>           * will not attempt to free it.
>>> +         * Set zero as actuator index: we will pretend that
>>> +         * all I/O requests are for the same actuator.
>>>           */
>>> -        bfq_init_bfqq(bfqd, &bfqd->oom_bfqq, NULL, 1, 0);
>>> +        bfq_init_bfqq(bfqd, &bfqd->oom_bfqq, NULL, 1, 0, 0);
>>>          bfqd->oom_bfqq.ref++;
>>>          bfqd->oom_bfqq.new_ioprio = BFQ_DEFAULT_QUEUE_IOPRIO;
>>>          bfqd->oom_bfqq.new_ioprio_class = IOPRIO_CLASS_BE;
>>> diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h
>>> index ad8e513d7e87..8b5225a9e080 100644
>>> --- a/block/bfq-iosched.h
>>> +++ b/block/bfq-iosched.h
>>> @@ -33,6 +33,8 @@
>>>   */
>>>  #define BFQ_SOFTRT_WEIGHT_FACTOR        100
>>>
>>> +#define BFQ_NUM_ACTUATORS 2
>>> +
>>>  struct bfq_entity;
>>>
>>>  /**
>>> @@ -225,12 +227,14 @@ struct bfq_ttime {
>>>   * struct bfq_queue - leaf schedulable entity.
>>>   *
>>>   * A bfq_queue is a leaf request queue; it can be associated with an
>>> - * io_context or more, if it is async or shared between cooperating
>>> - * processes. @cgroup holds a reference to the cgroup, to be sure that it
>>> - * does not disappear while a bfqq still references it (mostly to avoid
>>> - * races between request issuing and task migration followed by cgroup
>>> - * destruction).
>>> - * All the fields are protected by the queue lock of the containing bfqd.
>>> + * io_context or more, if it is async or shared between cooperating
>>> + * processes. Besides, it contains I/O requests for only one actuator
>>> + * (an io_context is associated with a different bfq_queue for each
>>> + * actuator it generates I/O for). @cgroup holds a reference to the
>>> + * cgroup, to be sure that it does not disappear while a bfqq still
>>> + * references it (mostly to avoid races between request issuing and
>>> + * task migration followed by cgroup destruction). All the fields are
>>> + * protected by the queue lock of the containing bfqd.
>>>   */
>>>  struct bfq_queue {
>>>          /* reference counter */
>>> @@ -399,6 +403,9 @@ struct bfq_queue {
>>>           * the woken queues when this queue exits.
>>>           */
>>>          struct hlist_head woken_list;
>>> +
>>> +        /* index of the actuator this queue is associated with */
>>> +        unsigned int actuator_idx;
>>>  };
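For readers following along: with the new field in place, the
per-actuator queue lookup done in bfq_bio_merge() above boils down to

	unsigned int act_idx = bfq_actuator_index(bfqd, bio);
	struct bfq_queue *bfqq = bic_to_bfqq(bic, op_is_sync(bio->bi_opf), act_idx);

and since bfq_actuator_index() still always returns 0, every process
keeps using a single pair of queues for now, exactly as before this patch.
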
>>>
>>>  /**
>>> @@ -407,8 +414,17 @@ struct bfq_queue {
>>>  struct bfq_io_cq {
>>>          /* associated io_cq structure */
>>>          struct io_cq icq; /* must be the first member */
>>> -        /* array of two process queues, the sync and the async */
>>> -        struct bfq_queue *bfqq[2];
>>> +        /*
>>> +         * Matrix of associated process queues: first row for async
>>> +         * queues, second row sync queues. Each row contains one
>>> +         * column for each actuator. An I/O request generated by the
>>> +         * process is inserted into the queue pointed by bfqq[i][j] if
>>> +         * the request is to be served by the j-th actuator of the
>>> +         * drive, where i==0 or i==1, depending on whether the request
>>> +         * is async or sync. So there is a distinct queue for each
>>> +         * actuator.
>>> +         */
>>> +        struct bfq_queue *bfqq[2][BFQ_NUM_ACTUATORS];
>>>          /* per (request_queue, blkcg) ioprio */
>>>          int ioprio;
>>>  #ifdef CONFIG_BFQ_GROUP_IOSCHED
>>> @@ -968,8 +984,10 @@ struct bfq_group {
>>>
>>>  extern const int bfq_timeout;
>>>
>>> -struct bfq_queue *bic_to_bfqq(struct bfq_io_cq *bic, bool is_sync);
>>> -void bic_set_bfqq(struct bfq_io_cq *bic, struct bfq_queue *bfqq, bool is_sync);
>>> +struct bfq_queue *bic_to_bfqq(struct bfq_io_cq *bic, bool is_sync,
>>> +                              unsigned int actuator_idx);
>>> +void bic_set_bfqq(struct bfq_io_cq *bic, struct bfq_queue *bfqq, bool is_sync,
>>> +                  unsigned int actuator_idx);
>>>  struct bfq_data *bic_to_bfqd(struct bfq_io_cq *bic);
>>>  void bfq_pos_tree_add_move(struct bfq_data *bfqd, struct bfq_queue *bfqq);
>>>  void bfq_weights_tree_add(struct bfq_data *bfqd, struct bfq_queue *bfqq,
>>
>> -- 
>> Damien Le Moal
>> Western Digital Research
> 
> 

-- 
Damien Le Moal
Western Digital Research