From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-3.8 required=3.0 tests=DKIM_SIGNED,DKIM_VALID, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE, SPF_PASS autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 06B4BC10F25 for ; Tue, 10 Mar 2020 03:07:47 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id A3A8720728 for ; Tue, 10 Mar 2020 03:07:46 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=wdc.com header.i=@wdc.com header.b="MhKBYvEe"; dkim=pass (1024-bit key) header.d=sharedspace.onmicrosoft.com header.i=@sharedspace.onmicrosoft.com header.b="mpPoqZbF" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726389AbgCJDHp (ORCPT ); Mon, 9 Mar 2020 23:07:45 -0400 Received: from esa4.hgst.iphmx.com ([216.71.154.42]:52565 "EHLO esa4.hgst.iphmx.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726170AbgCJDHp (ORCPT ); Mon, 9 Mar 2020 23:07:45 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1583809664; x=1615345664; h=from:to:cc:subject:date:message-id:references: content-transfer-encoding:mime-version; bh=Cu7wvgOHIrb0rF2kflC+5CeoEC+M6d23nxNzq434F+U=; b=MhKBYvEeKBTvWGmDWmnwuE/cOcdYQcMun8OEr2bamfWROTHZxH9F4pIc OS46sbATwT0g1Kf5hoHGPMsMul7KRkXuj6I+RA18EAPFzJioXSIdK2R9V 57FuEJdX/1G2QEDJBUNNdcFANrUWkkZj775CRENayyaVmeI00Zv84Po7m w8tBuZTcTvyYPr4zbu4QGldvNQu3AyYpgNOSLvY1ZF80vA0wAKi0Dvwlk xiR116/eEKtACMqQlVRKaiixTH/p+YgV0FDPuy6UtMXBDtZswBFxj5ay7 cY7ipmi/9GYTFl5IJ5VOB+qWTajfPE4rizfFxqZnnWdxmb9fVvDFoEh2s g==; IronPort-SDR: 857nMt26dBC0ctro9TUWRRtnEbdhcy5MBJIJktXgwHXK+raTiAgycPqEgtOmvAMK4rB+7/+UD/ I07rg61ICD8rpyLHPGc78KHFtbSgksN8CUs3onhfFtf8PPV6GbzuBSV5r3QJDL1Xq/o3Hb3EiP 8bImz6HfKxEaKc6OpgSFsmRVDJYMJaKrMp+2d4JPylREBPv6uMGUVlA072Tsg+meyeSVgV34jk 7uk93adJcsFV+wtZ0hY/sL8Y8HaT+uX2xRc2fzJQSPiZlSWPRNpSDeUr5XDnPlcyn6gWt/tGGu DLc= X-IronPort-AV: E=Sophos;i="5.70,535,1574092800"; d="scan'208";a="132029265" Received: from mail-mw2nam12lp2049.outbound.protection.outlook.com (HELO NAM12-MW2-obe.outbound.protection.outlook.com) ([104.47.66.49]) by ob1.hgst.iphmx.com with ESMTP; 10 Mar 2020 11:07:43 +0800 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=TUaNDm+zN5kzGIy0MSD/Se7c3TcEax2/pqBWtGza3c31A0BTGjvvVajWTxCEUNVXNzt6uzhlD1q94CcmLE24ydmka9Ko18eC8GX6rsHinUcB6LNvmYcy7G2KNczUYYiy6cRbi9fGOUtCbN/6MQ6bZ9ybXFCI1mFZbMYotUAgZhFvA6lrM1PIdK/SqW4J+6+A2Z6XJcAf65PQ4wP502gghgDqneF7V3DksB5FTfMXFdpAbrtWNhO/2fZiGtpheSKTOYQ8zZh2ibiNFduPmbtn+IgGnqzc6wYgKq+ixOxfo+r+Z/bbdteiwyVw6sDkVVkbPiRXhGS1HMMjyN6oz+Do8A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=3PT0grNzh7o1M0JGhQvGX+ILLbVznB8R/+APGBwrT7g=; b=EdJk7vZt0Cpay9MF/VfTLzMhWAZqx5yYuqX4rfGY+HCOXM3nHy5Yi/GEL4oCoMWKY1ROLh/Y5hudEpYLbr+3kXvwR25nI5fx/U5oHptUgTjDItoC+EZLX019Pz2l1N9DooZ4E21uZUiIa5pleuWgZEow1iUhV9LqgZbRhqz1ttYn6NTqg+r8vnRGphn6pexMfmW7CM/2ks8fERgI5t3bZj6N1qWv/jpbZR312i6HgKFF95kjcYLtkAfgM0frkkqId/MHkrnUktJIwlCdPBToofbb0K7e255um6V8JIKpwFI53rtltLW7QTp3frl4yxx0JoVFWRnBvUy8HS8B9ZmQ5A== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=wdc.com; dmarc=pass action=none header.from=wdc.com; dkim=pass header.d=wdc.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sharedspace.onmicrosoft.com; s=selector2-sharedspace-onmicrosoft-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=3PT0grNzh7o1M0JGhQvGX+ILLbVznB8R/+APGBwrT7g=; b=mpPoqZbF8d0+X/X7LhDPolSGO/Lsmn/5YSchl1pupfACfmIwCQP7Qmd2JmQcyPHy15OjyVSUgFe44rhgAfUaE+TuhuU1Mepn42W2KNQw86m63CNCoZQUNZt/IpS9SZPAaeOLPBEAc95MuRbVRWgwU6dR4ddD4vZrTdo9QckcewE= Received: from BYAPR04MB5816.namprd04.prod.outlook.com (2603:10b6:a03:10e::16) by BYAPR04MB4950.namprd04.prod.outlook.com (2603:10b6:a03:4c::17) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.2793.17; Tue, 10 Mar 2020 03:07:42 +0000 Received: from BYAPR04MB5816.namprd04.prod.outlook.com ([fe80::6daf:1b7c:1a61:8cb2]) by BYAPR04MB5816.namprd04.prod.outlook.com ([fe80::6daf:1b7c:1a61:8cb2%6]) with mapi id 15.20.2793.013; Tue, 10 Mar 2020 03:07:42 +0000 From: Damien Le Moal To: Ming Lei , Shinichiro Kawasaki CC: "linux-block@vger.kernel.org" , Jens Axboe Subject: Re: commit 01e99aeca397 causes longer runtime of block/004 Thread-Topic: commit 01e99aeca397 causes longer runtime of block/004 Thread-Index: AQHV8c4FlOkNXDceTU6Nwl3i3vnr/A== Date: Tue, 10 Mar 2020 03:07:42 +0000 Message-ID: References: <20200304034644.GA23012@ming.t460p> <20200304061137.l4hdqdt2dvs7dxgz@shindev.dhcp.fujisawa.hgst.com> <20200304095344.GA10390@ming.t460p> <20200305011900.2rtgtmclovmr2fbw@shindev.dhcp.fujisawa.hgst.com> <20200305024808.GA26733@ming.t460p> <20200306060622.t2jl7qkzvkwvvcbx@shindev.dhcp.fujisawa.hgst.com> <20200306081309.GA29683@ming.t460p> <20200307010222.gtrwivafqe2254i6@shindev.dhcp.fujisawa.hgst.com> <20200307041343.GB20579@ming.t460p> <20200309000715.sqgsssrauzmmpli2@shindev.dhcp.fujisawa.hgst.com> <20200309161430.GA4871@ming.t460p> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: authentication-results: spf=none (sender IP is ) smtp.mailfrom=Damien.LeMoal@wdc.com; x-originating-ip: [2400:2411:43c0:6000:9183:2c1d:9c5e:d831] x-ms-publictraffictype: Email x-ms-office365-filtering-ht: Tenant x-ms-office365-filtering-correlation-id: 596fc968-3e10-45b5-7ae8-08d7c4a032dc x-ms-traffictypediagnostic: BYAPR04MB4950: x-ms-exchange-transport-forked: True x-microsoft-antispam-prvs: wdcipoutbound: EOP-TRUE x-ms-oob-tlc-oobclassifiers: OLM:10000; x-forefront-prvs: 033857D0BD x-forefront-antispam-report: SFV:NSPM;SFS:(10019020)(4636009)(376002)(346002)(396003)(366004)(136003)(39860400002)(189003)(199004)(316002)(110136005)(54906003)(64756008)(6506007)(66446008)(86362001)(66476007)(66556008)(966005)(53546011)(4326008)(71200400001)(478600001)(8676002)(8936002)(81156014)(186003)(81166006)(66946007)(6636002)(76116006)(33656002)(66574012)(7696005)(9686003)(55016002)(2906002)(52536014)(5660300002)(148743002)(60764002)(309714004);DIR:OUT;SFP:1102;SCL:1;SRVR:BYAPR04MB4950;H:BYAPR04MB5816.namprd04.prod.outlook.com;FPR:;SPF:None;LANG:en;PTR:InfoNoRecords;MX:1;A:1; x-ms-exchange-senderadcheck: 1 x-microsoft-antispam: BCL:0; x-microsoft-antispam-message-info: k8YdKjzarIuH35WSV1anDj36eqX0JP+h9d8Aqoq9cp9wQ8OkhhKVgamy64iuWucVZXnqGvj7GBEAdZiHMXypDUQOFNFUDlx+m56AEfqI6+egRiTj7t8C/ulvXKvQObP3X1RWKUoBKc0eOp7eLlWIA3OyQOyDEEidJkdzhvdn+l9O/uIQVADUNCbIoGqH3jD6fTDDYIZnEGae0fqFQg8vNme+s47z4t3o7kN6z6ZCEQPDvqm/8UGw2dS/rRCk+cq39xlSVfqUW8p+F79w2a2LoEpEXeAQ+dBOBsToSqN+eXFDpLhUz7IKkLgLcFd2pt1bZXYhTxcRX642L8iW/d3lcfrJbSg4rcVRQ6AQhr5CkISomx9xjZ70vpjTmaET4iNpcWo03jgilM5JPvVoeIPWBeAiIleF3d/sNPpfWDyBmRUXu7uQ8qlPbHezk+NPxPXVVGvnCM98Rzwk+6wjPqoiaiUobq1bMRCum11vYpZJy8wSZjwnwIGR/B5Kf2vQYJf1r3SrELWd/ExMmven2kXiEa2eMCVa6SalVkrOmoTb6MY5y5Qy+ThEpzoMZP6LznyzeIohZc7cyYVQRw9bPXJRlXQBqRASzbQ5Vyb9m0UHj1B70N2vW2pN0RCHpfQoCm75 x-ms-exchange-antispam-messagedata: iB4uhPg4YVTj3j9KlA8qC4xLbVc2ZJCDN+9W5yGhxEwsGyk1imTqk604GDZoO6YzMIK8pPkgga9ZT+Kknc7SzXVnL2q8GWeWBcEEcueSejkiVVSzMyKaMZTYc5EK0SJEWd6QWpLOv8I1kDzLpWDkbLS1sYiRf36VP1NeJmvrWW0LqIXW5791baQtgECtUMkXMqHa3r6G29v+GVlhbbq0PA== Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-OriginatorOrg: wdc.com X-MS-Exchange-CrossTenant-Network-Message-Id: 596fc968-3e10-45b5-7ae8-08d7c4a032dc X-MS-Exchange-CrossTenant-originalarrivaltime: 10 Mar 2020 03:07:42.2396 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: b61c8803-16f3-4c35-9b17-6f65f441df86 X-MS-Exchange-CrossTenant-mailboxtype: HOSTED X-MS-Exchange-CrossTenant-userprincipalname: pOAYBE6XUqG//zdIK1NldFhRdTpVWEn1Xp/o1XKeEbbyJR/hU5wv868p7jadZb7NVKStFZ21qXRQXIOQlbGVDA== X-MS-Exchange-Transport-CrossTenantHeadersStamped: BYAPR04MB4950 Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org On 2020/03/10 1:14, Ming Lei wrote:=0A= > On Mon, Mar 09, 2020 at 12:07:16AM +0000, Shinichiro Kawasaki wrote:=0A= >> On Mar 07, 2020 / 12:13, Ming Lei wrote:=0A= >>> On Sat, Mar 07, 2020 at 01:02:23AM +0000, Shinichiro Kawasaki wrote:=0A= >>>> On Mar 06, 2020 / 16:13, Ming Lei wrote:=0A= >>>>> On Fri, Mar 06, 2020 at 06:06:23AM +0000, Shinichiro Kawasaki wrote:= =0A= >>>>>> On Mar 05, 2020 / 10:48, Ming Lei wrote:=0A= >>>>>>> Hi Shinichiro,=0A= >>>>>>>=0A= >>>>>>> On Thu, Mar 05, 2020 at 01:19:02AM +0000, Shinichiro Kawasaki wrote= :=0A= >>>>>>>> On Mar 04, 2020 / 17:53, Ming Lei wrote:=0A= >>>>>>>>> On Wed, Mar 04, 2020 at 06:11:37AM +0000, Shinichiro Kawasaki wro= te:=0A= >>>>>>>>>> On Mar 04, 2020 / 11:46, Ming Lei wrote:=0A= >>>>>>>>>>> On Wed, Mar 04, 2020 at 02:38:43AM +0000, Shinichiro Kawasaki w= rote:=0A= >>>>>>>>>>>> I noticed that blktests block/004 takes longer runtime with 5.= 6-rc4 than=0A= >>>>>>>>>>>> 5.6-rc3, and found that the commit 01e99aeca397 ("blk-mq: inse= rt passthrough=0A= >>>>>>>>>>>> request into hctx->dispatch directly") triggers it.=0A= >>>>>>>>>>>>=0A= >>>>>>>>>>>> The longer runtime was observed with dm-linear device which ma= ps SATA SMR HDD=0A= >>>>>>>>>>>> connected via AHCI. It was not observed with dm-linear on SAS/= SATA SMR HDDs=0A= >>>>>>>>>>>> connected via SAS-HBA. Not observed with dm-linear on non-SMR = HDDs either.=0A= >>>>>>>>>>>>=0A= >>>>>>>>>>>> Before the commit, block/004 took around 130 seconds. After th= e commit, it takes=0A= >>>>>>>>>>>> around 300 seconds. I need to dig in further details to unders= tand why the=0A= >>>>>>>>>>>> commit makes the test case longer.=0A= >>>>>>>>>>>>=0A= >>>>>>>>>>>> The test case block/004 does "flush intensive workload". Is th= is longer runtime=0A= >>>>>>>>>>>> expected?=0A= >>>>>>>>>>>=0A= >>>>>>>>>>> The following patch might address this issue:=0A= >>>>>>>>>>>=0A= >>>>>>>>>>> https://lore.kernel.org/linux-block/20200207190416.99928-1-sqaz= i@google.com/#t=0A= >>>>>>>>>>>=0A= >>>>>>>>>>> Please test and provide us the result.=0A= >>>>>>>>>>>=0A= >>>>>>>>>>> thanks,=0A= >>>>>>>>>>> Ming=0A= >>>>>>>>>>>=0A= >>>>>>>>>>=0A= >>>>>>>>>> Hi Ming,=0A= >>>>>>>>>>=0A= >>>>>>>>>> I applied the patch to 5.6-rc4 but I observed the longer runtime= of block/004.=0A= >>>>>>>>>> Still it takes around 300 seconds.=0A= >>>>>>>>>=0A= >>>>>>>>> Hello Shinichiro,=0A= >>>>>>>>>=0A= >>>>>>>>> block/004 only sends 1564 sync randwrite, and seems 130s has been= slow=0A= >>>>>>>>> enough.=0A= >>>>>>>>>=0A= >>>>>>>>> There are two related effect in that commit for your issue:=0A= >>>>>>>>>=0A= >>>>>>>>> 1) 'at_head' is applied in blk_mq_sched_insert_request() for flus= h=0A= >>>>>>>>> request=0A= >>>>>>>>>=0A= >>>>>>>>> 2) all IO is added back to tail of hctx->dispatch after .queue_rq= ()=0A= >>>>>>>>> returns STS_RESOURCE=0A= >>>>>>>>>=0A= >>>>>>>>> Seems it is more related with 2) given you can't reproduce the is= sue on =0A= >>>>>>>>> SAS.=0A= >>>>>>>>>=0A= >>>>>>>>> So please test the following two patches, and see which one makes= a=0A= >>>>>>>>> difference for you.=0A= >>>>>>>>>=0A= >>>>>>>>> BTW, both two looks not reasonable, just for narrowing down the i= ssue.=0A= >>>>>>>>>=0A= >>>>>>>>> 1) patch 1=0A= >>>>>>>>>=0A= >>>>>>>>> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c=0A= >>>>>>>>> index 856356b1619e..86137c75283c 100644=0A= >>>>>>>>> --- a/block/blk-mq-sched.c=0A= >>>>>>>>> +++ b/block/blk-mq-sched.c=0A= >>>>>>>>> @@ -398,7 +398,7 @@ void blk_mq_sched_insert_request(struct reque= st *rq, bool at_head,=0A= >>>>>>>>> WARN_ON(e && (rq->tag !=3D -1));=0A= >>>>>>>>> =0A= >>>>>>>>> if (blk_mq_sched_bypass_insert(hctx, !!e, rq)) {=0A= >>>>>>>>> - blk_mq_request_bypass_insert(rq, at_head, false);=0A= >>>>>>>>> + blk_mq_request_bypass_insert(rq, true, false);=0A= >>>>>>>>> goto run;=0A= >>>>>>>>> }=0A= >>>>>>>>=0A= >>>>>>>> Ming, thank you for the trial patches.=0A= >>>>>>>> This "patch 1" reduced the runtime, as short as rc3.=0A= >>>>>>>>=0A= >>>>>>>>>=0A= >>>>>>>>>=0A= >>>>>>>>> 2) patch 2=0A= >>>>>>>>> diff --git a/block/blk-mq.c b/block/blk-mq.c=0A= >>>>>>>>> index d92088dec6c3..447d5cb39832 100644=0A= >>>>>>>>> --- a/block/blk-mq.c=0A= >>>>>>>>> +++ b/block/blk-mq.c=0A= >>>>>>>>> @@ -1286,7 +1286,7 @@ bool blk_mq_dispatch_rq_list(struct request= _queue *q, struct list_head *list,=0A= >>>>>>>>> q->mq_ops->commit_rqs(hctx);=0A= >>>>>>>>> =0A= >>>>>>>>> spin_lock(&hctx->lock);=0A= >>>>>>>>> - list_splice_tail_init(list, &hctx->dispatch);=0A= >>>>>>>>> + list_splice_init(list, &hctx->dispatch);=0A= >>>>>>>>> spin_unlock(&hctx->lock);=0A= >>>>>>>>> =0A= >>>>>>>>> /*=0A= >>>>>>>>=0A= >>>>>>>> This patch 2 didn't reduce the runtime.=0A= >>>>>>>>=0A= >>>>>>>> Wish this report helps.=0A= >>>>>>>=0A= >>>>>>> Your feedback does help, then please test the following patch:=0A= >>>>>>=0A= >>>>>> Hi Ming, thank you for the patch. I applied it on top of rc4 and con= firmed=0A= >>>>>> it reduces the runtime as short as rc3. Good.=0A= >>>>>=0A= >>>>> Hi Shinichiro,=0A= >>>>>=0A= >>>>> Thanks for your test!=0A= >>>>>=0A= >>>>> Then I think the following change should make the difference actually= ,=0A= >>>>> you may double check that and confirm if it is that.=0A= >>>>>=0A= >>>>>> @@ -334,7 +334,7 @@ static void blk_kick_flush(struct request_queue = *q, struct blk_flush_queue *fq,=0A= >>>>>> flush_rq->rq_disk =3D first_rq->rq_disk;=0A= >>>>>> flush_rq->end_io =3D flush_end_io;=0A= >>>>>> =0A= >>>>>> - blk_flush_queue_rq(flush_rq, false);=0A= >>>>>> + blk_flush_queue_rq(flush_rq, true);=0A= >>>>=0A= >>>> Yes, with this the one line change above only, the runtime was reduced= .=0A= >>>>=0A= >>>>>=0A= >>>>> However, the flush request is added to tail of dispatch queue[1] for = long time.=0A= >>>>> 0cacba6cf825 ("blk-mq-sched: bypass the scheduler for flushes entirel= y")=0A= >>>>> and its predecessor(all mq scheduler start patches) changed to add fl= ush request=0A= >>>>> to front of dispatch queue for blk-mq by ignoring 'add_queue' paramet= er of=0A= >>>>> blk_mq_sched_insert_flush(). That change may be by accident, and not = sure it is=0A= >>>>> correct.=0A= >>>>>=0A= >>>>> I guess once flush rq is added to tail of dispatch queue in block/004= ,=0A= >>>>> in which lots of FS request may stay in hctx->dispatch because of low= =0A= >>>>> AHCI queue depth, then we may take a bit long for flush rq to be=0A= >>>>> submitted to LLD.=0A= >>>>>=0A= >>>>> I'd suggest to root cause/understand the issue given it isn't obvious= =0A= >>>>> correct to queue flush rq at front of dispatch queue, so could you co= llect=0A= >>>>> the following trace via the following script with/without the single = line=0A= >>>>> patch?=0A= >>>>=0A= >>>> Thank you for the thoughts for the correct design. I have taken the tw= o traces,=0A= >>>> with and without the one liner patch above. The gzip archived trace fi= les have=0A= >>>> 1.6MB size. It looks too large to post to the list. Please let me know= how you=0A= >>>> want the trace files shared.=0A= >>>=0A= >>> I didn't thought the trace can be so big given the ios should be just= =0A= >>> 256 * 64(1564).=0A= >>>=0A= >>> You may put the log somewhere in Internet, cloud storage, web, or=0A= >>> whatever. Then just provides us the link.=0A= >>>=0A= >>> Or if you can't find a place to hold it, just send to me, and I will pu= t=0A= >>> it in my RH people web link.=0A= >>=0A= >> I have sent another e-mail only to you attaching the log files gziped.= =0A= >> Your confirmation will be appreciated.=0A= > =0A= > Yeah, I got the log, and it has been put in the following link:=0A= > =0A= > http://people.redhat.com/minlei/tests/logs/blktests_block_004_perf_degrad= e.tar.gz=0A= > =0A= > Also I have run some analysis, and block/004 basically call pwrite() &=0A= > fsync() in each job.=0A= > =0A= > 1) v5.6-rc kernel=0A= > - write rq average latency: 0.091s =0A= > - flush rq average latency: 0.018s=0A= > - average issue times of write request: 1.978 //how many trace_block_rq_= issue() is called for one rq=0A= > - average issue times of flush request: 1.052=0A= > =0A= > 2) v5.6-rc patched kernel=0A= > - write rq average latency: 0.031=0A= > - flush rq average latency: 0.035=0A= > - average issue times of write request: 1.466=0A= > - average issue times of flush request: 1.610=0A= > =0A= > =0A= > block/004 starts 64 jobs and AHCI's queue can become saturated easily,=0A= > then BLK_MQ_S_SCHED_RESTART should be set, so write request in dispatch= =0A= > queue can only move on after one request is completed.=0A= > =0A= > Looks the flush request takes shorter time than normal write IO.=0A= > If flush request is added to front of dispatch queue, the next normal=0A= > write IO could be queued to lld quicker than adding to tail of dispatch= =0A= > queue.=0A= > trace_block_rq_issue() is called more than one time is because of=0A= > AHCI or the drive's implementation. It usually means that=0A= > host->hostt->queuecommand() fails for several times when queuing one=0A= > single request. For AHCI, I understand it shouldn't fail normally given= =0A= > we guarantee that the actual queue depth is <=3D either LUN or host's=0A= > queue depth. Maybe it depends on your SMR's implementation about handling= =0A= > flush/write IO. Could you check why .queuecommand() fails in block/004?= =0A= =0A= Indeed, that is weird that queuecommand fails. There is nothing SMR specifi= c in=0A= the AHCI code beside disk probe checks. So write & flush handling does not= =0A= differ between SMR and regular disks. The same applies to the drive side. w= rite=0A= and flush commands are the normal commands, no change at all. The only=0A= difference being the sequential write constraint which the drive honors by = not=0A= executing the queued write command out of order. But there are no constrain= t for=0A= flush on SMR, so we get whatever the drive does, that is, totally drive dep= endent.=0A= =0A= I wonder if the issue may be with the particular AHCI chipset used for this= test.=0A= =0A= > =0A= > Also can you provide queue flags via the following command?=0A= > =0A= > cat /sys/kernel/debug/block/sdN/state=0A= > =0A= > I understand flush request should be slower than normal write, however=0A= > looks not true in this hardware.=0A= =0A= Probably due to the fact that since the writes are sequential, there is a l= ot of=0A= drive internal optimization that can be done to minimize the cost of flush= =0A= (internal data streaming from cache to media, media-cache use, etc) That is= true=0A= for regular disks too. And all of this is highly dependent on the hardware= =0A= implementation.=0A= =0A= Thank you for your help !=0A= =0A= Shinichiro,=0A= =0A= It may be worth trying the same run with & without Ming's patch on a machin= e=0A= with a different chipset...=0A= =0A= > =0A= > =0A= > Thanks,=0A= > Ming=0A= > =0A= > =0A= =0A= =0A= -- =0A= Damien Le Moal=0A= Western Digital Research=0A=