From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [PATCH 6/8] blk-mq-sched: add framework for MQ capable IO schedulers
To: Paolo Valente
References: <1481933536-12844-1-git-send-email-axboe@fb.com>
 <1481933536-12844-7-git-send-email-axboe@fb.com>
 <9936D33B-AE98-46DF-B1DA-D54142A338FD@linaro.org>
 <8d900827-3569-b695-86d6-3cfe27884d6a@fb.com>
 <91E04322-8248-48E6-A4F6-FD4914A31BFF@linaro.org>
Cc: Jens Axboe, linux-block@vger.kernel.org, Linux-Kernal, osandov@fb.com
From: Jens Axboe
Message-ID: <1854448c-a7bb-e9a9-cfda-5c301429b301@fb.com>
Date: Tue, 17 Jan 2017 04:38:40 -0800
MIME-Version: 1.0
In-Reply-To: <91E04322-8248-48E6-A4F6-FD4914A31BFF@linaro.org>
Content-Type: text/plain; charset=windows-1252
List-ID:

On 01/17/2017 02:13 AM, Paolo Valente wrote:
>
>> On 17 Jan 2017, at 03:47, Jens Axboe wrote:
>>
>> On 12/22/2016 04:13 AM, Paolo Valente wrote:
>>>
>>>> On 22 Dec 2016, at 10:59, Paolo Valente wrote:
>>>>
>>>>>
>>>>> On 17 Dec 2016, at 01:12, Jens Axboe wrote:
>>>>>
>>>>> This adds a set of hooks that intercepts the blk-mq path of
>>>>> allocating/inserting/issuing/completing requests, allowing
>>>>> us to develop a scheduler within that framework.
>>>>>
>>>>> We reuse the existing elevator scheduler API on the registration
>>>>> side, but augment that with the scheduler flagging support for
>>>>> the blk-mq interface, and with a separate set of ops hooks for MQ
>>>>> devices.
>>>>>
>>>>> Schedulers can opt in to using shadow requests. Shadow requests
>>>>> are internal requests that the scheduler uses for the allocate
>>>>> and insert part, which are then mapped to a real driver request
>>>>> at dispatch time. This is needed to separate the device queue depth
>>>>> from the pool of requests that the scheduler has to work with.
>>>>>
>>>>> Signed-off-by: Jens Axboe
>>>>>
>>>> ...
>>>>
>>>>> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
>>>>> new file mode 100644
>>>>> index 000000000000..b7e1839d4785
>>>>> --- /dev/null
>>>>> +++ b/block/blk-mq-sched.c
>>>>
>>>>> ...
>>>>> +static inline bool
>>>>> +blk_mq_sched_allow_merge(struct request_queue *q, struct request *rq,
>>>>> +			 struct bio *bio)
>>>>> +{
>>>>> +	struct elevator_queue *e = q->elevator;
>>>>> +
>>>>> +	if (e && e->type->ops.mq.allow_merge)
>>>>> +		return e->type->ops.mq.allow_merge(q, rq, bio);
>>>>> +
>>>>> +	return true;
>>>>> +}
>>>>> +
>>>>
>>>> Something does not seem to add up here:
>>>> e->type->ops.mq.allow_merge may be called only in
>>>> blk_mq_sched_allow_merge, which, in its turn, may be called only in
>>>> blk_mq_attempt_merge, which, finally, may be called only in
>>>> blk_mq_merge_queue_io. Yet the latter may be called only if there is
>>>> no elevator (lines 1399 and 1507 in blk-mq.c).
>>>>
>>>> Therefore, e->type->ops.mq.allow_merge can never be called, whether
>>>> or not an elevator is present. Be patient if I'm missing something
>>>> huge, but I thought it was worth reporting this.
>>>>
>>>
>>> Just another detail: if e->type->ops.mq.allow_merge does get invoked
>>> from the above path, then it is of course invoked without the
>>> scheduler lock held. In contrast, if this function gets invoked
>>> from dd_bio_merge, then the scheduler lock is held.
>>
>> But the scheduler controls that itself. So it'd be perfectly fine to
>> have a locked and an unlocked variant. The way that's typically done
>> is to have function() grab the lock, while __function() is invoked
>> with the lock already held.
>>
>>> To handle these opposite alternatives, I don't know whether checking
>>> if the lock is held (and possibly taking it) from inside
>>> e->type->ops.mq.allow_merge is a good solution. In any case, before
>>> possibly trying it, I will wait for some feedback on the main
>>> problem, i.e., on the fact that e->type->ops.mq.allow_merge seems
>>> unreachable in the above path.
>>
>> Checking if a lock is held is NEVER a good idea, as it leads to both
>> bad and incorrect code. If you just check whether a lock is held when
>> being called, you don't necessarily know if it was the caller that
>> grabbed it, or if it just happens to be held by someone else for
>> unrelated reasons.
>>
>
> Thanks a lot for this and the above explanations. Unfortunately, I
> still see the problem. To hopefully make you waste less time, I have
> reported the problematic paths explicitly, so that you can quickly
> point me to my mistake.
>
> The problem is caused by the existence of at least the following two
> alternative paths to e->type->ops.mq.allow_merge:
>
> 1. In mq-deadline.c (line 374): spin_lock(&dd->lock);
>    blk_mq_sched_try_merge -> elv_merge -> elv_bio_merge_ok ->
>    elv_iosched_allow_bio_merge -> e->type->ops.mq.allow_merge
>
> 2. In blk-core.c (line 1660): spin_lock_irq(q->queue_lock);
>    elv_merge -> elv_bio_merge_ok ->
>    elv_iosched_allow_bio_merge -> e->type->ops.mq.allow_merge
>
> In the first path, the scheduler lock is held, while in the second
> path it is not. This does not cause problems with mq-deadline,
> because the latter simply has no allow_merge function. Yet it does
> cause problems with the allow_merge implementation of bfq. There was
> no such issue in the legacy block layer, as only the global queue
> lock was used there.
>
> Where am I wrong?

#2 can never happen for blk-mq, it's the old IO path. blk-mq is never
invoked with ->queue_lock held.
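
Something like the below is the usual way to structure the
locked/unlocked split (just a sketch to illustrate the convention;
struct sched_data and the function names are made up, not code from
this patchset):

#include <linux/spinlock.h>
#include <linux/lockdep.h>
#include <linux/blkdev.h>

struct sched_data {
	spinlock_t lock;
	/* ... per-scheduler state, names illustrative ... */
};

/*
 * Locked/unlocked variant convention: the plain function grabs the
 * lock, the __ prefixed variant expects the caller to already hold
 * it. lockdep_assert_held() documents and enforces that contract.
 */
static bool __sched_allow_merge(struct sched_data *sd, struct request *rq,
				struct bio *bio)
{
	lockdep_assert_held(&sd->lock);

	/* the actual merge decision would go here */
	return true;
}

static bool sched_allow_merge(struct sched_data *sd, struct request *rq,
			      struct bio *bio)
{
	bool ret;

	spin_lock(&sd->lock);
	ret = __sched_allow_merge(sd, rq, bio);
	spin_unlock(&sd->lock);

	return ret;
}

A caller that already holds the scheduler lock (the way dd_bio_merge
holds dd->lock in path #1 above) calls the __ variant directly;
everything else goes through the locking wrapper. That way nobody
ever has to guess whether the lock is held.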
-- 
Jens Axboe