From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
From: Bart Van Assche
To: "snitzer@redhat.com"
CC: "dm-devel@redhat.com" , "linux-kernel@vger.kernel.org" , "hch@infradead.org" , "linux-block@vger.kernel.org" , "osandov@fb.com" , "axboe@kernel.dk" , "ming.lei@redhat.com"
Subject: Re: [RFC PATCH] blk-mq: fixup RESTART when queue becomes idle
Date: Thu, 18 Jan 2018 22:20:13 +0000
Message-ID: <1516314012.2676.76.camel@wdc.com>
References: <20180118170353.GB19734@redhat.com> <1516296056.2676.23.camel@wdc.com> <20180118183039.GA20121@redhat.com> <1516301278.2676.35.camel@wdc.com> <20180118204856.GA31679@redhat.com> <1516309128.2676.38.camel@wdc.com> <20180118212327.GB31679@redhat.com> <1516311554.2676.50.camel@wdc.com> <20180118220132.GA20860@redhat.com>
In-Reply-To: <20180118220132.GA20860@redhat.com>
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
List-ID:

On Thu, 2018-01-18 at 17:01 -0500, Mike Snitzer wrote:
> And yet Laurence cannot reproduce any such lockups with your test...

Hmm ... maybe I misunderstood Laurence, but I don't think that Laurence has
already succeeded at running an unmodified version of my tests. In one of the
e-mails Laurence sent me this morning I read that he modified these scripts
to get past a kernel module unload failure that was reported while starting
these tests. So the next step is to check which changes were made to the test
scripts and also whether the test results are still valid.

> Are you absolutely certain this patch doesn't help you?
> https://patchwork.kernel.org/patch/10174037/
> 
> If it doesn't then that is actually very useful to know.

The first thing I tried this morning was to run the srp-test software against
a merge of Jens' for-next branch and your dm-4.16 branch. Since I noticed that
the dm queue locked up, I reinserted a blk_mq_delay_run_hw_queue() call in the
dm code. Since even that was not sufficient, I tried to kick the queues via
debugfs (for s in /sys/kernel/debug/block/*/state; do echo kick >$s; done).
Since that was not sufficient to resolve the queue stall either, I reverted the
following three patches that are in Jens' tree:
* "blk-mq: improve DM's blk-mq IO merging via blk_insert_cloned_request feedback"
* "blk-mq-sched: remove unused 'can_block' arg from blk_mq_sched_insert_request"
* "blk-mq: don't dispatch request in blk_mq_request_direct_issue if queue is busy"

Only after I had done this did the srp-test software run again without
triggering dm queue lockups.
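The debugfs queue kick used in the one-liner can be wrapped in a small POSIX shell function that also copes with a glob matching nothing and reports failed writes. This is only a sketch. It assumes debugfs is mounted at /sys/kernel/debug, that it runs as root, and that the running kernel's blk-mq debugfs "state" attribute accepts the "kick" command. In kernels of this era that command kicks the queue's requeue list. The function name kick_all_queues is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: write "kick" to every blk-mq debugfs queue "state" file.
# Assumes debugfs is mounted at /sys/kernel/debug and that the kernel's
# blk-mq debugfs "state" attribute accepts "kick". Must be run as root.
kick_all_queues() {
    base=${1:-/sys/kernel/debug/block}
    for s in "$base"/*/state; do
        # If the glob matched nothing, the literal pattern is left in $s; skip it.
        [ -e "$s" ] || continue
        # Report, but do not abort on, queues whose state file rejects the write.
        echo kick >"$s" || echo "failed to kick $s" >&2
    done
}
```

Example: run kick_all_queues as root. Pass a different base directory if debugfs is mounted elsewhere.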
Sorry, but I have not yet had the time to test the patch "[RFC] blk-mq: fixup
RESTART when queue becomes idle".

> Please just focus on helping Laurence get his very capable testbed to
> reproduce this issue.  Once we can reproduce these "unkillable" "stalls"
> in-house it'll be _much_ easier to analyze and fix.

OK, I will work with Laurence on this. Maybe Laurence and I should do that
before further analyzing the lockup mentioned above?

Bart.