From mboxrd@z Thu Jan 1 00:00:00 1970
From: Bart Van Assche
To: "tj@kernel.org", "axboe@kernel.dk"
CC: "linux-kernel@vger.kernel.org", "peterz@infradead.org",
	"linux-block@vger.kernel.org", "kernel-team@fb.com",
	"oleg@redhat.com", "hch@lst.de", "jianchao.w.wang@oracle.com",
	"osandov@fb.com"
Subject: Re: [PATCH 2/6] blk-mq: replace timeout synchronization with a RCU and generation based scheme
Date: Thu, 14 Dec 2017 18:51:11 +0000
Message-ID: <1513277469.2475.43.camel@wdc.com>
References: <20171212190134.535941-1-tj@kernel.org> <20171212190134.535941-3-tj@kernel.org>
In-Reply-To: <20171212190134.535941-3-tj@kernel.org>
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 2017-12-12 at 11:01 -0800, Tejun Heo wrote:
> rules.
> Unfortunatley, it contains quite a few holes.
  ^^^^^^^^^^^^^
  Unfortunately?

> While this change makes REQ_ATOM_COMPLETE synchornization unnecessary
                                            ^^^^^^^^^^^^^^^
                                            synchronization?

> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -126,6 +126,8 @@ void blk_rq_init(struct request_queue *q, struct request *rq)
>  	rq->start_time = jiffies;
>  	set_start_time_ns(rq);
>  	rq->part = NULL;
> +	seqcount_init(&rq->gstate_seq);
> +	u64_stats_init(&rq->aborted_gstate_sync);
>  }
>  EXPORT_SYMBOL(blk_rq_init);

Sorry, but the above change looks ugly to me. My understanding is that
blk_rq_init() is only used inside the block layer to initialize legacy block
layer requests, while gstate_seq and aborted_gstate_sync are only relevant
for blk-mq requests. Wouldn't it be better to avoid calling blk_rq_init()
for blk-mq requests so that the above change can be left out? The only
callers of blk_rq_init() outside the block layer core that I know of are
ide_prep_sense() and scsi_ioctl_reset(). I can help with converting the SCSI
code if you want.

> +	write_seqcount_begin(&rq->gstate_seq);
> +	blk_mq_rq_update_state(rq, MQ_RQ_IN_FLIGHT);
> +	blk_add_timer(rq);
> +	write_seqcount_end(&rq->gstate_seq);

My understanding is that both write_seqcount_begin() and write_seqcount_end()
issue a write memory barrier. Is a seqcount really faster than a spinlock?

>
> @@ -792,6 +811,14 @@ void blk_mq_rq_timed_out(struct request *req, bool reserved)
>  		__blk_mq_complete_request(req);
>  		break;
>  	case BLK_EH_RESET_TIMER:
> +		/*
> +		 * As nothing prevents from completion happening while
> +		 * ->aborted_gstate is set, this may lead to ignored
> +		 * completions and further spurious timeouts.
> +		 */
> +		u64_stats_update_begin(&req->aborted_gstate_sync);
> +		req->aborted_gstate = 0;
> +		u64_stats_update_end(&req->aborted_gstate_sync);

If a blk-mq request is resubmitted 2**62 times, can that result in the above
code setting aborted_gstate to the same value as gstate? Isn't that a bug?
If so, how about setting aborted_gstate in the above code to e.g.
gstate ^ (2**63)?

> @@ -228,6 +230,27 @@ struct request {
>
>  	unsigned short write_hint;
>
> +	/*
> +	 * On blk-mq, the lower bits of ->gstate carry the MQ_RQ_* state
> +	 * value and the upper bits the generation number which is
> +	 * monotonically incremented and used to distinguish the reuse
> +	 * instances.
> +	 *
> +	 * ->gstate_seq allows updates to ->gstate and other fields
> +	 * (currently ->deadline) during request start to be read
> +	 * atomically from the timeout path, so that it can operate on a
> +	 * coherent set of information.
> +	 */
> +	seqcount_t gstate_seq;
> +	u64 gstate;
> +
> +	/*
> +	 * ->aborted_gstate is used by the timeout to claim a specific
> +	 * recycle instance of this request.  See blk_mq_timeout_work().
> +	 */
> +	struct u64_stats_sync aborted_gstate_sync;
> +	u64 aborted_gstate;
> +
>  	unsigned long deadline;
>  	struct list_head timeout_list;

Why are gstate and aborted_gstate 64-bit variables? What makes you think that
32 bits would not be enough?

Thanks,

Bart.
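
P.S. To make the wraparound question concrete, here is a minimal user-space
sketch. It is not code from the patch. It assumes that the low 2 bits of
gstate hold the MQ_RQ_* state (inferred from the 2**62 figure above) and
that the generation is bumped by adding to the upper bits. The helper names
gstate_pack(), gstate_next_gen() and aborted_reset_value() are hypothetical
and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: low 2 bits carry the state, the upper 62 the generation. */
#define STATE_BITS 2u
#define STATE_MASK ((UINT64_C(1) << STATE_BITS) - 1)
#define GEN_INC    (UINT64_C(1) << STATE_BITS)

/* Hypothetical helper: pack a generation and a state into one word. */
static uint64_t gstate_pack(uint64_t gen, unsigned int state)
{
	assert(state <= STATE_MASK);
	return (gen << STATE_BITS) | state;
}

/*
 * Hypothetical helper: advance the generation and keep the state bits.
 * Unsigned arithmetic wraps, so generation 2**62 - 1 is followed by 0.
 */
static uint64_t gstate_next_gen(uint64_t gstate)
{
	return (gstate & STATE_MASK) | ((gstate & ~STATE_MASK) + GEN_INC);
}

/*
 * The reset value suggested above: flipping bit 63 yields a value that
 * differs from the gstate it was derived from, unlike a constant 0.
 */
static uint64_t aborted_reset_value(uint64_t gstate)
{
	return gstate ^ (UINT64_C(1) << 63);
}
```

Under these assumptions, calling gstate_next_gen() on the last generation
with state 0 yields 0, which is the same value the hunk above stores in
aborted_gstate, while aborted_reset_value(g) differs from g for every g.
With 32-bit variables and the same 2 state bits, the generation would wrap
after 2**30 reuses instead of 2**62.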