From: James Bottomley
Subject: Re: [PATCH 13/17] scsi: push host_lock down into scsi_{host,target}_queue_ready
Date: Mon, 10 Feb 2014 13:10:07 -0800
Message-ID: <1392066607.19708.5.camel@dabdike.int.hansenpartnership.com>
In-Reply-To: <20140210113932.GA31405@infradead.org>
To: Christoph Hellwig
Cc: Jens Axboe, Nicholas Bellinger, linux-scsi@vger.kernel.org

On Mon, 2014-02-10 at 03:39 -0800, Christoph Hellwig wrote:
> On Thu, Feb 06, 2014 at 08:56:59AM -0800, James Bottomley wrote:
> > I'm dubious about replacing a locked set of checks and increments with
> > atomics for the simple reason that atomics are pretty expensive on
> > non-x86, so you've likely slowed the critical path down for them.  Even
> > on x86, atomics can be very expensive because of the global bus lock.  I
> > think about three of them in a row is where you might as well stick with
> > the lock.
>
> The three of them replace two locks at least when using blk-mq.  Until
> we use blk-mq and those avoid the queue_lock we could keep the
> per-device counters as-is.

I'm not saying never, I'm just really dubious about this bit and the
potential cost on other platforms ... at the very least, surely the
device busy can still be a counter because of the current threading
guarantees?

> As Bart's numbers have shown this definitively shows a major improvement
> on x86, for other architecture we'd need someone to run benchmarks
> on useful hardware.  Maybe some of the IBM people on the list could
> help out on PPC and S/390?

Well, no, they haven't.  The numbers were an assertion, not a benchmark,
and 3.8% can be in the error margin of anybody's tests.  Even if the
actual benchmark were published, I can't see it being convincing because
there's too much potential variation (other architectures, different
ways of testing) for such a small result.

I'm happy to take shortening critical sections and removing atomics in
get/put because they're obvious wins without needing benchmark
justification.  I don't really want to take the dubious stuff (like spin
lock replacement with atomic) until we see the shape of what we can do
with the block mq stuff and what's necessary and what isn't.

James
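
For readers following the thread, the trade-off under discussion (a
check and increment done under one lock versus the same bookkeeping done
with atomics) can be sketched in userspace C.  This is only an
illustration, not the code from the patch: struct demo_host, its fields
and the demo_* functions are hypothetical names, a pthread mutex stands
in for host_lock, and C11 atomics stand in for the kernel's atomic_t.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a host with a queue-depth limit. */
struct demo_host {
    pthread_mutex_t lock;    /* plays the role of host_lock */
    int busy_locked;         /* counter only touched under lock */
    atomic_int busy_atomic;  /* counter updated without the lock */
    int can_queue;           /* maximum outstanding commands */
};

static void demo_host_init(struct demo_host *h, int can_queue)
{
    assert(can_queue > 0);
    pthread_mutex_init(&h->lock, NULL);
    h->busy_locked = 0;
    atomic_init(&h->busy_atomic, 0);
    h->can_queue = can_queue;
}

/* Variant A: the check and the increment share one critical section. */
static bool demo_ready_locked(struct demo_host *h)
{
    bool ok = false;

    pthread_mutex_lock(&h->lock);
    if (h->busy_locked < h->can_queue) {
        h->busy_locked++;
        ok = true;
    }
    pthread_mutex_unlock(&h->lock);
    return ok;
}

/*
 * Variant B: increment optimistically, then undo if over the limit.
 * One atomic read-modify-write on success, two on failure.
 */
static bool demo_ready_atomic(struct demo_host *h)
{
    int busy = atomic_fetch_add(&h->busy_atomic, 1) + 1;

    if (busy > h->can_queue) {
        atomic_fetch_sub(&h->busy_atomic, 1);
        return false;
    }
    return true;
}
```

Variant A pays for one lock/unlock pair no matter how many counters it
checks inside the critical section, while variant B pays at least one
atomic read-modify-write per counter.  With several counters in a row
(host, target and device, say) that per-counter cost is what the
objection above is about, and how it compares to a single lock depends
on the architecture.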