From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1756568AbaAFVre (ORCPT );
	Mon, 6 Jan 2014 16:47:34 -0500
Received: from mail-pd0-f176.google.com ([209.85.192.176]:35127 "EHLO
	mail-pd0-f176.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1756523AbaAFVr3 (ORCPT );
	Mon, 6 Jan 2014 16:47:29 -0500
Date: Mon, 6 Jan 2014 13:47:26 -0800
From: Kent Overstreet
To: Jens Axboe
Cc: Shaohua Li, linux-kernel@vger.kernel.org, hch@infradead.org
Subject: Re: [patch 1/2]percpu_ida: fix a live lock
Message-ID: <20140106214726.GD9037@kmo>
References: <20131231033827.GA31994@kernel.org>
 <20140104210804.GA24199@kmo-pixel>
 <20140105131300.GB4186@kernel.org>
 <20140106204641.GB9037@kmo>
 <52CB1783.4050205@kernel.dk>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <52CB1783.4050205@kernel.dk>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Jan 06, 2014 at 01:52:19PM -0700, Jens Axboe wrote:
> On 01/06/2014 01:46 PM, Kent Overstreet wrote:
> > On Sun, Jan 05, 2014 at 09:13:00PM +0800, Shaohua Li wrote:
> >>> - we explicitly don't guarantee that all
> >>> the tags will be available for allocation at any given time, only half
> >>> of them.
> >>
> >> only half of the tags can be used? this is scary. Of course we hope all
> >> tags are available.
> >
> > No: that behaviour is explicitly documented and it's the behaviour we want.
>
> That is going to end horribly, for the cases (i.e. most cases) where we
> really want to be able to use all tags. If we can't support that
> reliably or in a fast manner with percpu ida, then that's pretty close
> to a showstopper. And I suspect that will be the case for most use
> cases. We can't just feasibly throw away half of the tag space; that's
> crazy.

Sounds like we're coming at this from different use cases; maybe we'll
need to add some flexibility to cover both.

For background, I wrote this code for SSDs, where you get 16 bits for
the tag ID; we really don't want the queue to be anywhere near that
deep, so potentially wasting half of however many tags you have is no
big deal.

If you have a device where the max tag id is small enough that you
really do need to use the entire tag space... yeah, that's going to be
an issue.

> But that won't work at all. To take a concrete example, let's assume
> the device is some sort of NCQ device. We have 0..31 tags. If we throw
> away half of those tags, we're at 16. You can't double the tag space
> in software.

Ok, so I hadn't really given any thought to that kind of use case;
insofar as I had, I would've been skeptical that percpu tag allocation
made sense for 32 different tags at all.

We really don't want to screw over the users that aren't so constrained
by the size of their tag space; there really is a huge performance
tradeoff here (otherwise you're stealing tags and bouncing cachelines
for _every_ tag allocation when the queue is full, and your percpu tag
allocation is no longer very percpu).

I'm not sure what the best strategy is for NCQ-type max nr_tags,
though - thoughts?

The easy thing to do for now is just to add another parameter to
percpu_ida_init() for the number of tags that are allowed to sit unused
on other cpus' freelists: users with a large, relatively unbounded
nr_tags can set that to nr_tags / 2; for NCQ you'd set it to 0. I
suspect we can do better for NCQ than that, though, w.r.t. worst-case
performance.
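
To make that proposal concrete, here's a purely illustrative userspace
sketch of the policy - the names (tag_pool, max_cached, must_steal) are
made up for illustration and are not the actual percpu_ida API:

/*
 * Sketch of the knob proposed above: a per-pool "max_cached" bound on
 * how many tags may sit unused on other cpus' percpu freelists before
 * an allocator is forced to go steal them.  Illustrative only.
 */
#include <assert.h>
#include <stdbool.h>

struct tag_pool {
	unsigned nr_tags;	/* total size of the tag space */
	unsigned max_cached;	/* tags allowed to idle on remote
				 * percpu freelists */
	unsigned cached;	/* tags currently idling remotely */
};

static void tag_pool_init(struct tag_pool *pool, unsigned nr_tags,
			  unsigned max_cached)
{
	pool->nr_tags = nr_tags;
	pool->max_cached = max_cached;
	pool->cached = 0;
}

/*
 * Slow-path policy: when the global freelist is empty, steal from
 * other cpus only once more than max_cached tags are idling remotely;
 * otherwise sleep and wait for a free.  max_cached == 0 means every
 * remotely cached tag is immediately stealable, so the full tag space
 * stays usable - at the cost of cacheline bouncing whenever the queue
 * is full.
 */
static bool must_steal(const struct tag_pool *pool)
{
	return pool->cached > pool->max_cached;
}

int main(void)
{
	struct tag_pool ssd, ncq;

	/* Big tag space (16-bit SSD tags): let half the tags idle. */
	tag_pool_init(&ssd, 65536, 65536 / 2);

	/* NCQ: 32 tags total, never let any of them idle remotely. */
	tag_pool_init(&ncq, 32, 0);

	ncq.cached = 1;
	assert(must_steal(&ncq));	/* one idle NCQ tag -> steal it */

	ssd.cached = 1000;
	assert(!must_steal(&ssd));	/* plenty of headroom -> stay percpu */

	return 0;
}

With max_cached == 0 the pool degenerates to a shared freelist under
load, which is exactly the cacheline-bouncing cost described above, but
it's the only setting that keeps all 32 NCQ tags usable.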